[Project-ideas] GSOC 2013 Introduction

Thu Apr 11 19:50:30 PDT 2013

Hi Atanu,

On Thu, Apr 11, 2013 at 6:04 PM, Atanu Ghosh <atanu1991 at gmail.com> wrote:

> I have done a preliminary survey on the topic and have come up with a few
> points.

Neat. Thank you.

> As per the description of the project idea "Develop a language model for
> speech processing by extending a freely available corpus" I have come up
> with:
>
> We can go with CMUSphinx to build the language model for Bengali. This can
> be done as shown in Reference [1].
> Now one point is that CMUSphinx has already been tried. To do something new
> we can use Julius, as I don't think it has been tried with Bengali. It will
> definitely be something new.
>
> Next, the problem is gathering data to train our system. I have found 2
> approaches to get data. One is to use the data available on the Shruti
> Bangla ASR site [2], or we can use the algorithm in this paper [3] to
> generate phonemes, consonants, etc.
>
> Third, the actual STT can be done as mentioned in Reference [1] with the
> guidance of the paper in Reference [4]. Methods to reduce the noise and
> hence improve accuracy can be thought of (I haven't researched this yet).
>
> Also, I was curious whether we can make a TTS system. I was looking at
> Dhvani [5]. They say the Bengali module needs a lot of improvements [6].
> Using the large data we have, if we train Dhvani to improve and recognize
> digits, even a good TTS system can be obtained.
>
> Finally, a very complete and concise documentation with all source code and
> method of implementation can be released for STT, TTS, or both, which can
> be used by others to develop a language model for any Indic script. The
> proof-of-concept, as said, will be done in Bengali and demonstrated.
>
> Thank you for your patience in going through this rather long mail. Please
> suggest any new ideas/concepts wherein I can improve upon what I wrote in
> this mail and come up with a basic draft of the final objective.

You make valid points.

With regard to the creation of the training corpus, I am not sure
about the license of the dataset for Shruti. Is it free/libre?

Do you feel that you are at a stage where you can begin to take a stab
at creating a very first iteration of a proposal? If yes, please do
so. I would prefer that you share the link to the Google Doc with me
(off list) so that I can share it with the other mentors for them to
provide inputs.

I am not familiar with the modifications required in Dhvani to make it
work as a TTS, though I am aware that Dhvani may be a good choice.
Perhaps Bhavani could provide an opinion.

--
sankarshan mukhopadhyay
<https://twitter.com/#!/sankarshan>