[Project-ideas] GSOC project application

Mon Apr 29 00:15:00 PDT 2013

Dear Sir,

I’m a final year student of Electronics and Electrical Engineering at the
Birla Institute of Technology and Science, Pilani. I’m interested in the
following two projects mentored by Ankur, India.

1. Speech based query and result retrieval system for Indian languages
2. Add a language model for speech recognition software for Bengali
language

I have been working in the field of speech processing for the last 2 years
through the following study-oriented projects.

1. Real-time isolated word recognizer and continuous word recognizer for
a vocabulary of 40 words (implemented on a dSPACE processor using the
Simulink interface).

2. Phoneme recognition system and spoken term detection by a phonetic
string matching approach using HTK.

3. Continuous speech recognition system trained on Assamese data with a
vocabulary of 3000 words, implemented in Sphinx-3.

For the last two semesters I worked under Dr. Solomon Raju, senior
scientist, CEERI Pilani. In the current semester I have been working (as
part of my final year project) at a speech recognition start-up,
Speechwarenet (TIC, IIT Guwahati), under Dr. S. R. M. Prasanna, Professor,
IIT Guwahati. I have also worked with Asterisk and developed a voicemail
server for exchanging voicemails between users through the Asterisk
interface. If required, I can send the log files (10.falign_ci_hmm.zip),
which can only be generated during training in Sphinx-3 and cannot be
downloaded from anywhere else, along with the output file from running the
HVite decoder in HTK. As part of my final year project I’m working on a
project sponsored by VoxEdu on American pronunciation practice. That
project runs until May 15, and I’ll be able to commit full time to your
projects after that.

Here’s my interpretation of how the speech based query and result
retrieval system could be implemented.

The system would contain two modules:

1. Speech recognition system

2. User interface using Asterisk, which could use the Festival TTS engine
for text-to-speech conversion

1. Speech recognition system: To build a speech recognizer for user
queries, an acoustic model has to be trained. This requires a large amount
of speech data with corresponding transcriptions, and I assume this data
would be made available during project implementation. Training a language
model requires text data, which should be readily available in the local
language for which the system is to be implemented. The system could be
trained using either HTK or Sphinx; personally I would recommend Sphinx-3,
which is open source. The Sphinx-3 or Sphinx-4 decoder could then be used
to recognize the audio file using the trained models. Again, in terms of
performance, Sphinx-4 is a much better recognizer.
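To illustrate what training a language model from text data means, here is a minimal sketch in Python of a bigram model with add-one smoothing. It is not how Sphinx builds or stores its language models; the function names and the three-sentence English corpus are made up for illustration, and a real system would use a large text corpus in the target language.

```python
from collections import Counter
import math


def train_bigram_model(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_log_prob(sentence, unigrams, bigrams):
    """Log probability of a sentence using add-one smoothed bigram estimates."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        log_prob += math.log(numerator / denominator)
    return log_prob


# Toy corpus (illustrative only).
corpus = [
    "what is the weather today",
    "what is the price of rice",
    "when is the next train",
]
unigrams, bigrams = train_bigram_model(corpus)

# A word order seen in training should score higher than a shuffled one.
seen = bigram_log_prob("what is the price of rice", unigrams, bigrams)
shuffled = bigram_log_prob("rice of price the is what", unigrams, bigrams)
print(seen > shuffled)
```

The decoder uses scores of this kind to prefer word sequences that are plausible in the language when the acoustic evidence alone is ambiguous.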

2. User interface using Asterisk: The following tasks could be performed
using Asterisk:

1. Receive the call from the user and play an audio prompt (e.g. "What
would you like to ask?" or "Ask your query after the beep", followed by a
beep), either generated with Festival or played directly from a previously
recorded file.

2. Wait for a fixed time (10 seconds or so), then receive the user's query
and store it in a wave file.

3. Pass the wave file to the Sphinx recognizer and get the recognized
text. Search the database using the recognized query and retrieve the
result. Searching the database effectively requires implementing a speech
tagging algorithm.

4. Answer the user by playing the result using Festival.
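The four steps above can be sketched in Python as follows. This is only an outline under stated assumptions: `handle_call`, `find_best_answer`, and the callables passed in are hypothetical names, the lambdas stand in for Asterisk recording, the Sphinx decoder, and Festival, and a simple keyword overlap stands in for the tagging algorithm. The sample answers are made-up data.

```python
def find_best_answer(query_text, answers_by_keywords):
    """Return the answer whose keyword set overlaps most with the query."""
    query_words = set(query_text.lower().split())
    best_answer, best_overlap = None, 0
    for keywords, answer in answers_by_keywords:
        overlap = len(query_words & set(keywords))
        if overlap > best_overlap:
            best_answer, best_overlap = answer, overlap
    return best_answer


def handle_call(record_query, recognize, speak, answers_by_keywords,
                fallback="Sorry, I could not find an answer."):
    """Run one call: prompt, record, recognize, look up, speak the result."""
    speak("Ask your query after the beep.")   # step 1: audio prompt
    wav_path = record_query(10)               # step 2: record about 10 seconds
    query_text = recognize(wav_path)          # step 3: speech to text
    answer = find_best_answer(query_text, answers_by_keywords) or fallback
    speak(answer)                             # step 4: play the result
    return answer


# Stand-ins for the real components (all hypothetical).
spoken = []
answers = [
    ({"train", "delhi"}, "The next train to Delhi leaves at 6 pm."),
    ({"price", "rice"}, "Rice costs 40 rupees per kilogram."),
]
result = handle_call(
    record_query=lambda seconds: "query.wav",
    recognize=lambda path: "what is the price of rice",
    speak=spoken.append,
    answers_by_keywords=answers,
)
print(result)
```

Passing the recorder, recognizer, and speech output in as functions keeps the call flow testable without a telephony setup; the real Asterisk, Sphinx, and Festival integrations would replace the lambdas.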

Please send me your feedback so that we can discuss the implementation of
the project.

-- 
Nikhil Bhendawade,
BITS Pilani