<div dir="ltr"><p>Dear Sir, </p>
<p>I’m a final-year student of Electronics and Electrical Engineering at the
Birla Institute of Technology and Science, Pilani. I’m interested in the following
two projects mentored by Ankur, India.</p>
<p style="line-height:normal"><a name="13e547f4b85247cf_speech-based-query-and-result-retrieval-"></a><span><span>1.<span style="font:7pt "Times New Roman""> </span></span></span><span> Speech based query and result retrieval system for Indian languages</span></p>
<span style="font-size:11pt;font-family:"Calibri","sans-serif";font-weight:normal"><span> 2.<span style="font:7pt "Times New Roman""> </span></span></span><span style="font-size:11pt;font-family:"Calibri","sans-serif";font-weight:normal">Add a language model for speech
recognition software for Bengali language</span>
<p>I have been working in the field of speech processing for the
last 2 years through the following study-oriented projects.</p><p><span><span>1.<span style="font:7pt "Times New Roman""> </span></span></span>Real-time isolated word recognizer and
continuous word recognizer for a vocabulary of 40 words (implemented on a dSPACE
processor using the Simulink interface).</p>
<p><span><span>2.<span style="font:7pt "Times New Roman""> </span></span></span>Phoneme recognition system and spoken term
detection using a phonetic string matching approach in HTK.</p>
<p><span><span>3.<span style="font:7pt "Times New Roman""> </span></span></span>Continuous speech recognition system trained on
Assamese data with a vocabulary of 3000 words, implemented in Sphinx 3.</p><p>For the last two semesters I worked under Dr. Solomon Raju,
senior scientist, CEERI Pilani. In the current semester I have been working (as
a part of my final year project) at a speech recognition start-up, Speechwarenet
(TIC, IIT Guwahati), under Dr. S. R. M. Prasanna, Professor, IIT Guwahati.<span> </span>I have also worked with Asterisk and
developed a voicemail server to exchange voicemails between different users
using the Asterisk interface. If required, I can send the log files (10.falign_ci_hmm.zip), which can only be
generated during training in Sphinx 3 and can’t be downloaded from anywhere
else, along with the output file from running the HVite decoder in HTK. As a part of my final year project I’m working on a project sponsored
by VoxEdu on American pronunciation practice. The project will run
up to May 15, and I’ll be able to commit my full time to your projects after
that.</p><p>Here’s my interpretation of how the
speech based query and result retrieval system could be implemented.</p>
<p>The system would contain two modules:</p>
<p><span><span>1.<span style="font:7pt "Times New Roman""> </span></span></span>Speech recognition system</p>
<p><span><span>2.<span style="font:7pt "Times New Roman""> </span></span></span>User interface using Asterisk, which could use
the Festival TTS engine for text-to-speech conversion</p>
<p> </p><span><span>1.<span style="font:7pt "Times New Roman"">
</span></span></span>Speech recognition system: To develop a speech
recognizer for recognizing user queries, an acoustic model has to be trained.
This demands a large amount of speech data with corresponding transcriptions, and I
assume the data would be available during project implementation. To train a
language model, text data is needed, which should be readily available in the
local language in which the system is to be implemented.<span> </span>The system could be trained using either HTK or
Sphinx; personally I would recommend Sphinx 3, which is open source. The Sphinx-3 or
Sphinx-4 decoder could be used for recognizing the audio file using the trained
model. Again, in terms of performance, Sphinx 4 is a much better recognizer.<span><span><br><br>2.<span style="font:7pt "Times New Roman"">
</span></span></span>User interface using Asterisk: The following tasks
could be performed using Asterisk:
<p style="margin-left:72pt"><span><span>1.<span style="font:7pt "Times New Roman"">
</span></span></span>Receive the call from the user and play an audio
prompt (e.g. “What would you like to ask?” or “Ask your query after the beep”,
followed by a beep), either synthesized with Festival or played directly from a previously
recorded file.</p>
<p style="margin-left:72pt"><span><span>2.<span style="font:7pt "Times New Roman"">
</span></span></span>Wait for a certain time (10 seconds or so) to
receive the user’s query, and store it in a wave file.</p>
<p style="margin-left:72pt"><span><span>3.<span style="font:7pt "Times New Roman"">
</span></span></span>Pass the wave file to the Sphinx recognizer and get
the recognized text. Search the database for the recognized query and
retrieve the result. Searching the database effectively demands
implementation of a speech tagging algorithm.</p>
<p style="margin-left:72pt"><span><span>4.<span style="font:7pt "Times New Roman"">
</span></span></span>Answer the user by playing the result using
Festival.</p>Please send me your feedback so that we can discuss the implementation of the project.<br><br>-- <br>Nikhil Bhendawade,<br>BITS Pilani<br>
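<p>P.S. A minimal sketch of how steps 3 and 4 of the Asterisk call flow could be glued together, under stated assumptions. The names <code>QueryPipeline</code>, <code>recognize</code>, <code>speak</code> and <code>database</code> are hypothetical: <code>recognize</code> stands in for whatever wrapper ends up calling the Sphinx decoder on the recorded wave file, and <code>speak</code> for a wrapper around Festival. Neither is an actual Sphinx, Festival or Asterisk API. The word-overlap lookup is only a placeholder for the tagging-based database search, which is not specified yet.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class QueryPipeline:
    """Glue for steps 3 and 4: recognize the query, look it up, answer."""

    recognize: Callable[[str], str]  # hypothetical: wave file path -> recognized text
    speak: Callable[[str], None]     # hypothetical: text -> audio played to the caller
    database: Dict[str, str]         # placeholder: key phrase -> answer text

    def lookup(self, query_text: str) -> Optional[str]:
        """Return the answer whose key phrase shares the most words with the query."""
        query_words = set(query_text.lower().split())
        best_answer, best_overlap = None, 0
        for key, answer in self.database.items():
            overlap = len(query_words.intersection(key.lower().split()))
            if overlap > best_overlap:
                best_answer, best_overlap = answer, overlap
        return best_answer

    def handle_call(self, wav_path: str) -> str:
        """Process one recorded query and return the text that was spoken back."""
        text = self.recognize(wav_path)
        answer = self.lookup(text) or "Sorry, no result was found."
        self.speak(answer)
        return answer
```

<p>Because the recognizer and the TTS engine are passed in as plain callables, the database search can be tested with stub functions before any acoustic model has been trained, and the stubs can later be swapped for the real components.</p>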
</div>