<div dir="ltr"><p>Dear Sir, </p>

<p>I’m a final-year student of Electronics and Electrical Engineering at the

Birla Institute of Technology and Science, Pilani. I’m interested in the following

two projects mentored by Ankur, India. </p>

<p style="line-height:normal"><a name="13e547f4b85247cf_speech-based-query-and-result-retrieval-"></a><span><span>1.<span style="font:7pt "Times New Roman"">  </span></span></span><span> Speech based query and result retrieval system for Indian languages</span></p>

<span style="font-size:11pt;font-family:"Calibri","sans-serif";font-weight:normal"><span> 2.<span style="font:7pt "Times New Roman"">   </span></span></span><span style="font-size:11pt;font-family:"Calibri","sans-serif";font-weight:normal">Add a language model for speech

recognition software for Bengali language</span>

<p>I have been working in the field of speech processing for the

last two years through the following study-oriented projects.</p><p><span><span>1.<span style="font:7pt "Times New Roman"">   </span></span></span>Real-time isolated word recognizer and

continuous word recognizer for a vocabulary of 40 words (implemented on a dSPACE

processor using the Simulink interface). </p>

<p><span><span>2.<span style="font:7pt "Times New Roman"">   </span></span></span>Phoneme recognition system and spoken term

detection by a phonetic string-matching approach using HTK.</p>

<p><span><span>3.<span style="font:7pt "Times New Roman""> </span></span></span>Continuous speech recognition system trained on

Assamese data with a vocabulary of 3000 words, implemented in Sphinx-3.</p><p>For the last two semesters I worked under Dr. Solomon Raju,

senior scientist, CEERI Pilani. In the current semester I have been working

(as part of my final-year project) at a speech recognition start-up, Speechwarenet

(TIC, IIT Guwahati), under Dr. S. R. M. Prasanna, Professor, IIT Guwahati. I have also worked with Asterisk and

developed a voicemail server that exchanges voicemails between different users

through the Asterisk interface. If required, I can send the log files (10.falign_ci_hmm.zip), which can only be

generated during training in Sphinx-3 and can’t be downloaded from anywhere

else, along with the output file from running the HVite decoder in HTK. As part of my final-year project, I’m working on a project sponsored

by VoxEdu on American pronunciation practice. The project runs

up to May 15, and I’ll be able to commit full time to your projects after

that. </p><p>Here’s my interpretation of how the

speech based query and result retrieval system could be implemented. </p>

<p>The system would contain two modules: </p>

<p><span><span>1.<span style="font:7pt "Times New Roman"">   </span></span></span>Speech recognition system </p>

<p><span><span>2.<span style="font:7pt "Times New Roman"">     </span></span></span>User interface using Asterisk, which could use

the Festival TTS engine for text-to-speech conversion </p>
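<p>As a rough illustration of how these two modules could fit together, here is a minimal Python sketch. The names <code>QuerySystem</code>, <code>handle_call</code>, <code>fake_recognize</code> and <code>fake_synthesize</code> are hypothetical placeholders and not part of Sphinx, Asterisk or Festival; the real recognizer and TTS engine would be plugged in behind them, and the dictionary merely stands in for the result database.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class QuerySystem:
    """Glue between the two modules; all names here are hypothetical."""
    recognize: Callable[[str], str]   # module 1: wave file path -> recognized text
    synthesize: Callable[[str], str]  # module 2: answer text -> audio to play back
    answers: Dict[str, str]           # toy stand-in for the result database

    def handle_call(self, wav_path: str) -> str:
        # Normalize the recognized text before looking it up.
        query = self.recognize(wav_path).strip().lower()
        answer = self.answers.get(query, "Sorry, no result was found.")
        return self.synthesize(answer)


# Placeholders so the sketch runs without a recognizer or a TTS engine.
def fake_recognize(wav_path: str) -> str:
    return "Weather today"


def fake_synthesize(text: str) -> str:
    return "tts:" + text


system = QuerySystem(fake_recognize, fake_synthesize,
                     {"weather today": "Rain is expected in the evening."})
reply = system.handle_call("query.wav")
```

<p>Keeping the recognizer and the TTS engine behind plain functions would let either one be swapped (for example Sphinx-3 for Sphinx-4) without touching the call-handling logic.</p>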

<p> </p><span><span>1.<span style="font:7pt "Times New Roman"">      

</span></span></span>Speech recognition system: To develop a speech

recognizer for user queries, an acoustic model has to be trained.

This demands a large amount of speech data with corresponding transcriptions, and I

assume the data would be available during project implementation. Training a

language model requires text data, which is likely to be readily available in the

local language in which the system is to be implemented. The system could be trained using either HTK or

Sphinx; personally I would recommend Sphinx-3, which is open source. Either the Sphinx-3 or the

Sphinx-4 decoder could then be used to recognize the audio file using the trained

model. In terms of performance, Sphinx-4 is the much better recognizer.<span><span><br><br>2.<span style="font:7pt "Times New Roman"">      

</span></span></span>User interface using Asterisk: The following tasks

could be performed using Asterisk: 

<p style="margin-left:72pt"><span><span>1.<span style="font:7pt "Times New Roman"">      

</span></span></span>Receive the call from the user and play an audio

prompt (e.g. “What would you like to ask?” or “Ask your query after the beep”,

followed by a beep), either synthesized with Festival or played directly from a previously

recorded file. </p>

<p style="margin-left:72pt"><span><span>2.<span style="font:7pt "Times New Roman"">      

</span></span></span>Wait for a certain time (10 seconds or so),

receiving the user’s query and storing it in a wave file. </p>

<p style="margin-left:72pt"><span><span>3.<span style="font:7pt "Times New Roman"">      

</span></span></span>Pass the wave file to the Sphinx recognizer and get

the recognized query. Search the database for the recognized query and

retrieve the result. Searching the database effectively requires

implementing a speech tagging algorithm.</p>

<p style="margin-left:72pt"><span><span>4.<span style="font:7pt "Times New Roman"">      

</span></span></span>Answer the user by playing the result using

Festival.</p>Please send me your feedback so that we can discuss the implementation of the project.<br><br>-- <br>Nikhil Bhendawade,<br>BITS Pilani<br>
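<p>P.S. To make the database search in step 3 a little more concrete, here is a minimal Python sketch of one possible approach: picking the stored entry that shares the most keywords with the recognized query. This is simple keyword overlap used as a stand-in, not the tagging algorithm itself. The stop-word list, the function names <code>keywords</code> and <code>best_match</code>, and the sample records are all assumptions made up for illustration.</p>

```python
# Hypothetical stop-word list; a real system would use one for the target language.
STOPWORDS = {"what", "is", "the", "of", "a", "in", "for", "to"}


def keywords(text):
    """Split on whitespace and drop stop words (crude stand-in for tagging)."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def best_match(query, records):
    """Return the answer whose stored question shares the most keywords
    with the recognized query, or None if nothing overlaps."""
    query_words = keywords(query)
    best_answer, best_score = None, 0
    for question, answer in records.items():
        score = len(query_words & keywords(question))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer


# Made-up sample records, only to exercise the functions.
records = {
    "price of rice in market": "Rice is 30 rupees per kg.",
    "weather forecast for tomorrow": "Rain is expected tomorrow.",
}
result = best_match("what is the price of rice", records)
no_result = best_match("hello", records)
```

<p>Whitespace tokenization is used here so that the sketch does not depend on how a particular script is split into characters; recognizer output is already a sequence of words from the vocabulary.</p>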

</div>