<div dir="ltr"><div><div><div><div><div><div><div><div>Hi,<br><br></div>Here I would like to answer some of the questions about my final year project.<br><br><span style="color:rgb(153,0,255)">a) You have mentioned in the pdf that its linguistic independent model<br>
using opensmile and praat which is great but how would you think you<br>
can implement the same for an Indic model? What do you think are the<br>
baselines that are used here? or differences/difficulties wrt indic<br>
languages?</span><br><br></div>Here the project is about emotion recognition from speech. Since I am trying to recognize emotions and not the actual speech content, I can achieve language independence. That is, I do not consider the linguistic content of the speech (words, etc.); rather, we analyze only the acoustic and prosodic features, i.e. the pitch, MFCCs, energy, etc. These are likely to be the same irrespective of the language. For instance, for anger the voice is likely to be high pitched, with greater energy content (i.e. the RMS energy of the speech). Such cues will show up in any language, and this is what we are targeting in our project.<br>
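To make this concrete, below is a minimal sketch of the kind of per-utterance feature extraction described above (pitch, RMS energy, MFCCs). This is only an illustration: it uses librosa rather than the openSMILE/Praat toolchain the project actually uses, and the sampling rate, pitch range, and file path are assumptions.<br><br><pre><code>
# Rough sketch of extracting language-independent prosodic/acoustic features
# (pitch, energy, MFCCs) for one utterance. Illustration only: uses librosa,
# not the openSMILE/Praat setup used in the project itself.
import numpy as np
import librosa

def extract_prosodic_features(wav_path):
    """Return a fixed-length feature vector for a single utterance."""
    y, sr = librosa.load(wav_path, sr=16000)        # load audio, resample to 16 kHz

    # Pitch (F0) track; unvoiced frames are returned as NaN by pyin
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    f0 = f0[~np.isnan(f0)]                          # keep only voiced frames

    rms = librosa.feature.rms(y=y)[0]               # frame-wise RMS energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 MFCCs per frame

    # Utterance-level statistics (mean/std), a common input format for an SVM
    return np.concatenate([
        [f0.mean(), f0.std()] if f0.size else [0.0, 0.0],
        [rms.mean(), rms.std()],
        mfcc.mean(axis=1), mfcc.std(axis=1),
    ])
</code></pre><br>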
<br></div>I think it would be relatively easy to adapt the model to another language; it may require just fine-tuning the features we are using in the initial stage. We have implemented the project using SVMs, and the baseline model was able to achieve good accuracy (~80%), though we trained it on a German database, which is one of the very few freely available databases of emotional speech.<br>
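And here is a minimal sketch of how such feature vectors could feed the SVM baseline mentioned above, using scikit-learn. The kernel, train/test split, and parameter values here are illustrative assumptions, not the project's exact configuration.<br><br><pre><code>
# Minimal sketch of an SVM baseline for emotion recognition: X holds one
# feature vector per utterance (e.g. from extract_prosodic_features above),
# y holds the emotion labels. Parameters are illustrative, not the project's.
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_emotion_svm(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # Scale features, then fit an RBF-kernel SVM (a common default choice)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
    clf.fit(X_train, y_train)

    print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf
</code></pre><br>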
<br><span style="color:rgb(153,0,255)">b) Secondly, the modulations should be captured clear enough to ensure<br>
accurate results I think in case of indic languages.. How do you think<br>
you can ensure reliable accuracy given that indic dialects have their<br>
own style of speech? example: the way one speaks hindi might be<br>
different from the way one speaks kannada</span><br><br></div>Yes, addressing this would really require that we make our model more generic. Our model is currently limited to recognizing only the six basic emotions. So it can recognize hot anger, where the voice is likely to be loud and excited and to show similar pitch contours and energy distributions across different languages, but it may not accurately recognize a more subtle emotion such as sarcasm, where we would need to be more aware of regional tone modulations.<br>
<br></div><br></div>Also, I have been reading up preparatory material for the project on IR from OCR of Indic Texts, and I am finding it really exciting. I will post about the literature survey I have done so far and some questions that I have. Incidentally, I had to present a seminar on IR in our Knowledge Based Management Systems course in college just this week, so it has been a great help.<br>
<br></div>Cheers till then,<br></div>Madhura Parikh<br><a href="mailto:madhurapaprikh@gmail.com">madhurapaprikh@gmail.com</a><br><a href="https://sites.google.com/site/madhuraparikh/" target="_blank">https://sites.google.com/site/madhuraparikh/</a><br>
<div><div><div><div><div><br></div></div></div></div></div></div>