[Project-ideas] About my speech recognition project

Madhura Parikh madhuraparikh at gmail.com
Fri Apr 12 02:51:04 PDT 2013


Hi,

Here I would like to answer some of the questions about my final-year
project.

a) You have mentioned in the pdf that it's a language-independent model
using openSMILE and Praat, which is great, but how do you think you
can implement the same for an Indic model? What do you think are the
baselines that are used here? Or the differences/difficulties wrt Indic
languages?

The project is about emotion recognition from speech. Since I am
trying to recognize emotions and not the actual speech, I can achieve
language independence. I do not consider the linguistic content of the
speech, i.e. the words. Instead, we analyze only acoustic and prosodic
features such as pitch, energy and MFCCs. These are likely to behave
similarly irrespective of the language. For instance, an angry voice is
likely to be high pitched, with greater energy content (i.e. the
rms_energy of the speech). These cues should show up in any language. This
is what we are targeting in our project.
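
As a rough illustration of the two cues mentioned above (energy and pitch),
here is a minimal Python sketch. It is not the openSMILE/Praat pipeline used
in the project: the function names, the autocorrelation pitch estimate, the
8 kHz sample rate and the synthetic sine "voices" are all assumptions made
for the example.

```python
import math

def rms_energy(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Crude pitch estimate in Hz: the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def sine(freq, amplitude, sample_rate=8000, duration=0.05):
    """Synthetic stand-in for a voiced frame (hypothetical test signal)."""
    n = int(sample_rate * duration)
    return [amplitude * math.sin(2 * math.pi * freq * t / sample_rate)
            for t in range(n)]

# Made-up frames: a quiet, low-pitched one and a loud, high-pitched one.
calm = sine(120.0, 0.2)
angry = sine(250.0, 0.8)

for name, frame in (("calm", calm), ("angry", angry)):
    print(name, rms_energy(frame), estimate_pitch(frame, 8000))
```

Neither function looks at words or phonemes, which is why features of this
kind can be computed in the same way for any language.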

I think it would be relatively easy to adapt the model to any language;
it may just require fine-tuning the features we are using in the initial
stage. We have implemented the project using SVMs, and the baseline model
achieved good accuracy (~80%) even though we trained it on a
German database, which is one of the very few freely available databases of
emotional speech.
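
To make the SVM step concrete, here is a small dependency-free sketch of a
linear SVM trained by sub-gradient descent on the hinge loss. It is only an
illustration: the kernel and library used in the project are not described
here, the two-class setup (angry vs. neutral) is a simplification, and the
standardized pitch/energy values below are invented for the example.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Batch sub-gradient descent on lam/2*||w||^2 + mean hinge loss.

    samples: list of feature vectors; labels: +1 or -1.
    Returns (weights, bias).
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:  # inside the margin: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (angry) or -1 (neutral) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Invented, already-standardized [pitch, rms_energy] values per utterance.
features = [[1.2, 0.9], [0.8, 1.1], [1.0, 1.3], [0.9, 0.7],
            [-1.1, -0.8], [-0.9, -1.2], [-1.0, -1.0], [-0.7, -0.9]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(features, labels)
print(predict(weights, bias, [1.1, 1.0]), predict(weights, bias, [-1.0, -0.9]))
```

Moving to another language would, under this view, mean retraining on
utterances from that language and possibly rescaling the features, while the
classifier itself stays the same.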

b) Secondly, I think the modulations should be captured clearly enough to
ensure accurate results in the case of Indic languages. How do you think
you can ensure reliable accuracy given that Indic dialects have their
own style of speech? Example: the way one speaks Hindi might be
different from the way one speaks Kannada.

Yes, this question would really require that we make our model more
generic. Our model is currently limited by the fact that it can
recognize only the six basic emotions. So it can recognize hot anger, in
which the voice is likely to be excited and to show similar contours
and energy distributions across different languages, but it may not be able
to accurately recognize a more subtle emotion like, say, sarcasm, for which
we need to be more aware of the regional tone modulations.


Also, I have been reading up on preparatory material for the project on IR
from OCR of Indic texts, and I am finding it really exciting. I will post
about the literature survey I have done so far and some questions that I
have. Incidentally, I had to present a seminar on IR in our Knowledge Based
Management Systems course in college only this week, so it has been a
great help.

Cheers till then,
Madhura Parikh
madhuraparikh at gmail.com
https://sites.google.com/site/madhuraparikh/