[Project-ideas] Fwd: Implementation plans with weekly deadlines

Sun May 19 03:35:19 PDT 2013

---------- Forwarded message ----------
From: NAGASHREE S.BHAT <nagashreesbhat at gmail.com>
Date: Sun, May 19, 2013 at 4:03 PM
Subject: Re: Implementation plans with weekly deadlines
To: bhavi at ubuntu.com

*Hello Sir,*

This is a reply to the feedback on the GSoC proposal "Speech Based Query and
Result Retrieval System for Indian Languages".

The feedback was: "Could you please elaborate on the implementation plans,
along with the weekly deadlines?"

Since my mother tongue is Kannada, I have been immersed in this language from
birth, and it saddens me that it is not known to many. Through this project I
want to make more people aware of Kannada. I have also noticed that there is
no speech-based query (speech recognition) system for Kannada.

Many people need to write various things in Bengali every day. For them it
will be perfect: they just need to use this keyboard to type in Bengali and,
bingo, they get it!


*Implementation Plans with weekly deadlines*

Since my college will be closed from 9th July to 30th July, I will be at home
and able to work full time, from 9.00 in the morning to 6.00 in the evening.
Once college starts, I will be able to work every day from 5.00 to 9.00 in
the evening, and on Sundays and other holidays from 9.00 to 5.00. I feel this
is enough time for the project.

My exams start on 17th June and end on 8th July, so I won’t be able to
concentrate much on the project during that period. As soon as my exams are
over, I will give my 100% to the project.

*Regarding the project:*

I’m doing some preliminary research on speech-based query and result
retrieval.

I found that there is no proper speech-to-text (STT) to text-to-speech (TTS)
conversion, not only for Kannada but also for many other Indian languages.

I want to create a framework under Linux that takes speech as a query,
processes the speech, and returns the result.

Kannada is an agglutinative and inflectional language. Hence the query needs
to be processed with a morphological analyzer or stemmer to obtain the base
forms of the query terms. The system accepts the input string and performs a
database lookup to check whether the query is directly present in the
bilingual dictionary. If it is not present, the query is transliterated. The
dictionary has to be built from scratch if no resources are available for
this domain.
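
As a rough illustration of the lookup flow described above, here is a minimal
Python sketch. Everything in it is an assumption made for illustration: the
suffix list, the romanized dictionary entries, and the function names are
hypothetical, and the transliteration step is only a placeholder. A real
system would need a proper Kannada morphological analyzer and dictionary.

```python
# Toy resources (hypothetical, romanized for readability; not a real lexicon).
SUFFIXES = ["galu", "inda", "alli"]
BILINGUAL_DICT = {"pustaka": "book", "mane": "house"}


def stem(term):
    """Crude stand-in for a morphological analyzer: strip one known suffix."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if term.endswith(suffix) and len(term) > len(suffix) + 1:
            return term[: -len(suffix)]
    return term


def transliterate(term):
    """Placeholder: a real system would map Kannada script to Latin script."""
    return term


def translate_term(term):
    """Return (english_form, source) for one query term."""
    base = stem(term.lower())
    if base in BILINGUAL_DICT:
        return BILINGUAL_DICT[base], "dictionary"
    # Not in the dictionary: fall back to transliteration.
    return transliterate(term), "transliteration"


def process_query(query):
    """Process every whitespace-separated term of the query."""
    return [translate_term(term) for term in query.split()]


print(process_query("pustakagalu bengaluru"))
```

Here the first term is reduced to its base form and found in the dictionary,
while the second is not in the dictionary and falls through to the
transliteration placeholder.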

Kannada is a subject-object-verb (SOV) language: subject, object and verb
appear in that order. English is a subject-verb-object (SVO) language. The
system therefore focuses on machine translation rather than word-to-word
translation. For the machine translation, part-of-speech (POS) tagging should
be done for all words in the dictionary. A local word reordering is then done
based on the POS tags to obtain the SVO pattern of the English query.
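
The reordering step can be sketched roughly as follows. This is only a
simplified heuristic under stated assumptions: the clause is a single simple
sentence, the tag names ("NOUN", "PRON", "VERB") are hypothetical, and the
first noun or pronoun is taken to be the subject. Real Kannada sentences would
need a more careful analysis.

```python
SUBJECT_TAGS = {"NOUN", "PRON"}  # assumed tags for a possible subject


def sov_to_svo(tagged):
    """Move a clause-final verb to just after the first noun or pronoun.

    tagged is a list of (word, pos_tag) pairs in SOV order.
    """
    if not tagged or tagged[-1][1] != "VERB":
        return list(tagged)  # nothing to reorder
    verb = tagged[-1]
    rest = tagged[:-1]
    for i, (_, pos) in enumerate(rest):
        if pos in SUBJECT_TAGS:
            return rest[: i + 1] + [verb] + rest[i + 1:]
    return list(tagged)  # no subject found, leave unchanged


# English glosses in Kannada (SOV) order, after dictionary lookup.
sov = [("Rama", "NOUN"), ("book", "NOUN"), ("read", "VERB")]
print(" ".join(word for word, _ in sov_to_svo(sov)))
```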

Then the text retrieval module searches the collection for documents relevant
to the transcription and outputs a specified number of top-ranked documents
in descending order of relevance.
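
A minimal sketch of this retrieval step is given below. The scoring function
(a simple count of query-term occurrences) is an assumption chosen for
brevity, since no ranking method has been fixed yet; the document ids and the
function names are hypothetical as well.

```python
from collections import Counter


def score(query_terms, doc_text):
    """Relevance = how often the query terms occur in the document."""
    counts = Counter(doc_text.lower().split())
    return sum(counts[term] for term in query_terms)


def retrieve(transcription, documents, top_k=3):
    """Return up to top_k (doc_id, score) pairs, most relevant first."""
    terms = transcription.lower().split()
    scored = [(score(terms, text), doc_id) for doc_id, text in documents.items()]
    scored = [(s, d) for s, d in scored if s > 0]  # drop irrelevant documents
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # descending relevance
    return [(doc_id, s) for s, doc_id in scored[:top_k]]


docs = {
    "d1": "kannada speech recognition speech",
    "d2": "weather report",
    "d3": "speech synthesis",
}
print(retrieve("speech recognition", docs, top_k=2))
```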

*Usage of ontological tree:* Each identified keyword is matched against every
node in the ontological tree, and the exact location of the keyword in the
tree is identified.

The language most appropriate to the keyword is determined from the
attributes of that node, and this becomes the language of the document
search.

All the child nodes of the keyword are traversed, and their corresponding
entries in the language decided above are noted as related data for the
document search.
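
The three steps above could be sketched as follows. The node layout (a
keyword, a language attribute, and per-language entries) and the sample data
are assumptions made for illustration, and the sketch traverses the whole
subtree below the keyword rather than only its direct children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keyword: str
    language: str  # attribute: language best suited to this keyword
    labels: dict = field(default_factory=dict)    # language -> entry
    children: list = field(default_factory=list)


def find(node, keyword):
    """Depth-first search for the node holding the keyword."""
    if node.keyword == keyword:
        return node
    for child in node.children:
        hit = find(child, keyword)
        if hit is not None:
            return hit
    return None


def descendants(node):
    """Yield every node below the given node."""
    for child in node.children:
        yield child
        yield from descendants(child)


def expand(root, keyword):
    """Return (search_language, related_entries) for a keyword."""
    node = find(root, keyword)
    if node is None:
        return None, []
    lang = node.language
    related = [d.labels[lang] for d in descendants(node) if lang in d.labels]
    return lang, related


# Hypothetical sample tree.
root = Node("music", "en", {"en": "music"}, [
    Node("carnatic", "kn", {"kn": "karnataka sangeetha"}, [
        Node("veena", "kn", {"kn": "veene", "en": "veena"}),
        Node("raga", "kn", {"en": "raga"}),
    ]),
])
print(expand(root, "carnatic"))
```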


*Tentative Timeline/Phases/milestones (in weekly intervals until the end of
GSoC):*


· April 22nd - May 22nd (30 days): Submitting the proposal.

· May 27th - June 17th (22 days): Bonding with the community. Going through
the implementation details and doing some preliminary research on speech
recognition techniques and on how the microphone, speakers and sound card
process and produce sound. Deciding which tools and technologies to use:
OpenAL, Web Audio, or native C++ or Java.

· June 17th - July 8th (21 days): Semester exams, with a bit of coding.

· July 9th - July 29th (20 days): Full-time coding, i.e. designing the input
details, drafting the rendering-rule algorithms, and coding the algorithms.

· July 29th: Submitting for midterm evaluation.

· August 2nd - September 14th (44 days): Writing and completing the whole
code base, including the speech synthesis part of the project.

· September 15th - September 23rd (7 days): Fine-tuning the project and
writing the documentation (pencils-down date).

· September 27th: Final submission of the project to GSoC.

During project development, a rough documentation article may be maintained
to keep track of the various activities and the project workflow, which can
be very useful for the final documentation. Coding and debugging can also go
on side by side to reduce the workload at the time of testing and debugging.

The main goal for the mid-term assessment is the completion of a working
module capable of handling speech input and rendering proper output.

I hope this is satisfactory.

Thank you,

Nagashree S.Bhat