[Project-ideas] gsoc 2012 project idea---A Glossary Tool to index all available localizable strings for Bengali across various FOSS projects and, other available corpus

Subhobrata Dey sbcd90 at gmail.com
Fri Mar 23 02:20:21 PDT 2012


Hello ma'am,

Thanks for replying.
I think I now understand the aim of this project, so I am proposing another
solution for it.
Since I am now talking about the glossary project, I will try to explain it
using a virtual translator.
Here is what I propose:
Every verb, say 'come' or 'go', can vary in two possible ways. The first is
by tense.
Here is how:
Each verb has a different form in each tense, and the forms vary in both
English and Bengali.
Say we take the verb 'come':
Simple Present: 'come' 'এস'
Present Continuous: 'is coming' 'আসছে'
Present Perfect: 'has come' 'এসেছে'
Simple Past: 'came' 'এসেছিল'
Past Continuous: 'was coming' 'আসছিল'
Past Perfect: 'had come' 'এসেছিল'
Simple Future: 'will come' 'আসবে'
and similarly for the other forms of the future tense.
The second is by the subject (noun/pronoun).
Here is how:
I came.
আমি এসেছিলাম
You came.
তুমি এসেছিলে
He/They/any other name came.
সে/তারা/রাম এসেছিল
In this way, there are this many (or maybe more) variants of the verb
'come'.
So, if the virtual translator can identify the subject of the verb and the
tense, it can use the glossary to pick the correct variant of the verb for
the given sentence and translate it.
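
A rough sketch of this lookup in Python, just to illustrate the idea. It
assumes a hypothetical in-memory table keyed by (verb, tense, person); the
names GLOSSARY, person_of and lookup_variant are made up for this example and
are not part of Lokalize or any other tool. Only forms quoted in this mail are
used, and the tense-only forms are assumed to be third person.

```python
from typing import Optional

# Hypothetical glossary table: (English verb, tense, person) -> Bengali form.
GLOSSARY = {
    ("come", "simple past", "first"): "এসেছিলাম",
    ("come", "simple past", "second"): "এসেছিলে",
    ("come", "simple past", "third"): "এসেছিল",
    # Assumption: the forms listed by tense above are third-person forms.
    ("come", "present continuous", "third"): "আসছে",
    ("come", "simple future", "third"): "আসবে",
}

# Assumed mapping from an English subject to its grammatical person.
# Anything not listed (he, they, a name such as Ram) is treated as third.
PERSON_OF_SUBJECT = {"i": "first", "we": "first", "you": "second"}


def person_of(subject: str) -> str:
    """Return the grammatical person for an English subject word."""
    return PERSON_OF_SUBJECT.get(subject.strip().lower(), "third")


def lookup_variant(verb: str, tense: str, subject: str) -> Optional[str]:
    """Return the stored Bengali variant, or None if the glossary lacks it."""
    return GLOSSARY.get((verb, tense, person_of(subject)))


print(lookup_variant("come", "simple past", "I"))
print(lookup_variant("come", "simple past", "Ram"))
```

Identifying the subject and the tense in a real sentence is the hard part and
is not attempted here; the sketch only covers the glossary lookup once both
are known.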
The glossary will also become self-learning. Whenever it encounters an
unknown word, it will first detect whether it is a verb or a noun. If it is
a verb, it creates the variants accordingly and stores them. If it is a noun,
it just finds the Bengali term for it and stores that.
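
The self-learning step could look roughly like the following Python sketch.
Everything here is hypothetical: the class LearningGlossary and the three
callables passed to it (is_verb, conjugate, translate_noun) stand in for a
part-of-speech detector, a Bengali conjugator and a noun dictionary, none of
which is specified in this mail. The stub data comes from the examples in
this thread.

```python
from typing import Callable, Dict, Optional, Set, Tuple

TENSES = ("simple past", "present continuous", "simple future")
PERSONS = ("first", "second", "third")


class LearningGlossary:
    """Glossary that stores variants for words it has not seen before."""

    def __init__(
        self,
        is_verb: Callable[[str], bool],
        conjugate: Callable[[str, str, str], Optional[str]],
        translate_noun: Callable[[str], Optional[str]],
    ) -> None:
        self.is_verb = is_verb
        self.conjugate = conjugate
        self.translate_noun = translate_noun
        self.known: Set[str] = set()
        self.verbs: Dict[Tuple[str, str, str], str] = {}
        self.nouns: Dict[str, str] = {}

    def learn(self, word: str) -> bool:
        """Store a new word; return False if it was already known."""
        if word in self.known:
            return False
        if self.is_verb(word):
            # Verb: create one variant per (tense, person) where available.
            for tense in TENSES:
                for person in PERSONS:
                    form = self.conjugate(word, tense, person)
                    if form is not None:
                        self.verbs[(word, tense, person)] = form
        else:
            # Noun: store only the single Bengali term, if one is found.
            term = self.translate_noun(word)
            if term is not None:
                self.nouns[word] = term
        self.known.add(word)
        return True


# Stub data standing in for a real conjugator and dictionary.
SAMPLE_FORMS = {
    ("come", "simple past", "first"): "এসেছিলাম",
    ("come", "simple past", "second"): "এসেছিলে",
    ("come", "simple past", "third"): "এসেছিল",
}
SAMPLE_NOUNS = {"bird": "পাখি"}

glossary = LearningGlossary(
    is_verb=lambda word: word == "come",
    conjugate=lambda word, tense, person: SAMPLE_FORMS.get((word, tense, person)),
    translate_noun=SAMPLE_NOUNS.get,
)
glossary.learn("come")
glossary.learn("bird")
```

Passing the detector and the conjugator in as functions keeps the storage
logic separate from the language-specific parts, which would need far more
work than this sketch suggests.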
It seems this will more or less work! By the way, the APIs for the glossary
will be similar to those of the Lokalize glossary, and anything extra can be
added.
Hope you like this one.

Thanking you,
Subhobrata
On Fri, Mar 23, 2012 at 10:11 AM, Runa Bhattacharjee <runabh at gmail.com> wrote:

> Hi Subhobrata,
>
> First up, please try and avoid breaking the thread for an email
> conversation. It gets difficult to keep track of the responses. Also, if
> you don't see me on IRC (especially late nights) then please do feel free
> to drop me an email. :)
>
> Secondly, I suspect you are trying to propose an automatic machine
> translation solution. Which is not the aim for this particular project.
> This project aims at providing one of the essential aids for localization
> i.e. glossaries via tools/methods that can integrate and extend translation
> data into suitable presentation formats and also be scalable for extension
> of features.
>
>
> regards
> Runa
>
> (intentional top-post)
>
> On Thursday 22 March 2012 07:44, Subhobrata Dey wrote:
>
>> Hello ma'am, thank you for your reply. I propose a formal solution for
>> this project, based on your reply and on studying translation tools like
>> Lokalize, Virtaal etc. First of all, I went through the Lokalize
>> documentation and the source code (since I love KDE), from which I mainly
>> got the idea for this project. While Lokalize uses a
>> paragraph-by-paragraph translation technique, I think a
>> sentence-by-sentence technique followed by word-by-word translation is
>> far more efficient. The idea mainly came after reading this article:
>> link <http://Quick%20View>
>>
>> Here I am elaborating the idea. We can classify English-Bengali
>> translation in the following ways:
>>
>> 1> Assertive sentence
>>    I will go to bazaar. আমি বাজারে যাব
>>    Here the predicate goes to the end & the subject after it comes
>>    forward.
>>    If I were a bird! যদি আমি পাখি হতাম
>>    These are the same class of sentences.
>> 2> Interrogative sentence
>>    Are you reading a book? তুমি কি বই পড়ছ?
>>    Here the predicate and subject are swapped.
>> 3> Ordering
>>    Give me the book! বইটা আমাকে দাও
>>
>> There are a few common exceptions: for 'the', 'টা' can be used; for
>> 'please', 'দয়া করে' can be used. But the majority of translations must
>> fall into these categories. So my idea is that for each sentence in
>> English we first need to decide what kind of sentence it is. Then we apply
>> word-by-word translation to convert it to Bengali, taking the help of the
>> glossary to substitute English subjects and predicates with their Bengali
>> versions. (However, a few may not have Bengali translations; appropriate
>> measures need to be taken for that. I haven't thought about that as yet
>> :P) This will obviously not give the correct result by itself. So, using
>> the information about the kind of sentence, we can swap the Bengali words
>> to get the right result! Thus a paragraph is translated sentence by
>> sentence.
>>
>> Apart from this, I also saw that Lokalize has shortcuts for words and
>> phrases while giving input, which I think should be implemented for the
>> sake of users. In this way, a successful glossary-cum-translator can be
>> created. Finally, there can be mistakes, and because of that I am posting
>> this to the mailing list.
>>
>> Btw, I'm not posting the "about me" section as of now. Please mention if
>> you need that or any other info from me; I'll be extremely happy to
>> provide it.
>>
>> Thanking you, -- subhobrata
>>
>> _______________________________________________
>> Project-ideas mailing list
>> Project-ideas at lists.ankur.org.in
>> http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in
>>
>
>
> --
> blog: http://arrbee.wordpress.com
> irc: arrbee or runa_b on Freenode
> http://fedoraproject.org/wiki/User:Runab
>