[Project-ideas] Regarding: Inquiring About More Details for the Project [Add language grammar rules to a machine translation system]

Thu Apr 18 03:56:40 PDT 2013

Hi Sankarshan,

>I will take a look at this over the weekend (I am between cities and
>it is a bit difficult for me now). Is the code for the implementation
>available as well?

Sure, not a problem; we can discuss more over the weekend. Sampark is a
government-funded project, and as far as I know the code for the
implementation is not publicly available. We can look into the details of
that later.

>The initial idea was to check if a system like Moses (Statistical MT)
>could be enhanced to be able to handle translation of content at
>scale. I'd look forward to what you think is possible.

We can start by looking at how Moses performs, do an error analysis, and
then make improvements on top of it using whatever methods are needed. What
data are we using for training? Can you provide more details about the
corpus we have, for example the number of sentences?

I was also thinking it would be a good idea to first do some background
research on other English-Bengali systems and use what we learn from them.

Two important systems that I found are as follows:
1) http://tdil-dc.in/components/com_mtsystem/CommonUI/homeMT.php : this is a
government project that follows a hybrid approach with a pipeline
architecture. We can discuss the details as needed; I know the architecture
and other details of this system.

2) Anubadok (http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl):
this seems to be an open-source project, and it uses some of the resources
built by the Ankur organization, such as the English-Bengali dictionary
(http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl). If you have any
more details about it, that would be great. I have downloaded the Anubadok
system and am trying to get some hands-on experience with it and look into
the source code.

Apart from these, there is also an Apertium project
(http://wiki.apertium.org/wiki/Apertium-bn-en) for the English-Bengali
language pair, which has some tools and resources available.

I have a few queries. What are we aiming for with this project? As far as I
can see, there are three possible directions:
1) Begin from scratch with statistical MT, see how it works for the
English-Bengali language pair, and on top of this statistical approach use
other knowledge to learn rules and build a translation model/prototype.

2) Survey the other available models and openly available resources, such
as chunkers and POS taggers, and build an MT system that combines them.

3) Take one of the existing systems and improve it using statistical
approaches.

Sorry for the long mail, but I wanted to cover all the details.

Looking forward to hearing from you.

Regards
Piyush

On Wed, Apr 17, 2013 at 9:47 PM, Sankarshan Mukhopadhyay <
sankarshan.mukhopadhyay at gmail.com> wrote:

> On Wed, Apr 17, 2013 at 6:04 PM, piyush arora <piyusharora07 at gmail.com>
> wrote:
> > I have done some projects on natural language processing, machine
> > translations and information retrieval. I have worked on the Machine
> > Translation project
> http://sampark.iiit.ac.in/sampark/web/index.php/content
> > where the aim is to build MT systems for 18 Indian language pairs.
>
> I will take a look at this over the weekend (I am between cities and
> it is a bit difficult for me now). Is the code for the implementation
> available as well?
>
> > I worked along similar lines on transfer-grammar rules. I have some
> > experience with transfer rules for Hindi, Telugu and a bit of Bengali.
> >
> > So it would be great if I could get more information about the project
> > and other details.
>
> The initial idea was to check if a system like Moses (Statistical MT)
> could be enhanced to be able to handle translation of content at
> scale. I'd look forward to what you think is possible.
>
>
>
> --
> sankarshan mukhopadhyay
> <https://twitter.com/#!/sankarshan>
>