[Project-ideas] Project to Develop a system with multi-lingual capabilities in order to receive answer to user specific queries

Tue Mar 20 00:23:49 PDT 2012

Continued from the discussion on Anubad.

> The problem can be further enhanced by including the fact that the
> user querying may be in possession of incorrect knowledge of the
> language. For example, say someone who types in a query in English in
> a syntax that is grammatically wrong. There are other ways to include
> noise in the query string.
>

Thanks, it explains the problem very well. It seems like an expert system
or a virtual assistant that takes semi-structured information, such as
Wikipedia, as its input.

Regarding the noise, I can see two types clearly coming in.
1 - Use of shorthand, as if the person is chatting. We can use hidden
Markov models for this. I have had some experience doing this (for my
project on analyzing chats to extract user emotions), and we can get
fairly good results for English. But my interaction with Bengali is limited
to natural language processing work where the languages considered were
Hindi, English and Bengali, so I am not aware of the kind of shorthand used
in Bengali or how well hidden Markov models would work for it.

2 - As you pointed out, bad grammar. This should not affect our methods
much unless we use methods like parsing (which make the model specific to
a given language).
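
As a rough sketch of the hidden Markov model idea in point 1: treat the
intended words as hidden states and the typed tokens (including shorthand)
as observations, then decode the most likely intended sentence with the
Viterbi algorithm. Everything below is a toy assumption for illustration:
the vocabulary, the probability tables and the names STATES, START, TRANS,
EMIT and FLOOR are hypothetical and hand-set. A real system would estimate
them from chat data, and nothing here says how well this carries over to
Bengali.

```python
import math

# Hypothetical toy model: hidden states are intended words,
# observations are the tokens the user actually typed.
STATES = ["you", "are", "great", "late"]

# Hand-set probabilities (assumptions, not learned from data).
START = {"you": 0.7, "are": 0.1, "great": 0.1, "late": 0.1}

TRANS = {
    "you":   {"you": 0.05, "are": 0.85, "great": 0.05, "late": 0.05},
    "are":   {"you": 0.10, "are": 0.05, "great": 0.45, "late": 0.40},
    "great": {"you": 0.40, "are": 0.20, "great": 0.20, "late": 0.20},
    "late":  {"you": 0.40, "are": 0.20, "great": 0.20, "late": 0.20},
}

# Each intended word emits either its full spelling or a chat shorthand.
EMIT = {
    "you":   {"you": 0.6, "u": 0.4},
    "are":   {"are": 0.6, "r": 0.4},
    "great": {"great": 0.6, "gr8": 0.4},
    "late":  {"late": 0.6, "l8": 0.4},
}

FLOOR = 1e-9  # small probability for unseen events, avoids log(0)


def log_p(table, key):
    """Log-probability lookup with a floor for missing entries."""
    return math.log(table.get(key, FLOOR))


def viterbi(tokens):
    """Return the most likely sequence of intended words for the tokens."""
    if not tokens:
        return []
    # Best log-score of any path ending in each state after the first token.
    scores = {s: log_p(START, s) + log_p(EMIT[s], tokens[0]) for s in STATES}
    back = []  # one dict of back-pointers per later token
    for tok in tokens[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + log_p(TRANS[p], s))
            new_scores[s] = (scores[prev] + log_p(TRANS[prev], s)
                             + log_p(EMIT[s], tok))
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


print(" ".join(viterbi(["u", "r", "gr8"])))
```

The transition table is what lets context disambiguate shorthand that could
map to several words; in this toy table each shorthand has only one
expansion, so the example mainly shows the decoding mechanics.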

> Now, the content set may not be equivalent across the languages. If it
> were equivalent, the actual implementation of the idea would merely be
> to come up with a taxonomy and thereafter ensure that we have a proper
> mapping across the language content sets. Which is in short your first
> alternative.
>

Ok. Here I ask for a suggestion. We can take one of the following approaches:
1 - Start working on English because of its resource-rich nature. Once we
achieve fairly good accuracy, we can move on to Bengali.

2 - We start from Bengali itself from the beginning.

3 - Go for a language-independent implementation, which should be a
considerably harder problem and might lead to lower accuracy, as we won't
be able to take advantage of language-specific knowledge.

Which approach would Ankur recommend?

> Note:
> > We can integrate the solution with the existing bot code-base like
> > alice, so as to take advantage of the extensive knowledge base created.
>
> Alice and similar bots are based on sequential queries which gradually
> constrain the cluster of probable responses thus ensuring a higher
> confidence in the response set. In the system that I discussed above,
> the ability to do a query-response challenge set would probably be
> absent.
>
>
Got your point. The only intuition I wanted to convey is that in case
our system is not sufficiently confident about the question/answer, it can
ask the user further questions to clarify it.
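
A minimal sketch of that fallback, assuming the system can attach a
confidence score between 0 and 1 to each candidate answer. The Candidate
class, the respond function and the 0.7 threshold are hypothetical names
and values chosen for illustration, not part of any existing code-base.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    confidence: float  # assumed to lie in [0, 1]


CONFIDENCE_THRESHOLD = 0.7  # hypothetical cut-off, would need tuning


def respond(candidates, threshold=CONFIDENCE_THRESHOLD):
    """Answer directly when confident, otherwise ask a clarifying question."""
    if not candidates:
        return ("clarify", "Could you rephrase your question?")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    best = ranked[0]
    if best.confidence >= threshold:
        return ("answer", best.answer)
    # Not confident enough: offer the top two candidates back to the user.
    options = " or ".join(c.answer for c in ranked[:2])
    return ("clarify", f"Did you mean: {options}?")


print(respond([Candidate("Kolkata", 0.9), Candidate("Dhaka", 0.05)]))
print(respond([Candidate("Kolkata", 0.4), Candidate("Dhaka", 0.35)]))
```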

> You have had some good ideas at this stage, I'd like to see them
> coming. At the same time, keep an eye on the milestone/dates. You'll
> need to set yourself a date by which you have an actual proposal
> ready.
>

Thanks. Unfortunately, I have my exams till the 26th. But I will try to
stick to all the deadlines (6th April in this case).

> It would probably be a good idea if you can obtain clarification for
> the question "Can a GSoC project be turned into a research
> publication" from the program administrators. I checked up the FAQ
> before this reply and I don't see any specific mention about it.
>
>
I looked at the GSoC documentation and believe that it would be at the
discretion of the project's mentor and Ankur.

Thanks and Regards
Abhishek Gupta
3rd Year, Computer Science, IIT Delhi
http://abhishek.cc