[Anubad] Project to Develop a system with multi-lingual capabilities in order to receive answer to user specific queries

Sankarshan Mukhopadhyay sankarshan.mukhopadhyay at gmail.com
Mon Mar 19 04:22:51 PDT 2012


Hi,

On Mon, Mar 19, 2012 at 11:47 AM, Abhishek Gupta
<abhishekgupta.iitd at gmail.com> wrote:

> Thanks for the reply. In my view, possible input sets are as follows :

[snipping out the alternatives which you have proposed]

For a moment consider a scenario as follows. An organization has
content in English and multiple Indian languages. For the sake of
argument, let us assume that the set of content is in the form of a
structured set, i.e. it follows a known pattern. This could be a FAQ
kind of pattern or, a typical article kind of pattern wherein there
are sections and subsections. The objective of the effort is to
present the person querying the content store with the appropriate
response in the selected language or, the nearest possible response
with the aim of removing ambiguity.

The problem can be made harder by allowing for the fact that the
user querying may have an imperfect command of the language. For
example, someone might type in a query in English that is
grammatically wrong. There are other ways in which noise can enter
the query string.
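
As a rough illustration of tolerating a noisy query, here is a minimal
sketch that matches a misspelt query against a list of known
questions using Python's standard difflib module. The question list,
the function names and the 0.6 cutoff are all assumptions made up for
this example; a real system would need something far more robust,
especially for Indian-language input.

```python
import difflib

# Hypothetical FAQ questions; a real content store would supply these.
FAQ_QUESTIONS = [
    "how do i reset my password",
    "how do i change my email address",
    "where can i download the installer",
]


def normalise(text):
    """Lowercase the text and replace punctuation with single spaces."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return " ".join(cleaned.split())


def best_match(query, candidates=FAQ_QUESTIONS, cutoff=0.6):
    """Return the closest known question, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(
        normalise(query), candidates, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    print(best_match("How do I resett my pasword??"))
```

Character-level similarity only handles spelling noise; it does not
address queries that are grammatically wrong but correctly spelt.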

Now, the content set may not be equivalent across the languages. If it
were equivalent, the actual implementation of the idea would merely be
to come up with a taxonomy and thereafter ensure that we have a proper
mapping across the language content sets. That, in short, is your
first alternative.
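
To make the mapping idea concrete, here is a minimal sketch in Python
of a content store keyed by taxonomy node and then by language, with
a fallback when a language has no entry for that node. The node
names, language codes, placeholder texts and the lookup() function
are hypothetical and exist only for this illustration.

```python
# Hypothetical content store: taxonomy node -> language code -> text.
# The texts are placeholders, not real content.
CONTENT = {
    "account/reset-password": {
        "en": "[answer text in English]",
        "hi": "[answer text in Hindi]",
    },
    "install/download": {
        # No Hindi or Bengali entry: the content sets are not equivalent.
        "en": "[answer text in English]",
    },
}


def lookup(topic, lang, fallbacks=("en",)):
    """Return (language, text) for a topic, trying lang first, then fallbacks.

    Returns None when the topic is unknown or no listed language has it.
    """
    by_lang = CONTENT.get(topic, {})
    for code in (lang, *fallbacks):
        if code in by_lang:
            return code, by_lang[code]
    return None


if __name__ == "__main__":
    print(lookup("account/reset-password", "hi"))
    print(lookup("install/download", "bn"))
```

Returning the language code alongside the text lets the caller tell
the user when the response is a fallback rather than the language
they selected.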

> Note:
> We can integrate the solution with the existing bot code-base like alice, so
> as to take advantage of the extensive knowledge base created.

Alice and similar bots are based on sequential queries which gradually
constrain the cluster of probable responses thus ensuring a higher
confidence in the response set. In the system that I discussed above,
the ability to do a query-response challenge set would probably be
absent.

> The project can be done in two phases. In first phase we can focus on more
> objective questions, while on other phase subjective questions can be
> stressed upon.

You have had some good ideas at this stage, and I'd like to see them
keep coming. At the same time, keep an eye on the milestones/dates.
You'll need to set yourself a date by which you have an actual
proposal ready.

> Please add if I am missing something or wrong at some place. There are some
> really good publications with respect to the same specially in context of
> SIRI. Finally, I wanted to ask about the possibility of a
> research publication if we are able to achieve some noteworthy results,
> which I am quite confident of achieving.

It would probably be a good idea to get clarification on the
question "Can a GSoC project be turned into a research
publication?" from the program administrators. I checked the FAQ
before this reply and I don't see any specific mention of it.


-- 
sankarshan mukhopadhyay
<http://sankarshan.randomink.org/blog/>


