[Project-ideas] FEW QUESTIONS ABOUT GSOC2012 PROJECT PROPOSAL

Gourab Saha gourab.isikolkata at gmail.com
Wed Mar 28 08:54:09 PDT 2012


I am Gourab. As I am in the process of drafting my formal project
proposal, I have a few questions regarding some issues.
As I said previously, I am seriously interested in working on a project in
the field of "information retrieval" as part of GSOC2012,
and in continuing to work in that field for my M.Tech thesis.

During the previous week I studied the following project ideas that
you have floated in this area.


1. Improving models for Cross Language Text Re-use
2. Develop a system with multi-lingual capabilities in order to receive
answer to user specific queries
3. Improving information retrieval methods for OCR data sets consisting of
indic scripts

I have talked with my professors who have similar research interests.
Following their valuable suggestions on the above-mentioned project
ideas, I am on my way to drafting a formal proposal.

As you raised a concern about the licensing of the available
datasets/tools, I have checked with my professor: the RISOT data set
(on which I am planning to work) is freely available and not constrained
by any license.
The Lemur toolkit is a completely open source framework for IR software
development, and trec_eval, a standard tool for
performance evaluation, is also completely open source.

I am writing my proposal in brief here. Kindly suggest how it
can be further improved.

The key idea is to propose and implement a method to improve
cross-language information retrieval for a
pair of languages (Bengali/English). We have the RISOT data set containing
articles from the ABP Bengali newspaper corpus
from 2004-2006 as well as The Telegraph English newspaper corpus. The
system will take a query in English and retrieve
results from the Bengali corpus. The query will go through a
pipeline of translation, transliteration,
blind relevance feedback, query expansion and finally information
retrieval. I am aiming for a well-accepted accuracy
measure.
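
To make the blind relevance feedback and query expansion steps above
concrete, here is a minimal Python sketch. It is only an illustration under
stated assumptions: the toy corpus, the simple term-count scoring and the
function names (score, retrieve, expand_query) are hypothetical and are not
taken from RISOT or from the Lemur toolkit, and the translation and
transliteration steps are left out.

```python
from collections import Counter

# Hypothetical toy corpus: document id -> list of already tokenized terms.
corpus = {
    "d1": ["cricket", "match", "kolkata", "eden"],
    "d2": ["cricket", "eden", "stadium"],
    "d3": ["election", "kolkata", "vote"],
    "d4": ["eden", "stadium", "renovation"],
}


def score(query_terms, doc_terms):
    """Count how many tokens of the document match a query term."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in set(query_terms))


def retrieve(query_terms, docs, k):
    """Return up to k document ids with a non-zero score, best first."""
    ranked = sorted(docs, key=lambda d: (-score(query_terms, docs[d]), d))
    return [d for d in ranked[:k] if score(query_terms, docs[d]) > 0]


def expand_query(query_terms, docs, top_docs=2, new_terms=2):
    """Blind relevance feedback: assume the top documents are relevant
    and add their most frequent non-query terms to the query."""
    feedback = retrieve(query_terms, docs, top_docs)
    counts = Counter(
        t for d in feedback for t in docs[d] if t not in query_terms
    )
    # Sort by frequency, then alphabetically, so the result is deterministic.
    best = sorted(counts, key=lambda t: (-counts[t], t))[:new_terms]
    return list(query_terms) + best


original = ["cricket"]
expanded = expand_query(original, corpus)
first_pass = retrieve(original, corpus, 4)
second_pass = retrieve(expanded, corpus, 4)
```

In this toy setting the second pass can reach documents that share no term
with the original query, which is the effect the expansion step is meant to
have. A real system would use a proper ranking model and term weighting
rather than raw counts.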

I have a few questions besides the technical issues of the project.

1. Apart from mentors from the organization (http://ankur.org.in/), can I
    have a mentor from my institution or a foreign university? They would
    not be related to GSOC2012 in any way.

2. If my proposal is accepted and my research this summer leads to a paper
    publication, are there any constraints/to-dos from
    GSOC or from your organization for publishing a paper? As far as I
    understand, Google doesn't have any problem as long as I release my
    code under an open source license.

Kindly let me know of any other issues regarding the proposal (details will
be included in the final proposal) or any impediments related to it.
Kindly give your valuable feedback, as I am on my
way to drafting my formal proposal.
I am sincerely hoping to work with you this summer.

If you have any questions or concerns, please feel free to send me an email
at gourab.isikolkata at gmail.com or call me at 9051110501.

Thanks

Gourab Saha
M.Tech(Computer Science)
Indian Statistical Institute,Kolkata
gourab.isikolkata at gmail.com
(+91)9051110501