[Project-ideas] Improving information retrieval methods for OCR data sets consisting of Indic scripts

Sun Feb 2 23:30:45 PST 2014

I (Rabindra Rakshit) am interested in applying for GSoC 2014 and would
like to know whether Ankur India is applying as a mentoring organization
again this year.

I am currently pursuing my B.Tech in Computer Science (CSE) at the College
of Engineering and Management, Kolaghat, and, being Bengali by birth, would
love to see my language flourish in the open source community.

I am particularly interested in the project on improving information
retrieval methods for OCR data sets consisting of Indic scripts (Info
Rescue). I had a look at Abhishek Gupta's work plan; the final voting
system, in a general (abstract) form, is yet to be implemented.

I don't have direct experience with OCR, but I do have experience working
with information retrieval systems. In fact, I am currently working on
Consensus Sequence Segmentation, an unsupervised text segmentation
algorithm that relies entirely on statistical relationships among the
letters of the input sequence to detect the locations of word boundaries. I
have attached a document describing our work, which is still in progress.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1308.3839v1.pdf
Type: application/pdf
Size: 399088 bytes
Desc: not available
URL: <http://lists.ankur.org.in/pipermail/project-ideas-ankur.org.in/attachments/20140203/71608936/attachment-0002.pdf>