[Project-ideas] Improving information retrieval methods for OCR data sets consisting of Indic scripts

Jayanta Nath jayantanth at gmail.com
Mon Feb 3 04:27:30 PST 2014


Hi Sankarshan,

Thank you for the prompt initiative after our talk at the Kolkata Book Fair.
The Bengali Wikipedia community (Wikisource, bn.wikisource.org) is ready
to do anything except coding to crack this OCR issue. As you all know,
this will help not only us; it has been a long-awaited wish.

Regards,
Jayanta


On Monday, February 3, 2014, Sankarshan Mukhopadhyay <
sankarshan.mukhopadhyay at gmail.com> wrote:

> Hi Rabindra,
>
> Thank you for writing in.
>
> I am replying as a top-post because I have copied in the mailing list
> we use to discuss project ideas (the subscription interface should be
> available from
> <http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in>).
>
> I have also added Jayanta Nath to the thread. I met Jayanta yesterday
> (after a suitably long period of interactions over email), and we
> ended up chatting about the usual: "how to crack this OCR issue in a
> manner that helps the Bengali Wikipedia community and, especially,
> Wikisource".
>
> I am glad to note that you have taken a look at Abhishek's existing
> work. Have you been able to reach out to him and discuss in some level
> of detail the current state of the work? The voting piece is loosely
> based on the idea that a larger number of users can help train the
> system to a higher degree of accuracy.
>
> ankur.org.in will be putting in an application as a mentoring
> organization. However, acceptance into GSoC 2014 is always subject to
> [1] a good set of project ideas; [2] reasonable success in the previous
> year, etc. So there is a period of waiting before one learns whether
> the organization has been selected, and only thereafter does the
> process of selecting strong student applications begin.
>
> I would recommend that you spend this time catching up with Abhishek
> and also Jayanta in order to understand a real-life use of your
> project (should ankur.org.in be selected and you be accepted as a
> student).
>
> /sankarshan
>
> On Mon, Feb 3, 2014 at 12:56 PM, Rabindra Rakshit <rovir2r at gmail.com>
> wrote:
> > I (Rabindra Rakshit) am interested in applying for GSoC 2014, and would
> > like to know whether Ankur India is applying as a mentoring organization
> > this year as well.
> >
> > I am currently pursuing my B.Tech in Computer Science (CSE) at the College
> > of Engineering and Management, Kolaghat, and, being born a Bengali, would
> > love to see my language flourish in the open source community.
> >
> > I am particularly interested in the project on improving information
> > retrieval methods for OCR data sets consisting of Indic scripts (Info
> > Rescue). I had a look at Abhishek Gupta's work plan; the final voting
> > system, in a general (abstract) form, is yet to be implemented.
> >
> > I don't have direct experience with OCR, but I do have experience
> > working with information retrieval systems. In fact, I am currently
> > working on Consensus Sequence Segmentation, an unsupervised text
> > segmentation algorithm that relies entirely on statistical relationships
> > among the letters of the input sequence to detect the locations of word
> > boundaries. I have attached a document describing our work, which is
> > still in progress.
> >
> > Link: http://arxiv.org/abs/1308.3839
>
> --
> sankarshan mukhopadhyay
> <https://twitter.com/#!/sankarshan>
>