[Project-ideas] GSoC 2014

Ritinkar Pramanik ritinkarpramanik at gmail.com
Mon Feb 24 10:22:06 PST 2014


Hi,

     I am Ritinkar Pramanik, a first-year CSE student at Techno India College
of Technology, Rajarhat, Kolkata.

     I do not have much open source experience; this is my first FOSS
proposal. I can code in C, C++ and Python.

     I am interested in the "A platform to integrate into an OCR workflow
pipeline to enable collaborative correction of OCR text" project. As I
understand it, the project requires us to create a web app in which users
edit the text produced by OCR, and the corrections are then fed back to the
OCR engine for training.

     For the OCR side, I have familiarized myself with Tesseract 3.02,
ImageMagick and last year's BookWorm project.

     I am familiar with web app development using Google App Engine in
Python, but since Tesseract is native code it will not run on GAE. I am
thinking of using the Django web framework instead, since Django apps are
also written in Python. I do not have much experience with Django yet, but I
am getting familiar with it. I want to generate some Bengali data for both
OCR training and the dictionary. Users can then correct the OCR'd text, and
the user-corrected data can be used both to train Tesseract and to add words
to the Tesseract dictionary.
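     A minimal sketch of the correction-to-dictionary step, assuming the
OCR output and the user's corrected text are both available as plain
strings: Python's difflib aligns the two word sequences, the words the user
changed or added are collected, and those missing from the existing word
list are kept for a one-word-per-line file. The function names and the file
layout are illustrative assumptions, not part of the project description.
Tesseract's training tools (e.g. wordlist2dawg) build the dictionary from
such a word list, but that invocation and the training step itself are not
shown here.

```python
import difflib


def corrected_words(ocr_text: str, corrected_text: str) -> list[str]:
    """Return words the user changed or added while correcting OCR output.

    Both texts are split on whitespace; SequenceMatcher aligns the two token
    sequences, and every token that appears only on the corrected side
    (a 'replace' or 'insert' opcode) is collected in order, without duplicates.
    """
    ocr_tokens = ocr_text.split()
    fixed_tokens = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_tokens, b=fixed_tokens, autojunk=False)
    seen: set[str] = set()
    result: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            for word in fixed_tokens[j1:j2]:
                if word not in seen:
                    seen.add(word)
                    result.append(word)
    return result


def new_dictionary_words(words: list[str], known_words: list[str]) -> list[str]:
    """Keep only the words not already present in the existing word list."""
    return sorted(set(words) - set(known_words))


def write_wordlist(words: list[str], path: str) -> None:
    """Write one word per line, UTF-8 encoded (assumed word-list format)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
```

For example, corrected_words("Tbe quick f0x", "The quick fox") returns
["The", "fox"]. Splitting on whitespace also works for Bengali text, where
words are space-separated.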

     The BookWorm blog documents various resources that will be useful for
this project, and its data generation algorithm might also be useful.

     Could you please tell me whether I have missed any features, or whether
there are resources I should look into? Please also correct me if my
interpretation of the project is wrong. Finally, could you clarify the
intent of the voting system?

Thanks
Ritinkar Pramanik