<div dir="ltr">Hi,<div><br></div><div> I am Ritinkar Pramanik, a first-year CSE student at Techno India College of Technology, Rajarhat, Kolkata.</div><div><br></div><div> I do not have much open source experience, and this is my first FOSS proposal. I can code in C, C++, and Python.</div>
<div><br></div><div> I am interested in the "A platform to integrate into an OCR workflow pipeline to enable collaborative correction of OCR text" project. As I understand it, the project requires creating a web app where users can edit the text produced by the OCR engine, and then feeding those corrections back to the OCR engine for training.</div>
<div> </div><div> For the OCR side, I have familiarized myself with Tesseract 3.02, ImageMagick, and last year's BookWorm project.</div><div><br></div><div> I am familiar with web app development in Python on Google App Engine, but since Tesseract is native code, it will not work on GAE. I am thinking of using the Django web framework for the web app instead, as it can also be coded in Python. I do not have much experience with Django, but I am getting familiar with it. I want to generate some data in Bengali for both OCR training and the dictionary. Users can then correct the OCR'd data, and the corrected data can be used both to train Tesseract and to add words to the Tesseract dictionary.</div>
<div><br></div><div> The BookWorm blog documents various resources that will be useful for this project, and its data generation algorithm might be useful too.</div><div><br></div><div> Could you please tell me whether I have missed any features, or whether there are resources I should look into? Also, please correct me if my interpretation of the project is wrong. Finally, could you please clarify the intent of the voting system?</div>
<div><br></div><div>Thanks</div><div>Ritinkar Pramanik</div></div>