[Project-ideas] OCR tools for Bengali language to 98%
Sankarshan Mukhopadhyay
sankarshan.mukhopadhyay at gmail.com
Mon Apr 2 21:30:15 PDT 2012
On Tue, Apr 3, 2012 at 9:12 AM, sourav dutta <mailsouravdutta at gmail.com> wrote:
> Sorry for coming in so late; I came to know about GSoC only a few days ago.
> I am an undergraduate doing my B.Tech (Hons) at IIIT Hyderabad. I have worked
> in OCR, vision (SfM), image processing, and information retrieval.
You shouldn't wait for the list admin to approve a non-member post. We
make a point of asking everyone to subscribe to the list.
> Here is the basic overview of my idea.
>
> A) preprocessing
> i) Image Acquisition and Binarization - (convert the image to grayscale
> and then binarize it using Otsu's method).
> ii) Noise elimination - This is a huge area in itself. Many kinds of
> noise are possible. Background noise can be removed with salt-and-pepper
> noise filtering and connected component analysis.
> iii) Skew detection and correction - First we identified the upper
> envelope and then applied the Radon transform to it to get the skew
> angle.
> iv) Line, word and character level segmentation - segment and isolate
> each character (noise can add to splitting errors).
> B) Pattern Classification - pattern matching can itself be done in various
> ways: template matching, neural networks, HMMs, SVMs. HMMs are known to have
> the best accuracy for character recognition.
> i) For the features we can use vertically segmented characters in the DCT domain.
> C) Training - any of the classifiers needs supervised training with an
> annotated dataset.
> D) Post processing - we can use a spelling checker with dictionary lookup
> to correct erroneously recognized words.
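
For illustration, step D can be sketched as a dictionary lookup that falls
back to the nearest word by edit distance. This is only a minimal sketch
under assumptions: the names edit_distance and correct_word, the
max_distance cutoff of 2, and the alphabetical tie-break are all
hypothetical choices, not part of the proposal. Python strings are compared
by code point, so a Bengali conjunct or vowel sign counts as more than one
edit; a real checker would likely work on grapheme clusters instead.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,         # delete ca
                           cur[j - 1] + 1,      # insert cb
                           prev[j - 1] + cost)) # substitute or match
        prev = cur
    return prev[-1]


def correct_word(word, dictionary, max_distance=2):
    """Return word if it is in the dictionary, else the closest entry.

    If no entry is within max_distance edits (an assumed cutoff), the
    recognized word is returned unchanged.
    """
    if word in dictionary:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(dictionary, key=lambda w: (edit_distance(word, w), w),
               default=None)
    if best is not None and edit_distance(word, best) <= max_distance:
        return best
    return word
```

A linear scan over the dictionary is fine for a sketch; a large word list
would need an index of some kind.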
The above is the basic theory which can be converted to some
implementation. I don't see/read much originality in terms of
addressing the problem. Since the proposal submission dates close by
06 Apr 2012, would you like to take the above and convert it into a
reasonable proposal?
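
As one illustration of turning the theory into code, step A(i) (Otsu
binarization) could be sketched roughly as follows. This is a minimal
pure-Python sketch, not taken from any existing OCR codebase. The function
names otsu_threshold and binarize, the 8-bit grayscale input, and the
convention that dark pixels map to 1 (ink) are assumptions made for the
example.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance.

    pixels is a flat iterable of 8-bit grayscale values. Values <= the
    returned threshold form one class, values above it the other.
    """
    hist = [0] * 256
    total = 0
    for p in pixels:
        hist[p] += 1
        total += 1
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels with value <= t
    sum0 = 0   # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue   # lower class still empty
        if w1 == 0:
            break      # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_rows, threshold):
    """Map a 2D list of gray values to 1 (dark, assumed ink) or 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in gray_rows]
```

For a page image held as a list of rows, the threshold would be computed
over all pixels, for example otsu_threshold(p for row in rows for p in row),
and then passed to binarize(rows, threshold).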
--
sankarshan mukhopadhyay
<http://sankarshan.randomink.org/blog/>