[Project-ideas] follow up discussions - improve accuracy of Bengali OCR

Fri Apr 19 04:05:48 PDT 2013

On Thu, Apr 18, 2013 at 6:16 AM, Debajyoti Nag <dave0908 at gmail.com> wrote:
> Having done some more reading[1], I am now positive that the factor of
> domain adaptability (due to poor scan or tattered documents), that I was
> concerned with in my last email, is out of scope for now; however, it may
> be included when trying to make the system more robust.

Is that so? I would have thought that minor improvements in handling
noise due to the factors you mention could be part of the project.

> I can see most of the work has been done with tesseract 2.x, but I would
> like to look into tesseract 3.x, which is reported to have better support
> for connected-script based languages. I am currently trying to find out more
> details about the implementation of support for Hindi [2].

So I hear, too. I'll wait to read your conclusions.

> At this point, I would also like to read about the proposal/work approach
> from last year on the same project. Could you provide me with a copy of the
> same?

I'll introduce you to Atriya (off-list), and you two can work out the details.

> [1] http://www.cvc.uab.es/icdar2009/papers/3725a671.pdf
> [2] http://research.ijcaonline.org/volume39/number6/pxc3877076.pdf

--
sankarshan mukhopadhyay
<https://twitter.com/#!/sankarshan>