[Project-ideas] follow up discussions - improve accuracy of bengali OCR

Wed Apr 17 17:46:24 PDT 2013

Hi Sankarshan,

Having done some more reading [1], I am now positive that domain
adaptability (handling poor scans or tattered documents), which I was
concerned about in my last email, is out of scope for now. However, it may
be included later when trying to make the system more robust.

I can see that most of the work has been done with Tesseract 2.x, but I
would like to look into Tesseract 3.x, which is reported to have better
support for connected-script languages. I am currently trying to find out
more details about how support for Hindi was implemented [2].

At this point, I would also like to read the proposal and work approach
from last year for this same project. Could you provide me with a copy?

[1] http://www.cvc.uab.es/icdar2009/papers/3725a671.pdf
[2] http://research.ijcaonline.org/volume39/number6/pxc3877076.pdf

-- 
-Regards,
Debajyoti Nag
http://twitter.com/aramis7d