[Project-ideas] GSoC

Sankarshan Mukhopadhyay sankarshan.mukhopadhyay at gmail.com
Tue Mar 20 19:49:39 PDT 2012


On Tue, Mar 20, 2012 at 12:20 PM, akshat kumar singh
<akshatsince1993 at gmail.com> wrote:

> Main features:
> 1. Convert analogue text resources into digital text, so that the text
> becomes searchable.
> 2. A "primary background dictionary" of Bengali words, which should
> substantially improve accuracy.
> 3. A "secondary background dictionary" of user-defined words.
> 4. Character-by-character matching, followed by word matching.
> 5. Online sync of important patches contributed by users.
> 6. A better algorithm for word matching, with some artificial intelligence.
>
> Implementation detail:
> 1. Better image quality, around 500 dpi. Higher dpi may increase the
> processing time.
> 2. Better brightness settings, around 50%.
> 3. Older documents should be scanned in RGB mode to maximize OCR accuracy.
> 4. Better use of grayscale.
> 5. The software will provide suggestions for unknown words.
> 6. OCR output will be checked with a spell checker.
> 7. An editing function in the software will be available to the user.

Before we go into the feature set and implementation detail, I'd also
suggest that you understand the current state of OCR for Indic
languages and, more importantly, get a fair idea of the current
challenges in using the presently maintained libraries.

After that, it would be a good idea to describe each feature in more
detail rather than in a single line. The above list is a good
beginning and has potential, but it is so terse that it is impossible
to judge what you have studied in the meantime.
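
As one illustration of that level of detail, the two dictionaries
(features 2 and 3) and the suggestions for unknown words could be
sketched roughly as follows. This is only a sketch under assumptions:
the sample word lists, the function name check_word and the 0.6
similarity cutoff are hypothetical, and Python's standard difflib
stands in for whatever matching algorithm the proposal ends up using.

```python
from difflib import get_close_matches

# Hypothetical sample data; a real primary dictionary would be loaded
# from a Bengali word list, and the user dictionary from user input.
PRIMARY_WORDS = {"বাংলা", "ভাষা", "অক্ষর", "শব্দ"}
USER_WORDS = {"ওসিআর"}


def check_word(word, primary=PRIMARY_WORDS, user=USER_WORDS, max_suggestions=3):
    """Look up an OCR output word in both dictionaries.

    Returns (is_known, suggestions). Suggestions are only computed
    for words found in neither dictionary.
    """
    if word in primary or word in user:
        return True, []
    # Sort the merged dictionaries so the candidate order is deterministic.
    candidates = sorted(primary | user)
    # Similarity is computed per Unicode code point, which is an
    # assumption; Bengali conjuncts may need grapheme-level comparison.
    suggestions = get_close_matches(word, candidates, n=max_suggestions, cutoff=0.6)
    return False, suggestions


if __name__ == "__main__":
    # "বাংল" is "বাংলা" with the final vowel sign dropped, as a
    # stand-in for a typical recognition error.
    for token in ["বাংলা", "বাংল"]:
        known, hints = check_word(token)
        print(token, known, hints)
```

A write-up at this level also makes the open questions visible, such as
how the dictionaries are stored and how suggestions are ranked.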


-- 
sankarshan mukhopadhyay
<http://sankarshan.randomink.org/blog/>


