[Project-ideas] comments on GSoC

Sankarshan Mukhopadhyay sankarshan.mukhopadhyay at gmail.com
Tue Mar 27 13:17:21 PDT 2012


On Wed, Mar 28, 2012 at 5:38 AM, Sankarshan Mukhopadhyay
<sankarshan.mukhopadhyay at gmail.com> wrote:

> FYI. The Melange system is set up to send email notifications for
> any change to a proposal, its comment stream, or even its status or
> score. The organization admins and the list of mentors receive the
> changelogs.

I also notice that you have bypassed the Melange system and written a
blog post explaining the choices you have made. I see two issues with
that:

- you now have the conversation in two separate threads
- when the mentors score your proposal they will have to jump back and
forth between your blog and your proposal in the system

Is there a specific reason you want to buck the trend and do things this way?

To address the issues raised in your post: you will have to do better
than terms like "robust" and "most accurate", and put out numbers
instead. If your choice, Tesseract, is already accurate, what is the
confidence level? How does it behave when faced with noise in the
documents that you are going to digitize? At what level of noise does
it currently break for Latin? What are the specific differences and
impediments when comparing Latin and non-Latin languages? Will you be
able to make it easy for anyone to add a language? How do you generate
the seed data for the system? Do you want to use a data set against
which you can verify and validate the results?
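
As one illustration of what "putting out numbers" could look like, here
is a minimal sketch of a character error rate (CER) calculation: the
edit distance between a known-correct transcription and the OCR output,
divided by the length of the correct text. This is only an assumed
choice of metric, not something the proposal or Tesseract prescribes.
The function names and the sample strings are hypothetical, and the
code does not call Tesseract itself; it only scores text that some OCR
run has already produced.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical data: a ground-truth line and made-up OCR outputs
    # for a clean scan and a noisy scan of the same line.
    truth = "the quick brown fox"
    samples = {
        "clean scan": "the quick brown fox",
        "noisy scan": "tbe qu1ck brovvn fox",
    }
    for label, ocr_text in samples.items():
        print(label, round(character_error_rate(truth, ocr_text), 3))
```

Running the same measurement over a verified data set at increasing
noise levels, and separately for Latin and non-Latin scripts, would
give the kind of figures the questions above are asking for.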

That is a long list of questions. Do put in your comments on the Melange site.


-- 
sankarshan mukhopadhyay
<http://sankarshan.randomink.org/blog/>


