[Project-ideas] OCR IR

Alok Kothari kothari.alok at gmail.com
Fri Apr 19 04:33:46 PDT 2013


Thanks for your reply!


> > Background: I graduated from IIT Kharagpur in 2009 and have been involved in
> > research in IR/NLP and Machine Learning for nearly 2 years.
>
> Would it be possible to provide links to any papers/presentations or
> code that you have published?
>

Yes, definitely. Unfortunately, my old website at the organisation I
worked at is down. It contained more details of the projects.

However, here are the links to the papers:

https://dl.acm.org/citation.cfm?id=2010069&dl=ACM&coll=DL&CFID=316248085&CFTOKEN=31366376
http://www.aclweb.org/anthology/D11-1073
http://www.icwsm.org/2013/program/accepted-papers/   A recent one
('Detecting Comments on News Articles in Microblogs')



>
> > 1. I was wondering whether I could have a look at or have some indication to
> > the quality of files available. This will give me some idea about the kinds
> > of error
>
> The project idea requires the interested candidate to propose within
> the scope of the project the kind of errors the initial
> iteration/release will handle.
>

I would be happy to propose some methods to tackle errors. I was wondering
whether I could have a look at the digitized text corpus itself. For
example, I know there can be wrongly recognized characters, spelling
mistakes and such. However, I thought I would get a better idea of the
other kinds of errors if I saw some of the documents for which the search
would be built. Do you think this is possible?


>
> > 3. Does the IR system have to be implemented on top of Lucene (or other open
> > source software) or can be completely stand alone.
>
> I was hoping that we would be able to utilize ElasticSearch or
> similar. Lucene is an option too.
>
>
I will look at ElasticSearch.

Thanks again!

Alok