static void
f_do_barnacle_install_properties (GObjectClass *gobject_class)
{
  GParamSpec *pspec;

  /* Party code attribute */
  pspec = g_param_spec_uint64 (F_DO_BARNACLE_CODE,
                               "Barnacle code.",
                               "Barnacle code",
                               0, G_MAXUINT64,
                               G_MAXUINT64 /* default value */,
                               G_PARAM_READABLE | G_PARAM_WRITABLE |
                               G_PARAM_PRIVATE);
  /* Register the spec on the class under its property ID */
  g_object_class_install_property (gobject_class,
                                   F_DO_BARNACLE_PROP_CODE,
                                   pspec);
}
Joaquim Rocha
OCRFeeder
Converting printed documents into digital formats
Berlin, May 2011
Joaquim Rocha (Igalia) · OCRFeeder · LinuxTag 2011
What is it?
Document Analysis and Optical Character Recognition
for GNOME
Why?
Paper has a number of problems
No applications for GNU/Linux that do a fair job
Paper problems: Security
CC Photo by: http://www.flickr.com/photos/badwsky/
Paper problems: Preservation
CC Photo by: http://www.flickr.com/photos/98469445@N00/
Paper problems: Data processing
CC Photo by: http://www.flickr.com/photos/hugovk/
Paper problems: Ecology
CC Photo by: http://www.flickr.com/photos/pranavsingh/
No fair conversion apps for GNU/Linux
apart from OCR engines, but...
OCR != Document Conversion
(it only deals with chars)
(does not consider the layout)
(does not distinguish contents)
What's needed is
Document Analysis and Recognition
(conversion of documents to an electronic format)
(first projects in the 80s)
Where were we at?
* Some closed solutions
* Only for proprietary systems
* Various prices
* still... arguable results
How?
So many layouts...
CC Photo by: http://www.flickr.com/photos/uber-tuber/
Layouts vary with the type of document
What works for detecting one won't work for the others
OCRFeeder focuses on contents, not on layouts!
Key concept:
If a document image can be divided into windows of 1 (content)
or 0 (not content), then it is possible to group all the
1s and outline the contents
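The key concept above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OCRFeeder's actual code: the function names, the darkness threshold, and the rule "a window is content if it holds any dark pixel" are all hypothetical, and grouping is done with a plain 4-connected flood fill.

```python
from collections import deque


def classify_windows(image, window_size, threshold=128):
    """Divide a grayscale image (rows of 0-255 ints) into square windows
    and mark each one as 1 (content) or 0 (not content).

    Assumption: a window is content if any pixel is darker than `threshold`.
    """
    height, width = len(image), len(image[0])
    grid = []
    for top in range(0, height, window_size):
        row = []
        for left in range(0, width, window_size):
            has_content = any(
                image[y][x] < threshold
                for y in range(top, min(top + window_size, height))
                for x in range(left, min(left + window_size, width))
            )
            row.append(1 if has_content else 0)
        grid.append(row)
    return grid


def group_content(grid):
    """Group 4-connected 1s and return one bounding box per group,
    as (left, top, right, bottom) in window coordinates, inclusive."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited 1.
            queue = deque([(r, c)])
            seen[r][c] = True
            left = right = c
            top = bottom = r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes
```

Multiplying each box by the window size gives the outline of a content area in pixels, whatever the page layout is.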
Recognition:
System-wide OCR engines are used
Engines are configured from the GUI or XML files
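As a rough sketch of what an XML engine description could look like and how it might be turned into a command line, assuming a made-up schema: the element names, the $IMAGE and $OUTPUT placeholders, the engine name and its path are all hypothetical and are not taken from OCRFeeder's real configuration files.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical engine description; not OCRFeeder's actual format.
ENGINE_XML = """
<engine>
    <name>ExampleEngine</name>
    <engine_path>/usr/bin/example-ocr</engine_path>
    <arguments>$IMAGE -o $OUTPUT</arguments>
</engine>
"""


def load_engine(xml_text):
    """Read the engine's name, executable path and argument template."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name").strip(),
        "path": root.findtext("engine_path").strip(),
        "arguments": root.findtext("arguments").strip(),
    }


def build_command(engine, image_path, output_path):
    """Fill the placeholders and return an argv list for subprocess.run().

    The template is split before substitution, so paths containing
    spaces stay a single argument.
    """
    args = []
    for token in shlex.split(engine["arguments"]):
        token = token.replace("$IMAGE", image_path)
        token = token.replace("$OUTPUT", output_path)
        args.append(token)
    return [engine["path"]] + args
```

The resulting list could be handed to subprocess.run() once per content area, which keeps the application independent of any particular engine.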
The best-known free OCR engines are detected and configured
automatically:
* Tesseract
* GOCR
* OCRAD
* Cuneiform
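One simple way to detect installed engines is to look for their executables on the PATH. The sketch below assumes the usual executable names (tesseract, gocr, ocrad, cuneiform) and a hypothetical detect_engines helper; how OCRFeeder itself performs the detection is not shown in these slides.

```python
import shutil

# Display name -> executable name (assumed; packaging may differ per distribution).
KNOWN_ENGINES = {
    "Tesseract": "tesseract",
    "GOCR": "gocr",
    "OCRAD": "ocrad",
    "Cuneiform": "cuneiform",
}


def detect_engines(known=KNOWN_ENGINES, which=shutil.which):
    """Return {engine name: full path} for every engine found on the PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    found = {}
    for name, executable in known.items():
        path = which(executable)
        if path is not None:
            found[name] = path
    return found
```

Calling detect_engines() with no arguments checks the current system; engines that are not installed are simply left out of the result.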
Export formats:
ODT
HTML
Plain text
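To illustrate how recognized content areas might be written out, here is a minimal sketch for the two simplest formats. The area dictionaries with left, top and text keys and both function names are assumptions made for this example; ODT export is left out because it needs an external library.

```python
import html


def _reading_order(areas):
    """Sort areas top to bottom, then left to right."""
    return sorted(areas, key=lambda area: (area["top"], area["left"]))


def export_plain_text(areas):
    """Join the text of all areas, one blank line between areas."""
    return "\n\n".join(area["text"] for area in _reading_order(areas))


def export_html(areas):
    """Place each area at its page position using absolute positioning."""
    parts = ["<html><body>"]
    for area in _reading_order(areas):
        parts.append(
            '<div style="position:absolute; left:%dpx; top:%dpx">%s</div>'
            % (area["left"], area["top"], html.escape(area["text"]))
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Plain text keeps only the reading order, while the HTML variant also keeps where each area sat on the page.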
User interaction:
Users can edit everything and review the algorithm's results
So the UI can work in attended and unattended modes
The CLI only works in unattended mode
Demo time!
Other features:
* PDF import
* Unpaper preprocessor
* Font style editing
* Image deskewing
* OCR results cleaning
* Project saving/loading
A11y:
* OCRFeeder is a very useful tool for visually impaired users
* Last year, the main target of its development was to improve a11y
Future:
* Integrate OCRopus as an alternative analysis backend
* More export formats: hOCR, PDF, etc.
* Make OCR engines' management easier
Webpage: http://live.gnome.org/OCRFeeder
git: http://git.gnome.org/ocrfeeder
Bugzilla: http://bugzilla.gnome.org (product: OCRFeeder)
Manual in German:
http://wiki.ubuntuusers.de/OCRFeeder
Thank you!