Advanced OCR with OmniPage and FineReader
Overview
Optical character recognition
Structural recognition
Options
Loading
Zoning
OCR
Editing
Optical Character Recognition (OCR)
OCR turns pictures of text into e-text
Does well unless…
– The picture is fuzzy
– The contrast is poor
– The font is unusual
– The font is too small or too large
– The material has unusual characters
Structural Recognition
Analyzes the layout of the page
– Columns
– Headings
– Graphics
– Tables
Usually does fairly well, unless the layout is non-standard
Programs that Run OCR
Programs for consumers
– Kurzweil 1000, 3000
– OpenBook
– Intel Reader
– Many others…
Programs for production
– ABBYY FineReader
– Nuance OmniPage
Consumer Programs
Highly automated
Designed for individuals who have print disabilities
Are not good production tools
– Do not provide flexibility
– Do not allow much overriding
– Interfaces not designed for editing
Production Programs in General
A good program for production allows you to…
– Control the zones (areas or blocks of text and graphics): add, delete, change
– Edit easily
– Improve recognition
Preferred Programs
ABBYY FineReader
– Relatively easy to learn
– Fairly intuitive
– Good structural recognition
Nuance OmniPage
– Less intuitive but more accessible
– Often does better with technical materials
Both Good Tools
If you can afford to have both, it’s nice, but not absolutely necessary.
If you have both, run a couple of test pages through each to see which is doing better on a particular job.
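One way to make the test-page comparison less subjective is to proofread a page by hand and score each program's output against it. The sketch below is only an illustration, not a feature of FineReader or OmniPage: it assumes you have saved each program's text for the same page, and the function names `similarity` and `pick_better_engine` are made up for this example.

```python
from difflib import SequenceMatcher


def _normalize(text):
    # Collapse runs of whitespace so line-wrapping differences
    # between programs are not counted as recognition errors.
    return " ".join(text.split())


def similarity(ocr_text, reference_text):
    """Return a 0..1 ratio; 1.0 means the OCR text matches the proofread page."""
    matcher = SequenceMatcher(
        None, _normalize(ocr_text), _normalize(reference_text), autojunk=False
    )
    return matcher.ratio()


def pick_better_engine(outputs, reference_text):
    """outputs maps a program name to its OCR text for the same test page."""
    return max(outputs, key=lambda name: similarity(outputs[name], reference_text))


if __name__ == "__main__":
    # Hypothetical sample data, typed by hand for illustration only.
    reference = "The quick brown fox"
    outputs = {
        "Program A": "The qu1ck brovvn fox",
        "Program B": "The quick brown fox",
    }
    print(pick_better_engine(outputs, reference))
```

A ratio like this only says which output is closer to the proofread text overall. It does not judge zoning or layout, so you still need to look at the pages themselves.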
Under the Hood
For best results with a program, set up your options before you begin!
Tools > Options
Lots of Languages
FineReader and OmniPage handle multiple languages.
For foreign-language material, turn on all the languages in the book.
– The program will then recognize the diacritical marks.
– Turn on what you need, but only what you need.
Math
If you are running OCR on math, try turning on Greek.
– Greek will allow the program to recognize alphas, deltas, sigmas, etc.
Another Decision
Detect page orientation or not?
– Does not always get it right
– Try it if you have many pages turned
Considerations
You may or may not want to keep headers and footers.
– I generally keep them to pull the page numbers.
You may want to keep the page breaks.
– Retaining page breaks helps to maintain one-to-one page correspondence with the book.
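If you keep both the page breaks and the headers or footers, you can check the page correspondence after export. The sketch below rests on two assumptions that may not match your output: that the exported plain text marks each page break with a form-feed character, and that the printed page number sits alone on the first or last non-blank line of the page. The function names are hypothetical.

```python
import re


def split_pages(text):
    # Assumption: the exported text uses a form feed ("\f") at each page break.
    return text.split("\f")


def printed_page_number(page):
    """Return the page number found on the last or first non-blank line, else None."""
    lines = [line.strip() for line in page.splitlines() if line.strip()]
    if not lines:
        return None
    for candidate in (lines[-1], lines[0]):
        if re.fullmatch(r"\d+", candidate):
            return int(candidate)
    return None


def find_page_gaps(text):
    """List (page_index, number_found) where the number does not follow the previous page."""
    numbers = [printed_page_number(page) for page in split_pages(text)]
    problems = []
    for i in range(1, len(numbers)):
        previous, current = numbers[i - 1], numbers[i]
        if previous is None or current is None or current != previous + 1:
            problems.append((i, current))
    return problems


if __name__ == "__main__":
    # Hypothetical three-page export in which one page is missing.
    sample = "Chapter 1\nbody text\n12\fmore body text\n13\fbody text\n15"
    print(find_page_gaps(sample))
```

Any page it flags is worth checking against the book. The cause may be a skipped scan, a misread number, or a page that simply carries no printed number.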
Fitting Everything
In some cases, you may need to work with a custom paper size to fit everything onto one page.
This feature can be helpful when you are retaining everything on the page but not the layout.
Loading FilesLoading Files
““Open”Open”– Opens saved program filesOpens saved program files
““Load”Load”– Loads image files to processLoads image files to process
Note that this same issue comes Note that this same issue comes up with saving!up with saving!
Wizards Are Evil…
Do not rely on the automation
Load the image file and choose the processes you want
Workspace
The program has three primary areas
Pages Pane
– Either thumbnails or details
– Allows simple navigation of pages
Image Pane
– Your graphic
Text Pane
– Area where the text from OCR will show
More Accessible
Both programs have a detail view.
– Shows text instead of graphics
Detail view is more accessible for screen readers.
Otherwise, it is personal preference.
Two Ways to Save
To save the program file to access later in the OCR program, choose File > Save
– This saves your work file.
You save your converted file during the last phase of the processing.
Production Tips
Work with dual monitors
– Check your computer and video card
Stretching an OCR program across two monitors is a HUGE time-saver!
Learn to use keyboard shortcuts.
– They save tons of time!