Image Enhancement and OCR
Niall Anderson, The British Library, 12 July 2010
2
What is Image Enhancement?
Image enhancement is a suite of techniques for improving the display or delivery of digital images, particularly text-based images.
Main areas of improvement
• Removing noise and other digital artefacts
• Geometric correction for skewed images
• Geometric correction for warped pages in the paper original
3
Example of an enhanced image
[Figure: the same page image shown warped and dewarped]
4
Why Image Enhancement?
To increase quality of image for display
To increase quality of image for printing (especially for Print On Demand)
To increase quality of Optical Character Recognition results
5
OCR and Image Enhancement
OCR will produce its best results on material with the following characteristics:
• The layout of the text is simple, with no tables or illustrations
• The text itself is in a modern, computer-generated typeface
• The digital image preserves a high contrast between the text block and non-text detail (including blank space)
• The image has been created from a perfectly flat and straight scan (if a digital copy from an analogue source)
• The text of the analogue source is clear, well aligned and consistently presented
• The basic material of the analogue source is undamaged
• The text is in a single language
• The image has been taken from the original physical source and not a degraded surrogate (such as microfilm)
6
IMPACT Image Enhancement toolkit
7
Types of image enhancement in toolkit
Binarisation
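Binarisation turns a greyscale scan into a two-level image of ink and paper. The sketch below uses a global Otsu threshold as a stand-in technique; it is not the toolkit's own method. It assumes the image is a list of rows of grey levels from 0 to 255, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarise(image):
    """Map each pixel to 1 (ink) if at or below the threshold, else 0 (paper)."""
    threshold = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

A single global threshold tends to struggle with uneven lighting or stains, which is why adaptive methods are usually preferred for degraded documents.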
8
Types of image enhancement in toolkit
Border removal
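Border removal trims the dark frame that scanning often leaves around a page. The sketch below is one simple way to do this, not the toolkit's algorithm: it shrinks the image from each edge while the outermost row or column is mostly dark. It assumes a binarised image with 1 for dark pixels; `crop_dark_borders` and `max_dark` are hypothetical names.

```python
def crop_dark_borders(image, max_dark=0.8):
    """Trim edge rows and columns whose share of dark pixels exceeds max_dark."""
    def dark_share(values):
        return sum(values) / len(values)

    top, bottom = 0, len(image)
    left, right = 0, len(image[0])
    # Shrink from the top and bottom while the outermost row is mostly dark.
    while top < bottom and dark_share(image[top]) > max_dark:
        top += 1
    while bottom > top and dark_share(image[bottom - 1]) > max_dark:
        bottom -= 1
    # Then shrink from the left and right, looking only at the remaining rows.
    while (left < right and top < bottom
           and dark_share([row[left] for row in image[top:bottom]]) > max_dark):
        left += 1
    while (right > left and top < bottom
           and dark_share([row[right - 1] for row in image[top:bottom]]) > max_dark):
        right -= 1
    return [row[left:right] for row in image[top:bottom]]
```

This only handles borders that are dark and roughly axis-aligned; noisy text fragments from a neighbouring page would need a different approach.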
9
Types of image enhancement in toolkit
Page splitting
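Page splitting separates a double-page scan into two single-page images. As a rough sketch, and not the toolkit's method, the code below assumes the gutter appears as the emptiest column near the centre of a binarised image (1 for ink). The names `find_gutter`, `split_pages` and `search_fraction` are hypothetical.

```python
def find_gutter(image, search_fraction=0.2):
    """Return the x position of the emptiest column in a central band."""
    width = len(image[0])
    column_ink = [sum(row[x] for row in image) for x in range(width)]
    centre = width // 2
    half_band = max(1, int(width * search_fraction / 2))
    lo = max(0, centre - half_band)
    hi = min(width, centre + half_band + 1)
    # Pick the column with the least ink; ties go to the one nearest the centre.
    return min(range(lo, hi), key=lambda x: (column_ink[x], abs(x - centre)))

def split_pages(image, search_fraction=0.2):
    """Split at the gutter; the gutter column itself is dropped."""
    gutter = find_gutter(image, search_fraction)
    left_page = [row[:gutter] for row in image]
    right_page = [row[gutter + 1:] for row in image]
    return left_page, right_page
```

In real scans the gutter is often a dark shadow rather than a blank strip, so the ink test would need to be inverted or replaced for such material.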
10
Types of image enhancement in toolkit
Dewarping
11
Using the IMPACT Image Enhancement toolkit - 1
Select the directory that contains your images, or copy your images to the directory
12
Using the IMPACT Image Enhancement toolkit - 2
Select the directory for saving the results
13
Using the IMPACT Image Enhancement toolkit - 3
Select one or more document images
14
Using the IMPACT Image Enhancement toolkit - 4
Define a processing workflow
15
Using the IMPACT Image Enhancement toolkit - 5
Select the method for every processing module
16
Using the IMPACT Image Enhancement toolkit - 6
Execute workflow by pressing "Apply Processes"
17
Using the IMPACT Image Enhancement toolkit - 7
View results in the preview window, or right-click any module in the workflow line and select "View Result".
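The steps above amount to an ordered list of processing modules, a chosen method for each, and an intermediate result kept per module. The sketch below models that idea only; it is not the toolkit's API, and the module names, method names and toy methods are all hypothetical.

```python
# Toy methods, for illustration only: each takes an image and returns an image.
METHODS = {
    "binarisation": {
        "fixed_threshold": lambda img: [[1 if p < 128 else 0 for p in row] for row in img],
    },
    "border_removal": {
        "trim_one_pixel": lambda img: [row[1:-1] for row in img[1:-1]],
    },
}

def run_workflow(image, workflow, methods=METHODS):
    """Apply each (module, method) pair in order.

    Returns the final image and a dict of per-module results, so that any
    intermediate step can be inspected afterwards.
    """
    results = {}
    current = image
    for module, method_name in workflow:
        current = methods[module][method_name](current)
        results[module] = current
    return current, results

workflow = [("binarisation", "fixed_threshold"), ("border_removal", "trim_one_pixel")]
```

Calling `run_workflow(image, workflow)` then plays the role of pressing "Apply Processes", and looking up `results["binarisation"]` corresponds to viewing the result of a single module.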
18
Indicative results – Border Removal
22383 images to test border removal
BL: 7%, BNE: 34%, BNF: 34%, BSB: 11%, JSI: 6%, NLB: 2%, ONB: 6%
Only images with borders
38718 images to test border removal
BL: 9%, BNE: 29%, BNF: 32%, BSB: 12%, JSI: 11%, NLB: 2%, ONB: 5%
19
Indicative results – Page splitting
458 images from BNF to test page split
3009 images to test page split
BL: 72%, BSB: 10%, JSI: 18%
20
Indicative results - Dewarping
IMPACT Page Curl Correction v.4: 87.78% (81.98% with coarse correction only)
BookRestorer: 80.87%
21
Research and references
N. Stamatopoulos, B. Gatos, I. Pratikakis and S. J. Perantonis, Goal-oriented Rectification of Camera-Based Document Images, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011.
N. Stamatopoulos, B. Gatos and T. Georgiou, Page frame detection for double page document images, 9th IAPR International Workshop on Document Analysis Systems (DAS 2010), pp. 401-408, Cambridge, MA, USA, June 2010.
B. Gatos, I. Pratikakis and S. J. Perantonis, Adaptive Degraded Document Image Binarization, Pattern Recognition, vol. 39, pp. 317-327, 2006.
22
Questions?