Recognition of Cursive Roman Handwriting – Past, Present and Future
H. Bunke
Department of Computer Science, University of Bern
Neubrückstrasse 10, CH-3012 Bern, Switzerland
Acknowledgments:
- S. Günter, T. Varga, M. Zimmermann
- Swiss National Science Foundation (20-5287.97 and IM2)
Recognition of Cursive Roman Handwriting – Past, Present and Future – p.1/61
Introduction
optical character recognition (OCR)
[Figure: classification tree of OCR. The first level separates Oriental script from Roman script; further levels distinguish machine-printed from handwritten text, on-line from off-line recognition, and isolated from cursive writing.]
Introduction
(why) is it difficult?
• large variation in personal handwriting style
• different writing instruments
• segmentation problem
• large vocabulary (possibly open)
Introduction
[Figure: example word "hundert"]
Introduction
is there any future need for automatic handwriting recognition?
• applications with commercial potential: address, form and check reading
• digital libraries, transcription of historical archives
• "non-death" of paper and new devices for handwriting acquisition
Contents
1. Introduction
2. State of the Art
3. Current Developments
4. Future Trends
5. Conclusion
Document Image Preprocessing
standard operations include
• noise filtering
• binarization
• thinning
• skew correction
• slant correction
• estimation of baseline and main writing zones
• horizontal and vertical scaling
• additional problem dependent methods to separate handwriting from background
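As an illustration of one operation from the list above, the following is a minimal sketch of binarization. The slides do not say which thresholding method is used; Otsu's global threshold is swapped in here as a common choice, and the function names `otsu_threshold` and `binarize` as well as the tiny test image are assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and gray-level sum of the dark class
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 1 = ink, 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 3x3 grayscale patch: dark ink on a bright background.
gray = [[200, 210, 30],
        [220, 25, 215],
        [20, 205, 230]]
binary = binarize(gray)
```

A global threshold is only adequate when the background is fairly uniform; the problem-dependent separation methods mentioned in the last bullet are outside the scope of this sketch.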
Document Image Preprocessing
[Figure: preprocessing example with panels: original image, binarized image, thinned image, estimation of slant, deslanted image, estimation of writing zones, deslanted and deskewed image, final result.]
Isolated Character Recognition
• usually cast as a classification problem
• consists of preprocessing, feature extraction, and classification
features for isolated character recognition:
• raw pixels
• derived from series expansion, moments, etc.
• projection based features, contour based features
• structural features: end points, forks, junctions, etc.
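Two of the feature families listed above can be sketched in a few lines: projection-based features (row and column ink counts) and low-order moments (here only the centroid). The function names and the small "L"-shaped test image are assumptions made for this example, not taken from the slides.

```python
def projection_profiles(img):
    """Return (row_profile, column_profile) of a binary image (1 = ink)."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def centroid(img):
    """Return the center of gravity (x, y) from the moments m10/m00 and m01/m00."""
    m00 = sum(sum(r) for r in img)
    if m00 == 0:
        return None  # empty image: centroid undefined
    m10 = sum(x * v for r in img for x, v in enumerate(r))
    m01 = sum(y * v for y, r in enumerate(img) for v in r)
    return (m10 / m00, m01 / m00)


# Hypothetical binary image of an "L"-like shape.
shape = [[1, 0, 0],
         [1, 0, 0],
         [1, 1, 1]]
row_profile, col_profile = projection_profiles(shape)
cx, cy = centroid(shape)
```

In practice such values would be normalized (for example by image size) before being passed to a classifier.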
Isolated Character Recognition
classifiers for isolated character recognition:
• nearest-neighbor
• Bayes classifier
• neural nets
• SVM, etc.
which classifier is best?
• depends on many factors, for example, available training set, number of free parameters, time & memory constraints, etc.
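The first classifier in the list, nearest-neighbor, is simple enough to sketch directly. The version below is a k-nearest-neighbor vote with Euclidean distance; the function name `knn_classify`, the two-dimensional feature vectors, and the labels are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter


def knn_classify(train, x, k=1):
    """Classify x by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two character classes.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.2, 0.8), "b")]
label_1 = knn_classify(train, (0.2, 0.1), k=3)
label_2 = knn_classify(train, (1.0, 0.9), k=1)
```

The sketch also hints at the trade-offs named on the slide: nearest-neighbor has no training phase and no free parameters beyond k, but it stores the whole training set and its classification time grows with it.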
Cursive Word Recognition
• major problem: segmentation
• Sayre’s paradox
• three approaches
− holistic
− segmentation-based (oversegment and merge)
− segmentation-free (Hidden Markov Models, HMM)
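The segmentation-based approach (oversegment and merge) can be illustrated with a small dynamic program. This is a sketch under strong assumptions: the word image has already been cut into too many pieces, and a character recognizer provides a log-score for every merge of consecutive pieces. The function `match_word`, the score table, and the lexicon are all hypothetical.

```python
import math


def match_word(word, num_segments, score, max_merge=3):
    """Best total log-score for explaining all segments by the letters of word.

    score(i, j, ch) is the log-score that segments i..j-1, merged, show ch.
    Each letter consumes between 1 and max_merge consecutive segments.
    """
    neg = -math.inf
    best = [[neg] * (num_segments + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for c in range(1, len(word) + 1):
        for j in range(1, num_segments + 1):
            for i in range(max(0, j - max_merge), j):
                if best[c - 1][i] == neg:
                    continue
                cand = best[c - 1][i] + score(i, j, word[c - 1])
                if cand > best[c][j]:
                    best[c][j] = cand
    return best[len(word)][num_segments]


# Hypothetical recognizer output for three stroke segments.
table = {
    (0, 1): {"i": -0.3},
    (1, 2): {"i": -0.4},
    (2, 3): {"i": -0.1, "l": -2.0},
    (0, 2): {"u": -0.2, "n": -1.5},
    (1, 3): {"u": -0.5},
}


def toy_score(i, j, ch):
    return table.get((i, j), {}).get(ch, -math.inf)


lexicon = ["ui", "iu", "iii", "il"]
best_word = max(lexicon, key=lambda w: match_word(w, 3, toy_score))
```

The toy table shows why segmentation is hard: the same three strokes can be read as "ui", "iu", or "iii", and only the recognition scores decide which grouping of segments wins.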
Hidden Markov Models (HMMs)
sliding window → feature vector $x_t = (x_{t1}, \dots, x_{tn})^{\top}$, shown for window positions $t = 0, \dots, 6$
HMM: states $S_1, S_2, S_3, \dots$; self-transitions $P_{11}, P_{22}, P_{33}$; forward transitions $P_{12}, P_{23}$; an output distribution $P(X)$ attached to each state
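The sliding-window scheme can be sketched as follows: a window moves over the binary word image from left to right, each position yields a feature vector, and a linear left-to-right HMM assigns a likelihood to the resulting sequence. Everything specific here is an assumption for illustration: a window width of one pixel column, two simple features per column, diagonal Gaussian output distributions, and the function names.

```python
import math


def sliding_window_features(img):
    """One feature vector per pixel column: (ink count, mean row index of ink)."""
    height = len(img)
    feats = []
    for col in zip(*img):
        ink = sum(col)
        cog = sum(y for y, v in enumerate(col) if v) / ink if ink else height / 2
        feats.append((float(ink), cog))
    return feats


def log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, stay, means, variances):
    """Forward algorithm in the log domain for a linear left-to-right HMM.

    State k loops with probability stay[k] and moves to k+1 with 1 - stay[k].
    The model starts in the first state and must end in the last one;
    the exit probability of the last state is ignored in this sketch.
    """
    n = len(means)
    alpha = [-math.inf] * n
    alpha[0] = log_gauss(obs[0], means[0], variances[0])
    for x in obs[1:]:
        new = []
        for k in range(n):
            paths = [alpha[k] + math.log(stay[k])]
            if k > 0:
                paths.append(alpha[k - 1] + math.log(1 - stay[k - 1]))
            new.append(logsumexp(paths) + log_gauss(x, means[k], variances[k]))
        alpha = new
    return alpha[-1]


# Hypothetical one-dimensional observation sequence and two competing models.
obs = [(0.0,), (0.1,), (5.0,), (5.2,)]
unit_var = [(1.0,), (1.0,)]
ll_a = forward_log_likelihood(obs, [0.5, 0.5], [(0.0,), (5.0,)], unit_var)
ll_b = forward_log_likelihood(obs, [0.5, 0.5], [(5.0,), (0.0,)], unit_var)
```

Recognition then amounts to evaluating every candidate model on the same observation sequence and picking the one with the highest likelihood; here the model whose state order matches the data (`ll_a`) scores higher than the reversed one (`ll_b`).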
General Text Recognition
• segmentation-based: segment line of text into individual words, then use cursive word recognizer
• segmentation-free: segmentation and recognition are integrated
− concatenate HMM word models into word sequence (or sentence) models
− use constraints to narrow down the search space, for example, soft constraints derived from n-gram language models
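To make the n-gram constraint concrete, here is a minimal sketch of a bigram language model estimated from word counts. The add-alpha smoothing, the `<s>` start marker, the function name `train_bigram`, and the three training sentences are assumptions for this example; the slides do not specify how the language model is estimated.

```python
import math
from collections import Counter


def train_bigram(sentences, vocab, alpha=1.0):
    """Return a function log_prob(w, prev) = log p(w | prev), add-alpha smoothed."""
    context = Counter()  # how often each word occurs as a predecessor
    bigram = Counter()   # how often each (predecessor, word) pair occurs
    for words in sentences:
        prev = "<s>"     # sentence-start marker
        for w in words:
            context[prev] += 1
            bigram[(prev, w)] += 1
            prev = w
    size = len(vocab)

    def log_prob(w, prev):
        return math.log((bigram[(prev, w)] + alpha) / (context[prev] + alpha * size))

    return log_prob


# Hypothetical training text.
vocab = {"the", "a", "cat", "dog"}
sentences = [["the", "cat"], ["the", "dog"], ["a", "cat"]]
log_p = train_bigram(sentences, vocab)
p_cat_given_the = math.exp(log_p("cat", "the"))
```

Because of the smoothing, unseen word pairs keep a small non-zero probability, which is what makes the constraint "soft": unlikely sequences are penalized but not excluded from the search.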
Segmentation-free Word Sequence Recognition
• concatenation of HMMs
[Figure: network of concatenated word HMMs; at every word position each of the word models $w_1, w_2, \dots, w_n$ can be entered.]
• language model probabilities attached to the network, one word position at a time: $p(w_i^1)$, $p(w_i^2 \mid w_j^1)$, $p(w_i^3 \mid w_j^2)$
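How the probabilities $p(w_i^1)$ and $p(w_i^k \mid w_j^{k-1})$ steer the search can be sketched with a Viterbi pass over word positions. This is a deliberate simplification: it assumes that a log-likelihood per word and position is already available, whereas the segmentation-free approach finds the word boundaries jointly with the words. The function `decode_word_sequence` and all scores are hypothetical.

```python
import math


def decode_word_sequence(position_scores, log_initial, log_bigram):
    """Return (best word sequence, its total log-score).

    position_scores: one dict per word position, word -> HMM log-likelihood.
    log_initial(w): log p(w) for the first position.
    log_bigram(w, prev): log p(w | prev) for later positions.
    """
    best = {w: log_initial(w) + s for w, s in position_scores[0].items()}
    back = []
    for scores in position_scores[1:]:
        new_best, pointers = {}, {}
        for w, s in scores.items():
            prev = max(best, key=lambda v: best[v] + log_bigram(w, v))
            new_best[w] = best[prev] + log_bigram(w, prev) + s
            pointers[w] = prev
        best = new_best
        back.append(pointers)
    word = max(best, key=best.get)
    total = best[word]
    path = [word]
    for pointers in reversed(back):
        word = pointers[word]
        path.append(word)
    return path[::-1], total


# Hypothetical recognizer scores: taken alone, "he" and "car" look best.
position_scores = [{"the": -1.0, "he": -0.8}, {"cat": -1.0, "car": -0.9}]
initial = {"the": -0.5, "he": -1.5}
bigrams = {("the", "cat"): -0.3, ("the", "car"): -1.0,
           ("he", "cat"): -2.0, ("he", "car"): -2.0}
path, total = decode_word_sequence(
    position_scores,
    lambda w: initial[w],
    lambda w, prev: bigrams[(prev, w)],
)
```

In this toy case the language model overrides the purely visual ranking: the individually best words would give "he car", while the combined score prefers "the cat".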
Segmentation-free Word Sequence Recognition
• concatenation of HMM