
Recognition of Cursive Roman Handwriting – Past, Present and Future

H. Bunke


Department of Computer Science, University of Bern

Neubrückstrasse 10, CH-3012 Bern, Switzerland

Acknowledgments:

- S. Günter, T. Varga, M. Zimmermann

- Swiss National Science Foundation (20-5287.97 and IM2)

Recognition of Cursive Roman Handwriting – Past, Present and Future – p.1/61


Introduction

optical character recognition (OCR)
  Oriental script
  Roman script
    machine printed text
    handwritten text
      on-line
      off-line
        isolated
        cursive


Introduction

(why) is it difficult?

• large variation in personal handwriting style

• different writing instruments

• segmentation problem

• large vocabulary (possibly open)


Introduction

hundert


Introduction

is there any future need for automatic handwriting recognition?

• applications with commercial potential: address, form and check reading

• digital libraries, transcription of historical archives

• "non-death" of paper and new devices for handwriting acquisition


Contents

1. Introduction

2. State of the Art

3. Current Developments

4. Future Trends

5. Conclusion


Document Image Preprocessing

standard operations include

• noise filtering

• binarization

• thinning

• skew correction

• slant correction

• estimation of baseline and main writing zones

• horizontal and vertical scaling

• additional problem dependent methods to separate handwriting from background
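
As one illustration of the operations listed above, the sketch below binarizes a grey-level image with a single global threshold. The talk does not say which binarization method is used; Otsu's method is substituted here as a common choice, and the function names and the dark-ink-on-light-paper convention are assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Grey level that maximizes the between-class variance (Otsu's method).
    `gray` is assumed to be a 2-D uint8 array with values 0..255."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = (hist * np.arange(256)).sum()
    best_t, best_var = 0, -1.0
    w0 = 0.0      # number of pixels at or below the candidate threshold
    sum0 = 0.0    # sum of their grey levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def binarize(gray):
    """Ink = 1 for pixels at or below the threshold (dark ink assumed)."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)
```

A global threshold is only a starting point; the problem dependent background separation mentioned in the last bullet would replace or follow this step.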


Document Image Preprocessing

[Figure: preprocessing stages shown as image panels. Left column: original image, binarized image, thinned image, estimation of slant. Right column: final result, deslanted image, estimation of writing zones, deslanted and deskewed image.]


Isolated Character Recognition

• usually cast as a classification problem

• consists of preprocessing, feature extraction, and classification

features for isolated character recognition:

• raw pixels

• derived from series expansion, moments, etc.

• projection based features, contour based features

• structural features: end points, forks, junctions, etc.


Isolated Character Recognition

classifiers for isolated character recognition:

• nearest-neighbor

• Bayes classifier

• neural nets

• SVM, etc.

which classifier is best?

• depends on many factors, for example, available training set, number of free parameters, time & memory constraints, etc.


Cursive Word Recognition

• major problem: segmentation

• Sayre’s paradox

• three approaches

− holistic

− segmentation-based (oversegment and merge)

− segmentation-free (Hidden Markov Models, HMM)


Hidden Markov Models (HMMs)

[Figure, animated over seven slides: a sliding window is moved across the word image. At each window position $t = 0, \ldots, 6$ it produces a feature vector $(x_{t1}, \ldots, x_{tn})^T$, which is passed to an HMM with states S1, S2, S3, ..., self-transition probabilities P11, P22, P33, transition probabilities P12, P23 between neighbouring states, and an output distribution P(X) attached to each state.]
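
The sliding-window scheme on these slides can be sketched roughly as follows. This is a minimal illustration rather than the system described in the talk: the window width, the three features per window, the diagonal Gaussian used for P(X) and all parameter values are assumptions.

```python
import numpy as np

def sliding_window_features(image, width=1):
    """Slide a window of `width` columns over a binary image (ink = 1).
    Returns one feature vector per window position: ink fraction, mean row
    of the ink, and vertical spread of the ink (hypothetical features)."""
    rows = np.arange(image.shape[0], dtype=float)
    feats = []
    for x in range(0, image.shape[1] - width + 1):
        col = image[:, x:x + width].sum(axis=1).astype(float)
        ink = col.sum()
        if ink == 0:
            feats.append([0.0, 0.0, 0.0])
            continue
        mean = (rows * col).sum() / ink
        var = (((rows - mean) ** 2) * col).sum() / ink
        feats.append([ink / (image.shape[0] * width), mean, np.sqrt(var)])
    return np.array(feats)

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, standing in for P(X)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def forward_log_likelihood(feats, stay, means, variances):
    """Forward algorithm for a linear left-to-right HMM.
    stay[k] is the self-transition probability of state k (P11, P22, ...);
    1 - stay[k] is the probability of moving on to state k + 1.
    The model starts in the first state and must end in the last one."""
    n_states = len(stay)
    alpha = np.full(n_states, -np.inf)
    alpha[0] = log_gauss(feats[0], means[0], variances[0])
    for x in feats[1:]:
        new = np.full(n_states, -np.inf)
        for k in range(n_states):
            from_self = alpha[k] + np.log(stay[k])
            from_prev = alpha[k - 1] + np.log(1 - stay[k - 1]) if k > 0 else -np.inf
            new[k] = np.logaddexp(from_self, from_prev) + log_gauss(x, means[k], variances[k])
        alpha = new
    return alpha[-1]
```

In a recognizer of this kind, one such model would be evaluated per candidate word and the word with the highest likelihood selected; the self-transitions are what let the model absorb a varying number of window positions per state.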


General Text Recognition

• segmentation-based: segment the line of text into individual words, then use a cursive word recognizer

• segmentation-free: segmentation and recognition are integrated

− concatenate word HMMs into word sequence (or sentence) models

− use constraints to narrow down the search space, for example, soft constraints derived from n-gram language models


Segmentation-free Word Sequence Recognition

• concatenation of HMMs

[Figure, built up over five slides: for each of three successive word positions the word models w1, w2, ..., wn are drawn as alternatives, and the positions are chained one after the other. The first position is labelled $p(w^1_i)$, the second $p(w^2_i \mid w^1_j)$ and the third $p(w^3_i \mid w^2_j)$.]

• bi-gram language model

word   next word   probability
to     the         0.009333
to     be          0.002239
to     a           0.000138
to     have        0.000105
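
To make the role of the bi-gram model concrete, the sketch below scores a word sequence as the probability of its first word times successive next-word probabilities, computed in log space. The four bi-gram values are the ones in the table above, read here as p(next word | word); the slide does not state whether they are conditional or joint. The unigram value for "to", the floor probability for unseen events and the function name are assumptions for illustration.

```python
import math

# Bi-gram values from the slide's table, keyed by (word, next word).
BIGRAM = {
    ("to", "the"): 0.009333,
    ("to", "be"): 0.002239,
    ("to", "a"): 0.000138,
    ("to", "have"): 0.000105,
}
UNIGRAM = {"to": 0.02}  # assumed value, not from the source
FLOOR = 1e-7            # assumed probability for unseen words and pairs

def sequence_log_prob(words):
    """log p(w1) plus the sum of log p(w_t | w_{t-1}) over the sequence."""
    logp = math.log(UNIGRAM.get(words[0], FLOOR))
    for prev, nxt in zip(words, words[1:]):
        logp += math.log(BIGRAM.get((prev, nxt), FLOOR))
    return logp

# Pick the most plausible of several competing hypotheses.
candidates = [["to", "the"], ["to", "be"], ["to", "tha"]]
best = max(candidates, key=sequence_log_prob)
```

In a segmentation-free recognizer these language model terms would be added to the HMM scores during the search rather than applied to finished hypotheses, which is what makes them soft constraints.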


Recognition Experiment

[Figure: word recognition rate [%] (axis from 40 to 80) plotted against vocabulary size [n] (axis from 0 to 8000), with one curve each for the simple sentence model, the unigram model and the bigram model.]


Some Recent Trends

• databases for development and performance evaluation

• multiple classifier systems

• synthetic training data


Databases

• isolated characters and words:

− CEDAR

− NIST

− CENPARMI

− ELT9

− IRESTE

− ...

• cursively handwritten text

− Senior/Robinson, PAMI 1998

− Elliman/Sherkat, ICDAR 2001

− IAM, collection in progress (since about 1997)


Some Details of the IAM Database

• more than 1,500 scanned pages of handwritten text

• material from over 600 individual writers

− 95,000 correctly segmented words

− over 13,000 lines of text

− over 5,000 complete sentences

• covering a vocabulary of over 12,000 words

• ground truth and lexical tags available (LOB corpus)


Multiple Classifier Systems

• motivation: use a group of experts rather than a single expert

• many approaches to handwriting recognition using multiple classifier systems (MCSs) have been proposed

• often the basic classifiers are constructed 'by hand'

• recently so-called ensemble methods have been proposed:

− they require only a single classifier to be constructed by hand

− the classifier ensemble is generated automatically


Multiple Classifier Systems (2)

"classical" approach

[Figure: the input is passed to the classifiers c1, ..., cn, whose outputs go to a combiner that produces the result.]


Multiple Classifier Systems (3)

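ensemble method

[Figure: the classifiers c1, ..., cn are generated automatically from a single base classifier c. The input is passed to c1, ..., cn, whose outputs go to a combiner that produces the result.]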


Issues in MCS’s

• ensemble generation

− bagging

− feature subspace

− boosting

− others

• combination

− voting

− rank sum

− weighted voting

− trainable classifier
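
Three of the combination rules named above can be sketched in a few lines. The function names, the tie-breaking behaviour and the treatment of labels missing from a ranking are assumptions made for this illustration; the talk does not specify them.

```python
from collections import Counter, defaultdict

def plurality_vote(decisions):
    """decisions: the top-choice label of each classifier.
    Ties go to the label that was seen first."""
    return Counter(decisions).most_common(1)[0][0]

def weighted_vote(decisions, weights):
    """Each classifier's vote counts with its weight, e.g. its validation accuracy."""
    scores = defaultdict(float)
    for label, w in zip(decisions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

def rank_sum(rankings):
    """rankings: one ranked list of labels per classifier, best first.
    The label with the smallest sum of ranks wins; a label missing from
    a list is given the rank len(list)."""
    labels = {label for ranking in rankings for label in ranking}
    totals = {
        label: sum(r.index(label) if label in r else len(r) for r in rankings)
        for label in labels
    }
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return min(sorted(totals), key=totals.get)
```

Voting needs only the top choice of each classifier, whereas rank sum uses the full ranked lists, so it can recover a word that no classifier placed first but most placed near the top.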


Some Results

recognition rates achieved by various ensemble generation methods

algorithm             recognition rate
Bagging               68.11%
AdaBoost              68.67%
random subspace       67.35%
feature selection     71.58%
original classifier   66.23%


Synthetic Generation of Training Data

• all recognizers need to be trained

• the larger the training set, the better the performance ("you never have enough training data")

• but collection of training data is expensive

• previous work on generation of synthetic training data:

− machine printed OCR [Baird et al.]

− Arabic and Chinese OCR

− isolated characters

− (synthetic handwriting for other purposes [Guyon, Plamondon])


Synthetic Generation of Training Data

• no work on synthetic training data generation for cursive Roman handwriting recognition

• two approaches:

− using templates

− applying geometric distortions to existing handwritten text
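
As a rough illustration of the second approach, the sketch below produces distorted copies of a word image by shearing it, which changes the slant of the writing. The talk does not say which distortions are applied; the shear, its parameter range and the function names are assumptions.

```python
import numpy as np

def shear_image(image, slant):
    """Shift each row horizontally in proportion to its height above the bottom row.
    image: 2-D array with ink = 1; slant: shift in pixels per row (may be negative)."""
    h, w = image.shape
    shifts = [int(round(slant * (h - 1 - y))) for y in range(h)]
    lo, hi = min(shifts + [0]), max(shifts + [0])
    out = np.zeros((h, w + hi - lo), dtype=image.dtype)  # widened so no ink is cut off
    for y, s in enumerate(shifts):
        out[y, s - lo:s - lo + w] = image[y]
    return out

def synthesize(image, n, max_slant=0.3, seed=0):
    """Generate n sheared variants of one natural sample."""
    rng = np.random.default_rng(seed)
    return [shear_image(image, rng.uniform(-max_slant, max_slant)) for _ in range(n)]
```

Because every variant is derived from a real sample, its transcription is known without manual labelling, which is what makes distortion attractive as a source of extra training data.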


Synthetic Handwriting from Templates

• templates extracted from forms

• templates extracted from running text, using HMM in forced alignment mode


Synthetic Handwriting from Templates (3)

• disadvantages:

− all instances of a character are identical

− no ligatures


Synthetic Handwriting from N-Grams

• compile a list of frequent 3- and 2-tuples from an electronic corpus

• extract templates of these tuples from a handwritten text, using forced alignment

• split the given text into available tuples and generate the synthetic handwriting
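
The splitting step in the last bullet could look like the sketch below. The slide only says that the text is split into available tuples; the greedy left-to-right strategy, the fallback to single-character templates and the example tuple set are assumptions.

```python
def split_into_tuples(word, available):
    """Greedy left-to-right split of `word` into 3-tuples, then 2-tuples,
    using only tuples for which a handwritten template is available.
    Characters not covered by any tuple are emitted on their own."""
    parts, i = [], 0
    while i < len(word):
        for n in (3, 2):
            piece = word[i:i + n]
            if len(piece) == n and piece in available:
                parts.append(piece)
                i += n
                break
        else:
            parts.append(word[i])  # fall back to a single-character template
            i += 1
    return parts

# Hypothetical set of tuples with extracted templates.
available = {"the", "ing", "an", "er", "re", "on"}
pieces = split_into_tuples("there", available)
```

Each piece would then be replaced by its template image and the images placed side by side. Compared with single-character templates, the tuples carry the ligatures between their letters.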


Some Results

[Figure: recognition rate [%] (axis from 60 to 74) for each training set (axis from 0 to 5).]

• 1193 word instances; 16 writers; 357 word vocabulary

• 80% training; 20% testing; 5-fold cross validation

• training sets:

− 1 = natural training data

− 2 = synthetic training data

− 3 = synthetic training data

− 4 = synthetic training data

• test data: always natural

• except for the training data (natural/synthetic) identical conditions for all experiments (same training/test words; same size of training/test set etc.)


Future Perspectives

• some random comments:

− MCSs

− synthetic training data

− enhanced HMMs (for example, 2D)

− enhanced language models

− etc.


Future Perspectives

• to reach a new quality of recognition we need to go from text transcription to text understanding:

− include syntactic and semantic text analysis

− include task specific knowledge (in addition to statistical parameter estimation)


Who can read this?

When I was in high school, my physics teacher - whose name was Mr. Bader - called me down one day after physics class and said, "You look bored; I want to tell you something interesting." Then he told me something which I found fascinating, and have, since then, always found fascinating. ... The subject is this - the principle of least action.

Richard P. Feynman: The Feynman Lectures, Volume II.


Who can read this?

Középiskolás koromban, egy nap a fizikatanárom - Bader úrnak hívták - magához hívott fizikaóra után és azt mondta: "Unottnak látszol; szeretnék mondani neked valami érdekeset." Majd elmondott valamit, amit elbűvölőnek találtam, és azóta is mindig elbűvölőnek találom ... A legkisebb hatás elvéről van szó.

Richard P. Feynman: The Feynman Lectures, Volume II.


Integration of Grammatical Knowledge

• prerequisites:

− a word sequence recognizer that produces an n-best list (see before)

− a stochastic context free grammar

− a parser to compute the probability of a sentence or the most probable parse tree

• procedure:

− reorder the n-best list from the recognizer taking parse probabilities into account

final score = recognition score + γ f(parse probability)

where γ is a normalization factor and f(.) is a normalization function
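
The reordering rule can be tried out on the five candidates of the example that follows. The recognition scores and parse probabilities below are copied from that example; the choice of the natural logarithm for f and the value γ = 10 are assumptions, since the talk gives neither.

```python
import math

# (recognition score, parse probability, candidate sentence) from the example slide.
N_BEST = [
    (23923.6, 7.69052e-23, "She has put up the value other money ."),
    (23921.8, 4.62861e-20, "She has put up the value of her money ."),
    (23890.3, 2.63105e-22, "She had put up the value other money ."),
    (23888.4, 1.58352e-19, "She had put up the value of her money ."),
    (23854.3, 1.12458e-21, "She has put up the value at her money ."),
]

def rerank(n_best, gamma, f=math.log):
    """final score = recognition score + gamma * f(parse probability),
    returned as (final score, sentence) pairs, best first."""
    scored = [(rec + gamma * f(prob), sent) for rec, prob, sent in n_best]
    return sorted(scored, reverse=True)

reordered = rerank(N_BEST, gamma=10.0)  # gamma = 10 is an assumed value
```

With γ = 0 the original recognition order is reproduced. With the assumed γ = 10 the ungrammatical "value other money" reading drops from first place and "She has put up the value of her money ." moves to the top; a much larger γ would let the parse probability dominate and favour the "had" variant instead, so γ has to be tuned on held-out data.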


Example of Grammatical Knowledge Integration

Rank   Recognition Score   Candidate Sentence
1      23923.6             She has put up the value other money .
2      23921.8             She has put up the value of her money .
3      23890.3             She had put up the value other money .
4      23888.4             She had put up the value of her money .
5      23854.3             She has put up the value at her money .

Rank   Parse Prob.         Candidate Sentence
1      1.58352e-19         She had put up the value of her money .
2      4.62861e-20         She has put up the value of her money .
3      1.12458e-21         She has put up the value at her money .
4      2.63105e-22         She had put up the value other money .
5      7.69052e-23         She has put up the value other money .


Some Experimental Results

[Figure: sentence recognition rate [%] (axis from 6 to 34) plotted against rank [n] (axis from 0 to 100), with one curve for the reordered 100-best list and one for the baseline system.]


Future Challenge

• to deal with human factors (i.e. errors and abnormalities introduced by humans)

− statistical modeling has proven very useful

− however we also need to incorporate task specific knowledge provided by human experts


Sample Check Images


Conclusions

• the recognition of cursive Roman handwriting has been a subject of research for several decades

• for specific tasks some level of maturity has been reached and commercial systems have become available

• some other tasks, particularly the recognition of unconstrained general text, need much more research

• these tasks are interesting for practical applications

• there do exist promising directions to further develop the field
