
  • Recognition of Cursive Roman Handwriting – Past, Present and Future

    H. Bunke


    Department of Computer Science, University of Bern

    Neubrückstrasse 10, CH-3012 Bern, Switzerland

    Acknowledgments:

    - S. Günter, T. Varga, M. Zimmermann

    - Swiss National Science Foundation (20-5287.97 and IM2)

    Recognition of Cursive Roman Handwriting – Past, Present and Future – p.1/61

  • Introduction

    optical character recognition (OCR), refined level by level:

    • Oriental script vs. Roman script

    • machine-printed text vs. handwritten text

    • on-line vs. off-line

    • isolated vs. cursive


  • Introduction

    (why) is it difficult?

    • large variation in personal handwriting style

    • different writing instruments

    • segmentation problem

    • large vocabulary (possibly open)


  • Introduction

    hundert (German: "hundred")


  • Introduction

    is there any future need for automatic handwriting recognition?

    • applications with commercial potential: address, form and check reading

    • digital libraries, transcription of historical archives

    • "non-death" of paper and new devices for handwriting acquisition


  • Contents

    1. Introduction

    2. State of the Art

    3. Current Developments

    4. Future Trends

    5. Conclusion


  • Document Image Preprocessing

    standard operations include

    • noise filtering

    • binarization

    • thinning

    • skew correction

    • slant correction

    • estimation of baseline and main writing zones

    • horizontal and vertical scaling

    • additional problem-dependent methods to separate the handwriting from the background
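    As an illustration of one of the standard operations listed above, binarization, here is a minimal sketch using Otsu's global threshold. The slides do not say which binarization method is used; Otsu's method, the function names `otsu_threshold` and `binarize`, and the convention that dark pixels (gray level at or below the threshold) are ink are all assumptions.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0..255) that maximizes the between-class
    variance when pixels are split into {p <= t} and {p > t} (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, so this split is undefined
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grayscale image (rows of 0..255) to 1 = ink (dark), 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]
```

    Usage: `binarize([[10, 12, 200], [15, 210, 205]])` yields `[[1, 1, 0], [1, 0, 0]]`. A global threshold is the simplest choice; documents with uneven illumination usually need a local (adaptive) variant.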


  • Document Image Preprocessing

    [Figure: preprocessing example; panels: original image, binarized image, thinned image, estimation of slant, deslanted image, deslanted and deskewed image, estimation of writing zones, final result]


  • Isolated Character Recognition

    • usually cast as a classification problem

    • consists of preprocessing, feature extraction, and classification

    features for isolated character recognition:

    • raw pixels

    • derived from series expansion, moments, etc.

    • projection based features, contour based features

    • structural features: end points, forks, junctions, etc.
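    Two of the feature families named above, projection-based features and moments, can be sketched in a few lines. The sketch assumes a binarized, already size-normalized character image stored as a list of 0/1 rows (1 = ink); the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0/1, 1 = ink


def projection_features(img: Image) -> List[int]:
    """Horizontal projection (ink count per row) followed by
    vertical projection (ink count per column)."""
    horizontal = [sum(row) for row in img]
    vertical = [sum(col) for col in zip(*img)]
    return horizontal + vertical


def raw_moment(img: Image, p: int, q: int) -> float:
    """Geometric moment m_pq = sum over ink pixels of x**p * y**q
    (x = column index, y = row index)."""
    return float(sum(x ** p * y ** q
                     for y, row in enumerate(img)
                     for x, v in enumerate(row) if v))


def centroid(img: Image) -> Tuple[float, float]:
    """Center of gravity (m10 / m00, m01 / m00) of the ink pixels."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00
```

    For a 3×3 "plus" shape, `projection_features` gives `[1, 3, 1, 1, 3, 1]` and `centroid` gives `(1.0, 1.0)`. Higher-order central moments built from the same `raw_moment` helper are a common next step.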


  • Isolated Character Recognition

    classifiers for isolated character recognition:

    • nearest-neighbor

    • Bayes classifier

    • neural nets

    • support vector machines (SVM), etc.

    which classifier is best?

    • depends on many factors, for example, available training set, number of free parameters, time & memory constraints, etc.
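    To make the first entry in this list concrete, here is a minimal k-nearest-neighbor sketch over feature vectors such as those above. Euclidean distance, the default k = 3, and the function name are assumptions; the slides compare classifier families, not specific settings.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: List[Tuple[Sequence[float], str]],
                 x: Sequence[float], k: int = 3) -> str:
    """Label x by majority vote among its k nearest training samples
    (Euclidean distance between feature vectors)."""
    neighbors = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

    The classifier needs no training step, but it stores the whole training set and computes one distance per sample at recognition time, which illustrates the time and memory trade-off mentioned above.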


  • Cursive Word Recognition

    • major problem: segmentation

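    • Sayre’s paradox: a word cannot be segmented into letters before it is recognized, and it cannot be recognized before it is segmented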

    • three approaches:

      − holistic

      − segmentation-based (oversegment and merge)

      − segmentation-free (Hidden Markov Models, HMM)
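    The segmentation-based approach can be sketched as a dynamic program: the word image is deliberately oversegmented into primitives, and consecutive primitives are merged so that each group is read as one character. The sketch assumes a lexicon, at most `max_group` primitives per character, and a hypothetical character scorer `char_score(start, end, ch)` that returns a log-probability; none of these are specified on the slides.

```python
import math
from typing import Callable, List

# Hypothetical scorer: log-probability that primitives[start:end] form character ch.
CharScore = Callable[[int, int, str], float]


def word_score(word: str, n_prims: int, char_score: CharScore,
               max_group: int = 3) -> float:
    """Best log-score of reading `word` from primitives 0..n_prims-1, where each
    character merges 1..max_group consecutive primitives (-inf if impossible)."""
    neg = -math.inf
    # best[k][e]: best score for the prefix word[:k] covering primitives[:e]
    best = [[neg] * (n_prims + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for k in range(1, len(word) + 1):
        for e in range(1, n_prims + 1):
            for g in range(1, max_group + 1):
                s = e - g  # character k spans primitives [s, e)
                if s < 0 or best[k - 1][s] == neg:
                    continue
                cand = best[k - 1][s] + char_score(s, e, word[k - 1])
                best[k][e] = max(best[k][e], cand)
    return best[len(word)][n_prims]


def recognize_word(lexicon: List[str], n_prims: int, char_score: CharScore,
                   max_group: int = 3) -> str:
    """Return the lexicon entry whose best merge of primitives scores highest."""
    return max(lexicon, key=lambda w: word_score(w, n_prims, char_score, max_group))
```

    Because segmentation is decided jointly with the lexicon word, no single hard segmentation has to be committed to in advance, which is how this approach sidesteps Sayre’s paradox.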


  • Hidden Markov Models (HMMs)

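    sliding window: a window moves across the text image from left to right; at each position i = 0, 1, 2, … it produces a feature vector

    $\mathbf{x}_i = (x_{i1}, \ldots, x_{in})^\top$

    HMM: a linear chain of states $S_1 \to S_2 \to S_3 \to \cdots$ with self-transition probabilities $P_{11}, P_{22}, P_{33}$, forward transition probabilities $P_{12}, P_{23}$, and an output distribution $P(X)$ attached to each state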
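    The sliding-window idea and the linear state topology on this slide can be sketched as follows. The one-column window, the two features per window (fraction of ink pixels and normalized center of gravity of the ink), and the function names are assumptions for illustration; the slide only says that each window position yields a feature vector. The second function is the standard Viterbi recursion restricted to the self-transitions $P_{ii}$ and forward transitions $P_{i,i+1}$ shown in the diagram.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sliding_window_features(img: List[List[int]]) -> List[Tuple[float, float]]:
    """Move a one-column window left to right over a binary text-line image
    (rows of 0/1, 1 = ink). For column i, x_i = (fraction of ink pixels,
    center of gravity of the ink normalized to [0, 1]); cog is 0.0 if no ink."""
    height = len(img)
    feats = []
    for column in zip(*img):
        ink_rows = [r for r, v in enumerate(column) if v]
        frac = len(ink_rows) / height
        if ink_rows and height > 1:
            cog = (sum(ink_rows) / len(ink_rows)) / (height - 1)
        else:
            cog = 0.0
        feats.append((frac, cog))
    return feats


def viterbi_linear(obs: Sequence, log_self: Sequence[float],
                   log_next: Sequence[float],
                   log_emit: Callable[[int, object], float]) -> float:
    """Log-probability of the best state path through a linear HMM
    S_1 -> S_2 -> ... -> S_N that starts in S_1 and ends in S_N.
    log_self[i] = log P_ii, log_next[i] = log P_i,i+1,
    log_emit(i, x) = log P(x | state i)."""
    n = len(log_self)
    neg = -math.inf
    delta = [neg] * n
    delta[0] = log_emit(0, obs[0])
    for x in obs[1:]:
        new = [neg] * n
        for i in range(n):
            stay = delta[i] + log_self[i]  # self-transition P_ii
            move = delta[i - 1] + log_next[i - 1] if i > 0 else neg  # P_i-1,i
            best = max(stay, move)
            if best > neg:
                new[i] = best + log_emit(i, x)
        delta = new
    return delta[-1]
```

    In a character-based system, the word HMM is the concatenation of character HMMs of this form, and `obs` is the sequence of window features; in practice `log_emit` is typically a Gaussian mixture density, not the toy function used for testing.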


  • General Text Recognition

    • segmentation-based: segment line of text into individual words, then use cursive word recognizer

    • segmentation-free: segmentation and recognition are integrated

      − concatenate word HMMs into word sequence (or sentence) models

      − use constraints to narrow down the search space, for example, soft constraints derived from n-gram language models
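    The soft constraint mentioned above is commonly a bigram (n = 2) model. A minimal sketch of estimating bigram log-probabilities from transcribed text lines follows; the add-α smoothing, the `<s>` start-of-line marker, and the function name are assumptions, since the slides do not say how the n-gram model is estimated.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train_bigram(sentences: List[List[str]],
                 alpha: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Estimate log p(w | v) with add-alpha smoothing (alpha > 0) over the
    vocabulary seen in training; the context '<s>' marks the start of a line."""
    vocab = sorted({w for s in sentences for w in s})
    pair_counts: Counter = Counter()
    context_counts: Counter = Counter()
    for s in sentences:
        prev = "<s>"
        for w in s:
            pair_counts[(prev, w)] += 1
            context_counts[prev] += 1
            prev = w
    log_p = {}
    for v in ["<s>"] + vocab:
        denom = context_counts[v] + alpha * len(vocab)
        for w in vocab:
            log_p[(v, w)] = math.log((pair_counts[(v, w)] + alpha) / denom)
    return log_p
```

    Because smoothing leaves every word pair a nonzero probability, the model only biases the search toward likely word sequences instead of forbidding the others, which is what makes it a soft constraint.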


  • Segmentation-free Word Sequence Recognition

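    • concatenation of word HMMs

    [Figure: three successive positions in the text line, each offering the word models w1, w2, …, wn; the models at consecutive positions are concatenated into a network]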


  • Segmentation-free Word Sequence Recognition

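    • concatenation of word HMMs

    [Figure: the same word network, with language-model probabilities along a path: $p(w_i^1)$ for the first word, $p(w_i^2 \mid w_j^1)$ for the second, $p(w_i^3 \mid w_j^2)$ for the third (superscript: position in the line; subscripts i, j: word indices)]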
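    The word network and the language-model factors on this slide can be combined in a single dynamic program over the frames of the text line, so that word boundaries are found during recognition rather than before it. A simplified sketch, assuming a hypothetical `word_loglik(w, start, end)` that returns the Viterbi log-likelihood of word w's HMM on frames [start, end), a first-word table `log_first` for $p(w^1)$, and a bigram table `log_bigram` for $p(w^k \mid w^{k-1})$; the `lm_weight` scale factor is common practice but an assumption here. Practical decoders run a Viterbi or beam search through the concatenated HMM network instead of scoring every span as this quadratic sketch does.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: log-likelihood that the HMM of `word` generated frames [start, end).
WordLogLik = Callable[[str, int, int], float]


def decode_line(n_frames: int, vocab: List[str], word_loglik: WordLogLik,
                log_first: Dict[str, float],
                log_bigram: Dict[Tuple[str, str], float],
                lm_weight: float = 1.0) -> List[str]:
    """Best word sequence covering frames [0, n_frames); a path scores
    sum of word log-likelihoods + lm_weight * (log p(w^1) + sum log p(w^k | w^k-1))."""
    neg = -math.inf
    # best[e][w]: best score of a sequence covering frames [0, e) ending in word w
    best = [{w: neg for w in vocab} for _ in range(n_frames + 1)]
    back: List[Dict[str, Tuple[int, Optional[str]]]] = [{} for _ in range(n_frames + 1)]
    for e in range(1, n_frames + 1):
        for w in vocab:
            for s in range(e):
                acoustic = word_loglik(w, s, e)
                if s == 0:  # w is the first word of the line
                    cand = acoustic + lm_weight * log_first[w]
                    if cand > best[e][w]:
                        best[e][w], back[e][w] = cand, (0, None)
                    continue
                for v in vocab:  # w follows a partial sequence ending in v at frame s
                    if best[s][v] == neg or (v, w) not in log_bigram:
                        continue
                    cand = best[s][v] + lm_weight * log_bigram[(v, w)] + acoustic
                    if cand > best[e][w]:
                        best[e][w], back[e][w] = cand, (s, v)
    last = max(vocab, key=lambda u: best[n_frames][u])
    if best[n_frames][last] == neg:
        return []  # no word sequence covers the whole line
    words: List[str] = []
    e, w = n_frames, last
    while w is not None:  # follow back-pointers to the start of the line
        words.append(w)
        e, w = back[e][w]
    return words[::-1]
```

    When the word models fit two readings equally well, the language-model terms decide between them, which is the intended effect of the soft constraint.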


  • Segmentation-free Word Sequence Recognition

    • concatenation of HMM