HMM Based Handwritten Text Recognition Using Biometrical Data Acquisition Pen

Ondrej Rohlik, Pavel Mautner, Vaclav Matousek, Juergen Kempf

Department of Computer Science and Engineering
University of West Bohemia in Pilsen

Ondrej Rohlik, IEEE CIRA 2003, Kobe, Japan

Outline

• Data acquisition device: The BiSP pen

• Handwritten text recognition

• Hidden Markov models

• Experimental results

• Future Work

Input Devices – Overview

• off-line (static)
  – scanners
  – cameras*

• on-line (dynamic)
  – electronic pens
  – digitizers, tablets
  – cameras*
  – mouse*

Input Device: The BiSP Pen

• Electronic pen* is used for data acquisition

* built at University of Applied Sciences in Regensburg, Germany

Input Device – Writing

Input Device – Signals

Handwritten Text Recognition

• Objective: To convert handwritten sentences or phrases in analog form (off-line or on-line sources) into digital form (ASCII or Unicode).

• isolated character recognition (TM, DTW, NN)

• word recognition (HMMs)

• gesture recognition

Handwritten Text

hand printed characters

spaced discrete characters

cursive script words

Signal Description

• Pairs of x and y signals are transformed into a sequence of primitives

Primitive (observation): 1 2 3 4

Signal trend (x, y): [trend symbols for each primitive not preserved]
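As a rough illustration of the transformation described above, the sketch below turns sampled x and y signals into a sequence of four primitives by looking at the sign of each signal's change between consecutive samples. The numbering of the trend pairs, the treatment of flat segments as falling, the merging of repeated symbols, and the name `to_primitives` are all assumptions made for this example; the slide does not specify them.

```python
def trend(delta):
    """Return +1 for a rising signal and -1 for a falling (or flat) one."""
    return 1 if delta > 0 else -1


# Hypothetical numbering of the four (x trend, y trend) combinations.
PRIMITIVE = {(1, 1): 1, (1, -1): 2, (-1, 1): 3, (-1, -1): 4}


def to_primitives(xs, ys):
    """Map paired x/y samples to a sequence of primitives (observations)."""
    if len(xs) != len(ys):
        raise ValueError("x and y signals must have the same length")
    primitives = []
    for k in range(1, len(xs)):
        key = (trend(xs[k] - xs[k - 1]), trend(ys[k] - ys[k - 1]))
        symbol = PRIMITIVE[key]
        # Assumption: merge runs, so each primitive marks a change of trend.
        if not primitives or primitives[-1] != symbol:
            primitives.append(symbol)
    return primitives
```

Calling `to_primitives(xs, ys)` on two equally long sample lists yields the observation sequence that a discrete HMM would consume.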

Hidden Markov Models

• left-to-right model (used mostly in speech recognition)
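In a left-to-right model a state can only keep its position or move forward, so the transition matrix A is upper triangular. A minimal sketch, assuming the simplest variant in which each state either stays or advances to its immediate successor; the function name and the `p_stay` parameter are hypothetical and not taken from the slides:

```python
def left_to_right_transitions(n_states, p_stay=0.5):
    """Build the transition matrix A of a simple left-to-right HMM."""
    if n_states < 1 or not 0.0 <= p_stay < 1.0:
        raise ValueError("need n_states >= 1 and 0 <= p_stay < 1")
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay            # remain in the current state
        A[i][i + 1] = 1.0 - p_stay  # advance to the next state
    A[-1][-1] = 1.0                 # the last state is absorbing
    return A
```

Every entry below the diagonal stays zero, which is what prevents the model from moving backwards in time.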

Hidden Markov Models

• Training – Baum-Welch algorithm

• Recognition – Backward algorithm

• Matrices that describe the model (A, B, π) are decomposed after the training – one model for each letter
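To make the recognition step concrete, here is a minimal sketch of how the backward algorithm yields the likelihood of an observation sequence for a discrete HMM given by (A, B, π), and how a recognizer might pick the best-scoring model. Observation symbols are assumed to be 0-based column indices into B, and `recognize` with its `models` dictionary is a hypothetical helper, not the authors' implementation.

```python
def backward_likelihood(A, B, pi, observations):
    """Return P(observations | model) using the backward algorithm.

    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability that state i emits symbol o
    pi[i]   : probability of starting in state i
    """
    if not observations:
        raise ValueError("observation sequence must not be empty")
    n = len(pi)
    beta = [1.0] * n  # beta at the final time step is 1 for every state
    # Walk backwards over o_T ... o_2 to obtain beta at the first time step.
    for obs in reversed(observations[1:]):
        beta = [sum(A[i][j] * B[j][obs] * beta[j] for j in range(n))
                for i in range(n)]
    first = observations[0]
    return sum(pi[i] * B[i][first] * beta[i] for i in range(n))


def recognize(models, observations):
    """Return the label whose model gives the highest likelihood."""
    return max(models,
               key=lambda label: backward_likelihood(*models[label], observations))
```

For long sequences the probabilities underflow, so a practical version would rescale beta at each step or work with logarithms.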

Word HMM Decomposition

Word HMM Composition
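The slide shows composition only as a figure, so the following is one plausible reading rather than the authors' procedure: letter models are chained so that the probability mass leaving the last state of one letter enters the first state of the next. The representation of a letter model as a pair (A, B) whose last row of A may sum to less than 1, and the name `compose_word_model`, are assumptions made for this sketch.

```python
def compose_word_model(letter_models, word):
    """Concatenate letter HMMs into a single left-to-right word HMM.

    letter_models maps a letter to (A, B). The last row of a letter's A may
    sum to less than 1; the missing mass is the chance of leaving the letter.
    """
    if not word:
        raise ValueError("word must contain at least one letter")
    total = sum(len(letter_models[ch][0]) for ch in word)
    A = [[0.0] * total for _ in range(total)]
    B = []
    offset = 0
    for index, ch in enumerate(word):
        A_letter, B_letter = letter_models[ch]
        n = len(A_letter)
        # Copy the letter's transitions onto the diagonal block.
        for i in range(n):
            for j in range(n):
                A[offset + i][offset + j] = A_letter[i][j]
        B.extend(list(row) for row in B_letter)
        last = offset + n - 1
        exit_mass = 1.0 - sum(A_letter[n - 1])
        if index + 1 < len(word):
            A[last][last + 1] = exit_mass  # enter the next letter
        else:
            A[last][last] += exit_mass     # final state absorbs the rest
        offset += n
    pi = [1.0] + [0.0] * (total - 1)       # always start in the first state
    return A, B, pi
```

Building one such model per vocabulary word and scoring each against the observed primitives gives a word recognizer; decomposition is the reverse step of cutting the trained word matrices back into per-letter blocks.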

Experimental Results

• the method has been tested on three independent data sets of various sizes

• limited number of letters used in our data sets: 15
  – reduced complexity of tagging the training set

Vocabulary size          1649    2198    5129
Recognition rate (%)       88      90      82
Recognition time (min)  17-26   27-49     360

Future Work

• to speed up the algorithm to achieve real-time recognition

• incorporation of language models to improve the recognition rate

• special attention will be paid to signature analysis and signature verification

• application in tele-robotics and robot sensing
  – robot aided signature forging

Forgeries – Overview

a) genuine b) zero-effort c) unskilled d) skilled

Example of Two Features

Class Boundaries

Signature Verification – Algorithms

Training algorithm

For each class C
    For each feature f
        For each pair of signatures Classes[C][i] and Classes[C][j]
            Compute the difference between Classes[C][i] and Classes[C][j] and add it to an extra variable Sum[f]
        Compute mean value mean[f] and variance var[f] of each feature over all pairs using the variable Sum[f]
    Compute critical cluster coefficient using variances var[f] and weights w[f] over all features f

Classification algorithm

For class C to be verified
    For each pattern Classes[C][i]
        For each feature f
            Compute the difference and remember the least one over all patterns
    Sum up products of least differences and weights w[f] and compare the sum with critical cluster coefficient
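The two procedures can be sketched in Python for a single class (one writer). The slide does not give the formula for the critical cluster coefficient, so the weighted sum of standard deviations multiplied by a `scale` factor used below is purely an assumption, as are the use of absolute differences, the accept-if-not-greater rule, and the function names `train` and `verify`.

```python
from itertools import combinations
from statistics import mean, pvariance


def train(signatures, weights, scale=1.0):
    """Return per-feature means, variances and the critical coefficient.

    signatures: feature vectors of genuine signatures of one class.
    """
    if len(signatures) < 2:
        raise ValueError("at least two signatures are needed to form a pair")
    means, variances = [], []
    for f in range(len(weights)):
        # Differences of feature f over all pairs of signatures in the class.
        diffs = [abs(a[f] - b[f]) for a, b in combinations(signatures, 2)]
        means.append(mean(diffs))
        variances.append(pvariance(diffs))
    # ASSUMPTION: the slide only says the coefficient uses var[f] and w[f].
    critical = scale * sum(w * v ** 0.5 for w, v in zip(weights, variances))
    return means, variances, critical


def verify(candidate, signatures, weights, critical):
    """Accept the candidate if its weighted least differences stay in bounds."""
    least = [min(abs(candidate[f] - s[f]) for s in signatures)
             for f in range(len(weights))]
    score = sum(w * d for w, d in zip(weights, least))
    return score <= critical
```

A larger `scale` accepts more signatures and therefore more forgeries, while a smaller one rejects more genuine signatures, so in practice it would have to be tuned on held-out data.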