Hidden Markov Modelling and Handwriting Recognition


Page 1: Hidden Markov Modelling and Handwriting Recognition

Hidden Markov Modelling and Handwriting Recognition

Csink László 2009

Page 2: Hidden Markov Modelling and Handwriting Recognition

Types of Handwriting 1

1. BLOCK PRINTING

2. GUIDED CURSIVE HANDWRITING

Page 3: Hidden Markov Modelling and Handwriting Recognition

Types of Handwriting 2

3. UNCONSTRAINED CURSIVE HANDWRITING

Clearly faster, but less legible, than 1 or 2

ONLINE recognition for 3: some systems have been developed

OFFLINE recognition for 3: much research has been done, still a lot to do

Suen: "no simple scheme is likely to achieve high recognition and reliability rates, not to mention human performance"

Page 4: Hidden Markov Modelling and Handwriting Recognition

Introduction to Hidden Markov Modelling (HMM): a simple example 1

Suppose we want to determine the average annual temperature at a specific location over a series of years, for a past era in which measurements are unavailable. We assume that only two kinds of years exist: hot (H) and cold (C), and we know that the probability of a cold year coming after a hot one is 0.3, and the probability of a cold year coming after a cold one is 0.6. Similar data are known about the probability of a hot year after a hot or a cold one, respectively. We assume that the probabilities are the same over the years. Then the data are expressed like this:

         H     C
    H   0.7   0.3
    C   0.4   0.6

We note that the row sums in this matrix are 1 (it is a row-stochastic matrix).

The transition process described by this matrix is a MARKOV PROCESS, as the next state depends only on the previous one.
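To make the Markov property concrete, here is a minimal Python sketch (not part of the original slides) that samples a sequence of hot/cold years from this transition matrix; each new year depends only on the previous one.

    import random

    P = {"H": {"H": 0.7, "C": 0.3},   # the row-stochastic transition matrix above
         "C": {"H": 0.4, "C": 0.6}}

    def sample_years(start="H", n=10, seed=1):
        """Sample n years; the next state depends only on the previous one."""
        random.seed(seed)
        years, state = [start], start
        for _ in range(n - 1):
            state = "H" if random.random() < P[state]["H"] else "C"
            years.append(state)
        return "".join(years)

    print(sample_years())   # e.g. a string like "HCCCHHHCCH"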

Page 5: Hidden Markov Modelling and Handwriting Recognition

Introduction to HMM: a simple example 2

We also suppose that there is a known correlation between the size of tree growth rings and temperature. We consider only 3 different ring sizes: Small, Medium and Large. We know that in each year the following probabilistic relationship holds between the states H and C and the rings S, M and L:

         S     M     L
    H   0.1   0.4   0.5
    C   0.7   0.2   0.1

We note that the row sums in this matrix are also 1 (it is also a row-stochastic matrix).

Page 6: Hidden Markov Modelling and Handwriting Recognition

Introduction to HMM: a simple example 3

Since the past temperatures are unknown, that is, the past states are hidden, the above model is called a Hidden Markov Model (HMM).

State transition matrix (Markov):

    A = | 0.7  0.3 |
        | 0.4  0.6 |

Observation matrix:

    B = | 0.1  0.4  0.5 |
        | 0.7  0.2  0.1 |

Initial state distribution (we assume this is also known):

    π = ( 0.6  0.4 )

A, B and π are all row stochastic.
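As a running example for the slides that follow, here is a minimal NumPy sketch of this model; the variable names A, B and pi are ours, with state 0 = H, state 1 = C and observations 0 = S, 1 = M, 2 = L.

    import numpy as np

    A  = np.array([[0.7, 0.3],        # state transition matrix; rows/cols: H, C
                   [0.4, 0.6]])
    B  = np.array([[0.1, 0.4, 0.5],   # observation matrix; cols: S(=0), M(=1), L(=2)
                   [0.7, 0.2, 0.1]])
    pi = np.array([0.6, 0.4])         # initial state distribution

    # A, B and pi are row stochastic:
    assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
    assert np.isclose(pi.sum(), 1)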

Page 7: Hidden Markov Modelling and Handwriting Recognition

Introduction to HMM: a simple example 4

Denote the rings S, M and L by 0, 1 and 2, respectively. Assume that in a 4-year period we observe O=(0,1,0,2). We want to determine the most likely sequence of the Markov process given the observations O.

Dynamic Programming: the most likely sequence is the one with the highest probability among all possible state sequences of length four.

HMM solution: the most likely sequence is the one that maximizes the expected number of correct states.

These two solutions do not necessarily coincide!

Page 8: Hidden Markov Modelling and Handwriting Recognition

Introduction to HMM: a simple example 5

Notations: in the previous example,

    T = 4 (length of the observation sequence)
    N = 2 (number of states)
    M = 3 (number of observation symbols)
    Q = {H, C}
    V = {0(=S), 1(=M), 2(=L)}
    O = (0, 1, 0, 2)

State transition matrix A:

             H     C
        H   0.7   0.3
        C   0.4   0.6

Observation matrix B:

           0(S)  1(M)  2(L)
        H   0.1   0.4   0.5
        C   0.7   0.2   0.1

Initial state distribution π = (0.6, 0.4)

Page 9: Hidden Markov Modelling and Handwriting Recognition

State Sequence Probability

Consider a state sequence of length four, X=(x_0,x_1,x_2,x_3), with observations O=(O_0,O_1,O_2,O_3). Denote by π_{x_0} the probability of starting in state x_0; b_{x_0}(O_0) is the probability of initially observing O_0, and a_{x_0,x_1} is the probability of transiting from state x_0 to state x_1. We see that the probability of the state sequence X above is

    P(X) = π_{x_0} b_{x_0}(O_0) · a_{x_0,x_1} b_{x_1}(O_1) · a_{x_1,x_2} b_{x_2}(O_2) · a_{x_2,x_3} b_{x_3}(O_3)

Page 10: Hidden Markov Modelling and Handwriting Recognition

Probability of Sequence (H,H,C,C)

With A, B and π as before and O = (0, 1, 0, 2), take X = (x_0,x_1,x_2,x_3) = (H,H,C,C). Then

    P(X) = π_H b_H(0) · a_{H,H} b_H(1) · a_{H,C} b_C(0) · a_{C,C} b_C(2)

    P(HHCC) = 0.6 · 0.1 · 0.7 · 0.4 · 0.3 · 0.7 · 0.6 · 0.1 ≈ 0.000212
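The same product can be checked in a few lines of Python, reusing A, B and pi from the earlier sketch:

    H, C = 0, 1
    X = [H, H, C, C]
    O = [0, 1, 0, 2]

    p = pi[X[0]] * B[X[0], O[0]]                 # pi_{x0} * b_{x0}(O_0)
    for t in range(1, len(X)):
        p *= A[X[t-1], X[t]] * B[X[t], O[t]]     # a_{x(t-1),x(t)} * b_{x(t)}(O_t)
    print(round(p, 6))                           # 0.000212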

Page 11: Hidden Markov Modelling and Handwriting Recognition

Finding the Best Solution in the DP Sense

Using the EXCEL formula

    =INDEX(A2:A17; MATCH(MAX(B2:B17);B2:B17;0))

    [ in Hungarian EXCEL: =INDEX(A2:A17; HOL.VAN(MAX(B2:B17);B2:B17;0)) ]

we find that the sequence with the highest probability is CCCH. This gives the best solution in the Dynamic Programming (DP) sense.

    row  state seq. (A)  Prob. (B)   Normalized prob. (C)
     2   HHHH            0.000412    0.042787
     3   HHHC            0.000035    0.003635
     4   HHCH            0.000706    0.073320
     5   HHCC            0.000212    0.022017
     6   HCHH            0.000050    0.005193
     7   HCHC            0.000004    0.000415
     8   HCCH            0.000302    0.031364
     9   HCCC            0.000091    0.009451
    10   CHHH            0.001098    0.114031
    11   CHHC            0.000094    0.009762
    12   CHCH            0.001882    0.195451
    13   CHCC            0.000564    0.058573
    14   CCHH            0.000470    0.048811
    15   CCHC            0.000040    0.004154
    16   CCCH            0.002822    0.293073
    17   CCCC            0.000847    0.087963
    18   SUM             0.009629    1.000000

We compute the state sequence probabilities (column B above) the same way as we computed P(HHCC) (row 5) on the previous slide. Writing =B2/B$18 into C2 and copying the formula downwards, we get the normalized probabilities.
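The spreadsheet computation can be reproduced in Python (a sketch reusing A, B and pi from above): enumerate all N^T = 16 state sequences, compute each probability as on slide 9, and normalize by the sum.

    from itertools import product

    def seq_prob(X, O):
        p = pi[X[0]] * B[X[0], O[0]]
        for t in range(1, len(X)):
            p *= A[X[t-1], X[t]] * B[X[t], O[t]]
        return p

    O = [0, 1, 0, 2]
    probs = {X: seq_prob(X, O) for X in product([0, 1], repeat=4)}   # 0 = H, 1 = C
    total = sum(probs.values())                  # ~0.009629, the SUM row above
    best = max(probs, key=probs.get)             # the DP-sense best sequence
    print("".join("HC"[s] for s in best))        # CCCH
    print(round(probs[best] / total, 4))         # ~0.2931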

Page 12: Hidden Markov Modelling and Handwriting Recognition

    row  state seq. (A)  Prob. (B)   Norm. prob. (C)   1st  2nd  3rd  4th
     2   HHHH            0.000412    0.042787           H    H    H    H
     3   HHHC            0.000035    0.003635           H    H    H    C
     4   HHCH            0.000706    0.073320           H    H    C    H
     5   HHCC            0.000212    0.022017           H    H    C    C
     6   HCHH            0.000050    0.005193           H    C    H    H
     7   HCHC            0.000004    0.000415           H    C    H    C
     8   HCCH            0.000302    0.031364           H    C    C    H
     9   HCCC            0.000091    0.009451           H    C    C    C
    10   CHHH            0.001098    0.114031           C    H    H    H
    11   CHHC            0.000094    0.009762           C    H    H    C
    12   CHCH            0.001882    0.195451           C    H    C    H
    13   CHCC            0.000564    0.058573           C    H    C    C
    14   CCHH            0.000470    0.048811           C    C    H    H
    15   CCHC            0.000040    0.004154           C    C    H    C
    16   CCCH            0.002822    0.293073           C    C    C    H
    17   CCCC            0.000847    0.087963           C    C    C    C

Using the EXCEL functions MID [KÖZÉP] and SUMIF [SZUMHA] we produced the columns D, E, F and G to show the 1st, 2nd, 3rd and 4th states. Summing up the normalized probabilities over the rows where column D, E, F or G contains "H" gives the first row of the HMM prob matrix; the second row is computed similarly, using "C" instead of "H".

The HMM prob matrix:

            t=0      t=1      t=2      t=3
    P(H)   0.1882   0.5196   0.2288   0.8040
    P(C)   0.8118   0.4804   0.7712   0.1960

Reading off the most probable state at each time gives CHCH, whereas the DP solution was CCCH; the two solutions indeed differ.
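The HMM prob matrix can be reproduced from the probs dictionary of the previous sketch: for each time t, sum the normalized probabilities of all sequences whose t-th state is H (or C).

    import numpy as np

    post = np.zeros((2, 4))              # rows: H, C; columns: t = 0, 1, 2, 3
    for X, p in probs.items():
        for t, s in enumerate(X):
            post[s, t] += p / total
    print(post.round(4))                 # first row P(H), second row P(C), as above
    print("".join("HC"[i] for i in post.argmax(axis=0)))   # CHCH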

Page 13: Hidden Markov Modelling and Handwriting Recognition

Three Problems

Problem 1. Given the model λ=(A,B,π) and a sequence of observations O, find P(O|λ). In other words, we want to determine the likelihood of the observed sequence O, given the model.

Problem 2. Given the model λ=(A,B,π) and a sequence of observations O, find an optimal state sequence for the underlying Markov process. In other words, we want to uncover the hidden part of the Hidden Markov Model.

Problem 3. Given an observation sequence O and dimensions N and M, find the model λ=(A,B,π) that maximizes the probability of O. This can be viewed as training the model to best fit the observed data.

Page 14: Hidden Markov Modelling and Handwriting Recognition

Solution to Problem 1

Let λ=(A,B,π) be a given model and let O=(O_0,O_1,…,O_{T-1}) be a series of observations. We want to find P(O|λ).

Let X=(x_0,x_1,…,x_{T-1}) be a state sequence. Then by the definition of B we have

    P(O|X,λ) = b_{x_0}(O_0) b_{x_1}(O_1) ··· b_{x_{T-1}}(O_{T-1})

and by the definition of π and A we have

    P(X|λ) = π_{x_0} a_{x_0,x_1} a_{x_1,x_2} ··· a_{x_{T-2},x_{T-1}}

Since

    P(O,X|λ) = P(O ∩ X ∩ λ) / P(λ)

and

    P(O|X,λ) P(X|λ) = [P(O ∩ X ∩ λ) / P(X ∩ λ)] · [P(X ∩ λ) / P(λ)] = P(O ∩ X ∩ λ) / P(λ),

we have P(O,X|λ) = P(O|X,λ) P(X|λ).

Page 15: Hidden Markov Modelling and Handwriting Recognition

By summing over all possible state sequences we get

    P(O|λ) = Σ_X P(O,X|λ) = Σ_X P(O|X,λ) P(X|λ)
           = Σ_X π_{x_0} b_{x_0}(O_0) a_{x_0,x_1} b_{x_1}(O_1) ··· a_{x_{T-2},x_{T-1}} b_{x_{T-1}}(O_{T-1})

As the length of the state sequence and of the observation sequence is T, there are N^T terms in this sum, and each term is a product of about 2T factors, so the total number of multiplications is roughly 2T·N^T. Fortunately, there exists a much faster algorithm as well.

Page 16: Hidden Markov Modelling and Handwriting Recognition

The Forward α-pass Algorithm

α_t(i) is the probability of the partial observation sequence up to time t, where q_i is the state of the underlying Markov process at time t. Let α_0(i) = π_i b_i(O_0) for i=0,1,…,N-1. For t=1,2,…,T-1 and i=0,1,…,N-1 compute

    α_t(i) = [ Σ_{j=0}^{N-1} α_{t-1}(j) a_{ji} ] b_i(O_t)

Then

    P(O|λ) = Σ_{i=0}^{N-1} P(O_0,O_1,…,O_{T-1}, x_{T-1}=q_i | λ) = Σ_{i=0}^{N-1} α_{T-1}(i)

We have to compute α T×N times, and each α takes N multiplications, so this method needs about T×N² multiplications.
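A direct NumPy implementation of the α-pass (a sketch; forward() is our name), which reproduces the brute-force value of P(O|λ) from slide 11:

    import numpy as np

    def forward(A, B, pi, O):
        N, T = A.shape[0], len(O)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                    # alpha_0(i) = pi_i * b_i(O_0)
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # [sum_j alpha_{t-1}(j) a_ji] * b_i(O_t)
        return alpha

    alpha = forward(A, B, pi, [0, 1, 0, 2])
    print(round(alpha[-1].sum(), 6))                  # P(O|lambda) ~ 0.00963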

Page 17: Hidden Markov Modelling and Handwriting Recognition

Solution to Problem 2

Given the model λ=(A,B,π) and a sequence of observations O, our goal is to find the most likely state sequence, i.e. the one that maximizes the expected number of correct states. First we define the backward algorithm, called the β-pass.

For t=0,1,…,T-1 and i=0,1,…,N-1 define

    β_t(i) = P(O_{t+1}, O_{t+2}, …, O_{T-1} | x_t = q_i, λ)

Let β_{T-1}(i) = 1 for i=0,1,…,N-1. For t=T-2,T-3,…,0 and i=0,1,…,N-1 compute

    β_t(i) = Σ_{j=0}^{N-1} a_{ij} b_j(O_{t+1}) β_{t+1}(j)

For t=0,1,…,T-1 and i=0,1,…,N-1 define γ_t(i) := P(x_t = q_i | O, λ). Then

    γ_t(i) = P(x_t = q_i, O | λ) / P(O|λ) = α_t(i) β_t(i) / P(O|λ)

The most likely state at time t is the state q_i for which γ_t(i) is maximal.
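A matching sketch of the β-pass and of γ; the argmax of γ at each t reproduces the CHCH answer of slide 12:

    import numpy as np

    def backward(A, B, O):
        N, T = A.shape[0], len(O)
        beta = np.ones((T, N))                            # beta_{T-1}(i) = 1
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t+1]] * beta[t+1])      # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        return beta

    O = [0, 1, 0, 2]
    beta = backward(A, B, O)
    gamma = forward(A, B, pi, O) * beta                   # alpha_t(i) * beta_t(i)
    gamma /= gamma.sum(axis=1, keepdims=True)             # divide by P(O|lambda)
    print("".join("HC"[i] for i in gamma.argmax(axis=1))) # CHCH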

Page 18: Hidden Markov Modelling and Handwriting Recognition

Example (1996): HMM-based Handwritten Symbol Recognition

Input: a sequence of strokes captured during writing. A stroke is a sequence of (x,y)-coordinates corresponding to pen positions, i.e. the writing from pen-down to pen-up.

Slant correction: try to find a near-vertical part in each stroke and rotate the whole stroke so that this part becomes vertical.

Page 19: Hidden Markov Modelling and Handwriting Recognition

Normalization of Strokes

Normalization: determine the x-length of each stroke. Denote by x10 the threshold below which 10% of the strokes fall with respect to x-length, and by x90 the threshold above which 10% of the strokes fall. Then compute the average of the x-lengths of all strokes lying between the two thresholds; denote this by x'.

Perform the above operations with respect to y-length too; compute y'.

Then normalize all strokes to x' and y'.

Page 20: Hidden Markov Modelling and Handwriting Recognition

The Online Temporal Feature Vector

Introduce a hidden stroke between the pen-up position of a stroke and the pen-down position of the next stroke (we assume that the strokes are sequenced according to time).

The unified sequence of strokes and hidden strokes is resampled at equispaced points along the trajectory, retaining the temporal order. For each point, the local position, the sine and cosine of the angle between the x-axis and the vector connecting the current point and the origin, and a flag telling whether the point belongs to a stroke or a hidden stroke constitute the feature vector; a sketch follows.
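A hedged sketch of such a feature vector (the slides describe but do not specify the exact 1996 feature set); resampling to equispaced points is assumed to have been done already:

    import numpy as np

    def feature_vectors(points, hidden):
        """points: (n, 2) resampled (x, y) pen positions, in temporal order;
        hidden: n booleans, True where the point lies on a hidden stroke."""
        angles = np.arctan2(points[:, 1], points[:, 0])      # angle between x-axis and point-origin vector
        return np.column_stack([points,                      # local position
                                np.sin(angles), np.cos(angles),
                                np.asarray(hidden, float)])  # stroke / hidden-stroke flag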

Page 21: Hidden Markov Modelling and Handwriting Recognition

HMM Topology

For each symbol S_i of the alphabet {S_1, S_2, …, S_K} an HMM λ_i is generated. The HMM is such that P(s_j|s_i) = 0 for states j < i or j > i+2 (a left-to-right topology). The question is: how can we generate an HMM? The answer is given by the solution to Problem 3.
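The constraint P(s_j|s_i) = 0 for j < i or j > i+2 describes a banded, left-to-right transition matrix. A minimal sketch of how such a matrix could be constructed (the concrete probability values are our assumption, not from the slides):

    import numpy as np

    def left_to_right_A(N, stay=0.5, step1=0.3, step2=0.2):
        A = np.zeros((N, N))
        for i in range(N):
            A[i, i] = stay                     # only i, i+1 and i+2 are reachable from i
            if i + 1 < N: A[i, i + 1] = step1
            if i + 2 < N: A[i, i + 2] = step2
            A[i] /= A[i].sum()                 # keep rows stochastic near the last states
        return A

    print(left_to_right_A(5).round(3))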

Page 22: Hidden Markov Modelling and Handwriting Recognition

Solution of Problem 3

Now we want to adjust the model parameters to best fit the observations. The sizes N (number of states) and M (number of observation symbols) are fixed, but A, B and π are free; we only have to take care that they remain row stochastic.

For t=0,1,…,T-2 and i,j in {0,1,…,N-1}, define the probability of being in state q_i at time t and transiting to state q_j at time t+1:

    ξ_t(i,j) := P(x_t = q_i, x_{t+1} = q_j | O, λ) = α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)

Then we have

    γ_t(i) = Σ_{j=0}^{N-1} ξ_t(i,j)

Page 23: Hidden Markov Modelling and Handwriting Recognition

The model parameters are then re-estimated as follows.

For i=0,1,…,N-1 let

    π_i = γ_0(i)

For i=0,1,…,N-1 and j=0,1,…,N-1 compute

    a_{ij} = Σ_{t=0}^{T-2} ξ_t(i,j) / Σ_{t=0}^{T-2} γ_t(i)

For j=0,1,…,N-1 and k=0,1,…,M-1 compute

    b_j(k) = Σ_{t in {0,…,T-2}: O_t = k} γ_t(j) / Σ_{t=0}^{T-2} γ_t(j)
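These formulas translate directly into one re-estimation step (a sketch reusing forward() and backward() from the earlier snippets):

    import numpy as np

    def reestimate(A, B, pi, O):
        T, N, M = len(O), A.shape[0], B.shape[1]
        alpha, beta = forward(A, B, pi, O), backward(A, B, O)
        PO = alpha[-1].sum()                                   # P(O|lambda)
        # xi[t,i,j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
        xi = np.array([np.outer(alpha[t], B[:, O[t+1]] * beta[t+1]) * A / PO
                       for t in range(T - 1)])
        gamma = xi.sum(axis=2)                                 # gamma_t(i), t = 0..T-2
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
        Ohead = np.array(O[:-1])                               # O_t for t = 0..T-2
        new_B = np.array([[gamma[Ohead == k, j].sum() for k in range(M)]
                          for j in range(N)]) / gamma.sum(axis=0)[:, None]
        return new_A, new_B, new_pi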

Page 24: Hidden Markov Modelling and Handwriting Recognition

The Iteration

1. First we initialize λ=(A,B,π) with a best guess, or choose random values such that π_i ≈ 1/N, a_{ij} ≈ 1/N and b_j(k) ≈ 1/M; π, A and B must be row stochastic.

2. Compute α_t(i), β_t(i), ξ_t(i,j) and γ_t(i).

3. Re-estimate the model λ=(A,B,π).

4. If P(O|λ) increases, GOTO 2. (The increase may be measured by a threshold, or a maximum number of iterations may be set.)
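The loop itself might look like this (a sketch reusing forward() and reestimate(); the threshold and iteration cap are the stopping criteria mentioned in step 4):

    def train(A, B, pi, O, max_iter=100, tol=1e-6):
        old = forward(A, B, pi, O)[-1].sum()          # P(O|lambda) for the initial guess
        for _ in range(max_iter):
            A, B, pi = reestimate(A, B, pi, O)
            new = forward(A, B, pi, O)[-1].sum()
            if new - old < tol:                       # stop when the increase is below the threshold
                break
            old = new
        return A, B, pi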

Page 25: Hidden Markov Modelling and Handwriting Recognition

Practical Considerations

Be aware of the fact that α_t(i) tends to 0 as T increases. Therefore, a direct realization of the above formulas may lead to underflow.

Details and pseudocodes may be found here: http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf
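One standard remedy, following Stamp's tutorial linked above: scale each α_t to sum to 1 and accumulate the logarithms of the scaling factors, which yields log P(O|λ) without underflow. A sketch:

    import numpy as np

    def forward_scaled(A, B, pi, O):
        T, N = len(O), A.shape[0]
        alpha = np.zeros((T, N))
        log_prob = 0.0
        for t in range(T):
            alpha[t] = pi * B[:, O[0]] if t == 0 else (alpha[t-1] @ A) * B[:, O[t]]
            c = alpha[t].sum()                 # scaling factor c_t
            alpha[t] /= c                      # scaled alphas sum to 1 at every t
            log_prob += np.log(c)
        return alpha, log_prob                 # log_prob = log P(O|lambda)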

Page 26: Hidden Markov Modelling and Handwriting Recognition

Another Example (2004): Writer Identification Using HMM Recognizers

Writer identification is the task of determining the author of a sample of handwriting from a set of writers.

Writer verification is the task of determining whether a given text has been written by a certain person.

If the text is predefined, it is text-dependent verification; otherwise it is text-independent verification.

Writer verification may be done online or offline. It is generally believed that text-independent verification is more difficult than text-dependent verification.

Page 27: Hidden Markov Modelling and Handwriting Recognition

For each writer, an individual HMM-based handwriting recognition system is trained using only data from that writer. Thus from n writers we get n different HMMs.

Given an arbitrary line of text input, each HMM recognizer outputs some recognition result with a recognition score.

It is assumed that:
- correctly recognized words have a higher score than incorrectly recognized words;
- the recognition rate on input from a writer the system was trained on is higher than on input from other writers.

The scores produced by the different HMMs can be used to decide who has written the input text line (see the sketch below).
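In code, the decision rule described above is just an argmax over the writers' recognition scores; recognizers and score here are hypothetical stand-ins for the trained systems, not names from the 2004 paper:

    def identify_writer(recognizers, line_features, score):
        """recognizers: {writer: hmm}; score(hmm, features) -> recognition score.
        Returns the writer whose HMM scores the input line highest."""
        return max(recognizers, key=lambda w: score(recognizers[w], line_features))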

Page 28: Hidden Markov Modelling and Handwriting Recognition

After preprocessing (slant, skew, baseline location, height), a sliding window of one-pixel width is shifted from left to right.

The features are: the number of black pixels in the window, the center of gravity, the second-order moment, the position and contour direction of the upper- and lowermost pixels, the number of black-to-white transitions in the window, and the distance between the upper- and lowermost pixels.

Normalization may lead to a reduction of individuality; on the other hand, it supports recognition, which is important for the verification project.

For each upper- and lowercase character an individual HMM is built.

Page 29: Hidden Markov Modelling and Handwriting Recognition

Related Concepts

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path.

The Baum-Welch algorithm is used to find the unknown parameters of an HMM. It makes use of the forward-backward algorithm used above.
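For completeness, a sketch of the Viterbi algorithm on the running weather example; it returns CCCH, the same DP-sense answer found by brute force on slide 11:

    import numpy as np

    def viterbi(A, B, pi, O):
        T, N = len(O), A.shape[0]
        delta = pi * B[:, O[0]]                    # best path probability ending in each state
        back = np.zeros((T, N), dtype=int)         # back-pointers
        for t in range(1, T):
            cand = delta[:, None] * A              # cand[i, j] = delta_{t-1}(i) * a_ij
            back[t] = cand.argmax(axis=0)
            delta = cand.max(axis=0) * B[:, O[t]]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):              # follow the back-pointers
            path.append(back[t, path[-1]])
        return path[::-1]

    print("".join("HC"[s] for s in viterbi(A, B, pi, [0, 1, 0, 2])))   # CCCH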

Page 30: Hidden Markov Modelling and Handwriting Recognition

HMM-based Speech Recognition

Modern general-purpose speech recognition systems are generally based on Hidden Markov Models. Reason: speech can be thought of as a Markov model.

For further reference consult Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf

Page 31: Hidden Markov Modelling and Handwriting Recognition

Thank you for your attention!