Large Vocabulary Unconstrained Handwriting Recognition J Subrahmonia Pen Technologies IBM T J Watson...

Large Vocabulary Unconstrained Handwriting Recognition

J Subrahmonia

Pen Technologies

IBM T J Watson Research Center

Pen Technologies

Pen-based interfaces in mobile computing

Mathematical Formulation

H : Handwriting evidence on the basis of which a recognizer will make its decision
– H = {h1, h2, h3, h4, …, hm}

W : Word string from a large vocabulary
– W = {w1, w2, w3, w4, …, wn}

Recognizer :
– $\hat{W} = \arg\max_W p(W \mid H)$

Mathematical Formulation

$$\hat{W} = \arg\max_W p(W \mid H) = \arg\max_W \frac{p(H \mid W)\, p(W)}{p(H)} = \arg\max_W p(H \mid W)\, p(W)$$

– CHANNEL : p(H | W)
– SOURCE : p(W)

Source Channel Model

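[Figure: block diagram WRITER -> DIGITIZER -> FEATURE EXTRACTOR -> DECODER, with the word string W, the handwriting evidence H, and the CHANNEL marked on it]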

Source Channel Model

$$\hat{W} = \arg\max_W p(W \mid H) = \arg\max_W p(H \mid W)\, p(W)$$

– p(H | W) : Handwriting Modeling : HMMs
– p(W) : Language Modeling
– argmax over W : SEARCH STRATEGY
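The decision rule above can be sketched in a few lines. This is a toy illustration only: the candidate words and all probability values are invented for the example, and a real search strategy would not enumerate a large vocabulary exhaustively.

```python
import math

# Hypothetical scores for one fixed handwriting input H (values invented for illustration).
language_model = {"cat": 0.6, "cot": 0.1, "cut": 0.3}        # p(W)
handwriting_model = {"cat": 0.02, "cot": 0.05, "cut": 0.01}  # p(H | W)

def decode(candidates):
    """Return the word W that maximizes log p(H | W) + log p(W)."""
    return max(
        candidates,
        key=lambda w: math.log(handwriting_model[w]) + math.log(language_model[w]),
    )

best = decode(language_model)
print(best)
```

Working in log probabilities keeps the products from underflowing when H is long; the argmax is unchanged because the logarithm is monotonic.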

Hidden Markov Models

Memoryless Model
– Add Memory => Markov Model
– Hide Something => Mixture Model

Markov Model
– Hide Something => Hidden Markov Model

Mixture Model
– Add Memory => Hidden Markov Model

Alan B Poritz : Hidden Markov Models : A Guided Tour ICASSP 1988

Memoryless Model

COIN :
– Heads (1) : probability p
– Tails (0) : probability 1-p

Flip the coin 10 times (IID random sequence)

Sequence : 1 0 1 0 0 0 1 1 1 1

Probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = $p^6 (1-p)^4$
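As a quick check of the memoryless case, the following sketch multiplies the per-flip probabilities for the sequence on the slide. The function name and the test value p = 0.9 are choices made for this example.

```python
def iid_sequence_probability(seq, p):
    """Probability of a 0/1 sequence when each flip is independent with P(1) = p."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == 1 else 1 - p
    return prob

seq = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]  # six heads, four tails
prob = iid_sequence_probability(seq, 0.9)
print(prob)
```

Because the flips are independent, only the counts of heads and tails matter, not their order.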

Add Memory – Markov Model

2 Coins :
– COIN 1 => p(1) = 0.9, p(0) = 0.1
– COIN 2 => p(1) = 0.1, p(0) = 0.9

Experiment :
Flip COIN 1, note the outcome
If (outcome = Head) Flip COIN 1
Else Flip COIN 2
End

Sequence 1100 : Probability = 0.9*0.9*0.1*0.9
Sequence 1010 : Probability = 0.9*0.1*0.1*0.1
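The two-coin experiment can be simulated directly: the previous outcome selects the coin for the next flip. The sketch below assumes, as on the slide, that the first flip always uses COIN 1; the function and variable names are made up for the example.

```python
P_HEAD = {1: 0.9, 2: 0.1}  # probability of a head (1) for each coin

def markov_sequence_probability(seq):
    """Probability of a 0/1 sequence under the two-coin Markov experiment."""
    coin = 1  # the experiment starts with COIN 1
    prob = 1.0
    for outcome in seq:
        p_head = P_HEAD[coin]
        prob *= p_head if outcome == 1 else 1 - p_head
        coin = 1 if outcome == 1 else 2  # head -> COIN 1, tail -> COIN 2
    return prob

p_1100 = markov_sequence_probability([1, 1, 0, 0])
p_1010 = markov_sequence_probability([1, 0, 1, 0])
print(p_1100, p_1010)
```

Unlike the memoryless model, the order of outcomes now matters: both sequences contain two heads and two tails but have very different probabilities.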

State Sequence Representation

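States : 1 (COIN 1), 2 (COIN 2)

– From state 1 : output 1 with probability 0.9 (stay in state 1), output 0 with probability 0.1 (go to state 2)
– From state 2 : output 1 with probability 0.1 (go to state 1), output 0 with probability 0.9 (stay in state 2)

Observed output sequence <=> unique state sequence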

Hide the states => Hidden Markov Model

[Figure: two-state model with states s1 and s2; the arcs carry transition probabilities and output probabilities taken from the values 0.9 and 0.1]

Why use Hidden Markov Models Instead of Non-hidden?

Hidden Markov Models can be smaller : fewer parameters to estimate

States may be truly hidden
– Position of the hand
– Positions of articulators

Summary of HMM Basics

We are interested in assigning probabilities p(H) to feature sequences

Memoryless model
– This model has no memory of the past

$$p(H) = \prod_{i=1}^{n} p(h_i)$$

Markov model
– Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE : an equivalence class of the past that influences the future

$$p(h_i \mid h_1, \ldots, h_{i-1}) = p(h_i \mid s_{i-1})$$

Hide the states : HMM

$$p(H) = \sum_S p(H, S)$$

Hidden Markov Models

Given an observed sequence H
– Compute p(H) for decoding
– Find the most likely state sequence for a given Markov model (Viterbi algorithm)
– Estimate the parameters of the Markov source (training)

Compute p(H)

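States : s1, s2, s3. Each arc has a transition probability and, unless it is a null arc, output probabilities p(a) and p(b).

Arc         Transition   p(a)   p(b)
s1 -> s1    0.5          0.8    0.2
s1 -> s2    0.3          0.7    0.3
s1 -> s2    0.2          null (no output)
s2 -> s2    0.4          0.5    0.5
s2 -> s3    0.5          0.3    0.7
s2 -> s3    0.1          null (no output)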

Compute p(H) – contd.

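Compute p(H) where H = a a b b

Enumerate all ways of producing h1 = a

– s1 -> s1 (a) : 0.5x0.8 = 0.40
– s1 -> s2 (a) : 0.3x0.7 = 0.21
– s1 -> s2 (null), then s2 -> s2 (a) : 0.2 x 0.4x0.5 = 0.04
– s1 -> s2 (null), then s2 -> s3 (a) : 0.2 x 0.5x0.3 = 0.03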

Compute p(H) – contd.

Enumerate all ways of producing h1 = a, h2 = a

[Figure: tree of partial paths. Each path that produces h1 = a is extended by every arc that can produce h2 = a, using the arc weights 0.5x0.8, 0.3x0.7, 0.4x0.5, 0.5x0.3 and the null-arc weight 0.2]

Compute p(H)

Can save computation by combining paths

[Figure: the same path tree, with branches that reach the same state after the same output merged into a single node]

Compute p(H)

Trellis Diagram

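Rows are the states s1, s2, s3. Columns are the output produced so far : 0, a, aa, aab, aabb.

Arc weights (transition probability x output probability) for each successive output symbol :

Arc         a        a        b        b
s1 -> s1    .5x.8    .5x.8    .5x.2    .5x.2
s1 -> s2    .3x.7    .3x.7    .3x.3    .3x.3
s2 -> s2    .4x.5    .4x.5    .4x.5    .4x.5
s2 -> s3    .5x.3    .5x.3    .5x.7    .5x.7

Null arcs within each of the five columns : s1 -> s2 : .2, s2 -> s3 : .1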

Basic Recursion

Prob (Node) = sum (Prob (predecessor) x Prob (predecessor->node) )

Boundary condition : Prob (s1, 0) = 1

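Node probabilities :

       0       a       aa      aab     aabb
s1     1.0     0.4     .16     .016    .0016
s2     0.2     0.33    .182    .054    .01256
s3     0.02    0.063   .0677   .0691   .020156

Contributions to each node, written as predecessor, output : value (output 0 denotes a null arc) :

s1
– a : s1, a : 0.4
– aa : s1, a : .16
– aab : s1, b : .016
– aabb : s1, b : .0016

s2
– 0 : s1, 0 : 0.2
– a : s1, 0 : .08; s1, a : .21; s2, a : .04
– aa : s1, 0 : .032; s1, a : .084; s2, a : .066
– aab : s1, 0 : .0032; s1, b : .0144; s2, b : .0364
– aabb : s1, 0 : .00032; s1, b : .00144; s2, b : .0108

s3
– 0 : s2, 0 : 0.02
– a : s2, 0 : .033; s2, a : .03
– aa : s2, 0 : .0182; s2, a : .0495
– aab : s2, 0 : .0054; s2, b : .0637
– aabb : s2, 0 : .001256; s2, b : .0189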

More Formally –Forward Algorithm

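$$\alpha_t(s) = \sum_{s'} \alpha_{t-1}(s')\, P(s \mid s')\, P(h_t \mid s' \to s) + \sum_{s'} \alpha_t(s')\, P(s \mid s')$$

The first sum runs over output-producing arcs into s, the second over null arcs into s.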
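The recursion can be checked against the trellis values for H = aabb. The sketch below encodes the three-state model from the slides as a list of arcs, with `None` marking a null arc. It assumes that paths start in s1 and that p(H) is read off at s3 after the last symbol; the names `ARCS`, `STATES`, and `forward` are introduced here for illustration.

```python
# (source, target, transition probability, output probabilities or None for a null arc)
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
# Ordered so that every null arc goes from an earlier state to a later one.
STATES = ["s1", "s2", "s3"]

def forward(obs):
    """Return one {state: probability} column per output prefix of obs."""
    columns = []
    for t in range(len(obs) + 1):
        col = {}
        for s in STATES:
            total = 1.0 if (t == 0 and s == "s1") else 0.0  # boundary condition
            for src, dst, p, emit in ARCS:
                if dst != s:
                    continue
                if emit is None:
                    total += col[src] * p  # null arc: stays in the same column
                elif t > 0:
                    total += columns[t - 1][src] * p * emit[obs[t - 1]]
            col[s] = total
        columns.append(col)
    return columns

columns = forward("aabb")
p_H = columns[-1]["s3"]
print(p_H)
```

Each column is computed from the previous one only, so the cost grows linearly with the length of H instead of with the number of paths.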

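Find Most Likely Path for aabb : Dynamic Programming or Viterbi

Max Prob (Node) = MAX (Max Prob (predecessor) x Prob (predecessor->node) )

Node maxima :

       0       a       aa      aab     aabb
s1     1.0     0.4     .16     .016    .0016
s2     0.2     .21     .084    .0168   .00336
s3     0.02    .03     .0315   .0294   .00588

Candidates at each node, written as predecessor, output : value (output 0 denotes a null arc) :

s1
– a : s1, a : 0.4
– aa : s1, a : .16
– aab : s1, b : .016
– aabb : s1, b : .0016

s2
– 0 : s1, 0 : 0.2
– a : s1, 0 : .08; s1, a : .21; s2, a : .04
– aa : s1, 0 : .032; s1, a : .084; s2, a : .042
– aab : s1, 0 : .0032; s1, b : .0144; s2, b : .0168
– aabb : s1, 0 : .00032; s1, b : .00144; s2, b : .00336

s3
– 0 : s2, 0 : 0.02
– a : s2, 0 : .021; s2, a : .03
– aa : s2, 0 : .0084; s2, a : .0315
– aab : s2, 0 : .00168; s2, b : .0294
– aabb : s2, 0 : .000336; s2, b : .00588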
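Replacing the sum by a max and remembering the winning predecessor gives the most likely path. The sketch below uses the same three-state model (arc list with `None` for null arcs) and assumes paths start in s1 and end in s3; the names and the choice to store the whole path in each trellis cell are simplifications made for this example.

```python
# (source, target, transition probability, output probabilities or None for a null arc)
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]  # null arcs only go from earlier to later states

def viterbi(obs):
    """Return (probability, state sequence) of the best path ending in s3."""
    columns = []  # columns[t][state] = (best probability, best state sequence)
    for t in range(len(obs) + 1):
        col = {}
        for s in STATES:
            candidates = [(1.0, ["s1"])] if (t == 0 and s == "s1") else []
            for src, dst, p, emit in ARCS:
                if dst != s:
                    continue
                if emit is None:
                    prob, path = col[src]  # null arc within the same column
                    candidates.append((prob * p, path + [s]))
                elif t > 0:
                    prob, path = columns[t - 1][src]
                    candidates.append((prob * p * emit[obs[t - 1]], path + [s]))
            col[s] = max(candidates, key=lambda c: c[0]) if candidates else (0.0, [])
        columns.append(col)
    return columns[-1]["s3"]

best_prob, best_path = viterbi("aabb")
print(best_prob, best_path)
```

A production implementation would normally keep back-pointers and log probabilities instead of copying paths, but the copy keeps the sketch short.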

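Training HMM parameters

[Figure: model to be trained. Initial transition probabilities : 1/3, 1/3, 1/3 and 1/2, 1/2. Initial output probabilities : p(a) = p(b) = 1/2]

H = abaa

Probabilities of the seven paths that produce H :
.000385 .000578 .000868 .001302 .001157 .002604 .001736

p(H) = .008632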

Training HMM parameters

Transitions : t1, t2, t3, t4, t5

$c_i$ = a posteriori probability of path i = $p_i / p(H)$

c1     c2     c3     c4     c5     c6     c7
.045   .067   .134   .100   .201   .150   .301

$$c(t_1) = 3c_1 + 2c_2 + 2c_3 + c_4 + c_5 = 0.838$$
$$c(t_2) = c_3 + c_5 + c_7 = 0.637$$
$$c(t_3) = c_1 + c_2 + c_4 + c_6 = 0.363$$

New :
$$p(t_1) = 0.46 \qquad p(t_2) = 0.34 \qquad p(t_3) = 0.20$$
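The count arithmetic can be reproduced from the posterior path probabilities. In this sketch the table `usage` records how many times each path uses each transition, read off the count formulas for t1, t2, and t3; normalizing the three counts together assumes that these transitions leave the same state, which matches the new values summing to 1. Because the c values are rounded to three digits, the results agree with the slide only to about two digits.

```python
# Posterior probabilities of the seven paths, as listed on the slide.
c = {1: 0.045, 2: 0.067, 3: 0.134, 4: 0.100, 5: 0.201, 6: 0.150, 7: 0.301}

# usage[t][i] = number of times path i takes transition t.
usage = {
    "t1": {1: 3, 2: 2, 3: 2, 4: 1, 5: 1},
    "t2": {3: 1, 5: 1, 7: 1},
    "t3": {1: 1, 2: 1, 4: 1, 6: 1},
}

# Expected count of each transition: sum over paths of (posterior x times used).
counts = {t: sum(n * c[i] for i, n in paths.items()) for t, paths in usage.items()}

# Re-estimated transition probabilities: normalize the counts.
total = sum(counts.values())
new_p = {t: counts[t] / total for t in counts}
print(counts, new_p)
```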

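Training HMM parameters

Output-producing transitions : t1, t2, t4, t5

Output counts for t1 :
$$c(t_1, a) = 2c_1 + c_2 + c_3 + c_4 + c_5 = 0.592$$
$$c(t_1, b) = c_1 + c_2 + c_3 = 0.246$$

New :
$$p(t_1, a) = 0.71 \qquad p(t_1, b) = 0.29$$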


[Figure: the model after one re-estimation pass, annotated with the updated values .71/.29, .68/.32, .64/.36, .60/.40, .34, .46, .20, .60/.40]

Path probabilities under the new parameters :

p1        p2        p3        p4        p5        p6        p7
0.00108   0.00129   0.00404   0.00212   0.00537   0.00253   0.00791

p(H) = 0.02438 > 0.008632

Keep on repeating : 600 iterations : p(H) = .037037037
Another initial parameter set : p(H) = 0.0625


– Converges to a local maximum
– There are at least 7 local maxima
– Final solution depends on starting point
– Speed of convergence depends on starting point

Training HMM parameters : Forward Backward algorithm

Improves on the path-enumeration algorithm by using the trellis

Reduces the computation from exponential to linear in the length of the output sequence

Forward Backward Algorithm

[Figure: trellis with a transition $t_i$ from state $s_a$ to state $s_b$ at output position j]

$p_j(t_i, H)$ = Probability that $h_j$ is produced by $t_i$ and the complete output is H

$$p_j(t_i, H) = \alpha_{j-1}(s_a) \cdot P(t_i) \cdot P(h_j \mid t_i) \cdot \beta_j(s_b)$$

$\alpha_{j-1}(s_a)$ = Probability of being in state $s_a$ and producing the output $h_1, \ldots, h_{j-1}$

$\beta_j(s_b)$ = Probability of being in state $s_b$ and producing the output $h_{j+1}, \ldots, h_m$

Transition count

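$$C(t_i \mid H) = \sum_j p_j(t_i, H) \,/\, p(H)$$

Backward recursion :

$$\beta_t(s) = \sum_{s'} P(s' \mid s)\, P(h_{t+1} \mid s \to s')\, \beta_{t+1}(s') + \sum_{s'} P(s' \mid s)\, \beta_t(s')$$

The first sum runs over output-producing arcs out of s, the second over null arcs out of s.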

Training HMM parameters
– Guess initial values for all parameters
– Compute forward and backward pass probabilities
– Compute counts
– Re-estimate probabilities

Names for this procedure : BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, E-M
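The four steps can be sketched for a small discrete HMM. Note the simplifying assumption: this example uses the common state-emission formulation with no null transitions, not the arc-emission model with null arcs used in the slides, and the two-state model and its starting values are invented for illustration. Outputs are coded as a = 0, b = 1.

```python
def forward_backward(obs, init, trans, emit):
    """Forward (alpha) and backward (beta) tables plus p(H) for a state-emission HMM."""
    n, T = len(init), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for s in range(n):
        alpha[0][s] = init[s] * emit[s][obs[0]]
    for t in range(1, T):
        for s in range(n):
            alpha[t][s] = emit[s][obs[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
    for t in range(T - 2, -1, -1):
        for s in range(n):
            beta[t][s] = sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in range(n))
    return alpha, beta, sum(alpha[T - 1])

def reestimate(obs, init, trans, emit):
    """One re-estimation pass: compute expected counts, then normalize them."""
    n, T, K = len(init), len(obs), len(emit[0])
    alpha, beta, p_h = forward_backward(obs, init, trans, emit)
    # gamma[t][s]: posterior probability of being in state s at time t
    gamma = [[alpha[t][s] * beta[t][s] / p_h for s in range(n)] for t in range(T)]
    new_trans = []
    for r in range(n):
        counts = [
            sum(alpha[t][r] * trans[r][s] * emit[s][obs[t + 1]] * beta[t + 1][s]
                for t in range(T - 1)) / p_h
            for s in range(n)
        ]
        new_trans.append([x / sum(counts) for x in counts])
    new_emit = []
    for s in range(n):
        total = sum(gamma[t][s] for t in range(T))
        new_emit.append([sum(gamma[t][s] for t in range(T) if obs[t] == k) / total
                         for k in range(K)])
    return gamma[0][:], new_trans, new_emit, p_h

obs = [0, 1, 0, 0]  # H = abaa
init, trans, emit = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.8, 0.2]]
likelihoods = []
for _ in range(10):
    init, trans, emit, p_h = reestimate(obs, init, trans, emit)
    likelihoods.append(p_h)
print(likelihoods)
```

Each value in `likelihoods` is p(H) under the parameters before that pass, so the list shows the likelihood rising from one iteration to the next, as in the worked example.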
