Hidden Markov Model and some applications in handwriting recognition


Page 1: Hidden Markov Model and some applications in handwriting recognition

Hidden Markov Model and some applications in handwriting recognition

Page 2: Hidden Markov Model and some applications in handwriting recognition

Sequential Data

• Sequential data often arise through the measurement of time series:
- Rainfall measurements in Beer-Sheva.
- Daily values of a currency exchange rate.

Page 3: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model

• We have a stochastic process in time: the system has N states, S1, S2, …, SN, where the state of the system at time step t is qt.

• For simplicity of calculation, we assume the state of the system at time t+1 depends only on the state of the system at time t.

Page 4: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model

Formal definition of the Markov property:

P[qt = Sj | qt-1 = Si, qt-2 = Sk, …] = P[qt = Sj | qt-1 = Si], 1 ≤ i,j ≤ N.

That is, the state of a Markov chain at the next time step depends only on the state at the current time step. This is called the Markov property, or the memoryless property.
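As an illustration of the Markov property, here is a minimal MATLAB sketch (not from the original slides; the transition matrix A and initial distribution p0 are illustrative) that simulates such a chain:

% Simulate a first-order Markov chain: the next state is sampled
% using only the current state's row of the transition matrix.
A  = [0.4 0.6; 0.2 0.8];    % illustrative transition matrix, rows sum to 1
p0 = [0.5 0.5];             % illustrative initial distribution
T  = 20;
q  = zeros(1, T);
q(1) = find(rand < cumsum(p0), 1);                 % sample q1 from p0
for t = 2:T
    q(t) = find(rand < cumsum(A(q(t-1), :)), 1);   % depends only on q(t-1)
end
disp(q)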

Page 5: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model

Formal definition (cont.): the transitions of the Markov chain are independent of time, so we can write:

P[qt = Sj | qt-1 = Si] = aij, 1 ≤ i,j ≤ N,

with the following conditions:

1. aij ≥ 0.

2. Σj=1..N aij = 1, for every i.

Page 6: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model Example (Weather):

• Rain today: 40% rain tomorrow, 60% no rain tomorrow.

• No rain today: 20% rain tomorrow, 80% no rain tomorrow.

Stochastic finite state machine:

[Diagram: two states, Rain and No rain; self-loops Rain→Rain 0.4 and No rain→No rain 0.8; transitions Rain→No rain 0.6 and No rain→Rain 0.2.]

Page 7: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model Example (Weather, continued):

• Rain today: 40% rain tomorrow, 60% no rain tomorrow.

• No rain today: 20% rain tomorrow, 80% no rain tomorrow.

The transition matrix:

A = {aij} = [0.4 0.6; 0.2 0.8]

Page 8: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model Example (Weather, continued):

Question:

Given that day 1 is sunny, what is the probability that the weather for the next three days will be "sun-rain-rain-sun"?

Answer:

We write the sequence of states as O = {S2, S1, S1, S2} and compute:

P(O | Model) = P{S2, S1, S1, S2 | Model} = P[S2] · P[S1|S2] · P[S1|S1] · P[S2|S1]
= π2 · a21 · a11 · a12 = 1 · 0.2 · 0.4 · 0.6 = 0.048,

where πi = P[q1 = Si], 1 ≤ i ≤ N; that is, π holds the initial state probabilities.
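The same computation as a small MATLAB sketch (illustrative, not from the original slides):

A  = [0.4 0.6; 0.2 0.8];   % S1 = rain, S2 = no rain
p0 = [0 1];                % day 1 is sunny: P[q1 = S2] = 1
O  = [2 1 1 2];            % the sequence sun-rain-rain-sun
p  = p0(O(1));
for t = 2:length(O)
    p = p * A(O(t-1), O(t));   % multiply the transition probabilities
end
p                              % 1 * 0.2 * 0.4 * 0.6 = 0.048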

Page 9: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model

Example (Random Walk on Undirected Graphs):

We have an undirected graph G = (V,E), and a particle is placed at vertex vi with probability πi. At the next time point, it moves to one of its neighbors with probability 1/d(i), where d(i) is the degree of vi.

[Diagram: an undirected graph on seven vertices, v1, …, v7.]


Page 11: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model

Example (Random Walk on Undirected Graphs):

It can be proven that for connected, non-bipartite graphs, pi, the probability of being at vertex vi, converges to d(vi)/(2|E|). That is, the initial distribution does not matter.

[Diagram: the same seven-vertex graph, annotated with the limiting probabilities, e.g., p1 = 1/18, p3 = 3/18, p4 = 3/18, p7 = 2/18.]
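This convergence is easy to check numerically. A minimal MATLAB sketch (the adjacency matrix below is an illustrative graph, not the one in the figure):

% Random walk on an undirected graph: the distribution converges to
% d(v_i) / (2|E|) when the graph is connected and not bipartite.
Adj = [0 1 0 0; 1 0 1 1; 0 1 0 1; 0 1 1 0];   % illustrative 4-vertex graph
d   = sum(Adj, 2);                            % vertex degrees
P   = Adj ./ d;                               % P(i,j) = 1/d(i) for neighbors
p   = [1 0 0 0];                              % start at v1, say
for t = 1:1000
    p = p * P;                                % one step of the walk
end
[p; (d / sum(d))']                            % empirical vs. d(v_i)/(2|E|)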

Page 12: Hidden Markov Model and some applications in handwriting recognition

First Order Markov Model

Example (Random Walk, some applications):

- In economics, the "random walk hypothesis" is used to model share prices and other factors.

- In physics, random walks are used as simplified models of the physical random movement of molecules in liquids and gases.

- In computer science, random walks are used to estimate the size of the Web (Bar-Yossef et al., 2006).

- In image segmentation, random walks are used to determine the labels (i.e., "object" or "background") to associate with each pixel. This algorithm is typically referred to as the random walker segmentation algorithm.

[Figure: a random walk in two dimensions.]

Page 13: Hidden Markov Model and some applications in handwriting recognition

Introducing Hidden Variables

For each observation On we introduce a hidden variable Sn; the hidden variables form a Markov chain.

[Diagram: a chain of hidden states S1 → S2 → … → SN, where each hidden state Si emits an observed variable Oi.]

Page 14: Hidden Markov Model and some applications in handwriting recognition

Hidden Markov Model Example:

Consider Bob, who lives in a foreign country and posts his daily activity in his blog. The activity is one of the following:

- Walking in the park (with probability 0.1 if it rains, and probability 0.6 otherwise).
- Shopping (with probability 0.4 if it rains, and probability 0.3 otherwise).
- Cleaning his apartment (with probability 0.5 if it rains, and probability 0.1 otherwise).

The choice of what Bob does is determined exclusively by the weather on a given day.

Bob's activities are the observations, while the weather is hidden from us. The entire system is a hidden Markov model (HMM).

Page 15: Hidden Markov Model and some applications in handwriting recognition

Hidden Markov Model Example (cont.):

[Diagram: a Start node enters Rainy with probability 0.3 and Sunny with probability 0.7; transitions Rainy→Rainy 0.4, Rainy→Sunny 0.6, Sunny→Rainy 0.2, Sunny→Sunny 0.8; emissions from Rainy: walk 0.1, shop 0.4, clean 0.5; from Sunny: walk 0.6, shop 0.3, clean 0.1.]

Page 16: Hidden Markov Model and some applications in handwriting recognition

Elements of an HMM:

- N states, S = {S1, S2, …, SN}; we denote the state at time t as qt.

- M distinct observation symbols per state, V = {v1, v2, …, vM}.

- The state transition probability distribution A = {aij}, where:

aij = P[qt = Sj | qt-1 = Si], 1 ≤ i,j ≤ N.

- The observation symbol probability distribution in state j, B = {bj(k)}, where:

bj(k) = P[vk at t | qt = Sj], 1 ≤ j ≤ N, 1 ≤ k ≤ M.

- The initial distribution π = {πi}, where πi = P[q1 = Si], 1 ≤ i ≤ N.

Page 17: Hidden Markov Model and some applications in handwriting recognition

Hidden Markov Model Example (cont.):

N = 2 states (S1 = Rainy, S2 = Sunny) and M = 3 symbols (v1 = walk, v2 = shop, v3 = clean).

[Diagram: the Start/Rainy/Sunny model with parameters:]

π1 = 0.3, π2 = 0.7
a11 = 0.4, a12 = 0.6, a21 = 0.2, a22 = 0.8
b1(1) = 0.1, b1(2) = 0.4, b1(3) = 0.5
b2(1) = 0.6, b2(2) = 0.3, b2(3) = 0.1

Page 18: Hidden Markov Model and some applications in handwriting recognition

The Three Basic Problems for HMMs

Problem 1: The Evaluation Problem

Given the observation sequence O = O1O2…OT and a model λ = (A, B, π), how do we compute the probability that O was generated by that model?

Problem 2: The Decoding Problem

Given the observation sequence O, determine the most likely sequence of hidden states that led to the observations.

Problem 3: The Learning Problem

Given the coarse structure of the model (the number of states and symbols) but not the probabilities aij and bj(k), determine these parameters.

Page 19: Hidden Markov Model and some applications in handwriting recognition

Problem 1: The Evaluation Problem

Compute the probability that the model produces the observation sequence O = O1O2…OT.

Naïve solution:

We denote by Q a fixed sequence of hidden states, Q = q1q2…qT, and sum over all possible state sequences:

P(O | λ) = Σ over all Q of P(O | Q, λ) · P(Q | λ)

Problem: too expensive; the complexity is O(T · N^T).

Outline of the solution:

We use a recursive algorithm that computes the value of the forward variable

αt(i) = P(O1O2…Ot, qt = Si | λ),

based on the preceding time step, i.e., on {αt-1(1), αt-1(2), …, αt-1(N)}.
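The forward recursion takes only O(N^2 · T) operations. A plain MATLAB sketch of it (illustrative; save as forward_prob.m; A, B and p0 are the unpadded N-by-N, N-by-M and 1-by-N parameters in the notation above):

function p = forward_prob(O, A, B, p0)
% P(O | lambda) via the forward variable alpha_t(i).
N = size(B, 1);
T = length(O);
alpha = zeros(T, N);
alpha(1, :) = p0 .* B(:, O(1))';                       % initialization
for t = 2:T
    alpha(t, :) = (alpha(t-1, :) * A) .* B(:, O(t))';  % induction
end
p = sum(alpha(T, :));                                  % termination
end

In practice, alpha is rescaled at each step to avoid numerical underflow on long sequences.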

Page 20: Hidden Markov Model and some applications in handwriting recognition

Problem 2: The Decoding Problem

Given a sequence of observations O, the decoding problem is to find the most probable sequence of hidden states.

We want to find the "best" state sequence q1, q2, …, qT such that:

(q1, q2, …, qT) = argmax over q1,q2,…,qT of P[q1, q2, …, qT, O1O2…OT | λ]

Viterbi algorithm:

A dynamic programming algorithm that computes the most probable sequence of states up to time step t from the most probable sequences up to time step t-1.
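A from-scratch MATLAB sketch of the recursion (illustrative; save as viterbi_path.m; in practice one works with log probabilities to avoid underflow):

function path = viterbi_path(O, A, B, p0)
% Most probable hidden-state sequence for observations O (1xT indices).
N = size(B, 1);
T = length(O);
delta = zeros(T, N);     % best score of a path ending in state j at time t
psi   = zeros(T, N);     % back-pointers
delta(1, :) = p0 .* B(:, O(1))';
for t = 2:T
    for j = 1:N
        [best, psi(t, j)] = max(delta(t-1, :) .* A(:, j)');
        delta(t, j) = best * B(j, O(t));
    end
end
path = zeros(1, T);
[~, path(T)] = max(delta(T, :));
for t = T-1:-1:1
    path(t) = psi(t+1, path(t+1));   % trace the back-pointers
end
end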

Page 21: Hidden Markov Model and some applications in handwriting recognition

Viterbi Algorithm Example (Matlab):

Consider Bob and the weather example from before. MATLAB's HMM functions assume the model begins in an initial state, so the matrices are padded with an artificial 'start' state: the first row of the transition matrix holds π, and the start state emits nothing. The state transition matrix (TRANS) is:

TRANS = [0   0.3 0.7;
         0   0.4 0.6;
         0   0.2 0.8]

whereas the observation (EMIS) matrix is:

EMIS = [0   0   0;
        0.1 0.4 0.5;
        0.6 0.3 0.1]

The following command in Matlab:

[observations, states] = hmmgenerate(10, TRANS, EMIS, ...
    'Statenames', {'start','rain','sun'}, ...
    'Symbols', {'walk','shop','clean'})

generates a random sequence of length 10 of states and observation symbols.

Page 22: Hidden Markov Model and some applications in handwriting recognition

Viterbi Algorithm Example (Matlab, continued):

Result (T = 1 … 10):

states:       'sun'   'rain'  'sun'  'sun'   'sun'  'sun'  'sun'   'sun'  'rain' 'sun'
observations: 'clean' 'clean' 'walk' 'clean' 'walk' 'walk' 'clean' 'walk' 'shop' 'walk'

Page 23: Hidden Markov Model and some applications in handwriting recognition

Viterbi Algorithm Example (Matlab, continued):

The Matlab function hmmviterbi uses the Viterbi algorithm to compute the most likely sequence of states the model would go through to generate a given sequence of observations:

[observations, states] = hmmgenerate(1000, TRANS, EMIS);
likelystates = hmmviterbi(observations, TRANS, EMIS);

To test the accuracy of hmmviterbi, compute the percentage of the actual sequence states that agrees with the sequence likelystates:

sum(states == likelystates)/1000
ans = 0.8030

In this case, the most likely sequence of states agrees with the random sequence 80% of the time.

Page 24: Hidden Markov Model and some applications in handwriting recognition

Problem 3: The Learning Problem

Goal: to determine the model parameters aij and bj(k) from an ensemble of training samples (observations).

Outline of the Baum-Welch algorithm:

- Start with rough estimates of aij and bj(k).
- Calculate improved estimates.
- Repeat until the change in the estimated parameter values is sufficiently small.

Problem: the algorithm converges only to a local maximum.
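Matlab implements Baum-Welch as hmmtrain. A small sketch in the spirit of the earlier example (illustrative; hmmtrain assumes the model starts in state 1, so the unpadded 2-state matrices are used here):

% Generate data from the "true" model, then re-estimate its
% parameters from rough initial guesses with Baum-Welch.
TRANS = [0.4 0.6; 0.2 0.8];
EMIS  = [0.1 0.4 0.5; 0.6 0.3 0.1];
seq = hmmgenerate(1000, TRANS, EMIS);     % training observations
TRANS_GUESS = [0.5 0.5; 0.5 0.5];         % rough initial estimates
EMIS_GUESS  = [0.3 0.3 0.4; 0.4 0.3 0.3];
[TRANS_EST, EMIS_EST] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)

Different initial guesses can converge to different local maxima.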

Page 25: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Two approaches:

1. Path-Discriminant: a single HMM models all possible words (suited to large lexicons).
• Each letter is associated with a sub-HMM that is connected to all the others.
• The Viterbi algorithm gives the most likely word.

2. Model-Discriminant: separate HMMs are used to model each word (small lexicons).
• The evaluation problem gives the probability of the observations under each word model.
• We choose the model with the highest probability.

[Diagrams: (1) letter sub-HMMs a, …, z (one per letter) connected in a clique topology; (2) HMMs for word 1 through word v, each feeding a probability computation whose outputs go to a "select maximum" stage.]

Page 26: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Preliminaries (feature extraction):

Question: so far, the symbols we have seen could be represented as scalars (sun = 1, rain = 2); what are the symbols for a 2D image?

Answer: we extract from the image a vector of features, where each feature is a number representing a measurable property of the image.

[Figure: an image mapped to a feature vector, e.g., (4, 6, 0, 8, 2, 8, 4, 3, 1).]

Page 27: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Preliminaries (feature extraction, cont.):

Example of a feature: the number of crossings between the skeleton of the letter and a line passing through the center of mass of the letter. (For the letter in the figure, the value is 3.)

[Figure: a letter image, after binarization, then skeletonization and center-of-mass computation.]
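A minimal MATLAB sketch of this crossing feature (illustrative; 'skel' below is a toy binary skeleton image, not the authors' data):

skel = false(9, 9);                          % toy skeleton image: an "H"
skel(2:8, 3) = true; skel(2:8, 7) = true; skel(5, 3:7) = true;
[r, ~] = find(skel);
row = round(mean(r));                        % row of the center of mass
scanline = double(skel(row, :));
crossings = sum(diff([0 scanline]) == 1)     % number of 0->1 transitions
% Here the scanline runs along the crossbar, so crossings = 1.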

Page 28: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Preliminaries (Vector Quantization, K-means):

Problem: working on a small lexicon of a few hundred words could generate thousands of symbols; is there a way to use (much) fewer symbols?

Answer: two popular algorithms, vector quantization and k-means, are used to map a set of vectors into a finite, smaller set of representative vectors without losing too much information. A sketch follows this slide.

- Usually there is a distance function between the original vectors and their representatives; we wish to minimize its value for each vector.

The representatives are called centroids, and the set of representatives is called a codebook.
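A minimal MATLAB sketch of such a quantizer, using kmeans from the Statistics Toolbox (the data below is random and purely illustrative):

X = rand(1000, 10);          % 1000 illustrative 10-dimensional feature vectors
k = 32;                      % desired codebook size
[idx, C] = kmeans(X, k);     % idx: symbol per vector, C: the centroids
% Each feature vector X(n,:) is now represented by the discrete symbol
% idx(n), i.e., by its nearest centroid C(idx(n),:).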

Page 29: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Preliminaries (Vector Quantization, K-means, cont.):

An example of a vector quantizer in 2D, with 34 centroids. Each point in a cell is replaced by the corresponding Voronoi site. The distance function is the Euclidean distance.

[Figure: the corresponding Voronoi diagram.]

Page 30: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Segmentation, pros and cons:

Segmentation, in the context of this lecture, is the splitting of the word image into segments that correspond to characters.

[Figure: example of segmentation, cutting at cusps at the bottom of the word.]

Pro: segmentation-based methods that use the path-discriminant approach have great flexibility with respect to the size of the lexicon.

Con: segmentation is hard and ambiguous. "To recognize a letter, one must know where it starts and where it ends; to isolate a letter, one must recognize it first" (K. M. Sayre, 1973).

Page 31: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Segmentation, pros and cons (cont.):

Segmentation-free methods:

- In a segmentation-free method, one finds the best possible interpretation for an observation sequence derived from the word image, without performing a meaningful segmentation first.

- Segmentation-free methods are usually used with the model-discriminant approach.

- HMMs that realize segmentation-free methods do not attach any meaning, in terms of character fractions, to specific transitions.

Page 32: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Segmentation-Free Recognition System):

- The model of Bunke et al. (1995) uses a fixed lexicon.

- The observations are based on the edges of the skeleton graph of the word image.

Definition: the pixels of the skeleton of a word are considered part of an edge if they have exactly two neighbors; otherwise they are considered nodes.

Four reference lines are also extracted: the lower line, the lower baseline, the upper baseline and the upper line.

Page 33: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Segmentation-Free Recognition System, cont.):

[Figure: the edges in the word "lazy"; pixels that belong to the same edge are marked with the same letter. The four reference lines are shown: lower line, lower baseline, upper baseline, upper line.]

Page 34: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Segmentation-Free Recognition System, cont.):

Feature extraction:

- The authors extract 10 features for each edge.

- The first four features, f1, …, f4, are based on the relation between the edge and the reference lines; e.g., f1 is the percentage of pixels lying between the upper line and the upper baseline.

- The other features are related to the edges themselves; for example, f7 is defined as the percentage of pixels lying above the top endpoint.

[Figure: an edge "E" with its top endpoint and bottom endpoint marked; here f7 = 21/23.]

Page 35: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Segmentation-Free Recognition System, cont.):

Model:

- The model-discriminant approach is used (lexicon size = 150 words).

- The vector quantization algorithm produced a codebook of 28 symbols.

- The number of states for each letter of the alphabet is set to the minimum number of edges that can be expected for that letter.

- The initial values of (A, B, π) are set to some fixed probabilities and were improved using the Baum-Welch algorithm. For each word in the model there were approximately 60 samples for training.

- The recognition rate is reported to be 98%.

Page 36: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Segmentation-Free Recognition System, cont.):

[Figure: an example of a word model.]

Page 37: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Segmentation-Based Algorithm (Kundu et al., 1988):

The authors assume each letter can be segmented (a problematic assumption). The path-discriminant model is used, where each state corresponds to a letter.

- To compute the initial and transition probabilities, the authors used a statistical study of the English language.

- To compute the symbol probabilities, a training set of 2500 words was used.

- The vector quantization algorithm produced a codebook of 90 symbols.

Page 38: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Segmentation-Based Algorithm (Kundu et al., 1988, cont.):

Feature extraction: from each letter the authors extract 15 features.

Examples of features (a small sketch of the second one follows):

fzh = horizontal zero crossings. A horizontal line passing through the center of gravity is computed; fzh is assigned the number of crossings between the letter and the line.

fx = number of X joints. In the thinned image, a pixel counts as an X joint if it is the black central pixel of a 3x3 window in which 4 (or more) of the neighboring pixels are black too.
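A minimal MATLAB sketch of counting X joints (illustrative toy image, not the authors' code):

bw = logical([0 0 0 0 0
              0 1 0 1 0
              0 0 1 0 0
              0 1 0 1 0
              0 0 0 0 0]);                      % a small thinned "X"
kernel = [1 1 1; 1 0 1; 1 1 1];
nbrs = conv2(double(bw), kernel, 'same');        % black 8-neighbors per pixel
fx = sum(bw(:) & nbrs(:) >= 4)                   % = 1: the central crossing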

Page 39: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (cont.):

[Figure: model overview.]

Page 40: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Raid Saabni et al., 2010):

Keyword searching in Arabic handwritten documents: the authors use the model-discriminant method.

Arabic is written in a cursive style from right to left. The authors denote a sequence of connected letters within a word as a word-part.

Example: the word in the figure contains 7 letters, but only 3 word-parts.

The authors use word-parts as the basic building blocks of their recognition system.

Page 41: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Raid Saabni, 2010):

In Arabic, a word-part can be divided into two main components: the main component denotes the continuous body of the word-part, and the secondary component refers to additional stroke(s).

[Figure: examples of word-parts with different numbers of additional strokes.]

Within the scope of this lecture, we show only the recognition of the main component.

Page 42: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Raid Saabni, 2010):

Feature extraction:

The pixels on a component's contour form a 2D polygon. The authors simplify the contour polygon to a smaller number of representative vertices.

Later on, the simplified polygon is refined by adding k vertices from the original polygon, distributed nearly uniformly between each two consecutive vertices. The point sequence P = [p1, p2, …, pn] includes all the vertices of the refined polygon.

The authors extract two features:

1. The angle between two consecutive vectors, (pi-1, pi) and (pi, pi+1).
2. The angle between the vectors (pi, pi+1) and (pj, pj+1), where pj and pj+1 are consecutive vertices of the simplified polygon, and pi is a vertex inserted between them by the refining process.

[Figure: the vertices pi-1, pi, pi+1 on the refined polygon, and pj, pj+1 on the simplified polygon.]
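A minimal MATLAB sketch of the first feature, the signed angle at pi between the vectors (pi-1, pi) and (pi, pi+1) (the vertices below are illustrative):

P = [0 0; 2 1; 3 3; 5 4];                 % toy vertex sequence p1..p4
for i = 2:size(P, 1) - 1
    u = P(i, :)   - P(i-1, :);            % vector (p_{i-1}, p_i)
    v = P(i+1, :) - P(i, :);              % vector (p_i, p_{i+1})
    ang = atan2(u(1)*v(2) - u(2)*v(1), dot(u, v));   % signed angle, radians
    fprintf('angle at p_%d = %.3f rad\n', i, ang);
end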

Page 43: Hidden Markov Model and some applications in handwriting recognition

HMM Word Recognition

Example (Raid Saabni, 2010):

Matching:

The authors manually extracted different occurrences of word-parts from the searched document; these are used to train the HMMs.

The search for a keyword is performed by searching for its word-parts, which are later combined into words (the keywords). For each processed word-part, an observation sequence is generated and fed to the trained HMM system to determine its proximity to each of the keyword's word-parts.

Page 44: Hidden Markov Model and some applications in handwriting recognition

References

- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition" (1989).

- A. Kundu, Y. He and P. Bahl, "Recognition of Handwritten Word: First and Second Order Hidden Markov Model Based Approach" (1988).

- H. Bunke, M. Roth and E. G. Schukat-Talamazzini, "Off-Line Cursive Handwriting Recognition Using Hidden Markov Models" (1995).

- T. Steinherz, E. Rivlin and N. Intrator, "Offline cursive script word recognition - a survey" (1999).

- A presentation on "Sequential Data and Hidden Markov", from the course Introduction to Pattern Recognition by Sargur Srihari.

- R. Saabni and J. El-Sana, "Keyword Searching for Arabic Handwritten Documents" (2010).