Hidden Markov Models
David Meir Blei
November 1, 1999
What is an HMM?
• Graphical model
• Circles indicate states
• Arrows indicate probabilistic dependencies between states
What is an HMM?
• Green circles are hidden states
• Each depends only on the previous state
• "The past is independent of the future given the present."
What is an HMM?
• Purple nodes are observed states
• Each depends only on its corresponding hidden state
HMM Formalism
• $\{S, K, \Pi, A, B\}$
• $S : \{s_1 \ldots s_N\}$ are the values for the hidden states
• $K : \{k_1 \ldots k_M\}$ are the values for the observations
[Trellis diagram: a chain of hidden states $S$, each emitting an observation $K$]
HMM Formalism
• $\{S, K, \Pi, A, B\}$
• $\Pi = \{\pi_i\}$ are the initial state probabilities
• $A = \{a_{ij}\}$ are the state transition probabilities
• $B = \{b_{ik}\}$ are the observation state probabilities
[Trellis diagram: transitions $A$ between hidden states, emissions $B$ to observations]
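As a concrete instance of the formalism (not part of the original slides), a two-state, two-symbol model can be written down directly; all probabilities below are hypothetical numbers chosen for illustration.

```python
# A toy HMM mu = (A, B, Pi) with N = 2 hidden states and M = 2 observation
# symbols. All numbers are hypothetical, chosen only for illustration.
pi = [0.6, 0.4]             # pi[i]   = P(x_1 = i), initial state probabilities
A = [[0.7, 0.3],            # A[i][j] = P(x_{t+1} = j | x_t = i), transitions
     [0.4, 0.6]]
B = [[0.9, 0.1],            # B[i][k] = P(o_t = k | x_t = i), emissions
     [0.2, 0.8]]

# Pi and each row of A and B is a probability distribution, so it sums to 1.
for row in [pi] + A + B:
    assert abs(sum(row) - 1.0) < 1e-12
```

The same three tables ($\Pi$, $A$, $B$) are all later algorithms need as input.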
Inference in an HMM
• Compute the probability of a given observation sequence
• Given an observation sequence, compute the most likely hidden state sequence
• Given an observation sequence and set of possible models, which model most closely fits the data?
Compute $P(O \mid \mu)$, where $O = (o_1 \ldots o_T)$ and $\mu = (A, B, \Pi)$.

Given an observation sequence and a model, compute the probability of the observation sequence.

[Trellis diagram: observations $o_1 \ldots o_{t-1}, o_t, o_{t+1} \ldots o_T$]
Decoding

$P(O \mid X, \mu) = b_{x_1 o_1}\, b_{x_2 o_2} \cdots b_{x_T o_T}$

$P(X \mid \mu) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3} \cdots a_{x_{T-1} x_T}$

$P(O, X \mid \mu) = P(O \mid X, \mu)\, P(X \mid \mu)$

$P(O \mid \mu) = \sum_X P(O \mid X, \mu)\, P(X \mid \mu)$

$P(O \mid \mu) = \sum_{\{x_1 \ldots x_T\}} \pi_{x_1}\, b_{x_1 o_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}}\, b_{x_{t+1} o_{t+1}}$

[Trellis diagram: hidden states $x_1 \ldots x_T$ emitting observations $o_1 \ldots o_T$]
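The sum over state sequences can be evaluated directly by enumerating every path. A minimal sketch (the function name and toy model numbers are hypothetical, introduced here rather than taken from the slides):

```python
from itertools import product

def brute_force_likelihood(obs, pi, A, B):
    """P(O | mu) as the sum over all state paths X of P(O | X, mu) P(X | mu).

    This is O(N^T), feasible only for tiny examples; the forward procedure
    computes the same quantity in O(N^2 T).
    """
    N, T = len(pi), len(obs)
    total = 0.0
    for x in product(range(N), repeat=T):
        p = pi[x[0]] * B[x[0]][obs[0]]                  # pi_{x1} b_{x1 o1}
        for t in range(T - 1):
            # a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}
            p *= A[x[t]][x[t + 1]] * B[x[t + 1]][obs[t + 1]]
        total += p
    return total

# Toy two-state model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood = brute_force_likelihood([0, 1, 0], pi, A, B)
```

This naive sum is the baseline the forward procedure below improves on.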
Forward Procedure

• Special structure gives us an efficient solution using dynamic programming.
• Intuition: the probability of the first $t$ observations is the same for all possible state sequences of length $t+1$.
• Define: $\alpha_i(t) = P(o_1 \ldots o_t, x_t = i \mid \mu)$

$\alpha_j(t+1) = P(o_1 \ldots o_{t+1}, x_{t+1} = j)$
$\quad = P(o_1 \ldots o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j)$
$\quad = P(o_1 \ldots o_t \mid x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j)$
$\quad = P(o_1 \ldots o_t, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)$
Forward Procedure

$\alpha_j(t+1) = \sum_{i = 1 \ldots N} P(o_1 \ldots o_t, x_t = i, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)$
$\quad = \sum_{i = 1 \ldots N} P(o_1 \ldots o_t, x_{t+1} = j \mid x_t = i)\, P(x_t = i)\, P(o_{t+1} \mid x_{t+1} = j)$
$\quad = \sum_{i = 1 \ldots N} P(o_1 \ldots o_t, x_t = i)\, P(x_{t+1} = j \mid x_t = i)\, P(o_{t+1} \mid x_{t+1} = j)$
$\quad = \sum_{i = 1 \ldots N} \alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}$

[Trellis diagram: hidden states $x_1 \ldots x_T$ over observations $o_1 \ldots o_T$]
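The recursion maps directly onto a dynamic program. A minimal self-contained sketch (function name and toy numbers are assumptions introduced here):

```python
def forward(obs, pi, A, B):
    """Forward procedure: row t (0-indexed) holds alpha_i(t+1) in slide
    numbering, i.e. P(o_1 ... o_{t+1}, x_{t+1} = i | mu). Runs in O(N^2 T)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]              # base case alpha_i(1)
    for t in range(T - 1):
        for j in range(N):
            # alpha_j(t+1) = sum_i alpha_i(t) a_ij b_{j o_{t+1}}
            alpha[t + 1][j] = sum(alpha[t][i] * A[i][j]
                                  for i in range(N)) * B[j][obs[t + 1]]
    return alpha

# Toy two-state model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
alpha = forward([0, 1, 0], pi, A, B)
likelihood = sum(alpha[-1])        # P(O | mu) = sum_i alpha_i(T)
```

For this toy input the result matches the brute-force enumeration, but in O(N^2 T) time instead of O(N^T).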
Backward Procedure

• Define: $\beta_i(t) = P(o_{t+1} \ldots o_T \mid x_t = i)$ — the probability of the rest of the observation sequence given the current state.

$\beta_i(T) = 1$

$\beta_i(t) = \sum_{j = 1 \ldots N} a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)$

[Trellis diagram: hidden states $x_1 \ldots x_T$ over observations $o_1 \ldots o_T$]
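The backward recursion is the mirror image of the forward one, filled in from the end of the sequence. A sketch under the same hypothetical toy model:

```python
def backward(obs, pi, A, B):
    """Backward procedure: row t (0-indexed) holds beta_i(t+1) in slide
    numbering, i.e. P(o_{t+2} ... o_T | x_{t+1} = i). Runs in O(N^2 T)."""
    N, T = len(pi), len(obs)
    beta = [[1.0] * N for _ in range(T)]        # base case beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        for i in range(N):
            # beta_i(t) = sum_j a_ij b_{j o_{t+1}} beta_j(t+1)
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta

# Toy two-state model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]
beta = backward(obs, pi, A, B)
# P(O | mu) recovered from the backward variables alone:
likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
```

On the toy input this recovers the same likelihood as the forward pass.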
Decoding Solution

Forward procedure: $P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T)$

Backward procedure: $P(O \mid \mu) = \sum_{i=1}^{N} \pi_i\, b_{i o_1}\, \beta_i(1)$

Combination: $P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(t)\, \beta_i(t)$, for any $t$

[Trellis diagram: observations $o_1 \ldots o_T$]
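A quick numerical check that the three expressions agree; the compact `forward`/`backward` helpers and all model numbers are hypothetical assumptions introduced here:

```python
def forward(obs, pi, A, B):
    # alpha[t][i] = P(o_1..o_{t+1}, x_{t+1} = i), 0-indexed in t.
    N, T = len(pi), len(obs)
    a = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(T - 1):
        a.append([sum(a[t][i] * A[i][j] for i in range(N)) * B[j][obs[t + 1]]
                  for j in range(N)])
    return a

def backward(obs, pi, A, B):
    # beta[t][i] = P(o_{t+2}..o_T | x_{t+1} = i), 0-indexed in t.
    N, T = len(pi), len(obs)
    b = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b[t] = [sum(A[i][j] * B[j][obs[t + 1]] * b[t + 1][j] for j in range(N))
                for i in range(N)]
    return b

# Toy two-state model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]
N, T = len(pi), len(obs)
alpha, beta = forward(obs, pi, A, B), backward(obs, pi, A, B)

p_forward = sum(alpha[-1])                                       # sum_i alpha_i(T)
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
p_combined = [sum(alpha[t][i] * beta[t][i] for i in range(N)) for t in range(T)]

# All three expressions equal P(O | mu), for every choice of t.
assert all(abs(p - p_forward) < 1e-12 for p in [p_backward] + p_combined)
```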
Best State Sequence

• Find the state sequence that best explains the observations
• Viterbi algorithm: $\arg\max_X P(X \mid O)$

Viterbi Algorithm

$\delta_j(t) = \max_{x_1 \ldots x_{t-1}} P(o_1 \ldots o_{t-1}, x_1 \ldots x_{t-1}, x_t = j, o_t)$

The probability of the state sequence which maximizes the probability of seeing the observations to time $t-1$, landing in state $j$, and seeing the observation at time $t$.

[Trellis diagram: states $x_1 \ldots x_{t-1}, j$ over observations $o_1 \ldots o_T$]
Viterbi Algorithm

$\delta_j(t+1) = \max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$

$\psi_j(t+1) = \arg\max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$

Recursive computation.

[Trellis diagram: states $x_1 \ldots x_{t-1}, x_t, x_{t+1}$ over observations $o_1 \ldots o_T$]
Viterbi Algorithm

$\hat{X}_T = \arg\max_{i} \delta_i(T)$

$\hat{X}_t = \psi_{\hat{X}_{t+1}}(t+1)$

$P(\hat{X}) = \max_{i} \delta_i(T)$

Compute the most likely state sequence by working backwards.

[Trellis diagram: states $x_1 \ldots x_T$ over observations $o_1 \ldots o_T$]
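The recursion plus backtracking above can be sketched as follows (function name and toy numbers are hypothetical):

```python
def viterbi(obs, pi, A, B):
    """Most likely hidden state path via the Viterbi recursion.
    Returns (path, probability of that best path)."""
    N, T = len(pi), len(obs)
    delta = [[0.0] * N for _ in range(T)]   # delta[t][j]: best score ending in j
    psi = [[0] * N for _ in range(T)]       # psi[t][j]: argmax backpointers
    for i in range(N):
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(T - 1):
        for j in range(N):
            # delta_j(t+1) = max_i delta_i(t) a_ij b_{j o_{t+1}}, psi = argmax
            best_i = max(range(N), key=lambda i: delta[t][i] * A[i][j])
            psi[t + 1][j] = best_i
            delta[t + 1][j] = delta[t][best_i] * A[best_i][j] * B[j][obs[t + 1]]
    # Work backwards from the best final state.
    path = [max(range(N), key=lambda i: delta[-1][i])]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, max(delta[-1])

# Toy two-state model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
path, prob = viterbi([0, 1, 0], pi, A, B)
```

Structurally this is the forward procedure with max in place of sum, plus backpointers.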
Parameter Estimation

• Given an observation sequence, find the model that is most likely to produce that sequence.
• No analytic method.
• Given a model and observation sequence, update the model parameters to better fit the observations.
Parameter Estimation

Probability of traversing an arc from state $i$ to state $j$ at time $t$:

$p_t(i, j) = \dfrac{\alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)}{\sum_{m = 1 \ldots N} \alpha_m(t)\, \beta_m(t)}$

Probability of being in state $i$ at time $t$:

$\gamma_i(t) = \sum_{j = 1 \ldots N} p_t(i, j)$
Parameter Estimation

Now we can compute the new estimates of the model parameters:

$\hat{\pi}_i = \gamma_i(1)$

$\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} p_t(i, j)}{\sum_{t=1}^{T-1} \gamma_i(t)}$

$\hat{b}_{ik} = \dfrac{\sum_{\{t \,:\, o_t = k\}} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$
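One full re-estimation step (the forward-backward / Baum-Welch update) can be sketched as follows; the `forward`/`backward` helpers and all toy numbers are assumptions introduced here, not taken from the slides.

```python
def forward(obs, pi, A, B):
    # alpha[t][i] = P(o_1..o_{t+1}, x_{t+1} = i), 0-indexed in t.
    N, T = len(pi), len(obs)
    a = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(T - 1):
        a.append([sum(a[t][i] * A[i][j] for i in range(N)) * B[j][obs[t + 1]]
                  for j in range(N)])
    return a

def backward(obs, pi, A, B):
    # beta[t][i] = P(o_{t+2}..o_T | x_{t+1} = i), 0-indexed in t.
    N, T = len(pi), len(obs)
    b = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b[t] = [sum(A[i][j] * B[j][obs[t + 1]] * b[t + 1][j] for j in range(N))
                for i in range(N)]
    return b

def reestimate(obs, pi, A, B):
    """One Baum-Welch step: arc probabilities p_t(i,j), occupancies
    gamma_i(t), then the pi-hat, a-hat, b-hat updates from the slides."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, pi, A, B)
    likelihood = sum(alpha[-1])          # = sum_m alpha_m(t) beta_m(t), any t
    p = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
           for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[sum(p[t][i]) for i in range(N)] for t in range(T - 1)]
    gamma.append([alpha[-1][i] * beta[-1][i] / likelihood for i in range(N)])
    new_pi = list(gamma[0])
    new_A = [[sum(p[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B

# Toy model and observation sequence (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]
new_pi, new_A, new_B = reestimate(obs, pi, A, B)
old_l = sum(forward(obs, pi, A, B)[-1])
new_l = sum(forward(obs, new_pi, new_A, new_B)[-1])
assert new_l >= old_l          # EM never decreases the likelihood
```

Iterating this step is the training procedure: each pass re-estimates the parameters from expected counts and provably does not decrease $P(O \mid \mu)$.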
HMM Applications

• Generating parameters for n-gram models
• Tagging speech
• Speech recognition
The Most Important Thing

We can use the special structure of this model to do a lot of neat math and solve problems that are otherwise not solvable.