Transcript of A Tutorial on Hidden Markov Models.pdf
8/17/2019 A Tutorial on Hidden Markov Models.pdf

Introduction Forward-Backward Procedure Viterbi Algorithm Baum-Welch Reestimation Extensions

A Tutorial on Hidden Markov Models by Lawrence R. Rabiner
in Readings in Speech Recognition (1990)

Marcin Marszałek
Visual Geometry Group
16 February 2009

Figure: Andrey Markov
Signals and signal models

Real-world processes produce signals, i.e., observable outputs:
- discrete (from a codebook) vs continuous
- stationary (with constant statistical properties) vs nonstationary
- pure vs corrupted (by noise)

Signal models provide a basis for:
- signal analysis, e.g., simulation
- signal processing, e.g., noise removal
- signal recognition, e.g., identification

Signal models can be:
- deterministic – exploit some known properties of a signal
- statistical – characterize statistical properties of a signal

Statistical signal models:
- Gaussian processes
- Poisson processes
- Markov processes
- Hidden Markov processes

Assumption: The signal can be well characterized as a parametric random process, and the parameters of the stochastic process can be determined in a precise, well-defined manner.
Discrete (observable) Markov model
Figure: A Markov chain with 5 states and selected transitions
N states: S_1, S_2, ..., S_N

In each time instant t = 1, 2, ..., T the system changes (makes a transition) to state q_t
Discrete (observable) Markov model
For the special case of a first-order Markov chain

    P(q_t = S_j | q_{t-1} = S_i, q_{t-2} = S_k, ...) = P(q_t = S_j | q_{t-1} = S_i)

Furthermore, we only consider processes where the right-hand side is time-independent, i.e., constant state transition probabilities

    a_ij = P(q_t = S_j | q_{t-1} = S_i)    1 ≤ i, j ≤ N

where

    a_ij ≥ 0    and    Σ_{j=1}^{N} a_ij = 1
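These constraints can be checked numerically, and a state sequence sampled from the chain. A minimal Python/NumPy sketch, where the 3-state transition matrix is an invented example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 3-state transition matrix: A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Each row must be a probability distribution: a_ij >= 0, sum_j a_ij = 1
assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)

def sample_chain(A, T, q1=0):
    """Sample a state sequence q_1 ... q_T from a first-order Markov chain."""
    states = [q1]
    for _ in range(T - 1):
        # Next state drawn from the transition distribution of the current state
        states.append(rng.choice(len(A), p=A[states[-1]]))
    return states

Q = sample_chain(A, T=10)
```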
Discrete hidden Markov model (DHMM)
Figure: Discrete HMM with 3 states and 4 possible outputs
An observation is a probabilistic function of a state, i.e., an HMM is a doubly embedded stochastic process

A DHMM is characterized by:
- N states S_j and M distinct observations v_k (alphabet size)
- State transition probability distribution A
- Observation symbol probability distribution B
- Initial state distribution π
Discrete hidden Markov model (DHMM)
We define the DHMM as λ = (A, B, π):

    A = {a_ij}    a_ij = P(q_{t+1} = S_j | q_t = S_i)    1 ≤ i, j ≤ N
    B = {b_ik}    b_ik = P(O_t = v_k | q_t = S_i)        1 ≤ i ≤ N, 1 ≤ k ≤ M
    π = {π_i}     π_i = P(q_1 = S_i)                     1 ≤ i ≤ N

This allows us to generate an observation sequence O = O_1 O_2 ... O_T:
1. Set t = 1; choose an initial state q_1 = S_i according to the initial state distribution π
2. Choose O_t = v_k according to the symbol probability distribution in state S_i, i.e., b_ik
3. Transit to a new state q_{t+1} = S_j according to the state transition probability distribution for state S_i, i.e., a_ij
4. Set t = t + 1; if t ≤ T, return to step 2
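The generation steps above can be sketched directly; the 2-state, 3-symbol model λ = (A, B, π) below is a made-up illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented DHMM lambda = (A, B, pi): N = 2 states, M = 3 symbols
A  = np.array([[0.8, 0.2],
               [0.4, 0.6]])          # a_ij = P(q_{t+1} = S_j | q_t = S_i)
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])     # b_ik = P(O_t = v_k | q_t = S_i)
pi = np.array([0.6, 0.4])            # pi_i = P(q_1 = S_i)

def generate(A, B, pi, T):
    """Generate an observation sequence O_1 ... O_T by the four steps above."""
    q = rng.choice(len(pi), p=pi)                     # step 1: initial state from pi
    O = []
    for t in range(T):
        O.append(rng.choice(B.shape[1], p=B[q]))      # step 2: emit a symbol via b_ik
        q = rng.choice(len(A), p=A[q])                # step 3: transition via a_ij
    return O                                          # step 4: loop until t reaches T

O = generate(A, B, pi, T=8)
```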
Three basic problems for HMMs
Evaluation: Given the observation sequence O = O_1 O_2 ... O_T and a model λ = (A, B, π), how do we efficiently compute P(O|λ), i.e., the probability of the observation sequence given the model?

Recognition: Given the observation sequence O = O_1 O_2 ... O_T and a model λ = (A, B, π), how do we choose a corresponding state sequence Q = q_1 q_2 ... q_T which is optimal in some sense, i.e., best explains the observations?

Training: Given the observation sequence O = O_1 O_2 ... O_T, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?
Brute force solution to the evaluation problem
We need P(O|λ), i.e., the probability of the observation sequence O = O_1 O_2 ... O_T given the model λ

So we can enumerate every possible state sequence Q = q_1 q_2 ... q_T

For a sample sequence Q

    P(O|Q, λ) = Π_{t=1}^{T} P(O_t | q_t, λ) = Π_{t=1}^{T} b_{q_t O_t}

The probability of such a state sequence Q is

    P(Q|λ) = P(q_1) Π_{t=2}^{T} P(q_t | q_{t-1}) = π_{q_1} Π_{t=2}^{T} a_{q_{t-1} q_t}
Brute force solution to the evaluation problem
Therefore the joint probability is

    P(O, Q|λ) = P(Q|λ) P(O|Q, λ) = π_{q_1} Π_{t=2}^{T} a_{q_{t-1} q_t} Π_{t=1}^{T} b_{q_t O_t}

By considering all possible state sequences

    P(O|λ) = Σ_Q π_{q_1} b_{q_1 O_1} Π_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t O_t}

Problem: on the order of 2T · N^T calculations
- N^T possible state sequences
- about 2T calculations for each sequence
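A direct implementation of this enumeration shows why it is only feasible for tiny models; the model parameters and the observation sequence below are invented toy values:

```python
import itertools
import numpy as np

# Invented toy DHMM, small enough to enumerate all N^T state sequences
A  = np.array([[0.8, 0.2], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O  = [0, 2, 1]  # made-up observation sequence, T = 3

def brute_force_likelihood(A, B, pi, O):
    """P(O|lambda) by summing pi_{q1} b_{q1 O1} * prod of a and b terms over all Q."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):   # all N^T state sequences
        p = pi[Q[0]] * B[Q[0], O[0]]
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total

likelihood = brute_force_likelihood(A, B, pi, O)
```

As a sanity check, summing this quantity over every possible observation sequence of length T gives exactly 1.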
Forward procedure
Figure: Operations for computing the forward variable α_j(t+1)

Figure: Computing α_j(t) in terms of a lattice
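The forward variable α_j(t) = P(O_1 O_2 ... O_t, q_t = S_j | λ) shown in the figures is computed inductively by the standard recursion α_j(1) = π_j b_{j O_1} and α_j(t+1) = [Σ_i α_i(t) a_ij] b_{j O_{t+1}}, with P(O|λ) = Σ_j α_j(T), which costs on the order of N²T operations instead of 2T N^T. A minimal sketch (the toy model is invented):

```python
import numpy as np

# Invented toy DHMM
A  = np.array([[0.8, 0.2], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O  = [0, 2, 1]

def forward(A, B, pi, O):
    """alpha[t, j] holds alpha_j(t+1) with 0-based t."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # alpha_j(1) = pi_j b_{j O_1}
    for t in range(1, T):
        # sum_i alpha_i(t) a_ij, then multiply by the emission probability
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    return alpha

alpha = forward(A, B, pi, O)
likelihood = alpha[-1].sum()                      # P(O|lambda) = sum_j alpha_j(T)
```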
Backward procedure
Figure: Operations for computing the backward variable β_i(t)

We define a backward variable β_i(t) as the probability of the partial observation sequence after time t, given state S_i at time t

    β_i(t) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ)

This can be computed inductively as well

    β_i(T) = 1    1 ≤ i ≤ N

    β_i(t-1) = Σ_{j=1}^{N} a_ij b_{j O_t} β_j(t)    2 ≤ t ≤ T
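A sketch of this backward recursion, using the same invented toy model as the other examples here; note that P(O|λ) = Σ_i π_i b_{i O_1} β_i(1) provides a cross-check against the forward pass:

```python
import numpy as np

# Same invented toy DHMM as in the other sketches
A  = np.array([[0.8, 0.2], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O  = [0, 2, 1]

def backward(A, B, O):
    """beta[t, i] holds beta_i(t+1) with 0-based t."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij b_{j O_{t+1}} beta_j(t+1)
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    return beta

beta = backward(A, B, O)
# P(O|lambda) recovered from the backward pass
likelihood = (pi * B[:, O[0]] * beta[0]).sum()
```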
Uncovering the hidden state sequence
Unlike for evaluation, there is no single "optimal" sequence; we can either:
- choose states which are individually most likely (maximizes the number of correct states), or
- find the single best state sequence (guarantees that the uncovered sequence is valid)

The first choice means finding argmax_i γ_i(t) for each t, where

    γ_i(t) = P(q_t = S_i | O, λ)

In terms of the forward and backward variables

    γ_i(t) = P(O_1 ... O_t, q_t = S_i | λ) P(O_{t+1} ... O_T | q_t = S_i, λ) / P(O|λ)

    γ_i(t) = α_i(t) β_i(t) / Σ_{j=1}^{N} α_j(t) β_j(t)
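Combining the two passes gives γ_i(t) and the individually most likely state at each time step; a sketch with the same invented toy model:

```python
import numpy as np

# Invented toy DHMM; alpha and beta come from the forward/backward recursions
A  = np.array([[0.8, 0.2], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O  = [0, 2, 1]
T, N = len(O), len(pi)

alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]

beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])

# gamma_i(t) = alpha_i(t) beta_i(t) / sum_j alpha_j(t) beta_j(t)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)

# Individually most likely state at each t
best_states = gamma.argmax(axis=1)
```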
Viterbi algorithm
Finding the best single sequence means computing argmax_Q P(Q|O, λ), which is equivalent to computing argmax_Q P(Q, O|λ)

The Viterbi algorithm (dynamic programming) defines δ_j(t), i.e., the highest probability of a single path of length t which accounts for the observations and ends in state S_j

    δ_j(t) = max_{q_1, q_2, ..., q_{t-1}} P(q_1 q_2 ... q_t = S_j, O_1 O_2 ... O_t | λ)

By induction

    δ_j(1) = π_j b_{j O_1}    1 ≤ j ≤ N

    δ_j(t+1) = [max_i δ_i(t) a_ij] b_{j O_{t+1}}    1 ≤ t ≤ T - 1

With backtracking (keeping the maximizing argument for each t and j) we find the optimal solution
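The recursion plus backtracking can be sketched as follows, again on the invented toy model:

```python
import numpy as np

# Invented toy DHMM
A  = np.array([[0.8, 0.2], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O  = [0, 2, 1]

def viterbi(A, B, pi, O):
    """Return the single best state sequence argmax_Q P(Q, O | lambda)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))      # delta[t, j]: best path probability ending in j
    psi = np.zeros((T, N), int)   # psi[t, j]: maximizing predecessor (backpointer)
    delta[0] = pi * B[:, O[0]]    # delta_j(1) = pi_j b_{j O_1}
    for t in range(1, T):
        scores = delta[t-1][:, None] * A          # scores[i, j] = delta_i(t-1) a_ij
        psi[t] = scores.argmax(axis=0)            # keep the maximizing argument
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

path, best_prob = viterbi(A, B, pi, O)
```

The probability of the single best path is at most the total likelihood P(O|λ), since the latter sums over all paths.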
Backtracking
Figure: Illustration of the backtracking procedure © G.W. Pulford
Estimation of HMM parameters
There is no known way to analytically solve for the model which maximizes the probability of the observation sequence

We can choose λ = (A, B, π) which locally maximizes P(O|λ):
- gradient techniques
- Baum-Welch reestimation (equivalent to EM)

We need to define ξ_ij(t), i.e., the probability of being in state S_i at time t and in state S_j at time t + 1

    ξ_ij(t) = P(q_t = S_i, q_{t+1} = S_j | O, λ)

    ξ_ij(t) = α_i(t) a_ij b_{j O_{t+1}} β_j(t+1) / P(O|λ)
            = α_i(t) a_ij b_{j O_{t+1}} β_j(t+1) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_i(t) a_ij b_{j O_{t+1}} β_j(t+1)
Estimation of HMM parameters
Figure: Operations for computing ξ_ij(t)

Recall that γ_i(t) is the probability of being in state S_i at time t, hence

    γ_i(t) = Σ_{j=1}^{N} ξ_ij(t)

Now if we sum over the time index t:

    Σ_{t=1}^{T-1} γ_i(t) = expected number of times that S_i is visited
                         = expected number of transitions from state S_i

    Σ_{t=1}^{T-1} ξ_ij(t) = expected number of transitions from S_i to S_j
Baum-Welch Reestimation
Reestimation formulas:

    π̄_i = γ_i(1)

    ā_ij = Σ_{t=1}^{T-1} ξ_ij(t) / Σ_{t=1}^{T-1} γ_i(t)

    b̄_jk = Σ_{t: O_t = v_k} γ_j(t) / Σ_{t=1}^{T} γ_j(t)

Baum et al. proved that if the current model is λ = (A, B, π) and we use the above to compute λ̄ = (Ā, B̄, π̄), then either:
- λ̄ = λ – we are in a critical point of the likelihood function, or
- P(O|λ̄) > P(O|λ) – model λ̄ is more likely

If we iteratively reestimate the parameters, we obtain a maximum likelihood estimate of the HMM

Unfortunately this finds a local maximum, and the surface can be very complex
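One full reestimation pass can be sketched end-to-end (forward, backward, γ, ξ, then the three formulas above); the model and observation sequence are invented, and Baum's result guarantees the likelihood does not decrease:

```python
import numpy as np

# Invented toy DHMM and observation sequence
A  = np.array([[0.8, 0.2], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
O  = [0, 2, 1, 0, 1]
T, N, M = len(O), len(pi), B.shape[1]

def likelihood(A, B, pi, O):
    """P(O|lambda) via the forward recursion."""
    alpha = pi * B[:, O[0]]
    for t in range(1, len(O)):
        alpha = (alpha @ A) * B[:, O[t]]
    return alpha.sum()

def baum_welch_step(A, B, pi, O):
    """One reestimation pass: compute gamma and xi, return (A_bar, B_bar, pi_bar)."""
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    P_O = alpha[-1].sum()
    gamma = alpha * beta / P_O
    # xi[t, i, j] = alpha_i(t) a_ij b_{j O_{t+1}} beta_j(t+1) / P(O|lambda)
    xi = np.array([np.outer(alpha[t], B[:, O[t+1]] * beta[t+1]) * A / P_O
                   for t in range(T - 1)])
    pi_bar = gamma[0]
    A_bar = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_bar = np.zeros((N, M))
    for k in range(M):
        # numerator sums gamma only over the times where O_t = v_k
        B_bar[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return A_bar, B_bar, pi_bar

A2, B2, pi2 = baum_welch_step(A, B, pi, O)
improved = likelihood(A2, B2, pi2, O) >= likelihood(A, B, pi, O)
```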
Non-ergodic HMMs
Until now we have only considered ergodic (fully connected) HMMs, where every state can be reached from any state in a finite number of steps

Figure: Ergodic HMM

The left-right (Bakis) model is good for speech recognition:
- as time increases, the state index increases or stays the same
- it can be extended to parallel left-right models
Figure: Left-right HMM
Figure: Parallel HMM
Gaussian HMM (GMMM)
HMMs can be used with continuous observation densities. We can model such densities with Gaussian mixtures

    b_j(O) = Σ_{m=1}^{M} c_jm N(O; μ_jm, U_jm)
Then the reestimation formulas are still simple
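For a single state j, the mixture density b_j(O) can be evaluated directly; the weights, means, and variances below are invented one-dimensional values:

```python
import numpy as np

# Invented single-state example: M = 2 mixture components in 1-D
c  = np.array([0.3, 0.7])            # mixture weights c_jm, sum to 1
mu = np.array([0.0, 2.0])            # component means mu_jm
U  = np.array([1.0, 0.5])            # component variances U_jm

def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def emission_density(x, c, mu, U):
    """b_j(O) = sum_m c_jm N(O; mu_jm, U_jm) for one state j."""
    return float(np.sum(c * gaussian_pdf(x, mu, U)))

density = emission_density(1.0, c, mu, U)
```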
More fun
Autoregressive HMMs
State Duration Density HMMs
Discriminatively trained HMMs
- maximum mutual information instead of maximum likelihood

HMMs in a similarity measure

Conditional Random Fields can loosely be understood as a generalization of HMMs: constant transition probabilities are replaced with arbitrary functions that vary across the positions in the sequence of hidden states

Figure: Random Oxford fields © R. Tourtelot