11/1/2007
Hidden Markov Models and Their Applications in Bioinformatics
Bioinformatics Group, Electrical and Computer Department
University of Tehran, 2008
By: Mahdi Pakdaman
Pakdaman@gmail.com, 1387/7/29
Hidden Markov Models: Outline

- Markov models
- Hidden Markov models
  - Definition
  - Three basic problems
  - Forward/Backward algorithm
  - Viterbi algorithm
  - Baum-Welch estimation algorithm
- Issues
- Applications in Bioinformatics
Markov Models
Markov Model Example
Weather model: 3 states {rainy, cloudy, sunny}
Problem: Forecast weather state, based on the current weather state
Observable Markov Models

Observable states: S_1, S_2, ..., S_N. Below we will designate them simply as 1, 2, ..., N.
The actual state at time t is q_t; a state sequence is q_1, q_2, ..., q_t, ..., q_T.

First-order Markov assumption:

    P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

Stationarity condition:

    P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)
Markov Models

State transition matrix A:

    A = [ a_11  a_12  ...  a_1j  ...  a_1N ]
        [ a_21  a_22  ...  a_2j  ...  a_2N ]
        [  ...   ...  ...   ...  ...   ... ]
        [ a_i1  a_i2  ...  a_ij  ...  a_iN ]
        [  ...   ...  ...   ...  ...   ... ]
        [ a_N1  a_N2  ...  a_Nj  ...  a_NN ]

where

    a_ij = P(q_t = j | q_{t-1} = i),   1 <= i, j <= N

Constraints on a_ij:

    a_ij >= 0  for all i, j
    sum_{j=1}^{N} a_ij = 1  for all i
Markov Models: Example

States:
1. Rainy (R)
2. Cloudy (C)
3. Sunny (S)

State transition probability matrix:

    A = [ 0.4  0.3  0.3 ]
        [ 0.2  0.6  0.2 ]
        [ 0.1  0.1  0.8 ]

Compute the probability of observing SSRRSCS given that today is S.
Markov Models: Example

Basic conditional probability rule:

    P(A, B) = P(A | B) P(B)

The Markov chain rule:

    P(q_1, q_2, ..., q_T)
      = P(q_T | q_1, q_2, ..., q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) P(q_1, q_2, ..., q_{T-2})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) ... P(q_2 | q_1) P(q_1)
Markov Models: Example

Observation sequence O:

    O = (S, S, S, R, R, S, C, S)

Using the chain rule we get:

    P(O | model)
      = P(S, S, S, R, R, S, C, S | model)
      = P(S) P(S|S) P(S|S) P(R|S) P(R|R) P(S|R) P(C|S) P(S|C)
      = pi_3 a_33 a_33 a_31 a_11 a_13 a_32 a_23
      = (1)(0.8)^2(0.1)(0.4)(0.3)(0.1)(0.2)
      = 1.536 x 10^-4
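The chain-rule computation above can be checked with a short script. A minimal sketch: the state encoding and the `sequence_probability` helper are my own naming, not from the slides.

```python
# Weather Markov chain from the example. State indices: 0 = Rainy, 1 = Cloudy, 2 = Sunny.
A = [[0.4, 0.3, 0.3],   # transitions out of Rainy
     [0.2, 0.6, 0.2],   # transitions out of Cloudy
     [0.1, 0.1, 0.8]]   # transitions out of Sunny

def sequence_probability(states, A, pi):
    """P(q_1, ..., q_T) = pi_{q_1} * prod_t a_{q_{t-1} q_t}  (Markov chain rule)."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# O = (S, S, S, R, R, S, C, S), given that today is S: pi puts all mass on Sunny.
O = [2, 2, 2, 0, 0, 2, 1, 2]
p = sequence_probability(O, A, pi=[0.0, 0.0, 1.0])
print(p)  # ~1.536e-04
```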
Markov Models: Example

What is the probability that the sequence remains in state i for exactly d time units, given that it starts in state i?

    p_i(d) = P(q_1 = i, q_2 = i, ..., q_d = i, q_{d+1} != i | q_1 = i)
           = (a_ii)^{d-1} (1 - a_ii)

This is an exponential (geometric) Markov chain duration density.

What is the expected value of the duration d in state i?

    E[d_i] = sum_{d=1}^{inf} d p_i(d)
           = sum_{d=1}^{inf} d (a_ii)^{d-1} (1 - a_ii)
Markov Models: Example

    E[d_i] = (1 - a_ii) sum_{d=1}^{inf} d (a_ii)^{d-1}
           = (1 - a_ii) d/da_ii [ sum_{d=1}^{inf} (a_ii)^d ]
           = (1 - a_ii) d/da_ii ( a_ii / (1 - a_ii) )
           = 1 / (1 - a_ii)
Markov Models: Example

Avg. number of consecutive sunny days = 1 / (1 - a_33) = 1 / (1 - 0.8) = 5
Avg. number of consecutive cloudy days = 1 / (1 - 0.6) = 2.5
Avg. number of consecutive rainy days = 1 / (1 - 0.4) = 1.67
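These averages follow directly from the self-transition probabilities; a tiny sketch (variable names are mine):

```python
# Expected duration in state i is 1 / (1 - a_ii); a_ii taken from the example matrix A.
self_transition = {"Rainy": 0.4, "Cloudy": 0.6, "Sunny": 0.8}
expected_days = {s: 1.0 / (1.0 - a) for s, a in self_transition.items()}
print(expected_days)  # Sunny ~5, Cloudy ~2.5, Rainy ~1.67
```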
Hidden Markov Models

- States are not observable
- Observations are probabilistic functions of state
- State transitions are still probabilistic
HMM: An Example

Weather model: 3 "hidden" states {rainy, cloudy, sunny}.
We measure weather-related variables (e.g. temperature, humidity, barometric pressure).
Problem: forecast the weather state, given the current weather variables.
Urn and Ball Model

- N urns containing colored balls
- M distinct colors of balls
- Each urn has a (possibly) different distribution of colors

Sequence generation algorithm:
1. Pick an initial urn according to some random process.
2. Randomly pick a ball from the urn, record its color, and then replace it.
3. Select another urn according to a random selection process associated with the current urn.
4. Repeat steps 2 and 3.
Elements of Hidden Markov Models

- N: the number of hidden states
- Q: set of states, Q = {1, 2, ..., N}
- M: the number of symbols
- V: set of symbols, V = {1, 2, ..., M}
- A: the state-transition probability matrix

      a_ij = P(q_{t+1} = j | q_t = i),   1 <= i, j <= N

- B: observation probability distribution

      B_j(k) = P(o_t = v_k | q_t = j),   1 <= j <= N, 1 <= k <= M

- pi: the initial state distribution

      pi_i = P(q_1 = i),   1 <= i <= N

- lambda: the entire model, lambda = (A, B, pi)
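These elements translate directly into a small container type. A minimal sketch: the `HMM` dataclass and the concrete instance are my own illustration, anticipating the coin-tossing example later in the slides.

```python
from dataclasses import dataclass

@dataclass
class HMM:
    """lambda = (A, B, pi); N and M are implied by the matrix shapes."""
    A: list    # A[i][j] = P(q_{t+1} = j | q_t = i)
    B: list    # B[j][k] = P(o_t = v_k | q_t = j)
    pi: list   # pi[i]   = P(q_1 = i)

# Three biased coins, symbols 0 = H and 1 = T, uniform transitions and start.
coins = HMM(
    A=[[1/3, 1/3, 1/3] for _ in range(3)],
    B=[[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]],
    pi=[1/3, 1/3, 1/3],
)

# Every row of A and B, and pi itself, must be a probability distribution.
assert all(abs(sum(row) - 1) < 1e-9 for row in coins.A + coins.B + [coins.pi])
```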
Three Basic Problems

1. EVALUATION: given observation O = (o_1, o_2, ..., o_T) and model lambda = (A, B, pi), efficiently compute P(O | lambda).
   - Hidden states complicate the evaluation.
   - Given two models lambda_1 and lambda_2, this can be used to choose the better one.
2. DECODING: given observation O = (o_1, o_2, ..., o_T) and model lambda, find the optimal state sequence q = (q_1, q_2, ..., q_T).
   - The optimality criterion has to be decided (e.g. maximum likelihood).
3. LEARNING: given O = (o_1, o_2, ..., o_T), estimate model parameters lambda = (A, B, pi) that maximize P(O | lambda).
Solution to Problem 1

Problem: compute P(o_1, o_2, ..., o_T | lambda).

Algorithm: let q = (q_1, q_2, ..., q_T) be a state sequence.
Assume the observations are independent given the states:

    P(O | q, lambda) = prod_{t=1}^{T} P(o_t | q_t, lambda)
                     = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)

The probability of a particular state sequence is:

    P(q | lambda) = pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

Also,

    P(O, q | lambda) = P(O | q, lambda) P(q | lambda)
Solution to Problem 1

Enumerate all paths and sum the probabilities:

    P(O | lambda) = sum_q P(O | q, lambda) P(q | lambda)

There are N^T state sequences, each requiring O(T) calculations.
Complexity: O(T N^T) calculations.
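For tiny models the enumeration can be written down directly. A sketch using the coin-tossing model defined later in the slides (function and variable names are mine):

```python
from itertools import product

def brute_force_evaluate(O, A, B, pi):
    """P(O | lambda) as a sum over all N**T state paths: O(T * N**T) work."""
    N, T = len(pi), len(O)
    total = 0.0
    for q in product(range(N), repeat=T):          # every state sequence
        p = pi[q[0]] * B[q[0]][O[0]]               # pi_{q_1} b_{q_1}(o_1)
        for t in range(1, T):
            p *= A[q[t-1]][q[t]] * B[q[t]][O[t]]   # a_{q_{t-1} q_t} b_{q_t}(o_t)
        total += p
    return total

# Coin-tossing model: uniform A and pi, three biased coins, symbols 0 = H, 1 = T.
A = [[1/3, 1/3, 1/3]] * 3
B = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
pi = [1/3, 1/3, 1/3]
print(brute_force_evaluate([0, 1], A, B, pi))  # P(H, T) ~ 0.25
```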
Forward Procedure: Intuition

[Figure: trellis diagram. States 1, 2, 3, ..., N at time t each feed state k at time t+1 through the transition probabilities a_1k, a_3k, ..., a_Nk.]
Forward Algorithm

Define the forward variable as:

    alpha_t(i) = P(o_1, o_2, ..., o_t, q_t = i | lambda)

alpha_t(i) is the probability of observing the partial sequence (o_1, o_2, ..., o_t) such that the state at time t is i.

1. Initialization:

    alpha_1(i) = pi_i b_i(o_1)

2. Induction:

    alpha_{t+1}(j) = [ sum_{i=1}^{N} alpha_t(i) a_ij ] b_j(o_{t+1})

3. Termination:

    P(O | lambda) = sum_{i=1}^{N} alpha_T(i)

4. Complexity: O(N^2 T)
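The four steps above map one-to-one onto code. A minimal sketch, smoke-tested on the coin-tossing model from the next slide (all identifiers are my own):

```python
def forward(O, A, B, pi):
    """Forward algorithm: returns (P(O | lambda), alpha table) in O(N^2 T) time."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]              # initialization
    for t in range(T - 1):                                          # induction
        alpha[t + 1] = [sum(alpha[t][i] * A[i][j] for i in range(N)) * B[j][O[t + 1]]
                        for j in range(N)]
    return sum(alpha[T - 1]), alpha                                 # termination

# Coin-tossing model: uniform A and pi, three biased coins, symbols 0 = H, 1 = T.
A = [[1/3, 1/3, 1/3]] * 3
B = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
pi = [1/3, 1/3, 1/3]
prob, alpha = forward([0, 1], A, B, pi)     # O = (H, T)
print(prob)  # ~0.25, matching brute-force enumeration over all 9 paths
```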
Example

Consider the following coin-tossing experiment:
- state-transition probabilities equal to 1/3
- initial state probabilities equal to 1/3

              P(H)   P(T)
    State 1   0.5    0.5
    State 2   0.75   0.25
    State 3   0.25   0.75
Example

1. You observe O = (H, H, H, H, T, H, T, T, T, T). What state sequence q is most likely? What is the joint probability P(O, q | lambda) of the observation sequence and that state sequence?
2. What is the probability that the observation sequence came entirely from state 1?
3. Consider the observation sequence O = (H, T, T, H, T, H, H, T, T, H). How would your answers to parts 1 and 2 change?
Example

4. If the state transition probabilities were:

    A' = [ 0.9   0.05  0.05 ]
         [ 0.45  0.1   0.45 ]
         [ 0.45  0.45  0.1  ]

how would the new model lambda' change your answers to parts 1-3?
Backward Algorithm

[Figure: trellis of states 1, ..., N over times 1, 2, ..., t-1, t, t+1, t+2, ..., T-1, T, with the observations o_1, o_2, ..., o_T along the time axis.]
Backward Algorithm

Define the backward variable as:

    beta_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, lambda)

beta_t(i) is the probability of observing the partial sequence (o_{t+1}, o_{t+2}, ..., o_T) given that the state at time t is i.

1. Initialization:

    beta_T(i) = 1

2. Induction:

    beta_t(i) = sum_{j=1}^{N} a_ij b_j(o_{t+1}) beta_{t+1}(j),   1 <= i <= N,  t = T-1, ..., 1
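A matching sketch of the backward pass (coin model and naming as in the forward example, both mine). A useful consistency check: P(O | lambda) = sum_i pi_i b_i(o_1) beta_1(i) must equal the forward result.

```python
def backward(O, A, B, pi):
    """Backward algorithm: beta_t(i) = P(o_{t+1}, ..., o_T | q_t = i, lambda)."""
    N, T = len(pi), len(O)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N                                         # initialization
    for t in range(T - 2, -1, -1):                                  # induction
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

# Coin-tossing model: uniform A and pi, three biased coins, symbols 0 = H, 1 = T.
A = [[1/3, 1/3, 1/3]] * 3
B = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
pi = [1/3, 1/3, 1/3]
O = [0, 1]                                                          # (H, T)
beta = backward(O, A, B, pi)
prob = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(3))
print(prob)  # ~0.25, agreeing with the forward algorithm
```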
Solution to Problem 2 (1)

The difficulty lies in defining the optimality criterion.
One choice: pick the state q_t that is individually most likely at each time, so as to maximize the expected number of correct individual states.
Solution to Problem 2 (1)

Define gamma_t(i), the probability of being in state S_i at time t, given O and lambda:

    gamma_t(i) = P(q_t = i | O, lambda)

    gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)
               = alpha_t(i) beta_t(i) / sum_{i=1}^{N} alpha_t(i) beta_t(i)
Solution to Problem 2 (1)

    q_t = argmax_{1 <= i <= N} [ gamma_t(i) ],   1 <= t <= T

Although this maximizes the expected number of correct states, there can be problems with the resulting state sequence: it may not even be a valid path (e.g. it can use transitions of zero probability)!
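Per-state posterior decoding is only a few lines once alpha and beta are available. A sketch: the hand-computed alpha/beta tables below are for the 3-coin model with O = (H, T), and the helper name is mine.

```python
def posterior_decode(alpha, beta):
    """q_t = argmax_i gamma_t(i), gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda)."""
    path = []
    for a_t, b_t in zip(alpha, beta):
        weights = [a * b for a, b in zip(a_t, b_t)]   # proportional to gamma_t(i);
        path.append(max(range(len(weights)), key=weights.__getitem__))  # norm irrelevant to argmax
    return path

# alpha/beta for the 3-coin model (uniform A and pi) with O = (H, T), by hand:
alpha = [[1/6, 1/4, 1/12], [1/12, 1/24, 1/8]]
beta  = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
print(posterior_decode(alpha, beta))  # [1, 2]: the H-biased coin, then the T-biased one
```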
Solution to Problem 2 (2)

Choose the most likely path: find the path (q_1, q_2, ..., q_T) that maximizes the likelihood P(q_1, q_2, ..., q_T | O, lambda).

Solution by dynamic programming. Define:

    delta_t(i) = max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | lambda)

delta_t(i) is the best score (highest probability) along a single path that accounts for the first t observations and ends in state S_i.

By induction we have:

    delta_{t+1}(j) = [ max_i delta_t(i) a_ij ] b_j(o_{t+1})
Viterbi Algorithm

[Figure: trellis of states 1, ..., N over times 1, ..., T with observations o_1, ..., o_T; the states at time t feed state k at time t+1 through transitions a_1k, ..., a_Nk.]
Viterbi Algorithm

Initialization:

    delta_1(i) = pi_i b_i(o_1),   1 <= i <= N
    psi_1(i) = 0

Recursion:

    delta_t(j) = max_{1 <= i <= N} [ delta_{t-1}(i) a_ij ] b_j(o_t)
    psi_t(j) = argmax_{1 <= i <= N} [ delta_{t-1}(i) a_ij ],   2 <= t <= T,  1 <= j <= N

Termination:

    P* = max_{1 <= i <= N} [ delta_T(i) ]
    q_T* = argmax_{1 <= i <= N} [ delta_T(i) ]

Path (state sequence) backtracking:

    q_t* = psi_{t+1}(q_{t+1}*),   t = T-1, T-2, ..., 1
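The four stages above can be sketched as follows (coin model as before; every identifier is my own):

```python
def viterbi(O, A, B, pi):
    """Most likely state path q* and its probability P* = max_q P(O, q | lambda)."""
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T)]
    psi = [[0] * N for _ in range(T)]
    delta[0] = [pi[i] * B[i][O[0]] for i in range(N)]               # initialization
    for t in range(1, T):                                           # recursion
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][O[t]]
    p_star = max(delta[T - 1])                                      # termination
    path = [max(range(N), key=delta[T - 1].__getitem__)]
    for t in range(T - 1, 0, -1):                                   # backtracking
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, p_star

# Coin-tossing model: uniform A and pi, three biased coins, symbols 0 = H, 1 = T.
A = [[1/3, 1/3, 1/3]] * 3
B = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
pi = [1/3, 1/3, 1/3]
path, p_star = viterbi([0, 0, 1], A, B, pi)   # O = (H, H, T)
print(path)    # [1, 1, 2]: the H-biased coin twice, then the T-biased coin
print(p_star)  # ~(1/3 * 0.75)^3 = 0.015625
```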
Solution to Problem 3

Estimate lambda = (A, B, pi) to maximize P(O | lambda). There is no analytic method because of the complexity, so an iterative solution is used: the Baum-Welch algorithm (an instance of the EM algorithm):

1. Let the initial model be lambda_0.
2. Compute a new lambda based on lambda_0 and the observation O.
3. If log P(O | lambda) - log P(O | lambda_0) < DELTA, stop.
4. Else set lambda_0 <- lambda and go to step 2.
Baum-Welch: Preliminaries

    xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, lambda)
Baum-Welch: Preliminaries

Define xi_t(i, j) as the probability of being in state i at time t and in state j at time t+1:

    xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
               = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j)
                 / sum_{i=1}^{N} sum_{j=1}^{N} alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j)

Define gamma_t(i) as the probability of being in state i at time t, given the observation sequence:

    gamma_t(i) = sum_{j=1}^{N} xi_t(i, j)
Baum-Welch: Preliminaries

    sum_{t=1}^{T} gamma_t(i)  is the expected number of times state i is visited.

    sum_{t=1}^{T-1} xi_t(i, j)  is the expected number of transitions from state i to state j.
Baum-Welch: Update Rules

pi-bar_i = expected frequency in state i at time t = 1:

    pi-bar_i = gamma_1(i)

a-bar_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i):

    a-bar_ij = sum_{t=1}^{T-1} xi_t(i, j) / sum_{t=1}^{T-1} gamma_t(i)

b-bar_j(k) = (expected number of times in state j observing symbol v_k) / (expected number of times in state j):

    b-bar_j(k) = sum_{t=1, s.t. o_t = v_k}^{T} gamma_t(j) / sum_{t=1}^{T} gamma_t(j)
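One full re-estimation pass (forward, backward, gamma, xi, then the three update rules) fits in a short function. A sketch for a single observation sequence, with no scaling against underflow; every identifier is mine:

```python
def baum_welch_step(O, A, B, pi):
    """One Baum-Welch re-estimation step for a single observation sequence O."""
    N, T = len(pi), len(O)
    # forward pass: alpha_t(i)
    alpha = [[0.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, T):
        alpha[t] = [sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                    for j in range(N)]
    # backward pass: beta_t(i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
                   for i in range(N)]
    p_O = sum(alpha[T-1])
    # posteriors gamma_t(i) and xi_t(i, j)
    gamma = [[alpha[t][i] * beta[t][i] / p_O for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_O
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # the three update rules
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) / sum(gamma[t][i] for t in range(T-1))
              for j in range(N)] for i in range(N)]
    M = len(B[0])
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)] for j in range(N)]
    return new_A, new_B, new_pi

# Coin-tossing model: uniform A and pi, three biased coins, symbols 0 = H, 1 = T.
A = [[1/3, 1/3, 1/3]] * 3
B = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
pi = [1/3, 1/3, 1/3]
new_A, new_B, new_pi = baum_welch_step([0, 0, 1, 1], A, B, pi)
# The re-estimated parameters are still proper probability distributions:
assert all(abs(sum(row) - 1) < 1e-9 for row in new_A + new_B + [new_pi])
```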
Some Issues

- Limitations imposed by the Markov chain
- Scalability
- Learning:
  - Initialisation
  - Model order
  - Local maxima
  - Weighting training sequences
HMM Applications

- Classification (e.g., profile HMMs)
  - Build an HMM for each class (profile HMMs)
  - Classify a sequence using Bayes rule
- Multiple sequence alignment
  - Build an HMM based on a set of sequences
  - Decode each sequence to find a multiple alignment
- Segmentation (e.g., gene finding)
  - Use different states to model different regions
  - Decode a sequence to reveal the region boundaries
HMMs for Classification

    p(C | X) = p(X | C) p(C) / p(X)

    C* = argmax_{C in {C_1, ..., C_k}} p(X | C) p(C)

- p(X | C) is modeled by a profile HMM built specifically for class C
- This assumes example sequences are available for C
- E.g., protein families: assign a family to X
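In log space the classification rule is a one-line argmax. A sketch with hypothetical family names and log-likelihoods (all invented for illustration; in practice log p(X | C) would come from running the forward algorithm on each family's profile HMM):

```python
import math

# Hypothetical scores log p(X | C) for one query sequence X against three families.
log_lik = {"globin": -42.1, "kinase": -57.3, "zinc-finger": -49.8}
log_prior = {c: math.log(1 / 3) for c in log_lik}   # uniform prior p(C)

# C* = argmax_C [ log p(X | C) + log p(C) ]  (p(X) is the same for every C)
best = max(log_lik, key=lambda c: log_lik[c] + log_prior[c])
print(best)  # globin
```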
HMMs for Motif Finding

- Given a set of sequences S = {X_1, ..., X_k}
- Design an HMM with two kinds of states:
  - Background states: for outside a motif
  - Motif states: for modeling a motif
- Train the HMM, e.g., using Baum-Welch (finding the HMM that maximizes the probability of S)
- The "motif part" of the HMM gives a motif model (e.g., a PWM)
- The HMM can be used to scan any sequence (including X_i) to figure out where the motif is; we may also decode each sequence X_i to obtain a set of subsequences matched by the motif (e.g., a multiset of k-mers)
HMMs for Multiple Alignment

- Given a set of sequences S = {X_1, ..., X_k}
- Train an HMM, e.g., using Baum-Welch (finding the HMM that maximizes the probability of S)
- Decode each sequence X_i
- Assemble the Viterbi paths to form a multiple alignment
  - The symbols belonging to the same state will be aligned to each other
HMMHMM--based Gene Findingbased Gene Finding
Design two types of states “Within Gene” States“Outside Gene” States
Use known genes to estimate the HMMDecode a new sequence to reveal which part is a geneExample software:
GENSCAN (Burge 1997)FGENESH (Solovyev 1997)HMMgene (Krogh 1997)GENIE (Kulp 1996)GENMARK (Borodovsky & McIninch 1993)VEIL (Henderson, Salzberg, & Fasman 1997)
VEIL: Viterbi Exon-Intron Locator

[Figure: VEIL architecture. The exon HMM model connects the states Upstream, Start Codon, Exon, 5' Splice Site, Intron, 3' Splice Site, Stop Codon, Downstream, and Poly-A Site.]

- Enter the exon model via a start codon or an intron (3' splice site)
- Exit via a 5' splice site or one of the three stop codons (taa, tag, tga)
Solutions to the Local Maxima Problem

- Repeat with different initializations
- Start with the most reasonable initial model
- Simulated annealing (slow down the convergence speed)
Local Maxima: Illustration

[Figure: a likelihood surface with a global maximum and a local maximum; a good starting point converges to the global maximum, a bad starting point to the local one.]
Optimal Model Construction

    p(HMM | X) = p(X | HMM) p(HMM) / p(X)

    HMM* = argmax_HMM p(HMM | X)
         = argmax_HMM p(X | HMM) p(HMM)

Bayesian model selection:
- P(HMM) should prefer simpler models (i.e., more constrained: fewer states, fewer transitions)
- P(HMM) could reflect our prior on the parameters
Sequence Weighting

- Avoid over-counting similar sequences from the same organisms
- Typically compute a weight for a sequence based on an evolutionary tree
- Many ways to incorporate the weights, e.g.:
  - Unequal likelihood
  - Unequal weight contribution in parameter estimation
Toolkits for HMM

- Hidden Markov Model Toolkit (HTK): http://htk.eng.cam.ac.uk/
- Hidden Markov Model (HMM) Toolbox for Matlab: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
- Training HMM for ASR: http://cslu.cse.ogi.edu/tutordemos/nnet_training/tutorial.html#1.1_Setup
Further Reading

L. Rabiner and B. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, pp. 4-16, Jan. 1986.