Slide 1
EE3J2 Data Mining
Lecture 14: Introduction to Hidden Markov Models
Martin Russell
Slide 2
Objectives
Limitations of sequence matching
Introduction to hidden Markov models (HMMs)
Slide 3
Sequence retrieval using DP
……
AAGDTDTDTDD
AABBCBDAAAAAAA
BABABABBCCDF
GGGGDDGDGDGDGDTDTD
DGDGDGDGD
AABCDTAABCDTAABCDTAAB
CDCDCDTGGG
GGAACDTGGGGGAAA
…….
…….
Corpus of sequential data
‘query’ sequence Q
…BBCCDDDGDGDGDCDTCDTTDCCC…
Dynamic Programming
Distance calculation: calculate ad(S, Q) for each sequence S in the corpus

Ŝ = argmin_S ad(S, Q)
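As a sketch of the retrieval scheme above (an assumption: the slides never define ad(S, Q), so plain edit distance with unit substitution, insertion, and deletion costs is used here, and the corpus lines and query are shortened stand-ins):

```python
def ad(s, q):
    """Alignment distance between sequences s and q via dynamic programming."""
    n, m = len(s), len(q)
    # d[i][j] = cost of aligning s[:i] with q[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i          # delete all of s[:i]
    for j in range(1, m + 1):
        d[0][j] = j          # insert all of q[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == q[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution / match
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[n][m]

corpus = ["AAGDTDTDTDD", "AABBCBDAAAAAAA", "DGDGDGDGD"]
Q = "DGDGDGDCD"
best = min(corpus, key=lambda s: ad(s, Q))   # S-hat = argmin_S ad(S, Q)
```

Retrieval is then just the argmin over the corpus, exactly as on the slide.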
Slide 4
Limitations of ‘template matching’
This type of analysis is sometimes referred to as template matching.
The 'templates' are the sequences in the corpus.
Each template can be thought of as representing a 'class'.
The problem is to determine which class best fits the query.
Performance will depend on precisely which template is used to represent the class.
Slide 5
Alternative path shapes
The basic units of path considered so far are: substitution, insertion, deletion
Others are possible and may have advantages, e.g. alternative shapes for the substitution, insertion and deletion steps
[Diagrams of the two sets of path shapes]
Slide 6
Example
Slide 7
Hidden Markov Models (HMMs)
One solution is to replace the individual template sequences with an 'average' sequence.
But what is an 'average sequence'?
One solution is to use a type of statistical model called a Hidden Markov Model.
Slide 8
HMMs
Suppose the following sequences are in the same class: ABC, YBBC, ABXC, AZ
Compute alignments:
[Alignment grids: the reference states A, B, C (rows) aligned against Y B B C, A B X C, and A Z (columns)]
Slide 9
Finite State Network Representation
The sequence consists of 3 'states':
– First state is 'realised' as A (twice) or Y (once)
– Second state is 'realised' as B (three times) or X (once)
– Second state can be repeated or deleted
– Third state is 'realised' as C (twice) or Z (once)
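The relative-frequency estimate implied by these counts can be sketched as follows (the numeric keys 1, 2, 3 stand for the first, second, and third states, and the realisation lists simply mirror the counts on this slide):

```python
# Sketch: estimate each state's 'emission' probabilities as relative
# frequencies of the symbols aligned to that state.
from collections import Counter

# Realisations per state, mirroring the slide: A twice / Y once, etc.
realisations = {1: ["A", "A", "Y"], 2: ["B", "B", "B", "X"], 3: ["C", "C", "Z"]}

emission = {
    state: {sym: n / len(obs) for sym, n in Counter(obs).items()}
    for state, obs in realisations.items()
}
```

This gives, for example, probability 2/3 for A in the first state, matching the next slide.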
Slide 10
Network representation
Directed graph representation
Each state is associated with a set of probabilities
– Called the 'state emission' probabilities
From the counts on the previous slide:
p2(A) = 2/3, p2(Y) = 1/3
p3(B) = 3/4, p3(X) = 1/4
p4(C) = 2/3, p4(Z) = 1/3
(the emitting states are numbered 2, 3, 4; state 1 is the entry state and state 5 the exit state)
Slide 11
Transition probabilities
Transition probabilities control insertions and deletions of symbols
[Network diagram: state 1 → state 2 (probability 1); state 2 → state 3 (0.67) or state 4 (0.33); state 3 → state 3 (0.5) or state 4 (0.5); state 4 → state 5 (probability 1)]

A =
0  1  0     0     0
0  0  0.67  0.33  0
0  0  0.5   0.5   0
0  0  0     0     1
0  0  0     0     0

a_jk = Prob(state k follows state j)
Basic rule for drawing transition networks: connect state j to state k if a_jk > 0
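As a sketch, the transition matrix and the drawing rule can be put into code (assumptions: the slide's states 1–5 become 0-based indices 0–4 here, with 0 the entry state and 4 the exit state, and the probabilities are the reconstructed values):

```python
# The transition matrix A of the example model.
A = [
    [0.0, 1.0, 0.0,  0.0,  0.0],   # entry -> first emitting state
    [0.0, 0.0, 0.67, 0.33, 0.0],   # first state -> second (0.67) or skip it (0.33)
    [0.0, 0.0, 0.5,  0.5,  0.0],   # second state: self-loop (0.5) or move on (0.5)
    [0.0, 0.0, 0.0,  0.0,  1.0],   # third state -> exit
    [0.0, 0.0, 0.0,  0.0,  0.0],   # exit state (no outgoing transitions)
]

# Drawing rule: connect state j to state k iff a_jk > 0.
edges = [(j, k) for j, row in enumerate(A) for k, p in enumerate(row) if p > 0]
```

The self-loop edge allows symbol insertion (repetition) and the skip edge allows deletion, as stated on the slide.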
Slide 12
Formal Definition
A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:
– A number of states N
– An N × N state transition probability matrix A
– For each state k, a set of probabilities pk(1), …, pk(K), where pk(m) is the probability that symbol m occurs in state k
Slide 13
Alignment paths for HMMs
For HMMs, alignment paths are called state sequences
[Alignment grid: states A, B, C (rows) against the sequence Y A B B B X B C (columns)]

p(YABBBXBC) = p2(Y) · a22 p2(A) · a23 p3(B) · a33 p3(B) · a33 p3(B) · a33 p3(X) · a33 p3(B) · a34 p4(C)
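Numerically, a path probability of this form is just a product of emission and transition terms. A minimal sketch (all probability values below are illustrative placeholders, not values from the slides; 2, 3, 4 label the three emitting states):

```python
# Probability of a symbol sequence along one particular state sequence:
# first emission, then (transition * emission) for every later symbol.
a = {(2, 2): 0.5, (2, 3): 0.5, (3, 3): 0.6, (3, 4): 0.4}
p = {2: {"Y": 0.3, "A": 0.7}, 3: {"B": 0.8, "X": 0.2}, 4: {"C": 0.9, "Z": 0.1}}

symbols = "YABBBXBC"
states = [2, 2, 3, 3, 3, 3, 3, 4]   # the state sequence shown in the trellis

prob = p[states[0]][symbols[0]]                       # first emission
for i in range(1, len(symbols)):
    prob *= a[(states[i - 1], states[i])] * p[states[i]][symbols[i]]
```

Each factor in the loop corresponds to one a·p pair in the product on the slide.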
Slide 14
State-symbol trellis
[State-symbol trellis: states A, B, C (rows) against symbols Y A B B B X B C (columns)]
Rule: connect state j at symbol m with state k at symbol m+1 if a_jk > 0
Slide 15
More examples
Slide 16
Dynamic Programming
[State-symbol trellis: states A, B, C against symbols Y A B B B X B C]
At each trellis node the best partial path is extended; for example, the best path into state 4 comes from state 2 or state 3:
max{ α2 · a24, α3 · a34 } · p4(B)
where αj denotes the probability of the best partial path ending in state j at the previous symbol.
Slide 17
Formal Definition
A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:
– A number of states N
– An N × N state transition probability matrix A
– For each state k, a set of probabilities pk(1), …, pk(K), where pk(m) is the probability that symbol m occurs in state k
Slide 18
Alignment paths for HMMs
For HMMs, alignment paths are called state sequences
[Alignment grid: states A, B, C (rows) against the sequence Y A B B B X B C (columns)]

p(YABBBXBC) = p2(Y) · a22 p2(A) · a23 p3(B) · a33 p3(B) · a33 p3(B) · a33 p3(X) · a33 p3(B) · a34 p4(C)
State sequence
Slide 19
The optimal state sequence
Let M be an HMM and s a sequence. The probability on the previous slide depends on the state sequence x and the model, so we write p(s, x | M)
By analogy with dynamic programming, the optimal state sequence x̂ is the sequence such that:
x̂ = argmax_x p(s, x | M)
or, equivalently, p̂(s | M) = max_x p(s, x | M)
Slide 20
Computing the optimal state sequence: the 'state-symbol' trellis
[State-symbol trellis: states A, B, C (rows) against symbols Y A B B B X B C (columns)]
Rule: connect state j at symbol m with state k at symbol m+1 if a_jk > 0
Slide 21
More examples
Slide 22
Dynamic Programming, a.k.a. Viterbi Decoding
[State-symbol trellis: states A, B, C against symbols Y A B B B X B C]
The recursion fills the trellis one symbol at a time; for the node shown, the best path into state 4 comes from state 2 or state 3:
max{ α2 · a24, α3 · a34 } · p4(B)
where αj is the probability of the best partial path ending in state j at the previous symbol. The final value of the recursion is p̂(s | M) = p(s, x̂ | M).
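A minimal Viterbi decoder along these lines might look as follows (a sketch, assuming a discrete-output HMM stored as dictionaries; the model numbers are illustrative placeholders rather than the slide's values):

```python
def viterbi(symbols, states, a, p, init):
    """Best-path probability and state sequence through a discrete HMM."""
    # delta[k]: probability of the best partial path ending in state k
    delta = {k: init.get(k, 0.0) * p[k].get(symbols[0], 0.0) for k in states}
    back = []  # back-pointers for the traceback
    for sym in symbols[1:]:
        prev = delta
        delta, choice = {}, {}
        for k in states:
            # best predecessor j for state k: max_j prev[j] * a_jk
            j = max(states, key=lambda s: prev[s] * a.get((s, k), 0.0))
            delta[k] = prev[j] * a.get((j, k), 0.0) * p[k].get(sym, 0.0)
            choice[k] = j
        back.append(choice)
    last = max(states, key=lambda k: delta[k])
    path = [last]
    for choice in reversed(back):      # trace back the optimal state sequence
        path.append(choice[path[-1]])
    return delta[last], path[::-1]

# Illustrative three-state model (states 2, 3, 4 emitting, as in the trellis).
states = [2, 3, 4]
init = {2: 1.0}
a = {(2, 2): 0.5, (2, 3): 0.5, (3, 3): 0.6, (3, 4): 0.4, (4, 4): 1.0}
p = {2: {"Y": 0.3, "A": 0.7}, 3: {"B": 0.8, "X": 0.2}, 4: {"C": 0.9, "Z": 0.1}}

prob, path = viterbi("YABBBXBC", states, a, p, init)
```

For this model the decoder recovers the state sequence 2 2 3 3 3 3 3 4 shown in the trellis slides.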
Slide 23
Sequence retrieval using HMMs
Corpus of pre-built HMMs
‘query’ sequence Q
…BBCCDDDGDGDGDCDTCDTTDCCC…
Viterbi Decoding
Calculate p(Q | M) for each HMM M in the corpus
M̂ = argmax_M p(Q | M)
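The retrieval loop can be sketched as below (an assumption-laden toy: p(Q|M) is approximated by a best-path score, and the two single-state 'HMMs' are invented for illustration rather than trained models):

```python
def best_path_score(symbols, model):
    """Approximate p(Q|M) by the probability of the best state sequence."""
    states, init, a, p = model["states"], model["init"], model["a"], model["p"]
    delta = {k: init.get(k, 0.0) * p[k].get(symbols[0], 0.0) for k in states}
    for sym in symbols[1:]:
        delta = {
            k: max(delta[j] * a.get((j, k), 0.0) for j in states) * p[k].get(sym, 0.0)
            for k in states
        }
    return max(delta.values())

# Two toy single-state models over the same alphabet.
models = {
    "mostly_AB": {"states": [0], "init": {0: 1.0}, "a": {(0, 0): 1.0},
                  "p": {0: {"A": 0.45, "B": 0.45, "C": 0.1}}},
    "mostly_C":  {"states": [0], "init": {0: 1.0}, "a": {(0, 0): 1.0},
                  "p": {0: {"A": 0.1, "B": 0.1, "C": 0.8}}},
}

Q = "ABABAB"
best_model = max(models, key=lambda name: best_path_score(Q, models[name]))
```

The argmax over models plays the role of the argmin over templates in the earlier DP retrieval scheme.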
Slide 24
Slide 25
Slide 26
Slide 27
Slide 28
HMM Construction
Suppose we have a set of HMMs, each representing a different class (e.g. a protein sequence)
Given an unknown sequence s:
– Use Viterbi decoding to compare s with each HMM
– Compute p̂(s | M) = p(s, x̂ | M)
But how do we obtain the HMM in the first place?
Slide 29
HMM training
Given a set of example sequences S, an HMM M can be built such that p(S|M) is locally maximised
The procedure is as follows:
– Obtain an initial estimate of a suitable model M0
– Apply an algorithm – the 'Baum-Welch' algorithm – to obtain a new model M1 such that p(S|M1) ≥ p(S|M0)
– Repeat to produce a sequence of HMMs M0, M1, …, Mn with:
p(S|M0) ≤ p(S|M1) ≤ p(S|M2) ≤ … ≤ p(S|Mn)
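Baum-Welch itself is beyond these slides, but the inequality chain can be illustrated on a degenerate one-state model, where the re-estimation step reduces to relative frequencies (a hedged toy example, not the full algorithm; the sequences and initial model are invented):

```python
# Toy illustration of one re-estimation step never decreasing p(S|M):
# for a single emitting state, the update is just relative frequencies.
from collections import Counter
from math import prod

S = ["ABC", "AABC", "ABBC"]  # training sequences (invented example)

def likelihood(model, seqs):
    """p(S|M) for a one-state model: product of emission probabilities."""
    return prod(model.get(sym, 0.0) for s in seqs for sym in s)

M0 = {"A": 0.2, "B": 0.2, "C": 0.6}  # initial guess

# Re-estimation: relative frequency of each symbol in the training set.
counts = Counter(sym for s in S for sym in s)
total = sum(counts.values())
M1 = {sym: n / total for sym, n in counts.items()}
```

For this degenerate model a single step already reaches the maximum; with multiple states the same non-decreasing improvement is obtained iteratively, which is exactly the chain p(S|M0) ≤ p(S|M1) ≤ … ≤ p(S|Mn).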
Slide 30
Local optimality
[Plot of P(S|M) against the model sequence M0, M1, …, Mn: training climbs to a local maximum, which may not be the global maximum]
Slide 31
Summary
Hidden Markov Models
Importance of HMMs for sequence matching
Viterbi decoding
HMM training
Slide 32
Summary
Review of template matching
Hidden Markov Models
Dynamic programming for HMMs