
Slide 1

EE3J2 Data Mining

Lecture 14: Introduction to Hidden Markov Models

Martin Russell

Slide 2

Objectives

Limitations of sequence matching

Introduction to hidden Markov models (HMMs)

Slide 3

Sequence retrieval using DP

……

AAGDTDTDTDD

AABBCBDAAAAAAA

BABABABBCCDF

GGGGDDGDGDGDGDTDTD

DGDGDGDGD

AABCDTAABCDTAABCDTAAB

CDCDCDTGGG

GGAACDTGGGGGAAA

…….

…….

Corpus of sequential data

‘query’ sequence Q

…BBCCDDDGDGDGDCDTCDTTDCCC…

Dynamic Programming

Distance calculation: compute ad(S,Q) for each sequence S in the corpus, then select

$\hat{S} = \arg\min_{S} ad(S, Q)$
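To make the retrieval step concrete, here is a minimal Python sketch, assuming ad(S,Q) is the plain Levenshtein distance (unit substitution, insertion, and deletion costs); the function and variable names are illustrative, not from the lecture.

```python
# A minimal sketch of DP-based retrieval, assuming ad(S, Q) is the plain
# Levenshtein distance (unit substitution/insertion/deletion costs).
def ad(s, q):
    """Alignment (edit) distance between sequences s and q, computed by DP."""
    prev = list(range(len(q) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, cq in enumerate(q, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != cq)))   # substitution (0 if match)
        prev = curr
    return prev[-1]

def retrieve(corpus, query):
    """S_hat = argmin over S in the corpus of ad(S, Q)."""
    return min(corpus, key=lambda s: ad(s, query))

corpus = ["AAGDTDTDTDD", "AABBCBDAAAAAAA", "CDCDCDTGGG"]
print(retrieve(corpus, "CDCDTTGG"))   # -> CDCDCDTGGG
```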

Slide 4

Limitations of ‘template matching’

This type of analysis is sometimes referred to as template matching

– The ‘templates’ are the sequences in the corpus

– Each template can be thought of as representing a ‘class’

– The problem is to determine which class best fits the query

– Performance will depend on precisely which template is used to represent the class

Slide 5

Alternative path shapes

The basic units of path considered so far are:

substitution insertion deletion

Others are possible and may have advantages, e.g.:

substitution insertion deletion

Slide 6

Example

Slide 7

Hidden Markov Models (HMMs)

One solution is to replace the individual template sequence with an ‘average’ sequence.

But what is an ‘average’ sequence? One answer is to use a type of statistical model called a Hidden Markov Model.

Slide 8

HMMs

Suppose the following sequences are in the same class:

– ABC, YBBC, ABXC, AZ

Compute alignments:

[Alignment grids: the template ABC aligned against YBBC, ABXC, and AZ, with ABC on the vertical axis of each grid]

Slide 9

Finite State Network Representation

The sequence consists of 3 ‘states’:

– First state is ‘realised’ as A (twice) or Y (once)

– Second state ‘realised’ as B (three times) or X (once)

– Second state can be repeated or deleted

– Third state can be ‘realised’ as C (twice) or Z (once)
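The counts above translate directly into emission probabilities by relative frequency. A small Python sketch (the realisations dictionary just transcribes the bullets above; the names are illustrative):

```python
from collections import Counter

# Emission probabilities by relative frequency, transcribing the counts
# in the bullets above (states 1-3; symbols as realised in the alignments).
realisations = {
    1: ["A", "A", "Y"],        # first state: A twice, Y once
    2: ["B", "B", "B", "X"],   # second state: B three times, X once
    3: ["C", "C", "Z"],        # third state: C twice, Z once
}

emission = {
    state: {sym: n / len(syms) for sym, n in Counter(syms).items()}
    for state, syms in realisations.items()
}
print(emission[1])   # {'A': 0.666..., 'Y': 0.333...}
```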

Slide 10

Network representation

Directed graph representation

Each state is associated with a set of probabilities

– Called the ‘state emission’ probabilities

[Directed graph: three states in sequence. State 1 emits A or Y, with emission probabilities $p_1(A)$, $p_1(Y)$; state 2 emits B or X, with $p_2(B)$, $p_2(X)$; state 3 emits C or Z, with $p_3(C)$, $p_3(Z)$]

Slide 11

Transition probabilities

Transition probabilities control insertions and deletions of symbols

[Transition network: entry state, three emitting states, and exit state, with arcs labelled 1, 0.67, 0.33, 0.5, 0.5 and 1]

$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.67 & 0.33 & 0 \\ 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$

$a_{jk} = \text{Prob}(\text{state } k \text{ follows state } j)$

Basic rule for drawing transition networks: connect state j to state k if $a_{jk} > 0$
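A sketch of the matrix above in Python, together with the drawing rule; the 5 × 5 layout (entry state, three emitting states, exit state) follows the reconstruction above and is an assumption about the original figure.

```python
import numpy as np

# The transition matrix as reconstructed above: states 1..5 are the entry
# state, the A/Y state, the B/X state, the C/Z state, and the exit state.
A = np.array([
    [0, 1, 0,    0,    0],   # entry -> A/Y state
    [0, 0, 0.67, 0.33, 0],   # A/Y -> B/X, or skip B/X (deletion)
    [0, 0, 0.5,  0.5,  0],   # B/X -> itself (repetition) or C/Z
    [0, 0, 0,    0,    1],   # C/Z -> exit
    [0, 0, 0,    0,    0],   # exit: no outgoing transitions
])

# Drawing rule: connect state j to state k iff a_jk > 0.
for j in range(A.shape[0]):
    for k in range(A.shape[1]):
        if A[j, k] > 0:
            print(f"arc {j + 1} -> {k + 1}  (probability {A[j, k]})")
```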

Slide 12

Formal Definition

A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:

– A number of states N

– An N × N state transition probability matrix A

– For each state j, a set of probabilities $p_j(1), \dots, p_j(K)$, where $p_j(k)$ is the probability that symbol k occurs in state j
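A direct transcription of this definition into a Python container; the class and field names are illustrative, not from the lecture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    """An HMM per the definition above; field names are illustrative."""
    A: np.ndarray   # N x N matrix, A[j, k] = Prob(state k follows state j)
    p: np.ndarray   # N x K matrix, p[j, k] = Prob(symbol k emitted in state j)

    @property
    def N(self):    # number of states
        return self.A.shape[0]

    @property
    def K(self):    # number of symbols
        return self.p.shape[1]
```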

Slide 13

Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

[Trellis: the sequence Y A B B B X B C aligned against the states A, B, C, one state per row]

$p(\text{YABBBXBC}) = p_2(Y)\, a_{22}\, p_2(A)\, a_{23}\, p_3(B) \cdots a_{34}\, p_4(C)$
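The product above generalises to any state sequence. A minimal sketch, assuming numpy-style arrays for A and p as in the earlier sketches, 0-indexed states and symbols, and ignoring entry/exit transitions for brevity:

```python
# Probability of a sequence jointly with one particular state sequence:
# the product of emission and transition terms, as in the expression above.
# States and symbols are 0-indexed; entry/exit transitions omitted for brevity.
def path_prob(symbols, states, A, p):
    prob = p[states[0], symbols[0]]             # first emission
    for t in range(1, len(symbols)):
        prob *= A[states[t - 1], states[t]]     # transition a_{jk}
        prob *= p[states[t], symbols[t]]        # emission p_k(symbol)
    return prob
```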

Slide 14

State-symbol trellis

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C]

Rule: connect state j at symbol m with state k at symbol m+1 if $a_{jk} > 0$

Slide 15

More examples

Slide 16

Dynamic Programming

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C, with the DP recursion evaluated at one trellis cell]

$\alpha_k(m) = \max_j \left\{ \alpha_j(m-1)\, a_{jk} \right\} p_k(s_m)$

Slide 17

Formal Definition

A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:

– A number of states N

– An N × N state transition probability matrix A

– For each state j, a set of probabilities $p_j(1), \dots, p_j(K)$, where $p_j(k)$ is the probability that symbol k occurs in state j

Slide 18

Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

[Trellis: the sequence Y A B B B X B C aligned against the states A, B, C, with the state sequence marked]

$p(\text{YABBBXBC}) = p_2(Y)\, a_{22}\, p_2(A)\, a_{23}\, p_3(B) \cdots a_{34}\, p_4(C)$

Slide 19

The optimal state sequence

Let M be an HMM and s a sequence. The probability on the previous slide depends on the state sequence x and the model, so we write $p(s, x \mid M)$.

By analogy with dynamic programming, the optimal state sequence $\hat{x}$ is the sequence such that:

$\hat{x} = \arg\max_x p(s, x \mid M)$, or $p(s, \hat{x} \mid M) = \max_x p(s, x \mid M)$
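This definition can be evaluated literally by enumerating every state sequence, which is exponential in the length of s; the Viterbi decoding on the following slides computes the same maximum efficiently. A brute-force sketch, reusing path_prob from the earlier sketch:

```python
from itertools import product

# Literal evaluation of the definition: enumerate every state sequence x,
# score p(s, x | M) with path_prob (from the earlier sketch), keep the best.
# Exponential in len(s) -- which is why the Viterbi DP that follows matters.
def best_path_brute_force(symbols, A, p):
    best_prob, best_x = 0.0, None
    for x in product(range(A.shape[0]), repeat=len(symbols)):
        prob = path_prob(symbols, x, A, p)
        if prob > best_prob:
            best_prob, best_x = prob, x
    return best_x, best_prob
```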

Slide 20

Computing the optimal state sequence: the ‘state-symbol’ trellis

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C]

Rule: connect state j at symbol m with state k at symbol m+1 if $a_{jk} > 0$

Slide 21

More examples

Slide 22

Dynamic Programming, a.k.a. Viterbi Decoding

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C, with the optimal state sequence traced]

$\alpha_k(m) = \max_j \left\{ \alpha_j(m-1)\, a_{jk} \right\} p_k(s_m), \qquad \hat{p}(s) = p(s, \hat{x} \mid M) = \max_k \alpha_k(T)$

where T is the length of s
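A sketch of Viterbi decoding implementing the recursion above, assuming a uniform initial state distribution and omitting explicit entry/exit states; alpha[k, m] holds the probability of the best partial state sequence ending in state k after the first m+1 symbols.

```python
import numpy as np

# Viterbi decoding for the recursion above. alpha[k, m] is the probability
# of the best partial state sequence that ends in state k after emitting
# the first m+1 symbols; back[k, m] records the best predecessor state.
# Assumes a uniform initial state distribution; entry/exit states omitted.
def viterbi(symbols, A, p):
    N, T = A.shape[0], len(symbols)
    alpha = np.zeros((N, T))
    back = np.zeros((N, T), dtype=int)
    alpha[:, 0] = p[:, symbols[0]] / N
    for m in range(1, T):
        for k in range(N):
            scores = alpha[:, m - 1] * A[:, k]      # alpha_j(m-1) * a_jk
            back[k, m] = int(np.argmax(scores))
            alpha[k, m] = scores[back[k, m]] * p[k, symbols[m]]
    best_final = int(np.argmax(alpha[:, -1]))
    best_prob = float(alpha[best_final, -1])
    x = [best_final]                                # trace the best path back
    for m in range(T - 1, 0, -1):
        x.append(int(back[x[-1], m]))
    return list(reversed(x)), best_prob
```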

Slide 23

Sequence retrieval using HMMs

Corpus of pre-built HMMs

‘query’ sequence Q

…BBCCDDDGDGDGDCDTCDTTDCCC…

Viterbi Decoding

Calculate p(Q|M) for each HMM M in the corpus, then select

$\hat{M} = \arg\max_M p(Q \mid M)$
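The retrieval loop then mirrors the DP version from slide 3, with the alignment distance replaced by a model score; here p(Q|M) is approximated by the best-path (Viterbi) probability, reusing the viterbi and HMM sketches above.

```python
# HMM-based retrieval, mirroring the DP retrieval of slide 3: score the
# query against every model and keep the best. p(Q|M) is approximated by
# the best-path (Viterbi) probability from the sketch above.
def retrieve_hmm(models, query_symbols):
    """M_hat = argmax over models M of p(Q | M)."""
    return max(models, key=lambda m: viterbi(query_symbols, m.A, m.p)[1])
```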

[Slides 24–27: no transcribed content]

Slide 28

HMM Construction

Suppose we have a set of HMMs, each representing a different class (e.g. protein sequence)

Given an unknown sequence s:

– Use Viterbi decoding to compare s with each HMM

– Compute $\hat{p}(s \mid M) = p(s, \hat{x} \mid M)$

But how do we obtain the HMMs in the first place?

Slide 29

HMM training

Given a set of example sequences S, an HMM M can be built such that p(S|M) is locally maximised.

The procedure is as follows:

– Obtain an initial estimate of a suitable model M0

– Apply the ‘Baum-Welch’ algorithm to obtain a new model M1 such that p(S|M1) ≥ p(S|M0)

– Repeat to produce a sequence of HMMs M0, M1, …, Mn with:

p(S|M0) ≤ p(S|M1) ≤ p(S|M2) ≤ … ≤ p(S|Mn)
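A skeleton of this training loop in Python; baum_welch_step and corpus_likelihood are hypothetical helpers standing in for one Baum-Welch re-estimation pass and for the evaluation of p(S|M), neither of which is derived in this lecture.

```python
# Skeleton of the training procedure above. baum_welch_step and
# corpus_likelihood are hypothetical helpers: one Baum-Welch re-estimation
# pass (guaranteed not to decrease p(S|M)) and the evaluation of p(S|M).
def train(sequences, model, max_iters=20, tol=1e-6):
    prev = corpus_likelihood(sequences, model)        # p(S | M_0)
    for _ in range(max_iters):
        model = baum_welch_step(sequences, model)     # M_i -> M_{i+1}
        curr = corpus_likelihood(sequences, model)    # p(S | M_{i+1})
        assert curr >= prev - 1e-12                   # monotonicity guarantee
        if curr - prev < tol:                         # converged (local maximum)
            break
        prev = curr
    return model
```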

Slide 30

Local optimality

[Plot: p(S|M) over the model sequence M0, M1, …, Mn, climbing to a local maximum of p(S|M) rather than the global maximum]

Slide 31

Summary

Hidden Markov Models

Importance of HMMs for sequence matching

Viterbi decoding

HMM training

Slide 32

Summary

Review of template matching

Hidden Markov Models

Dynamic programming for HMMs