
Transcript of: Hidden Markov Model (Continued)

Page 1:

Hidden Markov Model

Continued …

Page 2:

Finite State Markov Chain

A discrete time stochastic process, consisting of a domain D of m states {1,…,m} and

1. An m-dimensional initial distribution vector (p(1),…, p(m)).
2. An m×m transition probabilities matrix M = (a_st).

• For each integer L, a Markov Chain assigns probability to sequences (x1,…,xL) over D (i.e., xi ∈ D) as follows:

p((x1, x2, …, xL)) = p(X1 = x1) · ∏_{i=2}^{L} p(Xi = xi | Xi-1 = xi-1) = p(x1) · ∏_{i=2}^{L} a_{x_{i-1} x_i}

Similarly, (X1,…, Xi,…) is a sequence of probability distributions over D.
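To make the formula concrete, here is a minimal Python sketch (not part of the original slides) that evaluates p((x1,…,xL)) for a small chain over D = {a, c, g, t}; the initial distribution and transition matrix below are made-up illustrative numbers.

```python
# Probability of a sequence under a finite-state Markov chain:
# p((x1,...,xL)) = p(x1) * prod_{i=2..L} a_{x_{i-1} x_i}

# Illustrative parameters (not from the slides): domain D = {a, c, g, t}
initial = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}           # (p(1),...,p(m))
transition = {                                                    # a_st
    "a": {"a": 0.3, "c": 0.2, "g": 0.3, "t": 0.2},
    "c": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "g": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "t": {"a": 0.3, "c": 0.2, "g": 0.3, "t": 0.2},
}

def chain_probability(x, p0, a):
    """Return p(x) for a sequence x = (x1,...,xL) under the chain (p0, a)."""
    prob = p0[x[0]]                      # p(X1 = x1)
    for prev, cur in zip(x, x[1:]):      # multiply a_{x_{i-1} x_i} for i = 2..L
        prob *= a[prev][cur]
    return prob

print(chain_probability("acgt", initial, transition))
```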

Page 3:

Use of Markov Chains: Sequences with CpG Islands

In human genomes the pair CG often transforms to (methyl-C)G, which in turn often transforms to TG.

Hence the pair CG appears less often than would be expected from the independent frequencies of C and G alone.

For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes.

These areas are called CpG islands (the "p" denotes the phosphodiester bond between the C and the G).

Page 4:

Modeling sequences with CpG Islands

The “+” model: Use transition matrix A+ = (a+_st), where a+_st = (the probability that t follows s in a CpG island).

The “-” model: Use transition matrix A- = (a-_st), where a-_st = (the probability that t follows s in a non-CpG island).

Page 5:

(Stationary) Markov Chains

[Diagram: Markov chain X1 → X2 → … → XL-1 → XL]

• Every variable Xi has a domain. For example, suppose the domain is the set of letters {a, c, t, g}.
• Every variable is associated with a local (transition) probability table p(Xi = xi | Xi-1 = xi-1) and p(X1 = x1).
• The joint distribution is given by

p(X1 = x1,…, XL = xL) = p(X1 = x1) p(X2 = x2 | X1 = x1) ⋯ p(XL = xL | XL-1 = xL-1) = p(X1 = x1) ∏_{i=2}^{L} p(Xi = xi | Xi-1 = xi-1)

In short: p(x1,…,xL) = p(x1) ∏_{i=2}^{L} p(xi | xi-1)

Stationary means that the transition probability tables do not depend on i.

Page 6:

Question 1: Using two Markov chains


For CpG islands:

We need to specify pI(xi | xi-1) where I stands for CpG Island.

Transition table pI(Xi | Xi-1):

Xi-1 \ Xi     A        C           T           G
  A          0.2      0.3         0.4         0.1         (row sums to 1)
  C          0.4      p(C | C)    p(T | C)    high
  T          0.1      p(C | T)    p(T | T)    p(G | T)
  G          0.3      p(C | G)    p(T | G)    p(G | G)

Rows must add up to one; columns need not.
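As a side illustration, such a table can be held as a nested dictionary and the row constraint checked mechanically. This is a minimal sketch, not from the slides: only the A row comes from the table above, and the other rows are placeholder values invented so that each row sums to one.

```python
# p_I(x_i | x_{i-1}) for the CpG-island ("+") chain.
# Only the A row is taken from the slide; the other rows are illustrative placeholders.
p_island = {
    "A": {"A": 0.2, "C": 0.3, "T": 0.4, "G": 0.1},
    "C": {"A": 0.4, "C": 0.2, "T": 0.1, "G": 0.3},   # p(G | C) relatively high in an island (illustrative)
    "T": {"A": 0.1, "C": 0.3, "T": 0.3, "G": 0.3},
    "G": {"A": 0.3, "C": 0.3, "T": 0.2, "G": 0.2},
}

# Each row is a conditional distribution, so it must sum to one; columns need not.
for prev, row in p_island.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, f"row {prev} does not sum to 1"
```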

Page 7:

Question 1: Using two Markov chains


For non-CpG islands:

We need to specify pN(xi | xi-1), where N stands for non-CpG island.

Transition table pN(Xi | Xi-1):

Xi-1 \ Xi     A        C           T           G
  A          0.2      0.3         0.25        0.25
  C          0.4      p(C | C)    p(T | C)    low
  T          0.1      p(C | T)    p(T | T)    high
  G          0.3      p(C | G)    p(T | G)    p(G | G)

Some entries may or may not change compared to pI(xi | xi-1).

Page 8:

Question 1: Log Odds-Ratio test

Comparing the two options via a log odds-ratio test yields

log Q = log [ pI(x1,…,xL) / pN(x1,…,xL) ] = Σ_{i=1}^{L} log [ pI(xi | xi-1) / pN(xi | xi-1) ]

If log Q > 0, then a CpG island is more likely. If log Q < 0, then a non-CpG island is more likely.
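A small Python sketch of this test (not from the slides). The transition tables are nested dictionaries like the ones on the previous pages; handling the i = 1 term through the two chains' initial distributions is an assumption about how that boundary case is treated.

```python
from math import log

def log_odds(x, p_island, p_non_island, start_island, start_non_island):
    """log Q = log p_I(x) - log p_N(x) for a sequence x = x1...xL.

    p_island / p_non_island       : nested dicts of transition probabilities
    start_island / start_non_island : initial distributions (illustrative assumption
                                      for the i = 1 term)
    """
    q = log(start_island[x[0]]) - log(start_non_island[x[0]])
    for prev, cur in zip(x, x[1:]):
        q += log(p_island[prev][cur]) - log(p_non_island[prev][cur])
    return q

# Interpretation, as on the slide:
# log Q > 0  ->  CpG island more likely;  log Q < 0  ->  non-CpG island more likely.
```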

Page 9:

Question 2: Finding CpG Islands

Given a long genomic string with possible CpG Islands, we define a Markov Chain over 8 states, all interconnected (hence it is ergodic):

[Diagram: eight fully interconnected states A+, C+, G+, T+ and A-, C-, G-, T-]

The problem is that we do not know the sequence of states that was traversed, only the sequence of letters.

Therefore we use a Hidden Markov Model here.

Page 10:

Hidden Markov Model

A Markov chain (s1,…,sL):

p(s1,…,sL) = ∏_{i=1}^{L} p(si | si-1)

and for each state s and a symbol x we have p(Xi = x | Si = s).

Application in communication: the message sent is (s1,…,sm) but we receive (x1,…,xm). Compute the most likely message sent.

Application in speech recognition: the word said is (s1,…,sm) but we recorded (x1,…,xm). Compute the most likely word said.

[Diagram: HMM with hidden chain S1 → S2 → … → SL-1 → SL, where each Si emits xi]

Page 11:

Hidden Markov Model

Notations:

Markov Chain transition probabilities: p(Si+1 = t | Si = s) = a_st
Emission probabilities: p(Xi = b | Si = s) = e_s(b)


For Markov Chains we know:

p(s) = p(s1,…,sL) = ∏_{i=1}^{L} p(si | si-1)

What is p(s, x) = p(s1,…,sL; x1,…,xL)?

Page 12:

Hidden Markov Model

p(Xi = b | Si = s) = e_s(b) means that the probability of xi depends only on the value of si. Formally, this is equivalent to the conditional independence assumption:

p(Xi = xi | x1,…,xi-1, xi+1,…,xL, s1,…,si,…,sL) = e_si(xi)


Thus

p(s, x) = p(s1,…,sL; x1,…,xL) = ∏_{i=1}^{L} p(si | si-1) · e_si(xi)

Page 13:

Hidden Markov Model for CpG Islands

The states:


Domain(Si) = {+, -} × {A, C, T, G} (8 values)

In this representation P(xi | si) = 0 or 1, depending on whether xi is consistent with si. E.g., xi = G is consistent with si = (+, G) and with si = (-, G), but not with any other state of si.
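For this 8-state model the emission table is therefore deterministic: state (+, b) or (-, b) emits the letter b with probability 1 and any other letter with probability 0. A small sketch of building such a table; encoding states as (sign, letter) tuples is an illustrative choice, not something fixed by the slides.

```python
letters = ["A", "C", "G", "T"]
states = [(sign, b) for sign in ("+", "-") for b in letters]   # the 8 hidden states

# e_s(x) = 1 if x is consistent with s (same letter), else 0.
emission = {s: {x: (1.0 if x == s[1] else 0.0) for x in letters} for s in states}

assert emission[("+", "G")]["G"] == 1.0 and emission[("-", "G")]["G"] == 1.0
assert emission[("+", "G")]["A"] == 0.0
```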

The query of interest:

(s1*,…,sL*) = argmax_{(s1,…,sL)} p(s1,…,sL | x1,…,xL)

Page 14:

Hidden Markov Model

Questions:

Given the “visible” sequence x = (x1,…,xL), find:

1. A most probable (hidden) path.
2. The probability of x.
3. For each i = 1,…,L and for each state k, p(si = k | x).


Page 15:

1. Most Probable state path


First Question: Given an output sequence x = (x1,…,xL), a most probable path s* = (s1*,…,sL*) is one which maximizes p(s | x):

s* = (s1*,…,sL*) = argmax_{(s1,…,sL)} p(s1,…,sL | x1,…,xL)

Page 16:

Viterbi’s Algorithm for Most Probable Path


The task: compute

argmax_{(s1,…,sL)} p(s1,…,sL; x1,…,xL)

Let the states be {1,…,m}.

Idea: for i = 1,…,L and for each state l, compute:

vl(i) = the probability p(s1,…,si-1, si = l; x1,…,xi) of a most probable path up to position i which ends in state l.

Page 17:

Viterbi’s algorithm for most probable path

vl(i) = the probability p(s1,…,si-1, si = l; x1,…,xi) of a most probable path up to position i which ends in state l.

Exercise: Show that for i = 1,…,L and for each state l:

vl(i) = el(xi) · max_k { vk(i-1) · a_kl }


Page 18:

Viterbi’s algorithm


Initialization: v0(0) = 1, vk(0) = 0 for k > 0. (We add the special initial state 0.)

For i = 1 to L do, for each state l:

vl(i) = el(xi) · max_k { vk(i-1) · a_kl }
ptr_i(l) = argmax_k { vk(i-1) · a_kl }   [storing the previous state for reconstructing the path]

Termination / Result: p(s1*,…,sL*; x1,…,xL) = max_k { vk(L) }
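Putting the whole slide together, a compact Python sketch of Viterbi's algorithm might look as follows. The parameter names are illustrative: a0[l] plays the role of the transition a_0l out of the special initial state 0, a[k][l] = a_kl, and e[l][b] = e_l(b). It returns the maximal probability together with a most probable path recovered from the stored pointers.

```python
def viterbi(x, states, a, e, a0):
    """Most probable state path for observations x (a sketch, names illustrative).

    a0[l]   : transition from the special initial state 0 to state l
    a[k][l] : transition probability a_kl
    e[l][b] : emission probability e_l(b)
    Returns (max probability, best path).
    """
    L = len(x)
    v = [{l: 0.0 for l in states} for _ in range(L)]
    ptr = [{l: None for l in states} for _ in range(L)]

    for l in states:                                   # i = 1: come from state 0
        v[0][l] = e[l][x[0]] * a0[l]
    for i in range(1, L):                              # i = 2..L
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] * a[k][l])
            ptr[i][l] = best_k                         # store previous state
            v[i][l] = e[l][x[i]] * v[i - 1][best_k] * a[best_k][l]

    last = max(states, key=lambda k: v[L - 1][k])      # termination: max_k v_k(L)
    path = [last]
    for i in range(L - 1, 0, -1):                      # reconstruct the path via pointers
        path.append(ptr[i][path[-1]])
    return v[L - 1][last], list(reversed(path))
```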

Page 19:

2. Computing p(x)


Given an output sequence x = (x1,…,xL), compute the probability that this sequence was generated:

p(x) = Σ_s p(x, s),

where the summation is taken over all state paths s generating x.
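Taken literally, this sum enumerates all m^L state paths. The brute-force sketch below (illustrative parameter names, same conventions as the Viterbi sketch above) computes it directly and is useful only as a sanity check on tiny inputs; the forward algorithm on the next pages computes the same quantity in O(m²L) time.

```python
from itertools import product

def p_x_brute_force(x, states, a, e, a0):
    """p(x) = sum over all state paths s of p(s, x); exponential, for illustration only."""
    total = 0.0
    for path in product(states, repeat=len(x)):
        prob = a0[path[0]] * e[path[0]][x[0]]
        for prev, cur, sym in zip(path, path[1:], x[1:]):
            prob *= a[prev][cur] * e[cur][sym]
        total += prob
    return total
```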

Page 20:

Forward algorithm for computing p(x)

The task: compute

p(x) = Σ_s p(x, s)

Idea: for i=1,…,L and for each state l, compute:

fl(i) = p(x1,…,xi; si = l), the total probability of all paths that emit (x1,…,xi) and end in state si = l.

Use the recursive formula:

fl(i) = el(xi) · Σ_k fk(i-1) · a_kl

Page 21:

Forward algorithm for computing p(x)


Similar to Viterbi’s algorithm:

Initialization: f0(0) := 1, fk(0) := 0 for k > 0.

For i = 1 to L do, for each state l:

fl(i) = el(xi) · Σ_k fk(i-1) · a_kl

Result: p(x1,…,xL) = Σ_k fk(L)
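A Python sketch of the forward algorithm under the same illustrative conventions as the Viterbi sketch (a0[l] for the transition out of the special state 0, a[k][l] = a_kl, e[l][b] = e_l(b)). The list index i is 0-based, so f[i][l] corresponds to f_l(i+1) on the slide.

```python
def forward(x, states, a, e, a0):
    """Forward algorithm (sketch): returns (p(x), table f), with f[i][l] = f_l(i+1)."""
    L = len(x)
    f = [{l: 0.0 for l in states} for _ in range(L)]
    for l in states:                                   # i = 1: come from state 0
        f[0][l] = e[l][x[0]] * a0[l]
    for i in range(1, L):                              # i = 2..L
        for l in states:
            f[i][l] = e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
    return sum(f[L - 1][k] for k in states), f         # p(x) = sum_k f_k(L)
```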

Page 22:

3. The distribution of Si, given x


Given an output sequence x = (x1,…,xL), compute for each i = 1,…,L and for each state k the probability that si = k; i.e., p(si = k | x).

This helps answer queries like: what is the probability that si is in a CpG island, etc.

Page 23:

Solution in two stages

1. For a single i and each state k, compute p(si = k | x1,…,xL).


2. Do the same computation for every i = 1, .. , L but without repeating the first task L times.

Page 24:

Computing for a single i:


p(si | x1,…,xL) = p(si, x1,…,xL) / p(x1,…,xL)

Page 25:

Decomposing the computation

P(x1,…,xL,si) = P(x1,…,xi,si) P(xi+1,…,xL | x1,…,xi,si)

(by the equality p(A, B) = p(A) p(B | A)).


P(x1,…,xi, si) = f_si(i) ≡ F(si), so we are left with the task of computing P(xi+1,…,xL | x1,…,xi, si) ≡ B(si).

Page 26:

Decomposing the computation


Exercise: Show from the definitions of Markov Chain and Hidden Markov Model that:

P(xi+1,…,xL | x1,…,xi, si) = P(xi+1,…,xL | si)

Denote P(xi+1,…,xL | si) ≡ B(si).

Page 27:

Decomposing the computation

Summary:

P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | x1,…,xi, si)
             = P(x1,…,xi, si) · P(xi+1,…,xL | si) ≡ F(si) · B(si)

The second equality holds because {xi+1,…,xL} and {x1,…,xi} are independent given si (by the Exercise).

Page 28:

F(si): The Forward algorithm:


Initialization: F(0) = 1.

For i = 1 to L do, for each state si:

F(si) = e_si(xi) · Σ_{si-1} F(si-1) · a_{si-1, si}

The algorithm computes F(si) = P(x1,…,xi,si) for i=1,…,L (namely, considering evidence up to time slot i).

Page 29:

B(si): The backward algorithm

The task: Compute B(si) = P(xi+1,…,xL|si) for i=L-1,…,1 (namely, considering evidence after time slot i).


{First step, step L-1: compute B(sL-1).}

B(sL-1) = P(xL | sL-1) = Σ_{sL} P(xL, sL | sL-1) = Σ_{sL} P(sL | sL-1) · P(xL | sL)

{Step i: compute B(si) from B(si+1).}

B(si) = P(xi+1,…,xL | si) = Σ_{si+1} P(si+1 | si) · P(xi+1 | si+1) · P(xi+2,…,xL | si+1),

where the last factor is B(si+1).
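A Python sketch of the backward pass under the same illustrative conventions as before. Setting B(sL) = 1 for every value of sL is a convention equivalent to the slide's first step, since it makes B(sL-1) = Σ_{sL} P(sL | sL-1) P(xL | sL); list indices are again 0-based, so b[i][s] corresponds to B(si+1) in the slide's 1-based notation.

```python
def backward(x, states, a, e):
    """Backward algorithm (sketch): b[i][s] = P(x_{i+2},...,x_L | s_{i+1} = s)."""
    L = len(x)
    b = [{s: 0.0 for s in states} for _ in range(L)]
    for s in states:
        b[L - 1][s] = 1.0                              # B(s_L) = 1: no letters after x_L
    for i in range(L - 2, -1, -1):                     # positions L-1 down to 1
        for s in states:
            b[i][s] = sum(a[s][t] * e[t][x[i + 1]] * b[i + 1][t] for t in states)
    return b
```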

Page 30:

The combined answer

1. To compute the probability that Si = si given {x1,…,xL}: run the forward algorithm to compute F(si) = P(x1,…,xi, si) and the backward algorithm to compute B(si) = P(xi+1,…,xL | si). The product F(si)·B(si) = P(x1,…,xL, si) gives the answer (for every possible value si) once it is divided by p(x1,…,xL) = Σ_si F(si)·B(si), as on Page 24.

2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing F(si) and B(si) for every i (and every value of si), and compute F(si)·B(si) for every i.
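Combining the two passes in Python, assuming the forward() and backward() sketches given earlier in this transcript are in scope (all names illustrative):

```python
def state_posteriors(x, states, a, e, a0):
    """p(s_i = k | x) for every position i and state k, via forward and backward tables.

    Relies on the forward() and backward() sketches above; F(s_i) * B(s_i) = P(x, s_i),
    and dividing by p(x) = sum_k F * B gives the conditional probability.
    """
    px, f = forward(x, states, a, e, a0)
    b = backward(x, states, a, e)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]
```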


Page 31:

Time and Space Complexity of the forward/backward algorithms

Time complexity is O(m²L), where m is the number of states. It is linear in the length of the chain, provided the number of states is a constant.


Space complexity is O(mL) for storing the forward/backward tables (plus O(m²) for the transition matrix).