Hidden Markov Models


Transcript of Hidden Markov Models

Page 1: Hidden Markov Models

Markov chain and Hidden Markov Models

Nasir and Rajab

Dr. Pan, Spring 2014

Page 2: Hidden Markov Models

MARKOV CHAINS:

One assumption about the stochastic process leads to a Markov chain, which has the following key property:

A stochastic process {X_t} is said to have the Markovian property if the state of the system at time t+1 depends only on the state of the system at time t:

P[X_{t+1} = x | X_t = x_t, X_{t-1} = x_{t-1}, ..., X_1 = x_1, X_0 = x_0] = P[X_{t+1} = x | X_t = x_t]

 

Stationarity Assumption:

Transition probabilities are independent of t when the process is “stationary”, so

p_ij = P[X_{t+1} = j | X_t = i] for all t.

This means that if the system is in state i, the probability that the system will next move to state j is p_ij.

Page 3: Hidden Markov Models

Because the p_ij are conditional probabilities, they must be nonnegative, and since the process must make a transition into some state, they must satisfy the properties:

1. p_ij ≥ 0 for all i and j.

2. Σ_j p_ij = 1 for all i.

The n-step transition matrix

The n-step transition matrix gives the probability of going from state i to state j in n steps; it is obtained by raising the one-step transition matrix to the n-th power.

The Markov chains we consider have the following properties:

1. A finite number of states.

2. Stationary transition probabilities.

We will also assume that we know the initial probabilities P[X_0 = i] for all i.

Page 4: Hidden Markov Models

- Irreducible Markov chain: A Markov chain is irreducible if the corresponding graph is strongly connected.

- Recurrent and transient states

A and B are transient states, C and D are recurrent states.

Once the process moves from B to D, it will never come back.

(Diagrams: two example state graphs, one with states A, B, C, D and one with states A, B, C, D, E.)

Page 5: Hidden Markov Models

The period of a state

A Markov Chain is periodic if all the states in it have a period k >1.

It is aperiodic otherwise.

Ergodic

A Markov chain is ergodic if:

1. The corresponding graph is strongly connected.

2. It is not periodic.

(Diagrams: example state graphs with states A, B, C, D and A, B, C, D, E.)

Page 6: Hidden Markov Models

Markov Chain Example

• Based on the weather today what will it be tomorrow?

• Assuming only four possible weather states

° Sunny

° Cloudy

° Rainy

° Snowing

Page 7: Hidden Markov Models

Markov Chain Structure

• Each state is an observable event

• At each time interval the state changes to another state or stays the same (q_t ∈ {S1, S2, S3, S4})

(Diagram: four states S1–S4, labeled Sunny, Cloudy, Rainy, and Snowing.)

Page 8: Hidden Markov Models

Markov Chain Structure

(Diagram: the four weather states Sunny, Cloudy, Rainy, and Snowy with transitions between them.)

Page 9: Hidden Markov Models

Markov Chain Transition Probabilities

• Transition probability matrix:

                 Time t + 1
State       S1    S2    S3    S4    Total
Time t
  S1        a11   a12   a13   a14    1
  S2        a21   a22   a23   a24    1
  S3        a31   a32   a33   a34    1
  S4        a41   a42   a43   a44    1

a_ij = P(q_{t+1} = s_j | q_t = s_i)

Page 10: Hidden Markov Models

Markov Chain Transition Probabilities

• Probabilities for tomorrow’s weather based on today’s weather

                 Time t + 1
State       Sunny   Cloudy   Rainy   Snowing
Time t
  Sunny      0.6     0.3      0.1     0.0
  Cloudy     0.2     0.4      0.3     0.1
  Rainy      0.1     0.2      0.5     0.2
  Snowing    0.0     0.3      0.2     0.5
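The table above fully specifies the weather chain. As a minimal illustration (not part of the slides), the same matrix can be entered in R with the markovchain package that is used later on Page 32; raising the chain to a power then gives the n-step transition probabilities mentioned on Page 3.

library(markovchain)

# weather chain from the table above
weatherStates <- c("Sunny", "Cloudy", "Rainy", "Snowing")
weatherMatrix <- matrix(c(0.6, 0.3, 0.1, 0.0,
                          0.2, 0.4, 0.3, 0.1,
                          0.1, 0.2, 0.5, 0.2,
                          0.0, 0.3, 0.2, 0.5),
                        nrow = 4, byrow = TRUE,
                        dimnames = list(weatherStates, weatherStates))
mcWeather <- new("markovchain", states = weatherStates, byrow = TRUE,
                 transitionMatrix = weatherMatrix, name = "Weather")

mcWeather ^ 2   # 2-step transition matrix: weather two days ahead given today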

Page 11: Hidden Markov Models

(Diagram: state-transition graph for Sunny, Cloudy, Rainy, and Snowing, with edges labeled by the probabilities from the table on Page 10.)

Page 12: Hidden Markov Models

Markov Chain Models

A Markov Chain Model for DNA

(Diagram: a begin state and four states A, C, G, T, with arrows labeled by transition probabilities.)

Example transition probabilities out of state g:

Pr(X_i = a | X_{i-1} = g) = 0.1
Pr(X_i = c | X_{i-1} = g) = 0.1
Pr(X_i = g | X_{i-1} = g) = 0.4
Pr(X_i = t | X_{i-1} = g) = 0.1

A = Adenine, C = Cytosine, G = Guanine, T = Thymine

Page 13: Hidden Markov Models

The Probability of a Sequence for a Given Markov Chain Model

(Diagram: the DNA chain with begin and end states added, and states A, C, G, T.)

Pr(cggt) = Pr(c | begin) Pr(g | c) Pr(g | g) Pr(t | g) Pr(end | t)
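As a minimal sketch (not part of the slides), this product can be computed directly in R. The transition matrix below is the CpG-island matrix shown later on Page 21; the begin-state probabilities are an assumed uniform placeholder, and the end state is omitted for simplicity.

# transition matrix (CpG model, Page 21) and assumed uniform begin probabilities
trans <- matrix(c(0.18, 0.27, 0.43, 0.12,
                  0.17, 0.37, 0.27, 0.19,
                  0.16, 0.34, 0.38, 0.12,
                  0.08, 0.36, 0.38, 0.18),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("a", "c", "g", "t"), c("a", "c", "g", "t")))
begin <- c(a = 0.25, c = 0.25, g = 0.25, t = 0.25)   # assumption, not from the slides

# probability of a sequence under a first-order Markov chain
seq_prob <- function(x, trans, begin) {
  p <- begin[x[1]]
  for (i in seq_along(x)[-1]) p <- p * trans[x[i - 1], x[i]]
  unname(p)
}

seq_prob(c("c", "g", "g", "t"), trans, begin)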

Page 14: Hidden Markov Models

Markov Chain Notation

The transition parameters can be denoted by a_{x_{i-1} x_i}, where

a_{x_{i-1} x_i} = Pr(X_i = x_i | X_{i-1} = x_{i-1})

• Similarly, we can denote the probability of a sequence x as

Pr(x) = Pr(x_1) ∏_{i=2}^{M} Pr(x_i | x_{i-1}) = a_{B x_1} ∏_{i=2}^{M} a_{x_{i-1} x_i}

where a_{B x_1} represents the transition from the begin state.

• This gives a probability distribution over sequences of length M.

Page 15: Hidden Markov Models

Estimating the Model Parameters

Given some data (e.g. a set of sequences from CpG islands), how can we

determine the probability parameters of our model?

* One approach: maximum likelihood estimation

* A Bayesian Approach

The "p" in CpG indicates that the C and the G are next to each other in

sequence, regardless of being single- or double-stranded. In a CpG site, both C

and G are found on the same strand of DNA or RNA and are connected by a

phosphodiester bond. This is a covalent bond between atoms.

Page 16: Hidden Markov Models

Maximum Likelihood Estimation

• Let’s use a very simple sequence model: every position is independent of the others, and every position is generated from the same multinomial distribution.

We want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t)

and we’re given the sequences

accgcgctta
gcttagtgac
tagccgttac

then the maximum likelihood estimates are the observed frequencies of the bases:

Pr(a) = 6/30 = 0.2
Pr(c) = 9/30 = 0.3
Pr(g) = 7/30 ≈ 0.233
Pr(t) = 8/30 ≈ 0.267

In general, Pr(a) = n_a / Σ_i n_i.
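A minimal R sketch (not part of the slides) that reproduces these estimates simply by counting base frequencies:

# maximum likelihood estimates = observed base frequencies
seqs  <- c("accgcgctta", "gcttagtgac", "tagccgttac")
bases <- unlist(strsplit(seqs, ""))
table(bases) / length(bases)   # a = 0.2, c = 0.3, g = 7/30, t = 8/30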

Page 17: Hidden Markov Models

Maximum Likelihood Estimation

• Suppose instead we saw the following sequences

gccgcgcttg
gcttggtggc
tggccgttgc

• Then the maximum likelihood estimates are

Pr(a) = 0/30 = 0
Pr(c) = 9/30 = 0.3
Pr(g) = 13/30 ≈ 0.433
Pr(t) = 8/30 ≈ 0.267

Page 18: Hidden Markov Models

A Bayesian Approach

• A more general form: m-estimates

Pr(a) = (n_a + m·p_a) / (Σ_i n_i + m)

where m is the number of “virtual” instances and p_a is the prior probability of a.

• With m = 8 and uniform priors, and the sequences

gccgcgcttg
gcttggtggc
tggccgttgc

Pr(c) = (9 + 8 × 0.25) / (30 + 8) = 11/38 ≈ 0.29
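A minimal R sketch (not part of the slides) of the same m-estimate, with m = 8 virtual instances and a uniform prior of 0.25 per base:

# m-estimates for the base probabilities
n <- c(a = 0, c = 9, g = 13, t = 8)   # base counts in the three sequences above
m <- 8
p <- 0.25                             # uniform prior
(n + m * p) / (sum(n) + m)            # e.g. Pr(c) = (9 + 2) / 38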

Page 19: Hidden Markov Models

Estimation for 1st Order Probabilities

To estimate a 1st order parameter, such as Pr(c|g), we count the number of times that c follows the history g in our given sequences

using Laplace estimates with the sequences

gccgcgcttg

gcttggtggc

tggccgttgc

Pr(a | g) = (0 + 1) / (12 + 4)
Pr(c | g) = (7 + 1) / (12 + 4)
Pr(g | g) = (3 + 1) / (12 + 4)
Pr(t | g) = (2 + 1) / (12 + 4)

Pr(a | c) = (0 + 1) / (7 + 4)
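A minimal R sketch (not part of the slides) that counts the first-order transitions in these three sequences and applies the add-one (Laplace) correction:

# Laplace estimates of first-order transition probabilities
seqs   <- c("gccgcgcttg", "gcttggtggc", "tggccgttgc")
bases  <- c("a", "c", "g", "t")
counts <- matrix(0, 4, 4, dimnames = list(bases, bases))
for (s in strsplit(seqs, "")) {
  for (i in 2:length(s)) counts[s[i - 1], s[i]] <- counts[s[i - 1], s[i]] + 1
}
laplace <- (counts + 1) / (rowSums(counts) + 4)   # add-one numerator, add-four denominator
laplace["g", ]   # Pr(a|g), Pr(c|g), Pr(g|g), Pr(t|g) as above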

Page 20: Hidden Markov Models

Example Application: Markov Chains for Discrimination

• Suppose we want to distinguish CpG islands from other sequence

regions

• given sequences from CpG islands, and sequences from other regions,

we can construct

• A model to represent CpG islands.

• A null model to represent the other regions.

Page 21: Hidden Markov Models

Markov Chains for Discrimination

+ a c g t

a .18 .27 .43 .12

c .17 .37 .27 .19

g .16 .34 .38 .12

t .08 .36 .38 .18

- a c g t

a .30 .21 .28 .21

c .32 .30 .08 .30

g .25 .24 .30 .21

t .18 .24 .29 .29

• Parameters estimated for the CpG and null models from human sequences containing 48 CpG islands (60,000 nucleotides).

(Each entry gives Pr(column | row), e.g. Pr(c | a); the + table is the CpG model and the − table is the null model.)
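As a minimal sketch (not part of the slides), discrimination can be done by scoring a sequence with both models and taking the log-odds ratio; a positive score favours the CpG (+) model and a negative score favours the null (−) model. The begin-state transition is ignored here for simplicity.

# CpG (+) and null (-) transition matrices from the tables above
dna  <- c("a", "c", "g", "t")
plus <- matrix(c(.18, .27, .43, .12,
                 .17, .37, .27, .19,
                 .16, .34, .38, .12,
                 .08, .36, .38, .18), nrow = 4, byrow = TRUE,
               dimnames = list(dna, dna))
null <- matrix(c(.30, .21, .28, .21,
                 .32, .30, .08, .30,
                 .25, .24, .30, .21,
                 .18, .24, .29, .29), nrow = 4, byrow = TRUE,
               dimnames = list(dna, dna))

# log-odds score: sum over positions of log( plus transition / null transition )
log_odds <- function(s, plus, null) {
  x <- unlist(strsplit(s, ""))
  score <- 0
  for (i in 2:length(x)) score <- score + log(plus[x[i - 1], x[i]] / null[x[i - 1], x[i]])
  score
}

log_odds("cgcgcgta", plus, null)   # > 0 suggests a CpG island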

Page 22: Hidden Markov Models

(Figures: transition diagrams for the CpG matrix and the null matrix.)

Page 23: Hidden Markov Models

Hidden Markov Models (HMM)

“A doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols.”

— Rabiner & Juang, 1986

Page 24: Hidden Markov Models

Difference between Markov chains and HMMs

• In a Markov chain, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.

Page 25: Hidden Markov Models

HMM Example

• Suppose we want to determine the average annual temperature at a

particular location on earth over a series of years.

• We consider two annual temperature states, “hot” (H) and “cold” (C).

            Time t + 1
State       H     C
Time t
  H        0.7   0.3
  C        0.4   0.6

The state is the average annual temperature.

The transition from one state to the next is a Markov process

Page 26: Hidden Markov Models

Since we can't observe the state (the annual temperature) in the past, we instead observe the size of tree rings.

Page 27: Hidden Markov Models

Now suppose that current research indicates a correlation between the size of tree growth rings and temperature. We consider only three different tree-ring sizes: small, medium, and large.

The probabilistic relationship between annual temperature and tree-ring sizes is:

            Observation
State       S     M     L
  H        0.1   0.4   0.5
  C        0.7   0.2   0.1

Page 28: Hidden Markov Models

In this example, suppose that the initial state distribution, denoted by π, is:

Page 29: Hidden Markov Models

       H     C
π =  (0.6,  0.4)

Suppose the observed tree-ring sizes over four years are (S, M, S, L). The probability of the state sequence X = (H, H, C, C) is given by:

P(X) = π_H · b_H(S) · a_HH · b_H(M) · a_HC · b_C(S) · a_CC · b_C(L)
     = 0.6 × 0.1 × 0.7 × 0.4 × 0.3 × 0.7 × 0.6 × 0.1
     = 0.000212

Page 30: Hidden Markov Models

State sequence   Probability   Normalized

HHHH 0.000412 0.042787

HHHC 0.000035 0.003635

HHCH 0.000706 0.073320

HHCC 0.000212 0.022017

HCHH 0.000050 0.005193

HCHC 0.000004 0.000415

HCCH 0.000302 0.031364

HCCC 0.000091 0.009451

CHHH 0.001098 0.114031

CHHC 0.000094 0.009762

CHCH 0.001882 0.195451

CHCC 0.000564 0.058573

CCHH 0.000470 0.048811

CCHC 0.000040 0.004154

CCCH 0.002822 0.293073

CCCC 0.000847 0.087963

Table 1: State sequence probabilities

To find the optimal state sequence in the dynamic programming (DP) sense, we simply choose the sequence with the highest probability, namely CCCH.

Page 31: Hidden Markov Models

Position    0          1          2          3

P(H) 0.188182 0.519576 0.228788 0.804029

P(C) 0.811818 0.480424 0.771212 0.195971

Table 2: HMM probabilities

From Table 2 we find that the optimal sequence in the HMM sense is CHCH. Here the optimal DP sequence differs from the optimal HMM sequence, although all of its state transitions happen to be valid.

Note that the DP solution and the HMM solution are not necessarily the same. For example, the DP solution must consist of valid state transitions, while this is not necessarily the case for the HMM solution.
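A minimal R sketch (not part of the slides) that enumerates all 16 state sequences for the observed ring sizes (S, M, S, L) used in this example, reproducing the probabilities in Table 1 and the per-position probabilities in Table 2:

# model parameters from Pages 25, 27 and 28
A   <- matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2, byrow = TRUE,
              dimnames = list(c("H", "C"), c("H", "C")))          # transitions
B   <- matrix(c(0.1, 0.4, 0.5, 0.7, 0.2, 0.1), nrow = 2, byrow = TRUE,
              dimnames = list(c("H", "C"), c("S", "M", "L")))     # emissions
pi0 <- c(H = 0.6, C = 0.4)                                        # initial distribution
obs <- c("S", "M", "S", "L")                                      # observed ring sizes

# probability of every possible 4-year state sequence (Table 1)
states <- expand.grid(rep(list(c("H", "C")), 4), stringsAsFactors = FALSE)
probs  <- apply(states, 1, function(x) {
  p <- pi0[x[1]] * B[x[1], obs[1]]
  for (t in 2:4) p <- p * A[x[t - 1], x[t]] * B[x[t], obs[t]]
  p
})
names(probs) <- apply(states, 1, paste0, collapse = "")
round(cbind(probability = probs, normalized = probs / sum(probs)), 6)

# per-position probability that the state is H (Table 2)
post <- probs / sum(probs)
sapply(1:4, function(t) sum(post[states[, t] == "H"]))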

Page 32: Hidden Markov Models

R code for the CpG matrix:

library(markovchain)

# states and transition matrix for the CpG (+) model from Page 21
DNAStates <- c("A", "C", "G", "T")
byRow <- TRUE
DNAMatrix <- matrix(data = c(0.18, 0.27, 0.43, 0.12,
                             0.17, 0.37, 0.27, 0.19,
                             0.16, 0.34, 0.38, 0.12,
                             0.08, 0.36, 0.38, 0.18),
                    byrow = byRow, nrow = 4,
                    dimnames = list(DNAStates, DNAStates))
mcDNA <- new("markovchain", states = DNAStates, byrow = byRow,
             transitionMatrix = DNAMatrix, name = "DNA")
plot(mcDNA)

R code for the null matrix:

# same states, transition matrix for the null (-) model from Page 21
DNAStates <- c("A", "C", "G", "T")
byRow <- TRUE
DNAMatrix <- matrix(data = c(0.30, 0.21, 0.28, 0.21,
                             0.32, 0.30, 0.08, 0.30,
                             0.25, 0.24, 0.30, 0.21,
                             0.18, 0.24, 0.29, 0.29),
                    byrow = byRow, nrow = 4,
                    dimnames = list(DNAStates, DNAStates))
mcDNAnull <- new("markovchain", states = DNAStates, byrow = byRow,
                 transitionMatrix = DNAMatrix, name = "DNA")
plot(mcDNAnull)

Page 33: Hidden Markov Models

References:

http://www.scs.leeds.ac.uk/scs-only/teaching-materials/HiddenMarkovModels/html_dev/main.html

Mark Stamp, "A Revealing Introduction to Hidden Markov Models," Department of Computer Science, San Jose State University, September 28, 2012.