3
The Noisy Transmission Model
States: 0, 1 (signal) and I0, I1 (noise).

Emissions:
        0     1     -
  0     0.75  0.20  0.05
  1     0.20  0.75  0.05
  I0    0.50  0.50  --
  I1    0.50  0.50  --

Transitions:
        0     1     I0    I1
  0     0.63  0.27  0.10  --
  1     0.09  0.81  --    0.10
  I0    0.56  0.24  0.20  --
  I1    0.08  0.72  --    0.20

Stationary distribution: (8, 24, 1, 3)/36 over (0, 1, I0, I1)
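The tables above can be encoded and sanity-checked directly; a minimal Python sketch (the state names and dictionary layout are mine):

```python
# Noisy transmission HMM: states 0 and 1 are signal states that can emit
# their bit, the flipped bit, or a blank '-'; I0 and I1 are noise states
# that emit 0/1 uniformly and never emit a blank.
emission = {
    '0':  {'0': 0.75, '1': 0.20, '-': 0.05},
    '1':  {'0': 0.20, '1': 0.75, '-': 0.05},
    'I0': {'0': 0.50, '1': 0.50},
    'I1': {'0': 0.50, '1': 0.50},
}
transition = {          # rows: from-state, columns: to-state
    '0':  {'0': 0.63, '1': 0.27, 'I0': 0.10, 'I1': 0.00},
    '1':  {'0': 0.09, '1': 0.81, 'I0': 0.00, 'I1': 0.10},
    'I0': {'0': 0.56, '1': 0.24, 'I0': 0.20, 'I1': 0.00},
    'I1': {'0': 0.08, '1': 0.72, 'I0': 0.00, 'I1': 0.20},
}
# Stationary distribution claimed on the slide: (8, 24, 1, 3)/36
pi = {'0': 8/36, '1': 24/36, 'I0': 1/36, 'I1': 3/36}
for s in pi:  # check that pi * T == pi
    assert abs(sum(pi[r] * transition[r][s] for r in pi) - pi[s]) < 1e-12
```

Running the check confirms that (8, 24, 1, 3)/36 is indeed stationary for this transition matrix.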
4
Questions
• Given an output sequence (including blanks), what is the most probable path which yields this sequence? (1.c): Viterbi algorithm
• Given an output sequence, what is the most probable path to yield it, which passes through M non-noise states (0/1)? (1.d)
• Given an output sequence, what is the most probable path to yield it? (bonus)
• Given an output sequence, what is the most probable transmission? Problem: each transmission corresponds to multiple paths!
5
Answer to 1d
• Given an output sequence X1,…,Xn and M, we calculate the following values for all states S, i=1..n and j=1..M:
• vS(i,j) – log-probability of the most probable path yielding output X1,…,Xi, passing through j non-noise states, and ending in state S.
• Initialize: vS(0,0) – initial log-probability of S (stationary distribution)
• For i>0 and a=0/1 (only noise states can have j=0):
    vIa(i,0) = eIa(Xi) + maxS[ vS(i-1,0) + t(S,Ia) ],   va(i,0) = -∞
• For j>0 and a=0/1 (only deletions consume no output):
    va(0,j) = ea(-) + maxS[ vS(0,j-1) + t(S,a) ],   vIa(0,j) = -∞
• Most values are -∞
• t(∙,∙), e(∙,∙) are log-probabilities
• Hold update-pointers
6
Answer to 1d
• Given an output sequence X1,…,Xn and M, we calculate the following values for all states S, i=1..n and j=1..M:
• vS(i,j) – log-probability of most probable path yielding output X1,
…,Xi, passing through j non-noise states, and ending in state S.
• Recursion formulae (for i,j>0 and a=0/1):
    vIa(i,j) = eIa(Xi) + maxS[ vS(i-1,j) + t(S,Ia) ]
    va(i,j) = max{ ea(Xi) + maxS[ vS(i-1,j-1) + t(S,a) ],
                   ea(-) + maxS[ vS(i,j-1) + t(S,a) ] }
• Hold update-pointers
• At the end choose maxS vS(n,M) and follow the pointers to recover the path
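The initialization and recursion for 1.d can be sketched as a dynamic program. This is a sketch under my reading of the slides (blanks '-' are deletions that do not appear in the output; all variable and function names are mine; t, e, pi are passed as plain probabilities and moved to log space):

```python
import math

SIGNAL, NOISE = ('0', '1'), ('I0', 'I1')
STATES = SIGNAL + NOISE

def _log(p):
    return math.log(p) if p > 0 else -math.inf

def best_path(X, M, t, e, pi):
    """Most probable path emitting X while visiting M non-noise states."""
    n = len(X)
    v = {(S, 0, 0): _log(pi[S]) for S in STATES}   # v_S(0,0): stationary
    ptr = {}
    get = lambda S, i, j: v.get((S, i, j), -math.inf)  # most values are -inf
    for j in range(M + 1):
        for i in range(n + 1):
            for a in SIGNAL:
                if j == 0:
                    continue                # a non-noise state needs j >= 1
                cands = [(get(S, i, j - 1) + _log(t[S][a]) + _log(e[a]['-']),
                          S, i) for S in STATES]          # emit a deletion
                if i > 0:                                 # emit X_i
                    cands += [(get(S, i - 1, j - 1) + _log(t[S][a])
                               + _log(e[a][X[i - 1]]), S, i - 1)
                              for S in STATES]
                score, S, ip = max(cands)
                v[(a, i, j)] = score
                ptr[(a, i, j)] = (S, ip, j - 1)
            if i > 0:                       # noise states keep j fixed
                for Ia in NOISE:
                    score, S = max((get(S, i - 1, j) + _log(t[S][Ia])
                                    + _log(e[Ia][X[i - 1]]), S)
                                   for S in STATES)
                    v[(Ia, i, j)] = score
                    ptr[(Ia, i, j)] = (S, i - 1, j)
    score, S = max((get(S, n, M), S) for S in STATES)
    cell, path = (S, n, M), [S]             # follow pointers backwards
    while cell in ptr:
        cell = ptr[cell]
        path.append(cell[0])
    return score, path[::-1]                # path[0] is the start state
```

The table has O(n·M) cells per state and each update is O(1) over the four states, matching the slide's recursion term for term.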
7
Bonus
• Given an output sequence, what is the most probable path to yield it?
Approach 1:
• If we don't know M, we can fill in the tables column by column (j = 1, 2, …)
• Eventually the probability of the best entry in a column starts deteriorating, and we can stop
Approach 2: a-priori bound
• Note that an optimal path doesn't have 2 consecutive deletions (-):
    Pr[Si → Si+1(-) → Si+2(-)] < Pr[Si → Si+2(-)]
  so collapsing two consecutive deletions into one yields a more probable path for the same output
• Conclusion: between consecutive emitted symbols (and at the two ends) there is at most one deletion, so M < 2n+2
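The no-two-consecutive-deletions claim can be checked numerically against this model's parameters (the encoding is mine; only the signal states 0/1 can emit '-' here):

```python
# Check: for every choice of states, replacing two consecutive deletion
# steps S_i -> S_{i+1}('-') -> S_{i+2}('-') by the single step
# S_i -> S_{i+2}('-') strictly increases the path probability.
t = {'0':  {'0': 0.63, '1': 0.27, 'I0': 0.10, 'I1': 0.00},
     '1':  {'0': 0.09, '1': 0.81, 'I0': 0.00, 'I1': 0.10},
     'I0': {'0': 0.56, '1': 0.24, 'I0': 0.20, 'I1': 0.00},
     'I1': {'0': 0.08, '1': 0.72, 'I0': 0.00, 'I1': 0.20}}
e_blank = 0.05                     # Pr[emit '-'] in states 0 and 1
for s_i in t:                      # predecessor may be any state
    for s1 in ('0', '1'):          # both deleted steps are signal states
        for s2 in ('0', '1'):
            two = t[s_i][s1] * e_blank * t[s1][s2] * e_blank
            one = t[s_i][s2] * e_blank
            assert two < one, (s_i, s1, s2)
```

All common factors of the two paths cancel, so comparing these local products is enough.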
8
2-species Evolution
Consider the following evolution model for binary-character vectors:
• Each species corresponds to a binary vector in {0,1}^n
• Two species Y, Z evolve from a common ancestor X
• Each bit in X is chosen uniformly at random
• Each bit of X is flipped w.p. θ during evolution towards Y or Z
Given the binary vectors Y, Z, calculate the most probable value of θ:
• Define the sufficient statistics of the problem
• Give a formula for L(θ)
• Formulate an EM algorithm for the problem
• Give an analytic solution (if one exists) for the MLE
[Figure: tree with hidden root X and observed leaves Y, Z; both edges labeled θ]
9
2-species Evolution
Define the sufficient statistics of the problem
• Given Y = y1,…,yn and Z = z1,…,zn define n0 = |{i | yi = zi}|, n1 = |{i | yi ≠ zi}|
Give formula for L(θ)
• L(θ) = Pr[Y,Z | θ] = Πi=1..n Pr[Yi,Zi | θ] = 2^-n · ((1-θ)² + θ²)^n0 · (2θ(1-θ))^n1
  Yi  Zi  Xi  Pr[Xi,Yi,Zi]
  0   0   0   ½(1-θ)²
  0   0   1   ½θ²
  0   1   0   ½θ(1-θ)
  0   1   1   ½θ(1-θ)
Similarly if Yi = 1.
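The sufficient statistics and L(θ) translate directly to code; a sketch (function names are mine):

```python
# Sufficient statistics and likelihood for the 2-species model.
def suff_stats(Y, Z):
    n0 = sum(y == z for y, z in zip(Y, Z))   # positions where Y and Z agree
    return n0, len(Y) - n0                   # (n0, n1)

def likelihood(theta, n0, n1):
    # Per position: agree -> (1/2)((1-theta)^2 + theta^2),
    #               differ -> (1/2) * 2*theta*(1-theta)
    n = n0 + n1
    return (0.5 ** n) * ((1 - theta) ** 2 + theta ** 2) ** n0 \
        * (2 * theta * (1 - theta)) ** n1
```

Note that L(θ) = L(1-θ): the data cannot distinguish θ from 1-θ.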
10
2-species Evolution
Formulate EM algorithm for the problem
E – Given θ, calculate the expected number of flips from X to Y and Z:
E(#flips) = Σi=1..n (Pr[xi ≠ yi] + Pr[xi ≠ zi])
  Y  Z  X  Pr[X,Y,Z]  Pr[X | Y,Z]
  0  0  0  ½(1-θ)²    (1-θ)²/((1-θ)² + θ²)
  0  0  1  ½θ²        θ²/((1-θ)² + θ²)
  0  1  0  ½θ(1-θ)    ½
  0  1  1  ½θ(1-θ)    ½
#flips = sum of indicator variables: an agreeing position contributes 2 flips w.p. θ²/((1-θ)² + θ²) (else 0), a disagreeing position contributes exactly 1 flip, so
E(#flips) = 2n0·θ²/((1-θ)² + θ²) + n1
M – Given the expected number of flips from X to Y and Z, calculate θ':
θ' = E(#flips) / 2n
E+M combined:
θ' = (n0/n)·θ²/((1-θ)² + θ²) + n1/(2n)
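The combined E+M update can be sketched as follows (function names are mine; n0 and n1 are the sufficient statistics):

```python
# One EM sweep for theta in the 2-species model.
def em_step(theta, n0, n1):
    d = (1 - theta) ** 2 + theta ** 2
    # E-step: expected number of flips over both edges.  An agreeing
    # position has 0 or 2 flips, a disagreeing position exactly 1.
    exp_flips = 2 * n0 * theta ** 2 / d + n1
    # M-step: 2n edge-bits in total, so theta' = E(#flips) / 2n
    return exp_flips / (2 * (n0 + n1))

def em(n0, n1, theta=0.1, iters=200):
    for _ in range(iters):
        theta = em_step(theta, n0, n1)
    return theta
```

The fixed point of this iteration satisfies θ(1-θ) = n1/(2n), which matches the analytic MLE derived on the next slide.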
11
2-species Evolution
Give an analytic solution (if one exists) for the MLE
• L(θ) = 2^-n · ((1-θ)² + θ²)^n0 · (2θ(1-θ))^n1
• l(θ) = log L(θ) = -n·log 2 + n0·log((1-θ)² + θ²) + n1·log(2θ(1-θ))
• Find extreme points of the log-likelihood:
    l'(θ) = n0·(4θ-2)/((1-θ)² + θ²) + n1·(1-2θ)/(θ(1-θ)) = 0
    ⟺ (2θ-1)·[2n0·θ(1-θ) - n1·((1-θ)² + θ²)] = 0
    ⟺ θ = ½  or  θ(1-θ) = n1/(2n)
• θ = ½ is a minimum; the maxima are θ1,2 = ½·(1 ± √(1 - 2n1/n)) (real when n1 ≤ n/2)
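The closed-form MLE can be sketched directly from the analysis above (the function name is mine; the n1 > n/2 branch is my convention for the degenerate case):

```python
import math

# Closed-form MLE: theta = 1/2 is a likelihood minimum (when n1 < n/2),
# and the two symmetric maxima satisfy theta*(1 - theta) = n1 / (2n).
def mle(n0, n1):
    n = n0 + n1
    disc = 1 - 2 * n1 / n
    if disc < 0:                 # n1 > n/2: the quadratic has no real root
        return (0.5, 0.5)
    r = math.sqrt(disc)
    return ((1 - r) / 2, (1 + r) / 2)
```

The two maxima give the same likelihood because the model cannot distinguish θ from 1-θ.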
12
Generalizing The Model
Alphabet of size k:
• Uniform transition model: Pr[b|a] = 1-θ if b=a; θ/(k-1) if b≠a
• More complex transition models
Evolution of n species (given the phylogenetic topology):
[Figure: phylogenetic tree with hidden internal nodes X1,…,X5, edge parameters θ1,…,θ4, and observed leaves Y1,…,Yn]
θi correlates to evolutionary distance along the edge
Computing the likelihood for a fixed topology is the ‘small’ likelihood problem.
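The uniform transition model above can be written out as a k×k substitution matrix; a sketch (for k = 4 this is the classic Jukes-Cantor model):

```python
# Uniform size-k substitution: a character stays with probability
# 1 - theta and moves to each of the other k - 1 characters with
# probability theta / (k - 1), so every row sums to 1.
def substitution_matrix(k, theta):
    return [[1 - theta if a == b else theta / (k - 1)
             for b in range(k)] for a in range(k)]
```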