3
The Noisy Transmission Model
States: 0, 1 (signal) and I0, I1 (noise).

Emissions:
        0     1     -
  0     0.75  0.20  0.05
  1     0.20  0.75  0.05
  I0    0.50  0.50  --
  I1    0.50  0.50  --

Transitions:
        0     1     I0    I1
  0     0.63  0.27  0.10  --
  1     0.09  0.81  --    0.10
  I0    0.56  0.24  0.20  --
  I1    0.08  0.72  --    0.20

Stationary distribution: (8, 24, 1, 3)/36 over (0, 1, I0, I1)
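The tables above can be encoded and sanity-checked directly; a minimal Python sketch (the state names and dictionary layout are mine):

```python
# Noisy transmission HMM: states 0 and 1 are signal states that can emit
# their bit, the flipped bit, or a blank '-'; I0 and I1 are noise states
# that emit 0/1 uniformly and never emit a blank.
emission = {
    '0':  {'0': 0.75, '1': 0.20, '-': 0.05},
    '1':  {'0': 0.20, '1': 0.75, '-': 0.05},
    'I0': {'0': 0.50, '1': 0.50},
    'I1': {'0': 0.50, '1': 0.50},
}
transition = {          # rows: from-state, columns: to-state
    '0':  {'0': 0.63, '1': 0.27, 'I0': 0.10, 'I1': 0.00},
    '1':  {'0': 0.09, '1': 0.81, 'I0': 0.00, 'I1': 0.10},
    'I0': {'0': 0.56, '1': 0.24, 'I0': 0.20, 'I1': 0.00},
    'I1': {'0': 0.08, '1': 0.72, 'I0': 0.00, 'I1': 0.20},
}
# Stationary distribution claimed on the slide: (8, 24, 1, 3)/36
pi = {'0': 8/36, '1': 24/36, 'I0': 1/36, 'I1': 3/36}
for s in pi:  # check that pi * T == pi
    assert abs(sum(pi[r] * transition[r][s] for r in pi) - pi[s]) < 1e-12
```

Running the check confirms that (8, 24, 1, 3)/36 is indeed stationary for this transition matrix.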
4
Questions
• Given an output sequence (including blanks), what is the most probable path which yields this sequence? (1.c): Viterbi algorithm
• Given an output sequence, what is the most probable path to yield it, which passes through M non-noise states (0/1)? (1.d)
• Given an output sequence, what is the most probable path to yield it? (bonus)
• Given an output sequence, what is the most probable transmission? Problem: each transmission corresponds to multiple paths!
5
Answer to 1d
• Given an output sequence X1,…,Xn and M, we calculate the following values for all states S, i=1..n and j=1..M:
• vS(i,j) – log-probability of the most probable path yielding output X1,…,Xi, passing through j non-noise states, and ending in state S.
• Initialize: vS(0,0) – initial log-probability of S (stationary distribution)
• For i>0 and a=0/1 (only noise states can have j=0):
    vIa(i,0) = eIa(Xi) + maxS[ vS(i-1,0) + t(S,Ia) ],   va(i,0) = -∞
• For j>0 and a=0/1 (only deletions consume no output):
    va(0,j) = ea(-) + maxS[ vS(0,j-1) + t(S,a) ],   vIa(0,j) = -∞
• Most values are -∞
• t(∙,∙), e(∙,∙) are log-probabilities
• Hold update-pointers
6
Answer to 1d
• Given an output sequence X1,…,Xn and M, we calculate the following values for all states S, i=1..n and j=1..M:
• vS(i,j) – log-probability of most probable path yielding output X1,
…,Xi, passing through j non-noise states, and ending in state S.
• Recursion formulae (for i,j>0 and a=0/1):
    vIa(i,j) = eIa(Xi) + maxS[ vS(i-1,j) + t(S,Ia) ]
    va(i,j) = max{ ea(Xi) + maxS[ vS(i-1,j-1) + t(S,a) ],
                   ea(-) + maxS[ vS(i,j-1) + t(S,a) ] }
• Hold update-pointers
• At the end choose maxS vS(n,M) and follow the pointers to recover the path
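The initialization and recursion for 1.d can be sketched as a dynamic program. This is a sketch under my reading of the slides (blanks '-' are deletions that do not appear in the output; all variable and function names are mine; t, e, pi are passed as plain probabilities and moved to log space):

```python
import math

SIGNAL, NOISE = ('0', '1'), ('I0', 'I1')
STATES = SIGNAL + NOISE

def _log(p):
    return math.log(p) if p > 0 else -math.inf

def best_path(X, M, t, e, pi):
    """Most probable path emitting X while visiting M non-noise states."""
    n = len(X)
    v = {(S, 0, 0): _log(pi[S]) for S in STATES}   # v_S(0,0): stationary
    ptr = {}
    get = lambda S, i, j: v.get((S, i, j), -math.inf)  # most values are -inf
    for j in range(M + 1):
        for i in range(n + 1):
            for a in SIGNAL:
                if j == 0:
                    continue                # a non-noise state needs j >= 1
                cands = [(get(S, i, j - 1) + _log(t[S][a]) + _log(e[a]['-']),
                          S, i) for S in STATES]          # emit a deletion
                if i > 0:                                 # emit X_i
                    cands += [(get(S, i - 1, j - 1) + _log(t[S][a])
                               + _log(e[a][X[i - 1]]), S, i - 1)
                              for S in STATES]
                score, S, ip = max(cands)
                v[(a, i, j)] = score
                ptr[(a, i, j)] = (S, ip, j - 1)
            if i > 0:                       # noise states keep j fixed
                for Ia in NOISE:
                    score, S = max((get(S, i - 1, j) + _log(t[S][Ia])
                                    + _log(e[Ia][X[i - 1]]), S)
                                   for S in STATES)
                    v[(Ia, i, j)] = score
                    ptr[(Ia, i, j)] = (S, i - 1, j)
    score, S = max((get(S, n, M), S) for S in STATES)
    cell, path = (S, n, M), [S]             # follow pointers backwards
    while cell in ptr:
        cell = ptr[cell]
        path.append(cell[0])
    return score, path[::-1]                # path[0] is the start state
```

The table has O(n·M) cells per state and each update is O(1) over the four states, matching the slide's recursion term for term.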
7
Bonus
• Given an output sequence, what is the most probable path to yield it?
Approach 1:
• If we don't know M, we can fill in the tables column by column (j = 1, 2, …)
• Eventually the probability of the best entry in a column starts deteriorating, and we can stop
Approach 2: a-priori bound
• Note that an optimal path doesn't have 2 consecutive deletions (-):
    Pr[Si → Si+1(-) → Si+2(-)] < Pr[Si → Si+2(-)]
  so collapsing two consecutive deletions into one yields a more probable path for the same output
• Conclusion: between consecutive emitted symbols (and at the two ends) there is at most one deletion, so M < 2n+2
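The no-two-consecutive-deletions claim can be checked numerically against this model's parameters (the encoding is mine; only the signal states 0/1 can emit '-' here):

```python
# Check: for every choice of states, replacing two consecutive deletion
# steps S_i -> S_{i+1}('-') -> S_{i+2}('-') by the single step
# S_i -> S_{i+2}('-') strictly increases the path probability.
t = {'0':  {'0': 0.63, '1': 0.27, 'I0': 0.10, 'I1': 0.00},
     '1':  {'0': 0.09, '1': 0.81, 'I0': 0.00, 'I1': 0.10},
     'I0': {'0': 0.56, '1': 0.24, 'I0': 0.20, 'I1': 0.00},
     'I1': {'0': 0.08, '1': 0.72, 'I0': 0.00, 'I1': 0.20}}
e_blank = 0.05                     # Pr[emit '-'] in states 0 and 1
for s_i in t:                      # predecessor may be any state
    for s1 in ('0', '1'):          # both deleted steps are signal states
        for s2 in ('0', '1'):
            two = t[s_i][s1] * e_blank * t[s1][s2] * e_blank
            one = t[s_i][s2] * e_blank
            assert two < one, (s_i, s1, s2)
```

All common factors of the two paths cancel, so comparing these local products is enough.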
8
2-species Evolution
Consider the following evolution model for binary-character vectors:
• Each species corresponds to a binary vector in {0,1}^n
• Two species Y, Z evolve from a common ancestor X
• Each bit in X is chosen uniformly at random
• Each bit of X is flipped w.p. θ during evolution towards Y or Z
Given the binary vectors Y, Z, calculate the most probable value of θ:
• Define the sufficient statistics of the problem
• Give a formula for L(θ)
• Formulate an EM algorithm for the problem
• Give an analytic solution (if one exists) for the MLE
[Figure: tree with hidden root X and observed leaves Y, Z; both edges labeled θ]
9
2-species Evolution
Define the sufficient statistics of the problem
• Given Y = y1,…,yn and Z = z1,…,zn define n0 = |{i | yi = zi}|, n1 = |{i | yi ≠ zi}|
Give formula for L(θ)
• L(θ) = Pr[Y,Z | θ] = Πi=1..n Pr[Yi,Zi | θ] = 2^-n · ((1-θ)² + θ²)^n0 · (2θ(1-θ))^n1
  Yi  Zi  Xi  Pr[Xi,Yi,Zi]
  0   0   0   ½(1-θ)²
  0   0   1   ½θ²
  0   1   0   ½θ(1-θ)
  0   1   1   ½θ(1-θ)
Similarly if Yi = 1.
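The sufficient statistics and L(θ) translate directly to code; a sketch (function names are mine):

```python
# Sufficient statistics and likelihood for the 2-species model.
def suff_stats(Y, Z):
    n0 = sum(y == z for y, z in zip(Y, Z))   # positions where Y and Z agree
    return n0, len(Y) - n0                   # (n0, n1)

def likelihood(theta, n0, n1):
    # Per position: agree -> (1/2)((1-theta)^2 + theta^2),
    #               differ -> (1/2) * 2*theta*(1-theta)
    n = n0 + n1
    return (0.5 ** n) * ((1 - theta) ** 2 + theta ** 2) ** n0 \
        * (2 * theta * (1 - theta)) ** n1
```

Note that L(θ) = L(1-θ): the data cannot distinguish θ from 1-θ.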
10
2-species Evolution
Formulate EM algorithm for the problem
E – Given θ, calculate the expected number of flips from X to Y and Z:
E(#flips) = Σi=1..n (Pr[xi ≠ yi] + Pr[xi ≠ zi])
  Y  Z  X  Pr[X,Y,Z]  Pr[X | Y,Z]
  0  0  0  ½(1-θ)²    (1-θ)²/((1-θ)² + θ²)
  0  0  1  ½θ²        θ²/((1-θ)² + θ²)
  0  1  0  ½θ(1-θ)    ½
  0  1  1  ½θ(1-θ)    ½
#flips = sum of indicator variables: an agreeing position contributes 2 flips w.p. θ²/((1-θ)² + θ²) (else 0), a disagreeing position contributes exactly 1 flip, so
E(#flips) = 2n0·θ²/((1-θ)² + θ²) + n1
M – Given the expected number of flips from X to Y and Z, calculate θ':
θ' = E(#flips) / 2n
E+M combined:
θ' = (n0/n)·θ²/((1-θ)² + θ²) + n1/(2n)
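The combined E+M update can be sketched as follows (function names are mine; n0 and n1 are the sufficient statistics):

```python
# One EM sweep for theta in the 2-species model.
def em_step(theta, n0, n1):
    d = (1 - theta) ** 2 + theta ** 2
    # E-step: expected number of flips over both edges.  An agreeing
    # position has 0 or 2 flips, a disagreeing position exactly 1.
    exp_flips = 2 * n0 * theta ** 2 / d + n1
    # M-step: 2n edge-bits in total, so theta' = E(#flips) / 2n
    return exp_flips / (2 * (n0 + n1))

def em(n0, n1, theta=0.1, iters=200):
    for _ in range(iters):
        theta = em_step(theta, n0, n1)
    return theta
```

The fixed point of this iteration satisfies θ(1-θ) = n1/(2n), which matches the analytic MLE derived on the next slide.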
11
2-species Evolution
Give an analytic solution (if one exists) for the MLE
• L(θ) = 2^-n · ((1-θ)² + θ²)^n0 · (2θ(1-θ))^n1
• l(θ) = log L(θ) = -n·log 2 + n0·log((1-θ)² + θ²) + n1·log(2θ(1-θ))
• Find extreme points of the log-likelihood:
    l'(θ) = n0·(4θ-2)/((1-θ)² + θ²) + n1·(1-2θ)/(θ(1-θ)) = 0
    ⟺ (2θ-1)·[2n0·θ(1-θ) - n1·((1-θ)² + θ²)] = 0
    ⟺ θ = ½  or  θ(1-θ) = n1/(2n)
• θ = ½ is a minimum; the maxima are θ1,2 = ½·(1 ± √(1 - 2n1/n)) (real when n1 ≤ n/2)
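The closed-form MLE can be sketched directly from the analysis above (the function name is mine; the n1 > n/2 branch is my convention for the degenerate case):

```python
import math

# Closed-form MLE: theta = 1/2 is a likelihood minimum (when n1 < n/2),
# and the two symmetric maxima satisfy theta*(1 - theta) = n1 / (2n).
def mle(n0, n1):
    n = n0 + n1
    disc = 1 - 2 * n1 / n
    if disc < 0:                 # n1 > n/2: the quadratic has no real root
        return (0.5, 0.5)
    r = math.sqrt(disc)
    return ((1 - r) / 2, (1 + r) / 2)
```

The two maxima give the same likelihood because the model cannot distinguish θ from 1-θ.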
12
Generalizing The Model
Alphabet of size k:
• Uniform transition model: Pr[b|a] = 1-θ if b=a; θ/(k-1) if b≠a
• More complex transition models
Evolution of n species (given the phylogenetic topology):
[Figure: phylogenetic tree with hidden internal nodes X1,…,X5, edge parameters θ1,…,θ4, and observed leaves Y1,…,Yn]
θi correlates to evolutionary distance along the edge
Computing the likelihood for a fixed topology is the ‘small’ likelihood problem.
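The uniform transition model above can be written out as a k×k substitution matrix; a sketch (for k = 4 this is the classic Jukes-Cantor model):

```python
# Uniform size-k substitution: a character stays with probability
# 1 - theta and moves to each of the other k - 1 characters with
# probability theta / (k - 1), so every row sums to 1.
def substitution_matrix(k, theta):
    return [[1 - theta if a == b else theta / (k - 1)
             for b in range(k)] for a in range(k)]
```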