Disentangling Multiple Features in Video Sequences using ...
HMM for multiple sequences
description
Transcript of HMM for multiple sequences
![Page 1: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/1.jpg)
HMM for multiple sequences
![Page 2: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/2.jpg)
Pair HMM
HMM for pairwise sequence alignment, which incorporates affine gap scores.
“Hidden” States• Match (M)• Insertion in x (X)• insertion in y (Y)
Observation Symbols• Match (M): {(a,b)| a,b in ∑ }.• Insertion in x (X): {(a,-)| a in ∑ }.• Insertion in y (Y): {(-,a)| a in ∑ }.
![Page 3: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/3.jpg)
Pair HMMs
M
X
Y
1-
1-
1-2Begin
End
1--2
![Page 4: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/4.jpg)
Alignment: a path a hidden state sequence
A T - G T T A TA T C G T - A C
M M Y M M X M M
![Page 5: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/5.jpg)
Multiple sequence alignment(Globin family)
![Page 6: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/6.jpg)
Profile model (PSSM)
• A natural probabilistic model for a conserved region would be to specify independent probabilities ei(a) of observing nucleotide (amino acid) a in position i
• The probability of a new sequence x according to this model is
P(x | M) ei(x i)i1
L
![Page 7: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/7.jpg)
Profile / PSSMLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLLTMTRGDIGNYLGLTIETISRLLGRFQKSGMILTMTRGDIGNYLGLTVETISRLLGRFQKSEILLTMTRGDIGNYLGLTVETISRLLGRLQKMGILLAMSRNEIGNYLGLAVETVSRVFSRFQQNELILAMSRNEIGNYLGLAVETVSRVFTRFQQNGLILPMSRNEIGNYLGLAVETVSRVFTRFQQNGLLVRMSREEIGNYLGLTLETVSRLFSRFGREGLILRMSREEIGSYLGLKLETVSRTLSKFHQEGLILPMCRRDIGDYLGLTLETVSRALSQLHTQGILLPMSRRDIADYLGLTVETVSRAVSQLHTDGVLLPMSRQDIADYLGLTIETVSRTFTKLERHGAI
•DNA / proteins Segments of the same length L;
•Often represented as Positional frequency matrix;
![Page 8: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/8.jpg)
Searching profiles: inference
• Give a sequence S of length L, compute the likelihood ratio of being generated from this profile vs. from background model:– R(S|P)=
– Searching motifs in a sequence: sliding window approach
L
i s
ii
b
xe
1
![Page 9: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/9.jpg)
Match states for profile HMMs
• Match states– Emission probabilities
Begin Mj End....
..
..
)(aeiM
![Page 10: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/10.jpg)
Components of profile HMMs
• Insert states– Emission prob.
• Usually back ground distribution qa.
– Transition prob.• Mi to Ii, Ii to itself, Ii to Mi+1
– Log-odds score for a gap of length k (no logg-odds from emission)
Begin Mj End
Ij
)(I aei
jjjjjjakaa II1MIIM log)1(loglog
![Page 11: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/11.jpg)
Components of profile HMMs
• Delete states– No emission prob.– Cost of a deletion
• M→D, D→D, D→M• Each D→D might be different
Begin Mj End
Dj
![Page 12: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/12.jpg)
Full structure of profile HMMs
Begin Mj End
Ij
Dj
![Page 13: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/13.jpg)
Deriving HMMs from multiple alignments
• Key idea behind profile HMMs– Model representing the consensus for the
alignment of sequence from the same family– Not the sequence of any particular member
HBA_HUMAN ...VGA--HAGEY...HBB_HUMAN ...V----NVDEV...MYG_PHYCA ...VEA--DVAGH...GLB3_CHITP ...VKG------D...GLB5_PETMA ...VYS--TYETS...LGB2_LUPLU ...FNA--NIPKH...GLB1_GLYDI ...IAGADNGAGV... *** *****
![Page 14: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/14.jpg)
Deriving HMMs from multiple alignments
• Basic profile HMM parameterization– Aim: making the higher probability for
sequences from the family
• Parameters– the probabilities values : trivial if many of
independent alignment sequences are given.
– length of the model: heuristics or systematic way
'' ' )'(
)()(
a k
kk
l kl
klkl aE
aEae
A
Aa
![Page 15: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/15.jpg)
Sequence conservation: entropy profile of the emission probability distributions
![Page 16: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/16.jpg)
Searching with profile HMMs
• Main usage of profile HMMs– Detecting potential sequences in a family– Matching a sequence to the profile HMMs
• Viterbi algorithm or forward algorithm
– Comparing the resulting probability with random model
i
xiqRxP )|(
![Page 17: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/17.jpg)
Searching with profile HMMs
• Viterbi algorithm (optimal log-odd alignment)
;log)(
,log)(
,log)(
max)(
;log)1(
,log)1(
,log)1(
max)(
log)(
;log)1(
,log)1(
,log)1(
max)(
log)(
DDD
1
DII
1
DMM
1
D
IDD
III
IMM
II
MDD
1
MII
1
MMM
1MM
1
1
1
1
1
1
jj
jj
jj
jj
jj
jj
i
j
jj
jj
jj
i
j
aiV
aiV
aiV
iV
aiV
aiV
aiV
q
xeiV
aiV
aiV
aiV
q
xeiV
j
j
j
j
j
j
j
x
i
j
j
j
j
x
i
j
![Page 18: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/18.jpg)
Searching with profile HMMs
• Forward algorithm: summing over all potent alignments
))];(exp(
))(exp())(exp(log[)(
))];1(exp())1(exp(
))1(exp(log[)(
log)(
))];1(exp())1(exp(
))1(exp(log[)(
log)(
D1DD
I1DI
M1DM
D
DID
III
MIM
II
D1MD
I1MI
M1MM
MM
1
11
11
1
iFa
iFaiFaiF
iFaiFa
iFaq
xeiF
iFaiFa
iFaq
xeiF
j
jjj
jj
jx
i
j
jj
jx
i
j
jj
jjjj
jjjj
jj
i
j
jjjj
jj
i
j
![Page 19: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/19.jpg)
Variants for non-global alignments
• Local alignments (flanking model)– Emission prob. in flanking states use background
values qa.
– Looping prob. close to 1, e.g. (1- ) for some small .
Mj
Ij
Dj
Begin End
Q Q
![Page 20: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/20.jpg)
Variants for non-global alignments
• Overlap alignments– Only transitions to the first model state are allowed.– When expecting to find either present as a whole or
absent– Transition to first delete state allows missing first
residue
Begin Mj End
IjQ
Dj
Q
![Page 21: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/21.jpg)
Variants for non-global alignments
• Repeat alignments– Transition from right flanking state back to random
model– Can find multiple matching segments in query string
Mj
Ij
Dj
Begin EndQ
![Page 22: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/22.jpg)
Estimation of prob.
• Maximum likelihood (ML) estimation– given observed freq. cja of residue a in position j.
• Simple pseudocounts– qa: background distribution
– A: weight factor
' 'M )(
a ja
ja
c
cae
j
' '
M )(a ja
aja
cA
Aqcae
j
![Page 23: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/23.jpg)
Optimal model construction: mark columns
beg M M M end
II II
D DD
x x . . . xbat A G - - - Crat A - A G - Ccat A G - A A -gnat - - A A A Cgoat A G - - - C 1 2 . . . 3
(a) Multiple alignment:
(b) Profile-HMM architecture:
0 1 2 3 4
0 1 2 3A - 4 0 0C - 0 0 4G - 0 3 0T - 0 0 0A 0 0 6 0C 0 0 0 0G 0 0 1 0T 0 0 0 0M-M 4 3 2 4M-D 1 1 0 0M-I 0 0 1 0I-M 0 0 2 0I-D 0 0 1 0I-I 0 0 4 0D-M - 0 0 1D-D - 1 0 0D-I - 0 2 0
(c) Observed emission/transition counts
matchemissions
insertemissions
statetransitions
![Page 24: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/24.jpg)
Optimal model construction
• MAP (match-insert assignment)– Recursive calculation of a number Sj
• Sj: log prob. of the optimal model for alignment up to and including column j, assuming j is marked.
• Sj is calculated from Si and summed log prob. between i and j.
• Tij: summed log prob. of all the state transitions between marked i and j.
– cxy are obtained from partial state paths implied by marking i and j.
ID,M,,
logyx
xyxyij acT
![Page 25: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/25.jpg)
Optimal model construction
• Algorithm: MAP model construction– Initialization:
• S0 = 0, ML+1 = 0.
– Recurrence: for j = 1,..., L+1:
– Traceback: from j = L+1, while j > 0:• Mark column j as a match column
• j = j.
;maxarg
;max
1,10
1,10
jijijiji
j
jijijiji
j
IMTS
IMTSS
![Page 26: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/26.jpg)
Weighting training sequences
• Input sequences are random?
• “Assumption: all examples are independent samples” might be incorrect
• Solutions– Weight sequences based on similarity
![Page 27: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/27.jpg)
Weighting training sequences
• Simple weighting schemes derived from a tree– Phylogenetic tree is given.– [Thompson, Higgins & Gibson 1994b]– [Gerstein, Sonnhammer & Chothia 1994]
nk k
ini w
wtw
below leaves
![Page 28: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/28.jpg)
Weighting training sequences
t4 = 8t3 = 5
t2 = 2t1 = 2
t5 = 3
t6 = 3
5
6
7
1 2 3 4
I4I1+I2
I1+I2+I3
V5
V6
V7
I1 I2
I3
I1:I2:I3:I4 = 20:20:32:47w1:w2:w3:w4 = 35:35:50:64
![Page 29: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/29.jpg)
Multiple alignment by training profile HMM
• Sequence profiles could be represented as probabilistic models like profile HMMs.– Profile HMMs could simply be used in place of
standard profiles in progressive or iterative alignment methods.
– ML methods for building (training) profile HMM (described previously) are based on multiple sequence alignment.
– Profile HMMs can also be trained from initially unaligned sequences using the Baum-Welch (EM) algorithm
![Page 30: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/30.jpg)
Multiple alignment by profile HMM training- Multiple alignment with a known profile HMM
• Before we estimate a model and a multiple alignment simultaneously, we consider as simpler problem: derive a multiple alignment from a known profile HMM model.– This can be applied to align a large member
of sequences from the same family based on the HMM model built from the (seed) multiple alignment of a small representative set of sequences in the family.
![Page 31: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/31.jpg)
Multiple alignment with a known profile HMM
• Align a sequence to a profile HMMViterbi algorithm
• Construction a multiple alignment just requires calculating a Viterbi alignment for each individual sequence.– Residues aligned to the same match state in
the profile HMM should be aligned in the same columns.
![Page 32: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/32.jpg)
Multiple alignment with a known profile HMM
• Given a preliminary alignment, HMM can align additional sequences.
![Page 33: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/33.jpg)
Multiple alignment with a known profile HMM
![Page 34: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/34.jpg)
Multiple alignment with a known profile HMM
• Important difference with other MSA programs– Viterbi path through HMM identifies inserts– Profile HMM does not align inserts– Other multiple alignment algorithms align the
whole sequences.
![Page 35: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/35.jpg)
Profile HMM training from unaligned sequences
• Harder problem– estimating both a model and a multiple alignment
from initially unaligned sequences.– Initialization: Choose the length of the profile HMM
and initialize parameters.– Training: estimate the model using the Baum-Welch
algorithm (iteratively).– Multiple Alignment: Align all sequences to the final
model using the Viterbi algorithm and build a multiple alignment as described in the previous section.
![Page 36: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/36.jpg)
Profile HMM training from unaligned sequences
• Initial Model– The only decision that must be made in
choosing an initial structure for Baum-Welch estimation is the length of the model M.
– A commonly used rule is to set M be the average length of the training sequence.
– We need some randomness in initial parameters to avoid local maxima.
![Page 37: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/37.jpg)
Multiple alignment by profile HMM training
• Avoiding Local maxima– Baum-Welch algorithm is guaranteed to find a
LOCAL maxima.• Models are usually quite long and there are many
opportunities to get stuck in a wrong solution.
– Solution• Start many times from different initial models.• Use some form of stochastic search algorithm, e.g.
simulated annealing.
![Page 38: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/38.jpg)
Multiple alignment by profile HMM -similar to Gibbs sampling
• The ‘Gibbs sampler’ algorithm described by Lawrence et al.[1993] has substantial similarities.– The problem was to simultaneously find the motif
positions and to estimate the parameters for a consensus statistical model of them.
– The statistical model used is essentially a profile HMM with no insert or delete states.
![Page 39: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/39.jpg)
Multiple alignment by profile HMM training-Model surgery
• We can modify the model after (or during) training a model by manually checking the alignment produced from the model.– Some of the match states are redundant– Some insert states absorb too many sequences
• Model surgery– If a match state is used by less than ½ of training
sequences, delete its module (match-insert-delete states)– If more than ½ of training sequences use a certain insert
state, expand it into n new modules, where n is the average length of insertions
– ad hoc, but works well
![Page 40: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/40.jpg)
Phylo-HMMs: model multiple alignments of syntenic sequences
• A phylo-HMM is a probabilistic machine that generates a multiple alignment, column by column, such that each column is defined by a phylogenetic model
• Unlike single-sequence HMMs, the emission probabilities of phylo-HMMs are complex distributions defined by phylogenetic models
![Page 41: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/41.jpg)
Applications of Phylo-HMMs
• Improving phylogenetic modeling that allow for variation among sites in the rate of substitution (Felsenstein & Churchill, 1996; Yang, 1995)
• Protein secondary structure prediction (Goldman et al., 1996; Thorne et al., 1996)
• Detection of recombination from DNA multiple alignments (Husmeier & Wright, 2001)
• Recently, comparative genomics (Siepel, et. al. Haussler, 2005)
![Page 42: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/42.jpg)
Phylo-HMMs: combining phylogeny and HMMs
• Molecular evolution can be viewed as a combination of two Markov processes– One that operates in the dimension of space
(along a genome)– One that operates in the dimension of time
(along the branches of a phylogenetic tree)
• Phylo-HMMs model this combination
![Page 43: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/43.jpg)
Single-sequence HMM Phylo-HMM
![Page 44: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/44.jpg)
Phylogenetic models
• Stochastic process of substitution that operates independently at each site in a genome
• A character is first drawn at random from the background distribution and assigned to the root of the tree; character substitutions then occur randomly along the tree branches, from root to leaves
• The characters at the leaves define an alignment column
![Page 45: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/45.jpg)
Phylogenetic Models
• The different phylogenetic models associated with the states of a phylo-HMM may reflect different overall rates of substitution (e.g. in conserved and non-conserved regions), different patterns of substitution or background distributions, or even different tree topologies (as with recombination)
![Page 46: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/46.jpg)
Phylo-HMMs: Formal Definition
• A phylo-HMM is a 4-tuple :– : set of hidden states – : set of associated phylogenetic
models– : transition probabilities– : initial probabilities
(S,, A,b)
S {s1,,sM }
{1,M }
A {a j ,k} (1 j,k M)
b (b1,,bM )
![Page 47: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/47.jpg)
The Phylogenetic Model
• :– : substitution rate matrix– : background frequencies– : binary tree– : branch lengths
j (Q j , j , j , j )
Q j
j
j
j
![Page 48: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/48.jpg)
The Phylogenetic Model
• The model is defined with respect to an alphabet whose size is denoted d
• The substitution rate matrix has dimension dxd• The background frequencies vector has
dimension d• The tree has n leaves, corresponding to n
extant taxa• The branch lengths are associated with the
tree
![Page 49: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/49.jpg)
Probability of the Data
• Let X be an alignment consisting of L columns and n rows, with the ith column denoted Xi
• The probability that column Xi is emitted by state sj is simply the probability of Xi under the corresponding phylogenetic model,
• This is the likelihood of the column given the tree, which can be computed efficiently using Felsenstein’s “pruning” algorithm (which we will describe in later lectures)
P(X i | j )
![Page 50: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/50.jpg)
Substitution Probabilities
• Felsenstein’s algorithm requires the conditional probabilities of substitution for all bases a,b and branch lengths tj
• The probability of substitution of a base b for a base a along a branch of length t, denoted
is based on a continuous-time Markov model of substitution, defined by the rate matrix Qj
P(b | a, t, j )
![Page 51: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/51.jpg)
Substitution Probabilities
• In particular, for any given non-negative value t, the conditional probabilities for all a,b are given the dxd matrix , where
P(b | a, t, j )
Pj (t) exp(Q j t)
exp(Q j t) (Q j t)
k
k!k0
![Page 52: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/52.jpg)
Example: HKY model
j ( A , j ,C , j ,G, j ,T , j )
j represents the transition/transversion rate ratio for
j
‘-’s indicate quantities required to normalize each row.
![Page 53: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/53.jpg)
State sequences in Phylo-HMMs
• A state sequence through the phylo-HMM is a sequence such that
• The joint probability of a path and and alignment is
(1,,L )
i S 1i L
L
ii iii
XPaXPXP2
1 )|()|()|,(111
![Page 54: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/54.jpg)
Phylo-HMMs
• The likelihood is given by the sum over all paths (forward algorithm)
• The maximum-likelihood path is (Vertebi’s)
P(X |) P(, X |)
argmax P(,X |)
![Page 55: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/55.jpg)
Computing the Probabilities
• The likelihood can be computed efficiently using the forward algorithm
• The maximum-likelihood path can be computed efficiently using the Viterbi algorithm
• The forward and backward algorithms can be combined to compute the posterior probability
P(i j | X,)
![Page 56: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/56.jpg)
Higher-order Markov Models for Emissions
• It is common with gene-finding HMMs to condition the emission probability of each observation on the observations that immediately precede it in the sequence
• For example, in a 3-rd-codon-position state, the emission of a base xi=“A” might have a fairly high probability if the previous two bases are xi-2=“G” and xi-1=“A” (GAA=Glu), but should have zero probability if the previous two bases are xi-2=“T” and xi-1=“A” (TAA=stop)
![Page 57: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/57.jpg)
Higher-order Markov Models for Emission
• Considering the N observations preceding each xi corresponds to using an Nth order Markov model for emissions
• An Nth order model for emissions is typically parameterized in terms of (N+1)-tuples of observations, and conditional probabilities are computed as
![Page 58: HMM for multiple sequences](https://reader033.fdocuments.in/reader033/viewer/2022051623/5681584c550346895dc5a650/html5/thumbnails/58.jpg)
Nth Order Phylo-HMMs
Sum over all possible alignment columns Y(can be calculated efficiently by a slight modificationof Felsenstein’s “pruning” algorithm)
Probability of the N-tuple