Evolutionary HMMs: Bayesian Approach to Multiple Alignment
Siva Theja Maguluri, CS 598 SS

Goal

Given a set of sequences and a tree representing their evolutionary relationships, find the multiple sequence alignment that maximizes the probability of the evolutionary relationships between the sequences.

Apr 22, 2023

2

Siva Theja Maguluri

Evolutionary Model

Pairwise likelihood for relation between two sequences

Reversibility

Additivity
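Reversibility and additivity can be checked numerically on a concrete substitution model. The sketch below assumes the Jukes-Cantor (JC69) model, which the slides do not name; `jc69`, `is_reversible` and `is_additive` are illustrative helper names. Reversibility is read here as detailed balance, π_i P_ij(t) = π_j P_ji(t), and additivity as P(s) P(t) = P(s + t).

```python
import math

PI = [0.25] * 4  # JC69 equilibrium distribution over A, C, G, T


def jc69(t, rate=1.0):
    """JC69 matrix: P[i][j] = Pr[base j after time t | base i at time 0]."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def is_reversible(p, pi, tol=1e-12):
    """Detailed balance: pi_i * P_ij == pi_j * P_ji for all i, j."""
    return all(abs(pi[i] * p[i][j] - pi[j] * p[j][i]) < tol
               for i in range(4) for j in range(4))


def is_additive(s, t, tol=1e-12):
    """Evolving for time s and then for time t equals evolving for s + t."""
    lhs = matmul(jc69(s), jc69(t))
    rhs = jc69(s + t)
    return all(abs(lhs[i][j] - rhs[i][j]) < tol
               for i in range(4) for j in range(4))
```

JC69 is symmetric, so reversibility holds trivially here; a model with unequal base frequencies would make the check more informative.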

The alignment can be inferred from the sequences by dynamic programming (DP) if the Markov condition applies

Joint likelihood of a multiple alignment on a tree

Alignment Model

Substitution models

Links Model

Birth-death process with immigration, i.e. each residue can either spawn a child or die

Birth rate λ, death rate µ

Immortal link at the left-hand side

Independent, homogeneous substitution

Probability evolution in Links Model

Time evolution of the probability of a link surviving and spawning n descendants

Time evolution of the probability of a link dying before time t and spawning n descendants

Time evolution of the probability of the immortal link spawning n descendants at time t

Solution of these differential equations is

where
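For reference, the closed-form solution of the links model is usually stated as below. This is a sketch from the TKF91 literature, not copied from the slide. The symbols p_n(t) (link survives with n descendants, itself included), q_n(t) (link dies leaving n descendants) and r_n(t) (immortal link with n descendants, itself included) are assumed names, and λ < µ is assumed.

```latex
\begin{aligned}
p_n(t) &= \alpha\,\beta^{\,n-1}(1-\beta), \qquad n \ge 1,\\
q_n(t) &= \begin{cases}
  (1-\alpha)(1-\gamma), & n = 0,\\
  (1-\alpha)\,\gamma\,\beta^{\,n-1}(1-\beta), & n \ge 1,
\end{cases}\\
r_n(t) &= \beta^{\,n-1}(1-\beta), \qquad n \ge 1,\\[4pt]
\alpha(t) &= e^{-\mu t},\qquad
\beta(t) = \frac{\lambda\left(1 - e^{(\lambda-\mu)t}\right)}{\mu - \lambda e^{(\lambda-\mu)t}},\qquad
\gamma(t) = 1 - \frac{\mu\,\beta(t)}{\lambda\left(1-\alpha(t)\right)}.
\end{aligned}
```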

Conceptually, α is the probability the ancestral residue survives

β is the probability of more insertions given one or more descendants

γ is the probability of insertion given ancestor did not survive

In the limit t → ∞, the immortal link generates residues according to a geometric distribution
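The roles of α, β and γ can be made concrete with a few lines of code. This is a minimal sketch assuming the standard TKF91 closed forms (α = e^(-µt), β = λ(1 - e^((λ-µ)t)) / (µ - λ e^((λ-µ)t)), γ = 1 - µβ / (λ(1 - α))) with λ < µ; the function names are illustrative.

```python
import math


def links_params(lam, mu, t):
    """Return (alpha, beta, gamma); assumes 0 < lam < mu and t > 0."""
    alpha = math.exp(-mu * t)                      # ancestor survives
    e = math.exp((lam - mu) * t)
    beta = lam * (1.0 - e) / (mu - lam * e)        # one more insertion
    gamma = 1.0 - mu * beta / (lam * (1.0 - alpha))  # insertion after a death
    return alpha, beta, gamma


def mortal_link_distribution(lam, mu, t, n_max):
    """Probabilities for a mortal link, truncated at n_max descendants.

    survive[n]: ancestor alive with n links in total (itself included).
    die[n]:     ancestor dead, leaving n descendants.
    """
    alpha, beta, gamma = links_params(lam, mu, t)
    survive = {n: alpha * beta ** (n - 1) * (1.0 - beta)
               for n in range(1, n_max + 1)}
    die = {0: (1.0 - alpha) * (1.0 - gamma)}
    for n in range(1, n_max + 1):
        die[n] = (1.0 - alpha) * gamma * beta ** (n - 1) * (1.0 - beta)
    return survive, die


def immortal_link_distribution(lam, mu, t, n_max):
    """Geometric distribution over the immortal link's n links (n >= 1)."""
    _, beta, _ = links_params(lam, mu, t)
    return {n: beta ** (n - 1) * (1.0 - beta) for n in range(1, n_max + 1)}
```

For large t, β tends to λ/µ, so the immortal link's distribution tends to a geometric distribution with parameter λ/µ.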

Links model as a Pair HMM

Just like a standard HMM, but emits two sequences instead of one

Finding a path through the pair HMM that emits both sequences implicitly aligns them

Pair HMM for Links model

Either the residue lives or dies, spawning a geometrically distributed number of residues in each case

Links model as a Pair HMM

The path through the pair HMM is π

DP is used to infer the alignment of two sequences

Viterbi algorithm for finding the optimal π

Forward algorithm to sum over all alignments or to sample from the posterior, Pr[π | A, D]
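The Viterbi and Forward recursions differ only in whether paths are maximized over or summed over. Below is a minimal sketch on a generic three-state pair HMM (match M, insert X, insert Y). The parameters `delta`, `eps`, `tau` and the emission model are made-up assumptions, not the Links-model transition probabilities, and `pair_hmm_dp` is an illustrative name.

```python
def pair_hmm_dp(x, y, combine, delta=0.2, eps=0.3, tau=0.1):
    """DP over a toy pair HMM with states M (match), X (gap in y), Y (gap in x).

    combine=sum -> Forward: probability of x and y summed over all paths.
    combine=max -> Viterbi: probability of the single best path.
    """
    q = 0.25  # uniform single-residue emission over A, C, G, T

    def p(a, b):
        # Toy pair emission: 0.7 spread over 4 matches, 0.3 over 12 mismatches.
        return 0.175 if a == b else 0.025

    n, m = len(x), len(y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # the begin state is treated like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = p(x[i - 1], y[j - 1]) * combine([
                    (1 - 2 * delta - tau) * M[i - 1][j - 1],
                    (1 - eps - tau) * X[i - 1][j - 1],
                    (1 - eps - tau) * Y[i - 1][j - 1],
                ])
            if i > 0:
                X[i][j] = q * combine([delta * M[i - 1][j], eps * X[i - 1][j]])
            if j > 0:
                Y[i][j] = q * combine([delta * M[i][j - 1], eps * Y[i][j - 1]])

    # Every state moves to the end state with probability tau.
    return tau * combine([M[n][m], X[n][m], Y[n][m]])


forward = pair_hmm_dp("ACGT", "AGT", sum)
viterbi = pair_hmm_dp("ACGT", "AGT", max)
```

Since the Forward value sums over every path and the Viterbi value keeps only the best one, `viterbi` can never exceed `forward`.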

Multiple HMMs

Instead of emitting 2 sequences, emit N sequences

2^N - 1 emit states!

Can develop such a model for any tree

Viterbi and Forward algorithms use an N-dimensional dynamic programming matrix

Given a tree T relating N sequences {S}, a multiple HMM can be constructed from pair HMMs so that the likelihood function is Pr[{S}, π | T]

Composing multiple alignment from branch alignments

Residues Xi and Yj in a multiple alignment containing sequences X and Y are aligned iff

They are in the same column

That column contains no gaps for intermediate sequences

No deletion followed by re-insertion is allowed

Ignoring all-gap columns provides an unambiguous way of composing a multiple alignment from branch alignments and vice versa
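The column rule above can be sketched in code. This assumes a multiple alignment stored as a dict of equal-length gapped strings that includes the internal nodes; the helper names and the `-` gap symbol are illustrative choices.

```python
GAP = "-"


def drop_all_gap_columns(rows):
    """Remove every column in which all rows have a gap."""
    names = list(rows)
    length = len(rows[names[0]])
    keep = [c for c in range(length)
            if any(rows[n][c] != GAP for n in names)]
    return {n: "".join(rows[n][c] for c in keep) for n in names}


def branch_alignment(rows, parent, child):
    """Project a multiple alignment onto one branch (two adjacent nodes)."""
    return drop_all_gap_columns({parent: rows[parent], child: rows[child]})


def aligned_columns(rows, path):
    """Columns in which the two endpoints of `path` are aligned.

    `path` lists the nodes from one sequence to the other through the tree.
    A column counts only if every node on the path, intermediates included,
    has a residue there.
    """
    length = len(rows[path[0]])
    return [c for c in range(length)
            if all(rows[n][c] != GAP for n in path)]


# W is the parent of X and Y.
rows = {"W": "A-G", "X": "A-G", "Y": "ATG"}
print(branch_alignment(rows, "W", "X"))
print(aligned_columns(rows, ["X", "W", "Y"]))
```

If X and Y both carry a residue in a column where W has a gap, the column is not counted: the two residues are treated as independent insertions.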

Eliminating internal nodes

Internal nodes are missing data

Sum them out of the likelihood function

Summing over indel histories will kill the independence

Sum over substitution histories using the post-order traversal algorithm of Felsenstein
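Summing over substitution histories for one alignment column can be sketched with Felsenstein's post-order (pruning) recursion. The sketch assumes a JC69 substitution model and treats a gap at a leaf as a wildcard (all-ones vector). Both are illustrative assumptions, as are the tree encoding and function names.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 matrix: P[i][j] = Pr[base j after time t | base i]."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def leaf_vector(symbol):
    """Conditional likelihoods at a leaf; a gap acts as a wildcard."""
    if symbol == "-":
        return [1.0] * 4
    return [1.0 if b == symbol else 0.0 for b in BASES]


def prune(node, column):
    """Post-order pass returning Pr[data below node | base at node].

    A node is a leaf name (str) or a tuple of (subtree, branch_length) pairs.
    """
    if isinstance(node, str):
        return leaf_vector(column[node])
    result = [1.0] * 4
    for child, length in node:
        child_vec = prune(child, column)
        p = jc69(length)
        for a in range(4):
            result[a] *= sum(p[a][b] * child_vec[b] for b in range(4))
    return result


def column_likelihood(tree, column):
    """Sum over the root base with the uniform JC69 prior."""
    return sum(0.25 * v for v in prune(tree, column))


tree = (("Y", 0.1), ("Z", 0.2))
print(column_likelihood(tree, {"Y": "A", "Z": "A"}))
```

Each internal node is visited once per column, so the cost is linear in the number of nodes rather than exponential in the number of unobserved residues.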

Algorithm

Progressive alignment – profiles of parents estimated by aligning siblings on a post order traversal – Impatient strategy

Iterative refinement – revisit branches following initial alignment phase – Greedy

Sample from a population of alignments, exploring suboptimal alignments in anticipation of long term improvements

Moves to explore alignment space

These moves need to be ergodic, i.e. allow for transformation of any alignment into any other alignment

These moves need to satisfy detailed balance, i.e. the chain converges to the desired stationary distribution
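Both requirements can be illustrated on a toy Gibbs sampler. The sketch below uses a made-up target distribution over two binary variables in place of alignments; the names and numbers are assumptions chosen only to keep the check small.

```python
from itertools import product

# Toy target distribution over states (a, b); the values are arbitrary.
TARGET = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
STATES = list(product((0, 1), repeat=2))


def gibbs_kernel(site):
    """Transition probabilities of a move that resamples coordinate `site`
    from its conditional distribution, holding the other coordinate fixed."""
    other = 1 - site
    kernel = {}
    for s in STATES:
        norm = sum(TARGET[v] for v in STATES if v[other] == s[other])
        for u in STATES:
            kernel[s, u] = TARGET[u] / norm if u[other] == s[other] else 0.0
    return kernel


def satisfies_detailed_balance(kernel, tol=1e-12):
    """Check pi(s) K(s, u) == pi(u) K(u, s) for every pair of states."""
    return all(abs(TARGET[s] * kernel[s, u] - TARGET[u] * kernel[u, s]) < tol
               for s in STATES for u in STATES)


def is_ergodic(kernels):
    """True if every state can reach every other using the given moves."""
    for start in STATES:
        seen, frontier = {start}, [start]
        while frontier:
            s = frontier.pop()
            for k in kernels:
                for u in STATES:
                    if k[s, u] > 0 and u not in seen:
                        seen.add(u)
                        frontier.append(u)
        if len(seen) != len(STATES):
            return False
    return True


k0, k1 = gibbs_kernel(0), gibbs_kernel(1)
```

Each single-site move satisfies detailed balance on its own, but neither is ergodic alone; only the combination reaches every state. This is the same reason a sampler needs a set of moves rather than a single one.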

Move 1: Parent Sampling

Goal: Align two sibling nodes Y and Z and infer their parent X

Construct the multiple HMM for X, Y and Z

Sample an alignment of Y and Z using the forward algorithm

This imposes an alignment of XY and XZ

Similar to the sibling alignment step of impatient-progressive alignment

Move 2: Branch Sampling

Goal: Realign two adjacent nodes X and Y

Construct the pair HMM for X and Y, fixing everything else

Resample the alignment using the forward algorithm

This is similar to the branch alignment step of the greedy-refined algorithm
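Resampling with the forward algorithm means filling the forward matrix and then tracing back stochastically, picking each predecessor in proportion to its contribution. Below is a minimal sketch on a generic three-state pair HMM with made-up parameters (`delta`, `eps`, `tau` and the emission values are assumptions, not the Links-model probabilities); `sample_alignment` and `pick` are illustrative names.

```python
import random


def pick(rng, weights):
    """Draw a key with probability proportional to its weight."""
    r = rng.random() * sum(weights.values())
    for state, w in weights.items():
        r -= w
        if r < 0:
            return state
    return max(weights, key=weights.get)  # guard against rounding


def sample_alignment(x, y, rng, delta=0.2, eps=0.3, tau=0.1):
    """Sample a pairwise alignment of x and y from the posterior over paths."""
    q = 0.25                                  # single-residue emission

    def p(a, b):                              # toy pair emission
        return 0.175 if a == b else 0.025

    def trans(frm, to):                       # transition probability frm -> to
        if to == "M":
            return 1 - 2 * delta - tau if frm == "M" else 1 - eps - tau
        if frm == "M":
            return delta
        return eps if frm == to else 0.0

    n, m = len(x), len(y)
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    f["M"][0][0] = 1.0                        # begin state is treated like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f["M"][i][j] = p(x[i - 1], y[j - 1]) * sum(
                    trans(s, "M") * f[s][i - 1][j - 1] for s in "MXY")
            if i > 0:
                f["X"][i][j] = q * sum(trans(s, "X") * f[s][i - 1][j] for s in "MXY")
            if j > 0:
                f["Y"][i][j] = q * sum(trans(s, "Y") * f[s][i][j - 1] for s in "MXY")

    # Stochastic traceback; the end transition tau is shared and cancels.
    i, j = n, m
    state = pick(rng, {s: f[s][i][j] for s in "MXY"})
    top, bottom = [], []
    while i > 0 or j > 0:
        if state == "M":
            top.append(x[i - 1]); bottom.append(y[j - 1]); pi, pj = i - 1, j - 1
        elif state == "X":
            top.append(x[i - 1]); bottom.append("-"); pi, pj = i - 1, j
        else:
            top.append("-"); bottom.append(y[j - 1]); pi, pj = i, j - 1
        prev = pick(rng, {s: f[s][pi][pj] * trans(s, state) for s in "MXY"})
        i, j, state = pi, pj, prev
    return "".join(reversed(top)), "".join(reversed(bottom))


rng = random.Random(0)
top, bottom = sample_alignment("ACGT", "AGT", rng)
```

Replacing each `pick` with an argmax over the same weights (and the sums with maxima) would turn the sampler into a Viterbi traceback.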

Move 3: Node Sampling

Goal: resample the sequence at an internal node X

Construct the multiple HMM for X, its parent W and its children Y and Z, fixing everything else

Resample the sequence of X, conditioned on the relative alignment of W, Y and Z

This is similar to inferring parent sequence lengths in impatient-progressive algorithms

Algorithm

1. Parent sample up the guide tree and construct a multiple alignment

2. Visit each branch and node once for branch sampling or node sampling respectively

3. Repeat 2 to get more samples
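The three steps can be written as a schedule of moves over the guide tree. The sketch below only builds the list of moves; the move names, the tree encoding (node mapped to its list of children) and the visiting order within a sweep are assumptions, and the moves themselves are not implemented here.

```python
def post_order(tree, node):
    """Yield nodes children-first; leaves map to an empty list."""
    for child in tree[node]:
        yield from post_order(tree, child)
    yield node


def sampling_schedule(tree, root, sweeps):
    """List the moves in the order they would be applied."""
    moves = []
    # Step 1: parent sampling up the guide tree, children before parents.
    for node in post_order(tree, root):
        if tree[node]:
            moves.append(("parent_sample", node))
    # Steps 2 and 3: each sweep visits every branch and internal node once.
    # Including the root in node sampling is an assumption of this sketch.
    for _ in range(sweeps):
        for node in post_order(tree, root):
            for child in tree[node]:
                moves.append(("branch_sample", (node, child)))
            if tree[node]:
                moves.append(("node_sample", node))
    return moves


guide_tree = {"R": ["X", "W"], "X": ["Y", "Z"], "Y": [], "Z": [], "W": []}
for move in sampling_schedule(guide_tree, "R", sweeps=1):
    print(move)
```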

Replacing ‘sampling by Forward algorithm’ with ‘optimizing by Viterbi algorithm’ gives the maximum likelihood (ML) counterparts of the moves:

Impatient-progressive is the ML version of parent sampling

Greedy-refinement is the ML version of branch and node sampling

Gibbs sampling in ML context

Periodically save the current alignment, refine a copy greedily, record the likelihood of the refined alignment, and return to the saved alignment

Store the refined alignment and compare its likelihood with the others at the end of the run

Ordered over-relaxation

Sampling is a random walk on a Markov chain, so it behaves like Brownian motion, i.e. the rms drift grows as sqrt(n)

It would be better to avoid previously explored regions, i.e. ‘boldly go where no alignment has gone before’

Impose a strict weak order on alignments

Sample N alignments at each stage and sort them

If the original sample ends up in position k, choose the (N-k)th sample for the next emission
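The rank-mirroring step can be sketched in a few lines. Positions are counted from 0 over the N new samples plus the original one. The ordering used in the example (number of gap characters) is an arbitrary assumption, since the slides do not say which order is imposed; `overrelaxed_choice` is an illustrative name.

```python
def overrelaxed_choice(current, candidates, key):
    """Pick the sample whose rank mirrors the rank of `current`.

    With N candidates the sorted pool has positions 0..N; if `current`
    sits at position k, the sample at position N - k is returned.
    """
    pool = sorted([current] + list(candidates), key=key)
    k = pool.index(current)
    return pool[len(pool) - 1 - k]


def gap_count(alignment_row):
    """Toy ordering on alignments: number of gap characters."""
    return alignment_row.count("-")


current = "A-C"
candidates = ["AC", "A--C", "A---C"]
print(overrelaxed_choice(current, candidates, key=gap_count))
```

A current sample near one end of the order is replaced by one near the other end, which is what pushes the chain away from where it has just been.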

Implementation and results

A True alignment

B Impatient-progressive

C Greedy-refined

D Gibbs sampling followed by greedy refinement

E Gibbs sampling with simulated annealing

F Gibbs sampling with over-relaxation

G Without Felsenstein wildcards

Discussion

Outlines a very appealing Bayesian framework for multiple alignment

Performs very well, considering the simplicity of the model

Could add profile information and variable sized indels to the model to improve performance

Questions

What assumption lets us use this algorithm and avoid the N-dimensional DP matrices?

What is the importance of the immortal link in the Links model?

References

“Evolutionary HMMs: a Bayesian approach to multiple alignment” - Holmes and Bruno. Bioinformatics 2001

More results

Poor performance on 4 is probably because Handel produces a global alignment and doesn’t handle affine gaps

Handel doesn’t incorporate any profile information

Handel cannot use BLOSUM (it’s not additive)