PhyloHMM: the futurenakhleh/COMP571/Presentations/Troy.pdf · Space + Time = Phylo-HMM. The Method....

Page 2

PhyloHMM: the future

• Bring back the HMM to the future

• Updating the biological model with phylogenetics

Combining Phylogenetic and Hidden Markov Models in Biosequence Analysis

Adam Siepel & David Haussler (2004)

Page 3

What We’ll Discuss

• Motivation for updating HMMs

• Design of Phylo-HMM

• Tree Model

• DNA Substitution

• Evolutionary rate

➡Categories

➡Higher-order states

• Application to data & results

Page 4

Recap on HMMs

• Dominant tool in biological sequence analysis

• Gene prediction, homology searching, structure ...

➡Balance simplicity and expressiveness

Page 5

Antiquated HMMs

“Biologist Biff”: Your HMMs disregard three decades of sequence evolution research.

“CS McFly”: ...

Page 6

Antiquated HMMs

• Sites are independent

• Substitutions are homogeneous

• Evolutionary rates are consistent

• Functional categories are disregarded

Unrealistic model!

What’s the solution?

Page 7

Enter Phylogeny

• Provides probabilistic models of evolution

• Based on

• Topology of tree (relatedness)

• Lengths of its branches (rates)

• Pattern of substitution (categories)

➡Time-based

➡Works across sequences

Page 8

HMM += Phylogeny

• Both are built on probabilistic models

• HMM operates along a sequence

• Phylogenetics operates between sequences

Space + Time = Phylo-HMM

Page 9

The Method

Page 10

Input

• n aligned sequences of length L

• Phylogenetic tree relating the n taxa

$\psi = (Q, \pi, \tau, \beta)$

• $Q$: substitution matrix

• $\tau$: topology

• $\beta$: branch lengths

• $\pi$: base frequencies

Page 11

Likelihood of a Tree

• Sites of the alignment are assumed independent

• Dynamic programming solution

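$P(X \mid \psi) = \prod_{i=1}^{L} P(X_i \mid \psi)$

$P(X_i \mid \psi) = \sum_{\mathcal{L}} P(\mathcal{L}, X_i \mid \psi)$

where $\mathcal{L}$ is a labeling of the ancestral nodes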

Page 12

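(Figure: node $u$ with children $v$ and $w$ on branches of length $t_v$ and $t_w$)

• Recursion: computed at each node $u$ from its children $v$ and $w$

• Root call: $P(X_i \mid \psi) = \sum_{a} \pi_a P(L_r \mid a)$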
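The recursion and root call on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary tree, uniform base frequencies by default, and the Jukes-Cantor model as a stand-in for the substitution matrix. The names `jc69`, `partial`, and `site_likelihood`, and the nested-tuple tree encoding, are hypothetical.

```python
import math

BASES = "ACGT"

def jc69(a, b, t):
    """P(b | a, t) under Jukes-Cantor (assumed here only for simplicity)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def partial(node, column):
    """Return {a: P(leaves below node | node labeled a)}.

    A leaf is a taxon name (str); an internal node is a tuple
    (left_child, t_left, right_child, t_right).
    """
    if isinstance(node, str):
        observed = column[node]
        return {a: 1.0 if a == observed else 0.0 for a in BASES}
    v, t_v, w, t_w = node
    p_v, p_w = partial(v, column), partial(w, column)
    # Recursion: sum over the child labels independently for each child.
    return {
        a: sum(jc69(a, b, t_v) * p_v[b] for b in BASES)
           * sum(jc69(a, c, t_w) * p_w[c] for c in BASES)
        for a in BASES
    }

def site_likelihood(root, column, pi=None):
    """Root call: weight the root partials by the base frequencies."""
    pi = pi or {a: 0.25 for a in BASES}
    p_r = partial(root, column)
    return sum(pi[a] * p_r[a] for a in BASES)

def alignment_log_likelihood(root, columns):
    """Sites are assumed independent, so per-site log-likelihoods add."""
    return sum(math.log(site_likelihood(root, col)) for col in columns)

# Hypothetical three-taxon tree and two alignment columns.
tree = (("human", 0.1, "chimp", 0.1), 0.2, "mouse", 0.3)
columns = [
    {"human": "A", "chimp": "A", "mouse": "G"},
    {"human": "C", "chimp": "C", "mouse": "C"},
]
print(alignment_log_likelihood(tree, columns))
```

Each node is visited once per site, which is what makes the sum over all ancestral labelings tractable.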

Page 13

DNA Substitution

• Probability that base a is replaced by base b over a branch of length t

$P(b \mid a, t)$
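For a continuous-time Markov substitution process with rate matrix $Q$, this probability is an entry of the matrix exponential of $Qt$. The slide does not show the formula, so the block below is a standard textbook statement rather than a quote from the deck; the Jukes-Cantor closed form is included only as the simplest worked case (branch length $t$ measured in expected substitutions per site).

```latex
P(t) = e^{Qt} = \sum_{n=0}^{\infty} \frac{(Qt)^n}{n!},
\qquad
P(b \mid a, t) = \bigl[P(t)\bigr]_{ab}

% Jukes-Cantor special case (all substitutions equally likely):
P(b \mid a, t) =
\begin{cases}
\tfrac{1}{4} + \tfrac{3}{4}\, e^{-4t/3} & \text{if } a = b \\
\tfrac{1}{4} - \tfrac{1}{4}\, e^{-4t/3} & \text{if } a \neq b
\end{cases}
```

Richer models such as HKY or REV change $Q$, not the relationship between $Q$ and $P(t)$.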

Page 14

Evolutionary Rate

• Vary the rate of evolution by scaling the branches

• Discretize the gamma distribution into k rates

Scaling the branches

How do we assign rates?

Page 15

Rates HMM

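• Autocorrelation (site i tends to evolve at the same rate as site i+1)

• Used in a two-step fitting process

(Figure: HMM with rate states 1, 2, 3; k = 3)

Transitions

$c_{j,l} = \frac{1-\lambda}{k} \quad (l \neq j)$

$c_{j,j} = \lambda + \frac{1-\lambda}{k}$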
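The transition rule can be written out as a small matrix builder. This is a sketch under the assumption that the autocorrelation parameter (written λ here, `lam` in the code) lies in [0, 1]; the function name `rate_transitions` is hypothetical.

```python
def rate_transitions(k, lam):
    """Build the k x k transition matrix of the rates HMM.

    Off-diagonal entries are (1 - lam) / k; diagonal entries add lam,
    so lam = 0 means independent sites and lam = 1 means one rate
    for the whole alignment.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    off = (1.0 - lam) / k
    return [[lam + off if j == l else off for l in range(k)]
            for j in range(k)]

# Example with k = 3 rate categories, as on the slide.
for row in rate_transitions(3, 0.7):
    print(row)
```

Every row sums to 1 because the k entries of size (1 - λ)/k contribute 1 - λ and the diagonal adds the remaining λ.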

Page 16

Category HMMs

• Use tree models for “functional categories”

• Topologies may vary, but are usually the same

(Figure labels: Assignment, Transition, Tree Model, Emission)

Page 17

Category x Rates HMM

• Rate and function are orthogonal

• Create HMM that incorporates both

• Take the cross product of states, transitions

➡ Scale the tree models

What about slow-evolving coding regions?
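The cross product of states and transitions can be sketched as below. It assumes, in line with the "orthogonal" bullet, that category and rate change independently, so a joint transition probability is the product of the two component probabilities. The function name and the example matrices are hypothetical.

```python
from itertools import product

def cross_product_hmm(cat_trans, rate_trans):
    """Combine a q-state category HMM with a k-state rates HMM.

    Returns the q*k joint states as (category, rate) pairs and the
    joint transition matrix, whose entries are products of the
    component transition probabilities (independence assumption).
    """
    q, k = len(cat_trans), len(rate_trans)
    states = list(product(range(q), range(k)))
    trans = [[cat_trans[c1][c2] * rate_trans[r1][r2]
              for (c2, r2) in states]
             for (c1, r1) in states]
    return states, trans

# Hypothetical example: 2 functional categories x 3 rate categories.
cat = [[0.9, 0.1],
       [0.2, 0.8]]
rate = [[0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]]
states, trans = cross_product_hmm(cat, rate)
print(len(states))  # number of joint states, q * k
```

The emission for joint state (c, r) would then come from the tree model of category c with its branch lengths scaled by rate r, which is how a slow-evolving coding state can be represented.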

Page 18

Higher-Order States

• Emissions are context-dependent

• Adjust alphabet size to $|\Sigma|^{N+1}$

• Increases complexity: $O(nL|\Sigma|^{N+1})$

• In practice, N = 2 or 3
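To see how quickly the enlarged alphabet grows, the sketch below enumerates the (N+1)-tuples that replace single bases in an order-N model, and shows how one sequence is re-read as overlapping tuples. The helper names are hypothetical, and the sliding-window encoding is one simple way to form the tuples, not necessarily the one used in the paper.

```python
from itertools import product

def context_alphabet(sigma="ACGT", order=2):
    """All (order + 1)-tuples over sigma: the order-N emission alphabet."""
    return ["".join(t) for t in product(sigma, repeat=order + 1)]

def context_tuples(seq, order=2):
    """Re-read a sequence as overlapping (order + 1)-tuples.

    Position i is paired with the `order` positions before it, so the
    first `order` positions have no complete context and are skipped.
    """
    return [seq[i - order:i + 1] for i in range(order, len(seq))]

for n in (1, 2, 3):
    print(n, len(context_alphabet(order=n)))
print(context_tuples("ACGTA", order=2))
```

Since the pruning algorithm works over the alphabet at every node, this growth is where the extra factor in the complexity bound comes from.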

Page 19

Summary

- Assume k rate categories and q functional categories

- HMM of order N

- Estimate transition probabilities of categories

‣ Compute kq x L emission probabilities

‣ Train autocorrelation

‣ Run Viterbi
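The final step can be sketched as a log-space Viterbi pass over a precomputed emission table. This is an illustrative sketch, not the authors' code: it assumes `log_emit[s][i]` already holds the log-likelihood of alignment column i under the tree model of state s (one row per state, one column per site), and all names are hypothetical.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path given per-state, per-site log emissions."""
    n_states, length = len(log_emit), len(log_emit[0])
    score = [log_init[s] + log_emit[s][0] for s in range(n_states)]
    back = []  # back[i - 1][s] = best predecessor of state s at site i
    for i in range(1, length):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][i])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical two-state example over six alignment columns.
log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)],
         [log(0.1), log(0.9)]]
emit = [[log(p) for p in (0.9, 0.9, 0.9, 0.1, 0.1, 0.1)],
        [log(p) for p in (0.1, 0.1, 0.1, 0.9, 0.9, 0.9)]]
print(viterbi(init, trans, emit))
```

Working in log space avoids underflow, which matters because each emission is itself a product over the whole tree.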

Page 20

The Results

Page 21

Data

• Used portions of a huge multiple alignment

• Trained using counting and annotations

Page 22

Test

• Compared models with likelihood ratio test (LRT)

• 5 substitution models (includes higher order)

• REV, HKY, UNR, R2, R2S, U2S

• 3 rate variations: constant, gamma, autocorrelation
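A likelihood ratio test between two fitted models can be sketched as follows. The sketch assumes the simpler model is nested in the richer one, so that twice the log-likelihood difference is approximately chi-square distributed with degrees of freedom equal to the number of extra free parameters. The function names and the log-likelihood values are hypothetical and are not results from the paper.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step df up by 2.
    if df % 2 == 1:
        q, d = math.erfc(math.sqrt(half)), 1
    else:
        q, d = math.exp(-half), 2
    while d < df:
        q += half ** (d / 2.0) * math.exp(-half) / math.gamma(d / 2.0 + 1.0)
        d += 2
    return q

def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical log-likelihoods for a simpler and a richer model.
stat, p = likelihood_ratio_test(-1000.0, -990.0, extra_params=2)
print(stat, p)
```

A small p-value suggests the extra parameters of the richer model are justified by the data; for non-nested models this chi-square approximation does not apply.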

Page 23

Results

• Higher-order states give the largest boost

(Figure panels: Ancestral Repeat, WNT2)

Page 24

Questions?