Quantifying MCMC exploration of phylogenetic tree space
![Page 1: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/1.jpg)
Quantifying MCMC exploration of phylogenetic tree space
Christopher Whidden and Frederick “Erick” A. Matsen IV
Fred Hutchinson Cancer Research Center
http://matsen.fhcrc.org @ematsen
![Page 2: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/2.jpg)
Phylogenetics: reconstruct evolutionary history from DNA
[Figure: DNA or RNA sequence data, via "phylogenetics", yields a tree relating armadillo, giraffe, rat, and human.]
![Page 3: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/3.jpg)
Phylogenetics helps us learn how HIV-1 came to be
Etienne, Hahn, Sharp, Matsen and Emerman, Cell Host & Microbe, 2013
![Page 4: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/4.jpg)
We are fond of statistical approaches to phylogenetics
These are important when one would like a clear notion of uncertainty (as in medicine, epidemiology, and biodefense!)
![Page 5: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/5.jpg)
We are fond of statistical approaches to phylogenetics
In particular, Bayesian methods fall into this category and have become quite popular.
ACATGGCTC...ATACGTTCC...TTACGGTTC...ATCCGGTAC...ATACAGTCT...
...
We can’t solve for this posterior distribution, but we can satisfy our needs by getting a big sample from it.
![Page 6: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/6.jpg)
Markov chain Monte Carlo (MCMC)
Metropolis et al., 1953.
Set up a simulation such that the amount of time spent in a given state is proportional to the posterior probability of that state.
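A minimal sketch of this idea, assuming a toy state space of five states arranged on a ring with made-up unnormalized weights (the function name `metropolis` and the weights are illustrative, not from the talk). It runs a Metropolis chain and records the fraction of time spent in each state, which should approach the normalized weights.

```python
import random
from collections import Counter

def metropolis(weights, n_steps, seed=0):
    """Run a Metropolis chain on states 0..n-1 arranged on a ring.

    Returns the fraction of steps spent in each state. `weights` is a
    hypothetical unnormalized posterior; only ratios of weights are used.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: move to the left or right neighbor on the ring.
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, w(proposal) / w(state)).
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        visits[state] += 1
    return [visits[s] / n_steps for s in range(n)]

weights = [1.0, 2.0, 4.0, 2.0, 1.0]          # toy unnormalized posterior
target = [w / sum(weights) for w in weights]  # what the chain should reproduce
freqs = metropolis(weights, n_steps=200_000)
```

Comparing `freqs` with `target` shows the time spent in each state tracking its posterior probability, even though the normalizing constant is never used inside the sampler.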
![Page 7: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/7.jpg)
Here we want a posterior on trees
If we want to use the same strategy to get a posterior on phylogenetic trees...
ACATGGCTC...ATACGTTCC...TTACGGTTC...ATCCGGTAC...ATACAGTCT...
...
we need a way to move from one phylogenetic tree to another.
![Page 8: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/8.jpg)
Subtree-prune-regraft (SPR) definition
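[Figure: three panels illustrating an SPR move on a tree with leaves 1 to 6, in which the subtree on leaves 2 and 3 is pruned and regrafted.]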
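As a rough illustration of a single SPR move, here is a sketch for rooted binary trees written as nested tuples with string leaf labels. The representation and the helper names `prune`, `regraft`, and `spr` are assumptions made for this example, not the data structures used in the work presented.

```python
def prune(tree, sub):
    """Remove subtree `sub` from `tree`; its parent is replaced by its sibling.

    Pruning the whole tree (sub == tree) is not handled in this sketch.
    """
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == sub:
        return right
    if right == sub:
        return left
    return (prune(left, sub), prune(right, sub))

def regraft(tree, target, sub):
    """Attach `sub` on the edge above node `target` by pairing the two."""
    if tree == target:
        return (tree, sub)
    if isinstance(tree, str):
        return tree
    left, right = tree
    return (regraft(left, target, sub), regraft(right, target, sub))

def spr(tree, sub, target):
    """One subtree-prune-regraft move: cut off `sub`, reattach it above `target`.

    `target` must be a node of the tree that remains after pruning.
    """
    return regraft(prune(tree, sub), target, sub)

start = ((("1", "2"), "3"), ("4", ("5", "6")))
moved = spr(start, "3", "4")   # prune leaf 3, regraft it next to leaf 4
```

Here `moved` is `(("1", "2"), (("4", "3"), ("5", "6")))`: the leaf set is unchanged and only the attachment point of the pruned subtree differs.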
![Page 9: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/9.jpg)
The set of trees as a graph connected by SPR moves
(Figure from Mossel and Vigoda, Science, 2005).
![Page 10: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/10.jpg)
This graph is connected, and every tree has nonzero posterior probability, so MCMC works†
We are guaranteed to converge to the posterior distribution on trees by using Metropolis-Hastings moves built on these SPRs.
That is, by bouncing around “tree space” we can get a good idea of a set of good trees.
† That is, it works if we run the MCMC forever
![Page 11: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/11.jpg)
We can’t run it forever.
News flash:
5 million < ∞
![Page 12: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/12.jpg)
With pathological data, it can be hard to move between peaks
[Figure: a landscape with multiple peaks; the vertical axis is labeled "goodness".]
![Page 13: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/13.jpg)
We wanted to know: does this happen in real data sets?
Lots of discussion in literature, but few clear conclusions.
In order to understand the reasons differentiating “easy” and “difficult” data sets for phylogenetic MCMC, we wanted to make it possible to visualize tree space with a relevant geometry.
So, what trees are close to each other in terms of SPR moves?
![Page 14: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/14.jpg)
dSPR: how many SPR moves from one tree to another?
Say $T_1 \sim T_2$ if there is an SPR transformation of $T_1$ to $T_2$.
$$d_{\mathrm{SPR}}(T, S) = \min_{T \sim T_1 \sim \cdots \sim T_k = S} k$$
This distance is NP-hard to compute. That’s no fun!
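The definition says that dSPR is the length of a shortest path between two trees in the SPR graph. On a graph small enough to write down, breadth-first search finds that length directly; the sketch below uses a hypothetical five-node adjacency list (`toy_graph`) standing in for a tiny piece of tree space. The real SPR graph is far too large to enumerate this way, which is why specialized exact algorithms are needed.

```python
from collections import deque

def graph_distance(neighbors, source, target):
    """Shortest-path length from source to target by breadth-first search.

    `neighbors` maps each node to a list of adjacent nodes.
    Returns None if target cannot be reached.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

# Hypothetical graph: each letter stands for a tree, each edge for one SPR move.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
d = graph_distance(toy_graph, "A", "E")
```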
![Page 15: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/15.jpg)
Meet Chris Whidden, algorithms strongman
In a series of four very technical papers, Chris took exact computation of dSPR from O(infeasible) to O(feasible).
Then he joined my group!
![Page 16: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/16.jpg)
Let’s take some common data sets and see what we see
These are completely standard data sets of the sort that biologists analyze every day: slowly evolving nuclear, mitochondrial, or chloroplast genes.
Also used as examples in:
- Lakner et al., Syst. Biol., 2008
- Höhna and Drummond, Syst. Biol., 2012
- Larget, Syst. Biol., 2013
![Page 17: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/17.jpg)
Interested in high probability subsets of the SPR graph
![Page 18: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/18.jpg)
Summarize by subsetting to high probability nodes
Node size is proportional to posterior probability, and color shows distance to the highest posterior probability (PP) tree.
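A small sketch of this kind of summary, assuming the posterior estimates and SPR adjacencies are already available as plain dictionaries (the names `posterior`, `neighbors`, and the tree labels `t1` to `t5` are made up for illustration). It keeps the k most probable trees, retains only the edges among them, and reports the connected components of what remains.

```python
def top_k_subgraph(posterior, neighbors, k):
    """Keep the k highest-probability nodes and only the edges among them."""
    kept = set(sorted(posterior, key=posterior.get, reverse=True)[:k])
    return {t: [u for u in neighbors[t] if u in kept] for t in kept}

def connected_components(graph):
    """Return the connected components of an undirected graph as sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node])
        seen |= comp
        components.append(comp)
    return components

# Hypothetical example: five trees on a path, with a low-probability tree in the middle.
posterior = {"t1": 0.30, "t2": 0.25, "t3": 0.02, "t4": 0.23, "t5": 0.20}
neighbors = {
    "t1": ["t2"],
    "t2": ["t1", "t3"],
    "t3": ["t2", "t4"],
    "t4": ["t3", "t5"],
    "t5": ["t4"],
}
sub = top_k_subgraph(posterior, neighbors, k=4)
parts = connected_components(sub)
```

In this toy case, dropping the improbable middle tree splits the remaining high-probability trees into two components, which is the kind of separation the visualizations are meant to reveal.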
![Page 19: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/19.jpg)
The top 4096 trees for a data set
![Page 20: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/20.jpg)
The top 4096 trees for a data set
What's up with this stuff?
Is it important? Is it difficult for the MCMC to see?
![Page 21: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/21.jpg)
Commute time definition
Commute time for a node y: how long to make the round trip from y to the highest posterior probability tree and back?
Any round trip path counts!
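One way to get a feel for this quantity is to estimate it by simulation. The sketch below is an assumption-laden toy, not the method used in the talk: it runs a Metropolis chain on a path of five states with made-up bimodal weights, and averages the number of steps needed to go from y to the highest-weight state and back.

```python
import random

def metropolis_step(state, weights, rng):
    """One Metropolis step on the path 0..n-1; proposals off either end are rejected."""
    proposal = state + rng.choice((-1, 1))
    if proposal < 0 or proposal >= len(weights):
        return state
    if rng.random() < weights[proposal] / weights[state]:
        return proposal
    return state

def mean_commute_time(y, weights, n_trips=200, seed=0):
    """Average number of steps for the round trip y -> top state -> y."""
    top = max(range(len(weights)), key=weights.__getitem__)
    if y == top:
        raise ValueError("y must differ from the highest-weight state")
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trips):
        state, goal, legs_done = y, top, 0
        while legs_done < 2:
            state = metropolis_step(state, weights, rng)
            total += 1
            if state == goal:
                legs_done += 1
                goal = y          # second leg: head back to y
    return total / n_trips

weights = [8.0, 4.0, 0.05, 3.0, 6.0]   # two peaks separated by a deep valley
near = mean_commute_time(1, weights)    # y next to the top state
far = mean_commute_time(4, weights)     # y on the other peak
```

The state on the far peak has substantial weight, yet its commute time is far larger than that of the neighbor of the top state, because every round trip has to cross the valley twice.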
![Page 23: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/23.jpg)
Commute time plot for this data set
![Page 24: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/24.jpg)
The separation is problematic indeed
Yep, those parts of the posterior are important and MCMC has trouble entering them.
![Page 25: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/25.jpg)
Trees with 95% of posterior probability for another data set
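A set like this can be built by taking trees in decreasing order of estimated posterior probability until the running total reaches 95%. The sketch below assumes the estimates are given as a dictionary; the tree labels and probabilities are invented for the example.

```python
def credible_set(posterior, mass=0.95):
    """Trees in decreasing probability order until their total reaches `mass`."""
    chosen, total = [], 0.0
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    for tree, p in ranked:
        if total >= mass:
            break
        chosen.append(tree)
        total += p
    return chosen

# Hypothetical posterior estimates for five trees.
posterior = {"t1": 0.50, "t2": 0.30, "t3": 0.12, "t4": 0.05, "t5": 0.03}
trees_95 = credible_set(posterior)
```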
![Page 26: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/26.jpg)
We can use our methods to identify the source of bottlenecks
[Figure: two tree topologies on the same 27 taxa: Alligator_mississippiensis, Ambystoma_mexicanum, Amphiuma_tridactylum, Bufo_valliceps, Discoglossus_pictus, Eleutherodactylus_cuneatus, Gallus_gallus, Gastrophryne_carolinensis, Grandisonia_alternans, Heterodon_platyrhinos, Homo_sapiens, Hyla_cinerea, Hypogeophis_rostratus, Ichthyophis_bannanicus, Latimeria_chalumnae, Mus_musculus, Nesomantis_thomasseti, Oryctolagus_cuniculus, Plethodon_yonhalossee, Rattus_norvegicus, Scaphiopus_holbrooki, Sceloporus_undulatus, Siren_intermedia, Trachemys_scripta, Turdus_migratorius, Typhlonectes_natans, Xenopus_laevis.]
These are the trees at the two peaks of the connected components. Indeed, it’s very tricky to get between them!
![Page 27: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/27.jpg)
Multidimensional scaling visualizations via dSPR
![Page 28: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/28.jpg)
In general, a new way to explore tree space
![Page 29: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/29.jpg)
Our applications: it’s party time
- Automatic identification of (multiple) peaks in posteriors
- Performance of Metropolis-coupled Markov chain Monte Carlo for getting between peaks
- Accuracy of new “mean-field” posterior probability approximations
- The first topological convergence diagnostic
These empirical investigations set the stage for additional theoretical development, and suggest new ways to move around tree space.
This will translate into better phylogenetic uncertainty estimates, and hence better preparedness and response to biological threats.
![Page 30: Quantifying MCMC exploration of phylogenetic tree space](https://reader034.fdocuments.in/reader034/viewer/2022052316/55849e55d8b42ac1328b52cf/html5/thumbnails/30.jpg)
Thank you
- Robert Beiko (Dalhousie University)
- Aaron Darling (University of Technology, Sydney)
- Connor McCoy (Fred Hutchinson Cancer Research Center)
- NSF award 1223057