Carleton Biology talk : March 2014


Page 1: Carleton Biology talk : March 2014

@kcranstn

http://slideshare.net/kcranstn

Enabling science with the tree of life

Karen Cranston, National Evolutionary Synthesis Center (NESCent)

Page 2: Carleton Biology talk : March 2014

The tree of life provides a means for organizing and explaining biodiversity data

Wiegmann et al. PNAS, 2011

Page 3: Carleton Biology talk : March 2014

What do we want from a Tree of Life?

❖ complete = contains all of biodiversity

❖ dynamic = continuously updated with new data

❖ available digitally = browse, query, download

Image: http://evolution.berkeley.edu

Page 4: Carleton Biology talk : March 2014

❖ Create a complete tree of life by synthesizing published phylogenetic data

❖ Provide tools for managing, synthesizing & sharing phylogenetic data

http://opentreeoflife.org

Page 5: Carleton Biology talk : March 2014

Synthetic science

❖ Novel methods & analysis tools

❖ Big data from existing data

Biodiversity Synthesis Center / Encyclopedia of Life

National Evolutionary Synthesis Center

Page 6: Carleton Biology talk : March 2014

Challenges

❖ Incongruence: How do we detect and use conflict between trees?

❖ Availability: What data do we have to construct a tree of life?

❖ Synthesis: How do we combine data across the tree of life?

Page 7: Carleton Biology talk : March 2014

What can we learn from conflict between trees?

Page 8: Carleton Biology talk : March 2014

aactgtcgcatgttgacg...
aattgtcg-atgttgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...

Phylogenetic inference

Many likely trees

Gene tree uncertainty

Single gene alignment

Page 9: Carleton Biology talk : March 2014

Bayesian phylogenetic inference

Input: sequence data + evolutionary model

Output: list of sampled phylogenies

Page 10: Carleton Biology talk : March 2014

[Figure: bar chart of probability (y-axis, 0 to 0.18) for each sampled tree (x-axis, trees 1 to 49)]

Number of times sampled ∝ probability

Is there a stable backbone among the trees?

What taxa have unstable placement?
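
As a rough illustration of "number of times sampled ∝ probability", the sketch below estimates the posterior probability of each topology as its relative frequency in the sample. It assumes each sampled tree has already been reduced to a canonical topology string; the function name and the toy sample are hypothetical and are not data from the talk.

```python
from collections import Counter


def topology_frequencies(sampled_trees):
    """Estimate posterior probabilities as relative sampling frequencies.

    Assumes each tree is already reduced to a canonical topology string,
    so that identical topologies compare equal.
    """
    counts = Counter(sampled_trees)
    total = sum(counts.values())
    return {topology: n / total for topology, n in counts.items()}


# Hypothetical sample of 20 trees over taxa 1-5 (illustrative only).
sample = (
    ["((1,2),(3,4),5)"] * 8
    + ["((1,2),(3,5),4)"] * 5
    + ["((1,3),(2,4),5)"] * 4
    + ["((1,4),(2,3),5)"] * 3
)
posterior = topology_frequencies(sample)
```

Real MCMC output would first need burn-in removal and a step that maps each Newick string to a canonical form, both omitted here.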

Page 11: Carleton Biology talk : March 2014

Summarize with agreement subtrees

[Figure: four sampled trees on taxa 1 to 5 (Pr = 0.40, 0.25, 0.20, 0.15) and the corresponding subtrees on taxa 1, 3, 4, 5 (Pr = 1.00)]

Page 12: Carleton Biology talk : March 2014

[Figure: four sampled trees on taxa 1 to 5 (Pr = 0.40, 0.25, 0.20, 0.15) and subtrees on taxa 1, 3, 4, 5 (Pr = 0.85)]

Cranston, K.A. and B.H. Rannala. Summarizing a posterior distribution of phylogenies using agreement subtrees. Systematic Biology 2007: 56(4), pp. 578-590.

Page 13: Carleton Biology talk : March 2014

Multiple sequence alignments

Concatenate

Supermatrix

Species tree

Supertrees

Gene duplication

Coalescent

Gene trees
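
The "Concatenate" step that turns multiple sequence alignments into a supermatrix can be sketched as below. This is a minimal illustration, not the pipeline used in the talk: the function name, the taxon names and the choice of "?" for missing data are all assumptions.

```python
def concatenate_alignments(gene_alignments, missing="?"):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    Sequences within one gene are assumed to have equal length.
    Taxa absent from a gene are padded with the `missing` character.
    """
    taxa = sorted(set().union(*gene_alignments))
    supermatrix = {taxon: [] for taxon in taxa}
    for gene in gene_alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must be aligned to equal length")
        (length,) = lengths
        for taxon in taxa:
            supermatrix[taxon].append(gene.get(taxon, missing * length))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Toy input with hypothetical taxon names and sequences.
genes = [
    {"sp_A": "aactg", "sp_B": "aattg", "sp_C": "aac-g"},
    {"sp_A": "tcgc", "sp_C": "tcga"},
]
matrix = concatenate_alignments(genes)
```

Here sp_B has no sequence for the second gene, so its row is padded with missing-data characters to keep every row the same length.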

Page 14: Carleton Biology talk : March 2014

Phylogenomics of rice (Oryza)

820,000 BAC-end sequences for 9 diploid Oryza species

1720 gene fragments

2.4 million nucleotides

Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010

What are the biological causes of gene tree incongruence in rice?

Do we need full genomes to answer these questions?

Page 15: Carleton Biology talk : March 2014

Phylogenomics of rice (Oryza)

Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010

Concatenated analysis

Page 16: Carleton Biology talk : March 2014

Gene trees in Oryza

❖ Gene tree methods: recover every possible topology

❖ Species tree methods: many clades not statistically significant

Cranston, K.A., B. Hurwitz, D. Ware, L. Stein, R.A. Wing. Species trees from highly incongruent gene trees in rice. Systematic Biology. 2009: doi: 10.1093/sysbio/syp054

Supermatrix topology

❖ Results suggest incomplete lineage sorting and hybridization / introgression in the evolutionary history of rice

Page 17: Carleton Biology talk : March 2014

What data do we have for creating a complete tree of life?

Page 18: Carleton Biology talk : March 2014

Gene tree signal in GenBank

How many trees can we build using all of the data in GenBank and how are those trees distributed across the tree of life?

Page 19: Carleton Biology talk : March 2014

All-vs-all BLAST at each NCBI taxonomy node

Sanderson, M.J., D.T. Boss, D. Chen, K.A. Cranston, and A. Wehe. The PhyLoTA Browser: Processing GenBank for molecular phylogenetics research. Systematic Biology 2008: 57(3).

Arachis hypogaea

Arachis hypogaea subsp. fastigiata

Arachis hypogaea subsp. hypogaea

Arachis glabrata

subtree clusters

Arachis

Page 20: Carleton Biology talk : March 2014

All possible clusters, alignments and trees

aactgtcgcatgttgacg...
aattgtcg-atgttgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...

❖ ~90000 clusters, alignments, trees available for download

❖ data availability matrix at each NCBI node
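
One way to picture a data availability matrix is as a taxon-by-cluster presence/absence table. The sketch below is only an illustration under that assumption; the function name and cluster ids are hypothetical, and it does not reproduce how the PhyLoTA Browser stores or computes its matrices.

```python
def availability_matrix(clusters):
    """Build a taxon-by-cluster presence/absence matrix.

    clusters: dict mapping cluster id -> set of taxon names with at least
    one sequence in that cluster.
    Returns (taxa, cluster_ids, rows), where rows[i][j] is 1 if taxa[i]
    has data in cluster_ids[j] and 0 otherwise.
    """
    cluster_ids = sorted(clusters)
    taxa = sorted(set().union(*clusters.values()))
    rows = [[int(taxon in clusters[cid]) for cid in cluster_ids] for taxon in taxa]
    return taxa, cluster_ids, rows


# Hypothetical clusters; only the taxon names are taken from the Arachis slide.
clusters = {
    "cluster_1": {"Arachis hypogaea", "Arachis glabrata"},
    "cluster_2": {"Arachis hypogaea"},
}
taxa, cluster_ids, rows = availability_matrix(clusters)
```

Rows with many zeros flag taxa that are sparsely sampled across clusters, which matters when deciding which clusters are worth combining.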

Page 21: Carleton Biology talk : March 2014
Page 22: Carleton Biology talk : March 2014

❖ complete = contains all of biodiversity

❖ dynamic = continuously updated with new data

❖ available digitally = browse, query, download

http://opentreeoflife.org

Page 23: Carleton Biology talk : March 2014

Gordon Burleigh, Keith Crandall, Karl Gude, David Hibbett, Mark Holder, Laura Katz, Rick Ree, Stephen Smith, Doug Soltis, Tiffani Williams

Computer science, Systematics, Evolutionary theory, Computational biology, Bioinformatics, Journalism

Page 24: Carleton Biology talk : March 2014
Page 25: Carleton Biology talk : March 2014

Even if there were phylogenies for all sequence clusters in GenBank, they would represent only a small fraction of biodiversity

Page 26: Carleton Biology talk : March 2014

Two types of inputs

Phylogeny: highly resolved, computationally derived, limited coverage

Taxonomy: poorly resolved, manually curated, much more complete

Page 27: Carleton Biology talk : March 2014

~7000 trees from ~2600 studies

Phylografter: Rick Ree, Field Museum of Natural History

Page 28: Carleton Biology talk : March 2014

[Image: page 3 of Wiegmann et al., PNAS Early Edition, showing Fig. 1, the combined molecular phylogenetic tree for Diptera]

~4% of all published phylogenetic trees

Stoltzfus et al. 2012

Trees generally published as pictures in PDFs

Page 29: Carleton Biology talk : March 2014

OpenTree Reference Taxonomy

[Figure: source taxonomies (shown as logos) + patch files for manual edits]

3,133,028 nodes and 2,559,835 ‘species’

Jonathan Rees, NESCent

Page 30: Carleton Biology talk : March 2014

How do we combine data to build and use a tree of life?

Page 31: Carleton Biology talk : March 2014

Novel datastore for synthesis

Treemachine: Stephen Smith, Cody Hinchliff, Joseph Brown, U Michigan

Page 32: Carleton Biology talk : March 2014
Page 33: Carleton Biology talk : March 2014

Jim Allman, NESCent

Page 34: Carleton Biology talk : March 2014

Manual synthesis based on all data

Automated synthesis based on limited data

Page 35: Carleton Biology talk : March 2014

Inputs: Published phylogenies

Taxonomies

• filter / weight input trees

• re-synthesize

• process feedback

• input new trees

synthetic tree of life

Page 36: Carleton Biology talk : March 2014

Improving the synthetic tree

❖ Branch lengths & divergence times

❖ Better synthesis using tree metadata

❖ Community engagement

❖ data deposition & curation

❖ feedback & annotation

Page 37: Carleton Biology talk : March 2014

Moving beyond a single tree

❖ Detecting conflict and coverage

❖ Visualization

❖ Enabling custom synthesis

❖ Building out to other tools & resources

Page 38: Carleton Biology talk : March 2014

[Logo: Open Tree of Life]

What can we do with a tree of life?

Page 39: Carleton Biology talk : March 2014

aactgtcgcatgttgacg...
aattgtcg-atgttgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...

+ =

Image: Zephyris at the English language Wikipedia

10 million years

24 million years

Page 40: Carleton Biology talk : March 2014
Page 41: Carleton Biology talk : March 2014

Acer macrophyllum, Betula lutea, Aesculus glabra, Tilia americana, Ulmus rubra

Leaf patterns image from Walls RL: American Journal of Botany 2011, 98(2):244-253.

Acer macrophyllum

Betula alleghaniensis

Aesculus glabra

Tilia americana

Ulmus rubra

Page 42: Carleton Biology talk : March 2014

Stoltzfus, A., Lapp, H., Matasci, N., … Cranston, K.A., ... & Jordan, G. (2013). Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC Bioinformatics, 14(1), 158.

Page 43: Carleton Biology talk : March 2014
Page 44: Carleton Biology talk : March 2014

Collaborative data collection

Validation of datasets

Search & download across datasets

Page 45: Carleton Biology talk : March 2014

Get tree

Get tree

Page 46: Carleton Biology talk : March 2014

[Logo: Open Tree of Life]

What can we do with a tree of life?

Page 47: Carleton Biology talk : March 2014

University of Alberta: Bruce Rannala

University of Arizona: Michael Sanderson

NESCent: Jonathan Rees, Jim Allman