Carleton Biology talk : March 2014

Posted on 23-Aug-2014


@kcranstn

http://slideshare.net/kcranstn

Enabling science with the tree of life

Karen Cranston

National Evolutionary Synthesis Center (NESCent)

The tree of life provides a means for organizing and explaining biodiversity data

Wiegmann et al. PNAS, 2011

What do we want from a Tree of Life?

❖ complete = contains all of biodiversity

❖ dynamic = continuously updated with new data

❖ available digitally = browse, query, download

Image: http://evolution.berkeley.edu

❖ Create a complete tree of life by synthesizing published phylogenetic data

❖ Provide tools for managing, synthesizing & sharing phylogenetic data

http://opentreeoflife.org

Synthetic science

❖ Novel methods & analysis tools

❖ Big data from existing data

Biodiversity Synthesis Center / Encyclopedia of Life

National Evolutionary Synthesis Center

Challenges

❖ Incongruence: How do we detect and use conflict between trees?

❖ Availability: What data do we have to construct a tree of life?

❖ Synthesis: How do we combine data across the tree of life?

What can we learn from conflict between trees?

aactgtcgcatgttgacg...
aattgtcg-atgttgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...

Phylogenetic inference

Many likely trees

Gene tree uncertainty

Single gene alignment

Bayesian phylogenetic inference

Input: sequence data + evolutionary model

Output = list of sampled phylogenies

[Figure: bar chart of sampled trees. x-axis: sampled trees (1 to 49); y-axis: probability (0 to 0.18)]

Number of times sampled ∝ probability
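The link between sampling frequency and probability can be sketched in a few lines of Python. This is a minimal illustration, assuming the sampler's output is available as a list of topology strings and that identical strings mean identical topologies (real tools would need to canonicalize trees first). The Newick strings, the sample size, and the function name `topology_frequencies` are hypothetical and not taken from the talk.

```python
from collections import Counter

def topology_frequencies(sampled_trees):
    """Estimate each topology's posterior probability as its share of the samples."""
    counts = Counter(sampled_trees)
    total = len(sampled_trees)
    # most_common() orders topologies from most to least frequently sampled.
    return [(tree, n / total) for tree, n in counts.most_common()]

# Hypothetical sample of 20 topologies on five taxa (assumption for illustration).
samples = (
    ["((1,2),(3,(4,5)));"] * 8
    + ["((1,(2,3)),(4,5));"] * 5
    + ["(((1,2),3),(4,5));"] * 4
    + ["((1,2),((3,4),5));"] * 3
)

for tree, prob in topology_frequencies(samples):
    print(f"{prob:.2f}  {tree}")
```

With a real posterior sample the list is typically long and flat, which is why a summary of the sampled trees is needed.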

Is there a stable backbone among the trees?

What taxa have unstable placement?

Summarize with agreement subtrees

[Figure: two examples, each showing four sampled trees on taxa 1 to 5 with probabilities 0.40, 0.25, 0.20 and 0.15. Pruning taxon 2 leaves subtrees on taxa 1, 3, 4, 5 with Pr=1.00 in the first example and Pr=0.85 in the second]

Cranston, K.A. and B.H. Rannala. Summarizing a posterior distribution of phylogenies using agreement subtrees. Systematic Biology 2007: 56(4), pp. 578-590.
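The idea behind the summary can be illustrated with a small sketch: prune an unstable taxon from every sampled tree, then add up the probabilities of trees that become identical. This is not the algorithm from the cited paper, only a toy version of the pruning step. It assumes rooted trees stored as nested tuples; the four trees, their probabilities, and the function names are hypothetical.

```python
from collections import defaultdict

def prune(tree, drop):
    """Remove the taxa in `drop` from a nested-tuple tree; return None if nothing is left."""
    if not isinstance(tree, tuple):
        return None if tree in drop else tree
    kept = [c for c in (prune(child, drop) for child in tree) if c is not None]
    if not kept:
        return None
    # A node left with a single child is collapsed into that child.
    return kept[0] if len(kept) == 1 else tuple(kept)

def canonical(tree):
    """Order-independent form, so ((1,2),3) and (3,(2,1)) compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return frozenset(canonical(child) for child in tree)

def pruned_probabilities(weighted_trees, drop):
    """Sum the probabilities of trees that become identical after pruning."""
    totals = defaultdict(float)
    for tree, prob in weighted_trees:
        totals[canonical(prune(tree, drop))] += prob
    return dict(totals)

# Hypothetical posterior sample: (tree, probability) pairs on taxa 1 to 5.
trees = [
    (((1, 2), (3, (4, 5))), 0.40),
    ((1, (2, (3, (4, 5)))), 0.25),
    (((1, (3, (4, 5))), 2), 0.20),
    (((1, 2), ((3, 4), 5)), 0.15),
]

full = pruned_probabilities(trees, set())
without_2 = pruned_probabilities(trees, {2})
print(f"best full tree:         Pr={max(full.values()):.2f}")
print(f"best tree without taxon 2: Pr={max(without_2.values()):.2f}")
```

In this toy sample taxon 2 is the unstable one: three of the four trees differ only in where it attaches, so removing it concentrates the probability on a single backbone.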

Multiple sequence alignments

Concatenate

Supermatrix

Species tree

Supertrees

Gene duplication

Coalescent

Gene trees
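Of the routes in this diagram, the concatenation step (multiple alignments to supermatrix) is the simplest to show in code. The sketch below assumes each gene alignment is a dictionary mapping a taxon name to its aligned sequence, and pads taxa that lack a gene with `?` as missing data. The taxon names, sequences, and the function name `concatenate` are hypothetical.

```python
def concatenate(alignments):
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        # All sequences within one alignment are assumed to share the same length.
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            # Taxa missing from this gene are padded with '?' (missing data).
            supermatrix[taxon] += aln.get(taxon, "?" * length)
    return supermatrix

# Two hypothetical gene alignments; sp2 has no sequence for the second gene.
gene_a = {"sp1": "aactg", "sp2": "aattg", "sp3": "aac-g"}
gene_b = {"sp1": "tcgc", "sp3": "tcga"}

matrix = concatenate([gene_a, gene_b])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

The supermatrix is then analyzed as a single alignment, which is what distinguishes it from the gene-tree-based routes in the diagram.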

Phylogenomics of rice (Oryza)

820,000 BAC-end sequences for 9 diploid Oryza species

1720 gene fragments

2.4 million nucleotides

Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010

What are the biological causes of gene tree incongruence in rice?

Do we need full genomes to answer these questions?

Phylogenomics of rice (Oryza)


Concatenated analysis

Gene trees in Oryza

❖ Gene tree methods: recover every possible topology

❖ Species tree methods: many clades not statistically significant

Cranston, K.A., B. Hurwitz, D. Ware, L. Stein, R.A. Wing. Species trees from highly incongruent gene trees in rice. Systematic Biology. 2009: doi: 10.1093/sysbio/syp054

Supermatrix topology

❖ Results suggest incomplete lineage sorting and hybridization / introgression in the evolutionary history of rice

What data do we have for creating a complete tree of life?

Gene tree signal in GenBank

How many trees can we build using all of the data in GenBank and how are those trees distributed across the tree of life?

All-vs-all BLAST at each NCBI taxonomy node

Sanderson, M.J., D.T. Boss, D. Chen, K.A. Cranston, and A. Wehe. The PhyLoTA Browser: Processing GenBank for molecular phylogenetics research. Systematic Biology 2008: 57(3).
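Turning all-vs-all BLAST results into sequence clusters can be sketched as single-linkage clustering. The slides do not spell out the clustering rule, so treat this as one plausible reading: it assumes the BLAST step has already been reduced to a list of sequence pairs that pass some similarity threshold, and it groups sequences connected by any chain of such hits. The sequence IDs, the hit list, and the function name `cluster_hits` are hypothetical.

```python
def cluster_hits(sequence_ids, hits):
    """Single-linkage clustering: sequences joined by any chain of hits share a cluster."""
    parent = {s: s for s in sequence_ids}

    def find(s):
        # Follow parent links to the cluster representative, shortening the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in hits:
        # Merge the clusters containing the two hit sequences.
        parent[find(a)] = find(b)

    clusters = {}
    for s in sequence_ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical sequences under one taxonomy node and their pairwise hits.
ids = ["seq1", "seq2", "seq3", "seq4", "seq5"]
hits = [("seq1", "seq2"), ("seq2", "seq3"), ("seq4", "seq5")]

print(cluster_hits(ids, hits))
```

Each resulting cluster is a candidate set of homologous sequences that can be aligned and used to build a tree.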

Arachis hypogaea

Arachis hypogaea subsp. fastigiata

Arachis hypogaea subsp. hypogaea

Arachis glabrata

subtree clusters

Arachis

All possible clusters, alignments and trees

[Figure: example sequence alignment]

❖ ~90,000 clusters, alignments, trees available for download

❖ data availability matrix at each NCBI node

❖ complete = contains all of biodiversity

❖ dynamic = continuously updated with new data

❖ available digitally = browse, query, download

http://opentreeoflife.org

Gordon Burleigh Keith Crandall Karl Gude David Hibbett Mark Holder

Laura Katz Rick Ree Stephen Smith Doug Soltis Tiffani Williams

Computer science

Systematics

Evolutionary theory

Computational biology

Bioinformatics

Journalism

Even if there were phylogenies for all sequence clusters in GenBank, they would represent only a small fraction of biodiversity

Two types of inputs

Phylogeny: highly resolved, computationally derived, limited coverage

Taxonomy: poorly resolved, manually curated, much more complete

~7000 trees from ~2600 studies

Phylografter: Rick Ree, Field Museum of Natural History

[Figure: page 3 of 6 from Wiegmann et al., PNAS Early Edition, showing Fig. 1, the combined molecular phylogenetic tree for Diptera]

~4% of all published phylogenetic trees

Stoltzfus et al. 2012

Trees generally published as pictures in PDFs

OpenTree Reference Taxonomy

[Figure: several source taxonomies combined (+), plus patch files for manual edits]

3,133,028 nodes and 2,559,835 ‘species’

Jonathan Rees, NESCent

How do we combine data to build and use a tree of life?

Novel datastore for synthesis

Treemachine: Stephen Smith, Cody Hinchliff, Joseph Brown, U Michigan

Jim Allman, NESCent

Manual synthesis based on all data

Automated synthesis based on limited data

Inputs: Published phylogenies

Taxonomies

• filter / weight input trees

• re-synthesize

• process feedback

• input new trees

synthetic tree of life

Improving the synthetic tree

❖ Branch lengths & divergence times

❖ Better synthesis using tree metadata

❖ Community engagement

❖ data deposition & curation

❖ feedback & annotation

Moving beyond a single tree

❖ Detecting conflict and coverage

❖ Visualization

❖ Enabling custom synthesis

❖ Building out to other tools & resources


What can we do with a tree of life?

[Figure: example sequence alignment, shown in a "+ =" graphic with images that did not survive extraction]

Image: Zephyris at the English language Wikipedia

10 million years

24 million years

Acer macrophyllum, Betula lutea, Aesculus glabra, Tilia americana, Ulmus rubra

Leaf patterns image from Walls RL: American Journal of Botany 2011, 98(2):244-253.

Acer macrophyllum

Betula alleghaniensis

Aesculus glabra

Tilia americana

Ulmus rubra

Stoltzfus, A., Lapp, H., Matasci, N., … Cranston, K.A., ... & Jordan, G. (2013). Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC bioinformatics, 14(1), 158.
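The species lists on these slides differ in one name: the input has Betula lutea and the output has Betula alleghaniensis, an accepted name for the same species. A minimal sketch of that workflow, resolving names and then pruning a larger tree down to the requested species, might look like the following. The synonym table, the tree topology, and all function names are hypothetical; the topology is illustrative only and is not a phylogenetic result from the talk or the cited paper.

```python
# Hypothetical synonym table mapping outdated names to accepted names.
SYNONYMS = {"Betula lutea": "Betula alleghaniensis"}

def resolve(names):
    """Replace known synonyms with accepted names; leave other names unchanged."""
    return [SYNONYMS.get(name, name) for name in names]

def prune_to(tree, keep):
    """Keep only the taxa in `keep` from a nested-tuple tree, collapsing empty branches."""
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kept = [c for c in (prune_to(child, keep) for child in tree) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)

def to_newick(tree):
    """Serialize a nested-tuple tree as Newick text (without the trailing semicolon)."""
    if not isinstance(tree, tuple):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

# Illustrative larger tree (assumed topology) containing one species we did not ask for.
big_tree = (
    (("Acer macrophyllum", "Aesculus glabra"), "Tilia americana"),
    (("Betula alleghaniensis", "Quercus alba"), "Ulmus rubra"),
)

query = ["Acer macrophyllum", "Betula lutea", "Aesculus glabra",
         "Tilia americana", "Ulmus rubra"]
pruned = prune_to(big_tree, set(resolve(query)))
print(to_newick(pruned) + ";")
```

Without the name-resolution step, Betula lutea would match nothing in the tree and the species would be silently dropped from the result.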

Collaborative data collection

Validation of datasets

Search & download across datasets

Get tree



What can we do with a tree of life?

University of Alberta: Bruce Rannala

University of Arizona: Michael Sanderson

NESCent: Jonathan Rees, Jim Allman