Open Tree at UNCC Jan 2013

OPEN TREE OF LIFE: SYNTHESIZING PHYLOGENETIC DATA Karen Cranston National Evolutionary Synthesis Center (NESCent) @kcranstn http://www.slideshare.net/kcranstn


Talk given at the UNC Charlotte Bioinformatics department about the Open Tree of Life

Transcript of Open Tree at UNCC Jan 2013

Page 1: Open Tree at UNCC Jan 2013

OPEN TREE OF LIFE: SYNTHESIZING PHYLOGENETIC DATA

Karen Cranston, National Evolutionary Synthesis Center (NESCent)

@kcranstn
http://www.slideshare.net/kcranstn

Page 2: Open Tree at UNCC Jan 2013

• ~2 million named species

• Millions more unnamed / undiscovered

Tree of life

Page 3: Open Tree at UNCC Jan 2013

Springer, M.S. et al. Nature (1997)

Stanhope, M.J. et al. PNAS (1998)

Hedges, B. PNAS 98 (2001)

Nature, Vol. 388, 3 July 1997, p. 61 (Letters to Nature). © Macmillan Publishers Ltd 1997


Endemic African mammals shake the phylogenetic tree

Mark S. Springer*, Gregory C. Cleven*, Ole Madsen†, Wilfried W. de Jong†‡, Victor G. Waddell§, Heather M. Amrine* & Michael J. Stanhope§

* Department of Biology, University of California, Riverside, California 92521, USA
† Department of Biochemistry, University of Nijmegen, PO Box 9101, 6500 HB Nijmegen, The Netherlands
‡ Institute for Systematics and Population Biology, University of Amsterdam, PO Box 94766, 1090 GT Amsterdam, The Netherlands
§ Biology and Biochemistry, Queen’s University, 97 Lisburn Road, Belfast BT9 07BL, UK

The order Insectivora, including living taxa (lipotyphlans) and archaic fossil forms, is central to the question of higher-level relationships among placental mammals [1]. Beginning with Huxley [2], it has been argued that insectivores retain many primitive features and are closer to the ancestral stock of mammals than are other living groups [3]. Nevertheless, cladistic analysis suggests that living insectivores, at least, are united by derived anatomical features [4]. Here we analyse DNA sequences from three mitochondrial genes and two nuclear genes to examine relationships of insectivores to other mammals. The representative insectivores are not monophyletic in any of our analyses. Rather, golden moles are included in a clade that contains hyraxes, manatees, elephants, elephant shrews and aardvarks. Members of this group are of presumed African origin [5,6]. This implies that there was an extensive African radiation from a single common ancestor that gave rise to ecologically divergent adaptive types. 12S ribosomal RNA transversions suggest that the base of this radiation occurred during Africa’s window of isolation in the Cretaceous period before land connections were developed with Europe in the early Cenozoic era.

Relationships among orders of placental mammals have proved difficult to resolve [1]. To extend the available mitochondrial (mt) sequences, a 2.6-kilobase (kb) segment containing the 12S rRNA, valine transfer RNA, and 16S rRNA genes was sequenced for nine taxa to generate a data set that is representative of 12 of the 18 placental orders and all three insectivore suborders [4]. Phylogenetic analyses provide strong support for well-established mammalian clades such as carnivores, hominoids, and Cetacea plus Artiodactyla (Fig. 1a). In agreement with other molecular studies [7–10] that included an assortment of taxa, most interordinal associations are not resolved at bootstrap values ≥75%. However, the mtDNA data do provide strong support for the association of the two paenungulates (hyrax, manatee) together, and of these with elephant shrews, aardvarks and golden moles (Fig. 1a and Table 1). The association of hyraxes with proboscideans and sirenians was suggested by Cope [11]. A competing hypothesis is an association of hyraxes with perissodactyls [12]. Our results agree with earlier protein [13,14] and DNA studies [7–10] supporting Cope’s paenungulate hypothesis. In addition to bootstrap support, T-PTP [15] and Kishino–Hasegawa [16] tests also support paenungulate monophyly (Table 2). Anatomical data provide some evidence that aardvarks and/or elephant shrews may be related to paenungulates [17,18] but suggest other hypotheses as well: for example, six osteological features are putative synapomorphies uniting elephant shrews with lagomorphs and rodents [19]. All the available sequence data, including amino-acid sequences [13,14], DNA sequences for three nuclear genes [8–10], and the present mitochondrial genes, support an association of aardvarks and elephant shrews with paenungulates. What is most unexpected is that golden moles, a family of insectivores, are also part of this clade. 12S rRNA sequences earlier suggested an association of golden moles with paenungulates, but did not provide convincing bootstrap support for this hypothesis [7]. Our expanded data set demonstrates that insectivores are not monophyletic (Table 2)

Table 1 Bootstrap support for select clades based on different methods

Clade columns: (1) Paenungulata; (2) Paenungulata + aardvark + elephant shrew + golden mole

Method                          (1)    (2)

Mitochondrial DNA
  Parsimony                      99     95
  Transversion parsimony         64     90
  Minimum evolution
    Tamura–Nei I                100     92
    Tamura–Nei II               100     78
    Logdet                       99     90
  Maximum likelihood            100    100

vWF
  Parsimony
    All positions                49     99
    1st and 2nd positions        24     65
    3rd positions                51     93
  Transversion parsimony         30     95
  Minimum evolution
    Tamura–Nei I                 37     99
    Tamura–Nei II                30     99
    Logdet                       43     97
  Maximum likelihood             78    100

A2AB
  Parsimony
    All sites                    71     88
    1st and 2nd positions        49     81
    3rd positions                31     67
  Transversion parsimony         71     54
  Minimum evolution
    Tamura–Nei I                 83     84
    Tamura–Nei II                28     25
    Logdet                       79     78
  Maximum likelihood             81     89

Only two of the three paenungulate orders were represented among the mitochondrial and A2AB sequences. Tamura–Nei [27] I and II distances were calculated by using an equal-rates assumption and a gamma-distribution of rates, respectively.

Page 4: Open Tree at UNCC Jan 2013

EVOLUTION OF HUMAN PATHOGENS

Smith et al, Nature, 2009

Page 5: Open Tree at UNCC Jan 2013

MODERN BIOLOGY NEEDS TREES

Modified from Garland and Carter, 1994

Conventional statistics assume:

Evolutionary trees provide:

Page 6: Open Tree at UNCC Jan 2013

[Figure: "Phylogeny papers, 1978–2008". Number of papers published (axis from 0 to 12,000) by year (1978 to 2008). Source: ISI Web of Science. Rapid increase in applications of phylogeny, beginning in early 1990s.]

graph from David Hillis

Page 7: Open Tree at UNCC Jan 2013
Page 8: Open Tree at UNCC Jan 2013

What does it mean to “have” the tree of life?

Page 9: Open Tree at UNCC Jan 2013

Only ~4% of all published phylogenetic trees are archived in digital form

Archiving sequence data is a community norm

Stoltzfus et al 2012

Page 10: Open Tree at UNCC Jan 2013

Furthermore, a paraphyletic relationship of phorids and syrphids would support the hypothesis that their shared special mode of extraembryonic development (dorsal amnion closure) (26) evolved in the stem lineage of Cyclorrhapha and preceded the origin of the schizophoran amnioserosa.

To test this hypothesis, we used a relatively recent phylogenomic marker: small, noncoding, regulatory micro-RNAs (miRNAs). miRNAs exhibit a striking phylogenetic pattern of conservation across the metazoan tree of life, suggesting the accumulation and maintenance of miRNA families throughout organismal evolution

Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms.


Wiegmann et al. PNAS, 2011

“Publishing a tree” = picture in a PDF

Page 11: Open Tree at UNCC Jan 2013

Lander et al. Nature 2001

Page 12: Open Tree at UNCC Jan 2013

Rod asks: Why do we need a database of trees?

Page 13: Open Tree at UNCC Jan 2013

assembly, alignment, inference

expertise, time, $$$


Page 14: Open Tree at UNCC Jan 2013
Page 15: Open Tree at UNCC Jan 2013

NSF IDEAS LAB

i. Pre-proposal / application

ii. 5-day, highly facilitated workshop

iii. Self-assembly into groups

iv. Pitch high-risk proposal ideas at end

v. NSF invited full proposals

Page 16: Open Tree at UNCC Jan 2013

1. Synthesize a complete draft tree of life from existing phylogenetic trees

2. Release with:

a. ability to improve tree by uploading new data

b. areas of uncertainty / conflict

c. links to source data and analysis methods

d. utilities to download whole tree and subtrees

Page 17: Open Tree at UNCC Jan 2013

Graph database holding thousands of input trees with millions of nodes

• filter / weight input trees

• build synthetic trees

• compare to alternate trees

• input new data sets

Page 18: Open Tree at UNCC Jan 2013

INPUTS

a AACCGTGAA
b ACACGTTAA
c AAACGTTAA
d AGACGTTAA
e AACCGTTAA
f AAACGTGAA

computational method

phylogenies (tips: e a c f b d)

taxonomies (tips: e a c f b d g h i)
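As a rough illustration of what a "computational method" starts from, the sketch below counts pairwise differences between the aligned sequences shown on this slide. Hamming distance is used here only because it is short to write; it is an assumption made for illustration, not the method used by the project. Real phylogenetic inference relies on substitution models and tree search.

```python
from itertools import combinations

# Alignment copied from the slide; one row per taxon.
alignment = {
    "a": "AACCGTGAA",
    "b": "ACACGTTAA",
    "c": "AAACGTTAA",
    "d": "AGACGTTAA",
    "e": "AACCGTTAA",
    "f": "AAACGTGAA",
}


def hamming(s, t):
    """Count the positions at which two aligned sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))


def pairwise_distances(aln):
    """Return {(taxon1, taxon2): distance} for every unordered pair."""
    return {
        (p, q): hamming(aln[p], aln[q])
        for p, q in combinations(sorted(aln), 2)
    }


distances = pairwise_distances(alignment)
for (p, q), d in sorted(distances.items()):
    print(p, q, d)
```

A distance table like this is the input to distance-based tree building; character-based methods such as parsimony or maximum likelihood work from the alignment columns directly.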

Page 19: Open Tree at UNCC Jan 2013

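[Diagram: an input tree on tips a, b, c, d is mapped into the graph as tip nodes a, b, c, d and internal nodes "a,b", "c,d" and "a,b,c,d", shown alongside a second tree on the same tips drawn in the order d, b, c, a.]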

Stephen Smith, U Michigan
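The node labels in this diagram ("a,b", "c,d", "a,b,c,d") suggest that each graph node can be identified by the set of tips below it. The sketch below follows that reading. The nested-tuple tree encoding, the dictionary-based graph, and the names `tip_set` and `add_tree` are all assumptions made for illustration, not the project's actual data model.

```python
def tip_set(node):
    """Return the frozenset of tip labels below a node (a tip is a str)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tip_set(child) for child in node))


def add_tree(graph, tree, source):
    """Record each child -> parent edge of `tree` in `graph`, tagged by source.

    `graph` maps (child_tips, parent_tips) to the set of sources that
    support that edge, so the same edge from two trees is stored once.
    """
    if isinstance(tree, str):
        return
    parent = tip_set(tree)
    for child in tree:
        graph.setdefault((tip_set(child), parent), set()).add(source)
        add_tree(graph, child, source)


graph = {}
add_tree(graph, (("a", "b"), ("c", "d")), "tree 1")

# Every distinct tip set that appears at either end of an edge is a node.
nodes = {n for edge in graph for n in edge}
for node in sorted(nodes, key=lambda n: (len(n), sorted(n))):
    print(",".join(sorted(node)))
```

Keying nodes by tip set means a second tree that contains the same grouping reuses the existing node instead of creating a duplicate.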

Page 20: Open Tree at UNCC Jan 2013

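[Diagram: the graph after the second tree (tips drawn in the order d, b, c, a) is added. New internal nodes "a,c" and "b,d" sit alongside "a,b", "c,d" and "a,b,c,d".]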

Stephen Smith, U Michigan
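The groupings "a,c" and "b,d" cannot appear in the same tree as "a,b" and "c,d". One common way to state that, sketched below, is that two clades conflict when they share tips but neither contains the other. The tuple encoding and the function names are assumptions for illustration only.

```python
def clades(tree):
    """Return the tip sets of all internal nodes of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def in_conflict(x, y):
    """Two clades conflict when they overlap but neither contains the other."""
    return bool(x & y) and not (x <= y or y <= x)


tree1 = (("a", "b"), ("c", "d"))
tree2 = (("a", "c"), ("b", "d"))

conflicting = {
    (x, y)
    for x in clades(tree1)
    for y in clades(tree2)
    if in_conflict(x, y)
}
for x, y in sorted(conflicting, key=lambda pair: [sorted(s) for s in pair]):
    print(",".join(sorted(x)), "conflicts with", ",".join(sorted(y)))
```

Both trees agree on the root clade "a,b,c,d", so only the four pairings of the smaller clades are reported.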

Page 21: Open Tree at UNCC Jan 2013

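[Diagram: a further input with additional tips (e, f, g, h, i appear among its tip labels) is added. The graph gains nodes "e,f,g,h,i" and "a,b,c,d,e,f,g,h,i" alongside "a,b", "c,d", "a,c", "b,d" and "a,b,c,d".]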

Stephen Smith, U Michigan

Page 22: Open Tree at UNCC Jan 2013

[Diagram: the full graph (internal nodes "a,b", "c,d", "a,c", "b,d", "a,b,c,d", "e,f,g,h,i" and "a,b,c,d,e,f,g,h,i") shown next to a single tree drawn from it that covers tips a to i.]

Stephen Smith, U Michigan

Page 23: Open Tree at UNCC Jan 2013

Dipsacales graph

taxonomy data (578 taxa) + Soltis et al. APG III phylogeny (30 taxa)

Page 24: Open Tree at UNCC Jan 2013

Dipsacales graph

Synthesized tree (favouring phylogenetic branches); contains all 578 taxa
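One way to read "favouring phylogenetic branches" is as a ranked, greedy selection: take clades from the phylogeny first, then add taxonomy clades only where they do not contradict what has already been accepted. The sketch below shows that idea on toy data. The clade lists are hypothetical, and the greedy rule is an assumption for illustration, not a description of the project's synthesis algorithm.

```python
def compatible(x, y):
    """Two clades can coexist in one tree if they are nested or disjoint."""
    return x <= y or y <= x or not (x & y)


def synthesize(ranked_sources):
    """Greedily accept clades, taking higher-priority sources first."""
    accepted = []
    for source_clades in ranked_sources:
        for clade in source_clades:
            already = clade in accepted
            if not already and all(compatible(clade, kept) for kept in accepted):
                accepted.append(clade)
    return accepted


# Hypothetical inputs, loosely echoing the node labels on the earlier slides.
phylogeny = [frozenset("ab"), frozenset("cd"), frozenset("abcd")]
taxonomy = [frozenset("ac"), frozenset("efghi"), frozenset("abcdefghi")]

# Listing the phylogeny first means its branches win any conflict.
synthetic = synthesize([phylogeny, taxonomy])
for clade in synthetic:
    print(",".join(sorted(clade)))
```

In this toy run the taxonomy's "a,c" grouping is dropped because it conflicts with the phylogeny, while the taxonomy still contributes the tips the phylogeny never sampled, which is how a 30-taxon phylogeny and a 578-taxon taxonomy can yield a tree containing all 578 taxa.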

Page 25: Open Tree at UNCC Jan 2013

2005 FLYNN ET AL.—SAMPLING AND RESOLUTION IN PHYLOGENY OF CARNIVORA 323

FIGURE 5. A schematic cladogram representing the major evolutionary relationships recovered in this analysis of Carnivora. Illustrations of representative taxa for major lineages include (from top): Nandinia binotata; Felidae (Lynx rufus); Viverridae (Viverra zibetha); Hyaenidae (Crocuta crocuta); Herpestidae (Mungos mungo); Malagasy carnivorans (Eupleres goudotii); Canidae (Canis lupus); Ursidae (Ursus americanus); Phocidae (Phoca vitulina); Otariidae (Zalophus californianus); Odobenidae (Odobenus rosmarus); Ailurus fulgens; Mephitidae (Mephitis mephitis); Procyonidae (Potos flavus); Mustelidae, basal/other mustelids (generalized schematic representing diverse taxa [African polecat and striped marten, badger, etc.]); Mustelidae, Martes-group (Gulo gulo); Mustelidae, Mustela (Mustela frenata); Mustelidae, Lutrinae (Lontra canadensis).


community refinement

automatic updating

????

downstream analyses

education and outreach

Page 26: Open Tree at UNCC Jan 2013

• source / storage of underlying data

• provide complete phylogenetic framework

• link to biodiversity and systematics content

• API for downloading subtrees to analysis tools
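The slides do not describe the API itself, so the following is only a local sketch of what "downloading a subtree" could hand to an analysis tool: the tree induced by a chosen set of tips, written out as Newick. The nested-tuple encoding and the names `induced_subtree` and `to_newick` are hypothetical and do not correspond to any Open Tree endpoint.

```python
def induced_subtree(tree, keep):
    """Prune a nested-tuple tree to the tips in `keep`.

    Internal nodes left with a single child are collapsed, and branches
    with no kept tips are dropped. Returns None if nothing is kept.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [
        kid
        for kid in (induced_subtree(child, keep) for child in tree)
        if kid is not None
    ]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string (topology only)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


full_tree = ((("a", "b"), ("c", "d")), ("e", "f"))
subtree = induced_subtree(full_tree, {"a", "c", "e"})
print(to_newick(subtree) + ";")
```

Newick is chosen here because most phylogenetic analysis tools can read it; branch lengths and node labels are left out to keep the sketch short.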

Page 27: Open Tree at UNCC Jan 2013

• Open Data

• increasing availability of digital data associated with phylogeny publications

• synthetic tree open to community annotation and new data submission

• whole tree / subtrees available for download


Page 28: Open Tree at UNCC Jan 2013

• Open Science

• project wiki: http://opentree.wikispaces.com/

• open source software: https://github.com/OpenTreeOfLife

• public mailing list, meeting notes, management tools


Page 29: Open Tree at UNCC Jan 2013

opentreeoflife.org

Karen Cranston, lead PI (Duke)

Gordon Burleigh (Florida)

Keith Crandall (BYU)

Karl Gude (MSU)

David Hibbett (Clark)

Mark Holder (Kansas)

Laura Katz (Smith)

Rick Ree (FMNH)

Stephen Smith (Michigan)

Doug Soltis (Florida)

Tiffani Williams (TAMU)

AVAToL: Assembling, Visualizing and Analyzing the Tree of Life