EVE 161 Lecture 4

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 Lecture 4 EVE 161: Microbial Phylogenomics Lecture #4: Era I: Modern View of the Tree of Life UC Davis, Winter 2014 Instructor: Jonathan Eisen

Page 1: EVE 161 Lecture 4

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Lecture 4

EVE 161: Microbial Phylogenomics

Lecture #4:

Era I: Modern View of the Tree of Life

UC Davis, Winter 2014 Instructor: Jonathan Eisen

Page 2: EVE 161 Lecture 4

Where we are going and where we have been

• Previous lecture: 3. Woese and the Tree of Life

• Current lecture: 4. Modern view of Tree of Life

• Next lecture: 5. Era II: rRNA from environment

Page 3: EVE 161 Lecture 4

Two papers for today

Paper 1

Syst. Biol. 59(5):518–533, 2010. © The Author(s) 2010. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. DOI: 10.1093/sysbio/syq037. Advance Access publication on July 23, 2010.

Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life

LAURA WEGENER PARFREY¹, JESSICA GRANT², YONAS I. TEKLE²,⁶, ERICA LASEK-NESSELQUIST³,⁴, HILARY G. MORRISON³, MITCHELL L. SOGIN³, DAVID J. PATTERSON⁵, AND LAURA A. KATZ¹,²,∗

¹Program in Organismic and Evolutionary Biology, University of Massachusetts, 611 North Pleasant Street, Amherst, MA 01003, USA; ²Department of Biological Sciences, Smith College, 44 College Lane, Northampton, MA 01063, USA; ³Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA; ⁴Department of Ecology and Evolutionary Biology, Brown University, 80 Waterman Street, Providence, RI 02912, USA; ⁵Biodiversity Informatics Group, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA; ⁶Present address: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

∗Correspondence to be sent to: Laura A. Katz, 44 College Lane, Northampton, MA 01003, USA. Laura Wegener Parfrey and Jessica Grant have contributed equally to this work.

Received 30 September 2009; reviews returned 1 December 2009; accepted 25 May 2010. Associate Editor: Cecile Ane

Perspectives on the structure of the eukaryotic tree of life have shifted in the past decade as molecular analyses provide hypotheses for relationships among the approximately 75 robust lineages of eukaryotes. These lineages are defined by ultrastructural identities (Patterson 1999), that is, patterns of cellular and subcellular organization revealed by electron microscopy, and are strongly supported in molecular analyses (Parfrey et al. 2006; Yoon et al. 2008). Most of these lineages now fall within a small number of higher level clades, the supergroups of eukaryotes (Simpson and Roger 2004; Adl et al. 2005; Keeling et al. 2005). Several of these clades (Opisthokonta, Rhizaria, and Amoebozoa) are increasingly well supported by phylogenomic (Rodríguez-Ezpeleta et al. 2007a; Burki et al. 2008; Hampl et al. 2009) and phylogenetic (Parfrey et al. 2006; Pawlowski and Burki 2009) analyses, whereas support for “Archaeplastida” predominantly comes from some phylogenomic studies (Rodríguez-Ezpeleta et al. 2005; Burki et al. 2007) or analyses of plastid genes (Yoon et al. 2002; Parfrey et al. 2006). In contrast, support for “Chromalveolata” and Excavata is mixed, often dependent on the selection of taxa included in analyses (Rodríguez-Ezpeleta et al. 2005; Parfrey et al. 2006; Rodríguez-Ezpeleta et al. 2007a; Burki et al. 2008; Hampl et al. 2009). We use quotation marks throughout to note groups where uncertainties remain. Moreover, it is difficult to evaluate the overall stability of major clades of eukaryotes because phylogenomic analyses have 19 or fewer of the major lineages and hence do not sufficiently sample eukaryotic diversity (Rodríguez-Ezpeleta et al. 2007b; Burki et al. 2008; Hampl et al. 2009), whereas taxon-rich analyses with 4 or fewer genes yield topologies with poor support at deep nodes (Cavalier-Smith 2004; Parfrey et al. 2006; Yoon et al. 2008).

Estimating the relationships of the major lineages of eukaryotes is difficult because of both the ancient age of eukaryotes (1.2–1.8 billion years; Knoll et al. 2006) and complex gene histories that include heterogeneous rates of molecular evolution and paralogy (Maddison 1997; Gribaldo and Philippe 2002; Tekle et al. 2009). A further issue obscuring eukaryotic relationships is the chimeric nature of the eukaryotic genome (not all genes are vertically inherited, due to lateral gene transfer [LGT] and endosymbiotic gene transfer [EGT]), which can also mislead efforts to reconstruct phylogenetic relationships (Andersson 2005; Rannala and Yang 2008; Tekle et al. 2009). This is especially true among photosynthetic lineages that comprise “Chromalveolata” and “Archaeplastida” where a large portion of the host genome (approximately 8–18%) is

Paper 2

Tom A. Williams, Peter G. Foster, Tom M. W. Nye, Cymon J. Cox and T. Martin Embley. A congruent phylogenomic signal places eukaryotes within the Archaea. Proc. R. Soc. B 279, 2012; first published online 24 October 2012. doi: 10.1098/rspb.2012.1795

Supplementary data: http://rspb.royalsocietypublishing.org/content/suppl/2012/10/18/rspb.2012.1795.DC1.html

Page 4: EVE 161 Lecture 4

Phylogeny Review

Page 5: EVE 161 Lecture 4

Parts of a phylogenetic tree

[Figure: a rooted tree with eight terminal (tip) taxa labeled a to h and further node labels t to z. Callouts mark the root (root node), the internal nodes, the internal branches, and the terminal branches.]

Internal nodes represent hypothetical ancestral taxa.
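The parts named on this slide can be made concrete with a small data structure. The sketch below is only an illustration: it writes a rooted tree as nested tuples, where every tuple is an internal node (a hypothetical ancestor) and every string is a tip. The topology is invented for the example, since the slide's actual branching order is not recoverable from the transcript, and the function names are hypothetical.

```python
# Hypothetical rooted tree over tips a..h; each tuple is one internal node.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))


def tips(node):
    """Return the terminal (tip) taxa below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [t for child in node for t in tips(child)]


def count_internal_nodes(node):
    """Count internal nodes, including the root node itself."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal_nodes(child) for child in node)


def count_branches(node):
    """Return (internal_branches, terminal_branches) below a node."""
    if isinstance(node, str):
        return (0, 0)
    internal = terminal = 0
    for child in node:
        if isinstance(child, str):
            terminal += 1  # branch leading to a tip
        else:
            internal += 1  # branch leading to another internal node
        sub_internal, sub_terminal = count_branches(child)
        internal += sub_internal
        terminal += sub_terminal
    return (internal, terminal)


print(tips(tree))
print(count_internal_nodes(tree))
print(count_branches(tree))
```

For a fully bifurcating rooted tree with n tips, the counts follow a fixed pattern: n - 1 internal nodes and 2n - 2 branches, of which n are terminal.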

Page 6: EVE 161 Lecture 4

Characters

• A heritable feature of an organism is known as a character (also character trait or trait).

• The form that a character takes is known as its state (also known as character state).
  - Note: Presence/absence can be a state

• Example:
  - Character = heart
  - Character state = present/absent
  - Character state = # of chambers
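The heart example above maps naturally onto a small character matrix, with taxa as rows and characters as columns. This is a sketch under assumptions: the four taxa and the `states` helper are chosen for illustration and do not come from the slides.

```python
# Hypothetical character matrix: rows are taxa, columns are characters.
# None marks a character that cannot be scored (no heart, so no chamber count).
matrix = {
    "fish": {"heart": "present", "heart_chambers": 2},
    "frog": {"heart": "present", "heart_chambers": 3},
    "human": {"heart": "present", "heart_chambers": 4},
    "sponge": {"heart": "absent", "heart_chambers": None},
}


def states(matrix, character):
    """Return the distinct observed states of one character across taxa."""
    return {
        row[character] for row in matrix.values() if row[character] is not None
    }


print(states(matrix, "heart"))
print(states(matrix, "heart_chambers"))
```

Note that the same organ yields two different characters here: one scored as presence/absence, the other as a count.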

Page 7: EVE 161 Lecture 4

Character ancestry is critical to understand

• Characters that are inherited from a common ancestor are homologous.

• Species change over time
  - Known (generally) as divergence, or divergent evolution.
  - Species change over time due to the combined processes of mutation, recombination, drift, selection, etc.

Page 8: EVE 161 Lecture 4

Data matrices

Page 9: EVE 161 Lecture 4

Sequence Alignment

Page 10: EVE 161 Lecture 4

Tree reconstruction methods

Page 11: EVE 161 Lecture 4

Some other bells and whistles

Page 12: EVE 161 Lecture 4

Long branch attraction

Page 13: EVE 161 Lecture 4

Homoplasy

Page 14: EVE 161 Lecture 4

Bootstrapping
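The slide gives only the title, so what follows is a sketch of the usual meaning of the term in phylogenetics rather than anything taken from the lecture: alignment columns are resampled with replacement to build pseudo-replicate alignments, a tree is inferred from each, and clade support is the fraction of replicates that recover the clade. Only the resampling step is shown; the toy alignment and the function name are made up for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    # Draw one column index per site; the same column may be drawn repeatedly.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns) for name, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences, all the same length).
alignment = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ACCTTG"}

# A seeded generator makes the pseudo-replicate reproducible.
replicate = bootstrap_alignment(alignment, random.Random(0))
for name, seq in replicate.items():
    print(name, seq)
```

Because whole columns are drawn, every taxon keeps the same sites in the same order within a replicate, which is what keeps the resampled matrix a valid alignment.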

Page 15: EVE 161 Lecture 4

Jackknifing

Page 16: EVE 161 Lecture 4

Congruence

Page 17: EVE 161 Lecture 4

Rooting

Page 18: EVE 161 Lecture 4

Masking

Page 19: EVE 161 Lecture 4

Concatenation

Page 20: EVE 161 Lecture 4

Two papers for today

Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison HG, Sogin ML, Patterson DJ, Katz LA. Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life. Syst. Biol. 59(5):518–533, 2010. doi: 10.1093/sysbio/syq037

Page 21: EVE 161 Lecture 4

Abstract: An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations underlying the diversity of microbial and macroscopic (e.g., plants and animals) eukaryotes. Previous work has divided eukaryotic diversity into a small number of high-level “supergroups,” many of which receive strong support in phylogenomic analyses. However, the abundance of data in phylogenomic analyses can lead to highly supported but incorrect relationships due to systematic phylogenetic error. Furthermore, the paucity of major eukaryotic lineages (19 or fewer) included in these genomic studies may exaggerate systematic error and reduce power to evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72 major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic tree of life. The consistency across analyses with varying numbers of taxa (88–451) and levels of missing data (17–69%) supports the accuracy of the resulting topologies. The resulting stable topology emerges without the removal of rapidly evolving genes or taxa, a practice common to phylogenomic analyses. Several major groups are stable and strongly supported in these analyses (e.g., SAR, Rhizaria, Excavata), whereas the proposed supergroup “Chromalveolata” is rejected. Furthermore, extensive instability among photosynthetic lineages suggests the presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary relationships can be achieved with broad taxonomic sampling and a moderate number of genes. Finally, taxon-rich analyses such as presented here provide a method for testing the accuracy of relationships that receive high bootstrap support (BS) in phylogenomic analyses and enable placement of the multitude of lineages that lack genome scale data. [Excavata; microbial eukaryotes; Rhizaria; supergroups; systematic error; taxon sampling.]
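The abstract reports missing-data levels of 17–69% across analyses. One way to see where such a figure can come from: when several gene alignments are joined into a single matrix and a taxon lacks a gene, that block is filled with placeholders. The sketch below assumes this setup and counts the fraction of placeholder cells. The abstract does not say how the authors tallied missing data, so the definition, the "?" symbol, the toy genes, and the function names are all assumptions.

```python
def concatenate(genes, taxa):
    """Join per-gene alignments into one matrix, padding absent taxa with '?'."""
    supermatrix = {taxon: "" for taxon in taxa}
    for gene in genes:
        length = len(next(iter(gene.values())))
        for taxon in taxa:
            # A taxon with no sequence for this gene gets a block of '?'.
            supermatrix[taxon] += gene.get(taxon, "?" * length)
    return supermatrix


def missing_fraction(supermatrix):
    """Fraction of all matrix cells that are '?' placeholders."""
    total = sum(len(seq) for seq in supermatrix.values())
    missing = sum(seq.count("?") for seq in supermatrix.values())
    return missing / total


# Toy data: three genes, three taxa, uneven gene coverage.
gene1 = {"A": "ACGT", "B": "ACGA", "C": "ACGG"}
gene2 = {"A": "TTGA", "C": "TTGC"}
gene3 = {"B": "GG"}
taxa = ["A", "B", "C"]

supermatrix = concatenate([gene1, gene2, gene3], taxa)
for taxon in taxa:
    print(taxon, supermatrix[taxon])
print(round(missing_fraction(supermatrix), 3))
```

Adding taxa that have only a few sequenced genes raises this fraction, which is the trade-off a taxon-rich strategy accepts in exchange for broader sampling.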

Page 22: EVE 161 Lecture 4

Page 23: EVE 161 Lecture 4

Page 24: EVE 161 Lecture 4

Page 25: EVE 161 Lecture 4

Page 26: EVE 161 Lecture 4

Page 27: EVE 161 Lecture 4

Page 28: EVE 161 Lecture 4

Page 29: EVE 161 Lecture 4

451 Taxa

Page 30: EVE 161 Lecture 4

88 Taxa

Page 31: EVE 161 Lecture 4

Page 32: EVE 161 Lecture 4

Page 33: EVE 161 Lecture 4

Just Rhizaria

Page 34: EVE 161 Lecture 4

Just Excavata

Page 35: EVE 161 Lecture 4

Page 36: EVE 161 Lecture 4

Page 37: EVE 161 Lecture 4

Two papers for today

Williams TA, Foster PG, Nye TMW, Cox CJ, Embley TM. A congruent phylogenomic signal places eukaryotes within the Archaea. Proc. R. Soc. B 279, 2012; first published online 24 October 2012. doi: 10.1098/rspb.2012.1795

Page 38: EVE 161 Lecture 4

Determining the relationships among the major groups of cellular life is important for understanding the evolution of biological diversity, but is difficult given the enormous time spans involved. In the textbook ‘three domains’ tree based on informational genes, eukaryotes and Archaea share a common ancestor to the exclusion of Bacteria. However, some phylogenetic analyses of the same data have placed eukaryotes within the Archaea, as the nearest relatives of different archaeal lineages. We compared the support for these competing hypotheses using sophisticated phylogenetic methods and an improved sampling of archaeal biodiversity. We also employed both new and existing tests of phylogenetic congruence to explore the level of uncertainty and conflict in the data. Our analyses suggested that much of the observed incongruence is weakly supported or associated with poorly fitting evolutionary models. All of our phylogenetic analyses, whether on small subunit and large subunit ribosomal RNA or concatenated protein-coding genes, recovered a monophyletic group containing eukaryotes and the TACK archaeal superphylum comprising the Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota. Hence, while our results provide no support for the iconic three-domain tree of life, they are consistent with an extended eocyte hypothesis whereby vital components of the eukaryotic nuclear lineage originated from within the archaeal radiation.
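The two competing hypotheses in this abstract differ in which groups are monophyletic, and that difference can be checked mechanically. The sketch below encodes two deliberately simplified rooted trees as nested tuples and tests whether a set of tips forms a clade. It assumes Bacteria as the outgroup and collapses the TACK superphylum to a single tip, so it cannot show eukaryotes arising from inside TACK; the trees and function names are illustrative, not taken from the paper.

```python
def clades(node):
    """Return (tips under node, list of tip sets for every internal node below)."""
    if isinstance(node, str):
        return frozenset([node]), []
    tip_set = frozenset()
    found = []
    for child in node:
        child_tips, child_clades = clades(child)
        tip_set |= child_tips
        found.extend(child_clades)
    found.append(tip_set)  # the clade defined by this internal node
    return tip_set, found


def is_monophyletic(tree, group):
    """True if the tips in group are exactly the tips below some internal node."""
    return frozenset(group) in clades(tree)[1]


# Simplified, hypothetical topologies rooted on Bacteria.
three_domains = ("Bacteria", (("Euryarchaeota", "TACK"), "Eukaryota"))
eocyte = ("Bacteria", ("Euryarchaeota", ("TACK", "Eukaryota")))

archaea = {"Euryarchaeota", "TACK"}
tack_plus_eukaryotes = {"TACK", "Eukaryota"}

for name, tree in [("three domains", three_domains), ("eocyte", eocyte)]:
    print(name, is_monophyletic(tree, archaea),
          is_monophyletic(tree, tack_plus_eukaryotes))
```

In the three-domains tree the Archaea form a clade that excludes eukaryotes; in the eocyte-like tree they do not, and the group of eukaryotes plus TACK is monophyletic instead, which is the pattern the abstract reports.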

Page 39: EVE 161 Lecture 4

Page 40: EVE 161 Lecture 4

Page 41: EVE 161 Lecture 4

Page 42: EVE 161 Lecture 4

Page 43: EVE 161 Lecture 4

Page 44: EVE 161 Lecture 4

Page 45: EVE 161 Lecture 4

[Figure 1: four tree panels, (a) to (d), each with a scale bar of 0.2. Clade labels: Bacteria, Euryarchaeota, Korarchaeota, Crenarchaeota, Aigarchaeota, Thaumarchaeota and Eukaryota; panel (b) shows only Bacteria, Euryarchaeota, Crenarchaeota and Eukaryota. Support values printed at selected nodes are mostly 1; panel (a) also shows 0.83, and panel (d) shows 0.57 and 0.97.]

Taxa shown at the tips:

• Bacteria: Campylobacter jejuni, Escherichia coli, Rhodopseudomonas palustris, Clostridium acetobutylicum, Synechocystis sp., Treponema pallidum, Chlamydia trachomatis, Rhodopirellula baltica

• Euryarchaeota: Archaeoglobus fulgidus, Methanococcus jannaschii, Methanothermobacter thermautotrophicus, Pyrococcus furiosus, Methanosarcina mazei, Thermoplasma volcanium

• Crenarchaeota: Caldivirga maquilingensis, Pyrobaculum aerophilum, Thermofilum pendens, Sulfolobus solfataricus, Staphylothermus marinus, Hyperthermus butylicus, Ignicoccus hospitalis, Aeropyrum pernix

• Thaumarchaeota: Cenarchaeum symbiosum, Nitrosopumilus maritimus

• Aigarchaeota: Caldiarchaeum subterraneum

• Korarchaeota: Korarchaeum cryptofilum

• Eukaryota: Giardia lamblia, Trichomonas vaginalis, Naegleria gruberi, Arabidopsis thaliana, Homo sapiens, Saccharomyces cerevisiae, Thalassiosira pseudonana, Dictyostelium discoideum, Trypanosoma brucei, Entamoeba histolytica

Figure 1. Phylogenies of Bacteria, Archaea and eukaryotes inferred from concatenated rRNA. (a) A Bayesian phylogeny of Bacteria, Archaea and eukaryotes inferred under the GTR model, showing an eocyte-like topology in which eukaryotes emerge from within the Archaea with maximal support (posterior probability (PP) = 1). (b) Removal of recently characterized archaeal groups (the Thaumarchaeota, Aigarchaeota and Korarchaeota) converts this tree into a canonical three-domains topology, again with maximal support (PP = 1), indicating that sampling plays an important role in the resolution of these ancient relationships. Analyses of the full dataset using the better-fitting NDRH + NDCH (c) and CAT (d) models recover maximally supported eocyte-like topologies; these models also recover eocyte-like topologies on the reduced dataset, without the TAK sequences (see the electronic supplementary material, figure S1). Branch lengths are proportional to substitutions per site.

Source: T. A. Williams et al., Evolution of eukaryotes from Archaea, Proc. R. Soc. B (2012), p. 4873.

Page 46: EVE 161 Lecture 4

[Figure 1 of Williams et al., Proc. R. Soc. B (2012), p. 4873: rRNA phylogenies of Bacteria, Archaea and eukaryotes, panels (a)-(d); the caption is given on the previous slide.]

With New Data

[Figure 1 of Williams et al., Proc. R. Soc. B (2012), p. 4873, shown a second time on this slide; the caption is given on the previous slide.]

Without New Data

Page 47: EVE 161 Lecture 4

Better Models

[Figure 1 of Williams et al., Proc. R. Soc. B (2012), p. 4873: rRNA phylogenies of Bacteria, Archaea and eukaryotes, panels (a)-(d); the caption is given on slide 45.]

Page 48: EVE 161 Lecture 4

Page 49: EVE 161 Lecture 4

Page 50: EVE 161 Lecture 4

Page 51: EVE 161 Lecture 4

Concatenated Proteins

complete datasets are presented in figure 2; phylogenies inferred from the CONCATERPILLAR-derived congruent subsets are provided in the electronic supplementary material, figures S2 and S3. The support from all these analyses for current hypotheses on eukaryotic origins is summarized in the electronic supplementary material, tables S4 and S5.

Our analyses including bacterial outgroups consistently supported the monophyly of eukaryotes with the TACK superphylum of Archaea, to the exclusion of the euryarchaeotes, although the strength of support for this eocyte-like hypothesis varied with the subset of the data analysed (see figure 2 and the electronic supplementary material, tables S4 and S5). By contrast, we found no support for the three-domains hypothesis and the monophyly of Archaea from any of these analyses. While the monophyly of eukaryotes and the TACK superphylum was consistently recovered, the specific relationships within this clade were more ambiguous. The phylogeny inferred from the 63AE dataset recovered eukaryotes and the TACK superphylum as separate clusters (figure 2b); in contrast, the 29BAE dataset and the two largest CONCATERPILLAR-derived congruent subsets inferred from the 64AE dataset supported the nesting of the eukaryotes within the TACK superphylum, either as the neighbour of Korarchaeum or with the relationship unresolved (see figure 2a and the electronic supplementary material, tables S4 and S5). Given the sparse sampling of Korarchaeota and their relatives, and the long branch leading to eukaryotes, this finding must be treated with caution [14]. In particular, when the bacterial sequences were removed from the 29BAE dataset and the analyses were repeated, the relationship between eukaryotes, Korarchaeum and the rest of the TACK superphylum collapsed to a trichotomy, suggesting that the Korarchaeum/eukaryote link is not strongly supported (see the electronic supplementary material, figure S6). Further Korarchaeal genome sequences are likely to be very informative about this part of the tree of life. It is interesting to note that we consistently recovered a strongly supported Thaumarchaeota/Aigarchaeota clade within the TACK group, confirming the relationship between these groups [20] and suggesting that they do not represent the earliest-diverging archaeal lineage [16,49]; in our trees, the eukaryotes and the TACK superphylum consistently form a monophyletic group to the exclusion of euryarchaeotes (figure 2, electronic supplementary material, figure S2).

With the exception of Korarchaeum, our analyses did not provide support for a specific relationship between any

[Figure 2: two tree panels, (a) and (b), with clades labelled Bacteria (panel (a) only), Euryarchaeota, Korarchaeota, Crenarchaeota, Aigarchaeota, Thaumarchaeota and Eukaryota; scale bars 0.2 in (a) and 0.5 in (b).]

Figure 2. Phylogenies of Bacteria, Archaea and eukaryotes inferred from conserved protein-coding genes. (a) A phylogeny inferred from 29 concatenated proteins conserved between Bacteria, Archaea and eukaryotes. An eocyte topology was recovered with strong (PP = 0.99) support. In this phylogeny, the eukaryotes emerge as the sister group of Korarchaeum, nested with the TACK superphylum. (b) A phylogeny inferred from 63 concatenated proteins shared between Archaea and eukaryotes. The position of the root is not explicitly indicated. However, based on the result from (a) and the electronic supplementary material, table S4, it is likely to be either within, or on the branch leading to, the Euryarchaea. If this position is correct, then the tree shows the eukaryotes emerging as the sister group to the TACK superphylum, including Korarchaeum. These trees were inferred using the CAT model in PHYLOBAYES. Branch lengths are proportional to substitutions per site, except the truncated bacterial branch in (a).
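The trees in Figure 2 are built from concatenated proteins (29 in panel (a), 63 in panel (b)). As a rough sketch of what concatenation means in practice, the hypothetical helper below joins per-gene alignments taxon by taxon and pads any taxon missing from a gene with gap characters. The function name, the gap-padding choice and the toy sequences are assumptions for illustration; the authors' own pipeline, including the CONCATERPILLAR congruence step, is not reproduced here.

```python
def concatenate(alignments, missing="-"):
    """Join per-gene alignments into one supermatrix.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    A taxon absent from a gene is padded with `missing` for that gene's width.
    """
    taxa = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in taxa:
            parts[name].append(aln.get(name, missing * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Made-up three-taxon example; the sequences are not real protein data.
gene1 = {"Homo": "MKV", "Sulfolobus": "MRV", "Escherichia": "MKI"}
gene2 = {"Homo": "GA", "Sulfolobus": "GS"}  # no bacterial homologue in this toy gene

supermatrix = concatenate([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Padding with gaps keeps every row the same length, which is what most tree-building programs expect from a concatenated alignment. Whether to pad or to drop genes with missing taxa is a design choice that this sketch does not settle.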

Page 52: EVE 161 Lecture 4

Page 53: EVE 161 Lecture 4

Page 54: EVE 161 Lecture 4

Tree Congruence

3. CONCLUSIONS

Under the three-domains hypothesis, important components of the eukaryotic genetic machinery were vertically inherited from a common ancestor shared with Archaea, and this relationship is taken to explain the shared properties of both groups. In an eocyte-like scenario, those same eukaryotic components were vertically inherited from an ancestor that was already an archaeon, and the phylogenetic position of this ancestor could be particularly informative about the genetic and metabolic context of early eukaryotic evolution and for theories of eukaryotic origins [1]. Here, we have compared support for these hypotheses and others, using conserved components of the genetic machinery. With an updated sampling of archaeal diversity, we found no support for the three-domains hypothesis either from rRNA or protein-coding genes under any phylogenetic model. Instead, we detected a congruent phylogenetic signal that placed essential informational genes of the eukaryotic nuclear lineage within the archaeal radiation, sharing common ancestry with the TACK superphylum. The monophyly of eukaryotic genes with the TACK

[Figure 3: three panels. (a) Histogram of frequency against distance. (b) Bar chart of the number of tests passed (P > 0.05) under the CAT20 and LG models for three tests: saturation and homoplasy, site-specific biochemical diversity, and compositional heterogeneity. (c) Density against distance for the CAT20 and LG models.]

Figure 3. Analysing incongruence using a novel measure of distance between gene trees. We used distributions of pairwise geodesic distances between gene trees to compare levels of incongruence inferred under different evolutionary models. (a) The distribution of distances under a single model (CAT20) can be used to identify obvious outliers corresponding to highly incongruent gene trees; a single gene was responsible for the peak highlighted in red, and was removed from subsequent analyses. (b) Overview of model-fitting tests (posterior predictive simulations) for each gene in the 64AE dataset. The height of the bars indicates the proportion of genes that ‘passed’ a test under a particular model; we said that a test was passed when the value of the test statistic on the real data fell within the central 95% of the distribution of values produced by posterior predictive simulation. The results suggest that CAT20 fits better than LG, successfully accounting for the observed levels of saturation and homoplasy in all but one of the alignments. Both models do a poor job of modelling the site-specific selective constraints in our dataset, although again CAT20 performs better than LG (13 passes as opposed to 0). (c) Comparison of the distance distributions inferred under the CAT20 and LG models. The trees inferred under the better-fitting CAT20 model are significantly more congruent than those inferred under LG (mean distance: 2.68 versus 3.22, p < 0.0001). The significance of this difference was assessed using a permutation test that took the correlations between pairwise distances into account (see §4). These results suggest that a significant portion of the incongruence in this dataset of informational genes can be attributed to model misspecification, rather than genuinely distinct evolutionary histories.
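The pass criterion described for panel (b) can be sketched as follows: a test is passed when the statistic computed on the real alignment lies within the central 95% of the values obtained from posterior predictive simulations. The function names and the toy numbers below are assumptions made for illustration; the actual test statistics and simulations came from the authors' phylogenetic software and are not reproduced here.

```python
def passes_test(observed, simulated, central=0.95):
    """True if `observed` lies within the central `central` mass of `simulated`.

    Implemented as a two-sided tail check: the observed value fails when fewer
    than (1 - central) / 2 of the simulated values lie at or below it, or at
    or above it.
    """
    n = len(simulated)
    if n == 0:
        raise ValueError("need at least one simulated value")
    tail = (1.0 - central) / 2.0
    frac_le = sum(1 for s in simulated if s <= observed) / n
    frac_ge = sum(1 for s in simulated if s >= observed) / n
    return frac_le >= tail and frac_ge >= tail


def count_passes(per_gene):
    """Count genes whose observed statistic passes, given (observed, simulated) pairs."""
    return sum(passes_test(obs, sims) for obs, sims in per_gene)


# Toy data: one shared set of 1000 simulated values and three hypothetical genes.
sims = list(range(1000))
per_gene = [(500, sims), (5, sims), (1200, sims)]
print(count_passes(per_gene))  # 1: only the first gene falls inside the central 95%
```

Running such a count once per model and per test statistic would give bar heights of the kind plotted in panel (b), under the assumption that each bar is a simple tally of passing genes.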

Page 55: EVE 161 Lecture 4

Page 56: EVE 161 Lecture 4