
Major transitions in dinoflagellate evolution unveiled by phylotranscriptomics

Jan Janouškovec (a,b,c,d,1), Gregory S. Gavelis (e), Fabien Burki (c,2), Donna Dinh (c), Tsvetan R. Bachvaroff (f), Sebastian G. Gornik (g), Kelley J. Bright (h), Behzad Imanian (c), Suzanne L. Strom (h), Charles F. Delwiche (i), Ross F. Waller (j), Robert A. Fensome (k), Brian S. Leander (c,d,e), Forest L. Rohwer (b,d), and Juan F. Saldarriaga (c)

(a) Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom; (b) Biology Department, San Diego State University, San Diego, CA 92182; (c) Botany Department, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; (d) Program in Integrated Microbial Diversity, Canadian Institute for Advanced Research, Toronto, ON M5G 1Z8, Canada; (e) Zoology Department, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; (f) Institute for Marine and Environmental Technology, University of Maryland Center for Environmental Sciences, Baltimore, MD 21202; (g) Centre for Chromosome Biology, School of Natural Sciences, National University of Ireland, Galway, Ireland; (h) Shannon Point Marine Center, Western Washington University, Anacortes, WA 98221; (i) Department of Cell Biology and Molecular Genetics and Agricultural Experiment Station, University of Maryland, College Park, MD 20742; (j) Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, United Kingdom; and (k) Bedford Institute of Oceanography, Geological Survey of Canada (Atlantic), Dartmouth, NS B2Y 4A2, Canada

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved November 28, 2016 (received for review September 8, 2016)

Dinoflagellates are key species in marine environments, but they remain poorly understood in part because of their large, complex genomes, unique molecular biology, and unresolved in-group relationships. We created a taxonomically representative dataset of dinoflagellate transcriptomes and used this to infer a strongly supported phylogeny to map major morphological and molecular transitions in dinoflagellate evolution. Our results show an early-branching position of Noctiluca, monophyly of thecate (plate-bearing) dinoflagellates, and paraphyly of athecate ones. This represents unambiguous phylogenetic evidence for a single origin of the group’s cellulosic theca, which we show coincided with a radiation of cellulases implicated in cell division. By integrating dinoflagellate molecular, fossil, and biogeochemical evidence, we propose a revised model for the evolution of thecal tabulations and suggest that the late acquisition of dinosterol in the group is inconsistent with dinoflagellates being the source of this biomarker in pre-Mesozoic strata. Three distantly related, fundamentally nonphotosynthetic dinoflagellates, Noctiluca, Oxyrrhis, and Dinophysis, contain cryptic plastidial metabolisms and lack alternative cytosolic pathways, suggesting that all free-living dinoflagellates are metabolically dependent on plastids. This finding led us to propose general mechanisms of dependency on plastid organelles in eukaryotes that have lost photosynthesis; it also suggests that the evolutionary origin of bioluminescence in nonphotosynthetic dinoflagellates may be linked to plastidic tetrapyrrole biosynthesis. Finally, we use our phylogenetic framework to show that dinoflagellate nuclei have recruited DNA-binding proteins in three distinct evolutionary waves, which included two independent acquisitions of bacterial histone-like proteins.

dinoflagellates | phylogeny | theca | plastids | dinosterol

Significance

We created a dataset of dinoflagellate transcriptomes to resolve internal phylogenetic relationships of the group. We show that the dinoflagellate theca originated once, through a process that likely involved changes in the metabolism of cellulose, and suggest that a late origin of dinosterol in the group is at odds with dinoflagellates being the source of this important biomarker before the Mesozoic. We also show that nonphotosynthetic dinoflagellates have retained nonphotosynthetic plastids with vital metabolic functions, and propose that one of these may be the evolutionary source of dinoflagellate bioluminescence. Finally, we reconstruct major molecular and morphological transitions in dinoflagellates and highlight the role of horizontal gene transfer in the origin of their unique nuclear architecture.

Author contributions: J.J. and J.F.S. designed research; J.J., G.S.G., F.B., D.D., T.R.B., S.G.G., K.J.B., B.I., S.L.S., C.F.D., R.F.W., R.A.F., B.S.L., F.L.R., and J.F.S. performed research; J.J. analyzed data; and J.J. and J.F.S. wrote the paper with contributions from R.A.F.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the iMicrobe database (project code CAM_P_0001000) and GenBank Transcriptome Shotgun Assembly (TSA) Sequence Database (accession nos. GELK00000000 and GEMP00000000).

(1) To whom correspondence should be addressed. Email: [email protected].

(2) Present address: Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1614842114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1614842114 PNAS Early Edition | 1 of 10

EVOLUTION PNAS PLUS

Dinoflagellates comprise approximately 2,400 named extant species, of which approximately half are photosynthetic (1). However, this represents only a fraction of their estimated diversity: in surface marine waters, dinoflagellates are some of the most abundant and diverse eukaryotes known (2). Dinoflagellates’ ecological significance befits their abundance: photosynthetic species are dominant marine primary producers, and phagotrophic species play an important role in the microbial loop through predation and nutrient recycling. Approximately 75–80% of the toxic eukaryotic phytoplankton species are dinoflagellates, and they cause shellfish poisoning and harmful algal blooms of global importance. Symbiotic genera like Symbiodinium participate in interactions with metazoans and are essential for the formation of reef ecosystems, and parasitic forms play a central role in the collapse of harmful algal blooms, including those caused by dinoflagellates themselves (3). Dinoflagellates synthesize important secondary metabolites including sterols, polyketides, toxins, and dimethylsulfide, and several of them have evolved bioluminescence. They have a nonnucleosomal system of nuclear DNA packaging, widespread trans-splicing in mRNAs, and highly unusual plastid and mitochondrial genomes with complex transcript modifications (4–8). Their photosynthesis relies on unique light-harvesting complexes, and its frequent loss in the group makes dinoflagellates a model for understanding the basis of evolutionary reliance on nonphotosynthetic plastid organelles.

Detailed understanding of dinoflagellate biology, especially of unusual features such as the organization of their very large and complex nuclear genomes, has been limited by a paucity of sequence data (9, 10). Poorly resolved dinoflagellate trees have further complicated predictions of how specific metabolic pathways evolved and how they are distributed in uncultured members of the group. To date, molecular phylogenies have established the deep-branching positions of Oxyrrhis marina (here included in the dinoflagellates) and the parasitic Syndiniales [possibly several lineages (11)], but the internal relationships in the so-called core dinoflagellates, that is, all other orders and most species in the group, have remained unresolved except at low taxonomic levels (12–14). Traditionally, dinoflagellate taxonomy has been based on tabulation, the arrangement of vesicles in the cell cortex that may or may not contain cellulosic thecal plates (collectively the theca). Whether the dinoflagellate theca originated once or multiple times has been controversial. Dinoflagellates have left a fossil record that is one of the richest among protists, and many fossils preserve a detailed record of tabulation through reflections of thecal plates, which provide insights into the history of some modern taxa as well as extinct groups. They have also left an extensive biogeochemical record (i.e., sterols), but reconciling this evidence with poorly resolved gene phylogenies has been difficult (15, 16).

We circumvented the difficulties inherent to the sequencing of large dinoflagellate genomes by compiling a phylogenetically representative transcriptomic dataset to illuminate dinoflagellate biology and evolution. We infer a strongly resolved phylogeny for dinoflagellates and provide phylogenetic evidence for a single origin of the theca, which coincides with major predicted changes in cellulose metabolism. We propose a model for the evolution of tabulation and show that pre-Mesozoic biomarkers that have often been associated with the group are unlikely to come from dinoflagellate sources. Three distantly related, nonphotosynthetic dinoflagellates were found to be dependent on plastid metabolism, and we propose that this dependency is likely to apply to all free-living (i.e., nonparasitic) dinoflagellates and that plastidial metabolites are likely to represent the evolutionary origin of dinoflagellate bioluminescence. Finally, we reconstruct character evolution in dinoflagellates and show that their modern-day biology was shaped by stepwise molecular, metabolic, and morphological innovations, including nuclear DNA-binding proteins of a bacterial origin.
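The dataset construction summarized above (detailed under Results as 101 orthologous alignments with the fewest missing data, concatenated into phylogenetic matrices) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the treatment of `-`, `?`, and `X` as missing characters, and the toy sequences are all illustrative assumptions. Alignments are ranked by their fraction of missing cells over the full taxon set, the least incomplete ones are kept, and taxa absent from a kept alignment are padded with gaps so that every row of the matrix has the same length.

```python
from typing import Dict, List

# Hypothetical representation: one alignment maps taxon name -> aligned protein sequence.
Alignment = Dict[str, str]

MISSING = set("-?X")  # assumption: gaps and unknown residues count as missing data


def alignment_length(aln: Alignment) -> int:
    # Assumes a non-empty alignment whose sequences all have the same length.
    return len(next(iter(aln.values())))


def missing_fraction(aln: Alignment, taxa: List[str]) -> float:
    """Fraction of cells that are missing, counting absent taxa as fully missing."""
    length = alignment_length(aln)
    missing = 0
    for taxon in taxa:
        seq = aln.get(taxon)
        if seq is None:
            missing += length
        else:
            missing += sum(1 for residue in seq if residue in MISSING)
    return missing / (length * len(taxa))


def build_supermatrix(alignments: List[Alignment], taxa: List[str], n_keep: int) -> Dict[str, str]:
    """Keep the n_keep most complete alignments and concatenate them per taxon."""
    ranked = sorted(alignments, key=lambda aln: missing_fraction(aln, taxa))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in ranked[:n_keep]:
        length = alignment_length(aln)
        for taxon in taxa:
            # Pad absent taxa with gaps so all rows stay the same length.
            rows[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy data (invented for illustration, not real sequences).
toy_alignments = [
    {"Noctiluca": "MKV-", "Oxyrrhis": "MKVL"},  # 1 of 8 cells missing
    {"Noctiluca": "AAAA"},                      # Oxyrrhis absent: 4 of 8 missing
    {"Noctiluca": "GG", "Oxyrrhis": "GA"},      # complete
]
supermatrix = build_supermatrix(toy_alignments, ["Noctiluca", "Oxyrrhis"], n_keep=2)
print(supermatrix)  # {'Noctiluca': 'GGMKV-', 'Oxyrrhis': 'GAMKVL'}
```

Because the kept alignments are concatenated in rank order, the column order differs from the input order; under site-independent substitution models this does not change the likelihood of a tree.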

Results and Discussion

Dinoflagellate Phylogeny.

Representative, strongly resolved phylogeny for dinoflagellates. An inability to resolve dinoflagellate relationships has hindered evolution-driven predictions of their biology and a full integration of the group’s rich fossil record with molecular-based schemes of evolution. Our aim was to overcome these limitations by erecting a framework for character mapping rooted in a representative phylogeny of all major dinoflagellate lineages. We generated transcriptomes from key species lacking deep-coverage sequence data (Noctiluca scintillans, Togula jolla, Protoceratium reticulatum, Polarella glacialis, Hematodinium spp., Amphidinium carterae, and two isolates of Amoebophrya sp. parasites together with their hosts, Karlodinium veneficum and Akashiwo sanguinea) and complemented these with data from recent sequencing projects (9, 17–19) (SI Appendix, Table S1). Sequences were added to alignments of conserved proteins previously used in eukaryotic phylogenies (20), and their orthology was verified in individual protein trees (Materials and Methods); 101 orthologous alignments with the fewest missing data were selected and concatenated into three phylogenetic matrices that differ by the root (Fig. 1A and SI Appendix, Table S1). The matrices include six dinoflagellate lineages previously absent in multiprotein phylogenies: Noctilucales, Gymnodiniaceae s.s., Togula, Akashiwo, Prorocentrales, and Dinophysiales, making this a broadly sampled, large dinoflagellate dataset. Maximum-likelihood and Bayesian inferences on all three matrices gave consistent and well-supported topologies (Fig. 1 A and B). Relationships between the outgroups and the early-branching Oxyrrhis, Hematodinium, and Amoebophrya spp. are fully resolved and congruent with earlier studies (11). Core dinoflagellates are monophyletic, and several longstanding issues about their relationships can be resolved (Fig. 1).

[Fig. 1 image not reproduced. Panel A: tree with outgroups (stramenopiles, ciliates, apicomplexans, Perkinsozoa), Syndiniales, Noctilucales, Gymnodiniales, and the thecate groups (Gonyaulacales, Peridiniales, Symbiodiniaceae, Prorocentrum, Dinophysis), with roots R1–R3 and the single origin of the theca marked. Panel B principal findings: Noctiluca (and Amphidinium) early-branching; Gymnodiniales paraphyletic to thecates, with Togula sister to Gymnodiniaceae s.s.; Akashiwo sister to thecates; thecates monophyletic (unambiguous support); Heterocapsa with other Peridiniales (not early-branching); Symbiodiniaceae nested within thecates (not intermediate).]

Fig. 1. Multiprotein phylogeny of dinoflagellates. (A) Best maximum-likelihood tree (IQ-Tree) of dinoflagellates and relatives based on the 101-protein dataset (root 1 matrix, 43 species, 29,400 sites). Branches show ultrafast bootstraps (IQ-Tree)/nonparametric bootstraps (RAxML)/posterior probabilities (PhyloBayes) (dash indicates <50/50/0.5 support; filled circles indicate 100/100/1 support; dt indicates a different topology). Roots of alternative matrices (Perkinsus, root 2, 30,780 sites; and Noctiluca, root 3, 30,988 sites) are shown by arrows. (B) Overview of branch supports for principal findings (taxon and matrix abbreviations as underlined in A) in phylogenies of 12 matrices that differ by their root (R1–R3) and species presence (All, All - Din, All - Pro, All - Din & Pro; SI Appendix, Table S1). (C) Two placements of Dinophysis (Din) relative to Gon, Per, and Sym thecates and a variable position of Prorocentrum (Pro) as identified in phylogenies of the 12 matrices (SI Appendix, Table S3, provides tree topology tests).

Early position of Noctilucales and athecate paraphyly. Athecate dinoflagellates have long confounded dinoflagellate molecular phylogenies as a result of their intermixing with thecate taxa, for example within the so-called Gymnodiniales–Peridiniales–Prorocentrales (GPP) complex (21), or as a result of the unstable position of certain outliers like the Noctilucales, which have at times been placed as basal or nested deeply inside the group (12, 16, 22). Our analyses resolve these issues and help reconcile dinoflagellate morphological and molecular data in several important ways. First, we find that athecate dinoflagellates represent a paraphyletic assemblage with respect to the thecates (Fig. 1A), suggesting that earlier mixed groupings like the GPP complex are artifacts caused by limited phylogenetic resolution. Second, N. scintillans and A. carterae are the earliest-branching core dinoflagellates, with Noctiluca positioned at the base in most analyses, except for Bayesian inferences on the root 2 matrix, in which it is also basal but together with Amphidinium (Fig. 1B). Statistical evaluation of alternative tree topologies by the approximately unbiased and expected-likelihood weights tests rejects topologies in which Noctiluca does not represent the earliest branch of core dinoflagellates (P = 0.01; see SI Appendix, Table S2 and SI Materials and Methods). This position is reinforced by the absence of a cox3 split in Noctiluca (as detailed later) and resolves the long-problematic position of the Noctilucales (12–14, 16, 22), making them central to understanding the biology of the core dinoflagellate ancestor. Third, the previously mysterious Togula (23) is related to the Gymnodiniaceae sensu stricto (a clade represented here by Gymnodinium s.s. and Polykrikos).
Finally, Akashiwo is placed as the sister taxon to thecate dinoflagellates in all analyses, although an alternative topology as a sister to Gymnodiniaceae s.s. and Togula cannot be rejected (SI Appendix, Table S2). Statistical support for the monophyly of Akashiwo and the thecates increases when the divergent outgroup sequences are excluded, in both phylogenies and tree topology tests (Fig. 1B and SI Appendix, Table S2). This suggests that the relationship is likely genuine, making Akashiwo the closest investigated athecate relative of thecate dinoflagellates (Fig. 1B). Overall, the order Gymnodiniales represents multiple paraphyletic lineages at the base of the core dinoflagellates, despite their close morphological similarity [Akashiwo, the Kareniaceae, and even one member of the Noctilucales were, until recently, classified in the genus Gymnodinium (13, 24)], suggesting that their conserved morphological characteristics were ancestral to all core dinoflagellates.

Monophyly of thecate dinoflagellates and nested position of Symbiodiniaceae. In molecular phylogenies, thecate dinoflagellates have been mixed with athecate species and are only exceptionally recovered as monophyletic in specific datasets and with low support (12, 14). Our large-scale phylogenies, which include all five major thecate groups, recover thecate dinoflagellates as monophyletic, always with maximal or near-maximal support (Fig. 1 A and B). This provides unambiguous phylogenetic support for the single origin of the dinoflagellate theca. Peridiniales, Gonyaulacales, and Symbiodiniaceae (represented by Symbiodinium and Polarella) are monophyletic in all our analyses. The long-problematic Heterocapsa, previously placed at the base of dinoflagellates (25) or away from the Peridiniales (14), is strongly resolved as the sister group to other Peridiniales (Fig. 1 A and B), a position consistent with its modified peridinialean tabulation (26). The placement of Prorocentrum and Dinophysis, both representatives of poorly sampled and morphologically divergent orders, remains unresolved within the thecates: Dinophysis is placed at the base of the Gonyaulacales or of all thecates with low support, and the position of Prorocentrum is even more unstable (Fig. 1C). Analyses excluding Dinophysis, Prorocentrum, or both confirm the common origin and monophyly of the other thecate lineages, that is, the Gonyaulacales, Symbiodiniaceae, and Peridiniales inclusive of Heterocapsa (Fig. 1B). The branching order of these core thecate lineages is also conserved: the Gonyaulacales always branch comparatively early, and the Symbiodiniaceae are always late-branching within the thecates and consistently recovered close to the Peridiniales. This topology is weakly supported, but support increases when the problematic Prorocentrum is excluded (Fig. 1C). Exhaustive testing of alternative tree topologies (SI Appendix, Table S3 and SI Materials and Methods) rejects all topologies in which the Symbiodiniaceae appear as the sister group of other thecates at the significance level of P = 0.05 (and also at P = 0.01 except for a single dataset in which both Dinophysis and Prorocentrum are absent). Symbiodiniaceae (Symbiodinium, Polarella, and related forms) are frequently classified together with the early fossil genus Suessia as the “Suessiales” (26) or even within the “Suessiaceae” (27, 28), but, if this is correct, the Symbiodiniaceae should appear as the sister group of all other living thecates, a topology never recovered in our phylogenies. Morphological evidence does not support the combination of the two groups either: although tabulations in symbiodiniaceans and suessiaceans have more series of thecal plates than most thecate dinoflagellates, determining the homologies of individual plates is not possible (26, 29). Thus, we use the family Symbiodiniaceae (26) for the clade uniting Symbiodinium, Polarella, and their modern relatives (27, 28) to separate them from the exclusively fossil Suessiaceae (Suessia and related forms). It remains possible (but not likely) that the Suessiaceae developed their theca independently, but all other fossil and modern thecate lineages seem to have originated from a common ancestor. Four independent lines of evidence support this: monophyly of the modern thecates in multiprotein phylogenies (Fig. 1), rapid emergence of fossils reflecting the possession of the theca during the early Mesozoic (30), similarities in tabulation patterns between different thecate lineages (15, 26), and the presence of theca-associated cellulases of a common evolutionary origin in modern thecates (Fig. 2).
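The topology tests cited in this section (approximately unbiased and expected-likelihood weights) both start from per-site log-likelihoods of competing trees fitted to the same alignment. As a hedged illustration of the simpler of the two, the Python sketch below computes expected-likelihood weights using RELL bootstrap resampling (alignment columns are resampled and their precomputed log-likelihoods summed, without re-optimizing the trees) and returns the 95% confidence set of trees; a topology outside that set would be rejected. The tree labels and per-site values are invented toy numbers, and this is not the software used in the study. The AU test, which relies on a multiscale bootstrap, is not shown.

```python
import math
import random
from typing import Dict, List


def expected_likelihood_weights(
    site_lnl: Dict[str, List[float]], n_boot: int = 1000, seed: int = 1
) -> Dict[str, float]:
    """Expected-likelihood weights from RELL bootstrap replicates of per-site log-likelihoods."""
    rng = random.Random(seed)
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    totals = {tree: 0.0 for tree in trees}
    for _ in range(n_boot):
        # Resample alignment columns with replacement (the same columns for every tree).
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        lnl = {tree: sum(site_lnl[tree][c] for c in cols) for tree in trees}
        best = max(lnl.values())
        # Subtract the best score before exponentiating to avoid underflow.
        rel = {tree: math.exp(v - best) for tree, v in lnl.items()}
        norm = sum(rel.values())
        for tree in trees:
            totals[tree] += rel[tree] / norm
    return {tree: totals[tree] / n_boot for tree in trees}


def confidence_set(weights: Dict[str, float], level: float = 0.95) -> List[str]:
    """Add trees in order of decreasing weight until their cumulative weight reaches `level`."""
    chosen: List[str] = []
    cumulative = 0.0
    for tree, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(tree)
        cumulative += w
        if cumulative >= level:
            break
    return chosen


# Toy per-site log-likelihoods for two hypothetical topologies (invented numbers).
site_lnl = {
    "noctiluca_basal": [-10.0, -12.0] * 100,
    "noctiluca_nested": [-10.5, -12.1] * 100,
}
weights = expected_likelihood_weights(site_lnl, n_boot=200)
print(confidence_set(weights))  # ['noctiluca_basal']
```

Weights are averaged over replicates, so a tree that is only occasionally the best still retains some weight; in this toy example the second topology is worse at every site and receives essentially none.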

Thecal Evolution and Dinoflagellate Paleohistory

Phylogeny-driven model for theca origin, evolution, and loss. Most thecate dinoflagellates (both living and fossil) belong to the Gonyaulacales and Peridiniales, two orders with tabulations involving five to six latitudinal series of thecal plates. The details of these tabulations are consistently distinct and longstanding in the fossil record, a pattern consistent with the fact that, in molecular phylogenies, the two orders are not closely related within the thecates (Fig. 1). These patterns suggest that dinoflagellates with gonyaulacoid–peridinioid tabulations originated comparatively early: the extinct rhaetogonyaulacoids (Fig. 2A) in the Middle to Late Triassic (31) and true, modern-looking gonyaulacoids and peridinioids in the later Early Jurassic. Even if the phylogenetic position of the Dinophysiales and Prorocentrales in molecular trees remains unresolved, their tabulation patterns are morphologically divergent and unlikely to represent ancestral or transitional states: the fossil Nannoceratopsis suggests, for example, that the dinophysioid tabulation type is evolutionarily derived (Fig. 2A). As explained earlier, we suggest that the suessioid and gymnodinioid tabulations of the Symbiodiniaceae and their sister group, the Borghiellaceae (27), are also derived secondarily from gonyaulacoid–peridinioid ancestors and originated by a secondary increase in plate number (Fig. 2A); they do not represent early intermediates in theca evolution, as some earlier models considered (15, 32). In contrast, Late Triassic suessioid fossils such as Suessia could represent an intermediate stage between gymnodinioid and gonyaulacoid–peridinioid tabulation types or an independent example of a decrease in primary plate number from gymnodinioid ancestors (Fig. 2A). All in all, paleontological and molecular phylogenetic data suggest that all living thecate dinoflagellates originated from ancestors with a gonyaulacoid–peridinioid tabulation and argue for the derived position of the Symbiodiniaceae. The model is limited by the incompleteness of the fossil record and will be further developed by understanding the tabulations and phylogenies of little-known or morphologically divergent incertae sedis thecates like Heterodinium, Thecadinium, or Cladopyxis (26). No simple scenario [plate decrease, increase, and fragmentation models (32)] can account for the evolution of thecal tabulation from a phylogeny-driven perspective (Fig. 1): secondary increase in plate number is observed not only in symbiodiniaceans but also in Pyrophacus (Gonyaulacales), a genus with a multiplated tabulation derived from ancestors with a gonyaulacoid tabulation, whereas other thecates have gone through a process of plate decrease, e.g., the Dinophysiales and Prorocentrales (in the hyposome) and the Late Triassic to Middle Jurassic fossil Valvaeodinium. Our model also strongly suggests that the theca can be lost: some species in the Symbiodiniaceae and Borghiellaceae lack visible cellulose in amphiesmal vesicles altogether (28, 33), and their phylogenetic positions suggest that their thecae were lost more than once (Fig. 2A). Finally, a broad, negative relationship emerges between the number and relative surface area of amphiesmal vesicles and the amount of cellulose contained in them. The Gymnodiniales have numerous, small amphiesmal vesicles that lack cellulose, whereas the Gonyaulacales, Peridiniales, Prorocentrales, and Dinophysiales have few, large amphiesmal vesicles containing thick thecal plates, the ancestral state for all living thecate dinoflagellates (Fig. 2A). Symbiodiniaceans that have moderate plate numbers in 7–10 latitudinal series have only thin cellulosic plates, but those members of the Symbiodiniaceae and Borghiellaceae that reverted to a gymnodinioid tabulation often lack cellulose altogether (Fig. 2A) (e.g., refs. 28, 33, but see also ref. 27). Additional data, for example from the Borghiellaceae and Pyrophacus, will make it possible to test these trends, but, as things stand now, it seems that the acquisition of thick cellulosic plates within amphiesmal vesicles is constrained by their surface area and number. Subsequent reductions and losses of cellulose in the Symbiodiniaceae and Borghiellaceae relaxed this constraint, leading to a partial or complete reversal to numerous small amphiesmal vesicles.

Origin of theca coincides with onset of cellulase radiation. The origin of the dinoflagellate theca is intimately linked to the biosynthesis of cellulose, its building material, but investigations into the details of cellulose production in dinoflagellates have been limited to rare ultrastructural and labeling studies (34). Recently, production of a highly expressed cellulase [dCel1 from glycosyl hydrolase family 7 (GH7)] was shown to be coupled to cell cycle progression in Crypthecodinium cohnii, and the enzyme was immunolocalized to the cell wall in several dinoflagellates, suggesting an important role in cellulose processing during division (31). We identified multiple diversified paralogs of GH7 genes in all thecates and one to three closely related paralogs in four athecate dinoflagellates in our dataset (SI Appendix, Table S4). A eukaryote-wide phylogeny of 184 slow-evolving GH7 protein sequences (Fig. 2B and SI Appendix, Fig. S1 and SI Materials and Methods) suggests that the thecate paralogs are derived by multiple rounds of duplication followed by selective lineage sorting. The branching pattern is poorly resolved, but indicates a common origin for most thecate GH7 proteins together with sequences from the athecate Karenia brevis and A. carterae and the algae Bigelowiella natans and Thalassiosira oceanica (the latter two are nested within dinoflagellates and were presumably spread horizontally). Some duplications in the thecate GH7 occurred at the level of genera or orders, but at least eight and possibly twice as many paralogs apparently originated earlier (SI Appendix, Fig. S1), presumably in the common ancestor of all thecates. These observations suggest that the radiation of GH7 genes in thecate dinoflagellates is linked to the evolutionary origin and subsequent evolution of the theca. The GH7 protein identified in K. brevis (SI Appendix, Table S4) likely corresponds to the dCel1 homolog previously immunolocalized in the cell cortex (31). Interestingly, A. sanguinea, the likely sister group of thecate dinoflagellates, is immunopositive for that same protein (31), although the corresponding GH7 sequence remains unknown (our mixed transcriptome of Akashiwo cells infected by Amoebophrya sp. lacks it). The function of GH7 enzymes in athecate species has not been studied, but they are likely involved in the metabolism of cellulose or related polysaccharides, which may have been an important precondition for the acquisition of cellulosic thecal plates. Unlike cellulose breakdown, cellulose biosynthesis in dinoflagellates is not understood at the molecular level (34). We identified three types of algal cellulose synthase (CESA-like) homologs in thecate and athecate dinoflagellates, candidates for elucidating their cellulose biosynthesis (SI Appendix, Table S4).

[Fig. 2 artwork: (A) tabulation types (gymnodinioid, gonyaulacoid–peridinioid, prorocentroid, dinophysioid, nannoceratopsioid, rhaetogonyaulacoid, suessioid) with amphiesmal vesicle cross-sections; (B) GH7 phylogeny; (C) Proterozoic to Cretaceous timeline, axes "Relative number of species (a and b)" and "Samples (%) by period with TA-dinosteroids (c)".]

Fig. 2. Thecal evolution and dinoflagellate paleohistory. (A) Phylogeny-driven model of changes between major modern and fossil (crosses) tabulational types. Gymnodinioid tabulation with numerous small, empty amphiesmal vesicles is ancestral and gave rise to the gonyaulacoid–peridinioid tabulation with a few large, cellulose-rich thecal plates. Suessioid and gymnodinioid tabulations in modern Symbiodiniaceae and Borghiellaceae (asterisk) are derived independently of the standard gymnodinioid and Triassic suessioid tabulations (Suessia), and are characterized by a decrease or loss of cellulose content. Prorocentroid and dinophysioid tabulations are derived from the gonyaulacoid–peridinioid tabulation (the latter probably via a nannoceratopsioid intermediate). Triassic suessioid and rhaetogonyaulacoid tabulations may represent evolutionary intermediates or independent experiments in thecal plate reduction. (B) Maximum-likelihood phylogeny (IQ-Tree) of 184 eukaryotic GH7 proteins reveals cellulases in athecate dinoflagellates (underlined) and their radiation in the thecates (color-coded). Black rectangles indicate 50% reduction in branch length. Known GH7 cellulases in P. lunula (dCel1) and Lingulodinium polyedrum (dCel2) are shown. Further details are provided in SI Appendix, Fig. S1 and Table S4. (C) Alternative hypotheses (H1 and H2) on the first emergence of triaromatic dinosteranes attributable to dinoflagellates or their direct ancestors (H2 is preferred by our data). Relative species numbers of dinoflagellates (a) and acritarchs (b) and percentage of dinosterane-positive samples (c; see ref. 35 for sample data) from the Proterozoic (green), Paleozoic (red), and Mesozoic (blue) are shown together with the predicted emergence of the last common ancestor (LCA) of modern thecates. Reprinted with permission from refs. 26, 28 (www.sciencedirect.com/science/journal/14344610), 35 (permission conveyed through Copyright Clearance Center, Inc.), and 74.

Janouškovec et al. PNAS Early Edition | www.pnas.org/cgi/doi/10.1073/pnas.1614842114

Dinosterol is absent in deep-branching dinoflagellates.
The diversity and abundance of dinoflagellates in Mesozoic and younger sediments correlates with levels of triaromatic dinosteroids, derivatives of the fossilizing 4-methyl sterol biomarker dinosterol (4α,23,24R-trimethyl-5α-cholest-22E-en-3β-ol) (15, 35). Dinosteranes also occur in Late Proterozoic and early Paleozoic sediments that are often enriched with acritarchs (microfossils of uncertain origin, some of which have been speculatively attributed to dinoflagellates or their direct ancestors), and this has led to the proposal that dinoflagellates are ancient and acquired dinosterol biosynthesis early in their evolution (35–37). We compared this hypothesis (Fig. 2C, H1) to a Mesozoic origin of dinoflagellate dinosterol (Fig. 2C, H2) by mapping sterol distribution onto our updated phylogeny of dinoflagellates (Fig. 1). Dinosterol and other 4-methyl sterols are absent from all dinoflagellate relatives with known sterol profiles, including ciliates, perkinsids, apicomplexans, Chromera, and Vitrella, but also from Oxyrrhis (38) and Amoebophrya, which likely only acquires 4-methyl sterols from its host (39, 40). In core dinoflagellates, 4-methyl sterols are ubiquitous, but dinosterol itself is absent in three of their earliest branches: Noctiluca, Amphidinium, and the Kareniaceae (e.g., refs. 41–43). Gyrodinium dominans, likely another early core dinoflagellate (14), also lacks dinosterol (38). This suggests that dinosterol appeared first in the last common ancestor of Gymnodiniaceae s.s., Akashiwo, and thecate dinoflagellates (although broader testing for its presence in early-branching dinoflagellates is needed). We suggest that pre-Mesozoic dinosteranes are unlikely to originate from dinoflagellates for four reasons. First, dinosteranes from the Late Proterozoic and early Paleozoic greatly predate unambiguous dinoflagellate fossils, and dinosterol presence in modern species is restricted to close relatives of the thecates (Fig. 1), which originated in the early Mesozoic. Second, Paleozoic acritarch microfossils bear no demonstrable morphological similarity to dinoflagellates (26). Third, dinosterane prevalence in Paleozoic and Proterozoic samples is highly variable compared with Mesozoic samples (35). Dinosteranes seem to be entirely absent from the Carboniferous and Permian (35), a discontinuity that contrasts with their almost universal preservation in Mesozoic and younger sediments and species. Finally, small amounts of dinosterol are known from a modern species of diatom (44), and traces of dinosteranes are also present in Archean bitumens, where dinoflagellates could not possibly have existed (45). All this suggests that different organisms in different geological eras evolved dinosterol biosynthesis independently of dinoflagellates and that dinosterol production by certain acritarchs ended with their mid-Paleozoic extinction. We also note that the phylogenetic distance between the origin of dinosterol-producing athecates and the origin of modern thecate dinoflagellates (see Fig. 1) is consistent with the time lapse between the Early Triassic dinosterane increase and the appearance of modern thecate orders in Early Jurassic sediments (Fig. 2C). We therefore suggest that abundant dinosteranes in some Scythian (Early Triassic) sediments predating the earliest thecate fossils (Middle Triassic) (35) are derived from athecate dinoflagellates alone, which gained the ability to produce dinosterol near the Permian/Triassic boundary and became abundant shortly after it (Fig. 2C, H2).
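The phylogenetic argument above amounts to mapping a presence/absence character onto a tree and placing its origin at the most recent common ancestor (MRCA) of the taxa that carry it. The Python sketch below illustrates that logic only. It uses a deliberately simplified, hypothetical ladder topology assembled from the branching relationships named in the text (it is not the published Fig. 1 tree), takes presence/absence of dinosterol from the statements above, and assumes a single gain with no later losses. It then reports any dinosterol-negative tip that falls inside the inferred clade.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]

# Hypothetical, simplified topology for illustration only (not Fig. 1):
# Oxyrrhis and Amoebophrya outside the core dinoflagellates; Noctiluca,
# Amphidinium and Kareniaceae as early core branches; Akashiwo sister to thecates.
TREE: Tree = ("Oxyrrhis",
              ("Amoebophrya",
               ("Noctiluca",
                ("Amphidinium",
                 ("Kareniaceae",
                  ("Gymnodiniaceae",
                   ("Akashiwo", "Thecates")))))))

# Dinosterol presence (True) or absence (False), as stated in the text.
DINOSTEROL: Dict[str, bool] = {
    "Oxyrrhis": False, "Amoebophrya": False, "Noctiluca": False,
    "Amphidinium": False, "Kareniaceae": False,
    "Gymnodiniaceae": True, "Akashiwo": True, "Thecates": True,
}

def leaves(tree: Tree) -> Set[str]:
    """Return all leaf names at or below this node."""
    if isinstance(tree, str):
        return {tree}
    result: Set[str] = set()
    for child in tree:
        result |= leaves(child)
    return result

def mrca(tree: Tree, targets: Set[str]) -> Tree:
    """Descend while one child still contains every target leaf."""
    if not isinstance(tree, str):
        for child in tree:
            if targets <= leaves(child):
                return mrca(child, targets)
    return tree

def single_origin_clade(tree: Tree, trait: Dict[str, bool]) -> Tuple[Set[str], Set[str]]:
    """Place one gain at the MRCA of trait-positive tips (no-loss assumption).

    Returns the tips of that clade and the trait-negative tips inside it,
    which would each require an additional loss under this model.
    """
    positives = {taxon for taxon, present in trait.items() if present}
    clade = leaves(mrca(tree, positives))
    conflicts = {taxon for taxon in clade if not trait[taxon]}
    return clade, conflicts

clade, conflicts = single_origin_clade(TREE, DINOSTEROL)
print(sorted(clade))  # ['Akashiwo', 'Gymnodiniaceae', 'Thecates']
print(conflicts)      # set()
```

An empty conflict set means a single gain explains the tip states on this toy tree. If Noctiluca were hypothetically scored as dinosterol-positive, the MRCA would move to the node containing Noctiluca and Amphidinium and Kareniaceae would be flagged, each implying an extra loss.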

Plastid Metabolism and Dependency

Plastid metabolism in nonphotosynthetic dinoflagellates. Approximately half of the described dinoflagellate species are nonphotosynthetic and are traditionally considered to lack plastids. The other half contain a photosynthetic peridinin-pigmented plastid that, in some lineages, has been replaced by other types of plastids. The peridinin plastid was inherited from the plastid in the common photosynthetic ancestor of dinoflagellates and apicomplexans (46, 47), but whether cryptic, nonpigmented plastids have been retained in nonphotosynthetic dinoflagellates remains contentious: Crypthecodinium and Oxyrrhis appear to contain plastid-derived genes (48, 49), whereas Hematodinium lacks all traces of the organelle (50). We investigated whether plastid and cytosolic pathways for isoprenoid, tetrapyrrole, and fatty acid biosynthesis were present in two distantly related nonphotosynthetic dinoflagellates, N. scintillans and O. marina, as well as in Dinophysis acuminata, a fundamentally nonphotosynthetic species that nevertheless carries kleptoplastids. For each metabolic enzyme in these pathways, we built a single-protein phylogeny and classified its origin as plastidic (in a clade with photosynthetic eukaryotes only), cytosolic (in a clade containing heterotrophic eukaryotes), or bacterial (in a clade with bacteria; putative recent horizontal transfer), a methodology informed by published localizations in model eukaryotes (e.g., ref. 51) and by in silico targeting predictions in selected proteins (Fig. 3 and SI Appendix, SI Materials and Methods).

All three investigated dinoflagellates contain an isoprenoid pathway of plastid origin (all seven enzymes are present in Noctiluca and Dinophysis) and lack the cytosolic pathway variant (Fig. 3A). This is exemplified by their retention of cyanobacterial IspC enzymes (Fig. 3B), which branch among orthologs from photosynthetic dinoflagellates and other algae. Similarly, all three nonphotosynthetic dinoflagellates contain multiple components of the plastid tetrapyrrole pathway (an essentially complete enzyme set is present in Noctiluca and Dinophysis), but only two to three components of the mitochondrial/cytosolic pathway. Comparing our data to the Symbiodinium minutum genome, we propose that a single tetrapyrrole pathway of predominantly plastid origin that initiates from glutamate (Fig. 3A, GTR and GSA) is present in all core dinoflagellates, a feature typical of eukaryotic plastids [mitochondrial aminolevulinic acid (ALA) synthase is present in the early-branching Hematodinium, Oxyrrhis, and Perkinsus (50)]. None of the three nonphotosynthetic dinoflagellates contains proteins for plastid fatty acid biosynthesis, suggesting that this pathway is dispensable in dinoflagellates in the absence of photosynthesis (Fig. 3A; FabI in Dinophysis is unusual; SI Appendix, SI Materials and Methods). Genes for plastid iron–sulfur cluster assembly (SufB, C, D), the ferredoxin (Fd) redox system [i.e., Fd NADP+ reductase (FNR)], and triose phosphate membrane translocators (TPTs) are also present in the three species (SI Appendix, Table S5).
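The origin-assignment rule described above (plastidic, cytosolic, or bacterial, depending on which organisms a protein groups with) can be written as a small decision function. In this Python sketch each protein's nearest sister clade is summarized as a set of hypothetical category labels (`photosynthetic_eukaryote`, `heterotrophic_eukaryote`, `bacterium`), and the order in which the rules are checked is an assumption of the sketch. The study itself classified enzymes by inspecting individual phylogenies, informed by published localizations and targeting predictions, so this is only a schematic of the stated criteria.

```python
from enum import Enum
from typing import Iterable

class Origin(Enum):
    PLASTIDIC = "plastidic"
    CYTOSOLIC = "cytosolic"
    BACTERIAL = "bacterial"
    UNRESOLVED = "unresolved"

# Hypothetical category labels for taxa in the query protein's sister clade.
PHOTOSYNTHETIC = "photosynthetic_eukaryote"
HETEROTROPHIC = "heterotrophic_eukaryote"
BACTERIUM = "bacterium"

def classify_origin(sister_clade_labels: Iterable[str]) -> Origin:
    """Assign an origin from the taxon categories of the nearest sister clade.

    Rule order is an assumption: any heterotrophic eukaryote implies a
    cytosolic enzyme; otherwise any bacterium implies a putative recent
    horizontal transfer; photosynthetic eukaryotes only implies plastidic.
    """
    labels = set(sister_clade_labels)
    if not labels:
        return Origin.UNRESOLVED
    if HETEROTROPHIC in labels:
        return Origin.CYTOSOLIC
    if BACTERIUM in labels:
        return Origin.BACTERIAL
    if labels == {PHOTOSYNTHETIC}:
        return Origin.PLASTIDIC
    return Origin.UNRESOLVED  # unknown labels: leave for manual inspection

# Illustrative calls with made-up labels (not taken from the study's trees).
print(classify_origin([PHOTOSYNTHETIC]).value)                 # plastidic
print(classify_origin([PHOTOSYNTHETIC, HETEROTROPHIC]).value)  # cytosolic
print(classify_origin([BACTERIUM]).value)                      # bacterial
```

Returning an explicit `UNRESOLVED` value, rather than forcing one of the three categories, mirrors the fact that poorly supported or mixed clades need case-by-case judgment.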


Plastid protein targeting and genome loss. We further investigated 56 protein sequences of plastidic origin in Noctiluca, Oxyrrhis, and Dinophysis (SI Appendix, Table S5). Most are incomplete, but seven are complete (they contain a partial spliced leader at the 5′ terminus of the corresponding transcript), and another 28 carry an extension of more than 50 aa at their N terminus. Proteins from the latter two categories were tested for the presence of plastid-targeting peptides in silico, and 17 of them carry bipartite targeting signatures comprising signal and transit peptides (SI Appendix, Table S6). Thirteen of these contain a phenylalanine at or near the predicted signal peptide cleavage site, and three Oxyrrhis proteins contain a second transmembrane region, all characteristics of targeting to plastids but not to other subcellular compartments in dinoflagellates (52, 53). In silico predictions have limited accuracy, but the consistent presence of N-terminal extensions and signal peptides in these proteins is congruent only with a plastidic origin. For example, cyanobacterial Fds in Noctiluca and Dinophysis with four conserved cysteine residues required for Fe-S formation contain N-terminal extensions with signal and transit peptides for plastid targeting (truncated in Oxyrrhis; SI Appendix, Fig. S2). Noctiluca and Dinophysis also contain a plastid-targeted Fd NADP+ reductase (i.e., FNR; SI Appendix, Tables S5 and S6), suggesting that their Fd–FNR redox system might have a similar function to that in the nonphotosynthetic plastid of Plasmodium (54). SufB and ClpC have essential functions in plastids, but, in apicomplexans, they also constitute key barriers to the loss of the plastid genome (47, 55). SufB carries a bipartite plastid-targeting signature in Oxyrrhis (SI Appendix, Table S6) and an apparently incomplete N-terminal extension in Dinophysis, and it is encoded on GC-rich contigs in all three species (55–66.7% GC), all typical of a nuclear rather than a plastidial location. Similarly to sufB, all three nonphotosynthetic dinoflagellates contain plastid-like clpC fragments on GC-rich, likely nuclear contigs (SI Appendix, SI Materials and Methods). SufB and clpC are also nucleus-encoded in Perkinsus and Symbiodinium (47, 56), indicating that both genes were relocated from the plastid genome early in their evolution. Because plastids in photosynthetic dinoflagellates encode only photosystem genes (7, 56) and ancestral reconstruction identifies no additional barriers to genome loss (47), evidence increasingly indicates that plastid genomes in nonphotosynthetic dinoflagellates and Perkinsus were lost with the loss of photosynthesis.

[Fig. 3 artwork: (A) Metabolic dependency on plastids; (B) Plastid IspC phylogeny; (C) Core metabolism in non-photosynthetic plastids; (D) Plastid dependency.]

Fig. 3. Plastid metabolism and dependency in nonphotosynthetic dinoflagellates. (A) Phylogeny-driven reconstruction of plastid and nonplastid variants of core metabolism (isoprenoid, tetrapyrrole, and fatty acid biosynthesis) in genomes (marked as "G") or transcriptomes ("T") of dinoflagellates and relatives. Individual enzymes (SI Appendix, Table S5) were classified by protein phylogenies and color-coded as to their presence/absence and origin. The data suggest that Oxyrrhis, Noctiluca, and Dinophysis are metabolically dependent on plastids. Metabolite (Met.) uptake was summarized from the literature. (B) Maximum-likelihood phylogeny (IQ-Tree) reveals IspCs of cyanobacterial origin in nonphotosynthetic dinoflagellates and relatives (bold); ultrafast bootstrap values >50 are shown at branches (≥95 highlighted; filled circles, 100). (C) Three grades in functional organization of core metabolic pathways in nonphotosynthetic plastids in dinoflagellates (blue) and relatives ("P" represents parasites). (D) Model for evolutionary dependency on plastids in dinoflagellates and relatives, which is applicable to other eukaryotes. Ancestral dependency (marked as "d") on plastid metabolism (loss of cytosolic isoprenoid biosynthesis; later reinforced by the loss of C4 tetrapyrrole biosynthesis in some taxa) led to retention of plastids in all free-living and many parasitic descendants. The dependency can be transferred onto a new plastidial symbiont (Kareniaceae) or host organism (in parasites dependent solely on host-derived metabolites); only the latter leads to an outright loss of the plastid.

Principles of plastid dependency in dinoflagellates and eukaryotes. Noctiluca, Oxyrrhis, and Dinophysis are metabolically dependent on cryptic plastids for the biosynthesis of isoprenoid units, and Noctiluca and Dinophysis also for tetrapyrroles. The evidence for this comprises multiple proteins in pathways of plastidial origin (as determined by phylogenies), presequences for plastid targeting, the absence of cytosolic pathway variants, and plastid localization of homologs in model species (Fig. 3 and SI Appendix, Tables S5 and S6). A full relocalization of either pathway to the cytosol is unprecedented in any organism, and the dependency on plastid pathways is supported by the fact that we obtain similar results from three distantly related heterotrophs and also from closely related phototrophs, one of which has genome data available (9).
Additional plastid pathways (the Fd redox system and Fe-S assembly) are present in Noctiluca, Oxyrrhis, and Dinophysis; these are essential for the function of the plastid but not for the host cell. Metabolism of amino acids remains insufficiently known in dinoflagellates and is absent in the plastids of apicomplexans (51). A comparison of nonphotosynthetic plastids in the broader group (Fig. 3C) reveals three functional grades in core metabolism that reflect the dispensability of individual pathways: the biosynthesis of isoprenoid units (and required cofactors) is ubiquitous and is the only core plastid pathway in piroplasmid apicomplexans and Perkinsus, whereas plastid fatty acid biosynthesis was retained only in apicomplexans and Alphamonas (47) (Fig. 3 A and C).

The pattern of plastid dependency in dinoflagellates parallels that in apicomplexans and chrompodellids [chromerids and colpodellids (47)] and reinforces conclusions that their common ancestor had a plastid (46) and relied on it for isoprenoid units after it lost the capability to synthesize them in the cytosol (47). Despite rare secondary losses of plastids in certain parasites (50, 57) and ongoing uncertainties about plastid presence in some organisms (e.g., gregarines, Psammosa, Eudubosquella), plastids are indispensable in all free-living members of this group examined so far (Fig. 3A) (47, 48), including multiple uncultured forms (58). This pattern suggests that the metabolic dependency on plastids in free-living species cannot be bypassed by obtaining the relevant compounds from the environment or from ingested prey (Fig. 3D). Rather, it has only increased with time as redundant cytosolic and mitochondrial pathways continue to be lost (Fig. 3A) (47). For example, the loss of mitochondrial delta-ALA synthase in core dinoflagellates has extended their plastid dependency to tetrapyrrole biosynthesis (Fig. 3A), much as in apicomplexans and chrompodellids (47). Most parasites retain plastids (59, 60), but their dependency on the organelle can be reduced or bypassed completely by the uptake of host metabolites (Fig. 3A) (57); searches in transcriptomic data indicate that Amoebophrya parasites lack the plastid, which was likely lost in their common ancestor with Hematodinium (50). Based on these patterns, we suggest that all free-living (but not all parasitic) dinoflagellates rely on plastid organelles that are derived from the ancestral peridinin plastid (Fig. 3D). These include phagotrophs (Noctiluca), osmotrophs (Crypthecodinium), and species with kleptoplastidy (Dinophysis) or new endosymbionts (Durinskia), except where these endosymbionts have taken over the metabolic functions of the ancestral plastid (likely in the Kareniaceae; Fig. 3D). This provides a broad rationale for why dinoflagellates with diatom endosymbionts contain two types of plastid isoprenoid and tetrapyrrole pathways (61) (Fig. 3 B and D). It also explains why Dinophysis contains a plastid Fd and TPT of dinoflagellate ancestry (62): both proteins contain bipartite targeting presequences with signal peptides (SI Appendix, Fig. S2 and Table S6; the latter was truncated in ref. 62), suggesting that they are targeted into a cryptic three-membrane plastid (Fig. 3D), not into the kleptoplastid as previously argued (62). Finally, we emphasize that a complete loss of a plastid organelle has never been confirmed in free-living eukaryotes, and we posit that this would be hard to achieve in established endosymbioses given the dependency patterns that exist in free-living dinoflagellates and related organisms (Fig. 3 A and D) (47).

Plastid tetrapyrroles and the evolution of bioluminescence. Several species of dinoflagellates are bioluminescent (63). In the photosynthetic species Pyrocystis lunula, the light-emitting compound luciferin has an open tetrapyrrole structure thought to be synthesized from the structurally similar chlorophyll a (64): the organism incorporates radioactively labeled chlorophyll precursors into both chlorophyll and luciferin, suggesting that their biosynthesis is linked (65). However, other bioluminescent dinoflagellates like Noctiluca, Protoperidinium, and certain Polykrikos species are nonphotosynthetic (63) and not known to synthesize chlorophyll. The prediction that they acquire chlorophyll from their prey (66) is inconsistent with prey-independent bioluminescence in at least one of them, Protoperidinium crassipes (67). Our finding of the plastid tetrapyrrole pathway in Noctiluca, which also leads to the precursors of chlorophyll, offers an alternative explanation of luciferin presence: it may be obtained by biosynthesis rather than scavenging, at least in some species. The plastid tetrapyrrole pathway is apparently indispensable as a key requirement for heme synthesis in all core dinoflagellates (Fig. 3A) and could therefore account for luciferin production in any bioluminescent dinoflagellate, irrespective of the presence of photosynthesis. This biosynthesis scenario also opens the possibility that luciferin is not derived via chlorophyll per se, but via an earlier intermediate in its biogenesis, perhaps a chlorophyllide or chlorin-like tetrapyrrole. Although this remains to be tested experimentally, our finding of the plastid tetrapyrrole pathway supports the possibility that bioluminescence in nonphotosynthetic dinoflagellates relies on biosynthetic machinery repurposed from heme and chlorophyll production.

Character Evolution in Dinoflagellates.Nuclear evolution: Stepwise horizontal gene gain. Dinoflagellates haveunique nuclei that have lost bulk nucleosomal DNA packaging,and instead condense DNA by using two types of basic proteinsthat are different from histones. Dinoflagellate/viral nucleoproteins(DVNPs) are similar to uncharacterized proteins from phycodna-viruses, are distributed in all dinoflagellates yet examined, andrepresent a family of basic proteins with high DNA-binding affinity(4). In contrast, dinoflagellate histone-like proteins (HLPs) are ofbacterial origin and have been found only in certain core di-noflagellate species; they are primarily detected at the chromosomeperiphery, where they are predicted to organize extended DNAloops during transcription (68). We identified DVNPs in all tran-scriptomes in our dataset, confirming their ubiquitous distributionamong dinoflagellates. Our searches also confirm that HLPsare absent in all early-branching taxa (Oxyrrhis, Hematodinium,and Amoebophrya spp.) and are ubiquitous in core dinoflagel-lates. Unexpectedly, however, we found that HLPs in Noctiluca,Amphidinium, Togula, and Gymnodinium are dissimilar in se-quence to HLPs in other dinoflagellates despite their similar length

Janou�skovec et al. PNAS Early Edition | 7 of 10

EVOLU

TION

PNASPL

US

Page 8: Major transitions in dinoflagellate evolution unveiled by ... · Major transitions in dinoflagellate evolution unveiled by phylotranscriptomics Jan Janouskoveca,b,c,d,1, Gregory S.

and structure (SI Appendix, Table S4). We reconstructed the phylogeny of dinoflagellate HLPs together with a representative selection of their closest orthologs, the bacterial HU-like proteins (HLPs in other eukaryotes are not closely related to those in dinoflagellates). The outcome confirms a wide separation between the previously known dinoflagellate type, HLP-I [e.g., HCc3 in Crypthecodinium (68)], and the newly identified type, HLP-II (Fig. 4 and SI Appendix, Fig. S3). Interestingly, HLP-I and HLP-II have mutually exclusive distributions (Fig. 4 and SI Appendix, Fig. S3). Because HLP-II is the type found in the early-branching core lineages (Noctiluca, Amphidinium, Togula, and Gymnodinium), this pattern suggests that HLP-II rather than HLP-I was ancestral to core dinoflagellates. HLP-I most likely appeared either in the ancestor of the Kareniaceae and other core dinoflagellates, where it temporarily coexisted with HLP-II before HLP-II was selectively lost, or later, spreading horizontally among the Kareniaceae, the thecates, and Akashiwo (Fig. 5). Because HLP-I and HLP-II are each monophyletic but not closely related to each other, and HU-like proteins are present in a wide range of bacterial phyla, the dinoflagellate HLPs are likely derived from HU-like proteins and not vice versa (in contrast to DVNPs, for which the direction of transfer with phycodnaviruses cannot be established). The unique molecular architecture of dinoflagellate nuclei thus resulted from at least three independent waves of protein gain (Fig. 5). The recruitment of DVNPs took place in the group's ancestor, leading to a decrease in the nuclear protein:DNA ratio and potentially to the loss of bulk nucleosomal packaging and an increase in genome size. HLPs were acquired later than DVNPs by at least two independent horizontal transfers from different bacterial donors. The initial gain of HLPs in the ancestor of core dinoflagellates coincided with the emergence of liquid crystalline chromosomes with arched DNA fibrils, which are condensed permanently in most species.

Organelle evolution: Plastid reduction and mitochondrial cox3 split. Evidence of a dependency on plastids in nonphotosynthetic dinoflagellates (Fig. 3) corroborates earlier conclusions that the common ancestor of dinoflagellates and apicomplexans was photosynthetic (46) and dependent on plastid-generated isoprenoids (47). Our phylogeny also supports the prediction that more than a dozen descendant lineages of this dinoflagellate–apicomplexan ancestor have lost photosynthesis (46, 69). At least two parasites, Cryptosporidium and Hematodinium, have lost the plastid outright. No other parasite, and none of the free-living lineages investigated in sufficient detail (six independent transitions to heterotrophy), is known to have done so. We thus posit that, once photosynthesis is lost, plastid loss in dinoflagellates and apicomplexans is rarer than plastid retention and is limited to a few parasites (47). After the split with apicomplexans, but at least by the time Amphidinium diverged, the dinoflagellate plastid acquired the photosynthetic carotenoid peridinin, peridinin–chlorophyll a-binding proteins, and a reduced, minicircular genome (6).
Our results suggest that during this transition the plastid sufB and clpC genes (key barriers to plastid genome loss in apicomplexans) were relocated to the nucleus in dinoflagellates. This relocation would have made the dinoflagellate plastid genome dispensable in the absence of photosynthesis, which likely explains why all heterotrophic representatives studied to date appear to lack it. In at least four distantly related photosynthetic

[Fig. 4 graphic: phylogeny with clades labeled Dinoflagellate HLP-II (Togula jolla, Gymnodinium catenatum, Amphidinium carterae, Noctiluca scintillans), Dinoflagellate HLP-I (all other core dinoflagellates, including HCc3), and two Bacterial HU-like clades; NCBI reference sequences ACJ04919 and AAM97522; scale bar 0.2.]

Fig. 4. Evolution of histone-like proteins. Phylogeny of bacterial (HU-like) and dinoflagellate HLPs reveals a dinoflagellate-type histone-like protein, HLP-II, in early-branching core dinoflagellates. HLP-II has a mutually exclusive distribution with HLP-I (e.g., the characterized HCc3 in C. cohnii, in bold). Further details are provided in SI Appendix, Fig. S3 and Table S4.

[Fig. 5 graphic: character map on the dinoflagellate phylogeny. Lineages: Apicomplexa and chromopodellids; colponemids and ciliates; Perkinsozoa; Oxyrrhida; Syndiniaceae and Amoebophryaceae; Noctilucales; Amphidinium sensu stricto; Gymnodiniales; Gymnodiniaceae s.s. and Togula; Kareniaceae; Akashiwo; Dinophysiales; Prorocentrales; Gonyaulacales; Suessiales; Peridiniales incl. Heterocapsaceae. Brackets mark DINOFLAGELLATES and CORE DINOFLAGELLATES. Mapped characters: plastid presence, oligo U-tailing in plastid mRNAs, and RuBisCO type II; spliced leader trans-splicing in mRNAs; DVNPs with loss of bulk nucleosomal DNA packaging, decrease in protein:DNA ratio, and increase in genome size; cingulum and sulcus (shallow, with discernible epi- and hyposome; true); histone-like proteins HLP-II and HLP-I (earliest emergence); liquid crystalline interphase-condensed chromosomes with arched DNA fibrils in at least one life-cycle stage (temporary) or throughout the life cycle (permanent); striated strand and wave on the transverse flagellum; mitochondrial cox3 split and trans-splicing; minicircular, highly reduced plastid DNA; peridinin and peridinin–chlorophyll a-binding proteins (PCPs); 4-methyl sterols and dinosterol; theca (cellulosic cell wall) with GH7 cellulase expansion (multiple paralogs). Legend: present (different colors), absent, not applicable.]

Fig. 5. Model for character evolution in dinoflagellates. Ancestral character states (filled circles) of conserved traits are reconstructed on the consensus phylogeny of dinoflagellates and their relatives by parsimony (arrowheads). Dotted branches in the thecate lineages indicate uncertain placement. Gaps indicate missing data, and “not applicable” denotes plastid genome absence or the presence of a different plastid genome type (Kareniaceae). The vertical square bracket indicates an evolutionary range in which traits emerged. Photos of dinoflagellates (by G. S. Gavelis), left to right: Kofoidinium sp. (Noctilucales), Nematodinium sp. (Gymnodiniaceae s.s.), Neoceratium praelongum (Gonyaulacales), Dinophysis miles (Dinophysiales), and Heterocapsa sp. (Peridiniales).


dinoflagellates, the expression of plastid genes is accompanied by substitutional editing of the corresponding mRNAs (Fig. 5) (7). The origin of plastid editing is, however, uncertain. It appeared some time after the divergence of apicomplexans and chromopodellids (70) and possibly became more widespread after the divergence of Amphidinium (71). Pinpointing its origin more precisely will require analysis of deep-branching photosynthetic dinoflagellates such as Spatulodinium pseudonoctiluca (13).

The mtDNA in at least five lineages of core dinoflagellates, including Amphidinium, contains a unique feature: cox3 is split in the same region into two fragments that are trans-spliced at the RNA level (8, 72). The split is absent in Hematodinium and earlier-diverging species, but, to our knowledge, its status in the Noctilucales was unknown until now (Fig. 5). We identified a cox3 contig in the Noctiluca transcriptome corresponding to a full-length protein (terminated by a canonical stop codon, which is rare in the group; SI Appendix, SI Materials and Methods). Mapping individual RNA read pairs onto the contig demonstrated continuous transcription across the split region and provided no support for the existence of two transcripts and their trans-splicing. PCR amplification using Noctiluca genomic DNA as a template produced a single product spanning both sides of the cox3 split, the identity of which was confirmed by sequencing (SI Appendix, SI Materials and Methods). Because the phylogenetic distribution and the unique character of the cox3 split indicate a single evolutionary origin, the uninterrupted cox3 in Noctiluca corroborates the early position of the Noctilucales among core dinoflagellates (Fig. 1).

Character map: Framework for evolutionary and functional predictions. Using parsimony, we reconstructed ancestral character states of major conserved morphological and molecular traits at different points of the dinoflagellate phylogeny (Fig. 5 and SI Appendix, SI Materials and Methods). Newly mapped transitions include the gain of 4-methyl sterols, dinosterol, nuclear HLP-I and HLP-II, the mitochondrial cox3 split, the theca, and multiple paralogs of GH7 cellulases. Two additional transitions map to the common ancestor of Amphidinium and later-diverging taxa: the gain of condensed liquid crystalline chromosomes throughout the life cycle, and the gain of a proteinaceous striated rod in the transverse flagellum, which produces a strongly pronounced flagellar wave (Fig. 5).
The corresponding characteristics in the Noctilucales are as yet little understood. Chromosomes in one of their life stages, the trophont, are relaxed. The transverse flagellum in their gametes is trailing and wave-less and contains only a thin filament in place of the striated rod (73). Detailed analysis is required to determine whether these states represent true evolutionary intermediates or secondary modifications associated with the unusual morphology of this order. The origins of other dinoflagellate characteristics were established previously and are reinforced within our framework. Plastids, RuBisCO form II, and oligoU-tailing of plastid mRNAs were gained before the split with apicomplexans (46). Spliced leader trans-splicing of mRNAs was acquired in the common ancestor of dinoflagellates and perkinsids (Fig. 5). DVNPs, ubiquitous in the species in our dataset, are ancestral to dinoflagellates and associated with changes in the protein:DNA ratio and genome size. The ancestor of syndinians and core dinoflagellates had a life stage with a shallow sulcus and cingulum (flagellar grooves), the latter dividing the cell into an upper episome and a lower hyposome. This is a transitional morphology between the short flagellar grooves of Oxyrrhis and Psammosa and the deeply engraved, perpendicular flagellar grooves of core dinoflagellates (Fig. 5). Altogether, most transitions map to the branch corresponding to the ancestor of core dinoflagellates, but other characteristics are scattered widely along the evolutionary backbone (Fig. 5). Thus, the ecological success of dinoflagellates has resulted from a series of independent changes to the morphology, metabolism, and molecular biology of their ancestors.
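The text does not name the software used for the parsimony reconstruction. The generic small-parsimony pass can be illustrated with Fitch's algorithm for one binary character on a rooted, fully bifurcating tree. The sketch below is a textbook version, not the authors' implementation. The simplified topology and the 0/1 coding of the cox3 split (0 = uninterrupted, 1 = split) are assumptions made for illustration. They follow the distribution stated above: the split is absent in Oxyrrhis, Hematodinium, and Noctiluca, and present in Amphidinium and later-diverging core dinoflagellates. Those later-diverging lineages are collapsed into one hypothetical tip, `later_core`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    state: Optional[int] = None       # observed tip state (0 = absent, 1 = present)
    fitch_set: Set[int] = field(default_factory=set)
    assigned: Optional[int] = None    # state chosen in the top-down pass


def fitch_up(node: Node) -> int:
    """Bottom-up pass: build Fitch state sets; return the minimum number of changes."""
    if not node.children:
        node.fitch_set = {node.state}
        return 0
    assert len(node.children) == 2, "this sketch assumes a fully bifurcating tree"
    left, right = node.children
    cost = fitch_up(left) + fitch_up(right)
    shared = left.fitch_set & right.fitch_set
    if shared:
        node.fitch_set = shared
    else:
        node.fitch_set = left.fitch_set | right.fitch_set
        cost += 1  # children disagree: one change is needed below this node
    return cost


def fitch_down(node: Node, parent_state: Optional[int] = None) -> None:
    """Top-down pass: keep the parent's state if allowed, else take the smallest state."""
    if parent_state is not None and parent_state in node.fitch_set:
        node.assigned = parent_state
    else:
        node.assigned = min(node.fitch_set)  # arbitrary, deterministic tie-break
    for child in node.children:
        fitch_down(child, node.assigned)


def state_changes(node: Node) -> List[Tuple[str, str, int, int]]:
    """Branches (parent, child, from_state, to_state) on which the state changes."""
    changes = []
    for child in node.children:
        if child.assigned != node.assigned:
            changes.append((node.name, child.name, node.assigned, child.assigned))
        changes.extend(state_changes(child))
    return changes


def leaf(name: str, state: int) -> Node:
    return Node(name, state=state)


# Simplified, illustrative topology; "later_core" is a hypothetical placeholder tip.
tree = Node("dinoflagellates", [
    leaf("Oxyrrhis", 0),
    Node("syndinians_and_core", [
        leaf("Hematodinium", 0),
        Node("core", [
            leaf("Noctiluca", 0),
            Node("Amphidinium_and_later", [leaf("Amphidinium", 1), leaf("later_core", 1)]),
        ]),
    ]),
])

n_changes = fitch_up(tree)
fitch_down(tree)
print(n_changes, state_changes(tree))  # 1 [('core', 'Amphidinium_and_later', 0, 1)]
```

On this toy input, the single inferred gain falls on the branch leading to Amphidinium and later-diverging taxa. The `min()` tie-break in the top-down pass is arbitrary. Established implementations offer explicit conventions, such as ACCTRAN or DELTRAN, which can shift a change between adjacent branches when the data are ambiguous.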

Conclusions

We used sequence data to illuminate dinoflagellate biology and evolution. Evidence from our multiprotein phylogenies resolves numerous issues relating to dinoflagellate relationships, provides strong support for the single origin of the theca, and helps reconcile several apparent contradictions in dinoflagellate fossil, biogeochemical, and molecular data (Figs. 1 and 2). The origin of the theca coincides with a radiation of cell wall-localized cellulases involved in cell division (Fig. 2B). Plastid biosynthetic pathways exist in the nonphotosynthetic Noctiluca, Oxyrrhis, and Dinophysis, whereas cytosolic pathway variants do not (Fig. 3). This suggests that all free-living dinoflagellates are metabolically dependent on plastids that have taken over important cellular functions, apparently early in the evolution of the group. Plastidial tetrapyrrole biosynthesis may also explain the existence of bioluminescent luciferin in nonpigmented dinoflagellates. The origin of the liquid crystalline nuclei coincides with the acquisition of bacterial histone-like proteins, which occurred in two distinct evolutionary phases (Fig. 4). This suggests that horizontal gene transfers were the ultimate origin of key dinoflagellate features. By producing a map of the major transitions in the evolutionary history of dinoflagellates (Fig. 5), we provide a predictive framework that will facilitate the investigation of many aspects of the group's cell biology (nuclear organization, plastid evolution), molecular biology, and paleobiology.

Materials and Methods

RNA was extracted with the RNAqueous kit or the TRIzol Plus RNA kit. Paired-end 50-bp or 100-bp Illumina sequence reads were generated and assembled in Trinity version 2 or as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project pipeline (19). Phylogenetic matrices were prepared from alignments in MAFFT version 7.215 stripped of hypervariable sites in Block Mapping and Gathering with Entropy (BMGE) version 1.1. Phylogenies were computed in IQ-Tree (1,000 ultrafast bootstraps), RAxML version 8 (300 nonparametric bootstraps), and PhyloBayes (where applicable). Plastid targeting signals were analyzed in SignalP 4.1 (D-score cutoff 0.45) and ChloroP 1.1 (cTP-score cutoff 0.45). Species culturing and sequencing, phylogenetic inferences, and analyses of plastid metabolism and protein targeting are detailed in SI Appendix, SI Materials and Methods.
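The alignment, trimming, and maximum likelihood steps above can be chained in a script. The sketch below only assembles the command lines as Python lists and does not run them. It mirrors the settings stated in the text and SI Appendix: MAFFT 'localpair', BMGE `-b 4 -g 0.4`, and 1,000 ultrafast bootstraps under LG + I + GAMMA4 + F. It assumes the executables are available as `mafft`, `BMGE.jar` (run through `java -jar`), and `iqtree` (IQ-TREE 1.x, where `-bb` requests ultrafast bootstraps). The file names are hypothetical, and flag spellings should be checked against the installed versions. The RAxML and PhyloBayes steps are omitted.

```python
import shlex
from typing import List


def build_commands(raw_fasta: str, prefix: str) -> List[List[str]]:
    """Return MAFFT -> BMGE -> IQ-TREE command lines for one protein dataset.

    File names and executable locations are hypothetical placeholders.
    """
    aligned = f"{prefix}.mafft.fasta"
    trimmed = f"{prefix}.bmge.fasta"
    return [
        # MAFFT 'localpair' alignment; MAFFT writes to stdout, so the caller
        # must redirect its output into `aligned`.
        ["mafft", "--localpair", raw_fasta],
        # BMGE on amino-acid data with -b 4 and -g 0.4, FASTA output.
        ["java", "-jar", "BMGE.jar", "-i", aligned, "-t", "AA",
         "-b", "4", "-g", "0.4", "-of", trimmed],
        # IQ-TREE 1.x: LG+I+G4+F model, 1,000 ultrafast bootstraps.
        ["iqtree", "-s", trimmed, "-m", "LG+I+G4+F", "-bb", "1000"],
    ]


for cmd in build_commands("gene001.fasta", "gene001"):
    print(shlex.join(cmd))
```

To execute the steps, each list could be passed to `subprocess.run(cmd, check=True)`. For the MAFFT step, `stdout` would need to be redirected to a file.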

ACKNOWLEDGMENTS. We thank Bill MacMillan for technical support and Patrick Keeling for facilities and support. This work was supported by a University College London Excellence Fellowship (to J.J.), a CIFAR Global Scholar Fellowship (to J.J.), a University of British Columbia Four-Year PhD Fellowship (to J.J.), Gordon and Betty Moore Foundation Grant 2637 to the National Center for Genome Resources (NCGR), Natural Sciences and Engineering Research Council of Canada Grant NSERC 2014-05258 (to B.S.L. and G.S.G.), a Tula Foundation Grant to Patrick Keeling (F.B.), the Centre for Microbial Biodiversity and Evolution (F.B.), Australian Research Council Grant DP1093395 (to S.G.G.), Science Foundation Ireland Grant 13/SIRG/2125 (to S.G.G.), NSF Grant EF-0629625 (to C.F.D. and T.R.B.), and Canadian Institutes of Health Research Grant MOP-42517 to Patrick Keeling. MMETSP samples were sequenced, assembled, and annotated at NCGR. This is ESS contribution no. 20160099.

1. Gómez F (2012) A quantitative review of the lifestyle, habitat and trophic diversity of dinoflagellates (Dinoflagellata, Alveolata). Syst Biodivers 10(3):267–275.

2. de Vargas C, et al.; Tara Oceans Coordinators (2015) Ocean plankton. Eukaryotic plankton diversity in the sunlit ocean. Science 348(6237):1261605.

3. Velo-Suárez L, Brosnahan ML, Anderson DM, McGillicuddy DJ, Jr (2013) A quantitative assessment of the role of the parasite Amoebophrya in the termination of Alexandrium fundyense blooms within a small coastal embayment. PLoS One 8(12):e81150.

4. Gornik SG, et al. (2012) Loss of nucleosomal DNA condensation coincides with appearance of a novel nuclear protein in dinoflagellates. Curr Biol 22(24):2303–2312.

5. Wong JTY, New DC, Wong JCW, Hung VKL (2003) Histone-like proteins of the dinoflagellate Crypthecodinium cohnii have homologies to bacterial DNA-binding proteins. Eukaryot Cell 2(3):646–650.

6. Zhang Z, Green BR, Cavalier-Smith T (1999) Single gene circles in dinoflagellate chloroplast genomes. Nature 400(6740):155–159.

7. Wang Y, Morse D (2006) Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res 34(2):613–619.

8. Nash EA, Nisbet RER, Barbrook AC, Howe CJ (2008) Dinoflagellates: A mitochondrial genome all at sea. Trends Genet 24(7):328–335.


9. Shoguchi E, et al. (2013) Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Curr Biol 23(15):1399–1408.

10. Lin S, et al. (2015) The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science 350(6261):691–694.

11. Bachvaroff TR, et al. (2014) Dinoflagellate phylogeny revisited: Using ribosomal proteins to resolve deep branching dinoflagellate clades. Mol Phylogenet Evol 70:314–322.

12. Hoppenrath M, Leander BS (2010) Dinoflagellate phylogeny as inferred from heat shock protein 90 and ribosomal gene sequences. PLoS One 5(10):e13220.

13. Gómez F, Moreira D, López-García P (2010) Molecular phylogeny of noctilucoid dinoflagellates (Noctilucales, Dinophyceae). Protist 161(3):466–478.

14. Orr RJS, Murray SA, Stüken A, Rhodes L, Jakobsen KS (2012) When naked became armored: An eight-gene phylogeny reveals monophyletic origin of theca in dinoflagellates. PLoS One 7(11):e50004.

15. Fensome RA, Saldarriaga JF, Taylor “Max” FJR (1999) Dinoflagellate phylogeny revisited: Reconciling morphological and molecular based phylogenies. Grana 38(2):66–80.

16. Saldarriaga JF, Taylor “Max” FJR, Cavalier-Smith T, Menden-Deuer S, Keeling PJ (2004) Molecular data and the evolutionary history of dinoflagellates. Eur J Protistol 40(1):85–111.

17. Imanian B, Keeling PJ (2014) Horizontal gene transfer and redundancy of tryptophan biosynthetic enzymes in dinotoms. Genome Biol Evol 6(2):333–343.

18. Gavelis GS, White RA, Suttle CA, Keeling PJ, Leander BS (2015) Single-cell transcriptomics using spliced leader PCR: Evidence for multiple losses of photosynthesis in polykrikoid dinoflagellates. BMC Genomics 16(1):528.

19. Keeling PJ, et al. (2014) The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12(6):e1001889.

20. Burki F, Okamoto N, Pombert J-F, Keeling PJ (2012) The evolutionary history of haptophytes and cryptophytes: Phylogenomic evidence for separate origins. Proc Biol Sci 279(1736):2246–2254.

21. Saunders GW, Hill DRA, Sexton JP, Andersen RA (1997) Small-subunit ribosomal RNA sequences from selected dinoflagellates: Testing classical evolutionary hypotheses with molecular systematic methods. Origins of Algae and Their Plastids, ed Bhattacharya D (Springer, Vienna), pp 237–259.

22. Fukuda Y, Endoh H (2008) Phylogenetic analyses of the dinoflagellate Noctiluca scintillans based on beta-tubulin and Hsp90 genes. Eur J Protistol 44(1):27–33.

23. Jørgensen MF, Murray S, Daugbjerg N (2004) A new genus of athecate interstitial dinoflagellates, Togula gen. nov., previously encompassed within Amphidinium sensu lato: Inferred from light and electron microscopy and phylogenetic analyses of partial large subunit ribosomal DNA sequences. Phycol Res 52(3):284–299.

24. Daugbjerg N, Hansen G, Larsen J, Moestrup Ø (2000) Phylogeny of some of the major genera of dinoflagellates based on ultrastructure and partial LSU rDNA sequence data, including the erection of three new genera of unarmoured dinoflagellates. Phycologia 39(4):302–317.

25. Zhang H, Bhattacharya D, Lin S (2007) A three-gene dinoflagellate phylogeny suggests monophyly of Prorocentrales and a basal position for Amphidinium and Heterocapsa. J Mol Evol 65(4):463–474.

26. Fensome RA, et al. (1993) A Classification of Living and Fossil Dinoflagellates. Micropaleontology Special Publication 7 (American Museum of Natural History, New York).

27. Moestrup Ø, Lindberg K, Daugbjerg N (2009) Studies on woloszynskioid dinoflagellates IV: The genus Biecheleria gen. nov. Phycol Res 57(3):203–220.

28. Takahashi K, Moestrup Ø, Jordan RW, Iwataki M (2015) Two new freshwater woloszynskioids Asulcocephalium miricentonis gen. et sp. nov. and Leiocephalium pseudosanguineum gen. et sp. nov. (Suessiaceae, Dinophyceae) lacking an apical furrow apparatus. Protist 166(6):638–658.

29. Medlin LK, Fensome RA (2013) Dinoflagellate macroevolution: Some considerations based on an integration of molecular, morphological and fossil evidence. Biological and Geological Perspectives of Dinoflagellates, eds Lewis JM, Marret F, Bradley L. The Micropalaeontological Society, Special Publications (Geological Society, London), pp 255–266.

30. Fensome RA, MacRae RA, Moldowan JM, Taylor FJR, Williams GL (1996) The early Mesozoic radiation of dinoflagellates. Paleobiology 22(3):329–338.

31. Bujak JP, Williams GL (1981) The evolution of dinoflagellates. Can J Bot 59(11):2077–2087.

32. Hansen G, Daugbjerg N, Henriksen P (2007) Baldinia anauniensis gen. et sp. nov.: A “new” dinoflagellate from Lake Tovel, N. Italy. Phycologia 46(1):86–108.

33. Sekida S, Horiguchi T, Okuda K (2004) Development of thecal plates and pellicle in the dinoflagellate Scrippsiella hexapraecingula (Peridiniales, Dinophyceae) elucidated by changes in stainability of the associated membranes. Eur J Phycol 39(1):105–114.

34. Kwok ACM, Wong JTY (2010) The activity of a wall-bound cellulase is required for and is coupled to cell cycle progression in the dinoflagellate Crypthecodinium cohnii. Plant Cell 22(4):1281–1298.

35. Moldowan JM, et al. (1996) Chemostratigraphic reconstruction of biofacies: Molecular evidence linking cyst-forming dinoflagellates with pre-Triassic ancestors. Geology 24(2):159–162.

36. Moldowan JM, Talyzina NM (1998) Biogeochemical evidence for dinoflagellate ancestors in the Early Cambrian. Science 281(5380):1168–1170.

37. Summons RE, Walter MR (1990) Molecular fossils and microfossils of prokaryotes and protists from Proterozoic sediments. Am J Sci 290:212–244.

38. Chu F-LE, et al. (2008) Sterol production and phytosterol bioconversion in two species of heterotrophic protists, Oxyrrhis marina and Gyrodinium dominans. Mar Biol 156(2):155–169.

39. Place AR, Bai X, Kim S, Sengco MR, Wayne Coats D (2009) Dinoflagellate host-parasite sterol profiles dictate karlotoxin sensitivity. J Phycol 45(2):375–385.

40. Leblond JD, Sengco MR, Sickman JO, Dahmen JL, Anderson DM (2006) Sterols of the syndinian dinoflagellate Amoebophrya sp., a parasite of the dinoflagellate Alexandrium tamarense (Dinophyceae). J Eukaryot Microbiol 53(3):211–216.

41. Teshima SI, Kanazawa A, Tago A (1980) Sterols of the dinoflagellate Noctiluca milialis. Mem Fac Fish Kagoshima Univ 29:319–326.

42. Withers NW, Goad LJ, Goodwin TW (1979) A new sterol, 4α-methyl-5α-ergosta-8(14),24(28)-dien-3β-ol, from the marine dinoflagellate Amphidinium carterae. Phytochemistry 18(5):899–901.

43. Leblond JD, Chapman PJ (2002) A survey of the sterol composition of the marine dinoflagellates Karenia brevis, Karenia mikimotoi, and Karlodinium micrum: Distribution of sterols within other members of the class Dinophyceae. J Phycol 38(4):670–682.

44. Volkman JK, Barrett SM, Dunstan GA, Jeffrey SW (1993) Geochemical significance of the occurrence of dinosterol and other 4-methyl sterols in a marine diatom. Org Geochem 20(1):7–15.

45. Brocks JJ, Buick R, Summons RE, Logan GA (2003) A reconstruction of Archean biological diversity based on molecular fossils from the 2.78 to 2.45 billion-year-old Mount Bruce Supergroup, Hamersley Basin, Western Australia. Geochim Cosmochim Acta 67(22):4321–4335.

46. Janouškovec J, Horák A, Oborník M, Lukeš J, Keeling PJ (2010) A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc Natl Acad Sci USA 107(24):10949–10954.

47. Janouškovec J, et al. (2015) Factors mediating plastid dependency and the origins of parasitism in apicomplexans and their close relatives. Proc Natl Acad Sci USA 112(33):10200–10207.

48. Sanchez-Puerta MV, Lippmeier JC, Apt KE, Delwiche CF (2007) Plastid genes in a nonphotosynthetic dinoflagellate. Protist 158(1):105–117.

49. Slamovits CH, Keeling PJ (2008) Plastid-derived genes in the nonphotosynthetic alveolate Oxyrrhis marina. Mol Biol Evol 25(7):1297–1306.

50. Gornik SG, et al. (2015) Endosymbiosis undone by stepwise elimination of the plastid in a parasitic dinoflagellate. Proc Natl Acad Sci USA 112(18):5767–5772.

51. Seeber F, Soldati-Favre D (2010) Metabolic pathways in the apicoplast of apicomplexa. Int Rev Cell Mol Biol 281:161–228.

52. Nassoury N, Cappadocia M, Morse D (2003) Plastid ultrastructure defines the protein import pathway in dinoflagellates. J Cell Sci 116(pt 14):2867–2874.

53. Patron NJ, Waller RF, Archibald JM, Keeling PJ (2005) Complex protein targeting to dinoflagellate plastids. J Mol Biol 348(4):1015–1024.

54. Pandini V, et al. (2002) Ferredoxin-NADP+ reductase and ferredoxin of the protozoan parasite Toxoplasma gondii interact productively in vitro and in vivo. J Biol Chem 277(50):48463–48471.

55. Howe CJ, Purton S (2007) The little genome of apicomplexan plastids: Its raison d'être and a possible explanation for the 'delayed death' phenomenon. Protist 158(2):121–133.

56. Mungpakdee S, et al. (2014) Massive gene transfer and extensive RNA editing of a symbiotic dinoflagellate plastid genome. Genome Biol Evol 6(6):1408–1422.

57. Abrahamsen MS, et al. (2004) Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science 304(5669):441–445.

58. Janouškovec J, Horák A, Barott KL, Rohwer FL, Keeling PJ (2012) Global analysis of plastid diversity reveals apicomplexan-related lineages in coral reefs. Curr Biol 22(13):R518–R519.

59. McFadden GI, Reith ME, Munholland J, Lang-Unnasch N (1996) Plastid in human parasites. Nature 381(6582):482.

60. Matsuzaki M, Kuroiwa H, Kuroiwa T, Kita K, Nozaki H (2008) A cryptic algal group unveiled: A plastid biosynthesis pathway in the oyster parasite Perkinsus marinus. Mol Biol Evol 25(6):1167–1179.

61. Hehenberger E, Imanian B, Burki F, Keeling PJ (2014) Evidence for the retention of two evolutionary distinct plastids in dinoflagellates with diatom endosymbionts. Genome Biol Evol 6(9):2321–2334.

62. Wisecaver JH, Hackett JD (2010) Transcriptome analysis reveals nuclear-encoded proteins for the maintenance of temporary plastids in the dinoflagellate Dinophysis acuminata. BMC Genomics 11:366.

63. Marcinko CLJ, Painter SC, Martin AP, Allen JT (2013) A review of the measurement and modelling of dinoflagellate bioluminescence. Prog Oceanogr 109:117–129.

64. Topalov G, Kishi Y (2001) Chlorophyll catabolism leading to the skeleton of dinoflagellate and krill luciferins: Hypothesis and model studies. Angew Chem Int Ed Engl 40(20):3892–3894.

65. Wu C, Akimoto H, Ohmiya Y (2003) Tracer studies on dinoflagellate luciferin with [15N]-glycine and [15N]-L-glutamic acid in the dinoflagellate Pyrocystis lunula. Tetrahedron Lett 44(6):1263–1266.

66. Liu L, Hastings JW (2007) Two different domains of the luciferase gene in the heterotrophic dinoflagellate Noctiluca scintillans occur as two separate genes in photosynthetic species. Proc Natl Acad Sci USA 104(3):696–701.

67. Yamaguchi A, Horiguchi T (2008) Culture of the heterotrophic dinoflagellate Protoperidinium crassipes (Dinophyceae) with noncellular food items. J Phycol 44(4):1090–1092.

68. Chan Y-H, Wong JTY (2007) Concentration-dependent organization of DNA by the dinoflagellate histone-like protein HCc3. Nucleic Acids Res 35(8):2573–2583.

69. Saldarriaga JF, Taylor FJ, Keeling PJ, Cavalier-Smith T (2001) Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. J Mol Evol 53(3):204–213.

70. Janouškovec J, et al. (2013) Split photosystem protein, linear-mapping topology, and growth of structural complexity in the plastid genome of Chromera velia. Mol Biol Evol 30(11):2447–2462.

71. Barbrook AC, et al. (2012) Polyuridylylation and processing of transcripts from multiple gene minicircles in chloroplasts of the dinoflagellate Amphidinium carterae. Plant Mol Biol 79(4-5):347–357.

72. Jackson CJ, Waller RF (2013) A widespread and unusual RNA trans-splicing type in dinoflagellate mitochondria. PLoS One 8(2):e56777.

73. Soyer M-O (1970) Étude ultrastructurale de l'endoplasme et des vacuoles chez deux types de Dinoflagellés appartenant aux genres Noctiluca (Suriray) et Blastodinium (Chatton) [Ultrastructural study of the endoplasm and vacuoles in two types of dinoflagellates belonging to the genera Noctiluca (Suriray) and Blastodinium (Chatton)]. Z Zellforsch Mikrosk Anat 105(3):350–388.

74. Lee SY, et al. (2014) Morphological characterization of Symbiodinium minutum and S. psygmophilum belonging to clade B. Algae 29(4):299–310.


Supporting Information

Janouškovec et al.: Major transitions in dinoflagellate evolution unveiled by phylotranscriptomics

SI Materials and Methods

Dinoflagellate culturing, sequencing, and sequence assembly. Noctiluca scintillans SPMC136 (MMETSP0253) was grown on Prorocentrum micans CCMP691 (primary, preferred prey) in filtered (0.2 µm), autoclaved seawater (30 psu) with a dilute trace metal amendment (1). Scaled-up cultures were captured on an 80-µm sieve and maintained on Dunaliella tertiolecta (secondary, nonpreferred prey) for 25 days to ensure complete removal of Prorocentrum, which was visually absent by day 15 after the transfer. Noctiluca cells were then recaptured on the sieve, and their total RNA was extracted by using the RNAqueous kit (Ambion). Togula jolla CCCM725 (project MMETSP0224), Protoceratium reticulatum CCCM535 (project MMETSP0228), and Polarella glacialis CCMP1383 (project MMETSP0227) were grown in the natural seawater medium HESNW (2), and their RNA was purified with the TRIzol Plus RNA kit (Thermo Fisher). Sequencing libraries were built, and transcriptomic reads were generated, processed, and assembled at the National Center for Genome Resources as described previously (3). Assembled contigs and predicted proteomes were downloaded from the MMETSP website (http://data.imicrobe.us/project/view/104). A second, independent assembly of Noctiluca reads was generated by using Trinity v2.0.6 at default settings. The resulting contigs contained longer 5′ regions than the MMETSP assembly and were used in the analysis of N-termini of plastid-targeted proteins. This transcriptomic assembly has been deposited in GenBank (TSA) under accession GELK00000000. Each Noctiluca protein used in this study (extracted from either assembly) was first screened by BLASTP against the predicted proteome of Prorocentrum minimum CCMP2233 (Table S1). Its affiliation to dinoflagellates was then verified in a maximum likelihood phylogeny.
No Prorocentrum sequences, and only rare, readily identifiable Dunaliella sequences, were detected in the assemblies (all assembled nuclear ribosomal RNA contigs belong to Noctiluca). The transcriptome assembly of Hematodinium sp. was deposited in GenBank (TSA; GEMP00000000). Data from Amphidinium carterae and two Amoebophrya isolates were generated as described in (4, 5). Data from other species were obtained as detailed in Table S1.
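The prey-contamination screen described above can be sketched as a first-pass filter on BLAST tabular output. In that screen, each Noctiluca protein was searched by BLASTP against a Prorocentrum proteome, and candidates were then checked phylogenetically. The sketch reads `-outfmt 6` output, whose default columns begin with query ID, subject ID, percent identity, and alignment length. The identity and length thresholds are hypothetical and are not values reported in the study. Proteins flagged here would still need the phylogenetic check described in the text before being discarded or kept.

```python
import csv
from typing import Iterable, Set


def flag_prey_hits(blast_tab: Iterable[str],
                   min_identity: float = 95.0,  # hypothetical threshold (%)
                   min_aln_len: int = 100) -> Set[str]:  # hypothetical threshold (residues)
    """Return query IDs with a near-identical hit in the prey proteome.

    Expects BLAST tabular (-outfmt 6) lines: qseqid, sseqid, pident, length, ...
    """
    flagged: Set[str] = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, pident, aln_len = row[0], float(row[2]), int(row[3])
        if pident >= min_identity and aln_len >= min_aln_len:
            flagged.add(query)
    return flagged


# Made-up example rows; the IDs are hypothetical.
example = [
    "noct_0001\tpmin_0450\t99.2\t310\t2\t0\t1\t310\t5\t314\t1e-180\t620",
    "noct_0002\tpmin_0977\t48.5\t280\t140\t6\t10\t285\t3\t280\t1e-60\t210",
]
print(sorted(flag_prey_hits(example)))  # ['noct_0001']
```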

Multiprotein phylogenies. Dinoflagellate sequences were added to alignments of conserved proteins previously used in eukaryotic phylogenies (6), and alignments with 30% or fewer of taxa missing were selected. The alignments were realigned with the 'localpair' algorithm in MAFFT v7.215 (7) and stripped of hypervariable sites in BMGE v1.1 (8) (-b 4 -g 0.4 settings). The orthology of the sequences in each alignment was then verified by comparing their RAxML v8 (9) maximum likelihood phylogenies (LG + Gamma 4 + F model) with known relationships from published phylogenies. Paralogous, highly divergent, or contaminant sequences were identified in several species and removed. Where a case was ambiguous, all paralogs for a given species were removed, and where multiple ambiguities were identified, the whole gene alignment was discarded. Single-protein alignments were concatenated in Scafos v1.25 (10) by using 'o=gclv gamma=yes l=1 m=1' settings. Chimeric sequences were created for species in which overlapping fragments, or nonoverlapping fragments with congruent phylogenetic positions, were recovered (Table S1). A total of 12 phylogenetic matrices were concatenated independently: three variants of the outgroup times four variants of species presence among thecate taxa (Table S1; Fig. 1). Maximum likelihood phylogenies of the concatenated matrices were inferred in IQ-Tree v1.41 (11) by using the LG + I + GAMMA4 + F model (selected by first running -m TEST) with 1,000 ultrafast bootstraps, and in RAxML by using LG + GAMMA4 + F with 300 nonparametric bootstraps. Bayesian phylogenies were inferred in PhyloBayes MPI v1.5a (12) on the CIPRES Science Gateway (13) by using GTR + CAT + GAMMA4, -dc, and maxdiff < 0.1 settings. Approximately unbiased (AU) and expected likelihood weight (ELW) test scores for alternative tree topologies were computed in Consel (14) and IQ-Tree (11), respectively (Tables S2 and S3).
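Two bookkeeping steps from the paragraph above are sketched below in plain Python. The first keeps only genes with at most 30% of taxa missing. The second concatenates single-protein alignments into a supermatrix. This is not Scafos, and its chimera and gap-handling options are not reproduced. The toy alignments, the taxon subset, and padding absent taxa with '-' are assumptions for illustration. The last lines show how three outgroup variants crossed with four ingroup-sampling variants yield the 12 matrices mentioned in the text. The variant labels are hypothetical.

```python
from itertools import product
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all the same length)


def missing_fraction(aln: Alignment, taxa: List[str]) -> float:
    """Fraction of the full taxon set that is absent from one gene alignment."""
    return sum(t not in aln for t in taxa) / len(taxa)


def concatenate(genes: List[Alignment], taxa: List[str],
                max_missing: float = 0.30) -> Dict[str, str]:
    """Keep genes with <= max_missing taxa absent; pad absent taxa with gaps."""
    kept = [g for g in genes if missing_fraction(g, taxa) <= max_missing]
    matrix: Dict[str, List[str]] = {t: [] for t in taxa}
    for gene in kept:
        width = len(next(iter(gene.values())))
        for t in taxa:
            matrix[t].append(gene.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in matrix.items()}


# Toy data (made up): gene_c lacks 50% of taxa and is therefore excluded.
taxa = ["Noctiluca", "Oxyrrhis", "Amphidinium", "Perkinsus"]
gene_a = {"Noctiluca": "MKV", "Oxyrrhis": "MKI", "Amphidinium": "MRV", "Perkinsus": "MKV"}
gene_b = {"Noctiluca": "AG", "Oxyrrhis": "AG", "Amphidinium": "SG"}  # 25% missing
gene_c = {"Noctiluca": "W", "Perkinsus": "W"}                        # 50% missing
supermatrix = concatenate([gene_a, gene_b, gene_c], taxa)
print(supermatrix["Perkinsus"])  # MKV--

# Hypothetical labels for 3 outgroup variants x 4 thecate-sampling variants.
outgroups = ["outgroup_1", "outgroup_2", "outgroup_3"]
samplings = ["thecate_set_1", "thecate_set_2", "thecate_set_3", "thecate_set_4"]
variants = list(product(outgroups, samplings))
print(len(variants))  # 12
```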


Theca evolution and dinosterol. Protein sequences of the dCel1 and dCel2 cellulases (accessions in Table S4) were each used to retrieve the 250 closest hits from the NCBI nr database, which were complemented by dinoflagellate sequences from our dataset (Table S1; primarily MMETSP and NCBI databases). The dataset was then reduced to a smaller number of unique, phylogenetically representative sequences: sequences that were largely incomplete, that were closely related to one another (including all sequences from Kryptoperidinium foliaceum CCAP 1116/3, a close relative of K. foliaceum CCMP1326), or that formed very long branches in preliminary phylogenies were removed. The final phylogenetic matrix of the GH7 dataset (Fig. 2B and Fig. S1) contained 184 sequences and 260 amino acid sites; it was prepared by alignment in MAFFT and removal of hypervariable sites in BMGE (as above), and phylogenies were inferred in IQ-Tree as described above (see Multiprotein phylogenies). Dinosterol distribution in dinoflagellates was mapped by surveying the available literature, in part by using the reference list at https://doi.pangaea.de/10.1594/PANGAEA.819698.
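The reduction of the GH7 dataset is described qualitatively; one way to automate part of it is a greedy filter that first drops largely incomplete sequences and then drops any sequence nearly identical to one already kept. The sketch below assumes an aligned dataset stored as a name-to-sequence dictionary; the coverage and identity thresholds (50% and 95%) and the function names are illustrative assumptions, not values from the study, and long-branch removal, which requires a tree, is not covered.

```python
from typing import Dict, List


def coverage(seq: str, gap: str = "-") -> float:
    """Fraction of alignment columns in which the sequence has a residue."""
    return sum(c != gap for c in seq) / len(seq)


def identity(a: str, b: str, gap: str = "-") -> float:
    """Share of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def reduce_dataset(aln: Dict[str, str], min_cov: float = 0.5,
                   max_id: float = 0.95) -> List[str]:
    """Greedy redundancy reduction; more complete sequences are considered first."""
    candidates = [name for name in aln if coverage(aln[name]) >= min_cov]
    # list.sort is stable, so ties keep their input order.
    candidates.sort(key=lambda name: coverage(aln[name]), reverse=True)
    kept: List[str] = []
    for name in candidates:
        if all(identity(aln[name], aln[k]) < max_id for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {"s1": "MKLVAA", "s2": "MKLVAA", "s3": "MRLIGA", "s4": "M-----"}
    print(reduce_dataset(toy))  # ['s1', 's3']
```

Processing the most complete sequences first means that, among near-identical sequences, the retained representative is the longest one.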

Plastid metabolism and protein targeting. Sequences of plastid and nuclear proteins (Figs. 2–4) were identified by BLASTP searches in the datasets listed in Table S1, in addition to transcriptomes from two Oxyrrhis marina strains (MMETSP1424-1426 and MMETSP0468-471 projects) and Pyrodinium bahamense (NCBI: PRJNA169246). Contaminant sequences were identified in several projects (e.g., Oxyrrhis MMETSP projects) and carefully removed based on phylogenetic incongruence. In the reconstruction of plastid dependency in nonphotosynthetic species (Fig. 3A), each enzyme of each pathway was analyzed separately. New dinoflagellate sequences were added to single-protein alignments from an earlier study (15), and the origin of each protein was assessed by RAxML phylogenies (computed as above) or analyzed in newly prepared datasets (Fd, FNR, SufB, SufC, SufD, TPT; Table S5). The final phylogenetic matrix of IspC (Fig. 3B) contained 66 sequences and 331 amino acid sites and was computed in IQ-Tree as described above (see Theca evolution and dinosterol). Dinoflagellate FASI/PKS polyproteins were reconstructed (Fig. 3A) from mutually overlapping fragments comprising at least two domains (Table S5); the domain order and functional specificity of the mature FASI/PKS forms remain unknown, but both FAS and PKS are likely to be present (most individual domains exist in multiple sequence contexts). An atypical plastid FabI closely related to homologs in the Kareniaceae was identified in Dinophysis acuminata (Fig. 3A), but no other proteins of the pathway were found. It is unclear whether this sequence is a contaminant (other Kareniaceae-like proteins were found in the Dinophysis transcriptome), but it remains unlikely that Dinophysis possesses the plastid fatty acid biosynthesis pathway. Plastid targeting signals in Noctiluca, Dinophysis, and Oxyrrhis were analyzed in plastid proteins carrying N-terminal extensions relative to their bacterial orthologs.
The most complete sequence for each protein was selected from the following: MMETSP-predicted proteins, proteins newly predicted from MMETSP-assembled contigs (Oxyrrhis and Dinophysis), or proteins predicted from newly assembled MMETSP reads (Noctiluca Trinity assembly; note that the N termini of some MMETSP-predicted proteins were incomplete). Proteins that screened positively for signal peptides in SignalP 4.1 (D-score cutoff 0.45) were further tested for the presence of transit peptides in ChloroP 1.1 at a 0.45 cTP-score cutoff (16), and the strongest candidates for plastid targeting are listed in Table S6 (transmembrane regions were predicted in TMHMM v2.0). The cleavage site of Plasmodium falciparum ferredoxin (Fig. S2) was predicted by PATS (http://gecco.org.chemie.uni-frankfurt.de/pats/pats-index.php), a species-specific tool for the prediction of targeting presequences. Partial sufB sequences were identified in Noctiluca (55% GC; c20274_g1_i1 and c20274_g2_i1 contigs in the Trinity assembly), Oxyrrhis marina MMETSP0468 (57.7% GC; contig CAMNT_0034061651), and Dinophysis (66.7% GC; contig CAMNT_0021013865). Plastid clpC fragments were identified in Noctiluca (55.4% GC; c23770_g4_i1 and c23770_g5_i1 contigs in the Trinity assembly), Dinophysis (69.1% GC; contig CAMNT_0020950785), and Oxyrrhis marina MMETSP0468 (60.8% GC; contig CAMNT_0034034689).
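The two-stage SignalP-then-ChloroP screen described above reduces to successive score thresholds. This sketch assumes that the D-scores and cTP-scores have already been parsed from the tools' outputs into simple records (parsing the real output formats is not shown). It treats a score at or above 0.45 as positive, which is an assumption about how the cutoffs were applied, and it ranks the survivors by cTP-score as one hypothetical way of picking the "strongest" candidates. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetingScores:
    protein_id: str
    signalp_d: float              # SignalP 4.1 D-score
    chlorop_ctp: Optional[float]  # ChloroP 1.1 cTP-score (None if not run)


def plastid_candidates(records: List[TargetingScores], d_cutoff: float = 0.45,
                       ctp_cutoff: float = 0.45) -> List[TargetingScores]:
    """Stage 1: signal peptide; stage 2: transit peptide; then rank by cTP-score."""
    with_sp = [r for r in records if r.signalp_d >= d_cutoff]
    with_tp = [r for r in with_sp
               if r.chlorop_ctp is not None and r.chlorop_ctp >= ctp_cutoff]
    return sorted(with_tp, key=lambda r: r.chlorop_ctp, reverse=True)


if __name__ == "__main__":
    demo = [  # made-up scores for illustration only
        TargetingScores("p1", 0.80, 0.55),
        TargetingScores("p2", 0.30, 0.90),  # fails the signal-peptide stage
        TargetingScores("p3", 0.60, 0.40),  # fails the transit-peptide stage
        TargetingScores("p4", 0.50, 0.70),
    ]
    print([r.protein_id for r in plastid_candidates(demo)])  # ['p4', 'p1']
```

Requiring both signals mirrors the bipartite presequence expected for plastid proteins that pass through the endomembrane system, which is why a high cTP-score alone (p2) does not qualify.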

Character evolution. Protein sequences of HLP-I and HLP-II (accessions in Table S4) were each used to retrieve the 250 closest hits from the NCBI nr database; top hits among environmental and NCBI EST entries were also included. The final phylogenetic matrix (Fig. 4 and Fig. S3) contained 114 sequences and 99 amino acid sites; it was prepared by adding dinoflagellate sequences, removing closely related sequences, and processing the alignment, and phylogenies were inferred in IQ-Tree, all as described above (see Theca evolution and dinosterol). The data presented in the character map were compiled from the literature, and ancestral states were reconstructed by parsimony on the consensus of dinoflagellate relationships established in this study (Fig. 1), taking known lower-level relationships into account (e.g., refs. 17 and 18). Transcripts of the three mitochondrial protein-coding genes in Noctiluca were identified in the Trinity assembly by homology searches (cox1 on contig c33015_g1_i5, cox3 on contig c32288_g1_i2, and cob on contig c32214_g1_i1). The cox3 transcript was found to be complete and to contain a canonical UAA stop codon at the expected position (i.e., one that is not generated by oligoadenylation of its 3′ terminus, as observed in other core dinoflagellates (19)); the only canonical stop codon in core dinoflagellates reported so far is in the cob of Symbiodinium minutum (20). Transcriptomic paired-end Illumina 50-bp reads were mapped onto the assembled cox3 contig by using Bowtie2 (v2.2.9); reads mapped continuously across the region where the split occurs in other core dinoflagellates, with paired-end reads connecting both sides of the split (no indication of two separate RNA fragments, trans-splicing, or oligoA tailing was observed). PCR amplifying a near-full-length cox3 spanning both sides of the split was performed by using Pfu polymerase, RNase-treated genomic DNA of Noctiluca, and specific primers.
The reaction yielded a single product of the correct size (no product was observed in the 'DNase-treated template' and 'no template' controls), and the sequence of this product corresponded to Noctiluca cox3. Polymorphism at multiple sites was observed in the chromatogram, and the consensus DNA sequence differed in several nucleotides from the transcriptomic contig, in which read mapping also revealed extensive polymorphism; thus, the number of cox3 copies, the variation among them, and whether their transcripts are edited remain to be established.
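Parsimony reconstruction of ancestral states, as used for the character map, can be sketched with Fitch's algorithm. The version below is a minimal illustration under stated assumptions: a fully resolved, rooted binary tree written as nested tuples, a single unordered character, and made-up states for placeholder taxa. It performs only the bottom-up pass, which yields the minimum number of state changes and the set of most parsimonious states at the root; assigning states to every internal node would also need Fitch's top-down pass, and polytomies in a consensus tree would need extra handling.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (candidate state set at this node, minimum changes in the subtree)."""
    if isinstance(tree, str):
        return frozenset({states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree, so no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = ("A", ("B", ("C", "D")))
    character = {"A": "0", "B": "0", "C": "1", "D": "1"}  # made-up states
    print(fitch(tree, character))  # (frozenset({'0'}), 1)
```

In the example, the single change from state 0 to state 1 is placed on the branch leading to the (C, D) clade.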

SI References:

1. Gifford DJ (1985) Laboratory culture of marine planktonic oligotrichs (Ciliophora, Oligotrichida). Mar Ecol Prog Ser 23(3):257–267.

2. Harrison PJ, Waters RE, Taylor FJR (1980) A Broad Spectrum Artificial Sea Water Medium for Coastal and Open Ocean Phytoplankton. J Phycol 16(1):28–35.

3. Keeling PJ, et al. (2014) The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12(6):e1001889.

4. Bachvaroff TR, et al. (2014) Dinoflagellate phylogeny revisited: Using ribosomal proteins to resolve deep branching dinoflagellate clades. Mol Phylogenet Evol 70:314–322.

5. Gornik SG, et al. (2015) Endosymbiosis undone by stepwise elimination of the plastid in a parasitic dinoflagellate. Proc Natl Acad Sci 112(18):5767–5772.

6. Burki F, Okamoto N, Pombert J-F, Keeling PJ (2012) The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc R Soc B Biol Sci 279:2246–2254.

7. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4):772–780.

8. Criscuolo A, Gribaldo S (2010) BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10(1):210.

9. Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313.

10. Roure B, Rodriguez-Ezpeleta N, Philippe H (2007) SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 7(Suppl 1):S2.

11. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ (2015) IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol 32(1):268–274.

12. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: Phylogenetic Reconstruction with Infinite Mixtures of Profiles in a Parallel Environment. Syst Biol 62(4):611–615.

13. Miller MA, Pfeiffer W, Schwartz T (2011) The CIPRES science gateway: a community resource for phylogenetic analyses. Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery (ACM), p 41.

14. Shimodaira H, Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17(12):1246–1247.

15. Janouškovec J, et al. (2015) Factors mediating plastid dependency and the origins of parasitism in apicomplexans and their close relatives. Proc Natl Acad Sci U S A 112(33):10200–10207.

16. Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2(4):953–971.

17. Orr RJS, Murray SA, Stüken A, Rhodes L, Jakobsen KS (2012) When naked became armored: an eight-gene phylogeny reveals monophyletic origin of theca in dinoflagellates. PLoS ONE 7(11):e50004.

18. Saldarriaga J (2004) Molecular data and the evolutionary history of dinoflagellates. Eur J Protistol 40(1):85–111.

19. Jackson CJ, et al. (2007) Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria. BMC Biol 5:41.

20. Shoguchi E, Shinzato C, Hisata K, Satoh N, Mungpakdee S (2015) The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans. Genome Biol Evol 7(8):2237–2244.

SI Figures and Tables:

Fig. S1: Phylogeny of cellulases of the glycosyl hydrolase 7 family. Analysis of glycosyl hydrolase family 7 (GH7) reveals cellulases in athecate dinoflagellates (bold) and their radiation in the thecates (color-coded). The unrooted best maximum likelihood tree (IQ-Tree) was inferred from 119 dinoflagellate and 65 non-dinoflagellate eukaryotic protein sequences, limited to taxonomically representative, near-full-length, and slow-evolving entries. GH7 cellulases from Pyrocystis lunula (dCel1) and Lingulodinium polyedrum (dCel2) are highlighted. Vertical arrows indicate eight putative paralogs that were likely present in the ancestor of all living thecate orders; the exact number of ancestral paralogs is difficult to estimate because of the low resolution of the tree, apparent incomplete lineage sorting, and putative horizontal gene transfer (see, e.g., Bigelowiella natans), but it was likely even higher, given the 13–16 paralogous forms in the Gonyaulacales alone.


Fig. S2: In silico targeting predictions in plastid ferredoxins (see other plastid-targeted proteins in Table S6). Protein N termini are drawn to scale and show signal peptide D-scores (SignalP), transit peptide cTP-scores (TargetP), and the four conserved cysteines required for Fe-S cluster formation. Spliced leaders or their fragments (SL = length in nucleotides) at the 5′ end of dinoflagellate mRNAs indicate N-terminally complete proteins. Porphyra ferredoxin is encoded in the plastid genome and does not require targeting presequences. The plastid ferredoxin in Oxyrrhis is truncated but contains an N-terminal extension carrying a near-complete or complete transit peptide region, suggesting that the protein is targeted to the plastid as in other dinoflagellates.

Fig. S3: A novel dinoflagellate histone-like protein. Phylogeny of bacterial (HU-like) and dinoflagellate histone-like proteins (HLP) reveals a novel dinoflagellate type (HLP-II), which has a mutually exclusive distribution with HLP-I. Best maximum likelihood tree (IQ-Tree) with ultrafast bootstrap supports at branches (only supports >50 are shown). Several environmental sequences (ENV), including those derived from dinoflagellate spliced-leader libraries (ENV dinoSL), are shown. The previously characterized HCc3 of Crypthecodinium cohnii (HLP-I) is highlighted in bold. Other histone-like proteins in eukaryotes (e.g., HU-like proteins in the plastid) are not closely related to either of the dinoflagellate HLP forms and were not included in the phylogeny.


Table S1: Sequence sources and phylogenetic matrices used. Species and data presence in final concatenated phylogenetic matrices (Root 1-3; Fig. 1) with sequence sources: matrix size (sites #), percentage of missing sites (%MS), genes (%MG), and merged chimeric entries (%CH; Materials and Methods).

Group Operational taxonomic unit (Fig. 1) Root 1 (sites #, % MS, % MG, % CH) Root 2 (sites #, % MS, % MG, % CH) Root 3 (sites #, % MS, % MG, % CH) Data source

Dinoflagellates Akashiwo sanguinea 27237 7 2 4 28820 6 2 4 28779 7 2 4 Bachvaroff et al., 2014 MPE

Dinoflagellates Alexandrium spp. (A. tamarense, A. catenella OF101, A. minutum, A. ostenfeldii) 28346 4 1 5 29619 4 1 3 29857 4 1 3 NCBI EST; MMETSP0790

Dinoflagellates Amoebophrya sp. ex Akashiwo 26900 9 6 0 28385 8 6 0 --- --- --- --- Bachvaroff et al., 2014 MPE

Dinoflagellates Amoebophrya sp. ex Karlodinium 25811 12 3 0 27025 12 3 0 --- --- --- --- Bachvaroff et al., 2014 MPE

Dinoflagellates Amphidinium carterae CCMP1314 28630 3 0 0 29931 3 0 0 30074 3 0 0 MMETSP0258-MMETSP0259; Bachvaroff et al., 2014 MPE

Dinoflagellates Dinophysis acuminata DAEP01 25106 15 2 29 26629 13 2 30 26904 13 2 30 MMETSP0797

Dinoflagellates Durinskia baltica CSIRO CS-38 26195 11 2 3 27245 11 2 2 27338 12 2 2 MMETSP0116-MMETSP0117

Dinoflagellates Gymnodinium catenatum GC744 28347 4 2 5 29929 3 2 4 30236 2 2 4 MMETSP0784

Dinoflagellates Hematodinium sp. SG-2012 ex Nephrops 28763 2 2 0 29746 3 2 0 --- --- --- --- NCBI TSA: GEMP00000000

Dinoflagellates Heterocapsa spp. (H. triquetra CCMP449; H. rotundata SCCAP K-0483) 26353 10 5 21 27614 10 5 21 27799 10 5 21 NCBI EST; MMETSP0503

Dinoflagellates Karenia brevis (CCMP2229, Wilson) 28973 1 1 1 30373 1 1 1 30567 1 1 1 NCBI EST; MMETSP0027, MMETSP0029-MMETSP0031

Dinoflagellates Karlodinium veneficum (CCMP2283, CCMP 415, CCMP1974) 29119 1 0 4 30615 1 0 4 30816 1 0 4 NCBI EST; Bachvaroff et al., 2014 MPE; MMETSP1015-MMETSP1017

Dinoflagellates Kryptoperidinium foliaceum CCAP 1116/3 25472 13 5 3 26649 13 5 2 26986 13 5 3 MMETSP0118-MMETSP0119

Dinoflagellates Kryptoperidinium foliaceum CCMP 1326 26666 9 3 1 27709 10 3 1 28055 9 3 1 MMETSP0120-MMETSP0121

Dinoflagellates Lingulodinium polyedrum 27323 7 0 18 28520 7 0 18 28515 8 0 17 NCBI TSA: GABP00000000

Dinoflagellates Noctiluca scintillans SPMC136 27743 6 2 4 28787 6 2 3 29032 6 2 3 MMETSP0253; NCBI TSA: GELK00000000

Dinoflagellates Oxyrrhis marina (CCMP1788, 44_PLY01) 11227 62 40 0 11114 64 41 0 --- --- --- --- NCBI EST; Lowe et al., 2012

Dinoflagellates Polarella glacialis CCMP1383 27590 6 1 12 29182 5 1 12 29304 5 1 12 MMETSP0227

Dinoflagellates Polykrikos lebourae 10500 64 30 20 10641 65 30 19 10637 66 30 19 Gavelis et al., 2015

Dinoflagellates Prorocentrum minimum CCMP2233 27625 6 2 8 29208 5 2 8 29518 5 2 8 MMETSP0267-MMETSP0269

Dinoflagellates Protoceratium reticulatum CCCM535 27388 7 0 4 28527 7 0 4 28762 7 0 4 MMETSP0228

Dinoflagellates Scrippsiella trochoidea CCMP3099 28580 3 0 6 30275 2 0 6 30340 2 0 5 MMETSP0270-MMETSP0272

Dinoflagellates Symbiodinium minutum Mf1.05b 21137 28 12 1 21589 30 12 1 21754 30 12 1 Symbiodinium minutum genome database

Dinoflagellates Symbiodinium sp. CassKB8 25884 12 3 0 26788 13 3 0 26726 14 3 0 NCBI SRA: SRX076696; http://medinalab.org/zoox/

Dinoflagellates Togula jolla CCCM725 29018 1 0 2 30353 1 0 2 30617 1 0 2 MMETSP0224

Perkinsids Perkinsus marinus 27188 8 0 0 28253 8 0 0 --- --- --- --- NCBI NR

Apicomplexans Babesia bovis 26932 8 5 0 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Babesia microti 25736 12 10 0 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Cryptosporidium muris 26442 10 6 0 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Cryptosporidium parvum 25233 14 8 0 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Eimeria tenella 20938 29 16 2 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Plasmodium falciparum 27311 7 4 0 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Theileria annulata 23842 19 11 0 --- --- --- --- --- --- --- --- NCBI NR

Apicomplexans Toxoplasma gondii 26623 9 5 0 --- --- --- --- --- --- --- --- NCBI NR

Ciliates Ichthyophthirius multifiliis 25145 14 13 0 --- --- --- --- --- --- --- --- NCBI NR

Ciliates Paramecium tetraurelia 24211 18 21 0 --- --- --- --- --- --- --- --- NCBI NR

Ciliates Oxytricha trifallax 25367 14 13 0 --- --- --- --- --- --- --- --- NCBI NR

Ciliates Tetrahymena thermophila 26960 8 6 0 --- --- --- --- --- --- --- --- NCBI NR

Stramenopiles Aureococcus anophagefferens 27214 7 4 0 --- --- --- --- --- --- --- --- NCBI NR

Stramenopiles Ectocarpus siliculosus 28646 3 0 0 --- --- --- --- --- --- --- --- NCBI NR

Stramenopiles Saprolegnia parasitica 26602 10 8 0 --- --- --- --- --- --- --- --- NCBI NR

Stramenopiles Schizochytrium aggregatum 27594 6 4 0 --- --- --- --- --- --- --- --- NCBI NR

Stramenopiles Thalassiosira pseudonana 28137 4 1 0 --- --- --- --- --- --- --- --- NCBI NR

TOTAL 29400 12 30780 12 30988 10
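The per-taxon figures in Table S1 appear to follow %MS = 100 × (1 − sites # / total sites); for example, Oxyrrhis marina in Root 1 has 11,227 of 29,400 sites, and 100 × (1 − 11227/29400) ≈ 61.8, which rounds to the listed 62. The sketch below computes these quantities from a concatenated matrix and per-gene alignments. Which characters count as missing ('-', '?', 'X') is an assumption, as are the function names.

```python
from typing import Dict

MISSING = set("-?X")  # assumed missing-data characters


def sites_present(seq: str) -> int:
    """Number of matrix sites with a residue for one taxon (the 'sites #' column)."""
    return sum(1 for c in seq if c not in MISSING)


def percent_missing_sites(n_present: int, n_total: int) -> float:
    """%MS: percentage of matrix sites absent for a taxon."""
    return 100.0 * (1.0 - n_present / n_total)


def percent_missing_genes(taxon: str, gene_alns: Dict[str, Dict[str, str]]) -> float:
    """%MG: percentage of gene alignments that lack the taxon entirely."""
    absent = sum(1 for aln in gene_alns.values() if taxon not in aln)
    return 100.0 * absent / len(gene_alns)


if __name__ == "__main__":
    # Root 1 values taken from Table S1 (29,400 sites in total).
    for name, present in [("Oxyrrhis marina", 11227), ("Akashiwo sanguinea", 27237)]:
        print(name, round(percent_missing_sites(present, 29400)))
    # Oxyrrhis marina 62
    # Akashiwo sanguinea 7
```

The same relation reproduces other Root 1 rows, e.g. Polykrikos lebourae (10,500 sites, 64%).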


Table S2: Testing of selected alternative topologies of Noctiluca and Akashiwo. Probability scores for different topologies as given by approximately unbiased (AU) and expected likelihood weights (ELW) tests. Scores of p = 0.05 or greater are highlighted in bold; n.a. = not applicable (outgroup missing).

Test Dataset Root1 Root2 Root3 Topology

p(AU) p(ELW) p(AU) p(ELW) p(AU) p(ELW)

Noctiluca as an early branch among core dinoflagellates, as a sister lineage to Amphidinium, or as the second lineage to branch after Amphidinium

All 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));

2E-036 0 1E-005 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));

1E-060 0 2E-004 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));

All-Din 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));

3E-004 0 1E-005 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));

2E-045 0 1E-004 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));

All-Pro 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));

2E-008 0 3E-005 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));

2E-008 0 1E-007 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));

All-Din&Pro 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));

5E-006 0 3E-004 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));

1E-089 0 3E-004 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));

Akashiwo as a sister lineage of thecate dinoflagellates, as a sister lineage of Togula+Gymnodiniaceae s.s., or branching earlier than Togula+Gymnodiniaceae s.s.

All 0.718 0.692 0.889 0.865 0.928 0.912 (OUT,(Gym+Tog,(Aka,The)));

0.315 0.305 0.133 0.129 0.082 0.088 (OUT,((Aka,Gym+Tog),The));

0.008 0.003 0.009 0.005 0.001 0.001 (OUT,(Aka,(Gym+Tog,The)));

All-Din 0.748 0.725 0.896 0.876 0.933 0.907 (OUT,(Gym+Tog,(Aka,The)));

0.290 0.266 0.139 0.120 0.083 0.091 (OUT,((Aka,Gym+Tog),The));

0.017 0.010 0.019 0.004 0.008 0.002 (OUT,(Aka,(Gym+Tog,The)));

All-Pro 0.655 0.619 0.871 0.832 0.877 0.866 (OUT,(Gym+Tog,(Aka,The)));

0.397 0.373 0.165 0.162 0.143 0.134 (OUT,((Aka,Gym+Tog),The));

0.021 0.009 0.014 0.006 0.003 0.000 (OUT,(Aka,(Gym+Tog,The)));

All-Din&Pro 0.596 0.565 0.868 0.840 0.904 0.891 (OUT,(Gym+Tog,(Aka,The)));

0.468 0.422 0.179 0.152 0.117 0.105 (OUT,((Aka,Gym+Tog),The));

0.032 0.013 0.026 0.008 0.006 0.004 (OUT,(Aka,(Gym+Tog,The)));
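The ELW scores in Tables S2 and S3 were computed in IQ-Tree; the idea behind them (Strimmer and Rambaut's expected likelihood weights) can be sketched as follows. Site log-likelihoods of each candidate topology are resampled with the RELL bootstrap, each replicate's total log-likelihoods are converted to normalized likelihood weights, and the weights are averaged over replicates, so the scores for one dataset sum to about 1 (compare 0.692 + 0.305 + 0.003 for Akashiwo, All, Root1 above). The input layout, replicate count, seed, and function name are assumptions; the AU test relies on a multiscale bootstrap and is not reproduced here.

```python
import math
import random
from typing import List, Sequence


def expected_likelihood_weights(site_lnl: Sequence[Sequence[float]],
                                n_boot: int = 1000, seed: int = 1) -> List[float]:
    """site_lnl[t][s] is the log-likelihood of site s under topology t."""
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    rng = random.Random(seed)
    totals = [0.0] * n_trees
    for _ in range(n_boot):
        # RELL: resample site indices with replacement, reusing stored site values.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        lnl = [sum(tree[i] for i in idx) for tree in site_lnl]
        best = max(lnl)
        weights = [math.exp(x - best) for x in lnl]  # shift by max to avoid underflow
        norm = sum(weights)
        for t in range(n_trees):
            totals[t] += weights[t] / norm
    return [total / n_boot for total in totals]


if __name__ == "__main__":
    # Made-up site log-likelihoods for two topologies over five sites.
    lnl = [[-2.0, -3.1, -1.5, -4.0, -2.2],
           [-2.1, -3.0, -1.9, -4.2, -2.2]]
    print([round(w, 3) for w in expected_likelihood_weights(lnl)])
```

Because the weights are normalized within each replicate, a topology that is rarely best under resampling receives a weight near zero, as for the third Akashiwo topology above.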

Table S3: Testing of all alternative topologies among thecate dinoflagellate orders. Probability scores for different topologies as given by approximately unbiased (AU) and expected likelihood weights (ELW) tests. All topologies for which at least one of the tests gave p = 0.01 or greater are shown; scores of p = 0.05 or greater are highlighted in bold.

Dataset Root1 Root2 Root3 Topology (OUT=outgroup)

p(AU) p(ELW) p(AU) p(ELW) p(AU) p(ELW)

All 0.580 0.279 0.429 0.156 0.366 0.092 (OUT,((Din,Pro),(Gon,(Per,Sym))));

0.570 0.245 0.742 0.351 0.773 0.376 (OUT,((Din,Gon),(Per,(Pro,Sym))));

0.524 0.208 0.458 0.108 0.366 0.059 (OUT,(Din,(Gon,(Per,(Pro,Sym)))));

0.515 0.227 0.565 0.267 0.611 0.331 (OUT,((Gon,(Din,Pro)),(Per,Sym)));

0.110 0.017 0.178 0.033 0.255 0.053 (OUT,(Din,(Per,(Gon,(Pro,Sym)))));

0.058 0.011 0.202 0.045 0.150 0.014 (OUT,(Din,((Per,Gon),(Pro,Sym))));

0.044 0.003 0.071 0.005 0.067 0.007 (OUT,(Din,((Gon,Pro),(Per,Sym))));

0.038 0.001 0.125 0.026 0.180 0.046 (OUT,(Per,((Din,Gon),(Pro,Sym))));

0.032 0.002 0.041 0.004 0.007 0 (OUT,((Din,Gon),(Pro,(Per,Sym))));

0.021 0.002 0.031 0 0.005 0 (OUT,(Din,(Gon,(Pro,(Per,Sym)))));

0.017 0 0.001 0 0.020 0 (OUT,((Din,(Pro,Gon)),(Per,Sym)));

0.013 0 <0.001 0 0.008 0 (OUT,(Gon,(Din,(Per,(Pro,Sym)))));

0.012 0 <0.001 0 <0.001 0 (OUT,(Gon,((Pro,(Din,Per)),Sym)));


0.009 0.004 0.016 0.002 0.053 0.005 (OUT,((Din,Pro),(Per,(Gon,Sym))));

0.007 0 0.028 0.001 0.048 0.005 (OUT,((Pro,(Din,Gon)),(Per,Sym)));

0.005 0 0.019 0 0.006 0 (OUT,(Din,(Pro,(Gon,(Per,Sym)))));

<0.001 0 <0.001 0 0.017 0 (OUT,((Pro,Gon),(Din,(Per,Sym))));

<0.001 0 0.021 0 <0.001 0 (OUT,(Pro,(Din,(Per,(Gon,Sym)))));

<0.001 0 0.016 0 <0.001 0 (OUT,(Per,(Pro,(Din,(Gon,Sym)))));

<0.001 0 0.048 0.002 0.059 0.011 (OUT,((Per,(Din,Gon)),(Pro,Sym)));

<0.001 0 0.023 0 <0.001 0 (OUT,(Pro,(Din,(Gon,(Per,Sym)))));

<0.001 0 0.008 0 0.010 0.001 (OUT,(Pro,((Din,Gon),(Per,Sym))));

<0.001 0 <0.001 0 0.012 0 (OUT,((Pro,(Din,Per)),(Gon,Sym)));

<0.001 0 <0.001 0 0.018 0 (OUT,(Pro,(Gon,((Din,Per),Sym))));

<0.001 0 <0.001 0 0.036 0 (OUT,(Pro,(Gon,(Din,(Per,Sym)))));

All-Din 0.874 0.681 0.659 0.344 0.638 0.345 (OUT,(Gon,(Per,(Pro,Sym))));

0.336 0.159 0.544 0.290 0.613 0.312 (OUT,(Per,(Gon,(Pro,Sym))));

0.200 0.111 0.303 0.138 0.289 0.158 (OUT,((Pro,Gon),(Per,Sym)));

0.126 0.035 0.449 0.211 0.369 0.159 (OUT,((Gon,Per),(Pro,Sym)));

0.037 0.012 0.090 0.015 0.086 0.026 (OUT,(Pro,(Gon,(Per,Sym))));

0.012 0.003 0.008 0 <0.001 0 (OUT,(Gon,(Pro,(Per,Sym))));

<0.001 0 0.011 0 0.009 0 (OUT,((Pro,Per),(Gon,Sym)));

<0.001 0 <0.001 0 0.014 0 (OUT,(Pro,(Per,(Gon,Sym))));

All-Pro 0.603 0.569 0.761 0.713 0.824 0.771 (OUT,((Din,Gon),(Per,Sym)));

0.466 0.429 0.329 0.274 0.278 0.212 (OUT,(Din,(Gon,(Per,Sym))));

0.008 0.002 0.012 0.011 0.033 0.016 (OUT,(Din,(Per,(Gon,Sym))));

0 0 <0.001 0 0.010 0 (OUT,(Per,(Gon,(Din,Sym))));

All-Din&Pro 0.991 0.988 0.984 0.966 0.960 0.945 (OUT,(Gon,(Per,Sym)));

0.016 0.009 0.028 0.025 0.067 0.050 (OUT,(Per,(Gon,Sym)));

0.007 0.003 0.020 0.009 0.013 0.006 (OUT,(Sym,(Per,Gon)));

Table S4: Reference accessions for dinoflagellate cellulases, CESA-like proteins, and histone-like proteins. NCBI, CAMPEP (= MMETSP; see Table S1), and S. minutum genome Db protein accessions are shown.

Protein or Enzyme type Protein name Dinoflagellate species Reference accession Reference source

Cellulose/polysaccharide metabolism

Glycosyl hydrolase family 7, dCel1 Pyrocystis lunula ADG63073 NCBI nr

Glycosyl hydrolase family 7 Amphidinium carterae comp414_c0_seq2 NCBI TSA

Glycosyl hydrolase family 7 Karenia brevis CCMP2229 CAMPEP_0173626610 MMETSP0027, MMETSP0029-MMETSP0031

Glycosyl hydrolase family 7 Noctiluca scintillans SPMC136 CAMPEP_0194478120 MMETSP0253

Glycosyl hydrolase family 7 Oxyrrhis marina LB1974 CAMPEP_0190395562 MMETSP1424-MMETSP1426

Glycosyl transferase CESA-like, type 1 Alexandrium catenella OF101 CAMPEP_0171158290 MMETSP0790

Glycosyl transferase CESA-like, type 1 Gymnodinium catenatum GC744 CAMPEP_0117466612 MMETSP0784

Glycosyl transferase CESA-like, type 1 Karenia brevis CCMP2229 CAMPEP_0173802094 MMETSP0027, MMETSP0029-MMETSP0031

Glycosyl transferase CESA-like, type 1 Protoceratium reticulatum CCCM535 CAMPEP_0168370212 MMETSP0228

Glycosyl transferase CESA-like, type 1 Scrippsiella trochoidea CCMP3099 CAMPEP_0115463718 MMETSP0270-MMETSP0272

Glycosyl transferase CESA-like, type 2 Karlodinium micrum CCMP2283 CAMPEP_0169068018 MMETSP1015-MMETSP1017

Glycosyl transferase CESA-like, type 2 Prorocentrum minimum CCMP2233 CAMPEP_0177019396 MMETSP0267-MMETSP0269

Glycosyl transferase CESA-like, type 3 Symbiodinium minutum Mf1.05b 018942.t1 S. minutum genome Db

Histone-like proteins, DNA-binding

Histone-like protein, HLP-I Amphidinium carterae CCMP1314 ACJ04919 NCBI nr

Histone-like protein, HLP-I Noctiluca scintillans SPMC136 CAMPEP_0194550744 MMETSP0253

Histone-like protein, HLP-II, HCC3 Crypthecodinium cohnii AAM97522 NCBI nr

Histone-like protein, HLP-II Symbiodinium minutum Mf1.05b 017975.t1 S. minutum genome Db

Table S5: Proteins in non-photosynthetic dinoflagellate plastids. Presence of genes encoding plastid, cytosolic, and mitochondrial proteins in Noctiluca, Oxyrrhis, and Dinophysis; full protein names, enzyme commission numbers, protein and pathway abbreviations (used in Fig. 3A), and sequence sources (Noctiluca TSA assembly, CAMNT = MMETSP contigs, or CAMPEP = MMETSP proteins; see Table S1) are shown.

Pathway Localization Abbreviation EC no. Protein name in Noctiluca? in Oxyrrhis? a in Dinophysis?

Isoprenoid precursor biosynthesis (isopentenyl diphosphate = IPP / dimethylallyl diphosphate = DMAP)

Cytosolic (mevalonate pathway = MEV)

HMGCS 2.3.3.10 hydroxymethylglutaryl-CoA synthase

HMGCR 1.1.1.34 hydroxymethylglutaryl-CoA reductase

MVK 2.7.1.36 mevalonate kinase

PMVK 2.7.4.2 phosphomevalonate kinase

MVD 4.1.1.33 diphosphomevalonate decarboxylase

Plastid (non-mevalonate pathway = MEP/DOXP)

DXS 2.2.1.7 1-deoxy-D-xylulose-5-phosphate synthase c37102_g1_i1 CAMNT_0034174429 a CAMPEP_0179347270

IspC (DXR) 1.1.1.267 1-deoxy-D-xylulose-5-phosphate reductoisomerase c8491_g1_i1 CAMPEP_0190399830 a

CAMNT_0021046817, CAMNT_0020960785

IspD 2.7.7.60 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase c34245_g1_i1 CAMNT_0020996201

IspE 2.7.1.148 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase c34076_g1_i1 CAMNT_0034113123 CAMNT_0021004701

IspF 4.6.1.12 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase

c27998_g3_i1, c27998_g2_i1 CAMPEP_0190331156

CAMNT_0020973523, CAMNT_0021088175

IspG 1.17.7.1 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase c4614_g1_i1

CAMPEP_0190339536, CAMPEP_0190336160

CAMPEP_0179237040, CAMPEP_0179316044

IspH (LytB) 1.17.1.2 4-hydroxy-3-methylbut-2-enyl diphosphate reductase c37338_g1_i1 CAMNT_0020995807

Tetrapyrrole biosynthesis (heme, chlorophyll, etc.)

Mitochondrial (C4 pathway) only

ALAS 2.3.1.37 5-aminolevulinate synthase CAMNT_0034123849

Plastid (C5 pathway) only

GTR (HemA) 1.2.1.70 glutamyl-tRNA reductase c16606_g1_i1 CAMNT_0020962887

GSA (HemL) 5.4.3.8 glutamate-1-semialdehyde 2,1-aminomutase c6482_g2_i1 CAMNT_0021017431

Mitochondrial/Cytosolic (C4 pathway) or Plastid (C5 pathway)

ALAD (HemB) 4.2.1.24 5-aminolevulinate dehydratase/porphobilinogen synthase

c17529_g1_i1, c17529_g3_i1 CAMNT_0034089891 CAMNT_0020991467

PBGD (HemC) 2.5.1.61 porphobilinogen deaminase/hydroxymethylbilane synthase c1741_g1_i1_17 CAMNT_0034053199 CAMNT_0020995631

UROS (HemD) 4.2.1.75 uroporphyrinogen-III synthase CAMPEP_0179295544

UROD (HemE) 4.1.1.37 uroporphyrinogen decarboxylase

c18473_g1_i1, c18473_g2_i1, c5162_g1_i1, c44501_g1_i1

CAMNT_0034059593 CAMPEP_0179265208, CAMPEP_0179260602, CAMPEP_0179236930

CPOX (HemF) 1.3.3.3 coproporphyrinogen III oxidase c43130_g1_i1, c19480_g3_i1 CAMPEP_0190301822 CAMNT_0021027311

PPOX (HemY) 1.3.3.4 protoporphyrinogen oxidase c32625_g3_i1, c7748_g1_i1

CAMNT_0034176619 a, CAMPEP_0190445206 a

CAMNT_0020983029, CAMPEP_0179218520

FECH (HemH) 4.99.1.1 ferrochelatase c37180_g1_i1 CAMNT_0021055343

Fatty acid biosynthesis and elongation

Cytosolic (polyketide synthase/ fatty acid synthase type I pathway = PKS/FASI)

FAAL 2.3.1.86 fatty acid synthase type I, fatty acyl ligase domain

CAMPEP_0194492864, CAMPEP_0194556372, CAMPEP_0194556876, CAMPEP_0194490892

CAMPEP_0190385440, CAMPEP_0179308804 CAMPEP_0179250968

KS 2.3.1.86 fatty acid synthase type I, ketoacyl synthase domain

CAMPEP_0194492864, CAMPEP_0194518942, CAMPEP_0194531188, CAMPEP_0194533868, CAMPEP_0194544822, CAMPEP_0194547234, CAMPEP_0194547510, CAMPEP_0194549100, CAMPEP_0194553672, CAMPEP_0194555636, CAMPEP_0194556058, CAMPEP_0194556250, CAMPEP_0194556372, CAMPEP_0194556646, CAMPEP_0194556876

CAMPEP_0190386610, CAMPEP_0190386310, CAMPEP_0190385440, CAMPEP_0190321934

CAMPEP_0179225310, CAMPEP_0179230504, CAMPEP_0179231766, CAMPEP_0179233104, CAMPEP_0179240346, CAMPEP_0179274784, CAMPEP_0179301140, CAMPEP_0179302604, CAMPEP_0179309896, CAMPEP_0179340930, CAMPEP_0179347172, CAMPEP_0179358456, CAMPEP_0179372828

AT 2.3.1.86 fatty acid synthase type I, acyl transferase domain

CAMPEP_0194492864, CAMPEP_0194482594, CAMPEP_0194485702, CAMPEP_0194531888, CAMPEP_0194554940, CAMPEP_0194556372, CAMPEP_0194556876, CAMPEP_0194557686

CAMPEP_0190386610, CAMPEP_0190386310, CAMPEP_0190385440, CAMPEP_0190315816

CAMPEP_0179225352, CAMPEP_0179232228, CAMPEP_0179232754, CAMPEP_0179257472, CAMPEP_0179301464, CAMPEP_0179308134

DH 2.3.1.86 fatty acid synthase type I, dehydrase domain

CAMPEP_0194551082, CAMPEP_0194552350

CAMPEP_0190386610, CAMPEP_0190323466 CAMPEP_0179257472

ER 2.3.1.86 fatty acid synthase type I, enoyl reductase domain

CAMPEP_0194551082, CAMPEP_0194552350 CAMPEP_0190323466 CAMPEP_0179257472

KR 2.3.1.86 fatty acid synthase type I, ketoacyl reductase domain

CAMPEP_0194551082, CAMPEP_0194552350, CAMPEP_0194492864

CAMPEP_0190386310, CAMPEP_0190384048, CAMPEP_0190330588

CAMPEP_0179227628, CAMPEP_0179257472, CAMPEP_0179268324, CAMPEP_0179292270, CAMPEP_0179306616, CAMPEP_0179310080, CAMPEP_0179311052, CAMPEP_0179311360

ACP 2.3.1.86 fatty acid synthase type I, acyl carrier protein domain

CAMPEP_0194492864, CAMPEP_0194549100, CAMPEP_0194551082, CAMPEP_0194552350, CAMPEP_0194556058, CAMPEP_0194556372, CAMPEP_0194556876

CAMPEP_0190321934, CAMPEP_0190323466, CAMPEP_0190385440

CAMPEP_0179257472

TRD (SDR) 2.3.1.86 fatty acid synthase type I, terminal reductase domain

Endoplasmic reticulum (ER fatty acid elongation pathway)

ELO 2.3.1.199 beta-ketoacyl-CoA synthase

CAMPEP_0194480994

CAMPEP_0190376278, CAMPEP_0190332524

CAMPEP_0179225992, CAMPEP_0179261800

KCR 1.1.1.330 beta-ketoacyl-CoA reductase

CAMPEP_0190306630

PHS 4.2.1.134 beta-hydroxyacyl-CoA dehydratase

CAMPEP_0194540052

CAMPEP_0190316596

CAMPEP_0179242246, CAMPEP_0179358262

TECR 1.3.1.93 trans-2-enoyl-CoA reductase

Plastid (fatty acid synthase type II pathway = FASII)

FabD 2.3.1.39 malonyl-CoA-acyl carrier protein transacylase

FabG 1.1.1.100 beta-ketoacyl-acyl carrier protein reductase

FabH 2.3.1.180 beta-ketoacyl-acyl carrier protein synthase III

FabZ 4.2.1.59 D-3-hydroxyoctanoyl-acyl carrier protein dehydratase

FabI 1.3.1.9 enoyl-acyl carrier protein reductase

CAMPEP_0179266880 (b)


FabB/F 2.3.1.41 beta-ketoacyl-acyl carrier protein synthase

ACP acyl-carrier protein

Iron-sulfur (Fe-S) cluster assembly

Plastid (Suf pathway)

SufA Iron-sulfur assembly protein SufA

SufB Iron-sulfur assembly protein SufB

c20274_g1_i1, c20274_g2_i1

CAMNT_0034061651

CAMNT_0021013865

SufC Iron-sulfur assembly protein SufC

CAMNT_0021025055

CAMNT_0034058317

SufD Iron-sulfur assembly protein SufD

CAMNT_0021040041

SufE Iron-sulfur assembly protein SufE

Ferredoxin redox system

Plastid (Fd – FNR pathway)

Fd (PetF) [2Fe-2S] ferredoxin

c41939_g1_i1

CAMNT_0034037779

CAMNT_0020927107

FNR (PetH) 1.18.1.2 Ferredoxin-NADP+ reductase

c11604_g1_i1

CAMNT_0020923739

Triosephosphate translocation

Plastid (TPT translocon)

TPT Triosephosphate/phosphate transporter

c40379_g1_i1

CAMNT_0021019703, CAMNT_0021010591

Protein folding and processing

Plastid (ClpC chaperone/protease)

ClpC Chloroplast molecular chaperone ClpC

c23770_g4_i1, c23770_g5_i1

CAMNT_0034034689

CAMNT_0020950785

(a) Sequence obtained from the Oxyrrhis marina LB1974 MMETSP1424-1426 combined assembly (all others were obtained from the Oxyrrhis marina MMETSP0468-471 combined assembly).

(b) Closely related to Kareniaceae; a full plastid pathway is unlikely to be present (see main text).
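Table S5 lists three fatty acid pathways with different enzymes. In outline, each pathway runs the same four-reaction elongation cycle:

- condensation with a malonyl-derived two-carbon unit,
- ketoreduction,
- dehydration,
- enoyl reduction.

The Python sketch below illustrates that correspondence. It is not part of the original analysis, and it makes the following assumptions:

- The enzyme-to-step grouping follows standard fatty acid biochemistry.
- The names `PATHWAYS`, `chain_length_after`, and `cycles_needed` are hypothetical.
- De novo synthesis starts from an acetyl (C2) primer.

The sketch does not model the loading, carrier, and release domains (AT, ACP, FabD, FAAL, TRD). It also ignores that polyketide synthases may skip reductive steps.

```python
# Illustrative sketch (not from the original study): how the enzymes listed
# in Table S5 map onto the four reactions of one elongation cycle.
from typing import Dict, Tuple

CYCLE_STEPS: Tuple[str, ...] = (
    "condensation",     # add a C2 unit from malonyl-CoA/-ACP (CO2 released)
    "ketoreduction",    # beta-ketoacyl -> beta-hydroxyacyl
    "dehydration",      # beta-hydroxyacyl -> trans-2-enoyl
    "enoyl reduction",  # trans-2-enoyl -> saturated acyl
)

# Hypothetical lookup table; abbreviations follow Table S5.
PATHWAYS: Dict[str, Tuple[str, ...]] = {
    "PKS/FASI (cytosol)": ("KS", "KR", "DH", "ER"),
    "ER fatty acid elongation": ("ELO", "KCR", "PHS", "TECR"),
    "FASII (plastid)": ("FabH (first cycle) or FabB/F", "FabG", "FabZ", "FabI"),
}


def chain_length_after(cycles: int, primer_carbons: int = 2) -> int:
    """Chain length after a number of complete cycles, each adding two carbons."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return primer_carbons + 2 * cycles


def cycles_needed(target_carbons: int, primer_carbons: int = 2) -> int:
    """Number of complete cycles needed to extend a primer to the target length."""
    extra = target_carbons - primer_carbons
    if extra < 0 or extra % 2 != 0:
        raise ValueError("target must exceed the primer by an even number of carbons")
    return extra // 2


if __name__ == "__main__":
    for name, enzymes in PATHWAYS.items():
        steps = ", ".join(f"{step}: {enz}" for step, enz in zip(CYCLE_STEPS, enzymes))
        print(f"{name} -> {steps}")
    # De novo synthesis of palmitate (C16) from an acetyl (C2) primer.
    print("C16 from C2:", cycles_needed(16), "cycles")
    # One ER elongation round turns a C16 acyl-CoA into C18.
    print("C16 + 1 cycle:", chain_length_after(1, primer_carbons=16))
```

Run as a script, it prints the enzyme-to-step mapping for each pathway. It also shows that seven cycles extend a C2 primer to palmitate (C16), and that one ER elongation round extends C16 to C18.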

Table S6: Signal and transit peptide predictions. Prediction statistics are listed for proteins predicted to be targeted to the non-photosynthetic plastid in Noctiluca, Oxyrrhis, and Dinophysis (protein abbreviations correspond to Table S5). Protein ID: CAMNT = MMETSP contig at iMicrobe, or the Noctiluca scintillans Trinity assembly contig name in NCBI TSA. Scores for the predicted cleavage sites of signal peptides (Cmax and Ymax) and transit peptides (CS-score) are generally low; this has been observed previously for plastid proteins of dinoflagellates and Perkinsus.

Columns, in order:

No.; Dinoflagellate species; Protein; Protein ID

N-terminal region integrity: N-terminal extension?; Met present?; Spliced leader at 5' end?; STOP upstream of 1st Met?

Signal peptide (SP) prediction in SignalP 4.1: SP Cmax score; SP Ymax score; SP Smax score; SP Smean score; Cleavage site positions; SP D-score; SP networks used; Phenylalanine present? (position relative to cleavage site)

Transit peptide (cTP) prediction in ChloroP 1.1: cTP score; cTP CS-score; cTP length

Second hydrophobic domain

1 Noctiluca scintillans IspD c34245_g1_i1 yes yes 6 yes 0.644 0.736 0.961 0.852 30-31 0.798 SignalP-noTM FAMP (-2)** 0.477 2.767 75

2 Noctiluca scintillans IspE c34076_g1_i1 yes yes 0.200 0.341 0.803 0.686 24-25 0.479 SignalP-TM FDLV (+1) 0.560 11.862 70

3 Noctiluca scintillans IspG c4614_g1_i1 yes yes 12 0.222 0.385 0.814 0.661 21-22 0.534 SignalP-noTM FVSS (+7) 0.519 4.804 30

4 Noctiluca scintillans IspH c37338_g1_i1 yes yes 6 yes 0.340 0.447 0.710 0.573 19-20 0.515 SignalP-noTM FALP (+14) 0.587 13.209 67

5 Noctiluca scintillans Fd c41939_g1_i1 yes yes 6 yes 0.399 0.581 0.940 0.852 27-28 0.727 SignalP-noTM FAIA (+7)** 0.460 2.204 55

6 Noctiluca scintillans FNR c11604_g1_i1 yes yes 0.24 0.47 0.98 0.92 28-29 0.71 SignalP-noTM FVQM (-3)** 0.55 11.93 51


7 Dinophysis acuminata IspC CAMNT_0021046817 yes yes 0.520 0.623 0.859 0.735 17-18 0.684 SignalP-noTM FVPG (+1) 0.554 4.408 33

8 Dinophysis acuminata IspF CAMNT_0020973523 yes 0.323 0.495 0.854 0.749 19-20 0.632 SignalP-noTM FSHQ (-2) 0.476 1.677 22

9 Dinophysis acuminata TPT1 CAMNT_0021019703 yes yes 6 0.758 0.692 0.765 0.634 23-24 0.661 SignalP-noTM 0.510 2.188 21

10 Dinophysis acuminata TPT2 CAMNT_0021010591 yes yes yes 0.817 0.740 0.793 0.675 23-24 0.705 SignalP-noTM 0.523 1.992 21

11 Dinophysis acuminata SufC CAMNT_0021025055 yes 0.62 0.73 0.9 0.86 19-20 0.8 SignalP-noTM FAAA (-2) 0.46 2.91 4***

12 Dinophysis acuminata FNR CAMNT_0021011101 yes 0.22 0.42 0.95 0.8 21-22 0.63 SignalP-noTM FASP (+13) 0.51 7.62 42

13 Dinophysis acuminata Fd CAMNT_0020927107 yes yes 8* yes 0.188 0.382 0.922 0.786 25-26 0.600 SignalP-noTM FVAP (+7) 0.463 4.645 67

14 Oxyrrhis marina MMETSP0468 IspG CAMNT_0034079579 yes 0.684 0.591 0.600 0.524 19-20 0.564 SignalP-TM FSLR (+5) 0.520 3.042 72 yes

15 Oxyrrhis marina MMETSP0468 ALAD CAMNT_0034089891 yes yes 0.609 0.612 0.885 0.707 26-27 0.650 SignalP-TM 0.518 0.664 73 yes

16 Oxyrrhis marina MMETSP0468 SufB CAMNT_0034061651 yes yes 0.18 0.38 0.89 0.8 15-16 0.55 SignalP-TM 0.45 2.91 64

17 Oxyrrhis marina MMETSP0468 PBGD CAMNT_0034053199 yes yes 0.427 0.486 0.707 0.588 19-20 0.527 SignalP-TM FLQS (+8) 0.516 1.892 65 yes

Oxyrrhis marina LB1974 PBGD CAMNT_0034168717 yes yes 6 0.159 0.335 0.841 0.710 17-18 0.485 SignalP-TM FLES (+10) 0.527 1.892 70 yes

18 Oxyrrhis marina MMETSP0468 Fd CAMNT_0034037779 yes 0.544 5.420 40

* Masked by 3 nucleotides at the 5' end.

** Alternative SP cleavage site at position +1.

*** Alternative TP cleavage site likely present.
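For each protein, the "Phenylalanine present?" column of Table S6 gives a four-residue motif beginning with phenylalanine near the predicted SignalP cleavage site, together with its position relative to that site. The Python sketch below shows one way such an entry could be derived from a sequence and a cleavage position. It is not the authors' pipeline and rests on these assumptions:

- `nearest_phe_motif` and `format_motif` are hypothetical names.
- The 15-residue search window is arbitrary.
- The position numbering (F position minus the last SP residue, so the first residue after the site is +1) may not match the table's convention.

```python
# Illustrative sketch (not the authors' pipeline): locate the phenylalanine
# closest to a predicted signal peptide cleavage site and report the motif
# that starts with it, in the style of Table S6 (e.g. "FAMP (-2)").
from typing import Optional, Tuple


def nearest_phe_motif(
    seq: str, cs: int, window: int = 15, motif_len: int = 4
) -> Optional[Tuple[str, int]]:
    """Return (motif, offset) for the F closest to the cleavage site, or None.

    cs is the 1-based position of the last signal-peptide residue, e.g. 30 for
    a site reported as "30-31". The offset is (position of F) - cs, so an F at
    the first residue after the cleavage site gets +1. This numbering is an
    assumption and may differ from the convention used in the table.
    """
    seq = seq.upper()
    best: Optional[Tuple[str, int]] = None
    for i, residue in enumerate(seq):
        if residue != "F":
            continue
        offset = (i + 1) - cs  # convert the 0-based index to a 1-based position
        if abs(offset) > window:
            continue  # too far from the cleavage site to count
        # Keep the closest F; on a tie, the upstream (earlier) one wins.
        if best is None or abs(offset) < abs(best[1]):
            best = (seq[i:i + motif_len], offset)
    return best


def format_motif(hit: Optional[Tuple[str, int]]) -> str:
    """Render a hit as in the table, e.g. "FAMP (-2)"; empty string if none."""
    if hit is None:
        return ""
    motif, offset = hit
    return f"{motif} ({offset:+d})"


if __name__ == "__main__":
    # Toy sequence (hypothetical): 30-residue signal peptide, then "FAMP..."
    toy = "M" + "A" * 29 + "FAMPKK"
    print(format_motif(nearest_phe_motif(toy, cs=30)))
```

The cleavage position is an input. A different prediction, such as the alternative SP sites flagged with ** in the table, can therefore be checked by passing it as `cs`.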