Major transitions in dinoflagellate evolution unveiled by phylotranscriptomics

Jan Janouškovec(a,b,c,d,1), Gregory S. Gavelis(e), Fabien Burki(c,2), Donna Dinh(c), Tsvetan R. Bachvaroff(f), Sebastian G. Gornik(g), Kelley J. Bright(h), Behzad Imanian(c), Suzanne L. Strom(h), Charles F. Delwiche(i), Ross F. Waller(j), Robert A. Fensome(k), Brian S. Leander(c,d,e), Forest L. Rohwer(b,d), and Juan F. Saldarriaga(c)
(a) Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom; (b) Biology Department, San Diego State University, San Diego, CA 92182; (c) Botany Department, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; (d) Program in Integrated Microbial Diversity, Canadian Institute for Advanced Research, Toronto, ON M5G 1Z8, Canada; (e) Zoology Department, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; (f) Institute for Marine and Environmental Technology, University of Maryland Center for Environmental Sciences, Baltimore, MD 21202; (g) Centre for Chromosome Biology, School of Natural Sciences, National University of Ireland, Galway, Ireland; (h) Shannon Point Marine Center, Western Washington University, Anacortes, WA 98221; (i) Department of Cell Biology and Molecular Genetics and Agricultural Experiment Station, University of Maryland, College Park, MD 20742; (j) Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, United Kingdom; and (k) Bedford Institute of Oceanography, Geological Survey of Canada (Atlantic), Dartmouth, NS B2Y 4A2, Canada
Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved November 28, 2016 (received for review September 8, 2016)
Dinoflagellates are key species in marine environments, but they remain poorly understood in part because of their large, complex genomes, unique molecular biology, and unresolved in-group relationships. We created a taxonomically representative dataset of dinoflagellate transcriptomes and used this to infer a strongly supported phylogeny to map major morphological and molecular transitions in dinoflagellate evolution. Our results show an early-branching position of Noctiluca, monophyly of thecate (plate-bearing) dinoflagellates, and paraphyly of athecate ones. This represents unambiguous phylogenetic evidence for a single origin of the group’s cellulosic theca, which we show coincided with a radiation of cellulases implicated in cell division. By integrating dinoflagellate molecular, fossil, and biogeochemical evidence, we propose a revised model for the evolution of thecal tabulations and suggest that the late acquisition of dinosterol in the group is inconsistent with dinoflagellates being the source of this biomarker in pre-Mesozoic strata. Three distantly related, fundamentally nonphotosynthetic dinoflagellates, Noctiluca, Oxyrrhis, and Dinophysis, contain cryptic plastidial metabolisms and lack alternative cytosolic pathways, suggesting that all free-living dinoflagellates are metabolically dependent on plastids. This finding led us to propose general mechanisms of dependency on plastid organelles in eukaryotes that have lost photosynthesis; it also suggests that the evolutionary origin of bioluminescence in nonphotosynthetic dinoflagellates may be linked to plastidic tetrapyrrole biosynthesis. Finally, we use our phylogenetic framework to show that dinoflagellate nuclei have recruited DNA-binding proteins in three distinct evolutionary waves, which included two independent acquisitions of bacterial histone-like proteins.
dinoflagellates | phylogeny | theca | plastids | dinosterol
Dinoflagellates comprise approximately 2,400 named extant species, of which approximately half are photosynthetic (1). However, this represents only a fraction of their estimated diversity: in surface marine waters, dinoflagellates are some of the most abundant and diverse eukaryotes known (2). Dinoflagellates’ ecological significance befits their abundance: photosynthetic species are dominant marine primary producers, and phagotrophic species play an important role in the microbial loop through predation and nutrient recycling. Approximately 75–80% of the toxic eukaryotic phytoplankton species are dinoflagellates, and they cause shellfish poisoning and harmful algal blooms of global importance. Symbiotic genera like Symbiodinium participate in interactions with metazoans and are essential for the formation of reef ecosystems, and parasitic forms play a central role in the collapse of harmful algal blooms, including those caused by dinoflagellates themselves (3). Dinoflagellates synthesize important secondary metabolites including sterols, polyketides, toxins, and dimethylsulfide, and several of them have evolved bioluminescence. They have a nonnucleosomal system of nuclear DNA packaging, widespread trans-splicing in mRNAs, and highly unusual plastid and mitochondrial genomes with complex transcript modifications (4–8). Their photosynthesis relies on unique light-harvesting complexes, and its frequent loss in the group makes dinoflagellates a model for understanding the basis of evolutionary reliance on nonphotosynthetic plastid organelles.

Detailed understanding of dinoflagellate biology, especially of unusual features such as the organization of their very large and complex nuclear genomes, has been limited by a paucity of sequence data (9, 10). Poorly resolved dinoflagellate trees have further complicated predictions of how specific metabolic pathways evolved and how they are distributed in uncultured members of the group. To date, molecular phylogenies have established the deep-branching positions of Oxyrrhis marina (here included in the dinoflagellates) and the parasitic Syndiniales [possibly several lineages (11)], but the internal relationships in the so-called core dinoflagellates, that is, all other orders and most species in the group, have remained unresolved except at low taxonomic levels (12–14). Traditionally, dinoflagellate taxonomy has been
Significance
We created a dataset of dinoflagellate transcriptomes to resolve internal phylogenetic relationships of the group. We show that the dinoflagellate theca originated once, through a process that likely involved changes in the metabolism of cellulose, and suggest that a late origin of dinosterol in the group is at odds with dinoflagellates being the source of this important biomarker before the Mesozoic. We also show that nonphotosynthetic dinoflagellates have retained nonphotosynthetic plastids with vital metabolic functions, and propose that one of these may be the evolutionary source of dinoflagellate bioluminescence. Finally, we reconstruct major molecular and morphological transitions in dinoflagellates and highlight the role of horizontal gene transfer in the origin of their unique nuclear architecture.
Author contributions: J.J. and J.F.S. designed research; J.J., G.S.G., F.B., D.D., T.R.B., S.G.G., K.J.B., B.I., S.L.S., C.F.D., R.F.W., R.A.F., B.S.L., F.L.R., and J.F.S. performed research; J.J. analyzed data; and J.J. and J.F.S. wrote the paper with contributions from R.A.F.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the iMicrobe database (project code CAM_P_0001000) and GenBank Transcriptome Shotgun Assembly (TSA) Sequence Database (accession nos. GELK00000000 and GEMP00000000).
1To whom correspondence should be addressed. Email: [email protected]
2Present address: Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1614842114/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1614842114 PNAS Early Edition | 1 of 10
based on their tabulation, the arrangement of vesicles in the cell cortex that may or may not contain cellulosic thecal plates (collectively the theca). Whether the dinoflagellate theca originated once or multiple times has been controversial. Dinoflagellates have left a fossil record that is one of the richest among protists, and many fossils preserve a detailed record of tabulation through reflection of thecal plates that provide insights into the history of some modern taxa, as well as extinct groups. They have also left an extensive biogeochemical record (i.e., sterols), but reconciling this evidence with poorly resolved gene phylogenies has been difficult (15, 16).

We circumvented the difficulties inherent to the sequencing of large dinoflagellate genomes by compiling a phylogenetically representative transcriptomic dataset to illuminate dinoflagellate biology and evolution. We infer a strongly resolved phylogeny for dinoflagellates and provide phylogenetic evidence for a single origin of the theca, which coincides with major predicted changes in cellulose metabolism. We propose a model for the evolution of tabulation, and show that pre-Mesozoic biomarkers that have often been associated with the group are unlikely to come from dinoflagellate sources. Three distantly related, nonphotosynthetic dinoflagellates were found to be dependent on plastid metabolism, and we propose that this dependency is likely to apply to all free-living (i.e., nonparasitic) dinoflagellates and that plastidial metabolites are likely to represent the evolutionary origin of dinoflagellate bioluminescence. Finally, we reconstruct character evolution in dinoflagellates and show that their modern-day biology was shaped by stepwise molecular, metabolic, and morphological innovations, including nuclear DNA-binding proteins of a bacterial origin.

Results and Discussion

Dinoflagellate Phylogeny.

Representative, strongly resolved phylogeny for dinoflagellates. An inability to resolve dinoflagellate relationships has hindered evolution-driven predictions of their biology and a full integration of the group’s rich fossil record with molecular-based schemes of evolution. Our aim was to overcome these limitations by erecting a framework for character mapping rooted in a representative phylogeny of all major dinoflagellate lineages. We generated transcriptomes from key species lacking deep-coverage sequence data (Noctiluca scintillans, Togula jolla, Protoceratium reticulatum, Polarella glacialis, Hematodinium spp., Amphidinium carterae, and two isolates of Amoebophrya sp. parasites together with their hosts, Karlodinium veneficum and Akashiwo sanguinea) and complemented these with data from recent sequencing projects (9, 17–19) (SI Appendix, Table S1). Sequences were added into alignments of conserved proteins previously used in eukaryotic phylogenies (20), and their orthology was verified in individual protein trees (Materials and Methods); 101 orthologous alignments with the fewest missing data were selected and concatenated into three phylogenetic matrices that differ by the root (Fig. 1A and SI Appendix, Table S1). The matrices include six dinoflagellate lineages previously absent in multiprotein phylogenies: Noctilucales, Gymnodiniaceae s.s., Togula, Akashiwo, Prorocentrales, and Dinophysiales, representing a broadly sampled, large dinoflagellate dataset. Maximum-likelihood and Bayesian inferences on all three matrices gave consistent and well-supported topologies (Fig. 1 A and B). Relationships between the outgroups and the early-branching Oxyrrhis, Hematodinium,
[Figure 1, panels A–C: multiprotein phylogenetic tree of dinoflagellates and relatives with branch-support values and alternative roots (R1–R3); caption below. Principal findings listed in the figure: Noctiluca (and Amphidinium) early-branching; Gymnodiniales paraphyletic to thecates, and Togula sister group of Gymnodiniaceae s.s.; Akashiwo sister group to thecates; Heterocapsa with other Peridiniales (not early-branching); thecates monophyletic (unambiguous support); Symbiodiniaceae nested within thecates (not intermediate).]
Fig. 1. Multiprotein phylogeny of dinoflagellates. (A) Best maximum-likelihood tree (IQ-Tree) of dinoflagellates and relatives based on 101-protein dataset (root 1 matrix, 43 species, 29,400 sites). Branches show ultrafast bootstraps (IQ-Tree)/nonparametric bootstraps (RAxML)/posterior probabilities (PhyloBayes) (dash indicates <50/50/0.5 support; filled circles indicate 100/100/1 support; dt indicates a different topology). Roots of alternative matrices (Perkinsus, root 2, 30,780 sites; and Noctiluca, root 3, 30,988 sites) are shown by arrows. (B) Overview of branch supports for principal findings (taxon and matrix abbreviations as underlined in A) in phylogenies of 12 matrices that differ by their root (R1–R3) and species presence (All, All - Din, All - Pro, All - Din & Pro; SI Appendix, Table S1). (C) Two placements of Dinophysis (Din) relative to Gon, Per, and Sym thecates and a variable position of Prorocentrum (Pro) as identified in phylogenies of the 12 matrices (SI Appendix, Table S3, provides tree topology tests).
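The three-part support labels described in the caption (for example 99/99/1, 50/-/dt, or dt/dt/.99) can be read programmatically. The following is a minimal Python sketch and not part of the original analysis: the names `Support`, `parse_field`, `parse_support`, and `is_fully_supported` are hypothetical, and the sketch assumes every label has exactly three slash-separated fields in the order IQ-Tree ultrafast bootstrap, RAxML nonparametric bootstrap, PhyloBayes posterior probability.

```python
from typing import NamedTuple, Optional


class Support(NamedTuple):
    """Branch support triple as laid out in the Fig. 1 caption (hypothetical container)."""
    ufboot: Optional[float]  # IQ-Tree ultrafast bootstrap (0-100)
    boot: Optional[float]    # RAxML nonparametric bootstrap (0-100)
    pp: Optional[float]      # PhyloBayes posterior probability (0-1)


def parse_field(field: str) -> Optional[float]:
    """Return the numeric support, or None when no value applies.

    'dt' marks a different topology and '-' marks support below the
    reporting threshold; both are mapped to None in this sketch.
    """
    field = field.strip()
    if field in {"dt", "-"}:
        return None
    return float(field)


def parse_support(label: str) -> Support:
    """Split a label such as '99/99/1' into its three support values."""
    parts = label.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated fields, got {label!r}")
    return Support(*(parse_field(p) for p in parts))


def is_fully_supported(support: Support) -> bool:
    """True for 100/100/1, the case drawn as a filled circle in the figure."""
    return support == Support(100.0, 100.0, 1.0)


if __name__ == "__main__":
    for label in ("99/99/1", "50/-/dt", "dt/dt/.99", "100/100/1"):
        print(label, parse_support(label), is_fully_supported(parse_support(label)))
```

Mapping both "dt" and "-" to `None` is a simplification; a fuller treatment would keep the two cases distinct, since a different topology and weak support are not the same observation.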
and Amoebophrya spp. are fully resolved and congruent with earlier studies (11). Core dinoflagellates are monophyletic, and several longstanding issues about their relationships can be resolved (Fig. 1).

Early position of Noctilucales and athecate paraphyly. Athecate dinoflagellates have long confounded dinoflagellate molecular phylogenies as a result of their intermixing with thecate taxa, for example within the so-called Gymnodiniales–Peridiniales–Prorocentrales (GPP) complex (21), or as a result of the unstable position of certain outliers like the Noctilucales, which have at times been placed as basal or nested deeply inside the group (12, 16, 22). Our analyses resolve these issues and help reconcile dinoflagellate morphological and molecular data in several important ways. First, we find that athecate dinoflagellates represent a paraphyletic assemblage with respect to the thecates (Fig. 1A), suggesting that earlier mixed groupings like the GPP complex are artifacts caused by limited phylogenetic resolution. Second, N. scintillans and A. carterae are the earliest-branching core dinoflagellates, with Noctiluca positioned at the base in most analyses, except for Bayesian inferences on the Root 2 matrix, in which it is also basal but together with Amphidinium (Fig. 1B). Statistical evaluation of alternative tree topologies by approximately unbiased test and expected-likelihood weights test rejects topologies other than Noctiluca representing the earliest branch of core dinoflagellates (P = 0.01; see SI Appendix, Table S2 and SI Materials and Methods). This position is reinforced by the absence of a cox3 split in Noctiluca (as detailed later) and resolves the long-problematic position of the Noctilucales (12–14, 16, 22), making them central to understanding the biology of the core dinoflagellate ancestor. Third, the previously mysterious Togula (23) is related to the Gymnodiniaceae sensu stricto (a clade represented here by Gymnodinium s.s. and Polykrikos).
Finally, Akashiwo is placed as the sister taxon to thecate dinoflagellates in all analyses, although an alternative topology as a sister to Gymnodiniaceae s.s. and Togula cannot be rejected (SI Appendix, Table S2). Statistical support for the monophyly of Akashiwo and the thecates increases when the divergent outgroup sequences are excluded in both phylogenies and tree topology tests (Fig. 1B and SI Appendix, Table S2). This suggests that the relationship is likely genuine, making Akashiwo the closest investigated athecate relative of thecate dinoflagellates (Fig. 1B). Overall, the order Gymnodiniales represents multiple paraphyletic lineages at the base of the core dinoflagellates, despite their close morphological similarity [Akashiwo, the Kareniaceae, and even one member of the Noctilucales were, until recently, classified in the genus Gymnodinium (13, 24)], suggesting that their conserved morphological characteristics were ancestral to all core dinoflagellates.

Monophyly of thecate dinoflagellates and nested position of Symbiodiniaceae. In molecular phylogenies, thecate dinoflagellates have been mixed with athecate species and are only exceptionally recovered as monophyletic in specific datasets and with low support (12, 14). Our large-scale phylogenies, which include all five major thecate groups, recover thecate dinoflagellates as monophyletic, always with maximal or near-maximal support (Fig. 1 A and B). This provides unambiguous phylogenetic support for the single origin of the dinoflagellate theca. Peridiniales, Gonyaulacales, and Symbiodiniaceae (represented by Symbiodinium and Polarella) are monophyletic in all our analyses. The long-problematic Heterocapsa, previously placed at the base of dinoflagellates (25), or away from the Peridiniales (14), is strongly resolved as the sister group to other Peridiniales (Fig. 1 A and B), a position consistent with its modified peridinialean tabulation (26).
The placement of Prorocentrum and Dinophysis, both representatives of poorly sampled and morphologically divergent orders, remains unresolved within the thecates: Dinophysis is placed at the base of the Gonyaulacales or of all thecates with low support, and the position of Prorocentrum is even more unstable (Fig. 1C). Analyses excluding Dinophysis, Prorocentrum, or both confirm the common origin and monophyly of the other thecate lineages, that is, the Gonyaulacales, Symbiodiniaceae, and Peridiniales inclusive of Heterocapsa (Fig. 1B). The branching order of these core thecate lineages is also conserved: the Gonyaulacales always branch comparatively early, and the Symbiodiniaceae are always late-branching within the thecates and consistently recovered close to the Peridiniales. This topology is weakly supported, but support increases when the problematic Prorocentrum is excluded (Fig. 1C). An exhaustive testing of alternative tree topologies (SI Appendix, Table S3 and SI Materials and Methods) rejects all topologies in which the Symbiodiniaceae appear as the sister group of other thecates at the significance level of P = 0.05 (and also at P = 0.01 except for a single dataset in which both Dinophysis and Prorocentrum are absent). Symbiodiniaceae (Symbiodinium, Polarella, and related forms) are frequently classified together with the early fossil genus Suessia as the “Suessiales” (26) or even within the “Suessiaceae” (27, 28), but, if this is correct, the Symbiodiniaceae should appear as the sister group of all other living thecates, a topology never recovered in phylogenies. Morphological evidence does not support the combination of the two groups either: although tabulations in symbiodiniaceans and suessiaceans have more series of thecal plates than most thecate dinoflagellates, determining the homologies of individual plates is not possible (26, 29). Thus, we use the family Symbiodiniaceae (26) for the clade uniting Symbiodinium, Polarella, and their modern relatives (27, 28) to separate them from the exclusively fossil Suessiaceae (Suessia and related forms). It remains possible (but not likely) that the Suessiaceae developed their theca independently, but all other fossil and modern thecate lineages seem to have originated from a common ancestor. Four independent lines of evidence support this: monophyly of the modern thecates in multiprotein phylogenies (Fig. 1), rapid emergence of fossils reflecting the possession of the theca during the early Mesozoic (30), similarities in tabulation patterns between different thecate lineages (15, 26), and the presence of theca-associated cellulases of a common evolutionary origin in modern thecates (Fig. 2).
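The distinction between monophyly of the thecates and paraphyly of the athecates can be made concrete with a small check on a tree. The sketch below is an illustration in Python, not the authors' method. The nested-tuple topology is a heavily simplified, hypothetical rendering of the relationships described in the text (one representative genus per lineage, outgroups and early-branching parasites left out, and Prorocentrum and Dinophysis omitted because their placement is unresolved), and all function names are assumptions.

```python
# Simplified, hypothetical topology of core dinoflagellates as nested tuples.
# Tips are strings; every tuple is an internal node.
TREE = (
    "Noctiluca",
    ("Amphidinium",
     ("Karenia",
      (("Gymnodinium", "Togula"),
       ("Akashiwo",
        ("Alexandrium", ("Symbiodinium", "Heterocapsa")))))),
)

THECATES = {"Alexandrium", "Symbiodinium", "Heterocapsa"}
ATHECATES = {"Noctiluca", "Amphidinium", "Karenia",
             "Gymnodinium", "Togula", "Akashiwo"}


def collect_clades(tree, out):
    """Append the tip set of every subtree to `out`; return this subtree's tip set."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*[collect_clades(child, out) for child in tree])
    out.append(tips)
    return tips


def is_monophyletic(tree, taxa):
    """A group is monophyletic if some subtree contains exactly those tips."""
    clades = []
    collect_clades(tree, clades)
    return frozenset(taxa) in clades


def is_paraphyletic_wrt(tree, group, nested):
    """True if `group` is not a clade by itself, but `nested` is a clade
    and `group` together with `nested` forms a clade."""
    group, nested = frozenset(group), frozenset(nested)
    return (not is_monophyletic(tree, group)
            and is_monophyletic(tree, nested)
            and is_monophyletic(tree, group | nested))


if __name__ == "__main__":
    print(is_monophyletic(TREE, THECATES))                 # True
    print(is_monophyletic(TREE, ATHECATES))                # False
    print(is_paraphyletic_wrt(TREE, ATHECATES, THECATES))  # True
```

On this toy tree, the athecate genera form a clade only when the thecates are included, which is what paraphyly with respect to the thecates means. Real analyses would run such checks on the inferred trees with a phylogenetics library rather than on hand-written tuples.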
Thecal Evolution and Dinoflagellate Paleohistory.

Phylogeny-driven model for theca origin, evolution, and loss. Most thecate dinoflagellates (both living and fossil) belong to the Gonyaulacales and Peridiniales, two orders with tabulations involving five to six latitudinal series of thecal plates. The details of these tabulations are consistently distinct and longstanding in the fossil record, a pattern consistent with the fact that, in molecular phylogenies, the two orders are not closely related within the thecates (Fig. 1). These patterns suggest that dinoflagellates with gonyaulacoid–peridinioid tabulations originated comparatively early: the extinct rhaetogonyaulacoids (Fig. 2A) in the Middle to Late Triassic (31) and true, modern-looking gonyaulacoids and peridinioids in the later Early Jurassic. Even if the phylogenetic position of the Dinophysiales and Prorocentrales in molecular trees remains unresolved, their tabulation patterns are morphologically divergent and unlikely to represent ancestral or transitional states: the fossil Nannoceratopsis suggests, for example, that the dinophysioid tabulation type is evolutionarily derived (Fig. 2A). As explained earlier, we suggest that the suessioid and gymnodinioid tabulations of the Symbiodiniaceae and their sister group, the Borghiellaceae (27), are also derived secondarily from gonyaulacoid–peridinioid ancestors and originated by a secondary increase in plate number (Fig. 2A); they do not represent early intermediates in theca evolution, as considered by some earlier models (15, 32). In contrast, the Late Triassic suessioid fossils such as Suessia could represent an intermediate stage between gymnodinioid and gonyaulacoid–peridinioid tabulation types or an independent example of decrease in primary plate number from gymnodinioid ancestors (Fig. 2A). All in all, paleontological and molecular phylogenetic data suggest that all living thecate dinoflagellates originated from ancestors with a gonyaulacoid–peridinioid tabulation and
argue for the derived position of the Symbiodiniaceae. The model is limited by the incompleteness of the fossil record and will be further developed by understanding the tabulations and phylogenies of little known or morphologically divergent incertae sedis thecates like Heterodinium, Thecadinium, or Cladopyxis (26). No simple scenario [plate decrease, increase, and fragmentation models (32)] can account for the evolution of thecal tabulation from a phylogeny-driven perspective (Fig. 1): secondary increase in plate number is observed not only in symbiodiniaceans but also in Pyrophacus (Gonyaulacales), a genus with a multiplated tabulation derived from ancestors with a gonyaulacoid tabulation, whereas other thecates have gone through a process of plate decrease, e.g., Dinophysiales and Prorocentrales (in the hyposome) and the Late Triassic to Middle Jurassic fossil Valvaeodinium. Our model also strongly suggests that the theca can be lost: some species in the Symbiodiniaceae and Borghiellaceae lack visible cellulose in amphiesmal vesicles altogether (28, 33), and their phylogenetic positions suggest that their thecae were lost more than once (Fig. 2A). Finally, a broad, negative relationship between the number and relative surface area of amphiesmal vesicles and the amount of cellulose contained in them emerges. The Gymnodiniales have numerous, small amphiesmal vesicles that lack cellulose, whereas the Gonyaulacales, Peridiniales, Prorocentrales, and Dinophysiales have few, large amphiesmal vesicles containing thick thecal plates, the ancestral state for all living thecate dinoflagellates (Fig. 2A). Symbiodiniaceans that have moderate plate numbers in 7–10 latitudinal series have only thin cellulosic plates, but those members of the Symbiodiniaceae and Borghiellaceae that reverted to a gymnodinioid tabulation often lack cellulose altogether (Fig. 2A) (e.g., refs. 28, 33, but see also ref. 27). Additional data, for example from the Borghiellaceae and Pyrophacus, will make it possible to test these trends, but, as things stand now, it seems that the acquisition of thick cellulosic plates within amphiesmal vesicles is constrained by their surface area and number. Subsequent reductions and losses of cellulose in the Symbiodiniaceae and Borghiellaceae relaxed this constraint, leading to a partial or complete reversal to numerous small-sized amphiesmal vesicles.

Origin of theca coincides with onset of cellulase radiation. The origin of the dinoflagellate theca is intimately linked to the biosynthesis of cellulose, its building material, but investigations into the details of cellulose production in dinoflagellates have been limited to rare ultrastructural and labeling studies (34). Recently, production of a highly expressed cellulase [dCel1 from Glycosyl hydrolase family 7 (GH7)] was shown to be coupled to the cell cycle progression in Crypthecodinium cohnii and was immunolocalized to the cell wall in several dinoflagellates, suggesting an important role in cellulose processing during division (31). We identified multiple diversified paralogs of GH7 genes in all thecates and one to three closely related paralogs in four athecate dinoflagellates in our dataset (SI Appendix, Table S4). A eukaryote-wide phylogeny of 184 slow-evolving GH7 protein sequences (Fig. 2B and SI Appendix, Fig. S1 and SI Materials and Methods) suggests that the thecate
[Figure 2, panels A–C: (A) schematic of tabulation types (gymnodinioid, gonyaulacoid–peridinioid, suessioid, rhaetogonyaulacoid, nannoceratopsioid, dinophysioid, prorocentroid) showing the single origin of the cellulosic theca and theca reduction or loss; (B) GH7 cellulase tree, including dCel1 and dCel2; (C) plot across geological periods (Proterozoic to Cretaceous) of the relative number of species (a and b) and the percentage of samples by period with triaromatic (TA) dinosteroids (c), with hypotheses H1 and H2 and the LCA of modern thecates marked; caption below.]
Fig. 2. Thecal evolution and dinoflagellate paleohistory. (A) Phylogeny-driven model of changes between major modern and fossil (crosses) tabulational types. Gymnodinioid tabulation with numerous small, empty amphiesmal vesicles is ancestral and gave rise to the gonyaulacoid–peridinioid tabulation with a few large, cellulose-rich thecal plates. Suessioid and gymnodinioid tabulations in modern Symbiodiniaceae and Borghiellaceae (asterisk) are derived independently of the standard gymnodinioid and Triassic suessioid tabulations (Suessia), and are characterized by decrease or loss of cellulose content. Prorocentroid and dinophysioid tabulations are derived from the gonyaulacoid–peridinioid tabulation (the latter probably via a nannoceratopsioid intermediate). Triassic suessioid and rhaetogonyaulacoid tabulations may represent evolutionary intermediates or independent experiments in thecal plate reduction. (B) Maximum-likelihood phylogeny (IQ-Tree) of 184 eukaryotic GH7 proteins reveals cellulases in athecate dinoflagellates (underlined) and their radiation in the thecates (color-coded). Black rectangles indicate 50% reduction in branch length. Known GH7 cellulases in P. lunula (dCel1) and Lingulodinium polyedrum (dCel2) are shown. Further details are provided in SI Appendix, Fig. S1 and Table S4. (C) Alternative hypotheses (H1 and H2) on the first emergence of triaromatic dinosteranes attributable to dinoflagellates or their direct ancestors (H2 is preferred by our data). Relative species numbers of dinoflagellates (a) and acritarchs (b) and percentage of dinosterane-positive samples (c; see ref. 35 for sample data) from the Proterozoic (green), Paleozoic (red), and Mesozoic (blue) are shown together with the predicted emergence of the last common ancestor (LCA) of modern thecates. Reprinted with permission from refs. 26, 28 (www.sciencedirect.com/science/journal/14344610), 35 (permission conveyed through Copyright Clearance Center, Inc.), and 74.
paralogs are derived by multiple rounds of duplication followed by selective lineage sorting. The branching pattern is poorly resolved, but indicates a common origin for most thecate GH7 proteins together with sequences from the athecate Karenia brevis and A. carterae and algae Bigelowiella natans and Thalassiosira oceanica (the latter two are nested within dinoflagellates and were presumably spread horizontally). Some duplications in the thecate GH7 occurred at the level of genera or orders, but at least eight and possibly twice as many paralogs apparently originated earlier (SI Appendix, Fig. S1), presumably in the common ancestor of all thecates. These observations suggest that the radiation of GH7 genes in thecate dinoflagellates is linked to the evolutionary origin and subsequent evolution of the theca. The GH7 protein identified in K. brevis (SI Appendix, Table S4) likely corresponds to the dCel1 homolog previously immunolocalized in the cell cortex (31). Interestingly, A. sanguinea, the likely sister group of thecate dinoflagellates, is immunopositive for that same protein (31), although the corresponding GH7 sequence remains unknown (our mixed transcriptome of Akashiwo cells infected by Amoebophrya sp. lacks it). The function of GH7 enzymes in athecate species has not been studied, but they are likely involved in the metabolism of cellulose or related polysaccharides, which may have been an important precondition for the acquisition of the cellulosic thecal plates. Unlike cellulose breakdown, cellulose biosynthesis in dinoflagellates is not understood at the molecular level (34). We identified three types of algal cellulose synthase (CESA-like) homologs in thecate and athecate dinoflagellates, candidates for elucidating their cellulose biosynthesis (SI Appendix, Table S4).

Dinosterol is absent in deep-branching dinoflagellates.
The diversity and abundance of dinoflagellates in Mesozoic and younger sediments correlate with levels of triaromatic dinosterols, derivatives of the fossilizing biomarker 4-methyl sterol, dinosterol (4α,23,24R-trimethyl-5α-cholest-22E-en-3β-ol) (15, 35). Dinosteranes also occur in Late Proterozoic and early Paleozoic sediments that are often enriched with acritarchs (microfossils of uncertain origin, some of which have been speculatively attributed to dinoflagellates or their direct ancestors), and this has led to the proposal that dinoflagellates are ancient and acquired dinosterol biosynthesis early in their evolution (35–37). We compared this hypothesis (Fig. 2C, H1) to a Mesozoic origin of the dinoflagellate dinosterol (Fig. 2C, H2) by mapping sterol distribution onto our updated phylogeny of dinoflagellates (Fig. 1). Dinosterol and other 4-methyl sterols are absent from all dinoflagellate relatives with known sterol profiles, including ciliates, perkinsids, apicomplexans, Chromera, and Vitrella, but also Oxyrrhis (38) and Amoebophrya, which likely only acquires 4-methyl sterols from its host (39, 40). In core dinoflagellates, 4-methyl sterols are ubiquitous, but dinosterol itself is absent in three of their earliest branches: Noctiluca, Amphidinium, and the Kareniaceae (e.g., refs. 41–43). Gyrodinium dominans, likely another early core dinoflagellate (14), also lacks dinosterol (38). This suggests that dinosterol appeared first in the last common ancestor of Gymnodiniaceae s.s., Akashiwo, and thecate dinoflagellates (although broader testing for its presence in early-branching dinoflagellates is needed). We suggest that pre-Mesozoic dinosteranes are unlikely to originate from dinoflagellates for four reasons. First, dinosteranes from the Late Proterozoic and early Paleozoic greatly predate unambiguous dinoflagellate fossils, and dinosterol presence in modern species is restricted to close relatives of the thecates (Fig. 1), which originated in the early Mesozoic.
Second, Paleozoic acritarch microfossils bear no demonstrable morphological similarity to dinoflagellates (26). Third, dinosterane prevalence in Paleozoic and Proterozoic samples is highly variable compared with Mesozoic samples (35). Dinosteranes seem to be entirely absent from the Carboniferous and Permian (35), a discontinuity that contrasts with their almost universal preservation in Mesozoic and younger sediments and species. Finally, small amounts of dinosterol are known from a modern species of diatom (44), and
traces of dinosteranes are also present in Archean bitumens, where dinoflagellates could not possibly have existed (45). All this suggests that different organisms in different geological eras evolved dinosterol biosynthesis independently of dinoflagellates and that dinosterol production by certain acritarchs ended with their mid-Paleozoic extinction. We also note that the phylogenetic distance between the origin of dinosterol-producing athecates and the origin of modern thecate dinoflagellates (see Fig. 1) is consistent with the time lapse between the Early Triassic dinosterane increase and the appearance of modern thecate orders in the Early Jurassic sediments (Fig. 2C). We therefore suggest that abundant dinosteranes in some Scythian (Early Triassic) sediments predating the earliest thecate fossils (Middle Triassic) (35) are derived from athecate dinoflagellates alone, which gained the ability to produce dinosterols near the Permian/Triassic boundary and became abundant shortly after it (Fig. 2C, H2).
Plastid Metabolism and Dependency.
Plastid metabolism in nonphotosynthetic dinoflagellates. Approximately half of the described dinoflagellate species are nonphotosynthetic and are traditionally considered to lack plastids. The other half contains a photosynthetic peridinin-pigmented plastid that, in some lineages, has been replaced by other types of plastids. The peridinin plastid was inherited from the plastid in the common photosynthetic ancestor of dinoflagellates and apicomplexans (46, 47), but whether cryptic, nonpigmented plastids have been retained in nonphotosynthetic dinoflagellates remains contentious: Crypthecodinium and Oxyrrhis appear to contain plastid-derived genes (48, 49), whereas Hematodinium lacks all traces of the organelle (50). We investigated whether plastid and cytosolic pathways for isoprenoid, tetrapyrrole, and fatty acid biosynthesis were present in two distantly related nonphotosynthetic dinoflagellates, N. scintillans and O. marina, as well as in Dinophysis acuminata, a fundamentally nonphotosynthetic species that nevertheless carries kleptoplastids. For each metabolic enzyme in these pathways, we constructed a single-protein phylogeny and classified its origin as plastidic (in a clade with photosynthetic eukaryotes only), cytosolic (in a clade containing heterotrophic eukaryotes), or bacterial (in a clade with bacteria, putative recent horizontal transfer), a methodology informed by published localizations in model eukaryotes (e.g., ref. 51) and by in silico targeting predictions in selected proteins (Fig. 3 and SI Appendix, SI Materials and Methods).
All three investigated dinoflagellates contain an isoprenoid
pathway of plastid origin (all seven enzymes are present in Noctiluca and Dinophysis) and lack the cytosolic pathway variant (Fig. 3A). This is exemplified by their retention of cyanobacterial IspC enzymes (Fig. 3B), which branch among orthologs from photosynthetic dinoflagellates and other algae. Similarly, all three nonphotosynthetic dinoflagellates contain multiple components of the plastid tetrapyrrole pathway (an essentially complete enzyme set is present in Noctiluca and Dinophysis), but only two to three components of that in mitochondria and the cytosol. Comparing our data to the Symbiodinium minutum genome, we propose that a single tetrapyrrole pathway of a predominantly plastid origin that initiates from glutamate (Fig. 3A, GTR and GSA) is present in all core dinoflagellates, a feature typical of eukaryotic plastids [mitochondrial aminolevulinic acid (ALA) synthase is present in the early-branching Hematodinium, Oxyrrhis, and Perkinsus (50)]. None of the three nonphotosynthetic dinoflagellates contains proteins for plastid fatty acid biosynthesis, suggesting that this pathway is dispensable in dinoflagellates in the absence of photosynthesis (Fig. 3A; FabI in Dinophysis is unusual; SI Appendix, SI Materials and Methods). Genes for plastid iron–sulfur cluster assembly (SufB, C, D), the ferredoxin (Fd) redox system [i.e., Fd NADP+ reductase (FNR)], and triose phosphate membrane translocators (TPTs) are also present in the three species (SI Appendix, Table S5).
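The three-way classification of enzyme origin described above can be sketched as a small decision rule. This is a minimal illustration only: the label strings, the `Origin` enum, the `classify_origin` function, and the order in which the rules are applied are all assumptions made for this example, and the actual analysis was additionally informed by published localizations and in silico targeting predictions.

```python
from enum import Enum


class Origin(Enum):
    PLASTIDIC = "plastidic"
    CYTOSOLIC = "cytosolic"
    BACTERIAL = "bacterial"
    UNRESOLVED = "unresolved"


def classify_origin(clade_members):
    """Classify a protein by the kinds of taxa it groups with in its tree.

    clade_members: iterable of hypothetical labels describing the other
    sequences in the clade that contains the query protein. The three
    labels used here are illustrative, not taken from the paper.
    """
    kinds = set(clade_members)
    if not kinds:
        return Origin.UNRESOLVED
    # Clade made up of photosynthetic eukaryotes only -> plastidic.
    if kinds == {"photosynthetic_eukaryote"}:
        return Origin.PLASTIDIC
    # Clade containing heterotrophic eukaryotes -> cytosolic.
    if "heterotrophic_eukaryote" in kinds:
        return Origin.CYTOSOLIC
    # Clade made up of bacteria -> putative recent horizontal transfer.
    if kinds == {"bacterium"}:
        return Origin.BACTERIAL
    # Mixed cases not covered by the stated rules are left unresolved
    # (an assumption of this sketch).
    return Origin.UNRESOLVED


print(classify_origin(["photosynthetic_eukaryote"] * 4).value)
```

Leaving mixed clades unresolved, rather than forcing them into a category, is a conservative choice for this sketch; such cases would presumably need manual inspection of the tree.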
Janouškovec et al. PNAS Early Edition | 5 of 10
Plastid protein targeting and genome loss. We further investigated 56 protein sequences of plastidic origin in Noctiluca, Oxyrrhis, and Dinophysis (SI Appendix, Table S5). Most are incomplete, but seven are complete (they contain a partial spliced leader at the 5′ terminus of the corresponding transcript), and another 28 carry an extension of more than 50 aa at their N terminus. Proteins from the latter two categories were tested for the presence of plastid-targeting peptides in silico, and 17 of them carry bipartite targeting signatures comprising signal and transit peptides (SI Appendix, Table S6). Thirteen of these contain a phenylalanine at or near the predicted signal peptide cleavage site, and three Oxyrrhis proteins contain a second transmembrane region, all characteristics of targeting to plastids but not to other subcellular compartments in dinoflagellates (52, 53). In silico predictions have limited accuracy, but the consistent presence of N-terminal extensions and signal peptides in proteins is congruent only with a plastidic origin. For example, cyanobacterial Fds in Noctiluca and Dinophysis with four conserved cysteine residues required for Fe-S formation contain N-terminal extensions with signal and transit peptides for plastid targeting (truncated in Oxyrrhis; SI Appendix, Fig. S2). Noctiluca and Dinophysis also contain a plastid-targeted Fd NADP+ reductase (i.e., FNR; SI Appendix, Tables S5 and S6), suggesting that their Fd–FNR redox system might have a similar function to that in the nonphotosynthetic plastid of Plasmodium (54). SufB and
[Fig. 3 image. Panels: (A) Metabolic dependency on plastids; (B) Plastid IspC phylogeny; (C) Core metabolism in non-photosynthetic plastids; (D) Plastid dependency.]
Fig. 3. Plastid metabolism and dependency in nonphotosynthetic dinoflagellates. (A) Phylogeny-driven reconstruction of plastid and nonplastid variants of core metabolism (isoprenoid, tetrapyrrole, and fatty acid biosynthesis) in genomes (marked as “G”) or transcriptomes (“T”) of dinoflagellates and relatives. Individual enzymes (SI Appendix, Table S5) were classified by protein phylogenies and color-coded as to their presence/absence and origin. The data suggest that Oxyrrhis, Noctiluca, and Dinophysis are metabolically dependent on plastids. Metabolite (Met.) uptake was summarized from the literature. (B) Maximum-likelihood phylogeny (IQ-Tree) reveals IspCs of cyanobacterial origin in nonphotosynthetic dinoflagellates and relatives (bold); ultrafast bootstraps at branches are shown (>50 shown; ≥95 highlighted; filled circles, 100). (C) Three grades in functional organization of core metabolic pathways in nonphotosynthetic plastids in dinoflagellates (blue) and relatives (“P” represents parasites). (D) Model for evolutionary dependency on plastids in dinoflagellates and relatives, which is applicable to other eukaryotes. Ancestral dependency (marked as “d”) on plastid metabolism (loss of cytosolic isoprenoid biosynthesis; later reinforced by the loss of C4 tetrapyrrole biosynthesis in some taxa) led to retention of plastids in all free-living and many parasitic descendants. The dependency can be transferred onto a new plastidial symbiont (Kareniaceae) or host organism (in parasites dependent solely on host-derived metabolites); only the latter leads to an outright loss of the plastid.
ClpC have essential functions in plastids, but, in apicomplexans, they also constitute key barriers to the loss of the plastid genome (47, 55). SufB carries a bipartite plastid-targeting signature in Oxyrrhis (SI Appendix, Table S6) and an apparently incomplete N-terminal extension in Dinophysis, and is encoded on GC-rich contigs in all three species (55–66.7% GC), all typical of a nuclear but not plastidial localization. Similarly to sufB, all three nonphotosynthetic dinoflagellates contain plastid-like clpC fragments on GC-rich, likely nuclear contigs (SI Appendix, SI Materials and Methods). Both sufB and clpC are also nucleus-encoded in Perkinsus and Symbiodinium (47, 56), and this indicates that both genes were relocated from the plastid genome early in their evolution. Because plastids in photosynthetic dinoflagellates encode only photosystem genes (7, 56) and ancestral reconstruction identifies no additional barriers to genome loss (47), evidence increasingly indicates that plastid genomes in nonphotosynthetic dinoflagellates and Perkinsus were lost with the loss of photosynthesis.
Principles of plastid dependency in dinoflagellates and eukaryotes. Noctiluca, Oxyrrhis, and Dinophysis are metabolically dependent on cryptic plastids for the biosynthesis of isoprenoid units, and Noctiluca and Dinophysis for tetrapyrroles; evidence for this includes multiple proteins in pathways of plastidial origin (as determined by phylogenies), presequences for plastid targeting, the absence of cytosolic pathway variants, and plastid localization of homologs in model species (Fig. 3 and SI Appendix, Tables S5 and S6). A full relocalization of either pathway to the cytosol is unprecedented in any organism, and the dependency on plastid pathways is supported by the fact that we obtain similar results from three distantly related heterotrophs and also from closely related phototrophs, one of which has genome data available (9).
Additional plastid pathways (the Fd redox system and Fe-S assembly) are present in Noctiluca, Oxyrrhis, and Dinophysis; these are essential for the function of the plastid but not for the host cell. Metabolism of amino acids remains insufficiently known in dinoflagellates and is absent in the plastids of apicomplexans (51). A comparison of nonphotosynthetic plastids in the broader group (Fig. 3C) reveals three functional grades in core metabolism that reflect dispensability of individual pathways: the biosynthesis of isoprenoid units (and required cofactors) is ubiquitous and is the only core plastid pathway in piroplasmid apicomplexans and Perkinsus, whereas plastid fatty acid biosynthesis was retained only in apicomplexans and Alphamonas (47) (Fig. 3 A and C).
The pattern of plastid dependency in dinoflagellates parallels
that in apicomplexans and chrompodellids [chromerids and colpodellids (47)] and reinforces conclusions that their common ancestor had a plastid (46) and was reliant on it for isoprenoid units after it lost the capability to synthesize them in the cytosol (47). Despite rare secondary losses of plastids in certain parasites (50, 57) and ongoing uncertainties about plastid presence in some organisms (e.g., gregarines, Psammosa, Eudubosquella), plastids are indispensable in all free-living members of this group yet examined (Fig. 3A) (47, 48), including multiple uncultured forms (58). This pattern suggests that the metabolic dependency on plastids in free-living species cannot be bypassed by obtaining the relevant compounds from the environment or ingested prey (Fig. 3D). Rather, it has only increased with time as redundant cytosolic and mitochondrial pathways continue to be lost (Fig. 3A) (47). For example, the loss of mitochondrial delta-ALA in core dinoflagellates has extended their plastid dependency to tetrapyrrole biosynthesis (Fig. 3A), much like in apicomplexans and chrompodellids (47). Most parasites retain plastids (59, 60), but their dependency on the organelle can be reduced or bypassed completely by the uptake of host metabolites (Fig. 3A) (57); searches in transcriptomic data indicate that Amoebophrya parasites lack the plastid, which was likely lost in their common ancestor with Hematodinium (50). Based on these patterns, we suggest that all free-living (but not all parasitic) dinoflagellates rely on plastid organelles that are derived from the ancestral peridinin plastid
(Fig. 3D). These include phagotrophs (Noctiluca), osmotrophs (Crypthecodinium), and species with kleptoplastidy (Dinophysis) and new endosymbionts (Durinskia) except where these endosymbionts have substituted metabolite dependency on the ancestral plastid (likely in the Kareniaceae; Fig. 3D). This provides a broad rationale for why dinoflagellates with diatom endosymbionts contain two types of plastid isoprenoid and tetrapyrrole pathways (61) (Fig. 3 B and D). It also explains why Dinophysis contains a plastid Fd and TPT of a dinoflagellate ancestry (62): both proteins contain bipartite targeting presequences with signal peptides (SI Appendix, Fig. S2 and Table S6; the latter was truncated in ref. 62), suggesting they are targeted into a cryptic three-membrane plastid (Fig. 3D), not the kleptoplastid as previously argued (62). Finally, we emphasize that a complete loss of a plastid organelle has never been confirmed in free-living eukaryotes, and we posit that this would be hard to achieve in established endosymbioses given the dependency patterns that exist in free-living dinoflagellates and related organisms (Fig. 3 A and D) (47).
Plastid tetrapyrroles and the evolution of bioluminescence. Several species of dinoflagellates are bioluminescent (63). In the photosynthetic species Pyrocystis lunula, the light-emitting compound luciferin has an open tetrapyrrole structure thought to be synthesized from the structurally similar chlorophyll a (64): the organism incorporates radioactively labeled chlorophyll precursors into chlorophyll and luciferin, suggesting that their biosynthesis is linked (65). However, other bioluminescent dinoflagellates like Noctiluca, Protoperidinium, and certain Polykrikos species are nonphotosynthetic (63) and not known to synthesize chlorophyll. The prediction that they acquire chlorophyll from their prey (66) is inconsistent with prey-independent bioluminescence in at least one of them, Protoperidinium crassipes (67).
Our finding of the plastid tetrapyrrole pathway in Noctiluca, which also leads to the precursors of chlorophyll, offers an alternative explanation of luciferin presence: it may be obtained by biosynthesis rather than scavenging, at least in some species. The plastid tetrapyrrole pathway is apparently indispensable as a key requirement for heme synthesis in all core dinoflagellates (Fig. 3A), and could therefore account for luciferin production in any bioluminescent dinoflagellate, irrespective of the presence of photosynthesis. This biosynthesis scenario also opens the possibility that luciferin is not derived via chlorophyll per se, but via an earlier intermediate in its biogenesis, perhaps a chlorophyllide or chlorin-like tetrapyrrole. Although this remains to be tested experimentally, our finding of the plastid tetrapyrrole pathway supports the possibility that bioluminescence in nonphotosynthetic dinoflagellates relies on a biosynthetic machinery repurposed from heme and chlorophyll production.
Character Evolution in Dinoflagellates.
Nuclear evolution: Stepwise horizontal gene gain. Dinoflagellates have unique nuclei that have lost bulk nucleosomal DNA packaging, and instead condense DNA by using two types of basic proteins that are different from histones. Dinoflagellate/viral nucleoproteins (DVNPs) are similar to uncharacterized proteins from phycodnaviruses, are distributed in all dinoflagellates yet examined, and represent a family of basic proteins with high DNA-binding affinity (4). In contrast, dinoflagellate histone-like proteins (HLPs) are of bacterial origin and have been found only in certain core dinoflagellate species; they are primarily detected at the chromosome periphery, where they are predicted to organize extended DNA loops during transcription (68). We identified DVNPs in all transcriptomes in our dataset, confirming their ubiquitous distribution among dinoflagellates. Our searches also confirm that HLPs are absent in all early-branching taxa (Oxyrrhis, Hematodinium, and Amoebophrya spp.) and are ubiquitous in core dinoflagellates. Unexpectedly, however, we found that HLPs in Noctiluca, Amphidinium, Togula, and Gymnodinium are dissimilar in sequence to HLPs in other dinoflagellates despite their similar length
and structure (SI Appendix, Table S4). We reconstructed the phylogeny of dinoflagellate HLPs together with a representative selection of their closest orthologs, the bacterial HU-like proteins (HLPs in other eukaryotes are not closely related to those in dinoflagellates). The outcome confirms a wide separation between the dinoflagellate type known previously, HLP-I [e.g., HCc3 in Crypthecodinium (68)], and a second type, HLP-II (Fig. 4 and SI Appendix, Fig. S3). Interestingly, HLP-I and HLP-II have mutually exclusive distributions (Fig. 4 and SI Appendix, Fig. S3), which suggests that HLP-II rather than HLP-I was ancestral to core dinoflagellates. HLP-I most likely appeared in the ancestor of Kareniaceae and other core dinoflagellates (it either temporarily coexisted with HLP-II, followed by selective loss, or spread horizontally later between the Kareniaceae and thecates and Akashiwo; Fig. 5). Because HLP-I and HLP-II are monophyletic but not closely related to each other, and HU-like proteins are present in a wide range of bacterial phyla, the dinoflagellate HLPs are likely derived from HU-like proteins and not vice versa (this is in contrast to DVNPs, in which the direction of transfer with phycodnaviruses cannot be established). The unique molecular architecture of dinoflagellate nuclei thus resulted from at least three independent waves of protein gain (Fig. 5). The recruitment of DVNPs took place in the group’s ancestor, leading to a decrease in the nuclear protein:DNA ratio and potentially the loss of bulk nucleosomal packaging and increase in the genome size in dinoflagellates. HLPs were acquired later than DVNPs by at least two independent horizontal transfers from different bacterial donors. The initial gain of HLPs in the ancestor of core dinoflagellates coincided with the emergence of liquid crystalline chromosomes with arched DNA fibrils, which are condensed permanently in most species.
Organelle evolution: Plastid reduction and mitochondrial cox3 split. Evidence of a dependency on plastids in nonphotosynthetic dinoflagellates (Fig. 3) corroborates earlier conclusions that the common ancestor of dinoflagellates and apicomplexans was photosynthetic (46) and dependent on plastid-generated isoprenoids (47). Our phylogeny also supports the prediction that more than a dozen descendant lineages of this dinoflagellate–apicomplexan ancestor have lost photosynthesis (46, 69). At least two parasites, Cryptosporidium and Hematodinium, have lost the plastid outright, but this is not the case in other parasites or in any free-living lineages that have been investigated in sufficient detail (six independent transitions to heterotrophy). We thus posit that plastid loss in dinoflagellates and apicomplexans is less frequent than plastid retention after the loss of photosynthesis, and is limited to a few parasites (47). After the split with apicomplexans, and at the latest by the time Amphidinium diverged, the dinoflagellate plastid acquired the photosynthetic carotenoid peridinin, peridinin–chlorophyll binding proteins, and a reduced, minicircular genome (6).
Our results suggest that during this transition the plastid sufB and clpC genes (key barriers to plastid genome loss in apicomplexans) were relocated to the nucleus in dinoflagellates. This made the dinoflagellate plastid genome dispensable in the absence of photosynthesis, likely explaining why all heterotrophic representatives studied to date appear to lack it. In at least four distantly related photosynthetic
[Fig. 4 image. Phylogeny of bacterial HU-like proteins with dinoflagellate HLP-I (all other core dinoflagellates, including HCc3) and HLP-II (Togula jolla, Gymnodinium catenatum, Amphidinium carterae, Noctiluca scintillans); NCBI reference sequences ACJ04919 and AAM97522 are marked.]
Fig. 4. Evolution of histone-like proteins. Phylogeny of bacterial (HU-like) and dinoflagellate HLPs reveals a dinoflagellate-type histone-like protein, HLP-II, in early-branching core dinoflagellates. HLP-II has a mutually exclusive distribution with HLP-I (e.g., the characterized HCc3 in C. cohnii, in bold). Further details are provided in SI Appendix, Fig. S3 and Table S4.
[Fig. 5 image. Characters mapped across dinoflagellates and relatives: plastid presence, oligoU-tailing in plastid mRNAs, and RuBisCO type II; spliced leader trans-splicing in mRNAs; DVNPs with loss of bulk nucleosomal DNA packaging; shallow and true cingulum and sulcus; 4-methyl sterols and dinosterol; HLP-I and HLP-II; liquid crystalline chromosomes (temporary, permanent); striated strand and wave on the transverse flagellum; mitochondrial cox3 split and trans-splicing; minicircular plastid DNA, peridinin, and PCPs; theca and GH7 cellulase expansion. Legend: present (different colors), absent, not applicable.]
Fig. 5. Model for character evolution in dinoflagellates. Ancestral character states (filled circles) of conserved traits are reconstructed on the consensus phylogeny of dinoflagellates and their relatives by parsimony (arrowheads). Dotted branches in the thecate lineages indicate uncertain placement. Gaps indicate missing data, and “not applicable” denotes plastid genome absence or the presence of a different plastid genome type (Kareniaceae). The vertical square bracket indicates an evolutionary range in which traits emerged. Photos of dinoflagellates (by G. S. Gavelis), left to right: Kofoidinium sp. (Noctilucales), Nematodinium sp. (Gymnodiniaceae s.s.), Neoceratium praelongum (Gonyaulacales), Dinophysis miles (Dinophysiales), and Heterocapsa sp. (Peridiniales).
dinoflagellates, the expression of plastid genes is accompanied by substitutional editing of corresponding mRNAs (Fig. 5) (7). The origin of plastid editing is, however, uncertain: it appeared some time after the divergence of apicomplexans and chrompodellids (70) and possibly became more widespread after the divergence of Amphidinium (71), but pinpointing its origin more precisely will require an analysis of deep-branching photosynthetic dinoflagellates such as Spatulodinium pseudonoctiluca (13).
The mtDNA in at least five lineages of core dinoflagellates
including Amphidinium contains a unique feature: cox3 is split in the same region into two fragments that are trans-spliced at the RNA level (8, 72). The split is absent in Hematodinium and earlier diverging species, but, to our knowledge, its status in the Noctilucales was not known until now (Fig. 5). We identified a cox3 contig in the Noctiluca transcriptome corresponding to a full-length protein (terminated by a canonical stop codon rare in the group; SI Appendix, SI Materials and Methods). Mapping individual RNA read pairs onto the contig demonstrated continuous transcription across the split region and provided no support for the existence of two transcripts and their trans-splicing. PCR amplification by using Noctiluca genomic DNA as a template produced a single product spanning both sides of the cox3 split site, the identity of which was confirmed by sequencing (SI Appendix, SI Materials and Methods). Because the phylogenetic distribution and the unique character of the cox3 split are indicative of a single evolutionary origin, the uninterrupted cox3 in Noctiluca corroborates the early position of the Noctilucales among core dinoflagellates (Fig. 1).
Character map: Framework for evolutionary and functional predictions. By using parsimony, we reconstructed ancestral character states of major conserved morphological and molecular traits at different points of the dinoflagellate phylogeny (Fig. 5 and SI Appendix, SI Materials and Methods). Newly mapped transitions include the gain of 4-methyl sterols, dinosterol, nuclear HLP-I and II, the mitochondrial cox3 split, the theca, and the gain of multiple paralogs of GH7 cellulases. Two additional transitions map at the common ancestor of Amphidinium and later-diverging taxa: the gain of condensed liquid crystalline chromosomes throughout the life cycle and the gain of a proteinaceous striated rod in the transverse flagellum, which produces a strongly pronounced flagellar wave (Fig. 5).
The corresponding characteristics in the Noctilucales are little understood as yet: chromosomes in one of their life stages, the trophont, are relaxed, and the transverse flagellum in their gametes is trailing, wave-less, and contains only a thin filament in place of the striated rod (73). Detailed analysis is required to determine whether these states represent true evolutionary intermediates or secondary modifications associated with the unusual morphology of this order. The origin of other dinoflagellate characteristics was established previously and is reinforced within our framework: gain of plastids, RuBisCO form II and oligoU-tailing in plastid mRNAs before the split with apicomplexans (46), and the acquisition of spliced leader trans-splicing of mRNAs in their common ancestor with perkinsids (Fig. 5). DVNPs, ubiquitous in the species in our dataset, are ancestral to dinoflagellates and associated with changes in protein:DNA ratio and genome size. The ancestor of syndinians and core dinoflagellates had a life stage with a shallow sulcus and cingulum (flagellar grooves), the latter dividing the cell into an upper episome and a lower hyposome, a transitional morphology between
short flagellar grooves in Oxyrrhis and Psammosa and deeply en-graved perpendicular flagellar grooves in core dinoflagellates (Fig.5). Altogether, most transitions map to the branch correspondingto the ancestor of core dinoflagellates, but other characteristicsare scattered widely along the evolutionary backbone (Fig. 5).Thus, the ecological success of dinoflagellates has resulted from aseries of independent changes to the morphology, metabolism,and molecular biology of their ancestors.
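The parsimony mapping of a single trait described above can be illustrated with a minimal sketch of the bottom-up pass of Fitch's algorithm. This is an illustration only: the text does not state which parsimony implementation was used, and the toy topology, taxon labels, and trait coding (theca present = 1, absent = 0) below are assumptions chosen for demonstration, not the dataset behind Fig. 5.

```python
# Minimal Fitch parsimony sketch (bottom-up pass only) on a rooted binary tree.
# The tree and trait coding are hypothetical, chosen only to illustrate the method.

def fitch(tree, states):
    """Return (candidate state set at the root of `tree`, number of changes).

    `tree` is either a leaf name (str) or a 2-tuple of subtrees.
    `states` maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no change needed at this node.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical topology: (Noctiluca, (Amphidinium, (thecate_A, thecate_B)))
toy_tree = ("Noctiluca", ("Amphidinium", ("thecate_A", "thecate_B")))
theca = {"Noctiluca": 0, "Amphidinium": 0, "thecate_A": 1, "thecate_B": 1}

root_states, changes = fitch(toy_tree, theca)
print(root_states, changes)
```

In this toy case the root is reconstructed as athecate and a single gain is inferred, which mirrors the kind of single-origin inference drawn for the theca; a full reconstruction would also need the top-down pass to assign states to internal nodes.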
Conclusions
We used sequence data to illuminate dinoflagellate biology and evolution. Evidence from our multiprotein phylogenies resolves numerous issues relating to dinoflagellate relationships, provides strong support for the single origin of the theca, and helps reconcile several apparent contradictions in dinoflagellate fossil, biogeochemical, and molecular data (Figs. 1 and 2). The origin of the theca coincides with a radiation of cell wall-localized cellulases involved in cell division (Fig. 2B). Plastid biosynthetic pathways exist in the nonphotosynthetic Noctiluca, Oxyrrhis, and Dinophysis, and cytosolic pathway variants do not (Fig. 3). This suggests that all free-living dinoflagellates are metabolically dependent on plastids that have taken over important cellular functions, apparently early in the evolution of the group; plastidial tetrapyrrole biosynthesis may also explain the existence of bioluminescent luciferin in nonpigmented dinoflagellates. The origin of the liquid crystalline nuclei coincides with the acquisition of bacterial histone-like proteins, which occurred in two distinct evolutionary phases (Fig. 4), suggesting that horizontal gene transfers were the ultimate origin of key dinoflagellate features. By producing a map of the major transitions in the evolutionary history of dinoflagellates (Fig. 5), we provide a predictive framework that will facilitate the investigation of many aspects of the group’s cell biology (nuclear organization, plastid evolution), molecular biology, and paleobiology.
Materials and Methods
RNA was extracted by RNAqueous kit or TRIzol Plus RNA kit. Paired-end 50-bp or 100-bp Illumina sequence reads were generated and assembled in Trinity version 2 or as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project pipeline (19). Phylogenetic matrices were prepared from alignments in MAFFT version 7.215 stripped of hypervariable sites in Block Mapping and Gathering with Entropy version 1.1. Phylogenies were computed in IQ-Tree (1,000 ultrafast bootstraps), RAxML version 8 (300 nonparametric bootstraps), and Phylobayes (where applicable). Plastid targeting signals were analyzed in SignalP 4.1 (D-score cutoff 0.45) and ChloroP 1.1 at 0.45 cTP-score cutoff. Species culturing and sequencing, phylogenetic inferences, and analyses of plastid metabolism and protein targeting are detailed in SI Appendix, SI Materials and Methods.
ACKNOWLEDGMENTS. We thank Bill MacMillan for technical support and Patrick Keeling for facilities and support. This work was supported by a University College London Excellence Fellowship (to J.J.), a CIFAR Global Scholar Fellowship (to J.J.), a University of British Columbia Four-Year PhD Fellowship (to J.J.), Gordon and Betty Moore Foundation Grant 2637 to the National Center for Genome Resources (NCGR), National Science and Engineering Research Council of Canada Grant NSERC 2014-05258 (to B.S.L. and G.S.G.), a Tula Foundation Grant to Patrick Keeling (F.B.), the Centre for Microbial Biodiversity and Evolution (F.B.), Australian Research Council Grant DP1093395 (to S.G.G.), Science Foundation Ireland Grant 13/SIRG/2125 (to S.G.G.), NSF Grant EF-0629625 (to C.F.D. and T.R.B.), and Canadian Institute for Health Research Grant MOP-42517 to Patrick Keeling. MMETSP samples were sequenced, assembled, and annotated at NCGR. This is ESS contribution no. 20160099.
1. Gómez F (2012) A quantitative review of the lifestyle, habitat and trophic diversity of dinoflagellates (Dinoflagellata, Alveolata). Syst Biodivers 10(3):267–275.
2. de Vargas C, et al.; Tara Oceans Coordinators (2015) Ocean plankton. Eukaryotic plankton diversity in the sunlit ocean. Science 348(6237):1261605.
3. Velo-Suárez L, Brosnahan ML, Anderson DM, McGillicuddy DJ, Jr (2013) A quantitative assessment of the role of the parasite Amoebophrya in the termination of Alexandrium fundyense blooms within a small coastal embayment. PLoS One 8(12):e81150.
4. Gornik SG, et al. (2012) Loss of nucleosomal DNA condensation coincides with appearance of a novel nuclear protein in dinoflagellates. Curr Biol 22(24):2303–2312.
5. Wong JTY, New DC, Wong JCW, Hung VKL (2003) Histone-like proteins of the dinoflagellate Crypthecodinium cohnii have homologies to bacterial DNA-binding proteins. Eukaryot Cell 2(3):646–650.
6. Zhang Z, Green BR, Cavalier-Smith T (1999) Single gene circles in dinoflagellate chloroplast genomes. Nature 400(6740):155–159.
7. Wang Y, Morse D (2006) Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium. Nucleic Acids Res 34(2):613–619.
8. Nash EA, Nisbet RER, Barbrook AC, Howe CJ (2008) Dinoflagellates: A mitochondrial genome all at sea. Trends Genet 24(7):328–335.
9. Shoguchi E, et al. (2013) Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Curr Biol 23(15):1399–1408.
10. Lin S, et al. (2015) The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science 350(6261):691–694.
11. Bachvaroff TR, et al. (2014) Dinoflagellate phylogeny revisited: Using ribosomal proteins to resolve deep branching dinoflagellate clades. Mol Phylogenet Evol 70:314–322.
12. Hoppenrath M, Leander BS (2010) Dinoflagellate phylogeny as inferred from heat shock protein 90 and ribosomal gene sequences. PLoS One 5(10):e13220.
13. Gómez F, Moreira D, López-García P (2010) Molecular phylogeny of noctilucoid dinoflagellates (Noctilucales, Dinophyceae). Protist 161(3):466–478.
14. Orr RJS, Murray SA, Stüken A, Rhodes L, Jakobsen KS (2012) When naked became armored: An eight-gene phylogeny reveals monophyletic origin of theca in dinoflagellates. PLoS One 7(11):e50004.
15. Fensome RA, Saldarriaga JF, Taylor “Max” FJR (1999) Dinoflagellate phylogeny revisited: Reconciling morphological and molecular based phylogenies. Grana 38(2):66–80.
16. Saldarriaga JF, Taylor “Max” FJR, Cavalier-Smith T, Menden-Deuer S, Keeling PJ (2004) Molecular data and the evolutionary history of dinoflagellates. Eur J Protistol 40(1):85–111.
17. Imanian B, Keeling PJ (2014) Horizontal gene transfer and redundancy of tryptophan biosynthetic enzymes in dinotoms. Genome Biol Evol 6(2):333–343.
18. Gavelis GS, White RA, Suttle CA, Keeling PJ, Leander BS (2015) Single-cell transcriptomics using spliced leader PCR: Evidence for multiple losses of photosynthesis in polykrikoid dinoflagellates. BMC Genomics 16(1):528.
19. Keeling PJ, et al. (2014) The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12(6):e1001889.
20. Burki F, Okamoto N, Pombert J-F, Keeling PJ (2012) The evolutionary history of haptophytes and cryptophytes: Phylogenomic evidence for separate origins. Proc Biol Sci 279(1736):2246–2254.
21. Saunders GW, Hill DRA, Sexton JP, Andersen RA (1997) Small-subunit ribosomal RNA sequences from selected dinoflagellates: Testing classical evolutionary hypotheses with molecular systematic methods. Origins of Algae and Their Plastids, ed Bhattacharya D (Springer, Vienna), pp 237–259.
22. Fukuda Y, Endoh H (2008) Phylogenetic analyses of the dinoflagellate Noctiluca scintillans based on beta-tubulin and Hsp90 genes. Eur J Protistol 44(1):27–33.
23. Jørgensen MF, Murray S, Daugbjerg N (2004) A new genus of athecate interstitial dinoflagellates, Togula gen. nov., previously encompassed within Amphidinium sensu lato: Inferred from light and electron microscopy and phylogenetic analyses of partial large subunit ribosomal DNA sequences. Phycol Res 52(3):284–299.
24. Daugbjerg N, Hansen G, Larsen J, Moestrup Ø (2000) Phylogeny of some of the major genera of dinoflagellates based on ultrastructure and partial LSU rDNA sequence data, including the erection of three new genera of unarmoured dinoflagellates. Phycologia 39(4):302–317.
25. Zhang H, Bhattacharya D, Lin S (2007) A three-gene dinoflagellate phylogeny suggests monophyly of prorocentrales and a basal position for Amphidinium and Heterocapsa. J Mol Evol 65(4):463–474.
26. Fensome RA, et al. (1993) A Classification of Living and Fossil Dinoflagellates. Micropaleontology Special Publication 7 (American Museum of Natural History, New York).
27. Moestrup Ø, Lindberg K, Daugbjerg N (2009) Studies on woloszynskioid dinoflagellates IV: The genus Biecheleria gen. nov. Phycol Res 57(3):203–220.
28. Takahashi K, Moestrup Ø, Jordan RW, Iwataki M (2015) Two new freshwater woloszynskioids Asulcocephalium miricentonis gen. et sp. nov. and Leiocephalium pseudosanguineum gen. et sp. nov. (Suessiaceae, Dinophyceae) lacking an apical furrow apparatus. Protist 166(6):638–658.
29. Medlin LK, Fensome RA (2013) Dinoflagellate macroevolution: Some considerations based on an integration of molecular, morphological and fossil evidence. Biological and Geological Perspectives of Dinoflagellates, eds Lewis JM, Marret F, Bradley L. The Micropalaeontological Society, Special Publications (Geological Society, London), pp 255–266.
30. Fensome RA, MacRae RA, Moldowan JM, Taylor FJR, Williams GL (1996) The early Mesozoic radiation of dinoflagellates. Paleobiology 22(3):329–338.
31. Bujak JP, Williams GL (1981) The evolution of dinoflagellates. Can J Bot 59(11):2077–2087.
32. Hansen G, Daugbjerg N, Henriksen P (2007) Baldinia anauniensis gen. et sp. nov.: A “new” dinoflagellate from Lake Tovel, N. Italy. Phycologia 46(1):86–108.
33. Sekida S, Horiguchi T, Okuda K (2004) Development of thecal plates and pellicle in the dinoflagellate Scrippsiella hexapraecingula (Peridiniales, Dinophyceae) elucidated by changes in stainability of the associated membranes. Eur J Phycol 39(1):105–114.
34. Kwok ACM, Wong JTY (2010) The activity of a wall-bound cellulase is required for and is coupled to cell cycle progression in the dinoflagellate Crypthecodinium cohnii. Plant Cell 22(4):1281–1298.
35. Moldowan JM, et al. (1996) Chemostratigraphic reconstruction of biofacies: Molecular evidence linking cyst-forming dinoflagellates with pre-Triassic ancestors. Geology 24(2):159–162.
36. Moldowan JM, Talyzina NM (1998) Biogeochemical evidence for dinoflagellate ancestors in the early Cambrian. Science 281(5380):1168–1170.
37. Summons RE, Walter MR (1990) Molecular fossils and microfossils of prokaryotes and protists from Proterozoic sediments. Am J Sci 290:212–244.
38. Chu F-LE, et al. (2008) Sterol production and phytosterol bioconversion in two species of heterotrophic protists, Oxyrrhis marina and Gyrodinium dominans. Mar Biol 156(2):155–169.
39. Place AR, Bai X, Kim S, Sengco MR, Wayne Coats D (2009) Dinoflagellate host-parasite sterol profiles dictate karlotoxin sensitivity(1). J Phycol 45(2):375–385.
40. Leblond JD, Sengco MR, Sickman JO, Dahmen JL, Anderson DM (2006) Sterols of the syndinian dinoflagellate Amoebophrya sp., a parasite of the dinoflagellate Alexandrium tamarense (Dinophyceae). J Eukaryot Microbiol 53(3):211–216.
41. Teshima SI, Kanazawa A, Tago A (1980) Sterols of the dinoflagellate Noctiluca milialis. Mem Fac Fish Kagoshima Univ 29:319–326.
42. Withers NW, Goad LJ, Goodwin TW (1979) A new sterol, 4α-methyl-5α-ergosta-8(14),24(28)-dien-3β-ol, from the marine dinoflagellate Amphidinium carterae. Phytochemistry 18(5):899–901.
43. Leblond JD, Chapman PJ (2002) A survey of the sterol composition of the marine dinoflagellates Karenia brevis, Karenia mikimotoi, and Karlodinium micrum: Distribution of sterols within other members of the class Dinophyceae. J Phycol 38(4):670–682.
44. Volkman JK, Barrett SM, Dunstan GA, Jeffrey SW (1993) Geochemical significance of the occurrence of dinosterol and other 4-methyl sterols in a marine diatom. Org Geochem 20(1):7–15.
45. Brocks JJ, Buick R, Summons RE, Logan GA (2003) A reconstruction of Archean biological diversity based on molecular fossils from the 2.78 to 2.45 billion-year-old Mount Bruce Supergroup, Hamersley Basin, Western Australia. Geochim Cosmochim Acta 67(22):4321–4335.
46. Janouškovec J, Horák A, Oborník M, Lukeš J, Keeling PJ (2010) A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc Natl Acad Sci USA 107(24):10949–10954.
47. Janouškovec J, et al. (2015) Factors mediating plastid dependency and the origins of parasitism in apicomplexans and their close relatives. Proc Natl Acad Sci USA 112(33):10200–10207.
48. Sanchez-Puerta MV, Lippmeier JC, Apt KE, Delwiche CF (2007) Plastid genes in a non-photosynthetic dinoflagellate. Protist 158(1):105–117.
49. Slamovits CH, Keeling PJ (2008) Plastid-derived genes in the nonphotosynthetic alveolate Oxyrrhis marina. Mol Biol Evol 25(7):1297–1306.
50. Gornik SG, et al. (2015) Endosymbiosis undone by stepwise elimination of the plastid in a parasitic dinoflagellate. Proc Natl Acad Sci USA 112(18):5767–5772.
51. Seeber F, Soldati-Favre D (2010) Metabolic pathways in the apicoplast of apicomplexa. Int Rev Cell Mol Biol 281:161–228.
52. Nassoury N, Cappadocia M, Morse D (2003) Plastid ultrastructure defines the protein import pathway in dinoflagellates. J Cell Sci 116(pt 14):2867–2874.
53. Patron NJ, Waller RF, Archibald JM, Keeling PJ (2005) Complex protein targeting to dinoflagellate plastids. J Mol Biol 348(4):1015–1024.
54. Pandini V, et al. (2002) Ferredoxin-NADP+ reductase and ferredoxin of the protozoan parasite Toxoplasma gondii interact productively in vitro and in vivo. J Biol Chem 277(50):48463–48471.
55. Howe CJ, Purton S (2007) The little genome of apicomplexan plastids: Its raison d’etre and a possible explanation for the ‘delayed death’ phenomenon. Protist 158(2):121–133.
56. Mungpakdee S, et al. (2014) Massive gene transfer and extensive RNA editing of a symbiotic dinoflagellate plastid genome. Genome Biol Evol 6(6):1408–1422.
57. Abrahamsen MS, et al. (2004) Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science 304(5669):441–445.
58. Janouškovec J, Horák A, Barott KL, Rohwer FL, Keeling PJ (2012) Global analysis of plastid diversity reveals apicomplexan-related lineages in coral reefs. Curr Biol 22(13):R518–R519.
59. McFadden GI, Reith ME, Munholland J, Lang-Unnasch N (1996) Plastid in human parasites. Nature 381(6582):482.
60. Matsuzaki M, Kuroiwa H, Kuroiwa T, Kita K, Nozaki H (2008) A cryptic algal group unveiled: A plastid biosynthesis pathway in the oyster parasite Perkinsus marinus. Mol Biol Evol 25(6):1167–1179.
61. Hehenberger E, Imanian B, Burki F, Keeling PJ (2014) Evidence for the retention of two evolutionary distinct plastids in dinoflagellates with diatom endosymbionts. Genome Biol Evol 6(9):2321–2334.
62. Wisecaver JH, Hackett JD (2010) Transcriptome analysis reveals nuclear-encoded proteins for the maintenance of temporary plastids in the dinoflagellate Dinophysis acuminata. BMC Genomics 11:366.
63. Marcinko CLJ, Painter SC, Martin AP, Allen JT (2013) A review of the measurement and modelling of dinoflagellate bioluminescence. Prog Oceanogr 109:117–129.
64. Topalov G, Kishi Y (2001) Chlorophyll Catabolism leading to the skeleton of dinoflagellate and Krill luciferins: Hypothesis and model studies. Financial support from the National Institutes of Health (NS 12108) is gratefully acknowledged. Angew Chem Int Ed Engl 40(20):3892–3894.
65. Wu C, Akimoto H, Ohmiya Y (2003) Tracer studies on dinoflagellate luciferin with [15N]-glycine and [15N]-l-glutamic acid in the dinoflagellate Pyrocystis lunula. Tetrahedron Lett 44(6):1263–1266.
66. Liu L, Hastings JW (2007) Two different domains of the luciferase gene in the heterotrophic dinoflagellate Noctiluca scintillans occur as two separate genes in photosynthetic species. Proc Natl Acad Sci USA 104(3):696–701.
67. Yamaguchi A, Horiguchi T (2008) Culture of the heterotrophic dinoflagellate Protoperidinium crassipes (Dinophyceae) with noncellular food items(1). J Phycol 44(4):1090–1092.
68. Chan Y-H, Wong JTY (2007) Concentration-dependent organization of DNA by the dinoflagellate histone-like protein HCc3. Nucleic Acids Res 35(8):2573–2583.
69. Saldarriaga JF, Taylor FJ, Keeling PJ, Cavalier-Smith T (2001) Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. J Mol Evol 53(3):204–213.
70. Janouškovec J, et al. (2013) Split photosystem protein, linear-mapping topology, and growth of structural complexity in the plastid genome of Chromera velia. Mol Biol Evol 30(11):2447–2462.
71. Barbrook AC, et al. (2012) Polyuridylylation and processing of transcripts from multiple gene minicircles in chloroplasts of the dinoflagellate Amphidinium carterae. Plant Mol Biol 79(4-5):347–357.
72. Jackson CJ, Waller RF (2013) A widespread and unusual RNA trans-splicing type in dinoflagellate mitochondria. PLoS One 8(2):e56777.
73. Soyer M-O (1970) Etude ultrastructurale de l’endoplasme et des vacuoles chez deux types de Dinoflagellés appartenant aux genres Noctiluca (Suriray) et Blastodinium (Chatton). Z Zellforsch Mikrosk Anat 105(3):350–388.
74. Lee SY, et al. (2014) Morphological characterization of Symbiodinium minutum and S. psygmophilum belonging to clade B. Algae 29(4):299–310.
10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1614842114 Janouškovec et al.
Supporting Information
Janouškovec et al.: Major transitions in dinoflagellate evolution unveiled by phylotranscriptomics
SI Materials and Methods:
Dinoflagellate culturing, sequencing, and sequence assembly. Noctiluca scintillans SPMC136 (MMETSP0253) was grown on Prorocentrum micans CCMP691 (primary preferred prey) in filtered (0.2 µm) autoclaved seawater (30 psu) with a dilute trace metal amendment (1). Scaled-up cultures were captured on an 80 µm sieve and maintained on Dunaliella tertiolecta (secondary non-preferred prey) for 25 days to ensure complete removal of Prorocentrum (which was visually absent by day 15 following the transfer). Noctiluca cells were then re-captured on the sieve and their total RNA was extracted by using the RNAqueous kit (Ambion). Togula jolla CCCM725 (project MMETSP0224), Protoceratium reticulatum CCCM535 (project MMETSP0228), and Polarella glacialis CCMP1383 (project MMETSP0227) were grown in the natural seawater medium HESNW (2), and their RNA was purified by the TRIzol Plus RNA kit (Thermo Fisher). Sequencing libraries were built and transcriptomic reads generated, processed, and assembled at the National Center for Genome Resources as described previously (3). Assembled contigs and predicted proteomes were downloaded from the MMETSP website (http://data.imicrobe.us/project/view/104). A second independent assembly of Noctiluca reads was generated by using Trinity v2.0.6 at default settings; the resulting contigs were found to contain longer 5' regions compared to the MMETSP assembly and were used in the analysis of N-termini of plastid-targeted proteins. This transcriptomic assembly has been deposited in GenBank (TSA) under the accession GELK00000000. Each Noctiluca protein used in this study (extracted from either of the assemblies) was first screened by BLASTP against the predicted proteome of Prorocentrum minimum CCMP2233 (Table S1), and its affiliation to dinoflagellates was then verified in a maximum likelihood phylogeny. No sequences of Prorocentrum and rare, well-identifiable sequences of Dunaliella were detected in the assemblies (all assembled nuclear ribosomal RNA contigs belong to Noctiluca). The transcriptome assembly of Hematodinium sp. was deposited in GenBank (TSA; GEMP00000000). Data from Amphidinium carterae and two Amoebophrya isolates were generated as described in (4, 5). Data from other species were obtained as detailed in Table S1.
Multiprotein phylogenies. Dinoflagellate sequences were added into alignments of conserved proteins that were previously used in eukaryotic phylogenies (6), and those with 30% or less of taxa missing were selected. The alignments were re-aligned by the 'localpair' algorithm in MAFFT v7.215 (7) and stripped of hypervariable sites (-b 4 -g 0.4 settings) in BMGE v1.1 (8), and the orthology of the sequences within them was verified by comparing their RAxML v.8 (9) maximum likelihood phylogenies (LG + Gamma 4 + F model) with known relationships based on published phylogenies. Paralogous, highly divergent, or contaminant sequences were identified in several species and removed; where ambiguous, all paralogs for a given species were removed, and where multiple ambiguities were identified, the whole gene alignment was discarded. Single protein alignments were concatenated in Scafos v1.25 (10) by using 'o=gclv gamma=yes l=1 m=1' settings. Chimeric sequences were created for species where overlapping fragments or non-overlapping fragments of a congruent phylogenetic position were recovered (Table S1). A total of 12 phylogenetic matrices were concatenated independently: three variants of the outgroup times four variants of species presence among thecate taxa (Table S1; Fig. 1). Maximum likelihood phylogenies of the concatenated matrices were inferred in IQ-Tree v1.41 (11) by using the LG + I + GAMMA4 + F settings (-m TEST was run first to select this model) with 1000 ultrafast bootstraps, and in RAxML by using LG + GAMMA4 + F with 300 non-parametric bootstraps. Bayesian phylogenies were inferred in PhyloBayes MPI v1.5a (12) on the CIPRES Science Gateway (13) by using GTR + CAT + GAMMA4, -dc, and maxdiff<0.1 settings. Approximately unbiased (AU) and expected likelihood weights (ELW) test scores for alternative tree topologies were computed in Consel (14) and IQ-Tree (11), respectively (Tables S2 and S3).
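The gene-selection and concatenation logic described above (keep alignments with 30% or less of taxa missing, then join them into one matrix) can be sketched in a few lines. This is a simplified illustration in plain Python, not a substitute for Scafos, which was the tool actually used; the function names, the dictionary-based alignment layout, and the toy sequences are all hypothetical.

```python
# Sketch: keep gene alignments with <= 30% of taxa missing, then concatenate
# them into one supermatrix, padding absent taxa with gap characters.
# Data structures and names are hypothetical; the study used Scafos for this step.

def missing_fraction(alignment, taxa):
    """Fraction of `taxa` that have no sequence in `alignment` (dict taxon -> seq)."""
    absent = [t for t in taxa if t not in alignment]
    return len(absent) / len(taxa)


def concatenate(alignments, taxa, max_missing=0.30):
    """Concatenate the alignments that pass the missing-taxa filter."""
    kept = [a for a in alignments if missing_fraction(a, taxa) <= max_missing]
    matrix = {t: "" for t in taxa}
    for aln in kept:
        length = len(next(iter(aln.values())))  # all rows share one length
        for t in taxa:
            matrix[t] += aln.get(t, "-" * length)
    return matrix, len(kept)


taxa = ["A", "B", "C", "D"]
gene1 = {"A": "MKV", "B": "MKI", "C": "MRV", "D": "MKV"}   # 0% missing: kept
gene2 = {"A": "LL", "B": "LI", "C": "LL"}                  # 25% missing: kept
gene3 = {"A": "GGGG", "B": "GGGA"}                         # 50% missing: dropped

supermatrix, n_kept = concatenate([gene1, gene2, gene3], taxa)
print(n_kept, supermatrix["D"])
```

Padding with gaps keeps every row the same length, which is what downstream tree-inference programs expect from a concatenated matrix.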
Theca evolution and dinosterol. Protein sequences of dCel1 and dCel2 cellulases (accessions in Table S4) were each used to retrieve 250 closest hits from the NCBI nr database, which were complemented by dinoflagellate sequences from our dataset (Table S1; primarily MMETSP and NCBI databases). The dataset was reduced to a smaller number of unique, phylogenetically representative sequences: sequences that were largely incomplete and sequences that were closely related to one another (including all sequences from Kryptoperidinium foliaceum CCAP 1116/3, a close relative of K. foliaceum CCMP1326) or formed very long branches in preliminary phylogenies were removed. The final phylogenetic matrix of the GH7 dataset (Fig. 2B and Fig. S1) contained 184 sequences and 260 amino acid sites and was prepared by an alignment in MAFFT and removal of hypervariable sites in BMGE (as above), and phylogenies were inferred in IQ-Tree, as described above (see Multiprotein phylogenies). Dinosterol distribution in dinoflagellates was mapped by surveying the available literature, in part by using the reference list at https://doi.pangaea.de/10.1594/PANGAEA.819698.
Plastid metabolism and protein targeting. Sequences of plastid and nuclear proteins (Figs. 2-4) were identified by BLASTP searches in datasets listed in Table S1, in addition to transcriptomes from two Oxyrrhis marina strains (MMETSP1424-1426 and MMETSP0468-471 projects), and Pyrodinium bahamense (NCBI: PRJNA169246). Contaminant sequences were identified in several projects (e.g., Oxyrrhis MMETSP projects) and carefully removed based on phylogenetic incongruence. In the phylogenetic reconstruction of dependency on plastids in non-photosynthetic species (Fig. 3A), each enzyme of each pathway was analysed separately. New dinoflagellate sequences were included in single-protein alignments from an earlier study (15) and the protein origin was assessed by RAxML phylogenies (computed as above) or analyzed in newly prepared datasets (Fd, FNR, SufB, SufC, SufD, TPT; Table S5). The final phylogenetic matrix of IspC (Fig. 3B) contained 66 sequences and 331 amino acid sites and was computed in IQ-Tree as described above (see GH7 dataset preparation). Dinoflagellate FASI / PKS polyproteins were reconstructed (Fig. 3A) from mutually overlapping fragments comprising at least two domains (Table S5); domain order and functional specificity of mature FASI / PKS forms remain unknown, but both FAS and PKS are likely to be present (most individual domains exist in multiple sequence contexts). An atypical plastid FabI was identified in Dinophysis acuminata that was closely related to homologs in the Kareniaceae (Fig. 3A), but other proteins of the pathway were not. It is unclear whether this protein sequence is a contaminant (other Kareniaceae-like proteins were found in the Dinophysis transcriptome), but it remains unlikely that Dinophysis possesses the plastid fatty acid biosynthesis pathway. Plastid targeting signals in Noctiluca, Dinophysis, and Oxyrrhis were analysed in plastid proteins carrying N-terminal extensions as compared with their bacterial orthologs. The most complete sequence for each protein was selected from the following: MMETSP-predicted proteins, proteins newly predicted from MMETSP-assembled contigs (Oxyrrhis and Dinophysis), or proteins predicted from newly assembled MMETSP reads (Noctiluca Trinity assembly; note that N-termini of some MMETSP-predicted proteins were incomplete). Proteins that screened positively for signal peptides in SignalP 4.1 (D-score cutoff 0.45) were further tested for the presence of transit peptides in ChloroP 1.1 at 0.45 cTP-score cutoff (16) and the strongest candidates for plastid targeting were listed in Table S6 (trans-membrane regions were predicted in TMHMM v2.0). The cleavage site of Plasmodium falciparum ferredoxin (Fig. S2) was predicted by PATS (http://gecco.org.chemie.uni-frankfurt.de/pats/pats-index.php), a species-specific tool for prediction of targeting pre-sequences. Partial sufB sequences were identified in Noctiluca (55% GC; c20274_g1_i1 and c20274_g2_i1 contigs in the Trinity assembly), Oxyrrhis marina MMETSP0468 (57.7% GC; contig CAMNT_0034061651), and Dinophysis (66.7% GC; contig CAMNT_0021013865). Plastid clpC fragments were identified in Noctiluca (55.4% GC; c23770_g4_i1 and c23770_g5_i1 contigs in the Trinity assembly), Dinophysis (69.1% GC; contig CAMNT_0020950785), and Oxyrrhis marina MMETSP0468 (60.8% GC; contig CAMNT_0034034689).
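The two-step targeting screen described above (signal peptide first, then transit peptide, both at a 0.45 cutoff) and the GC percentages reported for the sufB and clpC contigs can be illustrated as follows. This is a minimal sketch under stated assumptions: the scores are taken to have been parsed already from SignalP and ChloroP output, the record layout and example values are hypothetical, and treating a score equal to the cutoff as a pass is an assumption, since the text gives only the cutoff value.

```python
# Sketch of the two-step screen: a protein is a plastid-targeting candidate only
# if its signal-peptide D-score and then its transit-peptide cTP-score both
# reach the cutoff. Scores are assumed to be parsed from SignalP/ChloroP output
# beforehand; the record layout and example values are hypothetical.

CUTOFF = 0.45  # cutoff value stated in the text; ">=" is an assumption


def plastid_candidates(records, d_cutoff=CUTOFF, ctp_cutoff=CUTOFF):
    """Return names of proteins passing both cutoffs, in input order."""
    hits = []
    for name, d_score, ctp_score in records:
        if d_score >= d_cutoff and ctp_score >= ctp_cutoff:
            hits.append(name)
    return hits


def gc_percent(seq):
    """GC content of a nucleotide sequence, in percent."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    return 100.0 * gc / len(seq)


# (name, SignalP D-score, ChloroP cTP-score): invented example values
records = [
    ("protA", 0.62, 0.51),  # passes both steps
    ("protB", 0.70, 0.30),  # signal peptide only
    ("protC", 0.20, 0.90),  # fails the first step
]
print(plastid_candidates(records))
print(gc_percent("GCGCATAT"))
```

Applying the filters in sequence reflects the bipartite presequence expected for proteins targeted to complex plastids, where a signal peptide precedes the transit peptide.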
Character evolution. Protein sequences of HLP-I and HLP-II (accessions in Table S4) were each used to retrieve 250 closest hits from the NCBI nr database; top hits among environmental and NCBI EST entries were also included. The final phylogenetic matrix (Fig. 4 and Fig. S3) contained 114 sequences and 99 amino acid sites and was prepared by adding sequences of dinoflagellates, removing closely related sequences, and alignment processing, and phylogenies were inferred in IQ-Tree, all as described above (see GH7 dataset preparation). The data presented in the character map were compiled from the literature and ancestral states were reconstructed by parsimony on the consensus of dinoflagellate relationships as established in this study (Fig. 1), taking known lower-level relationships into account (e.g., 17, 18). Transcripts of the three mitochondrion-localized protein-coding genes in Noctiluca were identified in the Trinity assembly by homology searches (cox1 on the contig c33015_g1_i5, cox3 on the contig c32288_g1_i2, and cob on the contig c32214_g1_i1). The cox3 transcript was found to be complete and contain a canonical UAA stop codon at the expected position (i.e., one that is not generated by oligoadenylation of its 3' terminus as observed in other core dinoflagellates (19); the only canonical stop codon in core dinoflagellates reported so far is in the cob of Symbiodinium minutum (20)). Transcriptomic paired-end Illumina 50 bp reads were mapped onto the assembled cox3 contig by using Bowtie2 (v2.2.9); reads mapped continuously across the region where the split occurs in other core dinoflagellates, with paired-end reads connecting both sides of the split (no indication of two separate RNA fragments, trans-splicing, or oligoA tailing was observed). PCR corresponding to a near-full length cox3 spanning both sides of the split was done by using Pfu polymerase, RNase-treated genomic DNA of Noctiluca, and specific primers. The reaction yielded a single product of the correct size (no product was observed in 'DNase-treated template' and 'no template' controls) and the sequence of this product corresponded to Noctiluca cox3. Polymorphism at multiple sites was observed in the chromatogram and the DNA consensus differed in several nucleotides from the transcriptomic contig, where we also observed extensive polymorphism by read mapping: thus, the number of and variation among cox3 copies and whether editing of their transcripts is present remain to be established.
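The read-mapping argument above rests on two kinds of evidence: individual reads that cross the position of the split, and read pairs whose mates land on opposite sides of it. A minimal sketch of how such support could be tallied is given below. It assumes that mate coordinates have already been extracted from the aligner output; the coordinate layout, the split position, and the example pairs are hypothetical and are not taken from the Noctiluca data.

```python
# Sketch: tally read pairs supporting continuous transcription across a position.
# Coordinates are hypothetical and would in practice be parsed from the
# aligner's output; nothing here reproduces the actual mapping results.

def spanning_support(pairs, split_pos):
    """Return (pairs with a mate crossing `split_pos`, pairs bridging it).

    `pairs` is a list of ((start1, end1), (start2, end2)) with 1-based,
    inclusive contig coordinates for the two mates of each pair.
    """
    crossing = 0   # at least one mate overlaps the position itself
    bridging = 0   # mates lie entirely on opposite sides of the position
    for (s1, e1), (s2, e2) in pairs:
        if s1 <= split_pos <= e1 or s2 <= split_pos <= e2:
            crossing += 1
        elif max(e1, e2) < split_pos or min(s1, s2) > split_pos:
            continue  # both mates on the same side: uninformative
        else:
            bridging += 1
    return crossing, bridging


split_pos = 500  # hypothetical position of the split on the contig
pairs = [
    ((430, 479), (520, 569)),  # mates flank the position: bridging
    ((480, 529), (600, 649)),  # first mate crosses the position
    ((100, 149), (200, 249)),  # both mates upstream
    ((600, 649), (700, 749)),  # both mates downstream
]
print(spanning_support(pairs, split_pos))
```

Counting the two categories separately is a design choice: crossing reads show an uninterrupted sequence at the junction, whereas bridging pairs show only that both sides derive from one fragment.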
SI References:
1. Gifford DJ (1985) Laboratory culture of marine planktonic oligotrichs (Ciliophora, Oligotrichida). Mar Ecol Prog Ser 23(3):257–267.
2. Harrison PJ, Waters RE, Taylor FJR (1980) A Broad Spectrum Artificial Sea Water Medium for Coastal and Open Ocean Phytoplankton1. J Phycol 16(1):28–35.
3. Keeling PJ, et al. (2014) The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol 12(6):e1001889.
4. Bachvaroff TR, et al. (2014) Dinoflagellate phylogeny revisited: Using ribosomal proteins to resolve deep branching dinoflagellate clades. Mol Phylogenet Evol 70:314–322.
5. Gornik SG, et al. (2015) Endosymbiosis undone by stepwise elimination of the plastid in a parasitic dinoflagellate. Proc Natl Acad Sci 112(18):5767–5772.
6. Burki F, Okamoto N, Pombert J-F, Keeling PJ (2012) The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc R Soc B Biol Sci 279(February):2246–2254.
7. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4):772–780.
8. Criscuolo A, Gribaldo S (2010) BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10(1):210.
9. Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313.
10. Roure B, Rodriguez-Ezpeleta N, Philippe H (2007) SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 7 Suppl 1:S2–S2.
11. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ (2015) IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol 32(1):268–274.
12. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: Phylogenetic Reconstruction with Infinite Mixtures of Profiles in a Parallel Environment. Syst Biol 62(4):611–615.
13. Miller MA, Pfeiffer W, Schwartz T (2011) The CIPRES science gateway: a community resource for phylogenetic analyses. Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery (ACM), p 41.
14. Shimodaira H, Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17(12):1246–1247.
15. Janouškovec J, et al. (2015) Factors mediating plastid dependency and the origins of parasitism in apicomplexans and their close relatives. Proc Natl Acad Sci U S A 112(33):10200–7.
16. Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2(4):953–71.
17. Orr RJS, Murray SA, Stüken A, Rhodes L, Jakobsen KS (2012) When naked became armored: an eight-gene phylogeny reveals monophyletic origin of theca in dinoflagellates. PLoS ONE 7(11):e50004.
18. Saldarriaga J (2004) Molecular data and the evolutionary history of dinoflagellates. Eur J Protistol 40(1):85–111.
19. Jackson CJ, et al. (2007) Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria. BMC Biol 5:41–41.
20. Shoguchi E, Shinzato C, Hisata K, Satoh N, Mungpakdee S (2015) The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans. Genome Biol Evol 7(8):2237–2244.
SI Figures and Tables:
Fig. S1: Phylogeny of cellulases of the Glycosyl Hydrolase 7 family (next page). Analysis of Glycosyl hydrolase family 7 (GH7) reveals cellulases in athecate dinoflagellates (bold) and their radiation in the thecates (color-coded). Unrooted best Maximum likelihood tree (IQ-Tree) was inferred from 119 dinoflagellate and 65 non-dinoflagellate eukaryotic protein sequences limited to taxonomically representative, near full-length, and slow-evolving entries. GH7 cellulases from Pyrocystis lunula (dCel1) and Lingulodinium polyedrum (dCel2) are highlighted. Eight putative paralogs that were likely present in the ancestor of all living thecate orders are shown by vertical arrows; the exact number of ancestral paralogs is difficult to predict because of the low resolution of the tree, apparent incomplete lineage sorting, and putative horizontal gene transfer (see, e.g., Bigelowiella natans), but it was likely even higher (see the 13-16 paralogous forms in the Gonyaulacales alone).
Fig. S2: In silico targeting predictions in plastid ferredoxins (see other plastid-targeted proteins in Table S6). Protein N-termini are drawn to scale and show signal peptide D-scores (SignalP), transit peptide cTP-scores (TargetP), and the four conserved cysteines required for Fe-S formation. Spliced leaders or their fragments (SL=length in nucleotides) at the 5' end of dinoflagellate mRNAs indicate N-complete proteins. Porphyra ferredoxin is plastid-genome encoded and does not require targeting pre-sequences. The plastid ferredoxin in Oxyrrhis is truncated but contains an N-terminal extension carrying a complete or near-complete transit peptide region, suggesting that the protein is targeted to the plastid as in other dinoflagellates.
Fig. S3: A novel dinoflagellate histone-like protein (next page). Phylogeny of bacterial (HU-like) and dinoflagellate histone-like proteins (HLP) reveals a novel dinoflagellate type (HLP-II), which has a mutually exclusive distribution with HLP-I. Best Maximum likelihood tree (IQ-Tree) with ultrafast bootstrap supports at branches (only >50 supports are shown). Several environmental sequences (ENV) including those derived from dinoflagellate spliced-leader libraries (ENV dinoSL) are shown. The previously characterized HCc3 in Crypthecodinium cohnii (HLP-I) is highlighted in bold. Other histone-like proteins in eukaryotes (e.g., HU-like proteins in the plastid) are not closely related to either of the dinoflagellate HLP forms and were not included in the phylogeny.
Table S1: Sequence sources and phylogenetic matrices used. Species and data presence in final concatenated phylogenetic matrices (Root 1-3; Fig. 1) with sequence sources: matrix size (sites #), percentage of missing sites (%MS), missing genes (%MG), and merged chimeric entries (%CH; Materials and Methods).
Group Operational taxonomic unit (Fig. 1) Root 1 sites # % MS % MG % CH Root 2 sites # % MS % MG % CH Root 3 sites # % MS % MG % CH Data source
Dinoflagellates Akashiwo sanguinea 27237 7 2 4 28820 6 2 4 28779 7 2 4 Bachvaroff et al., 2014 MPE
Dinoflagellates Alexandrium spp. (A. tamarense, A. catenella OF101, A. minutum, A. ostenfeldii) 28346 4 1 5 29619 4 1 3 29857 4 1 3 NCBI EST; MMETSP0790
Dinoflagellates Amoebophrya sp. ex Akashiwo 26900 9 6 0 28385 8 6 0 --- --- --- --- Bachvaroff et al., 2014 MPE
Dinoflagellates Amoebophrya sp. ex Karlodinium 25811 12 3 0 27025 12 3 0 --- --- --- --- Bachvaroff et al., 2014 MPE
Dinoflagellates Amphidinium carterae CCMP1314 28630 3 0 0 29931 3 0 0 30074 3 0 0 MMETSP0258-MMETSP0259; Bachvaroff et al., 2014 MPE
Dinoflagellates Dinophysis acuminata DAEP01 25106 15 2 29 26629 13 2 30 26904 13 2 30 MMETSP0797
Dinoflagellates Durinskia baltica CSIRO CS-38 26195 11 2 3 27245 11 2 2 27338 12 2 2 MMETSP0116-MMETSP0117
Dinoflagellates Gymnodinium catenatum GC744 28347 4 2 5 29929 3 2 4 30236 2 2 4 MMETSP0784
Dinoflagellates Hematodinium sp. SG-2012 ex Nephrops 28763 2 2 0 29746 3 2 0 --- --- --- --- NCBI TSA: GEMP00000000
Dinoflagellates Heterocapsa spp. (H. triquetra CCMP449; H. rotundata SCCAP K-0483) 26353 10 5 21 27614 10 5 21 27799 10 5 21 NCBI EST; MMETSP0503
Dinoflagellates Karenia brevis (CCMP2229, Wilson) 28973 1 1 1 30373 1 1 1 30567 1 1 1 NCBI EST; MMETSP0027, MMETSP0029-MMETSP0031
Dinoflagellates Karlodinium veneficum (CCMP2283, CCMP 415, CCMP1974) 29119 1 0 4 30615 1 0 4 30816 1 0 4 NCBI EST; Bachvaroff et al., 2014 MPE; MMETSP1015-MMETSP1017
Dinoflagellates Kryptoperidinium foliaceum CCAP 1116/3 25472 13 5 3 26649 13 5 2 26986 13 5 3 MMETSP0118-MMETSP0119
Dinoflagellates Kryptoperidinium foliaceum CCMP 1326 26666 9 3 1 27709 10 3 1 28055 9 3 1 MMETSP0120-MMETSP0121
Dinoflagellates Lingulodinium polyedrum 27323 7 0 18 28520 7 0 18 28515 8 0 17 NCBI TSA: GABP00000000
Dinoflagellates Noctiluca scintillans SPMC136 27743 6 2 4 28787 6 2 3 29032 6 2 3 MMETSP0253; NCBI TSA: GELK00000000
Dinoflagellates Oxyrrhis marina (CCMP1788, 44_PLY01) 11227 62 40 0 11114 64 41 0 --- --- --- --- NCBI EST; Lowe et al., 2012
Dinoflagellates Polarella glacialis CCMP1383 27590 6 1 12 29182 5 1 12 29304 5 1 12 MMETSP0227
Dinoflagellates Polykrikos lebourae 10500 64 30 20 10641 65 30 19 10637 66 30 19 Gavelis et al., 2015
Dinoflagellates Prorocentrum minimum CCMP2233 27625 6 2 8 29208 5 2 8 29518 5 2 8 MMETSP0267-MMETSP0269
Dinoflagellates Protoceratium reticulatum CCCM535 27388 7 0 4 28527 7 0 4 28762 7 0 4 MMETSP0228
Dinoflagellates Scrippsiella trochoidea CCMP3099 28580 3 0 6 30275 2 0 6 30340 2 0 5 MMETSP0270-MMETSP0272
Dinoflagellates Symbiodinium minutum Mf1.05b 21137 28 12 1 21589 30 12 1 21754 30 12 1 Symbiodinium minutum genome database
Dinoflagellates Symbiodinium sp. CassKB8 25884 12 3 0 26788 13 3 0 26726 14 3 0 NCBI SRA: SRX076696; http://medinalab.org/zoox/
Dinoflagellates Togula jolla CCCM725 29018 1 0 2 30353 1 0 2 30617 1 0 2 MMETSP0224
Perkinsids Perkinsus marinus 27188 8 0 0 28253 8 0 0 --- --- --- --- NCBI NR
Apicomplexans Babesia bovis 26932 8 5 0 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Babesia microti 25736 12 10 0 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Cryptosporidium muris 26442 10 6 0 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Cryptosporidium parvum 25233 14 8 0 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Eimeria tenella 20938 29 16 2 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Plasmodium falciparum 27311 7 4 0 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Theileria annulata 23842 19 11 0 --- --- --- --- --- --- --- --- NCBI NR
Apicomplexans Toxoplasma gondii 26623 9 5 0 --- --- --- --- --- --- --- --- NCBI NR
Ciliates Ichthyophthirius multifiliis 25145 14 13 0 --- --- --- --- --- --- --- --- NCBI NR
Ciliates Paramecium tetraurelia 24211 18 21 0 --- --- --- --- --- --- --- --- NCBI NR
Ciliates Oxytricha trifallax 25367 14 13 0 --- --- --- --- --- --- --- --- NCBI NR
Ciliates Tetrahymena thermophila 26960 8 6 0 --- --- --- --- --- --- --- --- NCBI NR
Stramenopiles Aureococcus anophagefferens 27214 7 4 0 --- --- --- --- --- --- --- --- NCBI NR
Stramenopiles Ectocarpus siliculosus 28646 3 0 0 --- --- --- --- --- --- --- --- NCBI NR
Stramenopiles Saprolegnia parasitica 26602 10 8 0 --- --- --- --- --- --- --- --- NCBI NR
Stramenopiles Schizochytrium aggregatum 27594 6 4 0 --- --- --- --- --- --- --- --- NCBI NR
Stramenopiles Thalassiosira pseudonana 28137 4 1 0 --- --- --- --- --- --- --- --- NCBI NR
TOTAL 29400 12 30780 12 30988 10
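The %MS column in Table S1 appears consistent with a simple ratio: the share of the concatenated matrix (TOTAL row) that is not covered by a taxon's sites. The following minimal Python sketch reproduces a few Root 1 values under that assumption; the function name `percent_missing` is hypothetical, and the rounding rule is inferred from the table rather than stated by the authors.

```python
# Sketch only: assumes %MS = 100 * (1 - taxon sites / total matrix sites),
# rounded to the nearest integer. This relation is inferred from Table S1.

ROOT1_TOTAL_SITES = 29400  # TOTAL row, Root 1 matrix


def percent_missing(sites_present: int, matrix_sites: int) -> int:
    """Return the percentage of matrix sites absent for one taxon."""
    return round(100 * (1 - sites_present / matrix_sites))


# Root 1 site counts copied from Table S1.
root1_sites = {
    "Akashiwo sanguinea": 27237,
    "Oxyrrhis marina": 11227,
    "Karenia brevis": 28973,
}

for taxon, sites in root1_sites.items():
    print(f"{taxon}: {percent_missing(sites, ROOT1_TOTAL_SITES)}% missing sites")
```

The three computed values (7, 62, and 1) agree with the %MS entries listed for these taxa in the Root 1 columns.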
Table S2: Testing of selected alternative topologies of Noctiluca and Akashiwo. Probability scores for different topologies as given by Approximately unbiased (AU) and Expected likelihood weights (ELW) tests. Scores of p=0.05 or greater are highlighted in bold; n.a.=not applicable, outgroup missing.
Test Dataset Root1 Root2 Root3 Topology
p(AU) p(ELW) p(AU) p(ELW) p(AU) p(ELW)
Noctiluca an early branch among core dinoflagellates, a sister lineage to Amphidinium, or branching as a second lineage after Amphidinium
All 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));
2E-036 0 1E-005 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));
1E-060 0 2E-004 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));
All-Din 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));
3E-004 0 1E-005 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));
2E-045 0 1E-004 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));
All-Pro 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));
2E-008 0 3E-005 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));
2E-008 0 1E-007 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));
All-Din&Pro 1.000 1.000 1.000 1.000 n.a. n.a. (OUT,(Noc,(Amp,OCDs)));
5E-006 0 3E-004 0 n.a. n.a. (OUT,((Noc,Amp),OCDs));
1E-089 0 3E-004 0 n.a. n.a. (OUT,(Amp,(Noc,OCDs)));
Akashiwo a sister lineage of thecate dinoflagellates, a sister lineage of Togula+Gymnodiniaceae s.s., or branching earlier than Togula+Gymnodiniaceae s.s.
All 0.718 0.692 0.889 0.865 0.928 0.912 (OUT,(Gym+Tog,(Aka,The)));
0.315 0.305 0.133 0.129 0.082 0.088 (OUT,((Aka,Gym+Tog),The));
0.008 0.003 0.009 0.005 0.001 0.001 (OUT,(Aka,(Gym+Tog,The)));
All-Din 0.748 0.725 0.896 0.876 0.933 0.907 (OUT,(Gym+Tog,(Aka,The)));
0.290 0.266 0.139 0.120 0.083 0.091 (OUT,((Aka,Gym+Tog),The));
0.017 0.010 0.019 0.004 0.008 0.002 (OUT,(Aka,(Gym+Tog,The)));
All-Pro 0.655 0.619 0.871 0.832 0.877 0.866 (OUT,(Gym+Tog,(Aka,The)));
0.397 0.373 0.165 0.162 0.143 0.134 (OUT,((Aka,Gym+Tog),The));
0.021 0.009 0.014 0.006 0.003 0.000 (OUT,(Aka,(Gym+Tog,The)));
All-Din&Pro 0.596 0.565 0.868 0.840 0.904 0.891 (OUT,(Gym+Tog,(Aka,The)));
0.468 0.422 0.179 0.152 0.117 0.105 (OUT,((Aka,Gym+Tog),The));
0.032 0.013 0.026 0.008 0.006 0.004 (OUT,(Aka,(Gym+Tog,The)));
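The p=0.05 threshold used for highlighting in Table S2 can be applied programmatically. The short Python sketch below does this for the AU-test values of the Akashiwo "All" dataset; the function names and the choice to require agreement across all three rootings are illustrative assumptions, not part of the original analysis.

```python
# Sketch only: classifies topologies by the p = 0.05 threshold of Table S2.

ALPHA = 0.05

# AU-test p-values (Root1, Root2, Root3) for the "All" dataset, from Table S2.
akashiwo_au = {
    "(OUT,(Gym+Tog,(Aka,The)));": (0.718, 0.889, 0.928),
    "(OUT,((Aka,Gym+Tog),The));": (0.315, 0.133, 0.082),
    "(OUT,(Aka,(Gym+Tog,The)));": (0.008, 0.009, 0.001),
}


def rejected_everywhere(p_values, alpha=ALPHA):
    """Topologies with p < alpha under every rooting."""
    return [t for t, ps in p_values.items() if all(p < alpha for p in ps)]


def retained_everywhere(p_values, alpha=ALPHA):
    """Topologies with p >= alpha under every rooting."""
    return [t for t, ps in p_values.items() if all(p >= alpha for p in ps)]


print("Rejected:", rejected_everywhere(akashiwo_au))
print("Not rejected:", retained_everywhere(akashiwo_au))
```

With these values, only the topology placing Akashiwo before Togula+Gymnodiniaceae s.s. falls below the threshold under every rooting; the other two placements cannot be rejected.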
Table S3: Testing of all alternative topologies among thecate dinoflagellate orders. Probability scores for different topologies as given by Approximately unbiased (AU) and Expected likelihood weights (ELW) tests. All topologies in which at least one of the tests gave p=0.01 or greater are shown; p=0.05 or greater are highlighted in bold.
Dataset Root1 Root2 Root3 Topology (OUT=outgroup)
p(AU) p(ELW) p(AU) p(ELW) p(AU) p(ELW)
All 0.580 0.279 0.429 0.156 0.366 0.092 (OUT,((Din,Pro),(Gon,(Per,Sym))));
0.570 0.245 0.742 0.351 0.773 0.376 (OUT,((Din,Gon),(Per,(Pro,Sym))));
0.524 0.208 0.458 0.108 0.366 0.059 (OUT,(Din,(Gon,(Per,(Pro,Sym)))));
0.515 0.227 0.565 0.267 0.611 0.331 (OUT,((Gon,(Din,Pro)),(Per,Sym)));
0.110 0.017 0.178 0.033 0.255 0.053 (OUT,(Din,(Per,(Gon,(Pro,Sym)))));
0.058 0.011 0.202 0.045 0.150 0.014 (OUT,(Din,((Per,Gon),(Pro,Sym))));
0.044 0.003 0.071 0.005 0.067 0.007 (OUT,(Din,((Gon,Pro),(Per,Sym))));
0.038 0.001 0.125 0.026 0.180 0.046 (OUT,(Per,((Din,Gon),(Pro,Sym))));
0.032 0.002 0.041 0.004 0.007 0 (OUT,((Din,Gon),(Pro,(Per,Sym))));
0.021 0.002 0.031 0 0.005 0 (OUT,(Din,(Gon,(Pro,(Per,Sym)))));
0.017 0 0.001 0 0.020 0 (OUT,((Din,(Pro,Gon)),(Per,Sym)));
0.013 0 <0.001 0 0.008 0 (OUT,(Gon,(Din,(Per,(Pro,Sym)))));
0.012 0 <0.001 0 <0.001 0 (OUT,(Gon,((Pro,(Din,Per)),Sym)));
0.009 0.004 0.016 0.002 0.053 0.005 (OUT,((Din,Pro),(Per,(Gon,Sym))));
0.007 0 0.028 0.001 0.048 0.005 (OUT,((Pro,(Din,Gon)),(Per,Sym)));
0.005 0 0.019 0 0.006 0 (OUT,(Din,(Pro,(Gon,(Per,Sym)))));
<0.001 0 <0.001 0 0.017 0 (OUT,((Pro,Gon),(Din,(Per,Sym))));
<0.001 0 0.021 0 <0.001 0 (OUT,(Pro,(Din,(Per,(Gon,Sym)))));
<0.001 0 0.016 0 <0.001 0 (OUT,(Per,(Pro,(Din,(Gon,Sym)))));
<0.001 0 0.048 0.002 0.059 0.011 (OUT,((Per,(Din,Gon)),(Pro,Sym)));
<0.001 0 0.023 0 <0.001 0 (OUT,(Pro,(Din,(Gon,(Per,Sym)))));
<0.001 0 0.008 0 0.010 0.001 (OUT,(Pro,((Din,Gon),(Per,Sym))));
<0.001 0 <0.001 0 0.012 0 (OUT,((Pro,(Din,Per)),(Gon,Sym)));
<0.001 0 <0.001 0 0.018 0 (OUT,(Pro,(Gon,((Din,Per),Sym))));
<0.001 0 <0.001 0 0.036 0 (OUT,(Pro,(Gon,(Din,(Per,Sym)))));
All-Din 0.874 0.681 0.659 0.344 0.638 0.345 (OUT,(Gon,(Per,(Pro,Sym))));
0.336 0.159 0.544 0.290 0.613 0.312 (OUT,(Per,(Gon,(Pro,Sym))));
0.200 0.111 0.303 0.138 0.289 0.158 (OUT,((Pro,Gon),(Per,Sym)));
0.126 0.035 0.449 0.211 0.369 0.159 (OUT,((Gon,Per),(Pro,Sym)));
0.037 0.012 0.090 0.015 0.086 0.026 (OUT,(Pro,(Gon,(Per,Sym))));
0.012 0.003 0.008 0 <0.001 0 (OUT,(Gon,(Pro,(Per,Sym))));
<0.001 0 0.011 0 0.009 0 (OUT,((Pro,Per),(Gon,Sym)));
<0.001 0 <0.001 0 0.014 0 (OUT,(Pro,(Per,(Gon,Sym))));
All-Pro 0.603 0.569 0.761 0.713 0.824 0.771 (OUT,((Din,Gon),(Per,Sym)));
0.466 0.429 0.329 0.274 0.278 0.212 (OUT,(Din,(Gon,(Per,Sym))));
0.008 0.002 0.012 0.011 0.033 0.016 (OUT,(Din,(Per,(Gon,Sym))));
0 0 <0.001 0 0.010 0 (OUT,(Per,(Gon,(Din,Sym))));
All-Din&Pro 0.991 0.988 0.984 0.966 0.960 0.945 (OUT,(Gon,(Per,Sym)));
0.016 0.009 0.028 0.025 0.067 0.050 (OUT,(Per,(Gon,Sym)));
0.007 0.003 0.020 0.009 0.013 0.006 (OUT,(Sym,(Per,Gon)));
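For context on how many arrangements "all alternative topologies" implies, the number of rooted bifurcating trees for n ingroup lineages is (2n-3)!!. The Python sketch below evaluates this for the dataset sizes in Table S3, assuming the tested set comprised every rooted bifurcating arrangement of the thecate orders with the outgroup fixed; the function name `rooted_topologies` is hypothetical.

```python
# Sketch only: counts rooted bifurcating topologies, (2n - 3)!! for n >= 2.

def rooted_topologies(n_taxa: int) -> int:
    """Number of rooted bifurcating tree topologies for n_taxa labelled tips."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


# Ingroup sizes: All = 5 orders (Din, Pro, Gon, Per, Sym);
# All-Din and All-Pro = 4 orders; All-Din&Pro = 3 orders.
for label, n in [("All", 5), ("All-Din", 4), ("All-Pro", 4), ("All-Din&Pro", 3)]:
    print(f"{label}: {rooted_topologies(n)} possible topologies")
```

The count of three for the All-Din&Pro dataset matches the three topologies listed for it in Table S3; for the larger datasets only the subset passing the p=0.01 filter is shown.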
Table S4: Reference accessions for dinoflagellate cellulases, CESA-like and histone-like proteins. NCBI, CAMPEP=MMETSP (see Table S1), and S. minutum genome Db protein accessions are shown.
Protein or Enzyme type Protein name Dinoflagellate species Reference accession Reference source
Cellulose/polysaccharide metabolism
Glycosyl hydrolase family 7, dCel1 Pyrocystis lunula ADG63073 NCBI nr
Glycosyl hydrolase family 7 Amphidinium carterae comp414_c0_seq2 NCBI TSA
Glycosyl hydrolase family 7 Karenia brevis CCMP2229 CAMPEP_0173626610 MMETSP0027, MMETSP0029-MMETSP0031
Glycosyl hydrolase family 7 Noctiluca scintillans SPMC136 CAMPEP_0194478120 MMETSP0253
Glycosyl hydrolase family 7 Oxyrrhis marina LB1974 CAMPEP_0190395562 MMETSP1424-MMETSP1426
Glycosyl transferase CESA-like, type 1 Alexandrium catenella OF101 CAMPEP_0171158290 MMETSP0790
Glycosyl transferase CESA-like, type 1 Gymnodinium catenatum GC744 CAMPEP_0117466612 MMETSP0784
Glycosyl transferase CESA-like, type 1 Karenia brevis CCMP2229 CAMPEP_0173802094 MMETSP0027, MMETSP0029-MMETSP0031
Glycosyl transferase CESA-like, type 1 Protoceratium reticulatum CCCM535 CAMPEP_0168370212 MMETSP0228
Glycosyl transferase CESA-like, type 1 Scrippsiella trochoidea CCMP3099 CAMPEP_0115463718 MMETSP0270-MMETSP0272
Glycosyl transferase CESA-like, type 2 Karlodinium micrum CCMP2283 CAMPEP_0169068018 MMETSP1015-MMETSP1017
Glycosyl transferase CESA-like, type 2 Prorocentrum minimum CCMP2233 CAMPEP_0177019396 MMETSP0267-MMETSP0269
Glycosyl transferase CESA-like, type 3 Symbiodinium minutum Mf1.05b 018942.t1 S. minutum genome Db
Histone-like proteins, DNA-binding
Histone-like protein, HLP-I Amphidinium carterae CCMP1314 ACJ04919 NCBI nr
Histone-like protein, HLP-I Noctiluca scintillans SPMC136 CAMPEP_0194550744 MMETSP0253
Histone-like protein, HLP-II, HCC3 Crypthecodinium cohnii AAM97522 NCBI nr
Histone-like protein, HLP-II Symbiodinium minutum Mf1.05b 017975.t1 S. minutum genome Db
Table S5: Proteins in non-photosynthetic dinoflagellate plastids. Presence of genes encoding plastid, cytosolic, and mitochondrial proteins in Noctiluca, Oxyrrhis, and Dinophysis; full protein names, enzyme commission numbers, protein and pathway abbreviations (used in Fig. 3A), and sequence sources (Noctiluca TSA assembly, CAMNT=MMETSP contigs, or CAMPEP=MMETSP proteins; see Table S1) are shown.
Pathway/Localization Abbreviation EC no. Protein name in Noctiluca? in Oxyrrhis? a in Dinophysis?
Isoprenoid precursor biosynthesis (Isopentenyl diphosphate= IPP/ Dimethylallyl diphosphate= DMAP)
Cytosolic (mevalonate pathway = MEV)
HMGCS 2.3.3.10 hydroxymethylglutaryl-CoA synthase
HMGCR 1.1.1.34 hydroxymethylglutaryl-CoA reductase
MVK 2.7.1.36 mevalonate kinase
PMVK 2.7.4.2 phosphomevalonate kinase
MVD 4.1.1.33 diphosphomevalonate decarboxylase
Plastid (non-mevalonate pathway = MEP/DOXP)
DXS 2.2.1.7 1-deoxy-D-xylulose-5-phosphate synthase c37102_g1_i1 CAMNT_0034174429 a CAMPEP_0179347270
IspC (DXR) 1.1.1.267 1-deoxy-D-xylulose-5-phosphate reductoisomerase c8491_g1_i1 CAMPEP_0190399830 a CAMNT_0021046817, CAMNT_0020960785
IspD 2.7.7.60 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase c34245_g1_i1 CAMNT_0020996201
IspE 2.7.1.148 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase c34076_g1_i1 CAMNT_0034113123 CAMNT_0021004701
IspF 4.6.1.12 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase c27998_g3_i1, c27998_g2_i1 CAMPEP_0190331156 CAMNT_0020973523, CAMNT_0021088175
IspG 1.17.7.1 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase c4614_g1_i1 CAMPEP_0190339536, CAMPEP_0190336160 CAMPEP_0179237040, CAMPEP_0179316044
IspH (LytB) 1.17.1.2 4-hydroxy-3-methylbut-2-enyl diphosphate reductase c37338_g1_i1 CAMNT_0020995807
Tetrapyrrole biosynthesis (heme, chlorophyll, etc.)
Mitochondrial (C4 pathway) only
ALAS 2.3.1.37 5-aminolevulinate synthase CAMNT_0034123849
Plastid (C5 pathway) only
GTR (HemA) 1.2.1.70 glutamyl-tRNA reductase c16606_g1_i1 CAMNT_0020962887
GSA (HemL) 5.4.3.8 glutamate-1-semialdehyde 2,1-aminomutase c6482_g2_i1 CAMNT_0021017431
Mitochondrial/Cytosolic (C4 pathway) or Plastid (C5 pathway)
ALAD (HemB) 4.2.1.24 5-aminolevulinate dehydratase/porphobilinogen synthase c17529_g1_i1, c17529_g3_i1 CAMNT_0034089891 CAMNT_0020991467
PBGD (HemC) 2.5.1.61 porphobilinogen deaminase/hydroxymethylbilane synthase c1741_g1_i1_17 CAMNT_0034053199 CAMNT_0020995631
UROS (HemD) 4.2.1.75 uroporphyrinogen-III synthase CAMPEP_0179295544
UROD (HemE) 4.1.1.37 uroporphyrinogen decarboxylase c18473_g1_i1, c18473_g2_i1, c5162_g1_i1, c44501_g1_i1 CAMNT_0034059593 CAMPEP_0179265208, CAMPEP_0179260602, CAMPEP_0179236930
CPOX (HemF) 1.3.3.3 coproporphyrinogen III oxidase c43130_g1_i1, c19480_g3_i1 CAMPEP_0190301822 CAMNT_0021027311
PPOX (HemY) 1.3.3.4 protoporphyrinogen oxidase c32625_g3_i1, c7748_g1_i1 CAMNT_0034176619 a, CAMPEP_0190445206 a CAMNT_0020983029, CAMPEP_0179218520
FECH (HemH) 4.99.1.1 ferrochelatase c37180_g1_i1 CAMNT_0021055343
Fatty acid biosynthesis and elongation
Cytosolic (polyketide synthase/ fatty acid synthase type I pathway = PKS/FASI)
FAAL 2.3.1.86 fatty acid synthase type I, fatty acyl ligase domain CAMPEP_0194492864, CAMPEP_0194556372, CAMPEP_0194556876, CAMPEP_0194490892 CAMPEP_0190385440, CAMPEP_0179308804 CAMPEP_0179250968
KS 2.3.1.86 fatty acid synthase type I, ketoacyl synthase domain CAMPEP_0194492864, CAMPEP_0194518942, CAMPEP_0194531188, CAMPEP_0194533868, CAMPEP_0194544822, CAMPEP_0194547234, CAMPEP_0194547510, CAMPEP_0194549100, CAMPEP_0194553672, CAMPEP_0194555636, CAMPEP_0194556058, CAMPEP_0194556250, CAMPEP_0194556372, CAMPEP_0194556646, CAMPEP_0194556876 CAMPEP_0190386610, CAMPEP_0190386310, CAMPEP_0190385440, CAMPEP_0190321934 CAMPEP_0179225310, CAMPEP_0179230504, CAMPEP_0179231766, CAMPEP_0179233104, CAMPEP_0179240346, CAMPEP_0179274784, CAMPEP_0179301140, CAMPEP_0179302604, CAMPEP_0179309896, CAMPEP_0179340930, CAMPEP_0179347172, CAMPEP_0179358456, CAMPEP_0179372828
AT 2.3.1.86 fatty acid synthase type I, acyl transferase domain CAMPEP_0194492864, CAMPEP_0194482594, CAMPEP_0194485702, CAMPEP_0194531888, CAMPEP_0194554940, CAMPEP_0194556372, CAMPEP_0194556876, CAMPEP_0194557686 CAMPEP_0190386610, CAMPEP_0190386310, CAMPEP_0190385440, CAMPEP_0190315816 CAMPEP_0179225352, CAMPEP_0179232228, CAMPEP_0179232754, CAMPEP_0179257472, CAMPEP_0179301464, CAMPEP_0179308134
DH 2.3.1.86 fatty acid synthase type I, dehydrase domain CAMPEP_0194551082, CAMPEP_0194552350 CAMPEP_0190386610, CAMPEP_0190323466 CAMPEP_0179257472
ER 2.3.1.86 fatty acid synthase type I, enoyl reductase domain CAMPEP_0194551082, CAMPEP_0194552350 CAMPEP_0190323466 CAMPEP_0179257472
KR 2.3.1.86 fatty acid synthase type I, ketoacyl reductase domain CAMPEP_0194551082, CAMPEP_0194552350, CAMPEP_0194492864 CAMPEP_0190386310, CAMPEP_0190384048, CAMPEP_0190330588 CAMPEP_0179227628, CAMPEP_0179257472, CAMPEP_0179268324, CAMPEP_0179292270, CAMPEP_0179306616, CAMPEP_0179310080, CAMPEP_0179311052, CAMPEP_0179311360
ACP 2.3.1.86 fatty acid synthase type I, acyl carrier protein domain CAMPEP_0194492864, CAMPEP_0194549100, CAMPEP_0194551082, CAMPEP_0194552350, CAMPEP_0194556058, CAMPEP_0194556372, CAMPEP_0194556876 CAMPEP_0190321934, CAMPEP_0190323466, CAMPEP_0190385440 CAMPEP_0179257472
TRD (SDR) 2.3.1.86 fatty acid synthase type I, terminal reductase domain
Endoplasmic reticulum (ER fatty acid elongation pathway)
ELO 2.3.1.199 beta-ketoacyl-CoA synthase CAMPEP_0194480994 CAMPEP_0190376278, CAMPEP_0190332524 CAMPEP_0179225992, CAMPEP_0179261800
KCR 1.1.1.330 beta-ketoacyl-CoA reductase CAMPEP_0190306630
PHS 4.2.1.134 beta-hydroxyacyl-CoA dehydratase CAMPEP_0194540052 CAMPEP_0190316596 CAMPEP_0179242246, CAMPEP_0179358262
TECR 1.3.1.93 trans-2-enoyl-CoA reductase
Plastid (fatty acid synthase type II pathway = FASII)
FabD 2.3.1.39 malonyl-CoA-acyl carrier protein transacylase
FabG 1.1.1.100 beta-ketoacyl-acyl carrier protein reductase
FabH 2.3.1.180 beta-ketoacyl-acyl carrier protein synthase III
FabZ 4.2.1.59 D-3-hydroxyoctanoyl-acyl carrier protein dehydratase
FabI 1.3.1.9 enoyl acyl carrier protein reductase CAMPEP_0179266880 b
FabB/F 2.3.1.41 beta-ketoacyl-acyl carrier protein synthetase
ACP acyl-carrier protein
Iron-sulfur (Fe-S) cluster assembly
Plastid (Suf pathway)
SufA Iron-sulfur assembly protein SufA
SufB Iron-sulfur assembly protein SufB c20274_g1_i1, c20274_g2_i1 CAMNT_0034061651 CAMNT_0021013865
SufC Iron-sulfur assembly protein SufC CAMNT_0021025055 CAMNT_0034058317
SufD Iron-sulfur assembly protein SufD CAMNT_0021040041
SufE Iron-sulfur assembly protein SufE
Ferredoxin redox system
Plastid (Fd – FNR pathway)
Fd (PetF) [2Fe-2S] ferredoxin c41939_g1_i1 CAMNT_0034037779 CAMNT_0020927107
FNR (PetH) 1.18.1.2 Ferredoxin NADP+ reductase c11604_g1_i1 CAMNT_0020923739
Triosephosphate translocation
Plastid (TPT translocon)
TPT Triosephosphate/phosphate transporter c40379_g1_i1 CAMNT_0021019703, CAMNT_0021010591
Protein folding and processing
Plastid (ClpC chaperone/protease)
ClpC Chloroplast molecular chaperone ClpC c23770_g4_i1, c23770_g5_i1 CAMNT_0034034689 CAMNT_0020950785
a sequence was obtained from the Oxyrrhis marina LB1974 MMETSP1424-1426 combined assembly (all others were obtained from the Oxyrrhis marina MMETSP0468-471 combined assembly).
b closely related to Kareniaceae, a full plastid pathway unlikely to be present - see main text
Table S6: Signal and target peptide predictions. Prediction statistics of proteins predicted to be targeted to the non-photosynthetic plastid in Noctiluca, Oxyrrhis, and Dinophysis are listed (protein abbreviations correspond to Table S5). Protein ID: CAMNT=MMETSP contig at iMicrobe or Noctiluca scintillans Trinity assembly contig name in NCBI TSA. Scores for predicted cleavage sites of signal peptides (Cmax and Ymax scores) and transit peptides (CS-score) are generally low, as observed previously in dinoflagellate and Perkinsus plastid proteins.
Column groups: No; Dinoflagellate species; Protein and accession; N-terminal region integrity; Signal peptide (SP) prediction in SignalP 4.1; Transit peptide (cTP) prediction in ChloroP 1.1
Columns: No; Dinoflagellate species; Protein; Protein ID; N-terminal extension?; Met present?; Spliced leader at 5' end?; STOP upstream of 1st Met?; SP Cmax score; SP Ymax score; SP Smax score; SP Smean score; Cleavage site positions; SP D-score; SP Networks-used; Phenylalanine present? (position relative to cleavage site); cTP score; cTP CS-score; cTP length; Second hydrophobic domain
1 Noctiluca scintillans IspD c34245_g1_i1 yes yes 6 yes 0.644 0.736 0.961 0.852 30-31 0.798 SignalP-noTM FAMP (-2)** 0.477 2.767 75
2 Noctiluca scintillans IspE c34076_g1_i1 yes yes 0.200 0.341 0.803 0.686 24-25 0.479 SignalP-TM FDLV (+1) 0.560 11.862 70
3 Noctiluca scintillans IspG c4614_g1_i1 yes yes 12 0.222 0.385 0.814 0.661 21-22 0.534 SignalP-noTM FVSS (+7) 0.519 4.804 30
4 Noctiluca scintillans IspH c37338_g1_i1 yes yes 6 yes 0.340 0.447 0.710 0.573 19-20 0.515 SignalP-noTM FALP (+14) 0.587 13.209 67
5 Noctiluca scintillans Fd c41939_g1_i1 yes yes 6 yes 0.399 0.581 0.940 0.852 27-28 0.727 SignalP-noTM FAIA (+7)** 0.460 2.204 55
6 Noctiluca scintillans FNR c11604_g1_i1 yes yes 0.24 0.47 0.98 0.92 28-29 0.71 SignalP-noTM FVQM (-3)** 0.55 11.93 51
7 Dinophysis acuminata IspC CAMNT_0021046817 yes yes 0.520 0.623 0.859 0.735 17-18 0.684 SignalP-noTM FVPG (+1) 0.554 4.408 33
8 Dinophysis acuminata IspF CAMNT_0020973523 yes 0.323 0.495 0.854 0.749 19-20 0.632 SignalP-noTM FSHQ (-2) 0.476 1.677 22
9 Dinophysis acuminata TPT1 CAMNT_0021019703 yes yes 6 0.758 0.692 0.765 0.634 23-24 0.661 SignalP-noTM 0.510 2.188 21
10 Dinophysis acuminata TPT2 CAMNT_0021010591 yes yes yes 0.817 0.740 0.793 0.675 23-24 0.705 SignalP-noTM 0.523 1.992 21
11 Dinophysis acuminata SufC CAMNT_0021025055 yes 0.62 0.73 0.9 0.86 19-20 0.8 SignalP-noTM FAAA (-2) 0.46 2.91 4***
12 Dinophysis acuminata FNR CAMNT_0021011101 yes 0.22 0.42 0.95 0.8 21-22 0.63 SignalP-noTM FASP (+13) 0.51 7.62 42
13 Dinophysis acuminata Fd CAMNT_0020927107 yes yes 8* yes 0.188 0.382 0.922 0.786 25-26 0.600 SignalP-noTM FVAP (+7) 0.463 4.645 67
14 Oxyrrhis marina MMETSP0468 IspG CAMNT_0034079579 yes 0.684 0.591 0.600 0.524 19-20 0.564 SignalP-TM FSLR (+5) 0.520 3.042 72 yes
15 Oxyrrhis marina MMETSP0468 ALAD CAMNT_0034089891 yes yes 0.609 0.612 0.885 0.707 26-27 0.650 SignalP-TM 0.518 0.664 73 yes
16 Oxyrrhis marina MMETSP0468 SufB CAMNT_0034061651 yes yes 0.18 0.38 0.89 0.8 15-16 0.55 SignalP-TM 0.45 2.91 64
17 Oxyrrhis marina MMETSP0468 PBGD CAMNT_0034053199 yes yes 0.427 0.486 0.707 0.588 19-20 0.527 SignalP-TM FLQS (+8) 0.516 1.892 65 yes
Oxyrrhis marina LB1974 PBGD CAMNT_0034168717 yes yes 6 0.159 0.335 0.841 0.710 17-18 0.485 SignalP-TM FLES (+10) 0.527 1.892 70 yes
18 Oxyrrhis marina MMETSP0468 Fd CAMNT_0034037779 yes 0.544 5.420 40
* masked by 3 nucleotides at 5' end
** alternative SP cleavage site at the position +1
*** alternative TP cleavage site likely present