De novo genome sequencing of Skeletonema marinoi and Surirella brebissonii
Mats Töpel¹*, Magnus Alm Rosenblad¹, Ulrika Lind¹, Susanna Gross², Sandra Karlsten¹, Jens Persson¹, Mattias Backman¹, Anna Godhe², Anders Blomberg¹
1. Department of Chemistry and Molecular Biology, University of Gothenburg
2. Department of Biology and Environmental Sciences, University of Gothenburg
Introduction
De novo whole genome sequencing of the two diatom species Surirella brebissonii (CCMP2919) and Skeletonema marinoi (GUMACC St54) is currently being conducted as part of the Linnaeus Centre for Marine Evolutionary Biology (CeMEB) initiative at the University of Gothenburg. This work is part of the Infrastructure for Marine Genetic model Organisms (IMAGO) project, which aims to develop new marine model systems and to provide genomic and genetic tools for studying vital phenomena and components of coastal marine ecosystems.
Protein translocation in diatoms
A chloroplast genome encodes only ~100 proteins, but the organelle requires many more proteins in order to perform its functions in the cell. Translocons at the Outer and Inner Chloroplast envelope membranes (TOC and TIC, respectively) are the two multi-protein complexes that enable chloroplasts in plants and in red and green algae to import these essential nuclear-encoded proteins.
Diatom plastids, on the other hand, are surrounded by four membranes, of which the outermost is continuous with the endoplasmic reticulum (ER) [8]. The second membrane (known as the periplastid membrane [PPM]) is the remnant of the secondary endosymbiont’s plasma membrane (proposed to be of red algal origin) [9]. The two innermost membranes are homologous to the outer and inner envelope membranes in plant plastids and are derived from the membranes surrounding the cyanobiont of primary plastids [10].
The identity of the TOC and TIC translocons in diatoms and most other chromalveolate organism groups (e.g. brown algae, dinoflagellates and apicomplexan parasites) is largely unknown. However, bioinformatics analyses of whole genome sequences from diatoms have shown that these systems are also present in diatoms. Bullmann et al. [11] reported the discovery of an Omp85 protein that is localized in the third outermost plastid membrane (homologous to the outer envelope membrane in plants) of the diatom Phaeodactylum tricornutum.
To date, this Omp85 protein is the only reported putative member of the TOC complex in diatoms. Our phylogenetic analyses (including bacterial, plant and diatom sequences) reveal that the diatom sequences are of red algal origin and, more specifically, belong to the Toc75 gene family. Unexpectedly long branches in the diatom part of the tree indicate a rapid, albeit even, evolutionary rate. This phenomenon has been reported previously, but its significance has not yet been thoroughly investigated.
Assembly statistics
DNA libraries (insert sizes 150 and 3000 bp), one of which was generated from an axenic culture, and one RNA library (300 bp) from Surirella brebissonii have been sequenced. One axenic 300 bp library from Skeletonema marinoi has been generated. Both genomes have been assembled using the CLC de novo assembler software package. Sequence reads were preprocessed using cutadapt [1] and the fastx toolkit [2] (for details see http://matstopel.se/notebook).
                                  Skeletonema   Surirella
Total nt sequenced (Gb)                    33          46
Total input to assembly (Gb)               26          33
Assembly size (Mb)                         49         136
Number of contigs (K)                      53         244
Average coverage (x/contig)               443         207
N50 (bp)                                 1673         694
Average contig length (bp)                929         557
Longest contig (Kb)                      506*          76
* Putative bacterial symbiont.
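The summary statistics in the table can be derived from a list of contig lengths. The following is a minimal sketch in Python, not the procedure used by the CLC assembler; the function names `n50` and `summarize` and the toy contig lengths are hypothetical and serve only to illustrate how N50 and average contig length are defined.

```python
def n50(lengths):
    """Return the N50 of a set of contigs.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


def summarize(lengths):
    """Collect basic assembly statistics from contig lengths (in bp)."""
    return {
        "n_contigs": len(lengths),
        "assembly_size": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [100, 200, 300, 400, 500]
stats = summarize(toy_contigs)
print(stats)
```

The same definitions give a quick consistency check on the reported values: number of contigs times average contig length should approximate the assembly size (53 000 × 929 bp ≈ 49 Mb for Skeletonema and 244 000 × 557 bp ≈ 136 Mb for Surirella).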
Preliminary findings
Organelles: The plastid of Surirella brebissonii contains a group II intron with an ORF, the first group II intron to be identified in diatoms. Interestingly, the mitochondrial genome of S. brebissonii has lost the group II intron present in both Thalassiosira pseudonana and Phaeodactylum tricornutum mtDNA. Three putative components of the Translocon at the Outer Chloroplast envelope membrane (TOC) have been identified in S. brebissonii and one in S. marinoi.
Cell wall: Six silicon transporter genes (SITs) have been predicted in the S. brebissonii genome, and two in S. marinoi. Bioinformatics analyses have also identified centric- and pennate-specific motifs in these sequences. Frustulin and Silaffin/Cingulin proteins, which are also involved in diatom cell wall biogenesis, have been identified in both genomes, and preliminary analyses have found novel motifs in these sequences.
Phylogenetic analysis of the Omp85 superfamily, which includes the Toc75 gene family of channel proteins. Preliminary analyses using data from Töpel et al. [4] as query sequences have identified at least one protein from the Omp85 superfamily in each of the genomes of Surirella brebissonii and Skeletonema marinoi. Identified contigs were translated in all six reading frames using the program getorf [5] and aligned to the query dataset using MAFFT [6]. Correct reading frames identified in this way were then used in BLAST searches of the publicly available diatom gene predictions, and the sequences were subsequently analysed together using MrBayes 3.2 [7].
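The six-frame translation step can be illustrated with a short Python sketch. This is not getorf itself, which additionally reports open reading frames delimited by start/stop codons; the sketch only translates each frame in full. It assumes the standard genetic code, and the function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


# Toy sequence (hypothetical): start codon, one alanine codon, stop codon.
frames = six_frames("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

In a workflow like the one described, the frame whose translation aligns to the query proteins without internal stop codons (shown here as `*`) would be taken as the correct reading frame.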
Surirella brebissonii is an asymmetric, pennate, benthic diatom that is approximately 45 µm long and mostly found in brackish water. It was selected for sequencing because of its rather large size and asymmetric form. It has long been used for studies of chromosome separation. Photo: Per Johander.
Skeletonema marinoi is a major primary producer during spring blooms in the North Atlantic and a valuable food source for zooplankton. Its generation time is 24 hours, which makes it ideal for studies of phenotypic response. Benthic cells act as resting stages, with up to 50 000 per gram of sediment, and can survive for at least a hundred years, thereby providing short-term evolutionary archives in sediments. Photo: Anna Godhe.
Evolutionary relationships among genera for which whole genome sequence (WGS) data are available. Although the sample is sparse (the number of diatom species has been estimated at ~200 000 [3]), these seven species constitute a broad phylogenetic sample from the diatom tree of life, covering many large morphological groups. Access to WGS data from either of the two groups Coscinodiscophycidae or Rhizosoleniophycidae would, however, significantly improve our understanding of diatom evolution by including the crown node of the group in the analyses. Tree modified from [12].
[Figure: diatom tree. Tip labels: Coscinodiscophycidae, Rhizosoleniophycidae, Thalassiosira, Skeletonema, Fragilariopsis, Phaeodactylum, Pseudo-nitzschia, Surirella. Group labels: Radial Centrics, Bi(multi)polar Centrics, Raphid Pennates.]
The chloroplast protein translocation machinery in plants. The preprotein (black line) is first recognised by one of the TOC receptors (green), and subsequently transported through the Toc75 channel, and the TIC complex, to the chloroplast stroma. The identity of the TOC and TIC translocons in diatoms and most other chromalveolates is mainly unknown. Numbers indicate the names of the proteins. Graphics: Paula Töpel.
[Figure: TOC and TIC complexes. Labels: Cytosol, OEM, IMS, IEM, Stroma, TOC, TIC, Hsp70, Hsp60, Hsp93, SPP, and numbered Toc/Tic components including Toc75 and Toc159.]
References
1. https://code.google.com/p/cutadapt/.
2. http://hannonlab.cshl.edu/fastx_toolkit/.
3. Bowler C., Vardi A., Allen A.E. (2010) Oceanographic and Biogeochemical Insights from Diatom Genomes. Annu. Rev. Mar. Sci. 2, 333–65.
4. Töpel M., Ling Q., Jarvis P. (2012) Neofunctionalization within the Omp85 protein superfamily during chloroplast evolution. Plant Signaling and Behaviour 7:2.
5. http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html.
6. Katoh, Standley (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. & Evol. 30, 772-780.
7. Huelsenbeck J.P., Ronquist F. (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17(8), 754-755.
8. Gibbs S.P. (1981) The chloroplast endoplasmic reticulum: structure, function, and evolutionary significance. Int. Rev. Cytol. 72, 49–99.
9. Cavalier-Smith T. (2003) Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae). Philos. Trans. R. Soc. Lond. B. Biol. Sci. 358, 109–134.
10. Palmer J.D. (2003) The symbiotic birth and spread of plastids: how many times and whodunit? J. Phycol. 39, 1–9.
11. Bullmann L., Haarmann R., Mirus O., Bredemeier R., Hempel F., Maier U.G., Schleiff E. (2010) Filling the Gap, Evolutionarily Conserved Omp85 in Plastids of Chromalveolates. J. Biol. Chem. 285, 6848-6856.
12. Sorhannus U. (2004) Diatom phylogenetics inferred based on direct optimization of nuclear-encoded SSU rRNA sequences. Cladistics 20, 487–497.
[Figure: phylogenetic tree of the Omp85 superfamily. Clade labels: Cyanobacteria, Plant OEP80, Diatom Toc75, Plant Toc75. Diatom tip labels: Surirella_toc75, Skeletonema, Thalassiosira_pseudonana, Thalassiosira_oceanica, Phaeodactylum, Phaeodactylum_2, Pseudo-nitzschia, Pseudo-nitzschia_2, Fragilariopsis. Red algal tip labels: Cyanidioschyzon_merolae, Galdieria_sulphuraria. Marked events: Primary endosymbiosis, Secondary endosymbiosis, Gene duplication. Scale: 0.4.]