static-content.springer.com10.1007/s114…  · Web viewQuality of initial data was checked using...


Supplementary materials

Details of DNA sequencing

The quality of the initial data was checked using the NGS QC Toolkit with default parameters. The Illumina 2×100 bp paired-end reads were assembled using CLC Genomics Workbench version 6.5 (assembly parameters: automatic word size, automatic bubble size, a minimum contig length of 200 bp, and an insert size of 450-550 bp). The G+C content of the contigs was calculated using an in-house Perl script. Contigs longer than 1 kbp were collected for binning by adapting a previously described pipeline [11]. First, their read coverage was calculated using Bowtie (version 2.0.0) and SAMtools. The draft genomes were then binned based on the read coverage and G+C content of their respective contigs. With minor modifications, contigs from the same species could be clustered by their similar G+C contents and sequencing depths (Fig. S1a). RStudio and R scripts (https://github.com/MadsAlbertsen/multi-metagenome) were used to group these contigs (Fig. S2A). Tetranucleotide frequencies (TNFs) of these contigs were calculated using calc.kmerfreq.pl from the pipeline [11], and principal component analysis (PCA) of the TNFs was conducted using the Vegan package (version 2.0-5). Contigs with similar TNFs were considered to be derived from the same genome and were grouped again (Fig. S2B). The draft genome consisting of the grouped contigs in Fig. S2B has been deposited in GenBank and is accessible under BioProject number PRJNA264957.
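The G+C calculation, the 1 kbp length cut-off, and the grouping by G+C content and coverage can be illustrated with a minimal Python sketch. This is not the in-house Perl script or the R workflow used in the study. The function names, the per-contig `coverage` mapping (mean read depth per contig, computed separately from the read mapping), and the rectangular G+C and coverage windows are all hypothetical and chosen only for illustration.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq):
    """Return the G+C percentage, counting only unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def select_contigs(records, coverage, min_len=1000,
                   gc_range=(40.0, 46.0), cov_range=(20.0, 60.0)):
    """Keep contigs longer than min_len whose G+C content and mean coverage
    fall inside the given windows.

    gc_range and cov_range are hypothetical placeholder values; in practice
    the windows would be chosen by inspecting a G+C versus coverage plot.
    """
    selected = []
    for name, seq in records:
        if len(seq) <= min_len:
            continue  # only contigs longer than the cut-off are binned
        depth = coverage.get(name)
        if depth is None:
            continue  # no coverage estimate available for this contig
        gc = gc_content(seq)
        if gc_range[0] <= gc <= gc_range[1] and cov_range[0] <= depth <= cov_range[1]:
            selected.append((name, len(seq), gc, depth))
    return selected
```

A rectangular window is a simplification: it only approximates selecting a cluster of points by eye on a scatter plot, and the resulting candidate bin would still need the TNF consistency check described above.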

Figure S1. Saline sediment sampling under a microbial mat.

The sediment was sampled in the Thuwal cold seeps of the Red Sea (A). The saline sediment under the mat was obtained by a ROV equipped with a CTD (B). In the push core, the sediment was covered by milky water, which consisted of hypersaline seepage water (C). Scale bar = 2 cm.

Figure S2. Genomic binning of the Aerophobetes bacterium TCS1.

The contigs belonging to TCS1 were identified by their G+C content and coverage (A). Principal component analysis was applied to check the consistency of the tetranucleotide frequencies of the contigs (B). The contigs circled in (B) were collected as the draft genome of TCS1.
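As an illustration of the quantity examined in panel (B), the following Python sketch computes a tetranucleotide frequency vector for one contig. It is not the calc.kmerfreq.pl script from the pipeline: counting every overlapping 4-mer on the forward strand only, and skipping windows that contain ambiguous bases, are assumptions made here, and the function name is hypothetical.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return a list of 256 relative frequencies, ordered as in TETRAMERS.

    Overlapping 4-mers are counted on the given strand only; windows that
    contain characters other than A, C, G, or T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]
```

Stacking one such vector per contig gives a contigs × 256 matrix, which could then be passed to any PCA routine to look for contigs that cluster together.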

Figure S3. Phylogenetic relationships of the TCS1 lineage.

The maximum-likelihood tree was constructed from a concatenation of 31 conserved genes. Bootstrap support values were based on 1000 replicates.

Figure S4. 16S rRNA sequence similarities of pyrosequencing amplicons to TCS1.

16S rRNA amplicons that showed high similarity (>95%) to that of TCS1 were selected for the stem-and-leaf plot. The average and standard deviation of the similarities are shown for sediment samples under the mat (mat.sedi) and in the brine pool (brine.sedi).

Table S1. Comparison of KEGG pathways in acetogens

Genome features and KEGG pathways                    Value for indicated taxon (a)
                                                     1     2     3     4     5     6
Genome size (Mbp)                                    1.27  5.00  3.29  4.13  2.63  3.86
G+C content (%)                                      43.0  51.3  68.9  62.9  55.8  49.8

Carbohydrate metabolism
ko00010 Glycolysis / Gluconeogenesis                 18    23    11    16    27    19
ko00020 Citrate cycle (TCA cycle)                    6     35    17    19    18    5
ko00030 Pentose phosphate pathway                    20    11    9     12    19    20
ko00040 Pentose and glucuronate interconversions     9     5     4     4     6     14
ko00051 Fructose and mannose metabolism              13    11    6     6     16    17
ko00052 Galactose metabolism                         7     5     4     4     10    15
ko00500 Starch and sucrose metabolism                6     19    3     13    14    19
ko00520 Amino sugar and nucleotide sugar metabolism  11    28    16    19    21    26
ko00620 Pyruvate metabolism                          9     36    25    23    26    17
ko00650 Butanoate metabolism                         1     25    15    18    11    6

Energy metabolism
ko00190 Oxidative phosphorylation                    8     16    35    40    26    15
ko00720 Carbon fixation pathways                     8     45    29    24    21    9
ko00910 Nitrogen metabolism                          1     11    16    14    9     8
ko00920 Sulfur metabolism                            3     15    13    15    14    14

Lipid metabolism
ko00061 Fatty acid biosynthesis                      6     3     14    9     10    5
ko00062 Fatty acid elongation                        0     2     1     0     0     0
ko00071 Fatty acid degradation                       1     16    12    7     2     5

Membrane transport
ko02010 ABC transporters                             34    94    24    51    55    92
ko03070 Bacterial secretion system                   5     15    5     11    10    7

Signal transduction
ko02020 Two-component system                         2     68    51    56    42    34

Cell motility
ko02030 Bacterial chemotaxis                         2     2     6     25    17    14
ko02040 Flagellar assembly                           0     1     0     24    26    16

(a) The taxa are denoted as: 1. Aerophobetes bacterium TCS1 (Aerophobetes); 2. Desulfotignum phosphitoxidans DSM13687 (Deltaproteobacteria); 3. Geothrix fermentans DSM14018 (Acidobacteria); 4. Holophaga foetida TMBS4 (Acidobacteria); 5. Moorella thermoacetica ATCC39073 (Firmicutes); 6. Treponema azotonutricium ZAS-9 (Spirochaetes).

Table S2. Phylogenetic relationships of the marker proteins involved in the Wood-Ljungdahl pathway

Species Accession

CooS

Aerophobetes bacterium SCGC AAA255-F10 WP_029962778

Acetothermia bacterium BAL56233

Calescamantes bacterium JGI 0000106-I17 WP_029229838

Methanosarcina mazei WP_015410877

Desulfotignum phosphitoxidans WP_006964991

Desulfobacterium autotrophicum WP_015906132

Methanosarcina acetivorans WP_011023215

Syntrophorhabdus aromaticivorans WP_028893419

Thermacetogenium phaeum WP_015049729

Desulfobacca acetoxidans WP_013705176

Syntrophaceticus schinkii WP_044664559

Methanosaeta harundinacea WP_014587172

Methanosarcina barkeri WP_011308433

Caldanaerobacter subterraneus WP_009610668

Thermoanaerobacter kivui AIS53154

Desulfotomaculum acetoxidans WP_015757841

Methanotorris igneus WP_013798453

Desulfococcus oleovorans WP_012176598

Desulfotomaculum thermocisternum WP_027355632

Tepidanaerobacter acetatoxydans WP_013777659

Ammonifex degensii WP_015738374

Deferrisoma camini WP_025323520

Tepidanaerobacter acetatoxydans AFV95333

Clostridium arbusti WP_010234217

Ferroglobus placidus WP_012966999

Aerophobetes bacterium TCS1 118_1

AcsA/B

Aerophobetes bacterium SCGC AAA255-F10 WP_029962769

Oxobacter pfennigii ADR32217

Acetothermia bacterium BAL57910

Desulfatibacillum alkenivorans WP_012609825

Spirochaeta sp. JC202 WP_037563760

Calescamantes bacterium JGI 0000106-I17 WP_029229843