Supporting Information - PNAS · 10/8/2014 · GMF1061042970635f_1_839_1 Diplosphaera colitermitum...
-
Supporting Information
Sakowski et al. 10.1073/pnas.1401322111

SI Materials and Methods

Metagenomic Libraries. Chesapeake Bay library CBB was sampled in September 2002 and amplified using the linker-amplified shotgun library (LASL) method before transformation and picking random colonies for Sanger sequencing (1). Chesapeake Bay library CBJ was sampled in October 2004 as part of the first Global Ocean Survey (2). DNA was inserted in a medium-copy plasmid and randomly selected clones were Sanger-sequenced (2). Dry Tortugas libraries were sampled from surface seawater near the Dry Tortugas, Florida (24°29′N, 83°4′W) in January 2004. Chesapeake Bay libraries CFA through CFD were collected over 24 h at station CB 858 (38°58′N, 76°23′W) in July 2007. Water samples were collected on July 30, 2007 at 0600 hours (CFA), 1130 hours (CFB), 1630 hours (CFC), and July 31, 2007 at 0600 hours (CFD). Total viral nucleic acids from the time series samples were separated into dsDNA, ssDNA, and RNA fractions using hydroxyapatite chromatography. The ssDNA and RNA fractions from each time point were pooled and transformed into dsDNA to provide libraries CBS and CBR, respectively. Dry Tortugas and Chesapeake Bay libraries were amplified by the LASL method before transformation and sequencing (3). After the induction treatment virus particles were concentrated by tangential-flow filtration (4). The Gulf of Maine Library GMF was sampled at station GOM04 (44°07′5″N, 67°58′3″W) in January 2006.
Identification and Distribution of Putative Viroplankton Ribonucleotide Reductases. All sequences were screened by a Conserved Domain BLAST search (5) to confirm homology to ribonucleotide reductase (RNR). Each read was queried against the CDD database (v3.10) using Conserved Domain BLAST (5) and BLASTx (6) and sorted by the top results.
Interlibrary RNR Frequency Normalization. Differences in read length between libraries were corrected as follows to allow for interlibrary RNR frequency comparisons:

\[
\%\ \text{RNR of library}_{\text{corrected}} = \left( \frac{\dfrac{r}{R} \times \mathrm{RNR}}{L} \right) \times 100\%,
\]

where r is the mean read length of the individual library, R is the mean read length of all libraries being compared, RNR is the number of reads with homology to RNR within the individual library, and L is the number of reads in the individual library.
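As a minimal sketch of the interlibrary correction, the following Python function evaluates the formula for one library. It assumes the garbled fraction reads r/R, as the symbol definitions suggest. The function name, argument names, and example numbers are hypothetical and are not values from this study.

```python
def corrected_rnr_percent(mean_read_len, mean_read_len_all, rnr_reads, total_reads):
    """Percent RNR of a library, corrected for read length.

    Evaluates ((r / R) * RNR / L) * 100, where
      r   = mean read length of this library,
      R   = mean read length of all libraries being compared,
      RNR = number of reads with homology to RNR in this library,
      L   = number of reads in this library.
    The r / R orientation is an assumption taken from the symbol definitions.
    """
    if total_reads <= 0 or mean_read_len_all <= 0:
        raise ValueError("total_reads and mean_read_len_all must be positive")
    return (mean_read_len / mean_read_len_all) * rnr_reads / total_reads * 100.0


# Hypothetical library: 700-bp mean reads, 800-bp mean across all libraries,
# 50 RNR-homologous reads out of 10,000 reads.
example_percent = corrected_rnr_percent(700, 800, 50, 10_000)
print(f"{example_percent:.4f}%")
```

With these illustrative inputs the uncorrected frequency would be 0.5%, and the read-length ratio of 700/800 scales it down.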
Intralibrary RNR Frequency Normalization. RNR frequencies were normalized within a library by the mean read length of the library and gene length of the top BLAST hit reference phage for each metagenomic sequence. The corrected number of reads was calculated as follows:

\[
\#\ \text{Corrected reads}\ (\mathrm{RNR}_c) = \left( \frac{r}{G} \right) \times \mathrm{RNR}_n,
\]

where r is the mean read length of the individual library, G is the gene length of the RNR from reference phage, and RNRn is the number of reads with a top BLAST hit to a particular reference phage within the individual library.

The proportion of each group/subunit combination per library was calculated as follows:

\[
\%\ \text{RNR group/subunit} = \left( \frac{\mathrm{RNR}_c}{\sum \mathrm{RNR}_c} \right) \times 100\%,
\]

where RNRc is the corrected number of reads for a given RNR group/subunit.
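The two intralibrary steps can be sketched together in Python: correct each read count by r/G, then express each group's corrected total as a percentage of the library's corrected total. The function names, the input layout, and the example numbers are hypothetical. Summing corrected reads over reference phages within a group is an assumption about how per-phage counts were combined.

```python
from collections import defaultdict


def corrected_reads(mean_read_len, gene_len, n_reads):
    """RNRc = (r / G) * RNRn for one reference phage."""
    if gene_len <= 0:
        raise ValueError("gene_len must be positive")
    return (mean_read_len / gene_len) * n_reads


def group_percentages(hits, mean_read_len):
    """Percent of corrected RNR reads per group/subunit in one library.

    hits: iterable of (group, reference_gene_length, n_reads) tuples,
          one per reference phage that was a top BLAST hit.
    """
    totals = defaultdict(float)
    for group, gene_len, n_reads in hits:
        # Assumption: corrected reads are summed within each group.
        totals[group] += corrected_reads(mean_read_len, gene_len, n_reads)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: 100.0 * value / grand_total for group, value in totals.items()}


# Hypothetical library with a 500-bp mean read length.
example_hits = [
    ("Cyano I", 2000, 10),  # 500/2000 * 10 = 2.5 corrected reads
    ("Cyano I", 1000, 5),   # 500/1000 * 5  = 2.5 corrected reads
    ("RTPR", 2500, 25),     # 500/2500 * 25 = 5.0 corrected reads
]
print(group_percentages(example_hits, mean_read_len=500))
```

In this illustration the RTPR group has more raw reads (25 versus 15) but the same corrected total, because its reference gene is longer.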
Alignments and Phylogenetic Trees. Subsequent alignments and trees were made by extracting sequences from the original alignment (Figs. S2, S3, and S4). Because of the high number of reference sequences within the class I Other, class II Other, and class II Ribonucleotide TriPhosphate Reductase (RTPR) groups (Fig. 1) it was necessary to cluster these sequences at 80% identity using the furthest neighbor algorithm in mothur (7). Representative sequences from each cluster were aligned with the metagenomic sequences. Metagenomic sequences belonging to the class II RTPR group were also clustered at 80% identity to reduce the number of sequences on the tree (Fig. S1).
Predicted RNR Group Abundances in the Chesapeake Bay. The abundance (viruses per milliliter) of identified RNR groups was predicted for libraries CFA–CFD using direct count values obtained from epifluorescence microscopy (8) and recruitment to RefSeq viral genomes. The abundance of each group was calculated as (VA) × (BR/TB) × (RNR/G), where VA is the observed viral abundance (per milliliter) for a given library, BR is the number of bases in the library that recruited to reference viral genomes, TB is the number of total bases in the library, RNR is the number of predicted RNR genes sampled in a given group in the library, and G is the number of total predicted genomes sampled in the library.
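The abundance calculation is a product of three terms, sketched below in Python. The function and argument names are hypothetical, and the example inputs are illustrative only, not measurements from libraries CFA–CFD.

```python
def predicted_group_abundance(viral_abundance, bases_recruited, total_bases,
                              rnr_genes, genomes):
    """Predicted abundance (viruses per milliliter) of one RNR group.

    Evaluates VA * (BR / TB) * (RNR / G):
      VA  = observed viral abundance per milliliter,
      BR  = bases in the library that recruited to reference viral genomes,
      TB  = total bases in the library,
      RNR = predicted RNR genes sampled in the group,
      G   = total predicted genomes sampled in the library.
    """
    if total_bases <= 0 or genomes <= 0:
        raise ValueError("total_bases and genomes must be positive")
    recruited_fraction = bases_recruited / total_bases
    rnr_per_genome = rnr_genes / genomes
    return viral_abundance * recruited_fraction * rnr_per_genome


# Hypothetical inputs: 1e7 viruses per mL, 20% of bases recruited,
# 22 RNR genes of a group among 100 predicted genomes.
print(predicted_group_abundance(1e7, 2e6, 1e7, 22, 100))
```

The recruited fraction scales the direct count to the portion of the community represented by reference genomes, and the RNR-per-genome ratio apportions that portion to the group.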
Identification of Redoxins. Thioredoxins and glutaredoxins from RNR-encoding phages within the order Caudovirales were obtained from the GenBank nr database. These sequences were compiled to create a reference database. Phage genomes were queried against this reference database using BLASTx with an e-value cutoff of 1e-01 to identify hypothetical or unannotated proteins that may be putative redoxins.
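A minimal sketch of how such a cutoff could be applied after the search is given below. It assumes BLAST results were saved in the standard 12-column tabular format, in which the e-value is the 11th column. The source does not state the output format. The function name, the sequence identifiers, and the hit values are hypothetical. Treating the cutoff as inclusive is also an assumption.

```python
def filter_hits(lines, max_evalue=1e-1):
    """Keep (query, subject, evalue) for hits at or below max_evalue.

    lines: iterable of tab-separated BLAST tabular rows with the default
           12 columns (qseqid, sseqid, pident, length, mismatch, gapopen,
           qstart, qend, sstart, send, evalue, bitscore).
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])  # 11th column holds the e-value
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical rows; identifiers and scores are illustrative only.
example_rows = [
    "\t".join(["phageA_orf12", "ref_trx_1", "35.2", "88", "55", "2",
               "1", "264", "3", "90", "2e-05", "48.1"]),
    "\t".join(["phageA_orf30", "ref_grx_4", "28.0", "60", "40", "3",
               "10", "189", "5", "64", "0.5", "27.3"]),
    "\t".join(["phageB_orf07", "ref_grx_2", "31.5", "73", "47", "1",
               "4", "222", "2", "74", "0.1", "33.9"]),
]
for query, subject, evalue in filter_hits(example_rows):
    print(query, subject, evalue)
```

A cutoff this permissive is suited to flagging candidates for further inspection rather than to confirming annotations.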
Contig Assembly and Annotation from the Rhode River. Fifty liters of surface water from the Rhode River was sampled at the Smithsonian Environmental Research Center in Edgewater, Maryland. The
-
4. Wommack KE, Sime-Ngando T, Winget DM, Jamindar S, Helton RR (2010) Filtration-based methods for the collection of viral concentrates from large water samples. Manual of Aquatic Viral Ecology, eds Wilhelm SW, Weinbauer MG, Suttle CA (Am. Soc. of Limnology and Oceanography, Waco, TX), pp 110–117.
5. Marchler-Bauer A, et al. (2011) CDD: A Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39(Database issue):D225–D229.
6. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410.
7. Schloss PD, et al. (2009) Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75(23):7537–7541.
8. Chen F, Lu JR, Binder BJ, Liu YC, Hodson RE (2001) Application of digital image analysis and flow cytometry to enumerate marine viruses stained with SYBR gold. Appl Environ Microbiol 67(2):539–545.
9. John SG, et al. (2011) A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ Microbiol Rep 3(2):195–202.
10. Noguchi H, Taniguchi T, Itoh T (2008) MetaGeneAnnotator: Detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 15(6):387–396.
Sakowski et al. www.pnas.org/cgi/content/short/1401322111 2 of 10
-
[Fig. S1 tree image not reproduced. Tip labels are class II RNR reference sequences and representative metagenomic reads and contigs from the CB, DT, and GM libraries. Labeled groups on the tree and inset: Clades I–IV, Mycobacterium phage, phage RM378, and ATCV-1. Scale bar, 0.4.]
Fig. S1. Unrooted maximum likelihood tree with 100 bootstrap replicates of class II RNR reference and putative metagenomic RTPR sequences. Metagenomic sequences from the large tree (Inset) were clustered at 80% identity. Representative metagenomic sequences were placed on the tree, with the number of reads from each environment within that cluster listed. Bacterial references from the large tree (Inset) were clustered at 80% identity. Representative sequences were placed on each tree. Numbers in parentheses following bacterial references indicate the number of reference sequences within that cluster. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, eukaryotes and eukaryotic viruses in orange, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Celeribacter phage P12053L was colored as a podovirus based on its T7-like DNA polymerase even though it is officially listed as an unclassified dsDNA phage. Black, gray, and white circles represent bootstrap support ≥100%, 75%, and 50%, respectively. CB, Chesapeake Bay; DT, Dry Tortugas; GM, Gulf of Maine.
-
[Fig. S2 tree image not reproduced. Two panels, Class I and Class II. Tip labels are cyanobacterial and cyanophage RNR reference sequences and metagenomic reads and contigs. Labeled groups in each panel: T4-like cyanomyoviruses, cyanosiphoviruses and cyanopodoviruses, and cyanobacteria. Scale bar, 0.3.]
Fig. S2. Unrooted maximum likelihood tree with 100 bootstrap replicates of class I alpha and class II RNR reference and putative metagenomic Cyano sequences. Numbers in parentheses indicate the number of reads assembled in each contig. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Black, gray, and white circles represent bootstrap support ≥100%, 75%, and 50%, respectively.
-
[Fig. S3 tree images not reproduced. Two panels, Class I and Class II, each with an inset of the large tree. Tip labels are bacterial, phage, and eukaryotic virus RNR reference sequences and metagenomic reads and contigs. Labeled groups: Clades I–III, PhiJL-like clade, α-proteobacteria, β-proteobacteria, γ-proteobacteria, Bacteroidetes, and Firmicutes. Scale bars, 0.4 and 0.3.]
Fig. S3. Unrooted maximum likelihood tree with 100 bootstrap replicates of class I alpha (Left) and class II (Right) RNR reference and putative metagenomic Other sequences. Numbers in parentheses following metagenomic contigs indicate the number of reads assembled in each contig. Bacterial references from the large tree (Inset) were clustered at 80% identity. Representative sequences were placed on each tree. Numbers in parentheses following bacterial references indicate the number of reference sequences within that cluster. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, eukaryotic viruses in orange, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Black, gray, and white circles represent bootstrap support ≥100%, 75%, and 50%, respectively.
-
Fig. S4. Distribution and dynamics of cyanophage populations. ORFs with a top BLASTx hit to a cyanophage were translated, aligned, and clustered at 98% identity. (A) Rank abundance of cyanophage-like RNR clusters in the Chesapeake Bay, Gulf of Maine, and Dry Tortugas. The morphology of the reference phage with the closest homology to sequences in each cluster is identified. Myoviruses are shown in red, siphoviruses in blue, and podoviruses in green. (B) Comparison of RNR distribution frequency in extracted region for peptide cluster analysis and those predicted by normalization of RNR sequences. Gulf of Maine (GM), Chesapeake Bay (CB), and Dry Tortugas (DT). (C) Dynamics of phage populations by cluster in the Chesapeake Bay time series. Myoviruses are shown in red, siphoviruses in blue, and podoviruses in green.
-
[Fig. S5, class II RTPR gene maps not reproduced. Three contig maps are drawn with ORF labels running ORF1–ORF10, ORF1–ORF13, and ORF1–ORF15; the label Contig 5585 accompanies the ORF1–ORF10 map. Annotated ORFs across the maps include primase/helicase (or separate DNA primase and DNA helicase), DNA polymerase A, DNA polymerase A exonuclease domain, exonuclease, endonuclease I, methyltransferase, thymidylate synthase, nucleotide pyrophosphohydrolase, DUF3310, hypothetical proteins, and ribonucleotide reductase. ORFs are colored by top BLAST hit: Myoviridae, Podoviridae, eukaryotic virus, or nonviral/no hit. Scale bar, 500 bp.]
ORF1–ORF10

ORF # | Top BLAST hit (E value) | Viral sequence with greatest homology (E value)
1 | gamma proteobacterium SCGC AAA160-D02 (1e-165) | Vibrio phage CHOED (7e-142)
2 | gamma proteobacterium SCGC AAA160-D02 (2e-157) | Podovirus GOM (7e-133)
3 | Puniceispirillum phage HMO-2011 (8e-80) | Puniceispirillum phage HMO-2011 (8e-80)
4 | gamma proteobacterium SCGC AAA160-D02 (2e-06) | Vibrio phage CHOED (1e-03)
5 | Rhizobium leguminosarum (43-31) | Tetraselmis viridis virus S20 (1e-18)
6 | Puniceispirillum phage HMO-2011 (6e-23) | Puniceispirillum phage HMO-2011 (6e-23)
7 | Celeribacter phage P12053L (1e-73) | Celeribacter phage P12053L (1e-73)
8 | Puniceispirillum phage HMO-2011 (9e-32) | Puniceispirillum phage HMO-2011 (9e-32)
9 | Rickettsia felis URRWXCal2 (2e-04) | N/A
10 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)

ORF1–ORF13

ORF # | Top BLAST hit (E value) | Viral sequence with greatest homology (E value)
1 | Enterobacteria phage phi92 (2e-06) | Enterobacteria phage phi92 (2e-06)
2 | gamma proteobacterium SCGC AAA160-D02 (1e-27) | Roseobacter phage SIO1 (6e-26)
3 | Roseobacter phage SIO1 (7e-77) | Roseobacter phage SIO1 (7e-77)
4 | Dialister micraerophilus (9e-2) | N/A
5 | Roseobacter phage SIO1 (2e-101) | Roseobacter phage SIO1 (2e-101)
6 | N/A | N/A
7 | Vibrio cholerae (1e-19) | Yersinia phage phiA1122 (9e-18)
8 | Celeribacter phage P12053L (3e-50) | Celeribacter phage P12053L (3e-50)
9 | Synechococcus phage S-CRM01 (1e-29) | Synechococcus phage S-CRM01 (1e-29)
10 | Odoribacter laneus (6e-2) | N/A
11 | N/A | N/A
12 | Celeribacter phage P12053L (8e-29) | Celeribacter phage P12053L (8e-29)
13 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)

ORF1–ORF15

ORF # | Top BLAST hit (E value) | Viral sequence with greatest homology (E value)
1 | Puniceispirillum phage HMO-2011 (4e-103) | Puniceispirillum phage HMO-2011 (4e-103)
2 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)
3 | Polymorphum gilvum SL003B-26A1 (2e-09) | N/A
4 | Roseobacter phage SIO1 (6e-97) | Roseobacter phage SIO1 (6e-97)
5 | Celeribacter phage P12053L (1e-09) | Celeribacter phage P12053L (1e-09)
6 | N/A | N/A
7 | Puniceispirillum phage HMO-2011 (0) | Puniceispirillum phage HMO-2011 (0)
8 | Puniceispirillum phage HMO-2011 (1e-115) | Puniceispirillum phage HMO-2011 (1e-115)
9 | Puniceispirillum phage HMO-2011 (1e-134) | Puniceispirillum phage HMO-2011 (1e-134)
10 | N/A | N/A
11 | Puniceispirillum phage HMO-2011 (3e-56) | Puniceispirillum phage HMO-2011 (3e-56)
12 | Puniceispirillum phage HMO-2011 (1e-08) | Puniceispirillum phage HMO-2011 (1e-08)
13 | Olsenella profusa (1e-15) | Salicola phage CGphi29 (2e-15)
14 | N/A | N/A
15 | Paramecium bursaria Chlorella Virus OR0704.2.2 (7e-95) | Paramecium bursaria Chlorella Virus OR0704.2.2 (7e-95)
[Fig. S5, class I 'Other' gene map of contig 12643 (ORF1–ORF10) not reproduced. Annotated ORFs as labeled on the map: DNA primase, DNA polymerase A, exonuclease, hypothetical protein, glutaredoxin, endonuclease I, hypothetical protein, hypothetical protein, ribonucleotide reductase class I alpha, and ribonucleotide reductase class I beta. Scale bar, 500 bp. The figure also marks homologous sequences in HTVC011P, HTVC019P, and SIO1 for each ORF. The remaining panel labels are Class II 'RTPR', Contig 8066, and Contig 12399, each with a 500-bp scale bar.]

Contig 12643, ORFs 1–10 (rows in extracted order)

Annotation | Top BLAST hit (E value)
RNR beta | Novosphingobium sp. PP1Y (3e-14)
RNR alpha | Pelagibacter phage HTVC019P (0)
Hypothetical protein | Clostridium leptum (4e-16)
Hypothetical protein | NA
Endonuclease I | Celeribacter phage P12053L (1e-27)
Glutaredoxin | alpha proteobacterium HIMB59 (1e-20)
Hypothetical protein | NA
Exonuclease | Celeribacter phage P12053L (1e-77)
DNA Polymerase A | Roseobacter phage SIO1 (2e-125)
DNA Primase | Azorhizobium caulinodans ORS 571 (3e-151)
Fig. S5. Predicted ORFs on Rhode River contigs 12643, 5585, 8066, and 12399. These contigs contained class I Other, clade II (contig 12643) and RTPR (contigs 5585, 8066, and 12399) RNR sequences. The contigs were assembled from ∼50 million Illumina 2 × 150 bp reads using MetaVelvet. ORFs were predicted using MetaGeneAnnotator. Annotations were assigned by consensus BLASTx results. ORFs without hits below an E value of 1e-3, or lacking hits to definitive genes, were annotated as hypothetical protein. ORFs with homology to phage sequences in the Caudovirales were colored by the viral family of the top BLASTx representatives. Scale bar represents 500 nucleotides.
-
[Fig. S6 tree images not reproduced. (A) Class II 'RTPR' tree with Clades I–IV, including Rhode River contigs 5585 and 8066; scale bar, 0.4. (B) Class I 'Other' tree with Clades I–III, including Rhode River contig 12643; scale bar, 0.2. Tip labels are the reference and metagenomic sequences of Figs. S1 and S3, and integer bootstrap support values are printed at the nodes.]
Fig. S6. Unrooted maximum likelihood trees with 100 bootstrap replicates of Rhode River RNRs from contigs >5 kb. (A) RTPR RNRs from contigs 5585 and 8066 (bold) with class II RNR reference and putative metagenomic RTPR sequences. Metagenomic sequences were clustered at 80% identity. Representative metagenomic sequences were placed on the tree, with the number of reads from each environment within that cluster listed. Bacterial references were clustered at 80% identity. Representative sequences were placed on each tree. (B) Rhode River contig 12643 (bold) on the class I alpha Other tree. Numbers in parentheses following metagenomic contigs indicate the number of reads assembled in each contig. Bacterial references were clustered at 80% identity. Representative sequences were placed on the tree. Numbers in parentheses following bacterial references indicate the number of reference sequences within that cluster. Scale bar represents amino acid substitutions per site. Bacteria are shown in purple, eukaryotic viruses in orange, myoviruses in red, siphoviruses in blue, podoviruses in green, and metagenomic sequences in black. Integer values are bootstrap support values. CB, Chesapeake Bay; DT, Dry Tortugas; GM, Gulf of Maine.
-
[Fig. S7 images not reproduced.]

Panel A, values as extracted (row labels in the figure are Domain, dsDNA viruses, and Podoviridae; their assignment to rows is not recoverable):

CB (dsDNA) | CB (ssDNA) | DT (dsDNA) | DT (ssDNA)
18% | 48% | 16% | 25%
37% | 84% | 31% | 73%
38% | 94% | 42% | 93%

Panel B: four recruitment plots for CB (dsDNA), CB (ssDNA), DT (dsDNA), and DT (ssDNA) reads, with genome position axes running from 2,000 to 40,000 or 44,000 bp. Labeled genes: RNR, DNA Pol, DNA primase/helicase, endonuclease, and RNA polymerase. Maximal coverage values as extracted: 24 33; 50 270; 52 183; 2547; 26225.
Fig. S7. Taxonomic distribution and alignment of ssDNA virome sequences. (A) Taxonomic distribution of translated ORFs in Chesapeake Bay and Dry Tortugas dsDNA and ssDNA virome libraries. Chesapeake Bay libraries CFA–CFD were combined for taxonomic composition analysis. Podoviral sequences with a top BLASTp hit to known cyanopodoviral sequences were categorized as cyanophage-like. (B) Recruitment of dsDNA and ssDNA virome library reads to cyanopodoviral genomes. Reads from Chesapeake Bay dsDNA libraries CFA–CFD were combined before mapping. Reads were mapped to each genome independently. Maximal coverage values of reads from dsDNA libraries and ssDNA libraries against each genome are listed on the left and right sides of each plot, respectively. Genomes were aligned with Mauve. Colors on horizontal axes are aligned regions.
-
Table S1. Frequency of sampled genomes and RNR alpha subunits in cyanophage and pelagiphage populations

Group | No. of predicted RNR genes | No. of predicted genomes | Predicted RNR frequency, %
Cyano I | 22 | 16 | 138
Cyano II | 63 | 85 | 74
Pelagiphage | | |
  HTVC008M | 4 | 6 |
  HTVC010P | — | 44 |
  HTVC011P | — | 15 |
  HTVC019P | 36 | 11 |
Pelagiphage total | 40 | 76 | 53
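The frequency column of Table S1 is consistent with the number of predicted RNR genes divided by the number of predicted genomes, expressed as a percentage and rounded to the nearest integer. That relationship is inferred from the tabulated values rather than stated in the table, so the short Python check below should be read as a sketch under that assumption. The variable names are hypothetical.

```python
import math

# (predicted RNR genes, predicted genomes) as listed in Table S1
table_s1 = {
    "Cyano I": (22, 16),
    "Cyano II": (63, 85),
    "Pelagiphage total": (40, 76),
}


def rnr_frequency_percent(rnr_genes, genomes):
    """Assumed relation: 100 * genes / genomes, rounded half up."""
    return math.floor(100.0 * rnr_genes / genomes + 0.5)


frequencies = {group: rnr_frequency_percent(genes, genomes)
               for group, (genes, genomes) in table_s1.items()}
for group, value in frequencies.items():
    print(f"{group}: {value}%")
```

Under this reading, a frequency above 100% (Cyano I) means more RNR genes than genomes were predicted for that group.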
Table S2. Distribution frequencies of RNR alpha subunit sequences among designated groups
Groups | CBB | CBJ | CBR | CBS | CFA | CFB | CFC | CFD | CIA | DTF | DTR | DTS | GMF
Nucleic acid type | dsDNA | dsDNA | RNA | ssDNA | dsDNA | dsDNA | dsDNA | dsDNA | dsDNA | dsDNA | RNA | ssDNA | dsDNA
Cyano
  Class I | 25% (5) | 16% (4) | — | — | 4% (1) | 5% (1) | 3% (