Bacteriophage DNA Polymerases from Thermal Aquifers

David Mead¹, Vinay Dhodda¹, Robert DiFrancesco¹, Melodee Patterson¹, Ronald Godiska¹, Mya Breitbart², Forest Rohwer², Mark Young³, Paul Richardson⁴, Thomas Schoenfeld¹

¹Lucigen Corporation, Middleton, WI; ²San Diego State University, San Diego, CA; ³Montana State University, Bozeman, MT; ⁴Joint Genome Institute, Walnut Creek, CA

ABSTRACT

Waterborne viruses (phages) in thermal springs have been largely unexamined, despite their potential importance in the biosphere. Phages promote microbial diversity by predation of the most abundant microbes and by transfer of genes through transduction and lysogeny. Sequence analysis (>37,000 reads) of thermal aquifer phage libraries provides insight into viral complexity, lifestyles, and molecular diversity, as well as access to coding sequences for useful proteins. As expected, a large fraction of identifiable sequences encode phage-specific proteins such as nucleic acid metabolizing enzymes, phage structural proteins, lytic enzymes, and mobile elements such as plasmids and transposons. Numerous similarities to enzymes associated with temperate phage, such as integrases, suggest that lysogeny is common in thermal environments. Unexpectedly, a Thermosynechococcus-like photosynthesis regulatory gene was found next to an integrase gene. Over 200 DNA polymerase genes have been identified, approximately 58 of them full-length. We have expressed 10 active enzymes, one of which allows isothermal amplification at elevated temperatures with greater specificity than conventional polymerases. This subset of thermostable phage DNA polymerases appears much more diverse than known microbial or phage enzymes. The fact that no pol genes were re-isolated suggests that the diversity seen so far is the tip of a very large iceberg.

Phage in the Environment

Viruses are now recognized as the most abundant forms of life on earth. An estimated 10^31 viruses exist in the oceans alone, probably outnumbering their hosts tenfold (1). Aquatic viruses range in concentration from 10^4 to 10^7/ml in the water column (2) and exceed 10^8/cm^3 in the sediment (3). Viruses play an important role in global ecology, modulating the abundance and diversity of microbial populations through their lytic and lysogenic activity (4, 5), and play a critical role in the global nutrient and energy cycle (6). Phages are also believed to be a major driving force in cellular evolution (7), serving as a primary vector for horizontal genetic exchange and promoting diversity by their predation and lysogeny of the most abundant microbes. They also alter the phenotypes of the bacteria they lysogenize.

Phage at High Temperatures

While the oceans contain ~1.3 × 10^21 liters, groundwater aquifers may contain 10^19 liters of pore space (8). Marine and freshwater ecosystems are characterized by moderate to cold temperatures, whereas high-temperature aquifers predominate at pressure and temperature gradients deep in the earth and in geothermal sites such as Yellowstone National Park and hydrothermal vents on the ocean floor. Temperatures as high as 230°C and pressures of 300 psi have been measured at Yellowstone National Park within 200 feet of the surface (9).

Life in the deep/hot biosphere is purely prokaryotic, improving the chances of more completely describing selected biomes. Furthermore, the geochemical energy in hydrothermal environments allows unique chemolithoautotrophic metabolic pathways. Most work on hot spring extremophiles has concentrated on sedentary surface microbes that grow at less than 100°C; this study looks at planktonic viruses and microbes in the emergent water column, which presumably originates at much higher temperatures. Viruses are the only known predators in thermal aquifers and can have a significant effect on hot spring microbial food webs. Breitbart (10) estimates that thermal aquifers may contain 3.7 × 10^29 prokaryotic cells and 3.7 × 10^30 viruses, which is roughly equivalent to the total estimated viruses in the ocean (8). Viruses may be responsible for as much as 3.6 Gtons of carbon turnover per year, which is comparable to the impact of phages on the ocean. Viruses, particularly those of the thermal environments, may be important contributors to global molecular diversity (11). We have used a metagenomics approach to study the viruses in terrestrial thermal pools.

Sampling of Four Thermal Sites in California and Yellowstone

Numerous sites in Long Valley, California, and Yellowstone National Park have been sampled. All of the work described focuses on four sites: one in Long Valley and three in Yellowstone. Viral particles were isolated and concentrated from several hundred liters of hot spring water by tangential flow filtration.

Figure: Little Hot Creek (top, left), Bath (top, right), BearPaw (bottom, left) and Octopus (bottom, right) Hot Springs. Bath Hot Spring shows significant geyser activity due to superheated water emanating from the underground aquifer.

Viral Library Construction

The construction of representative phage community DNA libraries is complicated by low yields of viral DNA and by the need to remove contaminating cellular and free DNA. A series of differential filtration and centrifugation steps was used to isolate and concentrate the phage until epifluorescence microscopy indicated an absence of contaminating microbial cells. Nuclease treatment was used to remove free DNA. We do not use density centrifugation in cesium chloride because of the unacceptable loss of already limited amounts of viral material, the incomplete separation of microorganisms, and the potential bias introduced by this technique.

Improved Vectors for Library Construction

Representative libraries of phage DNA are particularly problematic to construct, due in part to toxic coding sequences in their genomes. The CloneSmart® vectors were developed to improve the number of recombinants and reduce cloning bias in libraries. These vectors eliminate transcription and translation of recombinant inserts and have proven effective in cloning numerous otherwise toxic phage genes (Ry Young, Texas A&M, personal communication).

Figure: Comparison of viral metagenomic libraries to the GenBank non-redundant database. Panel A) BLASTx results were examined to detect homology to proteins derived from phage, virus, or other organisms (mostly bacteria and archaea). BLASTx results were categorized by source of the strongest hit. Panel B) Phage-like genes were categorized in functional groups using the keywords shown below.

Viral Morphotypes

Figure: Imaging of Hot Spring Phage. Panel A. Phage particles were captured on an AnoDisk filter (Millipore) and stained with SYBR Gold (Molecular Probes). The particles were imaged using a Bio-Rad 1024 laser scanning confocal microscope (U. Wisc. Keck Imaging Center). The numerous small spots are phage particles. An individual cell can be seen in the upper center. Panels B to K are transmission electron micrographs of phages. Panel B shows a phage cultured from YNP. Also shown are phages directly isolated from Moundview (C), Azure (D), Bath (E and I), Octopus (F), Cavern (G), Paint Pots (J), and Azure (K) Hot Springs, all of YNP. (Electron micrographs are courtesy of Sue Brumfield, Montana State University.)
Panel B. Pie charts of phage-like genes by functional group for Little Hot Creek, BearPaw, and Octopus. Shown are sequence reads in which the strongest similarity was to proteins that match the functional group keywords shown in Table 2. The chart-to-site assignment did not survive text extraction, so the charts are numbered in the order their values appear.

| Functional group | Chart 1 | Chart 2 | Chart 3 |
|---|---|---|---|
| DNA replication/repair | 19% | 17% | 19% |
| recombination | 10% | 10% | 10% |
| transcription/translation | 17% | 18% | 19% |
| metabolism/modification | 32% | 29% | 25% |
| lytic enzyme | 11% | 17% | 8% |
| structural protein | 9% | 2% | 1% |
| mobile element | 2% | 7% | 18% |

Table 2. Functional Groupings of Keywords into Categories

| Functional Group | Key Words |
|---|---|
| DNA replication/repair | DNA repair, DNA polymerase, replication, primase, helicase, reverse transcriptase, repA, DNA mismatch repair |
| Recombination | integrase, resolvase, intron, terminase, DNA/RNA ligase, recN |
| Transcription/Translation | RNA polymerase, transcription, tRNA synthetase |
| Mobile elements | vector, transposase, pathogen, virulence/virulent, toxin |
| Nucleic acid metabolism/modification | nuclease, nucleotidyl transferase, DNA glycosylase, DNA methylase, reductase, nucleotidase, restriction, RNase, DNase |
| Lytic enzyme | sidase, protease, peptidase, proteinase, trypsin, chitin, lysozyme, lysin, lytic |
| Structural protein | tail, tape measure, head |

NanoClone Library Construction Methods

The low amounts of starting material necessitate the use of a unique genome amplification technology. In NanoClone Library Construction, DNA is sheared to the appropriate molecular weight and end-repaired. Linkers are ligated to the ends of the fragments and serve as priming sites for amplification. The amplification products are cloned into pSMART vectors.
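The keyword grouping in Table 2 can be illustrated with a short sketch. This is not the authors' pipeline: it assumes case-insensitive substring matching (suggested by the fragment keyword "sidase"), assigns a hit to the first group whose keyword matches, and expands "DNA/RNA ligase" and "virulence/virulent" into separate entries. The names `FUNCTIONAL_GROUPS`, `classify`, and `tally` are hypothetical.

```python
from collections import Counter

# Keyword lists follow Table 2; matching rules are assumptions for illustration.
FUNCTIONAL_GROUPS = {
    "DNA replication/repair": [
        "dna repair", "dna polymerase", "replication", "primase", "helicase",
        "reverse transcriptase", "repa", "dna mismatch repair",
    ],
    "Recombination": [
        "integrase", "resolvase", "intron", "terminase",
        "dna ligase", "rna ligase", "recn",
    ],
    "Transcription/Translation": [
        "rna polymerase", "transcription", "trna synthetase",
    ],
    "Mobile elements": [
        "vector", "transposase", "pathogen", "virulence", "virulent", "toxin",
    ],
    "Nucleic acid metabolism/modification": [
        "nuclease", "nucleotidyl transferase", "dna glycosylase",
        "dna methylase", "reductase", "nucleotidase", "restriction",
        "rnase", "dnase",
    ],
    "Lytic enzyme": [
        "sidase", "protease", "peptidase", "proteinase", "trypsin",
        "chitin", "lysozyme", "lysin", "lytic",
    ],
    "Structural protein": ["tail", "tape measure", "head"],
}


def classify(description):
    """Return the first functional group whose keyword occurs in the
    hit description (case-insensitive), or None if nothing matches."""
    text = description.lower()
    for group, keywords in FUNCTIONAL_GROUPS.items():
        if any(keyword in text for keyword in keywords):
            return group
    return None


def tally(descriptions):
    """Count hit descriptions per functional group, skipping non-matches."""
    groups = (classify(d) for d in descriptions)
    return Counter(g for g in groups if g is not None)


if __name__ == "__main__":
    # Hypothetical hit descriptions, for illustration only.
    examples = [
        "DNA polymerase I [Aquifex pyrophilus]",
        "phage integrase [Rhodoferax ferrireducens]",
        "tail tape measure protein",
        "hypothetical protein",
    ]
    for group, count in tally(examples).items():
        print(group, count)
```

Because a description can contain keywords from more than one group, the first-match rule is only one possible tie-break; the poster does not state how such cases were resolved.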
Figure: Anonymous genome amplification technology applied to phage lambda using known amounts of starting material. Gel lanes: marker, 50 ng, 5 ng, 500 pg, 50 pg, 5 pg, 0, marker. Amplified DNA was excised from the agarose gel and cloned into the pSMART-HCKan vector for sequencing. The primary band in the 5 pg lane produced authentic lambda DNA sequences.

Figure: Thermostable phage DNAP activity assay. A ROX-labeled primer/template mix (top) is extended. Unextended product runs at 37 nt. Extension products run at 41 nt. A-tailing products run at 42 nt. 3' exo products run at less than 37 nt. Minor peaks at 25, 30, 35, 40, 45, and 50 nt are TAMRA-labeled standards. PyroPhage DNAPs and a Taq control are indicated. Panels: Blank, Taq, 4110, 653, 3063, 3173, 488, 2323. Peak annotations: Unextended, Extended, A-tailed, 3' exo.

The pSMART vectors are "transcription-free". They eliminate vector-driven transcription of insert DNA and terminate transcription that initiates from promoters within the insert. These vectors allow cloning of many types of difficult DNAs, providing unbiased library construction and recovery of toxic genes. The pSMART-HC vectors are high-copy (300-500 copies/cell, similar to pUC19). The pSMART-LC vectors contain the ROP gene to decrease the copy number (15-20 copies/cell), providing higher insert stability. In typical blunt cloning experiments, the background of non-recombinant vector is < 0.1%.

Figure: Vector maps of pSMART-HCKan, -LCKan and pSMART-HCAmp, -LCAmp. Each map shows three terminators, the blunt cloning site, ROP, Ori, and the Kan or Amp resistance marker.

Phage-like genes

A high proportion of sequence similarities are to genes normally found in viruses and phage, confirming the viral origin of the DNA used for library construction.
Frequency of phage-like genes

Molecular Diversity of PyroPhage pol genes

PyroPhage pol genes were aligned to known eukaryotic, prokaryotic, and viral polymerases using CLUSTAL W. The PyroPhage DNA polymerases are distinct from available thermostable DNA polymerases.

Viral Metagenomic Libraries from Hot Springs

| Library | Hot Spring | Temp (°C) | Phage/ml in Spring (a) | Vol. Sampled (liters) | Theoretical Yield (ng) (b) | Actual Yield (ng) | % Yield of DNA (c) | Vector | Average Insert Size |
|---|---|---|---|---|---|---|---|---|---|
| L1.1 | Little Hot Creek, CA | 84 | not tested | 450 | --- | 80 | --- | pcrSM | 3-6 kb |
| Y4.9 | Octopus, YNP | 80 | 2.5 × 10^4 | 1058 | 571 | 10 | 1.7% | pcrSM | 2-3 kb |
| Y4.16 | Bearpaw, YNP | 74 | 5.9 × 10^4 | 450 | 573 | 59 | 10% | pcrSM | 2-3 kb |
| Y2.1 | Bath, YNP | 93 | 3.7 × 10^4 | 360 | 288 | 90 | 31% | pUC | 3-6 kb |

| Library | Number Sequence Reads | No Homology | No Match to Keywords | Virus or Phage Homology | DNA pol Gene Homology | Complete pol Genes | Expressed pol Genes | Purified |
|---|---|---|---|---|---|---|---|---|
| L1.1 | 7,479 | 2,363 | 3,775 | 262 | 14 | 2 | 1 | 1 |
| Y4.9 | 21,797 | 12,705 | 6,548 | 2,036 | 148 | 34 | 7 | 2 |
| Y4.16 | 7,545 | 2,510 | 3,619 | 333 | 57 | 20 | 2 | 1 |
| Y2.1 | 765 | 200 | 410 | 19 | 3 | 2 | | |
| Totals | 37,586 | 17,778 | 14,352 | 2,650 | 222 | 58 | 10 | 4 |

Summary of thermostable phage libraries screened in this project.
a) Phage concentrations are based on direct counts by epifluorescence.
b) Theoretical yields are the total amount of phage DNA in the samples, assuming an average of 54 attograms of DNA per phage particle and use of only a portion of the sample (40%) for each DNA preparation: phage/ml × vol. sampled (l) × 1000 ml/l × 54 attograms DNA/phage × 40%.
c) Percent yield is actual yield divided by theoretical yield.

Sequencing of the Libraries, Assembly of Contigs and Functional Genomics

More than 37,000 reads (~37 Mb) of viral DNA were sequenced by the DOE Joint Genome Institute. Individual reads were trimmed and vector sequences removed using SeqManII (DNASTAR). Trimmed sequences were assembled using SeqManII (DNASTAR) at a minimum match of 95% and a match size of 20 nucleotides.
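The theoretical-yield formula in footnote (b) of the library summary can be checked with a few lines of arithmetic. The sketch below uses only the constants stated in the footnote (54 attograms per phage, 40% of the sample used); the function and constant names are hypothetical.

```python
ATTOGRAMS_PER_PHAGE = 54      # average DNA content per phage particle (footnote b)
FRACTION_USED = 0.40          # portion of each sample used per DNA preparation
ATTOGRAMS_PER_NANOGRAM = 1e9  # unit conversion: 1 ng = 1e9 ag


def theoretical_yield_ng(phage_per_ml, volume_liters):
    """Theoretical phage DNA yield in nanograms, following footnote (b)."""
    total_phage = phage_per_ml * volume_liters * 1000  # 1000 ml per liter
    attograms = total_phage * ATTOGRAMS_PER_PHAGE * FRACTION_USED
    return attograms / ATTOGRAMS_PER_NANOGRAM


def percent_yield(actual_ng, theoretical_ng):
    """Footnote (c): actual yield divided by theoretical yield, in percent."""
    return 100 * actual_ng / theoretical_ng


if __name__ == "__main__":
    # (library, phage/ml, liters sampled, actual yield in ng) from the table.
    libraries = [
        ("Y4.9 Octopus", 2.5e4, 1058, 10),
        ("Y4.16 Bearpaw", 5.9e4, 450, 59),
        ("Y2.1 Bath", 3.7e4, 360, 90),
    ]
    for name, conc, liters, actual in libraries:
        theory = theoretical_yield_ng(conc, liters)
        print(f"{name}: {theory:.0f} ng theoretical, "
              f"{percent_yield(actual, theory):.1f}% yield")
```

The computed theoretical yields round to the tabulated 571, 573, and 288 ng. The Octopus percent yield works out to about 1.75%, which the table lists as 1.7%.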
Assembly of the reads
• Little Hot Creek: 3.6 Mbp in 3,014 contigs
• Bearpaw: 6.1 Mbp in 6,191 contigs
• Octopus: 14.4 Mbp in 13,543 contigs

In Silico Analysis of the Libraries

Translations of the trimmed reads were aligned to known sequences in the GenBank nr protein database using BLASTx. Sequences were considered similar if the Expect value was equal to or less than 0.001.

CloneSmart® Vectors: No Background, No Transcription

Panel A (stacked bar chart, 0% to 100%): fraction of reads with nonviral, viral, or no similarity for Little Hot Creek, Bear Paw, Octopus, and overall.

| Keyword | LHC | Bearpaw | Octopus |
|---|---|---|---|
| head | 3 | 10 | 15 |
| helicase | 132 | 122 | 274 |
| integrase | 67 | 30 | 22 |
| ligase | 0 | 11 | 18 |
| lysin | 118 | 260 | 389 |
| methylase | 50 | 32 | 127 |
| methyltransferase | 55 | 100 | 262 |
| nuclease | 94 | 145 | 232 |
| polymerase | 147 | 235 | 548 |
| primase | 14 | 19 | 21 |
| reverse transcriptase | 42 | 35 | 11 |
| restriction | 71 | 34 | 59 |
| resolvase | 24 | 15 | 87 |
| ribonuclease | 20 | 13 | 62 |
| tail | 17 | 38 | 289 |
| topoisomerase | 18 | 19 | 24 |
| transposase | 208 | 86 | 51 |

Phage Lifestyles

Inferences about the lifestyles of phage in thermal aquifers can be based on apparent similarity to genes of known function. For example, the identification of 119 apparent integrase genes suggests that lysogeny is a common lifestyle in thermal aquifers. Likewise, apparent DNA polymerase genes suggest lytic phages, and reverse transcriptases suggest RNA phages.

Thermostable DNA Polymerases from Viral Metagenomic Libraries

For several reasons, thermostable viral DNA polymerases are expected to be very attractive alternatives to reagent enzymes derived from microbes.

PyroPhage DNAPs

PyroPhage DNAPs are expressed from the genes of viruses (phage) that inhabit boiling hot springs.

PyroPhage DNAPs as Alternatives for DNA Amplification

• Viral DNAPs are replication enzymes, in contrast to currently available DNA repair enzymes, and represent the first viable alternative to existing reagent enzymes. Based on known viral enzymes, PyroPhage DNAPs are expected to have substantial advantages.
• Unprecedented Molecular Diversity: This enzyme family is the most diverse known, which suggests a range of activities that can be tailored to the needs of the application.

Improvements can be extrapolated from the few well-characterized viral DNAPs:
- Excellent strand displacement and processivity will allow detection and analysis of "difficult" sequences.
- Lower error rate results in more reliable data.
- Absence of end-product inhibition improves quantitative PCR and increases yield.
- Strand displacement allows isothermal amplification, which will increase throughput and reduce instrumentation costs.
- Reduced stuttering will allow use of optimal markers for genetic tests.
- Improved incorporation of nucleotide analogs will broaden the range of detection methods.
- Accessory functions may expand functionality.

Detection of PyroPhage Coding Sequences by Sequence Similarity

The BLASTx search revealed 222 apparent pol genes. Of these, 58 appear to be complete genes. All of the latter were tested for expression using a DNA polymerase assay developed at Lucigen.
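The similarity screen described above (Expect value at or below 0.001, categorization by the strongest hit) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' code: the `Hit` record, the rule that the lowest E-value is the strongest hit, and the "dna polymerase" description match are all hypothetical choices, and the example hits are made up.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-3  # similarity threshold stated in the poster


@dataclass(frozen=True)
class Hit:
    """One BLASTx hit for a read (hypothetical record layout)."""
    read_id: str
    description: str
    evalue: float


def strongest_hits(hits):
    """Map each read to its lowest-E-value hit at or below the cutoff."""
    best = {}
    for hit in hits:
        if hit.evalue > E_VALUE_CUTOFF:
            continue  # not considered similar
        current = best.get(hit.read_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.read_id] = hit
    return best


def pol_candidates(hits):
    """Read IDs whose strongest hit looks like a DNA polymerase."""
    return sorted(
        read_id
        for read_id, hit in strongest_hits(hits).items()
        if "dna polymerase" in hit.description.lower()
    )


if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [
        Hit("read1", "DNA polymerase I [Aquifex pyrophilus]", 1e-46),
        Hit("read1", "hypothetical protein", 1e-5),
        Hit("read2", "DNA polymerase II", 0.5),  # fails the cutoff
        Hit("read3", "phage integrase", 4e-12),
    ]
    print(pol_candidates(hits))
```

Candidates found this way would still need the later steps the poster describes, namely checking for complete genes and testing expression in the polymerase assay.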
Family A PyroPhage Polymerase Percent Identities

Numeric row and column labels are PyroPhage clones.

| | 3063 | 488 | 3173 | 967 | Aae | Taq | Tth | Tma | T5_phage | Bst | Eco_polI | T7_phage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyroPhage 3063 | --- | | | | | | | | | | | |
| PyroPhage 488 | 31 | --- | | | | | | | | | | |
| PyroPhage 3173 | 28 | 45 | --- | | | | | | | | | |
| PyroPhage 967 | 31 | 51 | 82 | --- | | | | | | | | |
| Aae | 63 | 29 | 23 | 34 | --- | | | | | | | |
| Taq | 24 | 25 | 22 | 25 | 22 | --- | | | | | | |
| Tth | 26 | 24 | 18 | 26 | 25 | 85 | --- | | | | | |
| Tma | 30 | 22 | 22 | 28 | 28 | 42 | 42 | --- | | | | |
| T5_phage | 15 | 19 | 17 | 19 | 23 | 18 | 18 | 21 | --- | | | |
| Bst | 24 | 23 | 20 | 28 | 24 | 40 | 41 | 40 | 14 | --- | | |
| Eco_polI | 30 | 22 | 19 | 26 | 28 | 38 | 38 | 43 | 21 | 40 | --- | |
| T7_phage | 16 | 13 | 15 | 17 | 18 | 17 | 17 | 16 | 12 | 15 | 13 | --- |

Family B PyroPhage Polymerase Percent Identities

| | 4110 | 2323 | 2783 | Tli | Pfu | Pae | RM378_phage | HHV | CMV | vaccinia_virus | baculovirus | Chlorellavirus | T4_phage | Eco_polIIIa | Human_polalpha | Phi29Phage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyroPhage 4110 | --- | | | | | | | | | | | | | | | |
| PyroPhage 2323 | 83 | --- | | | | | | | | | | | | | | |
| PyroPhage 2783 | 91 | 86 | --- | | | | | | | | | | | | | |
| Tli | 23 | 21 | 22 | --- | | | | | | | | | | | | |
| Pfu | 23 | 21 | 22 | 74 | --- | | | | | | | | | | | |
| Pae | 27 | 26 | 27 | 35 | 35 | --- | | | | | | | | | | |
| RM378_phage | 14 | 14 | 15 | 13 | 15 | 13 | --- | | | | | | | | | |
| HHV | 18 | 17 | 17 | 19 | 16 | 21 | 13 | --- | | | | | | | | |
| CMV | 19 | 15 | 17 | 18 | 17 | 21 | 15 | 28 | --- | | | | | | | |
| vaccinia_virus | 19 | 19 | 18 | 16 | 15 | 17 | 14 | 13 | 16 | --- | | | | | | |
| baculovirus | 12 | 11 | 14 | 16 | 15 | 10 | 10 | 14 | 14 | 13 | --- | | | | | |
| Chlorellavirus | 20 | 20 | 20 | 23 | 20 | 20 | 14 | 19 | 24 | 17 | 17 | --- | | | | |
| T4_phage | 15 | 16 | 17 | 16 | 17 | 12 | 13 | 12 | 12 | 10 | 9 | 11 | --- | | | |
| Eco_polIIIa | 12 | 12 | 12 | 12 | 14 | 12 | 9 | 12 | 8 | 10 | 7 | 8 | 9 | --- | | |
| Human_polalpha | 17 | 18 | 15 | 21 | 23 | 21 | 18 | 15 | 14 | 13 | 14 | 18 | 13 | 10 | --- | |
| Phi29Phage | 9 | 9 | 11 | 10 | 8 | 5 | 10 | 9 | 12 | 10 | 13 | 11 | 7 | 4 | 9 | --- |

Isothermal Amplification using PyroPhage DNA polymerases

Five units of 3173 polymerase were used to amplify one nanogram each of ssM13mp18 and pUC19 plasmid DNA. Random decamer primers were added to 0.5 µM or 5 µM, as indicated. Reactions were incubated at 95°C prior to addition of enzyme, then for 16 hours at 55°C with enzyme. One fiftieth of each reaction was resolved on a 1% agarose gel. Results are shown in Panel A. To verify that the amplification was specific for the template DNA, one µl of the amplification product of the positive pUC19 reaction was tested in a PCR reaction using primers specific for the ampicillin resistance gene of the original plasmid template. As a negative control, a reaction containing all the components of the positive reaction, including input DNA, but without nucleotides to prevent amplification, was tested by PCR. As a positive control, the same sequence was amplified directly from 1 ng of pUC19. Additional amplifications using 2.5 units of 488 DNA polymerase and 15 units of 967 DNA polymerase are also shown. In both cases, the reaction conditions were essentially the same as described, except that the template was double-stranded, linear lambda phage DNA, the reaction temperature was 50°C, and the concentration of magnesium was varied from 1.5 to 20 mM.

Figure: Alignment of PyroPhage DNA polymerase domains with commercially relevant motifs. Selected PyroPhage DNAP sequences are aligned to mutations of domains of Taq and Vent that improve these enzymes as reagents. Also shown are Tth, Tma, and phage T7 domains. Conserved amino acids (compared to Taq and Vent) are shown in red. Amino acids that were substituted to create G46D (Taq 5' exo domain), R660D/F667Y (Taq active site), D141A/E143A (Vent 3' exo domain), and A488L (Vent active site) are shown in blue. Variations different from both wild type and modified are shown in green. Substituted residues are given in parentheses after the Taq or Vent sequence.

Family A 5' Exo Domain
1091    LSTSSGFPTGA
Tma     LSTSTGIPTNA
Tth     LTTSRGEPVQA
Taq     LTTSRGEPVQA   (D)

Family A Active Site. Discrimination against modified nucleotides
3063    RQLAKAVNFGLIYG
2710    RQIGKSANFGLIYG
488     RQIAKSANFGLIYG
1795    RQIAKSANFGLIYG
T7 Ph   RDNAKTFIYGFLYG
Aae     RQLAKAINFGLIYG
Tma     RRAGKMVNFSIIYG
Tth     RRAAKTINFGVLYG
Taq     RRAAKTINFGVLYG   (D) (Y)

Family B 3' Exo Domain
3063    FLYIDTETVGD
4110    VAAFDIEVDAT
2323    VAAFDIEVDAT
Pfu     ILAFDIETLYH
Vent    LLAFDIETFYH   (A A)

Family B Active Site. Discrimination against modified nucleotides
4110    QFAFKLILVSAYG
3001    QFAFKLILVSAYG
2323    QFAFKLILVSAYG
Pfu     QKAIKLLANSFYG
Vent    QRAIKLLANSYYG   (L)

Other Interesting Genes Found in Contigs from the Libraries
A cultivated Pyrobaculum spherical virus has 83% identity to a 3 kb contig from Bear Paw hot springs metagenomic DNA.

BLASTx Analysis of selected contigs (BearPaw)

CONTIG 1494 (3 kb) matches Pyrobaculum spherical virus (PSV):
• A non-lytic dsDNA virus of Pyrobaculum sp. D11
• Isolated from Obsidian Pool, YNP
• Genome is 28 kb
• ORFs have no gene matches to public databases

Figure: Conservation of gene order and identity between BearPaw CONTIG 1494 and Pyrobaculum spherical virus. ORF labels, in the order shown: ORF88A, ORF137, ORF235, ORF239, ORF211, ORF107. Identity labels, in the order shown: 89% ID, 88% ID, 87% ID, 87% ID, 79% ID, 77% ID. Both maps are drawn on a scale of 500 to 2500 bp.

Figure labels: BtpA Photosystem I biogenesis protein; Phage Integrase.

References
1. Suttle CA (2005) Viruses in the sea. Nature 437(7057):356-61.
2. Wommack KE, Colwell RR (2000) Virioplankton: viruses in aquatic ecosystems. Microbiol Mol Biol Rev 64(1):69-114.
3. Breitbart M, Felts B, Kelley S, Mahaffy JM, Nulton J, Salamon P, Rohwer F (2004) Diversity and population structure of a near-shore marine-sediment viral community. Proc R Soc Lond B Biol Sci 271(1539):565-74.
4. Chibani-Chennoufi S, Bruttin A, Dillmann ML, Brussow H (2004) Phage-host interaction: an ecological perspective. J Bacteriol 186:3677-3686.
5. Paul JH (1999) Microbial gene transfer: an ecological perspective. J Mol Microbiol Biotechnol 1:45-50.
6. Fuhrman JA (1999) Marine viruses and their biogeochemical and ecological effects. Nature 399:541-548.
7. Weinbauer MG, Rassoulzadegan F (2004) Are viruses driving microbial diversification and diversity? Environ Microbiol 6(1):1-11.
8. Gold T (1992) The deep, hot biosphere. Proc Natl Acad Sci U S A 89:6045-9.
9. White DE, Fournier RO, Muffler LJP, Truesdale AH (1975) Physical results of research drilling in thermal areas of Yellowstone National Park, Wyoming. Geological Survey Professional Paper No. 892:1-89.
10. Breitbart M, Wegley L, Leeds S, Schoenfeld T, Rohwer F (2004) Phage community dynamics in hot springs. Appl Environ Microbiol 70(3):1633-40.
11. Villarreal LP, DeFilippis VR (2000) A hypothesis for DNA viruses as the origin of eukaryotic replication proteins. J Virol 74(15):7079-84.

Support

This material is based upon work supported by the National Science Foundation under Award Numbers 0109756 and 0215988, the National Human Genome Research Institute under Award Number 1 R43 HG02714-01 to TS, and the Department of Energy under Award Numbers 70588S02-I and DE-FG02-02ER83484 to DAM. Any opinions, findings, and conclusions or recommendations expressed in this poster are those of the authors and do not necessarily reflect the views of NSF, NHGRI, or DOE. Sequencing was performed at the DOE Joint Genome Institute as part of their Genomes-to-Life program.

Panel A. Isothermal amplification of circular templates using PyroPhage 3173 POL. Lanes: ssM13 template with 5 and 0.5 µM primer; pUC template with 5 and 0.5 µM primer; no template with 5 µM primer. Size markers: 10 kb, 6 kb, 3 kb, 1 kb.

Panel B. Verification by PCR of amplification specificity. Lanes: pUC control, positive amplification, negative amplification. Size markers: 10 kb, 6 kb, 3 kb, 1 kb.

Panel C. Isothermal amplification of lambda DNA using PyroPhage 488 and 967 POLs. For each enzyme, lanes without template (1.5 and 15 mM MgCl2) are followed by lanes with template (1.5 to 20 mM MgCl2). Size markers: 10 kb, 6 kb, 3 kb, 1 kb.

2120 W. Greenview Drive, Middleton, WI 53562; www.lucigen.com; 1-888-575-9695

Identification of a Photosystem I gene next to an integrase. Little Hot Creek contig 27 (95% assembly) was compared to the GenBank nr database. The 2740 bp contig showed similarity to two genes, a Photosystem I biogenesis protein from Thermosynechococcus elongatus and a phage integrase from Rhodoferax ferrireducens.
photosystem I biogenesis protein [Thermosynechococcus elongatus BP-1]
Length=296
Score = 52.0 bits (123), Expect = 9e-06
Identities = 43/136 (31%), Positives = 66/136 (48%), Gaps = 11/136 (8%)
Frame = +1

Query  133  MGVDSSGLTGEGEALMRLAAE-HPSIEFFASVAFKYMP--EEPDPVTAADNARMAGFVPT  303
            M  D   + G    L+R   E    I+ FA V  K+      P+  TA  +    G
Sbjct  125  MATDQGLIEGPAHQLLRYRRELGQDIKIFADVMVKHAQPLHSPNLATAVRDTFDRGLADG  184

Query  304  T--SGSATGAPPDLE--KIRAMAARG-PLAVASGMTPDNVHLYSPYLSDILVATGIAAD-  465
               SG ATG PP  E   + A AA+G PL + SG + DNV    PY++ ++VA+ +  +
Sbjct  185  VILSGWATGQPPTEEDLSVAARAAKGQPLFIGSGASWDNVAQLVPYVNGVIVASSLKRNG  244

Query  466  --EHHLDPGKLARFIK  507
              E  +DP +++RF++
Sbjct  245  QIEQPIDPIRVSRFVE  260

Phage integrase [Rhodoferax ferrireducens DSM 15236]
Length=347
Score = 74.7 bits (182), Expect = 4e-12
Identities = 80/287 (27%), Positives = 117/287 (40%), Gaps = 31/287 (10%)
Frame = +1

Query  106  RDSETYRHIWSTWCKYLQGGQAGGRSRPIPWYEVDAATVVGFLQ--SGPASRKEKLESSS  279
            R  + YR  W  W  +L            PW +     V  +L   S  A+ ++   +SS
Sbjct   45  RSVKQYRSTWFNWVAWLPPHT--------PWEKAAPEQVSAYLHQLSASATARQTQPNSS   96

Query  280  NE----TTKRRYWRVLDRIYNYAKAHNWVDSNPLVGLTTNDKPKSEDTLGTILDPHVWHA  447
                   T+RRYWR+L  IY +A    W ++NP    T  + P SE      IL
Sbjct   97  RRPASTVTQRRYWRMLRDIYAHAVVMAWCEANPCAQAT--EIPASEAMASMILPAWALRQ  154

Query  448  AEKLLAHPDRFDPI----SVRNRAILQILFGLGLAPQEVRALKTXXXXXXXXXXXXVNPS  615
             + +  H      +     VRN A+L +L   G    E+ +L+    L   +E
Sbjct  155  LQDGILHQASRQAVRKWQDVRNDALLLLLLHTGAKTGELVSLRVDQALKIRTEKHG-EQW  213

Query  616  KVHVDGHNTLRPRTLTLC-PKTSAAIREWLKARPAVAT--------SKSGQILFCTPKGP  768
             + +DG    + R +TL  P+  AA+ +WL+ R  V          +KS  I       P
Sbjct  214  AIQIDGEKDCQQRHITLDEPRAGAALAQWLRVRQHVPRKSPWLFFGAKSHVIDGKRELSP  273

Query  769  LGSVSLYLLVKSFLKKASDLAQREEP-PQAGPQVIRNSVLVRLLEDG  906
            L S ++++LV   LK         E    AG + IRNSVL R LE G
Sbjct  274  LSSKTIFILVAGALKAHLPPNTFEGMLSHAGAEAIRNSVLARWLEAG  320

Ten clones express DNAP

| Clone | Source | Strongest similarity | E value | % identity | % conserved | Exo |
|---|---|---|---|---|---|---|
| 3063 | BearPaw | Aquifex pyrophilus pol I | 0.0 | 63 | 79 | 3' |
| 488 | Little Hot Creek | Aquifex pyrophilus pol I | 1e-46 | 33 | 51 | No |
| 3173 | Octopus | Desulfitobacterium hafniense pol I | 2e-37 | 30 | 48 | 3' |
| 4110 | Octopus | Pyrodictium occultum pol II | 3e-55 | 28 | 46 | No |
| 2323 | Octopus | Pyrobaculum aerophilum pol II | 1e-47 | 28 | 45 | 3' |
| 653 | Bearpaw | Pyrococcus furiosus virus pol | 2e-12 | 37 | 59 | 3' |
| 967 | Octopus | Aquifex aeolicus pol I | 3e-44 | 36 | 53 | No |
| 2783 | Octopus | Sulfolobus tokodaii pol II | 3e-56 | 27 | 46 | 3' |
| 2072 | Octopus | Sulfolobus tokodaii pol II | 2e-10 | 39 | 60 | |
| 2123 | Octopus | Pyrococcus abyssi pol II | 1e-04 | 35 | 51 | |


David Mead1, Vinay Dhodda1, Robert DiFrancesco1, Melodee Patterson1, Ronald Godiska1, Mya Breitbart2, Forest Rohwer2, Mark Young3, Paul Richardson4, Thomas Schoenfeld1 1Lucigen Corporation, Middleton, WI; 2San Diego State University, San Diego, CA; 3Montana State University, Bozeman, MT; 4Joint Genome Institute, Walnut Creek, CA

Bacteriophage DNA Polymerases from Thermal Aquifers

Waterborne viruses (phages) in thermal springs have been largely unexamined, despite their potential importance in the biosphere. Phages promote microbial diversity by predation of the most abundant microbes and by transfer of genes through transduction and lysogeny. Sequence analysis (>37,000 reads) of thermal aquifer phage libraries provides insight into viral complexity, lifestyles, molecular diversity, and access to coding sequences for useful proteins. As expected, a large fraction of identifiable sequences encode phage specific proteins such as nucleic acid metabolizing enzymes, phage structural proteins, lytic enzymes, and mobile elements such as plasmids and transposons. Numerous similarities to enzymes associated with temperate phage, such as integrases, suggest that lysogeny is common in thermal environments. Unexpectedly, a Thermosynechococcus-like photosynthesis regulatory gene was found next to an integrase gene. Over 200 DNA polymerase genes have been identified with approximately 58 being full length. We have expressed 10 active enzymes, one of which allows isothermal amplification at elevated temperatures with greater specificity than conventional polymerases. This subset of thermostable phage DNA polymerases appears much more diverse than known microbial or phage enzymes. The fact that no pol genes were re-isolated suggests the level of diversity seen so far is the tip of a very large iceberg.

Phage in the Environment Viruses are now recognized as the most abundant forms of life on earth. An estimated 1031 viruses exist in the oceans alone, probably out numbering their hosts by ten fold (1). Aquatic viruses range in concentration from 104 to 107/ml in the water column (2), and in excess of 108/cm3 in the sediment (3). Viruses play an important role in global ecology, modulating the abundance and diversity of microbial populations through their lytic and lysogenic activity (4, 5) and play a critical role in the global nutrient and energy cycle (6). Phages are also believed to be a major driving force in cellular evolution (7), serving as a primary vector for horizontal genetic exchange and promoting diversity by their predation and lysogeny of the most abundant microbes. They also alter the phenotypes of bacteria they lysogenize.

Phage at High TemperaturesWhile the oceans contain ~1.3 X 1021 liters, ground water aquifers may contain 1019 liters of pore space (8). Marine and fresh water ecosystems are characterized by moderate to cold temperatures, whereas high temperature aquifers predominate at pressure and temperature gradients deep in the earth and in geothermal sites such as Yellowstone National Park and hydrothermal vents on the ocean floor. Temperatures as high as 230°C and pressures of 300 psi have been measured at Yellowstone National Park within 200 feet of the surface (9).

Life in the deep/hot biosphere is purely prokaryotic, improving the chances of more completely describing selected biomes. Furthermore, the geochemical energy in hydrothermal environments allows unique chemolithoautotrophic metabolic pathways. Most work on hot spring extremophiles has concentrated on sedentary surface microbes that grow at less than 100°C; this study looks at planktonic viruses and microbes in the emergent water column, which, presumably, originates at much higher temperatures. Viruses are the only known predators in thermal aquifers and can have a significant effect on hot spring microbial food webs. Breitbart (10) estimates that thermal aquifers may contain 3.7 X 1029 prokaryotic cells and 3.7 X 1030 viruses, which is roughly equivalent to the total estimated viruses in the ocean8. Viruses may be responsible for as much as 3.6 Gtons of carbon turnover per year, which is comparable to the impact of phages on the ocean. Viruses, particularly those of the thermal environments, may be important contributors to global molecular diversity (11). We have used a metagenomics approach to studying the viruses in terrestrial thermal pools.

Sampling of Four Thermal Sites in California and Yellowstone. Numerous sites in Long Valley California and Yellowstone National Park have been sampled. All of the work described focuses on four sites; one in Long Valley and three in Yellowstone. Viral particles were isolated and concentrated from several hundred liters of hot spring water by tangential flow filtration.

ABSTRACT

Viral Library Construction

The construction of representative phage community DNA libraries is complicated by low yields of viral DNA and the need to remove contaminating cellular and free DNA. A series of differential filtration and centrifugation steps was used to isolate and concentrate the phage until epifluorescence microscopy indicated an absence of contaminating microbial cells. Nuclease treatment was used to remove free DNA. We do not use density centrifugation in cesium chloride because of the unacceptable loss of already limited amounts of viral material, the incomplete separation of microorganisms, and the potential bias introduced by this technique.

Improved Vectors for Library Construction

Representative libraries of phage DNA are particularly problematic to construct, due in part to toxic coding sequences in their genomes. The CloneSmart® vectors were developed to improve the number of recombinants and reduce cloning bias in libraries. These vectors eliminate transcription and translation of recombinant inserts and have proven effective in cloning numerous otherwise toxic phage genes (personal communication, Ry Young, Texas A&M).

Viral Morphotypes

Comparison of viral metagenomic libraries to the GenBank non-redundant database. Panel A) BLASTx results were examined to detect homology to proteins derived from phage, virus or other organisms (mostly bacteria and archaea). BLASTx results were categorized by source of the strongest hit. Panel B) Phage-like genes were categorized in functional groups using keywords shown below.

Little Hot Creek (top, left), Bath (top, right), BearPaw (bottom, left) & Octopus (bottom, right) Hot Spring. Bath Hot Spring shows significant geyser activity due to superheated water emanating from the underground aquifer.

Imaging of Hot Spring Phage. Panel A. Phage particles were captured on an AnoDisk filter (Millipore) and stained with SYBR Gold (Molecular Probes). The particles were imaged using a Bio-Rad 1024 laser scanning confocal microscope (U. Wisc. Keck Imaging Center). The numerous small spots are phage particles. An individual cell can be seen in upper center. Panels B to K are transmission electron micrographs of phages. Panel B shows a phage cultured from YNP. Also shown are phages directly isolated from Moundview (C), Azure (D), Bath (E and I), Octopus (F), Cavern (G), Paint Pots (J), and Azure (K) Hot Springs, all of YNP. (Electron micrographs are courtesy of Sue Brumfield, Montana State University.)

[Panel B: pie charts of phage-like gene functional groups for the Little Hot Creek, BearPaw and Octopus libraries. Segment values read from the three charts (library assignment of each chart not preserved):

DNA replication/repair 19%, recombination 10%, transcription/translation 17%, metabolism/modification 32%, lytic enzyme 11%, structural protein 9%, mobile element 2%

DNA replication/repair 17%, recombination 10%, transcription/translation 18%, metabolism/modification 29%, lytic enzyme 17%, mobile element 7%, structural protein 2%

DNA replication/repair 19%, recombination 10%, transcription/translation 19%, metabolism/modification 25%, lytic enzyme 8%, mobile element 18%, structural protein 1%]

Shown are sequence reads in which the strongest similarity was to proteins that match functional group keywords shown in Table 2.

Functional Groupings of Keywords into Categories

| Functional Group | Key Words |
|---|---|
| DNA replication/repair | DNA repair, DNA polymerase, replication, primase, helicase, reverse transcriptase, repA, DNA mismatch repair |
| Recombination | integrase, resolvase, intron, terminase, DNA/RNA ligase, recN |
| Transcription/Translation | RNA polymerase, transcription, tRNA synthetase |
| Mobile elements | vector, transposase, pathogen, virulence/virulent, toxin |
| Nucleic acid metabolism/modification | nuclease, nucleotidyl transferase, DNA glycosylase, DNA methylase, reductase, nucleotidase, restriction, RNase, DNase |
| Lytic enzyme | sidase, protease, peptidase, proteinase, trypsin, chitin, lysozyme, lysin, lytic |
| Structural protein | tail, tape measure, head |
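The keyword grouping above can be sketched as a simple lookup. This is only an illustration: the poster does not state how keywords were matched, so the case-insensitive substring test, the first-match-wins ordering, the splitting of "DNA/RNA ligase" and "virulence/virulent" into separate terms, and the function name `classify_hit` are all assumptions.

```python
from typing import Optional

# Keywords transcribed from the table; matching rules below are assumptions.
FUNCTIONAL_GROUPS = {
    "DNA replication/repair": [
        "dna repair", "dna polymerase", "replication", "primase", "helicase",
        "reverse transcriptase", "repa", "dna mismatch repair",
    ],
    "Recombination": [
        "integrase", "resolvase", "intron", "terminase",
        "dna ligase", "rna ligase", "recn",
    ],
    "Transcription/Translation": [
        "rna polymerase", "transcription", "trna synthetase",
    ],
    "Mobile elements": [
        "vector", "transposase", "pathogen", "virulence", "virulent", "toxin",
    ],
    "Nucleic acid metabolism/modification": [
        "nuclease", "nucleotidyl transferase", "dna glycosylase",
        "dna methylase", "reductase", "nucleotidase", "restriction",
        "rnase", "dnase",
    ],
    "Lytic enzyme": [
        "sidase", "protease", "peptidase", "proteinase", "trypsin", "chitin",
        "lysozyme",  # assumed intended spelling of the listed keyword
        "lysin", "lytic",
    ],
    "Structural protein": ["tail", "tape measure", "head"],
}


def classify_hit(description: str) -> Optional[str]:
    """Return the first functional group whose keyword occurs in the hit
    description, or None when no keyword matches."""
    text = description.lower()
    for group, keywords in FUNCTIONAL_GROUPS.items():
        if any(keyword in text for keyword in keywords):
            return group
    return None


if __name__ == "__main__":
    for name in ["phage integrase family protein", "DNA polymerase I",
                 "tail tape measure protein", "hypothetical protein"]:
        print(name, "->", classify_hit(name))
```

Substring matching is deliberately crude: short keywords such as "head" or "tail" can match unrelated descriptions, so a real analysis would likely need manual review of the assignments.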

NanoClone Library Construction Methods

The low amounts of starting material necessitate use of a unique genome amplification technology. NanoClone Library Construction (see below) uses DNA sheared to the appropriate molecular weight and end repaired. Linkers are ligated to the ends of the fragments. These linkers serve as priming sites for amplification (see below). The amplification products are cloned into pSMART vectors.

[Gel image. Lanes: M, 50 ng, 5 ng, 500 pg, 50 pg, 5 pg, 0, M.]

Anonymous genome amplification technology applied to phage lambda using known amounts of starting material. Amplified DNA was excised from the agarose gel and cloned into pSMART HCKan vector for sequencing. The primary band in the 5 pg lane produced authentic lambda DNA sequences.

[Figure: electropherogram traces of the ROX-labeled primer/template (template sequence shown as N's) for a blank, Taq, and PyroPhage DNAPs 4110, 653, 3063, 3173, 488 and 2323. Peak labels: Unextended, Extended, A tailed, 3’ exo. x axis: 25 to 50 nt.]

Thermostable phage DNAP activity assay. ROX-labeled primer/template mix (top) is extended. Unextended product runs at 37 nt. Extension products run at 41 nt. A-tailing products run at 42 nt. 3’ exo products run at less than 37 nt. Minor peaks at 25, 30, 35, 40, 45, and 50 nt are TAMRA-labeled standards. PyroPhage DNAPs and a Taq control are indicated.
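The peak assignments in this caption amount to a small lookup on fragment size. The sketch below is illustrative only: the function name `classify_peak`, the rounding of measured sizes to the nearest nucleotide, and the "unassigned" label for sizes the caption does not mention (38 to 40 nt, or above 42 nt) are assumptions.

```python
def classify_peak(size_nt: float) -> str:
    """Assign a ROX-labeled peak to a product class from its size in nt,
    using the sizes given in the assay caption."""
    size = round(size_nt)  # assumption: call sizes to the nearest nucleotide
    if size < 37:
        return "3' exo"       # primer shortened by 3' exonuclease activity
    if size == 37:
        return "unextended"   # intact primer
    if size == 41:
        return "extended"     # full extension on the template
    if size == 42:
        return "A-tailed"     # one untemplated nucleotide added
    return "unassigned"       # sizes not described in the caption


if __name__ == "__main__":
    for peak in (33.0, 37.1, 41.2, 42.0, 39.0):
        print(peak, classify_peak(peak))
```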

[Micrograph panel labels A and D to J, with scale bars of 100 nm or 200 nm.]

The pSMART vectors are “transcription-free”. They eliminate vector-driven transcription of insert DNA and terminate transcription that initiates from promoters within the insert. These vectors allow cloning of many types of difficult DNAs, providing unbiased library construction and recovery of toxic genes.

The pSMART-HC vectors are high-copy (300-500 copies/cell, similar to pUC19). The pSMART-LC vectors contain the ROP gene to decrease the copy number (15-20 copies/cell), providing higher insert stability. In typical blunt cloning experiments, the background of non-recombinant vector is < 0.1%.

[Vector maps: pSMART -HCKan/-LCKan and pSMART -HCAmp/-LCAmp. Each map shows the blunt cloning site flanked by terminators, Ori, ROP, and the Kan or Amp resistance marker.]

Phage-like genes

A high proportion of sequence similarities are to genes normally found in viruses and phage, confirming the viral origin of the DNA used for library construction.

Frequency of phage-like genes

Molecular Diversity of PyroPhage pol genes

PyroPhage pol genes were aligned to known eukaryotic, prokaryotic and viral polymerases using CLUSTAL W. The PyroPhage enzymes are distinct from available thermostable DNA polymerases.

Viral Metagenomic Libraries from Hot Springs

| Library | Hot Spring | Temp (°C) | Phage/ml in Spring (a) | Vol. Sampled (liters) | Theoretical Yield (ng) (b) | Actual Yield (ng) | % Yield of DNA (c) | Vector | Average Insert Size |
|---|---|---|---|---|---|---|---|---|---|
| L1.1 | Little Hot Creek, CA | 84 | not tested | 450 | --- | 80 | --- | pcrSM | 3-6 kb |
| Y4.9 | Octopus, YNP | 80 | 2.5 × 10^4 | 1058 | 571 | 10 | 1.7% | pcrSM | 2-3 kb |
| Y4.16 | Bearpaw, YNP | 74 | 5.9 × 10^4 | 450 | 573 | 59 | 10% | pcrSM | 2-3 kb |
| Y2.1 | Bath, YNP | 93 | 3.7 × 10^4 | 360 | 288 | 90 | 31% | pUC | 3-6 kb |

| Library | Number of Sequence Reads | No Homology | No Match to Keywords | Virus or Phage Homology | DNA pol Gene Homology | Complete pol Genes | Expressed pol Genes | Purified |
|---|---|---|---|---|---|---|---|---|
| L1.1 | 7,479 | 2,363 | 3,775 | 262 | 14 | 2 | 1 | 1 |
| Y4.9 | 21,797 | 12,705 | 6,548 | 2,036 | 148 | 34 | 7 | 2 |
| Y4.16 | 7,545 | 2,510 | 3,619 | 333 | 57 | 20 | 2 | 1 |
| Y2.1 | 765 | 200 | 410 | 19 | 3 | 2 | | |
| totals | 37,586 | 17,778 | 14,352 | 2,650 | 222 | 58 | 10 | 4 |

Summary of thermostable phage libraries screened in this project. a) Phage concentrations are based on direct counts by epifluorescence. b) Theoretical yields are the total amount of phage DNA in samples, assuming an average of 54 attograms of DNA/phage particle and use of only a portion of the sample (40%) for each DNA preparation (phage/ml × vol. sampled (l) × 1000 ml/l × 54 attograms DNA/phage × 40%). c) Percent yield is actual yield divided by theoretical yield.
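The yield formula in footnote (b) can be checked against the tabulated values with a few lines of arithmetic. The function and constant names here are illustrative; the inputs (phage/ml, liters sampled, 54 attograms per particle, 40% of the sample) are taken from the table and its footnotes.

```python
ATTOGRAMS_PER_PHAGE = 54     # average DNA content per phage particle (footnote b)
FRACTION_OF_SAMPLE = 0.40    # portion of the sample used per DNA preparation
ATTOGRAMS_PER_NANOGRAM = 1e9
ML_PER_LITER = 1000


def theoretical_yield_ng(phage_per_ml: float, volume_liters: float) -> float:
    """Theoretical phage DNA yield in nanograms, following footnote (b)."""
    particles = phage_per_ml * volume_liters * ML_PER_LITER
    attograms = particles * ATTOGRAMS_PER_PHAGE * FRACTION_OF_SAMPLE
    return attograms / ATTOGRAMS_PER_NANOGRAM


def percent_yield(actual_ng: float, theoretical_ng: float) -> float:
    """Percent yield as defined in footnote (c)."""
    return 100.0 * actual_ng / theoretical_ng


if __name__ == "__main__":
    libraries = {
        # library: (phage/ml, liters sampled, actual yield in ng)
        "Y4.9 Octopus": (2.5e4, 1058, 10),
        "Y4.16 Bearpaw": (5.9e4, 450, 59),
        "Y2.1 Bath": (3.7e4, 360, 90),
    }
    for name, (conc, vol, actual) in libraries.items():
        theo = theoretical_yield_ng(conc, vol)
        print(f"{name}: {theo:.0f} ng theoretical, "
              f"{percent_yield(actual, theo):.1f}% yield")
```

Rounded to whole nanograms, the three theoretical yields come out at 571, 573 and 288 ng, matching the table.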

Sequencing of the Libraries, Assembly of Contigs and Functional Genomics

More than 37,000 reads (~37 Mb) of viral DNA were determined by the DOE Joint Genome Institute. Individual reads were trimmed and vector sequences removed with SeqManII (DNASTAR). Trimmed sequences were assembled using SeqManII at a minimum match of 95% and a match size of 20 nucleotides.

Assembly of the reads
• Little Hot Creek: 3.6 Mbp in 3,014 contigs
• Bearpaw: 6.1 Mbp in 6,191 contigs
• Octopus: 14.4 Mbp in 13,543 contigs

In Silico Analysis of the Libraries

Translations of the trimmed reads were aligned to known sequences in the GenBank nr protein database using BLASTx. Sequences were considered similar if the Expect value was equal to or less than 0.001.
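As a minimal sketch of this similarity filter, the code below keeps hits with an Expect value of 0.001 or less and retains the strongest hit per read. The `Hit` record, the function name `strongest_hits`, and the example data are hypothetical; the poster does not describe how the BLASTx output was parsed, and "strongest" is taken here to mean lowest Expect value.

```python
from typing import Dict, Iterable, NamedTuple

E_VALUE_CUTOFF = 1e-3  # "equal to or less than 0.001"


class Hit(NamedTuple):
    """One BLASTx hit for a read (hypothetical record layout)."""
    read_id: str
    description: str
    evalue: float


def strongest_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Return the lowest-E-value hit per read among hits passing the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > E_VALUE_CUTOFF:
            continue  # not considered similar
        current = best.get(hit.read_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.read_id] = hit
    return best


# Made-up hits for illustration only.
example_hits = [
    Hit("read_1", "DNA polymerase I", 1e-46),
    Hit("read_1", "hypothetical protein", 2e-5),
    Hit("read_2", "phage integrase", 0.5),       # fails the cutoff
    Hit("read_3", "putative helicase", 0.001),   # passes: equal to the cutoff
]

if __name__ == "__main__":
    for read_id, hit in strongest_hits(example_hits).items():
        print(read_id, hit.description, hit.evalue)
```

Reads with no hit at or below the cutoff simply drop out of the result, which corresponds to the "No Homology" column in the library summary.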

CloneSmart® Vectors: No Background, No Transcription

[Panel A: bar chart (0% to 100%) of BLASTx results classed as nonviral, viral or no similarity for the Little Hot Creek, Bear Paw and Octopus libraries and overall.]

| Keyword | LHC | Bearpaw | Octopus |
|---|---|---|---|
| head | 3 | 10 | 15 |
| helicase | 132 | 122 | 274 |
| integrase | 67 | 30 | 22 |
| ligase | 0 | 11 | 18 |
| lysin | 118 | 260 | 389 |
| methylase | 50 | 32 | 127 |
| methyltransferase | 55 | 100 | 262 |
| nuclease | 94 | 145 | 232 |
| polymerase | 147 | 235 | 548 |
| primase | 14 | 19 | 21 |
| reverse transcriptase | 42 | 35 | 11 |
| restriction | 71 | 34 | 59 |
| resolvase | 24 | 15 | 87 |
| ribonuclease | 20 | 13 | 62 |
| tail | 17 | 38 | 289 |
| topoisomerase | 18 | 19 | 24 |
| transposase | 208 | 86 | 51 |

Phage Lifestyles

Inferences into the lifestyles of phage in thermal aquifers can be based on apparent similarity to genes of known function. For example, the identification of 119 apparent integrase genes suggests that lysogeny is a common lifestyle in thermal aquifers. Likewise, apparent DNA polymerase genes suggest lytic phages, and reverse transcriptases suggest RNA phages.
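The figure of 119 integrase genes is the sum of the integrase keyword counts for the three libraries (67, 30 and 22). The sketch below reproduces that sum and, as an illustrative step that is not part of the poster's analysis, scales each count by the number of sequence reads in the corresponding library (7,479, 7,545 and 21,797) so that libraries of different size can be compared. The variable names are assumptions.

```python
# Integrase keyword counts and read totals per library, taken from the tables.
integrase_hits = {"LHC": 67, "Bearpaw": 30, "Octopus": 22}
sequence_reads = {"LHC": 7479, "Bearpaw": 7545, "Octopus": 21797}

# Total apparent integrase genes across the three libraries.
total_integrases = sum(integrase_hits.values())

# Illustrative normalization: integrase hits per 1000 reads in each library.
hits_per_thousand_reads = {
    library: 1000 * integrase_hits[library] / sequence_reads[library]
    for library in integrase_hits
}

if __name__ == "__main__":
    print("total integrase hits:", total_integrases)
    for library, rate in hits_per_thousand_reads.items():
        print(f"{library}: {rate:.1f} per 1000 reads")
```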

Thermostable DNA Polymerases from Viral Metagenomic Libraries

For several reasons, thermostable viral DNA polymerases are expected to be very attractive alternatives to reagent enzymes derived from microbes.

PyroPhage™ DNAPs

PyroPhage DNAPs are expressed from the genes of viruses (phage) that inhabit boiling hot springs.

PyroPhage DNAPs as Alternatives for DNA Amplification

• Viral DNAPs are replication enzymes, in contrast to the DNA repair enzymes currently available, and represent the first viable alternative to existing reagent enzymes.

Based on known viral enzymes, PyroPhage DNAPs are expected to have substantial advantages.
• Unprecedented Molecular Diversity: This enzyme family is the most diverse known, which suggests a range of activities that can be tailored to the needs of the application.
• Improvements can be extrapolated from the few well characterized viral DNAPs:
  - Excellent strand displacement and processivity will allow detection and analysis of “difficult” sequences.
  - Lower error rate results in more reliable data.
  - Absence of end-product inhibition improves quantitative PCR and increases yield.
  - Strand displacement allows isothermal amplification, which will increase throughput and reduce instrumentation costs.
  - Reduced stuttering will allow use of optimal markers for genetic tests.
  - Improved incorporation of nucleotide analogs will broaden the range of detection methods.
  - Accessory functions may expand functionality.

Detection of PyroPhage Coding Sequences by Sequence Similarity

The BLASTx search revealed 222 apparent pol genes. Of these, 58 appear to be complete genes. All of the latter were tested for expression using a DNA polymerase assay developed at Lucigen.

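Family A PyroPhage Polymerase Percent Identities

| | PyroPhage 3063 | PyroPhage 488 | PyroPhage 3173 | PyroPhage 967 | Aae | Taq | Tth | Tma | T5_phage | Bst | Eco_polI | T7_phage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyroPhage 3063 | --- |
| PyroPhage 488 | 31 | --- |
| PyroPhage 3173 | 28 | 45 | --- |
| PyroPhage 967 | 31 | 51 | 82 | --- |
| Aae | 63 | 29 | 23 | 34 | --- |
| Taq | 24 | 25 | 22 | 25 | 22 | --- |
| Tth | 26 | 24 | 18 | 26 | 25 | 85 | --- |
| Tma | 30 | 22 | 22 | 28 | 28 | 42 | 42 | --- |
| T5_phage | 15 | 19 | 17 | 19 | 23 | 18 | 18 | 21 | --- |
| Bst | 24 | 23 | 20 | 28 | 24 | 40 | 41 | 40 | 14 | --- |
| Eco_polI | 30 | 22 | 19 | 26 | 28 | 38 | 38 | 43 | 21 | 40 | --- |
| T7_phage | 16 | 13 | 15 | 17 | 18 | 17 | 17 | 16 | 12 | 15 | 13 | --- |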

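Family B PyroPhage Polymerase Percent Identities

| | PyroPhage 4110 | PyroPhage 2323 | PyroPhage 2783 | Tli | Pfu | Pae | RM378_phage | HHV | CMV | vaccinia_virus | baculovirus | Chlorellavirus | T4_phage | Eco_polIIIa | Human_polalpha | Phi29Phage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyroPhage 4110 | --- |
| PyroPhage 2323 | 83 | --- |
| PyroPhage 2783 | 91 | 86 | --- |
| Tli | 23 | 21 | 22 | --- |
| Pfu | 23 | 21 | 22 | 74 | --- |
| Pae | 27 | 26 | 27 | 35 | 35 | --- |
| RM378_phage | 14 | 14 | 15 | 13 | 15 | 13 | --- |
| HHV | 18 | 17 | 17 | 19 | 16 | 21 | 13 | --- |
| CMV | 19 | 15 | 17 | 18 | 17 | 21 | 15 | 28 | --- |
| vaccinia_virus | 19 | 19 | 18 | 16 | 15 | 17 | 14 | 13 | 16 | --- |
| baculovirus | 12 | 11 | 14 | 16 | 15 | 10 | 10 | 14 | 14 | 13 | --- |
| Chlorellavirus | 20 | 20 | 20 | 23 | 20 | 20 | 14 | 19 | 24 | 17 | 17 | --- |
| T4_phage | 15 | 16 | 17 | 16 | 17 | 12 | 13 | 12 | 12 | 10 | 9 | 11 | --- |
| Eco_polIIIa | 12 | 12 | 12 | 12 | 14 | 12 | 9 | 12 | 8 | 10 | 7 | 8 | 9 | --- |
| Human_polalpha | 17 | 18 | 15 | 21 | 23 | 21 | 18 | 15 | 14 | 13 | 14 | 18 | 13 | 10 | --- |
| Phi29Phage | 9 | 9 | 11 | 10 | 8 | 5 | 10 | 9 | 12 | 10 | 13 | 11 | 7 | 4 | 9 | --- |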

Isothermal Amplification using PyroPhage DNA polymerases

Five units of 3173 polymerase were used to amplify one nanogram each of ssM13mp18 and pUC19 plasmid DNA. Random decamer primers were added to 0.5 μM or 5 μM, as indicated. Reactions were incubated at 95°C prior to addition of enzyme, then for 16 hours at 55°C with enzyme. One fiftieth of each reaction was resolved on a 1% agarose gel. Results are shown in Panel A. To verify that the amplification was specific for the template DNA, one μl of the amplification product of the positive pUC19 reaction was tested in a PCR reaction using primers specific for the ampicillin resistance gene of the original plasmid template. As a negative control, a reaction containing all the components of the positive reaction, including input DNA, but without nucleotides to prevent amplification, was tested by PCR. As a positive control, the same sequence was amplified directly from 1 ng of pUC19.

Additional amplifications using 2.5 units of 488 DNA polymerase and 15 units of 967 DNA polymerase are also shown. In both cases, the reaction conditions were essentially the same as described except that the template was double-stranded, linear lambda phage DNA, the reaction temperature was 50°C, and the concentration of magnesium was varied from 1.5 to 20 mM.

Family A 5’ Exo Domain

1091    LSTSSGFPTGA
Tma     LSTSTGIPTNA
Tth     LTTSRGEPVQA
        (D)
Taq     LTTSRGEPVQA

Family A Active Site. Discrimination against modified nucleotides

3063    RQLAKAVNFGLIYG
2710    RQIGKSANFGLIYG
488     RQIAKSANFGLIYG
1795    RQIAKSANFGLIYG
T7 Ph   RDNAKTFIYGFLYG
Aae     RQLAKAINFGLIYG
Tma     RRAGKMVNFSIIYG
Tth     RRAAKTINFGVLYG
        (D) (Y)
Taq     RRAAKTINFGVLYG

Family B 3’ Exo Domain

3063    FLYIDTETVGD
4110    VAAFDIEVDAT
2323    VAAFDIEVDAT
Pfu     ILAFDIETLYH
        (A A)
Vent    LLAFDIETFYH

Family B Active Site. Discrimination against modified nucleotides

4110    QFAFKLILVSAYG
3001    QFAFKLILVSAYG
2323    QFAFKLILVSAYG
Pfu     QKAIKLLANSFYG
        (L)
Vent    QRAIKLLANSYYG

Alignment of PyroPhage DNA polymerase domains with commercially relevant motifs. Selected PyroPhage DNAP sequences are aligned to mutations of domains of Taq and Vent that improve these enzymes as reagents. Also shown are Tth, Tma, and phage T7 domains. Conserved amino acids (compared to Taq and Vent) are shown in red. Amino acids that were substituted to create G46D (Taq 5’ exo domain), R660D/F667Y (Taq active site), D141A/E143A (Vent 3’ exo domain) and A488L (Vent active site) are shown in blue. Variations different from both wild type and modified are shown in green.
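Percent identity of the kind tabulated for the full-length polymerases can be illustrated on the short active-site motifs in this alignment. This is a sketch under stated assumptions: the function name `percent_identity` is hypothetical, the inputs must already be aligned to equal length, and columns containing a gap in either sequence are skipped. The denominator convention used for the poster's tables is not stated and may differ.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Family A active-site motifs from the alignment.
    pyrophage_3063 = "RQLAKAVNFGLIYG"
    aae = "RQLAKAINFGLIYG"
    taq = "RRAAKTINFGVLYG"
    tth = "RRAAKTINFGVLYG"
    print(f"3063 vs Aae: {percent_identity(pyrophage_3063, aae):.1f}%")
    print(f"Taq vs Tth:  {percent_identity(taq, tth):.1f}%")
```

Short conserved motifs naturally score far higher than whole proteins, so these values are not comparable to the full-length identities in the tables.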

Other Interesting Genes Found in Contigs from the Libraries. A cultivated Pyrobaculum spherical virus has 83% identity to a 3 kb contig from Bear Paw hot springs metagenomic DNA.

BLASTx Analysis of selected contigs

(BearPaw) CONTIG 1494 ≈ 3 kb

Matches Pyrobaculum spherical virus
• A non-lytic dsDNA virus of Pyrobaculum sp. D11 (PSV)
• Isolated from Obsidian Pool, YNP
• Genome is 28 kb
• ORFs have no gene matches to public databases

Conservation of gene order and identity between the BearPaw CONTIG 1494 and Pyrobaculum spherical virus

[Figure: gene map of the contig against the virus genome on a scale of 500 to 2500 bp. ORF identities: ORF88A 89%, ORF137 88%, ORF235 87%, ORF239 87%, ORF211 79%, ORF107 77%.]

[Figure labels for the Photosystem I contig: BtpA Photosystem I biogenesis protein; phage integrase.]

References
1. Suttle CA. (2005) Viruses in the sea. Nature 437(7057):356-61.
2. Wommack KE, Colwell RR. (2000) Virioplankton: viruses in aquatic ecosystems. Microbiol Mol Biol Rev. 64(1):69-114.
3. Breitbart M, Felts B, Kelley S, Mahaffy JM, Nulton J, Salamon P, Rohwer F. (2004) Diversity and population structure of a near-shore marine-sediment viral community. Proc R Soc Lond B Biol Sci. 271(1539):565-74.
4. Chibani-Chennoufi S, Bruttin A, Dillmann ML, Brussow H. (2004) Phage-host interaction: an ecological perspective. J Bacteriol. 186:3677-3686.
5. Paul JH. (1999) Microbial gene transfer: an ecological perspective. J Mol Microbiol Biotechnol. 1:45-50.
6. Fuhrman JA. (1999) Marine viruses and their biogeochemical and ecological effects. Nature 399:541-548.
7. Weinbauer MG, Rassoulzadegan F. (2004) Are viruses driving microbial diversification and diversity? Environ Microbiol. 6(1):1-11.
8. Gold T. (1992) The deep, hot biosphere. Proc Natl Acad Sci U S A 89:6045-9.
9. White DE, Fournier RO, Muffler LJP, Truesdale AH. (1975) Physical results of research drilling in thermal areas of Yellowstone National Park, Wyoming. Geological Survey Professional Paper No. 892:1-89.
10. Breitbart M, Wegley L, Leeds S, Schoenfeld T, Rohwer F. (2004) Phage community dynamics in hot springs. Appl Environ Microbiol. 70(3):1633-40.
11. Villarreal LP, DeFilippis VR. (2000) A hypothesis for DNA viruses as the origin of eukaryotic replication proteins. J Virol. 74(15):7079-84.

Support

This material is based upon work supported by the National Science Foundation under Award Numbers 0109756 and 0215988, the National Human Genome Research Institute under Award Number 1 R43 HG02714-01 to TS, and the Department of Energy under Award Numbers 70588S02-I and DE-FG02-02ER83484 to DAM. Any opinions, findings, and conclusions or recommendations expressed in this poster are those of the authors and do not necessarily reflect the views of NSF, NHGRI or DOE. Sequencing was performed at the DOE Joint Genome Institute as part of their Genomes-to-Life program.

[Panel A: Isothermal amplification of circular templates using PyroPhage 3173 POL. Lanes: ssM13 template with 5 and 0.5 μM primer; pUC template with 5 and 0.5 μM primer; no template with 5 μM primer. Size markers: 10 kb, 6 kb, 3 kb, 1 kb.]

[Panel B: Verification by PCR of amplification specificity. Lanes: pUC control, pos amp, neg amp. Size markers: 10 kb, 6 kb, 3 kb, 1 kb.]

[Panel C: Isothermal amplification of lambda DNA using PyroPhage 488 and 967 POLs. For each enzyme: two lanes without template (1.5 and 15 mM MgCl2) followed by six lanes with template (1.5 to 20 mM MgCl2). Size markers: 10 kb, 6 kb, 3 kb, 1 kb.]

2120 W. Greenview Drive, Middleton, WI 53562   www.lucigen.com   1-888-575-9695

Identification of a Photosystem I gene next to an integrase. Little Hot Creek contig 27 (95% assembly) was compared to the GenBank nr database. The 2740 bp contig showed similarity to two genes: a Photosystem I biogenesis protein from Thermosynechococcus elongatus and a phage integrase from Rhodoferax ferrireducens.

[Micrograph panel labels B and C, with scale bars of 100 nm and 200 nm.]


Ten clones express DNAP

| Clone | Source | Strongest similarity | E value | % identity | % conserved | Exo |
|---|---|---|---|---|---|---|
| 3063 | BearPaw | Aquifex pyrophilus pol I | 0.0 | 63 | 79 | 3’ |
| 488 | Little Hot Creek | Aquifex pyrophilus pol I | 1e-46 | 33 | 51 | No |
| 3173 | Octopus | Desulfitobacterium hafniense pol I | 2e-37 | 30 | 48 | 3’ |
| 4110 | Octopus | Pyrodictium occultum pol II | 3e-55 | 28 | 46 | No |
| 2323 | Octopus | Pyrobaculum aerophilum pol II | 1e-47 | 28 | 45 | 3’ |
| 653 | Bearpaw | Pyrococcus furiosus virus pol | 2e-12 | 37 | 59 | 3’ |
| 967 | Octopus | Aquifex aeolicus pol I | 3e-44 | 36 | 53 | No |
| 2783 | Octopus | Sulfolobus tokodaii pol II | 3e-56 | 27 | 46 | 3’ |
| 2072 | Octopus | Sulfolobus tokodaii pol II | 2e-10 | 39 | 60 | |
| 2123 | Octopus | Pyrococcus abyssi pol II | 1e-04 | 35 | 51 | |

photosystem I biogenesis protein [Thermosynechococcus elongatus BP-1] Length=296

Score = 52.0 bits (123), Expect = 9e-06
Identities = 43/136 (31%), Positives = 66/136 (48%), Gaps = 11/136 (8%)
Frame = +1

Query 133 MGVDSSGLTGEGEALMRLAAE-HPSIEFFASVAFKYMP--EEPDPVTAADNARMAGFVPT 303
          M D + G L+R E I+ FA V K+ P+ TA + G
Sbjct 125 MATDQGLIEGPAHQLLRYRRELGQDIKIFADVMVKHAQPLHSPNLATAVRDTFDRGLADG 184

Query 304 T--SGSATGAPPDLE--KIRAMAARG-PLAVASGMTPDNVHLYSPYLSDILVATGIAAD- 465
          SG ATG PP E + A AA+G PL + SG + DNV PY++ ++VA+ + +
Sbjct 185 VILSGWATGQPPTEEDLSVAARAAKGQPLFIGSGASWDNVAQLVPYVNGVIVASSLKRNG 244

Query 466 --EHHLDPGKLARFIK 507
          E +DP +++RF++
Sbjct 245 QIEQPIDPIRVSRFVE 260

Phage integrase [Rhodoferax ferrireducens DSM 15236] Length=347

Score = 74.7 bits (182), Expect = 4e-12
Identities = 80/287 (27%), Positives = 117/287 (40%), Gaps = 31/287 (10%)
Frame = +1

Query 106 RDSETYRHIWSTWCKYLQGGQAGGRSRPIPWYEVDAATVVGFLQ--SGPASRKEKLESSS 279
          R + YR W W +L PW + V +L S A+ ++ +SS
Sbjct 45  RSVKQYRSTWFNWVAWLPPHT--------PWEKAAPEQVSAYLHQLSASATARQTQPNSS 96

Query 280 NE----TTKRRYWRVLDRIYNYAKAHNWVDSNPLVGLTTNDKPKSEDTLGTILDPHVWHA 447
          T+RRYWR+L IY +A W ++NP T + P SE IL
Sbjct 97  RRPASTVTQRRYWRMLRDIYAHAVVMAWCEANPCAQAT--EIPASEAMASMILPAWALRQ 154

Query 448 AEKLLAHPDRFDPI----SVRNRAILQILFGLGLAPQEVRALKTXXXXXXXXXXXXVNPS 615
          + + H + VRN A+L +L G E+ +L+ L +E
Sbjct 155 LQDGILHQASRQAVRKWQDVRNDALLLLLLHTGAKTGELVSLRVDQALKIRTEKHG-EQW 213

Query 616 KVHVDGHNTLRPRTLTLC-PKTSAAIREWLKARPAVAT--------SKSGQILFCTPKGP 768
          + +DG + R +TL P+ AA+ +WL+ R V +KS I P
Sbjct 214 AIQIDGEKDCQQRHITLDEPRAGAALAQWLRVRQHVPRKSPWLFFGAKSHVIDGKRELSP 273

Query 769 LGSVSLYLLVKSFLKKASDLAQREEP-PQAGPQVIRNSVLVRLLEDG 906
          L S ++++LV LK E AG + IRNSVL R LE G
Sbjct 274 LSSKTIFILVAGALKAHLPPNTFEGMLSHAGAEAIRNSVLARWLEAG 320
