Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring
Thanks to: DARPA & DOE-GtL
Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, Xeotron/Invitrogen
For more info see: arep.med.harvard.edu
1-Feb-2005 9:15-10 MITRE
Synthetic (homologous recombination) testing of DNA motifs
RNA ratio (motif- to wild type) for each flanking gene:
1.3 | 2.4 (1.3 in argR)
1.1 | 1.3
0.7 | 2.5
0.2 | 1.4
1.4 | 3.5
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201-208
Synthetic Genomes & Proteomes. Why?
• Test or engineer cis-DNA/RNA elements
• Access to any protein (complex), including post-translational modifications
• Affinity agents for the above
• Protein design, vaccines, solubility screens
• Utility of molecular biology DNA -- RNA -- Protein in vitro "kits" (e.g. PCR -- T7 -- Roche)
Toward these goals, design a chassis:
• 115 kbp genome, 150 genes
• Nearly all 3D structures known
• Comprehensive functional data
(PURE) translation utility
Removing tRNA synthetases, translational release factors, RNases & proteases
Selection of scFvs [antibodies] specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods 284:147
Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353
High level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568
Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5.
in vitro genetic codes
[Figure: codon diagram of an mRNA translated with a de novo genetic code in which tRNAs carry the unnatural amino acids mS, yU and eU; translation product fM yU mS eU E. Plot: 3H-E incorporation (dpm, 0-3500) vs. time (min, 30-80).]
Forster, et al. (2003) PNAS 100:6353-7
80% average yield per unnatural coupling.
eU = 2-amino-4-pentenoic acid; yU = 2-amino-4-pentynoic acid; mS = O-methylserine; gS = O-GlcNAc-serine; bK = biotinyl-lysine
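A minimal Python sketch of the idea, under stated assumptions: the codon assignments below are hypothetical (chosen for illustration only, not those of Forster et al.), reading stops at the first codon for which no charged tRNA is supplied, and the 80% figure is applied as an independent yield for each unnatural residue incorporated. The names `CUSTOM_CODE`, `translate` and `full_length_yield` are ours.

```python
# Hypothetical de novo genetic code for illustration only; these codon
# assignments are NOT the ones used by Forster et al. (2003).
CUSTOM_CODE = {
    "AUG": "fM",  # initiator formyl-methionine
    "UUC": "yU",  # 2-amino-4-pentynoic acid (illustrative codon)
    "UCU": "mS",  # O-methylserine (illustrative codon)
    "CUG": "eU",  # 2-amino-4-pentenoic acid (illustrative codon)
    "GAA": "E",   # glutamate
}
UNNATURAL = {"eU", "yU", "mS", "gS", "bK"}


def translate(mrna: str, code: dict[str, str]) -> list[str]:
    """Read codons from the first AUG until a codon with no assigned tRNA."""
    start = mrna.find("AUG")
    if start < 0:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = code.get(mrna[i:i + 3])
        if residue is None:  # no charged tRNA supplied for this codon: stop reading
            break
        peptide.append(residue)
    return peptide


def full_length_yield(peptide: list[str], per_coupling: float = 0.8) -> float:
    """Expected full-length fraction, assuming independent unnatural couplings."""
    n_unnatural = sum(1 for r in peptide if r in UNNATURAL)
    return per_coupling ** n_unnatural


peptide = translate("GGAUGUUCUCUCUGGAAUAA", CUSTOM_CODE)
print("-".join(peptide))                     # fM-yU-mS-eU-E
print(round(full_length_yield(peptide), 3))  # 0.512
```

Under this assumption three unnatural residues give 0.8^3, about 51% full-length product; because the per-coupling yield compounds multiplicatively, it dominates for longer peptidomimetics.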
Escherichia coli | Mycoplasma | 3D structure
Coliphage φ29 DNA polymerase | + | +
Coliphage P1 Cre recombinase | - | +
> Coliphage Lox/Cre recombinase site | - | +
Coliphage T7 RNA polymerase | + | +
> Coliphage T7 RNA polymerase initiation site | + | +
> Coliphage T7 RNA polymerase termination site | + | +
RNase P RNA | + | -
RNase P protein | + | +
> RNase P site/RNA primer for DNA polymerase | + | +
Small subunit 16S ribosomal RNA | + | +
All 21 small subunit ribosomal proteins (1-21) | + except 1, 21 | +
Large subunit 5S ribosomal RNA | + | +
Large subunit 23S ribosomal RNA | + | +
Large subunit 23S rRNA G2445>m2G methylase: unknown | ? | -
Large subunit 23S rRNA U2449>dihydroU synthetase: unknown | ? | -
Large subunit 23S rRNA U2457>pseudoU synthetase | ? | -
Large subunit 23S rRNA C2498>Cm methylase: unknown | ? | -
Large subunit 23S rRNA A2503>m2A methylase: unknown | ? | -
Large subunit 23S rRNA U2504>pseudoU synthetase | ? | -
All 33 large subunit ribosomal proteins (1-7, 9-11, 13-25, 27-36) | + except 25, 30 | +
Translational initiation factor 1 | + | +
Translational initiation factor 2 | + | +
Translational initiation factor 3 | + | +
Translational elongation factor Tu | + | +
Translational elongation factor Ts | + | +
Translational elongation factor G | + | +
Translational release factor 1 | + | +
Translational release factor 2 | - | +
Translational release factor Gln methylase | + | +
Translational release factor 3 | - | +
Ribosome recycling factor | + | +
33/45 transfer RNAs (see Fig. 2) | 29/33 | +
tRNA(I) C34>lysidine synthetase | ? | +
tRNA(R) A34>I deaminase | ? | +
tRNA(ASV) U34>cmo5U (=V) synthetase: unknown | - | -
tRNA(R) U34>2sU Cys desulfurase | - | +
tRNA(R) nm5U34 methylase | ? | +
tRNA(R) U34>cmnm5U GTPase | ? | +
tRNA(R) U34>cmnm5U synthetase | ? | +
tRNA(R) cmnm5U34>nm5U,mnm5U synthetase | ? | -
tRNA(R) G37 N1-methylase | + | +
tRNA(RNIKM) A37>t6A N6-threonylcarbamoyl-A synthetase: unknown | + | -
tRNA(CLFSWY) A37>i6A synthetase | - | +
tRNA(CLFSWY) i6A37>s2i6A (ms2i6A) synthetase | - | +
All 22 aminoacyl-tRNA synthetase subunits (20 enzymes) | + except G subunit, Q | + except G subunit
Met-tRNA formyltransferase | + | +
Chaperonin DnaK | + | +
Chaperonin GroEL | + | +
Chaperonin GroES | + | +
Total genes = 150 (Forster & Church)
Oligos for 150 & 776 synthetic genes (for the E. coli minigenome & M. mobile whole genome, respectively)
Up to 760K oligos/chip: 18 Mbp for $700 raw (6-18K genes)
<1K Oxamer: electrolytic acid/base
8K Atactic/Xeotron/Invitrogen: photo-generated acid (Sheng, Zhou, Gulari, Gao, U. Houston)
24K Agilent: ink-jet, standard reagents
48K Febit
100K Metrigen
380K Nimblegen: photolabile 5' protection (Nuwaysir, Smith, Albert)
Tian, Gong, Church
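The chip figures above imply a raw per-base and per-gene oligo cost. A quick Python check using only the numbers on this slide; it covers raw oligo cost alone (no assembly, error correction or verification), and the helper name `raw_costs` is ours.

```python
def raw_costs(chip_cost_usd: float, chip_bp: float, n_genes: int) -> tuple[float, float, float]:
    """Return (USD per bp, USD per gene, bp per gene) for one chip's raw oligos."""
    return chip_cost_usd / chip_bp, chip_cost_usd / n_genes, chip_bp / n_genes


for n_genes in (6_000, 18_000):  # the slide's 6-18K genes per chip
    per_bp, per_gene, gene_bp = raw_costs(700.0, 18e6, n_genes)
    print(f"{n_genes} genes: ${per_bp:.1e}/bp, ${per_gene:.3f}/gene, ~{gene_bp / 1e3:.0f} kb/gene")
```

So 18 Mbp for $700 is about $4e-5 per raw base, roughly 4 to 12 cents of oligos per gene, and the 6-18K gene range corresponds to genes of about 1-3 kb.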
Improve DNA Synthesis Cost
Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos, then release them.
10 + 50 + 10 => ss-70-mer (chip)
20-mer PCR primers with restriction sites at the 50-mer junctions => ds-90-mer
Cut at the junction sites => ds-50-mer
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
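The numbers on this slide can be checked with a few lines of Python. Assumptions (ours, not the slide's): ideal doubling per PCR cycle; simple second-order annealing A + B at equal concentrations, for which the initial rate scales with c² while the half-time scales only with 1/c; and each 20-mer primer annealing through its 3'-terminal 10 nt to a 10-nt flank, which reproduces the 90-mer. All function names are hypothetical.

```python
import math


def pcr_cycles_needed(start: float, target: float, efficiency: float = 1.0) -> int:
    """Cycles to amplify start -> target copies at (1 + efficiency)x per cycle."""
    return math.ceil(math.log(target / start) / math.log(1.0 + efficiency))


def bimolecular_slowdown(fold_dilution: float) -> tuple[float, float]:
    """(initial-rate slowdown, half-time increase) for A + B at equal concentrations.

    rate = k[A][B] falls with the square of the dilution, while
    t_1/2 = 1 / (k * c0) grows only linearly with it.
    """
    return fold_dilution ** 2, fold_dilution


def product_lengths(flank: int = 10, insert: int = 50, primer: int = 20) -> dict[str, int]:
    """Length bookkeeping: chip oligo -> PCR product -> released insert."""
    chip_oligo = flank + insert + flank                # ss-70-mer on the chip
    pcr_product = chip_oligo + 2 * (primer - flank)    # assumed 10-nt 5' tail per primer
    return {"ss_chip": chip_oligo, "ds_pcr": pcr_product, "ds_released": insert}


print(pcr_cycles_needed(1e6, 1e12))  # 20
print(bimolecular_slowdown(1e6))     # (1000000000000.0, 1000000.0)
print(product_lengths())             # {'ss_chip': 70, 'ds_pcr': 90, 'ds_released': 50}
```

About 20 ideal cycles restore the usual 1e12 molecules, which is why amplifying before release addresses the slow kinetics of the dilute chip pool.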
Improve DNA Synthesis Accuracy via mismatch selection
Tian & Church. Other mismatch methods: MutS (& MutH, MutL)
Genome assembly
Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors per bp) to reduce the number of intermediates
3. 15 kb to 5 Mb by homologous recombination (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters
Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.
Assembly product lengths (bp): 50, 75, 125, 225, 425, 825 … 100*2^(n-1)
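One reading of the length series, offered as an assumption: if each round joins two fragments that share a 25-bp overlap, lengths obey L(k+1) = 2·L(k) - 25, which reproduces 50, 75, 125, 225, 425, 825 and has the closed form 25 + 25·2^k. From 125 bp on this equals 25 + 100·2^(n-1), i.e. roughly the slide's 100·2^(n-1). The function name is ours.

```python
def assembly_lengths(start: int = 50, overlap: int = 25, rounds: int = 5) -> list[int]:
    """Lengths after each round of joining two equal fragments that share an overlap."""
    lengths = [start]
    for _ in range(rounds):
        lengths.append(2 * lengths[-1] - overlap)
    return lengths


series = assembly_lengths()
print(series)  # [50, 75, 125, 225, 425, 825]
# Closed form 25 + 25 * 2**k holds here because start == 2 * overlap.
assert all(length == 25 + 25 * 2**k for k, length in enumerate(series))
```

Under this model each round nearly doubles the product, so the number of hierarchical rounds grows only logarithmically with target length.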
All 30S ribosomal-protein DNAs (codon re-optimized)
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
[Gel figure; band labels 1.7 kb, 0.3 kb and S19 (0.3 kb); oligos from a Nimblegen 95K chip and an Atactic <4K chip.]
Improving synthesis accuracy
Method | bp/error
Chip assembly only | 160
Hybridization-selection | 1,400
MutS-gel-shift | 10,000
MutHLS cleavage | 30,000 (10X better than PCR)
Tian & Church 2004; Carr & Jacobson 2004; Smith & Modrich 1997
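To see what these bp/error figures mean for a whole gene, a standard back-of-envelope model (our assumption: independent errors at a uniform rate) gives the error-free fraction of an L-bp product as (1 - 1/r)^L ≈ e^(-L/r), where r is bp per error. A Python sketch for a hypothetical 1 kb gene:

```python
import math

BP_PER_ERROR = {
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "MutHLS cleavage": 30_000,
}


def error_free_fraction(length_bp: int, bp_per_error: float) -> float:
    """Fraction of molecules with zero errors, assuming independent errors."""
    return (1.0 - 1.0 / bp_per_error) ** length_bp


for method, r in BP_PER_ERROR.items():
    exact = error_free_fraction(1_000, r)
    approx = math.exp(-1_000 / r)
    print(f"{method:25s} {exact:.3f} (approx {approx:.3f})")
```

For a 1 kb product this gives roughly 0.2%, 49%, 90% and 97% error-free molecules, which is why each step up the table cuts the number of clones that must be screened.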
Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21: detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20: initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure.
Tian & Church
[Western blot based on His-tags: lanes 20w, 20m, 17w, 17m, 16w, 16m (W: wild-type, M: modified); 10 kDa marker.]
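The slide does not say how "less mRNA structure" was scored. As a hypothetical sketch only, the Python below uses low G+C content in the 5' coding region as a crude stand-in for weaker secondary structure (a real design would score folding energy with an RNA-folding tool) and greedily picks the lowest-GC synonymous codon, which leaves the encoded protein unchanged. The function names and the 15-codon window are our assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc(seq: str) -> int:
    return sum(base in "GC" for base in seq)


def translate(mrna: str) -> str:
    return "".join(CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


def recode_5prime(mrna: str, n_codons: int = 15) -> str:
    """Swap each of the first n_codons for its lowest-GC synonym (protein unchanged).

    Any trailing partial codon is dropped.
    """
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    for i, codon in enumerate(codons[:n_codons]):
        # Prefer fewer G/C; on ties keep the original codon.
        codons[i] = min(SYNONYMS[CODE[codon]], key=lambda c: (gc(c), c != codon))
    return "".join(codons)


print(recode_5prime("AUGGCCGGC"))  # AUGGCUGGU
```

On the slide the process was iterative: resynthesize, re-test expression, and repeat for proteins that remain weak; a proxy like this would only propose each round's candidate sequences.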
Safety Proposals
Church, G.M. A synthetic biohazard non-proliferation proposal. http://arep.med.harvard.edu/SBP/Church_Biohazard04c.doc (2004)
1. Monitor oligo synthesis via expansion of the Controlled Substances, Select Agents, and/or Recombinant DNA regulations
2. Computational tools for the above
3. System modeling checks for synthetic biology projects
4. Multi-auxotroph host genome with a novel genetic code, which prevents functional transfer of its DNA to other cells
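Item 2 calls for computational tools to support such monitoring. A minimal sketch of one possible approach, not a tool from the proposal: flag any ordered oligo that shares a k-mer (on either strand) with a watch list of sequences of concern. The watch-list entry used in testing is an arbitrary placeholder, and k = 20 and all function names are our assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_watchlist_index(watchlist: dict[str, str], k: int = 20) -> dict[str, str]:
    """Map every k-mer (both strands) of each listed sequence to its entry name."""
    index: dict[str, str] = {}
    for name, seq in watchlist.items():
        for kmer in kmers(seq, k) | kmers(revcomp(seq), k):
            index[kmer] = name
    return index


def screen_order(oligo: str, index: dict[str, str], k: int = 20) -> set[str]:
    """Names of watch-list entries sharing at least one k-mer with the ordered oligo."""
    return {index[km] for km in kmers(oligo.upper(), k) if km in index}
```

Indexing both strands matters because an order can be the reverse complement of a listed sequence; shorter k catches smaller fragments at the cost of more false hits.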
Why sequence?
• Synthetic biology & laboratory selections
• Pathogen "weather map", biowarfare sensors
• Cancer: mutation sets for individual clones, loss of heterozygosity
• RNA splicing & chromatin modification patterns
• Antibodies or "aptamers" for any protein
• B- & T-cell receptor diversity: temporal profiling, clinical
• Preventative medicine & genotype-phenotype associations
• Cell lineage during development
• Phylogenetic footprinting, biodiversity
Shendure et al. 2004 Nature Rev Gen 5, 335.
Personal genomics & cancer therapy
Mutations G719S, L858R, Del746ELREA in red.
EGFR Mutations in lung cancer: correlation with clinical response to gefitinib [Iressa] therapy. Paez, … Meyerson (Apr 2004) Science 304: 1497
Lynch … Haber, N Engl J Med. (Apr 2004) 350:2129.
Pao … Mardis, Wilson, Varmus H. PNAS (Aug 2004) 101:13306-11.
Dulbecco R. (1986) A turning point in cancer research: sequencing the human genome. Science 231:1055-6.
Why 'single molecule' sequencing?
(1) Single cells: preimplantation genetic diagnosis (PGD), uncultivatable organisms
(2) Co-occurrence on a molecule, complex, or cell: RNA splice forms & DNA haplotypes
(3) Cost: $1K-100K "personal genomes" http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html
(4) Precision: counting 1e9 RNA tags (to reduce variance)
(~5e5 RNAs per human cell)
Tags counted at fixed cost: EST 5e3; SAGE 5e4; MPSS 5e6; Polony-FISSeq (polymerase colony) 5e9 (goal)
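Why counting more tags reduces variance, as a worked example under a simple Poisson sampling assumption (ours): a transcript present at m copies in a cell with ~5e5 RNAs is expected to appear m·N/5e5 times among N sampled tags, with coefficient of variation 1/sqrt(expected count). Function names are ours.

```python
import math

RNAS_PER_CELL = 5e5
TAGS_AT_FIXED_COST = {"EST": 5e3, "SAGE": 5e4, "MPSS": 5e6, "Polony-FISSeq (goal)": 5e9}


def expected_count(copies_per_cell: float, n_tags: float) -> float:
    """Expected tag count for a transcript under uniform random sampling."""
    return copies_per_cell * n_tags / RNAS_PER_CELL


def poisson_cv(expected: float) -> float:
    """Coefficient of variation (std / mean) of a Poisson count."""
    return math.inf if expected == 0 else 1.0 / math.sqrt(expected)


for method, n in TAGS_AT_FIXED_COST.items():
    lam = expected_count(1, n)  # a transcript at one copy per cell
    print(f"{method:22s} expected {lam:g} tags, CV {poisson_cv(lam):.2f}")
```

A one-copy-per-cell transcript is expected only 0.01 times among 5e3 EST tags but about 1e4 times among 5e9 tags, where the Poisson CV falls to roughly 1%.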
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively spliced cell adhesion molecule
• Specific variable exons are up- or down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.
Exon pattern | Eph4 | Eph4bDD | Total | Ratio | P-value
------------7-8-9-10 | 609 | 764 | 1373 | 1.17 | 1E-4
--------------8-9-10 | 320 | 390 | 710 | 1.13 | 3E-2
----------6-7-8-9-10 | 431 | 251 | 682 | -1.85 | 4E-18
------4-5-6-7-8-9-10 | 218 | 216 | 434 | -1.08 | 2E-1
----------------9-10 | 68 | 143 | 211 | 1.96 | 7E-7
--------5-6-7-8-9-10 | 86 | 39 | 125 | -2.37 | 2E-6
----3-4-5-6-7-8-9-10 | 40 | 56 | 96 | 1.30 | 9E-2
------4-5---7-8-9-10 | 16 | 74 | 90 | 4.30 | 2E-9
--2-3-4-5-6-7-8-9-10 | 44 | 28 | 72 | -1.69 | 1E-2
1-2-3-4-5-6-7-8-9-10 | 22 | 5 | 27 | -4.73 | 3E-4
--------5---7-8-9-10 | 5 | 19 | 24 | 3.53 | 3E-3
----3-4-5---7-8-9-10 | 1 | 15 | 16 | 13.95 | 4E-4
--2-3-4-5---7-8-9-10 | 1 | 10 | 11 | 9.30 | 5E-3
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
CD44 RNA isoforms
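The ratio column appears to be a signed, count-normalized fold change: positive when an isoform is relatively more frequent in Eph4bDD, negative (reported as -1/x) when more frequent in Eph4. This is our inference from the numbers, not stated on the slide. The Python sketch below normalizes by column totals computed from only the 13 rows shown, so its values can differ from the table in the second decimal; the function name is ours.

```python
# Counts per isoform, in table row order.
EPH4 = [609, 320, 431, 218, 68, 86, 40, 16, 44, 22, 5, 1, 1]
EPH4BDD = [764, 390, 251, 216, 143, 39, 56, 74, 28, 5, 19, 15, 10]


def signed_ratio(a: int, b: int, total_a: int, total_b: int) -> float:
    """Fold change of frequency b/total_b over a/total_a; values below 1 reported as -1/x."""
    fold = (b / total_b) / (a / total_a)
    return fold if fold >= 1 else -1.0 / fold


ta, tb = sum(EPH4), sum(EPH4BDD)
for a, b in zip(EPH4, EPH4BDD):
    print(f"{a:4d} {b:4d} {signed_ratio(a, b, ta, tb):+.2f}")
```

Normalizing by totals rather than comparing raw counts matters here because the two cell lines contributed different numbers of molecules (1861 vs. 2010 in the rows shown).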
Chromosome-wide haplotyping
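[Figure: chromosome-wide haplotyping on human Chr. 7 (150 Mb) of SNPs IL6-3572 (A) and CD36-4366 (A/T), about 60 Mb apart; haplotypes A..A and A..T; counts shown: 73, 3, 1.]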
Convergence on non-electrophoretic tag-sequencing methods?
Method: EST | SAGE | MPSS | 454 | Polony-Seq | Ronaghi
Tag length (bp): >400 | 14-26 | 20 | 100 | 26 (2 ends)
• Single molecule vs. amplified single molecule
• Array vs. bead packing vs. random
• Rapid scans vs. long scans (chemically limited, 454)
• Number of immobilized primers:
  0: Chetverin '97 "Molecular Colonies"
  1: Mitra '99 > Agencourt "Bead Polonies"
  2: Kawashima '88, Adams '97 > Lynx/Solexa "Clusters"
http://arep.med.harvard.edu/Polonator/Plone.htm
Bead Polony Sequencing Pipeline
In vitro libraries via paired tag manipulation
Bead polonies via emulsion PCR [Dre03]
Monolayered immobilization in acrylamide
Enrichment of amplified beads
SOFTWARE
Images → Tag Sequences
Tag Sequences → Genome
FISSEQ or “wobble” sequencing
Epifluorescence scope with integrated flow cell
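For the "Tag Sequences → Genome" step, a minimal exact-match sketch in Python (an illustration, not the pipeline's actual software): index every k-mer of the reference, then look up each short tag and its reverse complement. Tags must be exactly k bases long, and real tags with sequencing errors would need mismatch-tolerant lookup; the function names and the toy reference are ours.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """0-based + strand start positions of every k-mer in the reference."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return dict(index)


def map_tag(tag: str, index: dict[str, list[int]]) -> list[tuple[int, str]]:
    """Exact hits for a k-long tag; '-' hits give the + strand start of its reverse complement."""
    hits = [(pos, "+") for pos in index.get(tag, [])]
    hits += [(pos, "-") for pos in index.get(revcomp(tag), [])]
    return hits


genome = "ACGTTGCAAGGCTTACCGGATCCATGAC"  # toy reference
idx = build_index(genome, 14)           # 14-15 bp reads per primer on the FISSeq stats slide
print(map_tag(genome[5:19], idx))
```

A hash lookup per tag keeps mapping time independent of genome size once the index is built, which suits millions of short tags against a small reference.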
Polony Fluorescent In Situ Sequencing Libraries
Greg Porreca, Abraham Rosenbaum
[Figure: library construction from 1 to 100 kb genomic fragments; labels M, L, R, M; PCR bead, sequencing primers, selector bead.]
2x20 bp tags after MmeI (BceAI, AcuI)
Dressman et al PNAS 2003 emulsion
Cleavable dNTP-fluorophore (& terminators): reduce or photo-cleave
Mitra, RD, Shendure, J, Olejnik, J, Olejnik, EK, and Church, GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Polony-FISSeq: up to 2 billion beads/slide; Cy5 primer (666 nm); Cy3 dNTP (570 nm)
Jay Shendure: self-organizing monolayer
Polony FISSeq Stats
• # of bases sequenced (total): 23,703,953
• # of bases sequenced (unique): 73
• Avg. fold coverage: 324,711X
• Pixels used per bead (analysis): ~3.6
• Read length per primer: 14-15 bp
• Insertions: 0.5%
• Deletions: 0.7%
• Substitutions (raw): 4e-5
• Throughput: 360,000 bp/min
Current capillary sequencing: 1400 bp/min (600X speed/cost ratio, ~$5K/1X)
(This may omit PCR, homopolymer, and context errors.) Shendure
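The coverage and speed figures can be cross-checked in Python. The human genome size used for the last line (~3e9 bp) is our addition; the 600X figure is a speed/cost ratio, so it also depends on cost and is not reproduced here.

```python
TOTAL_BASES = 23_703_953
UNIQUE_BASES = 73
POLONY_BP_PER_MIN = 360_000
CAPILLARY_BP_PER_MIN = 1_400
HUMAN_GENOME_BP = 3e9  # approximate haploid size (our assumption)

fold_coverage = TOTAL_BASES / UNIQUE_BASES
speed_ratio = POLONY_BP_PER_MIN / CAPILLARY_BP_PER_MIN
days_per_1x_human = HUMAN_GENOME_BP / POLONY_BP_PER_MIN / (60 * 24)

print(f"fold coverage: {int(fold_coverage):,}X")       # 324,711X
print(f"raw speed vs. capillary: {speed_ratio:.0f}X")  # 257X
print(f"1X human genome at 360,000 bp/min: {days_per_1x_human:.1f} days")
```

Raw throughput alone is about 257X capillary speed; at that rate one-fold coverage of a human genome would take just under six days of instrument time.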
High accuracy special case: homopolymers (e.g. AAA, CC, etc.)
• Use "compressed" tags: ACG = ACCG = ACCCG
• Quantitate incorporation
• Reversible terminators
• FRET between adjacent 3' bases
• Wobble sequencing
All five of these work.
• Maintenance of amplification fidelity using linear amplification from the initial genomic fragment
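The "compressed" tag idea maps every homopolymer run to a single base, so that ACG, ACCG and ACCCG all read as ACG and run-length miscounts no longer matter. A small Python illustration of that mapping (function names are ours):

```python
from itertools import groupby


def compress_homopolymers(seq: str) -> str:
    """Collapse each run of identical bases to one base (ACCCG -> ACG)."""
    return "".join(base for base, _run in groupby(seq))


def run_lengths(seq: str) -> list[tuple[str, int]]:
    """Run-length encoding: the run lengths are what homopolymer-prone methods must measure."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


assert compress_homopolymers("ACG") == compress_homopolymers("ACCG") == compress_homopolymers("ACCCG") == "ACG"
print(run_lengths("ACCCG"))  # [('A', 1), ('C', 3), ('G', 1)]
```

The trade-off is lost information: compressed tags cannot distinguish run lengths, so they suit tag matching rather than full resequencing.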
Degenerate (aka “wobble”) sequencing
“single tipped” vs “double tipped”
length of anchoring sequence
natural vs. universal nucleotides (i.e. deoxyinosine)
single fluor vs. four-color fluor mixtures of dNTPs for extensions
Sequenase vs Klenow vs BST
Exonuclease stripping vs heat stripping
CTAGCGAGCTAG NNNNNNNN A
CTAGCGAGCTAG NNNNNNNN G
CTAGCGAGCTAG NNNNNNNN C
CTAGCGAGCTAG NNNNNNNN T
(anchor | degenerate | “tip”)
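The four query primers share a 12-nt anchor and 8 degenerate positions and differ only at the 3' "tip". A Python sketch that generates such a set for the "single tipped" design as drawn. Our reading, offered as an assumption: one tip is tested per cycle, consistent with the 1/4 base/cycle figure, and the number of degenerate bases sets which template position is queried. The function name is ours.

```python
def wobble_primers(anchor: str, n_degenerate: int, tips: str = "AGCT") -> list[str]:
    """One primer per tip base: anchor + N * n_degenerate + tip."""
    return [anchor + "N" * n_degenerate + tip for tip in tips]


for primer in wobble_primers("CTAGCGAGCTAG", 8):
    print(primer)  # e.g. CTAGCGAGCTAGNNNNNNNNA
```

Varying `n_degenerate` would step the queried position along the template without resynthesizing the anchor.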
Wobble vs. simple base extension:
• 1/4 vs. 2.5/4 base/cycle
• >8 vs. 14-200 base reads
• 3e-3 vs. 4e-5 non-homopolymer errors
• 3e-3 vs. 1e-1 homopolymer errors
• 40 min per cycle, 60 hr per 20 cycles
Sequencing single molecules
Ecosystem studies need single-cell amplification because of multiple chromosomes (& RNAs) per cell. Many cells are hard to grow. Microbes exchange genome subsets.
(Even an 80% genome coverage is better than 100 kb BACs)
Many input molecules required to sequence one molecule vs. one molecule sufficient, sequenced via many copies of it.
Single cell sequencing
[Figure: φ29 real-time amplification with no-template control; Affymetrix quantitation of independent amplifications.]