Harvard Med, WashU, MIT
NIH-CEGS, DARPA, PhRMA, DOE-GTL
Sequencing: Helicos, Ambergen, Caliper, BC-Agencourt-GTC,
Synthesis: Nimblegen, Atactic/Invitrogen, CodonDevices
Systems: BGMedicine, Genomatica
MPM Santa Barbara Thu 21-Jul-2005 1:30 PM
Synthesizing & Sequencing Genomes
1974 Code
1984 Genome Project
1994 H.pylori
2004 Synthetic Biology
3 Exponential technologies (synergistic)
Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics. Carlson 2003; Kurzweil 2002; Moore 1965
[Figure: log-scale plot (1E-3 to 1E+13) versus year (1830 to 2010) of three technologies: Computation & Communication (bits/sec~m$), Synthesis (amu/project~M$), Analysis (kamu~base/$). Labeled milestones: urea, E.coli, B12, tRNA, operons, telegraph.]
Safer biology via synthetic biology
• Cell & Eco Systems modeling
• HiFi automated gene replacement
• Inexpensive bio-weather-map custom biosensors (airborne & medical fluids)
• International bio-supply-chain licensing (min research impact, max surveillance)
• Metabolic dependencies prevent survival outside of controlled environments
• Multi-epitope vaccines & biosynthetic drugs
• Cells resistant to most existing viruses via codon changes
see: arep.med.harvard.edu/SBP
difficulty
Constructing new genetic codes(two examples)
1. Codons: 313 UAG stop > UAA stop
2. Delete RF1 (1 free codon, for new aa e.g. PEG-pAcPhe-hGH)

1. Codons: AGY Ser > UCX Ser
2. tRNAs: AGY Ser > AGY Leu
3. Codons: UUR/CUX Leu > AGY
4. tRNAs: UUR Leu > UUR Ser
5. Codons: UCX Ser > UUR Ser
(Leu & Ser now switched & 8 codons free)
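The first step of the first example (UAG stop > UAA stop) can be sketched in a few lines. This is only an illustration, assuming each gene is given as an in-frame DNA coding sequence that ends with its stop codon; the function name and the sample sequences are hypothetical and not from the talk.

```python
def recode_amber_stops(cds_list):
    """Replace a terminal TAG (UAG) stop codon with TAA (UAA) in each CDS."""
    recoded = []
    for cds in cds_list:
        seq = cds.upper()
        if len(seq) % 3 != 0:
            raise ValueError("coding sequence length must be a multiple of 3")
        # Only the final, in-frame codon is touched; out-of-frame "TAG"
        # substrings inside the gene are left alone.
        if seq.endswith("TAG"):
            seq = seq[:-3] + "TAA"
        recoded.append(seq)
    return recoded


# Hypothetical toy genes: two end in TAG, one ends in TGA.
genes = ["ATGAAATAG", "ATGCTAGCATAG", "ATGGGGTGA"]
print(recode_amber_stops(genes))
```

Once no gene uses UAG, the codon is free, which is what makes the RF1 deletion in step 2 conceivable.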
Why Synthetic Genomes & Proteomes?
• Test or engineer cis-DNA/RNA-elements
• Drug biosynthesis e.g. Artemisinin (malaria)
• Epitopes & vaccines (HIV gag)
• Unnatural aa & post-translational modifications
• De novo protein design & selection
• Humanizing imm/tox systems, E.colizing codons
• 20 bit in vivo counters
• Why whole genomes? Changing the genetic code, safety, genome stability, enhanced restriction, recombination
10 Mbp of oligos / $1000 chip
<1K   Oxamer                       Electrolytic acid/base
8K    Atactic/Xeotron/Invitrogen   Photo-Generated Acid (Sheng, Zhou, Gulari, Gao; U.Houston)
12K   Combimatrix                  Electrolytic
24K   Agilent                      Ink-jet standard reagents
380K  Nimblegen                    Photolabile 5' protection (Nuwaysir, Smith, Albert)
Tian et al. Nature. 432:1050
~1000X lower oligo costs
(= 2 E.coli genomes or 20 Mycoplasmas /chip)
Improve DNA Synthesis Cost
Synthesis on chips in pools is 1000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) & bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos then remove tags.
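The "square of concentration" point can be made concrete with a small calculation. This is a sketch assuming a simple second-order rate law (rate proportional to the product of the two strand concentrations), the same reaction volume in both cases, and both partners diluted equally; the function name is hypothetical.

```python
def bimolecular_rate_fold_drop(usual_molecules, chip_molecules):
    """Fold decrease in a second-order annealing rate when both partners
    are diluted by the same factor (rate ~ [A] * [B])."""
    dilution = usual_molecules / chip_molecules
    return dilution ** 2


# Numbers from the slide: 1e12 molecules (usual) vs. 1e6 (chip pool).
print(bimolecular_rate_fold_drop(1e12, 1e6))
```

Under these assumptions a 1e6-fold drop in amount gives a 1e12-fold slower reaction, which is why amplifying the oligos first is attractive.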
10 50 10 => ss-70-mer (chip)
20-mer PCR primers with restriction sites at the 50mer junctions
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
=> ds-90-mer
=> ds-50-mer
Improve DNA Synthesis Accuracy via mismatch selection
Tian & Church
Other mismatch methods: MutS (&H,L)
Improving DNA synthesis accuracy
Method                     Bp/error
Chip assembly only         160
Hybridization-selection    1,400
MutS-gel-shift             10,000
PCR 35 cycles              10,000
MutHLS cleavage            100,000
In vivo replication        1,000,000,000
0 MutS
2 MutS
Tian & Church 2004 Nature
Carr & Jacobson 2004 NAR
Smith & Modrich 1997 PNAS
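To see what the Bp/error figures mean for a finished construct, one can estimate the fraction of molecules with no error at all. This is a rough sketch assuming errors are independent and uniformly distributed along the sequence, which the slide does not state; the 1,000 bp construct length is a hypothetical example.

```python
# Bp/error values copied from the table above.
BP_PER_ERROR = {
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "PCR 35 cycles": 10_000,
    "MutHLS cleavage": 100_000,
    "In vivo replication": 1_000_000_000,
}


def error_free_fraction(bp_per_error, length_bp):
    """Probability that a construct of length_bp has zero errors,
    assuming each base is wrong independently with rate 1/bp_per_error."""
    return (1.0 - 1.0 / bp_per_error) ** length_bp


length_bp = 1_000  # hypothetical construct size
for method, bpe in BP_PER_ERROR.items():
    print(f"{method:25s} {error_free_fraction(bpe, length_bp):.4f}")
```

Under this model most molecules from chip assembly alone carry at least one error at 1 kb, while each selection step raises the error-free fraction substantially.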
CAD-PAM: Computer-Aided Design -Polymerase Assembly Multiplexing
For tandem, inverted and dispersed repeats: Focus on 3' ends, hierarchical assembly, size-selection and scaffolding.
Mullis 1986 CSHSQB, Dillon & Rosen 1990 BioTechniques, Stemmer 1995 Gene, Smith et al. 2003 PNAS, Tian et al. 2004 Nature
50, 75, 125, 225, 425, 825 … 100*2^(n-1)
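The listed sizes are consistent with pairwise joining of equal-length fragments that overlap by 25 bp, so each cycle gives 2L - 25. The 25 bp overlap is inferred from the numbers on the slide, not stated by it; the sketch below just reproduces the series under that assumption.

```python
def pam_lengths(start_bp=50, overlap_bp=25, cycles=5):
    """Product length after each assembly cycle, assuming two fragments of
    equal length are joined through an overlap of overlap_bp."""
    lengths = [start_bp]
    for _ in range(cycles):
        lengths.append(2 * lengths[-1] - overlap_bp)
    return lengths


print(pam_lengths())  # [50, 75, 125, 225, 425, 825]
```

For large n the overlap term becomes negligible and the length roughly doubles per cycle, in line with the 2^(n-1) growth noted above.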
E.coli genome synthesis pipeline
PAM cycle#  0   1   2    3    4    5
#bp         50  75  125  225  425  825
50 HS PAM 425 MutS PAM 10K anneal 100K red 5 Mbp
USER-T4Pol USER-T4Pol USER-5'only 1 pool 480 pools 480 genomic 48 1 of 117K universal primers primer pairs
HS = Hybridization-Selection
USER = Uracil DNA glycosylase & EndoVIII to remove flanking primer pairs
PCR in vitro
Isaacs, Carr, Emig, Gong, Tian, Reppas,
Jacobson, Church
in vivo
>3 days   >1 day   ~1 day   >7 days
5 Mbp Genome assembly alternatives
Automated in vivo homologous recombination:
Serial electroporation: 48 stages: 1 strain (21 hr/stage)
vs. Hierarchical conjugation: 7 stages: 48 > 24 > 12 > 6 > 3 > 2 > 1 strains
vs. Random/simultaneous 1 or more stages
1. cat  2. kan  3. cat
Reppas & Church
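The trade-off between the serial and hierarchical routes can be tallied with a short script. It is a sketch assuming the hierarchical route merges strains pairwise at each step, with an odd strain carried forward (which reproduces the 3 > 2 step); the function names are hypothetical.

```python
import math


def hierarchical_levels(n_strains):
    """Strain counts at each level when strains are merged pairwise."""
    levels = [n_strains]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / 2))
    return levels


def serial_hours(n_stages=48, hours_per_stage=21):
    """Total time for serial electroporation, one stage after another."""
    return n_stages * hours_per_stage


print(hierarchical_levels(48))             # [48, 24, 12, 6, 3, 2, 1]
print(serial_hours(), serial_hours() / 24)  # 1008 hours, 42.0 days
```

The serial route needs 48 consecutive stages in one strain (about 42 days at 21 hr/stage), whereas the hierarchical route passes through only the seven strain counts listed on the slide, at the cost of handling many strains in parallel.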
All 30S-Ribosomal-protein DNAs(codon re-optimized)
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
1.7 kb
0.3 kb
s19 0.3 kb
Nimblegen 95K chip
Atactic <4K chip 14.6 kb
Full codon remapping for enhanced protein expression
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure, lower %GC
Tian et al. Nature. 432:1050
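One ingredient of such a resynthesis, lowering %GC while keeping the protein unchanged, can be sketched as a synonymous-codon swap. This is not the authors' design procedure: it ignores mRNA secondary structure and codon usage entirely, assumes the standard genetic code and an uppercase in-frame DNA sequence, and uses hypothetical function names and a toy sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_count(seq):
    return sum(base in "GC" for base in seq)


def translate(seq):
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def lower_gc(seq):
    """Swap each codon for a synonymous codon with the fewest G/C bases,
    keeping the original codon when it is already among the lowest."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        out.append(min(synonyms, key=lambda c: (gc_count(c), c != codon, c)))
    return "".join(out)


original = "ATGGCCGGGTAA"  # toy ORF: Met-Ala-Gly-stop
recoded = lower_gc(original)
print(recoded, gc_count(original), gc_count(recoded))
```

A real redesign would also score predicted mRNA folding for each candidate, then iterate, as the slide describes.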
20w 20m 17w 17m 16w 16m
10kd
W: wild-type
M: modified
Western blot based on His-tags
Genome engineering: safety via metabolic dependency
trp/tyrA pair of genomes shows the best co-growth
Reppas, Lin & Church unpublished.
First Passage   Second Passage
(ideally ecologically uncommon molecules)
[Figure: doubling time (hr, 0 to 8) versus # of passages (0 to 150) for strains Q1, Q3, Q2-1, Q2-2 and EcNR1.]
Sequence monitoring of evolution (anticipate escape & resistance)
Molecular Genomic Imaging Center (CEGS)
Harvard / Wash U
George Church, Rob Mitra
Greg Porreca, Jay Shendure
Sequencing by Ligation on Polony Beads
with Nick Reppas, Kun Zhang, Shawn Douglas, Mike Wang,
Abraham Rosenbaum, Agencourt
Personal Genomics, Stem Cells, ELSI
Synthetic Biology
[Figure: primers A and B with template strands A’ and B. Panel labels: "Single molecule from a library or population"; "Primer is extended by polymerase".]
Polymerase colony (polony) PCR in a gel (or emulsion-beads)
Primer A has 5’ immobilizing Acrydite (or double biotin to SA-beads)
Mitra & Church Nucleic Acids Res. 27: e34
New Polony Sequencing Pipeline
In vitro paired tag libraries
Bead polonies via emulsion PCR
Monolayer gel immobilization
Enrich amplified beads
SOFTWARE
Images → Tag Sequences
Tag Sequences → Genome
SBE or SBL sequencing
Epifluorescence & Flow Cell
Mitra, Shendure, Porreca, Rosenbaum, Church unpub.
~1 kb genomic fragment
paired genomic tags
(17 to 18 bp each)
common sequences
MmeI
[Figure labels: Fisseq-F, Fisseq-R, T30, Tag 1, Tag 2, Left, Mid, Right, Seq1, Seq2]
In vitro construction of a complex, mate-paired library
43 bp 32 25
Total = 134-136 bp amplicon
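The amplicon total follows from the segment sizes on the slide. The sketch below assumes that 43, 32 and 25 bp are the three common-sequence segments and that the amplicon carries two genomic tags of 17 to 18 bp each; that reading is inferred from the numbers, and the names are hypothetical.

```python
COMMON_BP = (43, 32, 25)   # common-sequence segments, as read from the slide
TAG_BP_RANGE = (17, 18)    # each genomic tag is 17 to 18 bp
N_TAGS = 2


def amplicon_length_range(common=COMMON_BP, tag_range=TAG_BP_RANGE, n_tags=N_TAGS):
    """Shortest and longest amplicon given the tag length range."""
    fixed = sum(common)
    return fixed + n_tags * tag_range[0], fixed + n_tags * tag_range[1]


print(amplicon_length_range())  # (134, 136)
```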
ACUCAUC…(3’)…TAGAGT????????????????TGAGTAG…(5’)
5’-Cy5-nnnnAnnnn-3’
5’-Cy3-nnnnGnnnn-3’
5’-TR-nnnnCnnnn-3’
5’-Cy3+Cy5-nnnnTnnnn-3’
5'PO4
Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
Excitation (nm)   Emission (nm)
647               700
555               605
572               630
555               700
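The four-probe scheme maps each dye combination to one query base: Cy5 to A, Cy3 to G, TR to C, and Cy3+Cy5 together to T. A base call for one bead at one position can be sketched as below; the normalized intensities, the 0.5 threshold and the function name are hypothetical, and a real pipeline would work from registered images rather than a dict.

```python
# Dye combination -> query base, taken from the four 9-mer probes above.
PROBE_BASE = {
    frozenset({"Cy5"}): "A",
    frozenset({"Cy3"}): "G",
    frozenset({"TR"}): "C",
    frozenset({"Cy3", "Cy5"}): "T",
}


def call_base(signals, threshold=0.5):
    """Return the base whose dye combination matches the channels at or
    above threshold, or 'N' when the pattern matches no probe."""
    lit = frozenset(dye for dye, value in signals.items() if value >= threshold)
    return PROBE_BASE.get(lit, "N")


print(call_base({"Cy5": 0.9, "Cy3": 0.1, "TR": 0.0}))  # A
print(call_base({"Cy5": 0.8, "Cy3": 0.7, "TR": 0.1}))  # T
print(call_base({"Cy5": 0.1, "Cy3": 0.1, "TR": 0.1}))  # N
```

Using a two-dye combination for the fourth base is what lets the scheme call four bases from three dyes.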
Consensus Accuracy    False Positives (E.coli)   False Positives (Human)
1E-3                  4,000                      3,000,000
1E-4 BERMUDA/ABI      400                        300,000
1E-6 Polony-SBL       4                          3,000

Goal of Resequencing: Discovery of Uncommon Variation
E.g. cancer mutations <1E-6
Why low error rates?
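The false-positive counts in the table are simply error rate times genome size. The sketch below assumes the genome sizes implied by the table itself (about 4e6 bp for E.coli and 3e9 bp for human); the names are hypothetical.

```python
# Genome sizes implied by the table (e.g. 4,000 false positives at 1E-3).
GENOME_BP = {"E.coli": 4e6, "Human": 3e9}


def expected_false_positives(consensus_error_rate, genome_bp):
    """Expected number of wrong base calls across one genome."""
    return consensus_error_rate * genome_bp


for rate in (1e-3, 1e-4, 1e-6):
    counts = {name: expected_false_positives(rate, bp) for name, bp in GENOME_BP.items()}
    print(rate, counts)
```

If true variants occur at a frequency below the consensus error rate, false positives outnumber them, which is the case for low error rates when the goal is uncommon variation.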
                   ABI     2004       Jun 2005    2006    >2007
# bp/expt          -       2e7        3e7         3e8     60e9
Complexity (bp)    -       74         4e6         3e9     6e9
Avg Fold Cov       8       3e5        6           0.1     10
Pix per bp         -       300        1724        333     1
Read-length        900     14 (SBE)   25 (pair)   35      42
$ / kb (e<1e-5)    2.4     -          .24         .12     2e-5
$ / 1X 3e9 b       2e6     -          2e5         5e4     200
Indel Error        5e-3    0.6%       1e-3        1e-3    1e-3
Subst Error        4e-3    4e-6       1e-3        1e-3    1e-3
3X Cons Err        1e-4    -          1e-6        3e-7    1e-7
Kb / min           0.8     360        27          1e3     1e6
Pix / sec          -       2e5        2e6         6e6     2e7
Enz $/mg           -       8          8           8       0.4
Sequencing cost (10 to 100,000 fold improvements)
Position    Type         Gene     Location      ABI Confirmation / Comments
986,334     T > G        ompF     TATA box      Only in evolved strain
931,960     8 bp del     lrp      frameshift    Only in evolved strain
1,976,500   776 bp del   insB_5   IS element    MG1655 heterogeneity
3,957,960   C > T        ppiC     5' UTR        MG1655 heterogeneity
4,654,533   T > C        cI       Glu > Glu     heterogeneity
4,647,960   T > C        ORF61    Lys > Gly     heterogeneity
985,797     T > G        ompF     Glu > Ala     (in progress)
454,864     T > C        tig      Gly > Gly     (in progress)
4,648,691   G > A        exo      Phe > Phe     (in progress)
Mutation Discovery in Engineered/Evolved E.coli
[Figure: cell line GM10835; SNPs rs3778973, rs1557917, rs39284, rs10500042, rs4717028; allele labels C G C G C T A T A T.]
TT=137 CT=2 (TC=1) CC=131
153Mb
Single human chromosome molecules(polony sequencing)
Genome Analysis Policy
• Insurance/employment: What probability & level of advantage can be hidden/examined?
• Individual/group stigma
• Choice, stem cells, cloning
• Privacy & transparency
NHGRI/DOE ELSI, Genetic Screening Study Group
Anonymity, privacy, disclosure, identity
"Open-source" meets Personal Genome-Phenome Project
• Are information-rich resources (e.g. facial imaging & genome sequence) really anonymous?
• What are the risks and benefits of "open-source"?
• What level of training is needed to give informed consent on open-ended studies?
• Harvard Medical School IRB Human Subjects protocol submitted 16-Sep-2004.