
BioC 5361 Microbial Genomics and Bioinformatics Lecture 8, September 29, 2011

DNA Synthesis, Amplification (PCR), Sequencing

DNA synthesis is underrated. It is essential to genomics and systems biology!
1. DNA sequencing requires oligonucleotides for primers and PCR
2. Transcriptome analysis by microarrays uses DNA synthesis
3. A gene of any natural or non-natural sequence can be synthesized
4. Technology is moving toward genome synthesis, Science July 2, 2010, p. 52.

DNA SYNTHESIS – Biological (cellular) and chemical DNA synthesis are very different!

Biology                          Chemistry
DNA template required            No template; solid support anchors chain
Nucleotide triphosphates added   Nucleotide phosphoramidites added
Pyrophosphate leaving group      Amine (diisopropylamine) leaving group
Enzymatic: one step              Multiple steps per base added
Enzyme proofreading              HPLC purification

Chemistry - The key to chemical synthesis is attachment to a solid support
Attachment is necessary to add each base, then wash away reagents for high yields at each step
Automated synthesis, then HPLC purification
[Figure: reaction cycles in the synthesis instrument]

[Figure: contents of 4 of the instrument's reagent bottles = chemical equivalent of nucleotide triphosphate]

DNA Synthesis Summary
1. Oligonucleotide synthesis is automated and cheap
2. Reaction yield is greater than 99% at each step


3. Advantages over biosynthesis:
   1. Unusual or modified nucleotides can be added
   2. Can add fluorescent tags or binding reagents, e.g. biotin
   3. Novel genes can be synthesized
4. Oligonucleotide synthesis is excellent for making 20-mers, 30-mers, 40-mers
5. Make gene-sized DNA by synthetic chemistry coupled with enzyme chemistry in vitro

For an excellent recent review, see: Czar, et al. 2009. Gene synthesis demystified. Trends in Biotech 27:63.

Synthesizing large DNA polymers (e.g. an artificial genome) uses chemical and enzymatic procedures
(a) Chemistry – Automated solid-phase synthesis of DNA oligonucleotides
(b) Biology - Assembly of synthesized chains by duplex formation and enzymatic synthesis, and DNA ligation

Figure from Cantor & Smith, Genomics
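The synthesis summary gives a per-step yield above 99% and says chemical synthesis is best suited to 20- to 40-mers. A rough sketch of why length matters: if every coupling succeeds with the same probability, the full-length fraction falls off geometrically with length. The function name, the fixed 0.99 default, and the assumption that the first base is already attached to the support (so an n-mer needs n - 1 couplings) are illustrative choices, not figures from the lecture.

```python
def full_length_fraction(length_nt: int, coupling_yield: float = 0.99) -> float:
    """Fraction of chains that reach full length.

    Assumes (for illustration) that the first base is pre-attached to the
    solid support, so an oligo of length_nt bases needs length_nt - 1
    coupling steps, each succeeding with probability coupling_yield.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return coupling_yield ** (length_nt - 1)


if __name__ == "__main__":
    # Short oligos come out mostly full length; gene-sized chains do not.
    for n in (20, 30, 40, 100, 1000):
        print(f"{n:>5}-mer: {full_length_fraction(n):.1%} full length")
```

Under these assumptions a 1000-base chain is almost never full length, which is consistent with building gene-sized DNA from short oligos joined enzymatically rather than by chemistry alone.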

Assembly of synthetic genome, Science, July 2, 2010, p. 52

Synthetic genome built in series of steps - largely order of magnitude jumps Oligos (10-100 bp) -> Gene-size (1000 bp) -> 10,000 bp -> 100,000 bp -> final genome

1. Started with commercially synthesized 1080 bp units (done as above, Cantor & Smith, Genomics)
2. Recombined overlapping 1080 bp units in yeast to make 109 x 10,080 bp units, which were transferred to E. coli
3. Recombined 10,080 bp units in yeast to make 100,000 bp units; these were unstable in E. coli, so they were extracted and purified on a gel
4. Sequenced; some had small deletions; correct ones were used for further assembly; any yeast DNA was removed
5. DNA fragments put into yeast spheroplasts to recombine
6. Clones checked by multiplex PCR that went across all junctions; one clone had all junctions
7. Synthesized Mycoplasma mycoides chromosome isolated in agarose plugs
8. Transplanted genome into M. capricolum that was prepared to be the “recipient”

Moving a bacterial genome into yeast, engineering it, and installing it back into a bacterium by genome transplantation. A yeast vector is inserted into a bacterial genome by transformation. That genome is cloned into yeast. After cloning, the repertoire of yeast genetic methods is used to create insertions, deletions, rearrangements, or any combination of modifications in the bacterial genome. This engineered genome is then isolated and transplanted into a recipient cell to generate an engineered bacterium. Before transplantation it may be necessary to methylate the donor DNA in order to protect it from the recipient cell’s restriction system(s). This cycle can be repeated starting from the newly engineered genome – From Lartigue, et al, Science, Sept. 2009, p. 1693.


History of DNA and sequencing

What new scientific questions can be addressed with rapid and cheap sequencing?

1. Sequencing all key cultured organisms

2. Sequencing all available organisms in important medical or biotech-related genera

3. Conducting large-scale environmental metagenome projects

4. Single-cell sequencing of major uncultivated microbes from many environments

5. Looking at differences in closely related strains

6. Evolving strains in the laboratory and sequencing to understand evolutionary adaptations

7. Individual researchers can quickly search for enzymes/functions of interest via sequence/annotation

8. Linking human microbiome populations to health and vitality

NIH - microbial inhabitants in and on human body important to health

Human cells in our bodies = 10^13

Bacterial cells in or on our bodies = 10^14

Goals of the human microbiome project:

1. Determining if humans share common “core” microbiome

2. Correlating changes in microbiome to health states

3. Developing new tools to support goals 1 & 2

4. Addressing ethical, legal, social issues raised by research

For NIH microbiome project listings, see: http://nihroadmap.nih.gov/hmp/fundedresearch.asp

Sequencing methods – 1st generation
A. Source of DNA
B. Preparation of DNA
C. Automated sequencing
D. Computational assembly
E. “Finishing”
F. Analysis of data

Genomics schedule, From: Ussery, et al, Computing for Comparative Microbial Genomics


A. Source of DNA
Traditionally, a pure culture of a bacterium or tissue of a eukaryote
Must be careful of contaminating DNA in the laboratory

B. Preparation of DNA

Purification via isopropanol or ethanol precipitation
CsCl gradient ultracentrifugation

C. Methods

1. Maxam and Gilbert -- First reasonably efficient method
Basis: Selective chemical cleavage of DNA

Example: Maxam-Gilbert ladder and its interpretation *ACTTGACGGCA….

2. Sanger dideoxy chain termination method
Basis: Synthesize nested-length fragments in different reaction mixtures, each with one dideoxynucleotide triphosphate

Both Maxam & Gilbert and Sanger dideoxy methods require separation of nested-size DNA fragments.

Electrophoresis (left) – can separate DNA fragments differing by one base; can be monitored visually or by a high-throughput automated process

Goals:
1. Maximize throughput
2. Minimize sequencing errors
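To make the "nested-length fragments" idea concrete, here is a toy simulation of four dideoxy reactions followed by reading the ladder from shortest to longest fragment. It is a sketch under simplifying assumptions: the function names, the 10% chance of incorporating a dideoxy analog, the 2000 molecules per reaction, and perfect single-base size resolution are all illustrative, not parameters from the lecture.

```python
import random

BASES = "ACGT"


def run_reaction(strand, dd_base, rng, n_molecules=2000, dd_fraction=0.1):
    """Simulate one reaction mixture containing a single ddNTP.

    `strand` is the newly synthesized strand (5'->3'), i.e. the complement
    of the template. Returns the set of fragment lengths that ended in a
    dideoxy terminator. dd_fraction is an assumed incorporation chance.
    """
    lengths = set()
    for _ in range(n_molecules):
        for position, base in enumerate(strand, start=1):
            # A dideoxy analog can only take the place of its matching base.
            if base == dd_base and rng.random() < dd_fraction:
                lengths.add(position)  # chain terminates at this length
                break
    return lengths


def read_ladder(lanes):
    """Read the sequence from shortest to longest fragment across four lanes."""
    longest = max(max(lengths) for lengths in lanes.values() if lengths)
    called = []
    for length in range(1, longest + 1):
        hits = [base for base, lengths in lanes.items() if length in lengths]
        # Exactly one lane should contain a band of each length.
        called.append(hits[0] if len(hits) == 1 else "N")
    return "".join(called)


def sanger_sequence(strand, seed=0):
    """Run the four reactions and return the sequence read from the ladder."""
    rng = random.Random(seed)
    lanes = {base: run_reaction(strand, base, rng) for base in BASES}
    return read_ladder(lanes)


if __name__ == "__main__":
    print(sanger_sequence("ACTTGACGGCA"))
```

Each lane only reports where its own base occurs; the sequence emerges from ordering bands by size across lanes, which is why single-base resolution in electrophoresis is the limiting requirement.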

Speed of DNA sequencing is constrained by electrophoresis electrical field strength: a stronger field gives higher speed but also more heat.

Heat inhomogeneities can decrease resolution

Detection of DNA Fragments.

In DNA sequencing over the last 20 years, radioisotopes have been replaced by fluorescent labels. Fluorescence has many advantages:

1. Stable
2. No radioactive waste
3. Can use different color dyes for each base
4. Amenable to high-throughput, automated analysis.

Breakthrough: Use a common fluorescent exciter molecule to transfer energy to each one of the four primary fluorescent dyes.

[Figure: structure of a dideoxynucleotide triphosphate (base = A, T, G or C). Annotation: absence of a nucleophilic 3' hydroxyl group to react with the α-phosphate of the next NTP]

A. Synthesis is enzymatic
B. Sequenase - modified polymerase
C. Method most amenable to high-throughput
D. Can be used with four-color fluorescence

[Figure: energy-transfer dye primer. A CYA dye attached to the primer sequence is excited by light (hν) at 480 nm and transfers energy to the different dyes attached to different nucleotides/lanes for ultimate detection. Energy transfer ~ 100%]


PYROSEQUENCING (Basis of 454 technology)

Step 1
A sequencing primer is hybridized to a single-stranded, PCR-amplified DNA template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and the substrates adenosine 5´ phosphosulfate (APS) and luciferin.

Step 2
The first of four deoxynucleotide triphosphates (dNTP) is added to the reaction. DNA polymerase catalyzes the incorporation of the deoxynucleotide triphosphate into the DNA strand, if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide.

(DNA)n + dNTP --[polymerase]--> (DNA)n+1 + PPi

Step 3
ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5´ phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin that generates visible light in amounts that are proportional to the amount of ATP. The light produced in the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and seen as a peak in a pyrogram™. Each light signal is proportional to the number of nucleotides incorporated.

APS + PPi --[ATP sulfurylase]--> ATP
ATP + luciferin --[luciferase]--> oxyluciferin + light

[Figure: sample chromatogram; light emission measured]

Step 4
Apyrase, a nucleotide-degrading enzyme, continuously degrades unincorporated dNTPs and excess ATP. When degradation is complete, another dNTP is added.

dNTP --[apyrase]--> dNDP + dNMP + phosphate
ATP --[apyrase]--> ADP + AMP + phosphate

Step 5
Addition of dNTPs is performed one at a time. It should be noted that deoxyadenosine α-thio triphosphate (dATPαS) is used as a substitute for the natural deoxyadenosine triphosphate (dATP) since it is efficiently used by the DNA polymerase, but not recognized by the luciferase.

As the process continues, the complementary DNA strand is built up and the nucleotide sequence is determined from the signal peaks in the pyrogram.
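The cycle described above can be sketched as a small simulation: nucleotides are dispensed one at a time, and each dispensation yields a light signal proportional to the number of bases incorporated (zero if the nucleotide does not match). This is an idealized sketch; the function names, the "TACG" dispensation order, and noise-free integer peak heights are assumptions for illustration, not details from the lecture or the instrument.

```python
from itertools import cycle


def pyrogram(strand, order="TACG"):
    """Return a list of (nucleotide, signal) pairs for an ideal pyrogram.

    `strand` is the strand being synthesized (5'->3'). The signal for each
    dispensation is the number of consecutive bases incorporated, so a
    homopolymer run such as TTT gives one peak of height 3.
    """
    peaks = []
    position = 0
    flows = cycle(order)
    # Every full round of `order` adds at least one base, so this bound
    # is enough for any strand made only of the bases in `order`.
    for _ in range(len(order) * len(strand)):
        if position == len(strand):
            break
        nucleotide = next(flows)
        run = 0
        while position < len(strand) and strand[position] == nucleotide:
            run += 1
            position += 1
        peaks.append((nucleotide, run))  # run == 0 means no light
    return peaks


def decode(peaks):
    """Rebuild the synthesized sequence from peak heights."""
    return "".join(nucleotide * signal for nucleotide, signal in peaks)


if __name__ == "__main__":
    for nucleotide, signal in pyrogram("GGATTTC"):
        print(nucleotide, "#" * signal)
    print(decode(pyrogram("GGATTTC")))
```

In this idealized model homopolymer length is read directly from peak height; with real light signals, distinguishing a run of 6 from a run of 7 is harder than distinguishing 1 from 2.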


454 sequencer - Roche

Sample Input and Fragmentation. DNA is fractionated into small, 300- to 800-basepair fragments.

Library Preparation. Using a series of standard molecular biology techniques, short adaptors (A and B), specific for the 3' and 5' ends, are added to each fragment.

One Fragment = One Bead. The single-stranded DNA library is immobilized onto DNA Capture Beads. Each bead carries a unique single-stranded DNA fragment. The bead-bound library is emulsified in a water-in-oil mixture.

emPCR (Emulsion PCR) Amplification. Each unique library fragment is amplified within its own oil droplet microreactor, resulting in several million copies per bead.

One Bead = One Read. Bead-bound fragments are loaded onto a PicoTiterPlate with wells sized to allow only one bead per well. With addition of polymerase and nucleotide(s), incorporation of a nucleotide complementary to the template strand gives chemiluminescence that is read by a CCD camera.

Data Analysis. The combination of signal intensity and positional information generated across the PicoTiterPlate device allows software to determine the sequence of 1,000,000 individual reads per 10-hour run; de novo assembly of up to 400 Mb.

Helicos sequencer – hybridization-based method
Nice company video at: http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tSMStradeHowItWorks/tabid/162/Default.aspx

• Helicos technology can give up to 1 billion base pairs per day
  – Average read length approximately 30 bp
• Dr. Stephen Quake, a professor of bioengineering at Stanford University and a co-founder of Helicos BioSciences, sequenced his own genome using the Helicos instrument for under $50,000 in reagents


Direct DNA sequencing during synthesis, Pacific Biosciences
http://www.pacificbiosciences.com/assets/files/pacbio_technology_backgrounder.pdf

Also see publication: Eid et al. 2009. Science 323 (5910): 133.

Technical innovations required:
• Small volumes to generate single molecules reacting
• Fluorescence read as pyrophosphate leaves

Overall steps in DNA polymerase-dependent direct DNA sequencing


DNA Sequencing Strategies in Genomics
“Shotgun” - Random fragmentation, sequencing, and assembly

(a) gap, or unsequenced region
(b) overlap, assemble into larger “contig” (contiguous piece)
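As a minimal illustration of overlaps and gaps, the sketch below greedily merges reads whose ends overlap and leaves unconnected reads as separate contigs. It is a toy under stated assumptions: error-free reads, all from the same strand, a hypothetical minimum overlap of 3 bases, and invented function names. It is not the method used by any particular assembler.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the remaining contigs; more than one contig means at least
    one gap (unsequenced region) separates them.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ACTTGA", "TGACGG", "CGGCA", "TTAAGGCC"]
    for contig in greedy_assemble(reads):
        print(contig)
```

In the example, three reads chain into one contig while the fourth shares no sufficient overlap with it, so a gap remains between the two contigs.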

Poisson Calculations for Estimating “Coverage” in Shotgun Sequencing

A simple formula can be used to estimate what percent of a clone will be sequenced for a certain level of random sequencing (Lander & Waterman 1988). The table below shows several examples. For example, if you obtain 100 kb of random sequence from a 100 kb clone (1 x coverage) then you would expect to have 63% of that clone sequenced. Some regions of the clone will have been sequenced several times and other regions will not have been sequenced at all.

These numbers assume random events. Actual results may deviate from these, perhaps the most significant reason being non-random shotgun cloning.

Lander ES, Waterman MS (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231-239.

Fold coverage    Percent of clone sequenced
0.25 x           22%
0.50 x           39%
0.75 x           53%
1 x              63%
2 x              86%
3 x              95%
4 x              98%
5 x              99.3%
6 x              99.75%
7 x              99.91%
8 x              99.97%
9 x              99.99%
10 x             99.995%
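Under the Poisson model, the chance that a given base is hit by zero reads at fold coverage c is e^(-c), so the expected fraction sequenced is 1 - e^(-c). The short sketch below evaluates that expression and its inverse; the function names are illustrative, and the model assumes perfectly random reads, as the notes caution.

```python
import math


def fraction_sequenced(fold_coverage: float) -> float:
    """Expected fraction of the clone covered by at least one read."""
    return 1.0 - math.exp(-fold_coverage)


def coverage_needed(target_fraction: float) -> float:
    """Fold coverage needed to expect a given fraction sequenced (0 <= f < 1)."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)


if __name__ == "__main__":
    for c in (0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
        print(f"{c:>5} x  {fraction_sequenced(c):.3%}")
    print(f"99% sequenced needs about {coverage_needed(0.99):.1f} x coverage")
```

The inverse makes the diminishing returns explicit: each additional unit of fold coverage removes only a constant fraction (about 63%) of the remaining unsequenced bases.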


Dealing with genome-sized DNA fragments

First issue – DNA replicons are fragile under the conditions that break bacterial cells open

[Figure: E. coli cell enzymatically lysed, showing chromosomal DNA exploding out.]

How could one get this DNA out intact? And get plasmids separately?

SOLUTION: Grow cells in agar plugs and break them open enzymatically.

Second issue – Must be able to isolate large DNA fragments, replicons of millions of basepairs

SOLUTION: Pulsed-field gel electrophoresis – oscillating direction of electric field

Directions of electric field pulses known to be effective are shown below:

Schematic diagrams of published pulsed field gel systems. Nomenclature: PFGE-pulsed field gradient gel electrophoresis, OFAGE-orthogonal field alternation gel electrophoresis, TAFE- transverse alternating field electrophoresis, FIGE- field inversion gel electrophoresis, CHEF- contour clamped homogeneous electric field, crossed field gel electrophoresis (Waltzer), and ST/RIDE- simultaneous tangential/rectangular inversion decussate electrophoresis. (Figure 2.1, pg. 8, Pulsed Field Gel Electrophoresis: A Practical Guide).

[Figure: pulsed-field gel run with bacteria and yeast. Lanes: Vibrio sp. that have 2 chromosomes; E. coli; a yeast. Sizes (Mb): 5.7, 4.6, 3.5, 1.8]