The past, present, and future of DNA sequencing Dan Russell.

80
The past, present, and future of DNA sequencing Dan Russell

Transcript of The past, present, and future of DNA sequencing Dan Russell.

Page 1: The past, present, and future of DNA sequencing Dan Russell.

The past, present, and future of DNA sequencingDan Russell

Page 2: The past, present, and future of DNA sequencing Dan Russell.

Overview• Prologue: Assembly and Finishing• The Past: Sanger• The Present: Next-Gen (454, Illumina, …)• The Future: ? (Nanopore, MinION, Single-molecule)

Page 3: The past, present, and future of DNA sequencing Dan Russell.

Overview• Prologue: Assembly and Finishing• The Past: Sanger• The Present: Next-Gen (454, Illumina, …)• The Future: ? (Nanopore, MinION, Single-molecule)

Page 4: The past, present, and future of DNA sequencing Dan Russell.

Method Read Length

Sanger 600-1000 bp

454 300-500 bp

Illumina ~100 bp

Ion Torrent ~200 bp

But…• Phage Genome: 30,000 to 500,000 bp• Bacteria: Several million bp• Human: 3 billion bp

Page 5: The past, present, and future of DNA sequencing Dan Russell.

Shotgun Genome Sequencing

Complete genome copiesFragmented genome chunks

Page 6: The past, present, and future of DNA sequencing Dan Russell.

Shotgun Genome Sequencing

Fragmented genome chunks

NOT REALLY DONE BY DUCK HUNTERSHydroshearing, sonication, enzymatic

shearing

Page 7: The past, present, and future of DNA sequencing Dan Russell.

All the King’s horses and all the King’s men…

ATTGTTCCCACAGACCGCGGCGAAGCATTGTTCC ACCGTGTTTTCCGACCG

TTTCCGACCGAAATGGCTTGTTCCCACAGACCGTGAGCTCGATGCCGGCGAAGATGCCGGCGAAGCATTGT

TAATGCGACCTCGATGCCACAGACCGTGTTTCCCGA

AAGCATTGTTCCCACAG TGTTTTCCGACCGAAATCCGACCGAAATGGCTCCTGCCGGCGAAGCCTTGT

Assembly, aka

Page 8: The past, present, and future of DNA sequencing Dan Russell.

Dan’s recommended assemblers

Your Sequencing Technology Recommended Assembler

Sanger phredPhrap

Ion Torrent/454 Newbler

Illumina velvet

REGARDLESS OF ASSEMBLY PROGRAM, I’D RECOMMEND USING CONSED FOR FINISHING!

Page 9: The past, present, and future of DNA sequencing Dan Russell.

Some words have special meanings in scientific context

THEORY

FINISH

Special use of the word “finish”

Before annotation, phage genomes should be sequenced AND finished.

Page 10: The past, present, and future of DNA sequencing Dan Russell.

What is finishing?

Page 11: The past, present, and future of DNA sequencing Dan Russell.

When we put all the reads back together this time:

GAP!

But now we at least know the sequence on each side, so we can design primers to run a sequencing reaction towards the gap, and hopefully connect our contigs.

What is finishing?

Page 12: The past, present, and future of DNA sequencing Dan Russell.

What is finishing?

Page 13: The past, present, and future of DNA sequencing Dan Russell.

What is finishing?

A combination of computer and wet-bench work to ensure that the entire genome sequence is present and that all bases are high quality.

Page 14: The past, present, and future of DNA sequencing Dan Russell.

1. Shotgun sequencing to generate reads

2. Assembly of reads

3. Identification of weak areas

4. Targeted sequencing runs to fix

5. Verification of finished sequence

6. Generation of final fasta file

Done for all phages sequenced at Pitt

Done by most independent seq facilities

NOT DONE by most seq facilities

From DNA to Annotatable Sequence

Page 15: The past, present, and future of DNA sequencing Dan Russell.

1. Shotgun sequencing to generate reads

2. Assembly of reads

3. Identification of weak areas

4. Targeted sequencing runs to fix

5. Verification of finished sequence

6. Generation of final fasta file

Done for all phages sequenced at Pitt

Done by most independent seq facilities

NOT DONE by most seq facilities = “FINISHING”

From DNA to Annotatable Sequence

Page 16: The past, present, and future of DNA sequencing Dan Russell.

Overview• Prologue: Assembly and Finishing• The Past: Sanger• The Present: Next-Gen (454, Illumina, …)• The Future: ? (Nanopore, MinION, Single-molecule)

Page 17: The past, present, and future of DNA sequencing Dan Russell.

Fragments were cloned:

Page 18: The past, present, and future of DNA sequencing Dan Russell.
Page 19: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing ReactionsFor given template DNA, it’s like PCR except:

Uses only a single primer and polymerase to make new ssDNA pieces.

Includes regular nucleotides (A, C, G, T) for extension, but also includes dideoxy nucleotides.

A

A

AA

A A

AG

A

T

CC

C

C

CC

CT

TT

T

TG

G

G

G

G

G

Regular Nucleotides

Dideoxy Nucleotides

AA

AA AT

CC

CT

TT

TG

G

G

G

G

1. Labeled2. Terminators

Page 20: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer G T C T T G G G C T

Page 21: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

G T C T T G G G C T A G C G CA C G C G C C G G G T C A G A A C C C G A T C G C G

5’3’

5’T G C G C G G C C C A

Primer

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

Page 22: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

G T C T T G G G C T A G C G C5’

T G C G C G G C C C A

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

26 bp

5’T G C G C G G C C C A

Primer G T C T T G G G C T A

Page 23: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

G T C T T G G G C T A G C G C5’

T G C G C G G C C C A

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

26 bp

5’T G C G C G G C C C A G T C T T G G G C T A 22 bp

5’T G C G C G G C C C A

Primer G

Page 24: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

G T C T T G G G C T A G C G C5’

T G C G C G G C C C A

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

26 bp

5’T G C G C G G C C C A G T C T T G G G C T A 22 bp

5’T G C G C G G C C C A G 12 bp

5’T G C G C G G C C C A

Primer G T C T T G G G C

Page 25: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

G T C T T G G G C T A G C G C5’

T G C G C G G C C C A

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

26 bp

5’T G C G C G G C C C A G T C T T G G G C T A 22 bp

5’T G C G C G G C C C A G 12 bp

5’T G C G C G G C C C A G T C T T G G G C 20 bp

5’T G C G C G G C C C A

Primer G T C T T

Page 26: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

G T C T T G G G C T A G C G C5’

T G C G C G G C C C A

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

26 bp

5’T G C G C G G C C C A G T C T T G G G C T A 22 bp

5’T G C G C G G C C C A G 12 bp

5’T G C G C G G C C C A G T C T T G G G C 20 bp

5’T G C G C G G C C C A G T C T T 16 bp

Page 27: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing

A C G C G C C G G G T ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?5’3’

G T C T T G G G C T A G C G C5’

T G C G C G G C C C A

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp

26 bp

5’T G C G C G G C C C A G T C T T G G G C T A 22 bp

5’T G C G C G G C C C A G 12 bp

5’T G C G C G G C C C A G T C T T G G G C 20 bp

5’T G C G C G G C C C A G T C T T 16 bp

Page 28: The past, present, and future of DNA sequencing Dan Russell.

5’T G C G C G G C C C A G T C T T G G G 19 bp

5’T G C G C G G C C C A G T C T T G G G C T A 22 bp

Sanger Sequencing

G T C T T G G G C T5’

T G C G C G G C C C A 21 bp5’

T G C G C G G C C C A G T C T T G G G C 20 bp5’

T G C G C G G C C C A G 12 bp5’

T G C G C G G C C C A G T 13 bp5’

T G C G C G G C C C A G T C T T 16 bp5’

T G C G C G G C C C A G T C 14 bp5’

T G C G C G G C C C A G T C T 15 bp5’

T G C G C G G C C C A G T C T T G 17 bp5’

T G C G C G G C C C A G T C T T G G 18 bpLase

r R

eader

Page 29: The past, present, and future of DNA sequencing Dan Russell.

Sanger Sequencing OutputEach sequencing reaction gives us a chromatogram, usually ~600-1000 bp:

Page 30: The past, present, and future of DNA sequencing Dan Russell.

Sanger Throughput Limitations• Must have 1 colony picked for every 2

reactions• Must have 1 PCR tube for each reaction• Must have 1 capillary for each reaction

from The Economist

Improvements in costfrom making Sangerhigher throughput

Improvements in costfrom Next-Gensequencing technologies

Page 31: The past, present, and future of DNA sequencing Dan Russell.

Overview• Prologue: Assembly and Finishing• The Past: Sanger• The Present: Next-Gen (454, Illumina, …)• The Future: ? (Nanopore, MinION, Single-molecule)

Page 32: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by Ion Torrent Personal Genome Machine and 454

Page 33: The past, present, and future of DNA sequencing Dan Russell.

GenomicFragment

Adapters

Shotgun sequencing by PGM/454

Page 34: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

GenomicFragment

Barcode

Page 35: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 36: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Bead/ISP

AdapterComplementSequences

The idea is that each bead should be amplified all over with a SINGLE library fragment.

Page 37: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 38: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 39: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 40: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 41: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 42: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 43: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 44: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 45: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 46: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 47: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

Page 48: The past, present, and future of DNA sequencing Dan Russell.

Shotgun sequencing by PGM/454

~3.5 µm for Ion Torrent, ~30 µm for 454

Page 49: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

TTT

T T

Shotgun sequencing by PGM/454

Page 50: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

CCC

C C

Shotgun sequencing by PGM/454

Page 51: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

AAA

A A

Shotgun sequencing by PGM/454

Page 52: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

GGG

G GG

Shotgun sequencing by PGM/454

Page 53: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G

TTT

T TT

Shotgun sequencing by PGM/454

Page 54: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T

CCC

C CC

Shotgun sequencing by PGM/454

Page 55: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C

AAA

A A

Shotgun sequencing by PGM/454

Page 56: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C

GGG

G G

Shotgun sequencing by PGM/454

Page 57: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C

TTT

T TT T

Shotgun sequencing by PGM/454

Page 58: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C T T

CCC

C C

Shotgun sequencing by PGM/454

Page 59: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C T T

AAA

A A

Shotgun sequencing by PGM/454

Page 60: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C T T

GGG

G GG G G

The real power of this method is that it can take place in millions of tiny wells in a single plate at once.

Shotgun sequencing by PGM/454

Page 61: The past, present, and future of DNA sequencing Dan Russell.

A C G C G C C G G G T C A G A A C C C G A T C G C G 5’3’

5’T G C G C G G C C C A

Primer

Only give polymerase one nucleotide at a time:

If that nucleotide is incorporated, enzymes turn by-products into light:

T C A G T C A G T C A G

1

2

3

4

5

G T C T T

GGG

G GG G G

The real power of this method is that it can take place in millions of tiny wells in a single plate at once.

Raw 454 data

Page 62: The past, present, and future of DNA sequencing Dan Russell.

Ion Torrent Sequencing

Page 63: The past, present, and future of DNA sequencing Dan Russell.

Illumina Sequencing

Page 64: The past, present, and future of DNA sequencing Dan Russell.

Next-Gen SequencingTake home message: Massively Parallel

1,000 monkeys at 1,000 typewriters is nothing

We’re talking 100,000 to 100 million concurrent reads

Page 65: The past, present, and future of DNA sequencing Dan Russell.

Overview• Prologue: Assembly and Finishing• The Past: Sanger• The Present: Next-Gen (454, Illumina, …)• The Future: ? (Nanopore, MinION, Single-molecule)

Page 66: The past, present, and future of DNA sequencing Dan Russell.

Largely because of PHIRE and SEA-PHAGES…

Page 67: The past, present, and future of DNA sequencing Dan Russell.

DNA Sequencing over Time

from The Economist

Page 68: The past, present, and future of DNA sequencing Dan Russell.
Page 69: The past, present, and future of DNA sequencing Dan Russell.

Single Molecule Sequencing

Page 70: The past, present, and future of DNA sequencing Dan Russell.
Page 71: The past, present, and future of DNA sequencing Dan Russell.

“The MinION has been used to successfully read the genome of a lambda bacteriophage, which has 48,500-ish base pairs, twice during one pass. That's impressive, because reading 100,000 base pairs during a single DNA capture has never been managed before using traditional sequencing techniques.

The operational life of the MinION is only about six hours, but during that time it can read more than 150 million base pairs. That's somewhat short of the larger human chromosomes (which contain up to 250 million base pairs), but Oxford Nanopore has also introduced GridION -- a platform where multiple cartridges can be clustered together. The company reckon that a 20-node GridION setup can sequence a complete human genome in just 15 minutes.”

—Wired

Page 72: The past, present, and future of DNA sequencing Dan Russell.
Page 73: The past, present, and future of DNA sequencing Dan Russell.

Epilogue

So should we really still be sequencing more mycobacteriophage genomes? We have 250+…

Page 74: The past, present, and future of DNA sequencing Dan Russell.

Chimps vs. Humans

Cluster A vs. Cluster B Mycobacteriophages

At the DNA level…

> 95% similar

< 50% similar

…but that’s just one pair of clusters, how many are there?

Page 75: The past, present, and future of DNA sequencing Dan Russell.

DNA Sequencing over Time

from The Economist

Page 76: The past, present, and future of DNA sequencing Dan Russell.

Comparing Different Technologies

Advantages Disadvantages

Lowest error rate

Long read length (~750 bp)

Can target a primer

High cost per base

Long time to generate data

Need for cloning

Amount of data per run

Sanger Sequencing

Page 77: The past, present, and future of DNA sequencing Dan Russell.

Comparing Different Technologies

Advantages Disadvantages

Low error rate

Medium read length(~400-600 bp)

Relatively high cost per base

Must run at large scale

Medium/high startup costs

454 Sequencing

Page 78: The past, present, and future of DNA sequencing Dan Russell.

Comparing Different Technologies

Advantages Disadvantages

Low startup costs

Scalable (10 – 1000 Mb of data per run)

Medium/low cost per base

Low error rate

Fast runs (<3 hours)

New, developing technology

Cost not as low as Illumina

Read lengths only ~100-200 bp so far

Ion Torrent Sequencing

Page 79: The past, present, and future of DNA sequencing Dan Russell.

Comparing Different Technologies

Advantages Disadvantages

Low error rate

Lowest cost per base

Tons of data

Must run at very large scale

Short read length(50-75 bp)

Runs take multiple days

High startup costs

De Novo assembly difficult

Illumina Sequencing

Page 80: The past, present, and future of DNA sequencing Dan Russell.

Comparing Different Technologies

Advantages Disadvantages

Can use single molecule as template

Potential for very long reads(several kb+)

High error rate (~10-15%)

Medium/high cost per base

High startup costs

PacBio Sequencing