High-Throughput Sequencing Technologies Biological Sequence Analysis BNFO 691/602 Spring 2014 Mark Reimers

High-Throughput Sequencing Technologies

Biological Sequence Analysis

BNFO 691/602 Spring 2014

Mark Reimers

Outline

• What can we do with next-generation sequencing?
– De novo sequencing of simple genomes
– Re-sequence individual variations
– Generate genome-wide quantitative data for a variety of assays
• What technologies are now available and which are up-and-coming?
– Roche, Illumina, SOLiD, Ion Torrent, etc.

What is High-Throughput Sequencing?

• Generating many thousands or millions of short (30 to 1,000 base) sequences by sequencing parts of longer (200+ base) DNA fragments

• Most research uses reads from one end of a fragment (single-end), but most technologies can be adapted to make paired-end reads on opposite strands
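
A minimal sketch of what "paired-end reads on opposite strands" means, assuming the fragment is given as its forward strand written 5' to 3'. The function names and the example fragment are illustrative and not part of the lecture.

```python
# Sketch: derive single-end and paired-end reads from one DNA fragment.
# Assumption: `fragment` is the forward strand, written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, also written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Read 1 comes from the start of the forward strand;
    read 2 comes from the start of the opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ACGTTGCAAGGCTTAACCGGTATC"  # hypothetical 24-base fragment
read1, read2 = paired_end_reads(fragment, 6)
print(read1)  # ACGTTG
print(read2)  # GATACC
```

A single-end experiment would keep only `read1`; the pair additionally constrains where the far end of the fragment maps.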

Full Genome Re-sequencing has been done for many cancers and rare clinical disorders

Exome sequencing is a cost-effective way to identify de novo protein-coding mutations

Targeted re-sequencing of a few relevant genes can identify diverse critical mutations across a large number of cases

RNA-seq

ChIP-seq

DNA methylation profiling

[Figure: methylated C (mC) is still read as C after PCR and sequencing; unmethylated C is converted to U and read as T]
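
The conversion shown on this slide (mC stays C; unmethylated C becomes U, then T after PCR) suggests a simple way to read methylation off an aligned read. The sketch below assumes a bisulfite-style conversion (the slide does not name the chemistry), a forward-strand read, and an ungapped alignment of equal length; `call_methylation` is a hypothetical helper, not a published tool.

```python
# Sketch: infer methylation status at each reference C from one aligned read.
# Assumptions: converted read aligned without gaps to the forward strand;
# C in the read = protected (methylated), T in the read = converted (unmethylated).
def call_methylation(reference, read):
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue  # only cytosines in the reference are informative
        if read_base == "C":
            calls[pos] = "methylated"
        elif read_base == "T":
            calls[pos] = "unmethylated"
        else:
            calls[pos] = "other"  # mismatch, e.g. a sequencing error or variant
    return calls


reference = "ACGTCCGA"
read = "ACGTTCGA"
calls = call_methylation(reference, read)
print(calls)  # {1: 'methylated', 4: 'unmethylated', 5: 'methylated'}
```

In practice many reads cover each site, and the fraction of C calls estimates the methylation level at that position.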

DNase Hypersensitivity

• DNase I enzyme cuts DNA
• Much more likely to cut at open chromatin
• Two approaches:
– Cut slowly then fragment and sequence ends
– Cut rapidly then sequence short fragments

Mapping of chromatin interactions (5C)

(courtesy Elemento lab)

HTS Technologies

• Roche-454 (will close 2016)
• Illumina
• SOLiD
• Ion Torrent
• Newer Technologies
• Outlook

454 Life Sciences was founded by Jonathan Rothberg as a secret project (code-named ‘454’) within CuraGen

Roche 454 Sequencing

Metzker, Nature Reviews Genetics, 2010

Roche 454 Sequencing

Roche 454 Peak Heights Data

Advantages & Drawbacks

• PRO
– Long reads are uniquely identifiable
– Relatively quick: ~20 hours total
• CON
– Cost is relatively high
– Frequent errors in runs of identical bases (homopolymers)
– Frequent G-A transitions

Best Uses of Roche 454

• De novo small genome (prokaryote or small eukaryote genome) sequencing
• Metagenomics by 16S profiling
• Used to be best for metagenomics by random sequencing – new long reads from Illumina are competitive
• Targeted re-sequencing of small samples

Illumina (Solexa) Genome Analyzer and Flow Cell

Illumina On-Chip Amplification

Illumina (Solexa) Sequencing

Paired-End Illumina Method

Paired-end reads are easy on Illumina because the clusters are generated by ligated linkers. Different linkers and primers are attached to each end.

Advantages & Drawbacks

• PRO
– Very high throughput
– Most widespread technology so that comparisons seem easier
• CON
– Sequencing representation biases, especially at beginning
– Slow – up to a week for a run

Best Uses of Illumina

• Expression analysis (RNA-Seq)
• Chromatin Immunoprecipitation (ChIP-Seq)
• Metagenomics by random sequencing

SOLiD: Sequencing by Oligonucleotide Ligation and Detection

SOLiD History

• George Church licensed his ‘polony’ technique to Agencourt Personal Genomics

• ABI acquired the SOLiD technology from Agencourt in 2006

SOLiD Preparation Steps

• Prepare either single or ‘mate-pair’ library from DNA fragments
• Attach library molecules to beads; amplify library by emulsion PCR
• Modify 3’ ends of clones; attach beads to surface

Emulsion PCR

• Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule. The bead is immobilized for sequencing.

ABI SOLiD Sequencing Cycle

SOLiD Reads Each Base Twice

Most bases are matched by two primers in different ligation cycles

SOLiD Color Coding Scheme

Blue is the color of homopolymer runs

If you translate color reads directly into base reads, a single error in the color calls corrupts every base call downstream of it, much like a frame-shift. It is therefore best to convert the reference sequence into color-space and compare reads there. There is one unambiguous conversion of a base reference sequence into color-space, but there are four possible conversions of a color string into base strings, one for each choice of first base.
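
The points on this slide can be checked with a small sketch of di-base (color-space) encoding. It assumes the usual 2-bit scheme in which A, C, G, T are numbered 0 to 3 and each color is the XOR of two adjacent bases, so color 0 (blue) marks identical neighbours; the function names and example sequence are illustrative only.

```python
# Sketch of SOLiD-style color-space encoding and decoding.
# Assumption: bases are numbered A=0, C=1, G=2, T=3 and
# color = code(base_i) XOR code(base_{i+1}); color 0 means "same base twice".
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(bases):
    """Base string -> list of colors (exactly one possible result)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(bases, bases[1:])]


def decode_colors(first_base, colors):
    """Color list -> base string, given a choice of first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = "ATGGCA"
colors = encode_colors(reference)
print(colors)  # [3, 1, 0, 3, 1]

# Four candidate decodings, one per possible first base.
decodings = [decode_colors(b, colors) for b in "ACGT"]
print(decodings)  # ['ATGGCA', 'CGTTAC', 'GCAATG', 'TACCGT']

# One wrong color call changes every base downstream of it.
corrupted = list(colors)
corrupted[2] = 2
print(decode_colors("A", corrupted))  # ATGATG (reference is ATGGCA)
```

Comparing reads to the reference in color-space keeps a single color error local, which is why the reference is converted rather than the reads.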

Advantages & Drawbacks

• PRO
– Very high throughput
– Di-base ligation ensures built-in accuracy check
   • Low error rate for low-coverage
– Can handle repetitive regions easily
• CON
– Strong cycle-dependent biases (can be modeled and partly overcome – see Wu et al, Nature Methods, 2011)
– Low quality color calls (Phred < 20) are common
– Reported problems with paired ends – most mapped tags don’t map to the same chromosome
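
For reference on the "Phred < 20" threshold above: a Phred score Q corresponds to an error probability P through Q = -10 log10(P), so Q = 20 means a 1-in-100 chance that the call is wrong. The sketch below assumes SOLiD color-call qualities follow this standard definition; the function names are illustrative.

```python
import math


def phred_to_error_probability(q):
    """Standard Phred definition: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Inverse of the above: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    # Q10 is a 1-in-10 error rate, Q20 is 1-in-100, Q30 is 1-in-1000.
    print(q, phred_to_error_probability(q))
```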

Ion Torrent Sample Prep

• Emulsion PCR loads copies of unique sequences onto beads

• One bead is deposited in each well of a micro-machined plate

An Ion Torrent Chip

From Ion Torrent promotional material

When a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released

From Ion Torrent promotional material

Ion Torrent Sequencing Process

As in 454, nucleotides are washed over the nascent strand in a prescribed sequence. Each time a nucleotide is incorporated, hydrogen ions are released and detected.

From Ion Torrent promotional material

Newest Machine – Ion Proton

• $150K per machine
• Ion Proton I chip has 165 million sensors
– Intended for exomes
• Ion Proton II chip has 660 million sensors
– 50X more than 318 chip
– Claim $1K genome this year

Ion Torrent Signals

• Like 454, a series of pH signals over time as different nucleotides are added

From promotional literature

Ion Torrent Signals

• Like 454, the signals are not always exact integer multiples of the single-base signal, so some guessing is needed
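
A minimal sketch of the "guessing" step for flow-based chemistries such as 454 and Ion Torrent: each flow of one nucleotide yields a signal roughly proportional to the number of bases incorporated, and the base caller rounds it to an integer run length. The flow order "TACG", the signal values, and the 0.25 tolerance are assumptions for illustration; real base callers use more elaborate signal models.

```python
# Sketch: turn a flowgram (one signal per nucleotide flow) into a base string.
# Assumptions: signals are already normalized so that 1.0 = one incorporated base,
# and nucleotides are flowed cyclically in `flow_order`.
def call_bases(flow_order, signals):
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = max(0, round(signal))  # nearest integer homopolymer length
        bases.append(nucleotide * run_length)
    return "".join(bases)


def ambiguous_flows(signals, tolerance=0.25):
    """Indices of flows whose signal is far from any integer."""
    return [i for i, s in enumerate(signals) if abs(s - round(s)) > tolerance]


signals = [1.04, 0.02, 2.10, 0.95, 0.08, 1.12, 0.03, 2.48]
print(call_bases("TACG", signals))  # TCCGAGG
print(ambiguous_flows(signals))     # [7]
```

The last flow (2.48) sits almost halfway between two and three G's, which is the kind of call that produces homopolymer-length errors.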

Ion Torrent Advantages & Drawbacks

• PRO
– Very high throughput potential
– Very fast (an afternoon)
• CON
– Homopolymer run errors are still a problem, but less so recently
– Very uneven loading of sequences wastes a lot of real estate on the chips
– No prospect of paired-end reads

[Figure panels: Loading Density; Homopolymer error rates]

Newer Technologies

• Complete Genomics
• Pacific Biosciences
• Oxford Nanopore

Complete Genomics

• Service company only – no equipment sales
• ~$4,000 per human genome (2011 price)
• DNA Nanoball technology generates paired-end sequences plated at high density
• Sequenced by ligation

Pacific Biosciences

• Single-molecule real-time (SMRT) sequencing by circular strand technology using semiconductor technology
• Long reads promised at under $200 per genome
• High random error rates reported early
• Seems better now

Signals from Pac Bio Can Detect mC

From Agarwal et al, Nature Methods

Oxford Nanopore

• Single-molecule sequencing by threading DNA through a protein nanopore

• GridION is a general technology for sequencing polymers by measuring current – can do polypeptides also