Whole Genome Sequencing by Next- Generation Methods...

Whole Genome Sequencing by Next-Generation Methods: Genome-forward Medicine

Elaine R. Mardis, Ph.D.

2011 ASCP Annual Meeting

Elaine R. Mardis, Ph.D.

Professor of Genetics

Washington University School of Medicine

St. Louis MO

I have the following conflicts to declare:

� Speaker’s Bureau: Illumina, Inc.

� Scientific Advisory Board: Pacific Biosciences, Inc.

Conflict of Interest Slide


� Stockholder: Life Technologies, Inc.

� Next-Generation Sequencers (NGS)� Roche/454� Illumina� Life Technologies

� Third Generation Sequencers� Pacific Biosciences� Ion Torrent

MiSeq

Overview


� MiSeq� Oxford Nanopore

� NGS Practical Considerations� Coverage: depth and breadth� Error model� Representation bias

� NGS Applications

What is “Genome-Forward” Medicine?

� Informing the physician about genomic aspects of a patient’s diagnosis, using next-generation sequencing methods in place of conventional approaches.

� In certain applications of genome-forward

Defining “Genome-Forward” Medicine


� In certain applications of genome-forward medicine, having information about the patient’s genome, or that of their microbial or viral infecting genome, may better inform the physician about treatment choices and in turn, improve patient response.

The Trajectory of Throughput: 10 years


E.R. Mardis, Nature (2011) 470: 198-203

Comparative costs: sequencing a human genome


Capillary technology

Applied Biosystems 3730xl

(2004)

$15,000,000

Next-gen technology

Illumina HiSeq (2011)

$10,000

Next-generation sequencers

The Fundamentals


Next-generation DNA sequencing instruments

� All commercially-available sequencers have the following shared attributes:� Random fragmentation of starting DNA, ligation with custom linkers =

“a library”� Library amplification on a solid surface (either bead or glass)� Direct step-by-step detection of each nucleotide base incorporated

during the sequencing reaction

Next-generation DNA sequencing instruments


during the sequencing reaction� Hundreds of thousands to hundreds of millions of reactions imaged

per instrument run = “massively parallel sequencing”� Shorter read lengths than capillary sequencers� A “digital” read type that enables direct quantitative comparisons� A sequencing mechanism that samples both ends of every fragment

sequenced (“paired end” reads)

Paired-end reads

� All next-gen platforms now offer methods to derive sequence data from each end of the library fragments.

� Differences exist in the _distance_ between read pairs, based on the approach/platform.� “paired ends” : linear fragment with ability to sample both ends in

What are paired-end reads?


� “paired ends” : linear fragment with ability to sample both ends in separate reaction

� “mate pairs” : circularized fragment of >1kb, sequenced either by a single reaction read or two end read (platform dependent)

� In general, paired end reads offer advantages for human whole genome sequencing due to the repetitive nature of our DNA, and the difficulty in accurate placement (“mapping”) of NGS reads.

The Human Genome Reference• Mapping short reads to the genome reference sequence is a

required step for next-generation sequence data analysis, regardless of the preparatory steps used

• Once mapped to the genome, “localized” assembly of select reads can be used to define genomic alterations at single nucleotide resolution

• The human genome sequence serves as a “reference” to which we can compare other human genomes…

Read alignment to the Human Reference Genome


we can compare other human genomes…

• SNPs (single nucleotide polymorphisms)

• Point mutations (maintain or change amino acids)

• Insertion/deletions (add or remove >1base in the sequence)

• Translocations of chromosome fragments (inter- or intra-)

• Inversions (an entire fragment swaps ends)

• Amplification (multiple repeated copies of a genome segment) or Deletion (large blocks removed)

Roche/454 Library Prep

Roche/454 Library Prep

Single Stranded Adapter Ligated Library•Random Fragmentation•Adapter Ligation


•Adapter Ligation

• emPCR for clonalamplification

Roche/454 Pyrosequencing

Roche/454 Pyrosequencing

Centrifugation

Load Enzyme BeadsLoad beads into

PicoTiter™Plate


DNA Capture BeadContaining Millions of

Copies of a Single Clonal Fragment

A A T C G G C A T G C T A A A A G T C A

Anneal PrimerT

PPi

ATP

Light + oxy luciferin

Sulfurylase

Luciferase

APS

luciferin

Sequencing by Synthesis

454 Instrumentation Specifics

Instrument

Run

Time

(hr)

Read

Length

(bp)

Yield

(Mb/run

)

Error

Type

Error

Rate

(%)

Purchase

Cost

(x1000)

454 FLX+ 18-20 700 900 Indel 1 $30 A

454 FLX Titanium 10 400 500 Indel 1 $500

454 GS Jr. Titanium

10 400 50 Indel 1 $108

A – Requires the 454 FLX Titanium. This is the upgrade cost.


– Requires the 454 FLX Titanium. This is the upgrade cost.

Notable:

• Mate pair paired end reads of 3kb, 8kb and 20 kb separation without an increase in run time.

• Cost per run makes sequencing an entire human genome cost-prohibitive relative to other technologies (~ $20/Mbp)

• Great platform for targeted validation

Illumina Sequencing: Library Preparation

Illumina Sequencing: Library Preparation


Automated ProcessingLow gDNA Inputs

Illumina Sequencing by Synthesis

Emission

Incorporate

Detect

De-block


Excitation

De-block

Cleave

fluor

Illumina Instrumentation SpecificsKey updates

• 2010: HiSeq 2000

• Two flow cells per run

• 100 Gbp/FC or two genome equivalents per run

• New scanning mechanics - scans both surfaces of FC lanes

• 2011: HiSeq 2000

• Improved chemistry (v. 3): increased yield and accuracy

• 2011: MiSeq


Instrument

Run

Time

(days)

Read

Length

(bp)

Yield

(Gb/run)

Error

Type

Error

Rate

(%)

Purchase

Cost

(x1000)

GAIIx 14 150 x 150 96 Sub >0.1 $525

HiSeq 2000 8 100 x 100 200 x 2 Sub >0.1 $700

HiSeq 2000 v3

10 100 x 100 <600 Sub >0.1 $700

MiSeq 1 150 x 150 1 Sub* >0.1* $125

Life Technologies: sequencing by ligation

• custom adapter library• emPCR on magnetic beads• sequencing by ligation usingfluorescent probes from acommon primer• sequential rounds of ligation


• sequential rounds of ligationfrom a series of primers• fixed/known nucleotides foreach probeset identify twobases each cycle, or“two base encoding”

SOLiD Instrumentation Specifics

Instrument

Run

Time

(days)

Read

Length

(bp)

Yield

(Gb/run)

Error

Type

Error

Rate

(%)

Purchase

Cost

(x1000)

SOLiD 4 12 50 x 35 PE 71 A-T Bias >0.06 $475

SOLiD 5500 xl 875 x 35 PE

60 x 60 MP

155 A-T Bias >0.01 $595

5500 xl

•


• Front-end automation addresses bottlenecks at emPCR, breaking, and enrichment of beads

• 6-lane Flow Chip with independent lanes/2 per run

• Cost per whole genome data set is predicted to be $6K by 2011

• Very high accuracy data due to two-base encoding

• ECC Module – An optional 6th primer that increases accuracy to 99.999%

• Direct conversion of color space to base space

• True paired-end chemistry enabled – Ligation reaction can be used in either direction

Third generation sequencers??

� Recently, new sequencing platforms were introduced.� The Pacific Biosciences sequencer is a single molecule

detection system that marries nanotechnology with molecular biology.

� The Ion Torrent uses pH rather than light to detect nucleotide incorporations.

Third Generation Sequencing Instruments


nucleotide incorporations.� The MiSeq is a scaled down version of the HiSeq, with

faster chemistry and scanning.� All offer a faster run time, lower cost per run, reduced

amount of data generated relative to 2nd Gen platforms, and the potential to address genetic questions in the clinical setting.

Comparisons to Third-Generation Sequencers

CompanyPlatform

NameSequencing

Amplificatio

n

Run

Time

Roche 454 TiDNA Polymerase“Pyrosequencing”

emPCR 10 hours

IlluminaHi-

Seq/MiS DNA PolymeraseBridge

10 days/24


Illumina Seq/MiSeq

DNA Polymeraseamplification

days/24 hours

LifeSOLiD/5

500DNA Ligase emPCR 12 days

Ion Torrent PGMSynthesis

H+ detectionemPCR 2 hours

PacificBiosciences

RS Synthesis NONE 45 min

Pacific Biosciences RS

Shearing

(Covaris/Hydroshear)

Polish ends

DNA polymerase binding

Load library/polymerase

complex onto SMRT cell

Sample Prep Library/Polymerase

Complex

Movie 1 (v1.1 & v1.2)

Raw

reads

Post-filter

reads reads

Mapped

reads

Movie 2 (only v1.2)

Raw

reads

Post-filter

reads reads

Mapped

reads

Sequencing


SMRTbell™ ligation

Sequencing

primer annealing

The image part with relationship ID rId3 was not found in the file.

Zero Mode Waveguides (ZMWs)

Ver. 1.1 = 1 x 45,000 per SMRT cell

Ver. 1.2 = 2 x 75,000 per SMRT cell

SMRTbell Library Types

Circular Consensus

Small ~250bpSingle MovieMultiple passesMany sub-reads

Standard

Large ~2kbpSingle MovieFew sub-reads

VLR (“very long read”)

Larger ~6kbp

Single 45 minute Movie

Single long read provides

linking information


Sub ReadsSub Reads

Reads

Consensus Read

Read

Pacific Biosciences RS Instrumentation Specifics

InstrumentRun Time

(Hours)

Read

Length

(bp)

Yield

(Mb)

Error

Type

Error

Rate

(%)

Purchase

Cost

(x1000)

RS14 (~8

SMRTCells)1500

45 per SMRTCell

Insertions 15 $695

mean mapped sub-read accuracy: 85.7% (±1.3%)

mean mapped sub-read length: 697 (± 501)

maximum mapped read length: 7,772 bp


maximum mapped read length: 7,772 bp

maximum mapped sub-read length: 5,601 bp

1x45 min movie

8 SMRT cells

Strobe polymerase/strobe reagent/strobe protocol (45 min movie)

2x45 min movies

8 SMRT cells

Strobe polymerase/standard reagent/standard protocol

Ion Torrent PGM Instrument


Data Output per Run Trajectory: Ion Torrent


At present, only the 314 and 316 chips are commercially available

Ion Torrent Yield with Automation Modules

Ion Torrent Data Yield Improves with Automation


Oxford Nanopore SequencingExonuclease-aided sequencing


Pore translocation sequencing

NGS: Practical Considerations

Importance of coverage, error model, representation bias


What is “coverage”?� Coverage is a general term to describe the

– fold oversampling of a DNA target by sequencing data

� In covering a target region or genome, increasing depth of coverage leads to increased certainty of

Importance of Coverage


variant detection

� Coverage levels may vary due to G+C content, amplification or deletion in the region, or other biases, the uniqueness of the target region (“mappability”), and the error model of the data type

� Coverage breadth is equally important as depth!

Alignments: Coverage breadth

Breadth-of-coverage

topography

Altered greatly by the application of a minimum depth filter.

Breadth of coverage attrition through a range of minimum depth filters (1x, 5x, 10x, 15x,

Coverage breadth topography

Breadth

97.3%

67.3%

Depth

1X

5X


depth filters (1x, 5x, 10x, 15x, 20x) for a given region of interest (chr7:930225-931129).

Note the breadth of coverage drops from almost 100% at 1x depth requirement to ~20% when we require >= 20x depth for coverage membership.

22.2%

43.5%

27.8%

20X

10X

15X

Bias Assessment: GC content impacts coverage


� Whole Genome Sequencing (WGS)

� Hybrid Capture: Exome or Targeted sequencing

� Protein:DNA binding

� non-coding RNA sequencing: discovery/variants

Applying Next-generation Sequencing


non-coding RNA sequencing: discovery/variants

� Transcriptome sequencing (RNA-seq)

� Genome-wide methylation of DNA (Methyl-seq)

� Clinical sequencing for therapeutic decisions

E.R. Mardis, Annual Reviews in Genetics & Genomics (2008)

E.R. Mardis, Nature (2011) 470: 198-203

Whole Genome Sequencing Process

• Prepare paired end libraries as whole genome fragment/shotgun by

random shearing of genomic DNA,

adapter ligation, size selection.

• Produce paired end data from each

end of billions of library fragments,

WGS: Data Production and Alignment


end of billions of library fragments,

over-sampling about 30-fold to cover

at a depth sufficient to find all types

of genome alterations.

• Computer programs align the read

pair sequences onto the Human

Reference Genome and several

algorithms are used to discover

variants genome-wide.

Data Analysis Workflow for Variant Detection

30-fold coverage of tumor genomePE Illumina reads

Align read pairs to reference genome

Detect Single-Nucleotide Variantsand focused insertion/deletions


Detect anomalous read pair mapping, assemble reads and identify structuralvariations (inversions, translocations)

Use normalized read coverage levels andHMM-based algorithm to identify CNA andLOH regions

Design custom capture

probes for each putative

variant site Compare to

SNP array

analysis

Somatic Mutations in 50 AML Genomes


DNMT3A mutations in AML

Ley et al. NEJM 2010.

Indentifying Prognostic Mutations


�DNMT3A mutations are present in 22% of de novo AMLs, and 34% of cytogenetically normal patients.

�DNMT3A mutations are strongly associated with poor prognosis.

Hybrid Capture of Genomic Regions or Genes

• Hybrid capture - fragments from a whole genome library are selected by combining with probes that correspond to targeted genes.

• The probe DNAs are biotinylated, making selection from solution with streptavidin magnetic beads an effective means of purification.

• The human “exome” by definition,


• The human “exome” by definition, is the exons of all ~21,000 genes annotated in the human reference genome. Commercial exome kits target most but not all genes.

• Custom capture reagents can be synthesized to target specific loci that may be of interest in a clinical context.

Comparing Exome to Whole Genome Cancer Sequencing

• Exome sequencing costs ~1/10 WGS

• Simplified analysis• Sequence more samples

• WGS captures SVs and focal amp/dels not resolved by

vs.


amp/dels not resolved by SNP arrays

• Non-exonic mutations (“tier 2”) are likely to play a role in cancer

• Exome reagents do not capture all exons

• Tier 2/3 mutations facilitate heterogeneity analysis

Deep Digital Sequencing by Custom Hybrid Capture

� Custom capture probes can be designed to sample from somatic point mutations, in/dels and structural variant regions in the tumor genome.

� High depth next-generation sequencing of these captured regions provides a digital read-out of the allele frequency of each mutation in the tumor cell population.

Tumor Heterogeneity from Deep Read Count Analysis


mutation in the tumor cell population.

� The prevalence of a mutation reflects its ‘history’ in the tumor evolution. Older mutations are present at higher prevalence or allele frequency, newer ones at lower allele frequency.

� Using these data and statistical approaches, we can model the tumor heterogeneity for any sequenced sample, determining the mutational profile of each subclone and it’s proportion of the tumor cell population.

Mutation

Total reads = 1000Mutant reads = 500

All cells carry a heterozygous mutation

Total reads = 1000Mutant reads = 250

Wild-type & het mutant cells

“Deep Digital Sequencing”


Mutant reads = 500Mutant Allele Frequency = = 50%500

1000

Mutant reads = 250Mutant Allele Frequency = = 25%

2501000

Tumor heterogeneity: single dominant clone


Single cluster of somatic mutations

Tumor heterogeneity: multiple clones

Cluster 3: 88% tumor allele frequency



22% Tumor Content of Tumor Sample0.3% Tumor Content of Normal Sample


Tumor allele frequencies are adjusted for normal cell contaminationKernel density estimation determines the number of discrete clones

Comparing heterogeneity of primary and metastatic tumors

Primary = absentMetastasis= all cells

Primary tumor= heterozygous mutation present in all cells

Metastasis= heterozygous mutation present in all cells

Mutant Allele Frequency in Metastasis (%)


Mutant Allele Frequency in Primary Tumor (%)

Mutant Allele Frequency in Metastasis (%)

Tumor subpopulations: Single dominant clone

De Novo vs. Relapse: Monoclonal Disease

Clinical DataSex: MaleAge: 53Cyto: NormalTime to relapse: 235OS months: 30Induction: 7+3+3EConsolidation: HDACSCT: No


Tumor subpopulations: Multiple clones

De novo vs. Relapse: Oligoclonal Disease

Clinical DataSex: MaleAge: 68Cyto:NormalTime to relapse: 805OS months: 46.8Induction: 7+3DConsolidation: HDACSCT: No


SCT: No

Clonal Progression in AML1 Relapse

Relapse is based on clonal evolution


• Additional data from 7 AML tumor:relapse whole genome sequencing reveals that this scenario is common: a single dominant mutation cluster in a founder subclone persists through the chemotherapy and re-emerges with new somatic alterations.• Each tumor showed clear cut evidence of clonal evolution at relapse, and a higher frequency of transversions that were probably induced by DNA damage from chemotherapy.

Clonal Evolution: MDS to sAML


By WGS of the sAML tumor, we identified somatic mutations All validated mutations were evaluated by deep digital sequencing in both banked MDS and sAML to calculate allele frequency and heterogeneity

RNA Sequencing

RNA isolateSize selection for

nc RNA classes

Fragment,

RT w/randoms

polyA priming,

RT, ds DNA SAGE tags

Adapter-ligated fragments

for next-gen sequencing


for next-gen sequencing

Alignment to reference database

& discovery-Expression levels

-Novel splice isoforms

-Allelic bias in transcription

-Fusion transcripts

EN1 – a gene that is not expressed

RNA-seq informs therapeutic choices

Normal WGS


EN1 is not expressed (0 RNA-seq reads out of ~146 million mapped)

Tumor WGS

Tumor RNA-seq

Metastatic breast cancer (to brain)

IGV screenshot

RNA Correlation of Somatic Copy Number Alterations

RNA-seq provides correlative data

HER2

2011 ASCP Annual MeetingMetastatic breast cancer (to brain)

HER2 / ERBB2 is heavily amplified in this tumor

RNA-seq confirms the HER2, PR, & ER statusRNA-seq correlates with IHC Diagnosis

HER2 +ive • Gene expression values from RNA-seq

• Used to confirm HER2, PR, & ER status of each patient obtained by IHC

• Tumor is• HER2+, PR-, ER-

2011 ASCP Annual MeetingMetastatic breast cancer (to brain) vs. four primary HER2 +ive breast cancers

PR- ER-

Samples Data Coordination,

Analysis and

Concordance to

SubjectsGoals:

-Defining Core Microbiomes in

“healthy” subjects

-Changes in Microbiomes with

disease

Human Microbiome Project

Community

Phenotypes


Microbial

Communities

Reference

Sequences Metagenomics

16S rRNAWGS

Clinical DataSources of

strains

Phenotypes

Stability

of v

iromeover tim

e

HM

P: H

um

an V

irom

eS

tability

GI

Ora

l

2011 A

SC

P A

nnual M

eeting

Alphapapillomavirus

Polyomavirus

Roseolovirus

Lymphocryptovirus

Mastadenovirus

Alphapapillomavirus

Gammapapillomavirus

Human papillomavirus

Polyomavirus

Roseolovirus

Lymphocryptovirus

Mastadenovirus

Gammapapillomavirus

Human papillomavirus

Kris

tine W

ylie, W

UG

I Vagin

al

Nasal

400 MRSA Genomes

NGS of MRSA IsolatesLegend:

CD

MRSA.6326.326W87

MRSA.6330.330N40

MRSA.6347.347W106

MRSA.6365.G96

MRSA.6433.W211

MRSA.6433.N82

MRSA.6433.G99

MRSA.R103G

MRSA.6421.421N69

MRSA.6254.254W53

MRSA.6382.382W140

MRSA.6415.2790W210

MRSA.6415.4563W208

MRSA.6222.222N8

MRSA.6266.266.1G3

9

MRSA.6432.A4

6

MRSA.639

8.N80

MRSA.6

436.W21

2

MRSA.R1

03 714N

MRSA.R402 6488W

MRSA.R402N

MRSA.6370.370W132

MRSA.6342.342W101

MRSA.6341.341A26

MRSA.6331.331W92

MRSA.6412.4124774W170

MRSA.6224.224W20

MRSA.6329.329N39

MRSA.6329.329W90

MRSA.6245.245W45

MRSA.6245.245.1N21

MRSA.6245.245A12

MRSA.6245.245.1A13

MRSA.6245.245.1G25

MRSA.6216.216G6

MRSA.6216.216W14

MRSA.6246.246G26

MRSA.6246.246W46

MRSA.6326.326.1N37

MRSA.6231.231W27

MRSA.6422.4227588W179

MRSA.6422.422N70

MRSA.6331.331.1G55

MRSA.6332.332W95

MRSA.6209.209W5

MRSA.6386.N78

MRSA.6386.386A39

MRSA.6331.331.1G54

MRSA.6331.3314371W93

MRSA.6500.3300G82

MRSA.6501.3301A40

MRSA.6500.3300W155

MRSA.6348.348W107

MRSA.6374.374W134

MRSA.6335.335G56

MRSA.6344.344W102

MRSA.6345.345N44

MRSA.6345.345W103 M

RSA.6307.307A16

MRSA.6307.307N28

MRSA.6307.307G42

MRSA.6307.307W72

MRSA.6261.261G35

MRSA.6341.341.1N43

MRSA.6215.2157462W12

MRSA.6407.4076690W166

MRSA.6407.4074440W167

MRSA.6407.407N65

MRSA.6255.255G31

MRSA.6255.255W54

MRSA.6415.7010W209

MRSA.6415.N81

MRSA.6430.430G89

MRSA.6430.430W186

MRSA.6430.430N72

MRSA.6361.361W121

MRSA.6395.395W150

MRSA.6346.3463077W105

MRSA.6437.W213

MRSA.6437.N83

MRSA.6359.359W119

MRSA.6366.3667367W128

MRSA.6366.366W127

MRSA.6366.366G72

MRSA.6366.366N51

MRSA.6384.384A37

MRSA.6387.387G78

MRSA.6387.387N59

MRSA.6386.386W144

MRSA.6402.402N63

MRSA.6386.386G77

MRSA.6389.W206

MRSA.6389.A45

MRSA.6389.N79

MRSA.6401.4016575W160

MRSA.6349.349G62

MRSA.6258.258N25

MRSA.6258.258G33

MRSA.6258.258W56

MRSA.6266.2661353W66

MRSA.6266.2667011W67

MRSA.6266.266W65

MRSA.6266.2665921W68

MRSA.6266.266A15

MRSA.6266.266G38

MRSA.6427.427W184

MRSA.6427.427N71

MRSA.6243.W196

MRSA.6242.242W42

MRSA.6252.2527094W52

MRSA.6252.252G29

MRSA.6252.252W51

MRSA.6502.3302W157

MRSA.6501.3301W156

MRSA.6501.3301G83

MRSA.6501.3301N61

MRSA.6260.260G34

MRSA.6260.260W58

MRSA.6340.340A25

MRSA.6340.340W100

MRSA.6340.340G57

MRSA.6341.341N42

MRSA.6394.394G79

MRSA.6391.391W148

MRSA.6421.4217599W177

MRSA.6303.G93

MRSA.6303.W199

MRSA.6267.267.1N27

MRSA.6267.267W69

MRSA.6403.4036488W163

MRSA.6399.399W154

MRSA.6204.204W4

MRSA.6305.305W71

MRSA

.633

5.33

529

85W

190

MRSA.6335.335W96

MRSA.6250.250W49

MRSA.6339.339W99

MRSA.6328.328W89

MRSA.6384.384W142

MRSA.6406.406A41

MRSA.6351.351W113

MRSA.6319.319W81

MRSA.6377.3

77W136

MRSA.6376.376G74

MRSA.6398.398W153

MRSA.6

311.31

1.1A20

MRSA.6

311.31

1.1N32

MRSA.6

311.31

1N31

MRSA.6

311.31

1A19

MRSA.6

311.31

1W75

MRSA.

6438.

W214

MRSA.6

241.24

1N18

MRSA.

6241.2

41W41

MRSA.6416.4163680W172

MRSA.6416.416W171

MRSA.6416.416.1G87

MRSA.6346.346W104

MRSA.6336.336W97

MRSA.6337.337W98

MRSA.6265.265G37

MRSA.6265.265W64

MRSA.6350.350W109

MRSA.6349.349.1G63

MRSA.6349.349W108

MRSA.6321.W200

MRSA.6214.214N4

MRSA.6214.214W10

MRSA.6380.380W138

MRSA.6268.268.1G41

MRSA.6268.268G40

MRSA.6268.268W70

MRSA.6230.230N12

MRSA.6230.230.1G15

MRSA.6230.230.1N13

MRSA.6376.376.1A35

MRSA.6345.345G59

MRSA.6376.376.1N55

MRSA.6376.376A34

MRSA.6376.376W135

MRSA.6376.376.1G75

MRSA.6376.376N54

MRSA.6418.418W174

MRSA.6254.254G30

MRSA.6248.248W47

MRSA.6387.387W145

MRSA.6211.211W7

MRSA.6504.3304W158

MRSA.6506.3306W159

MRSA.6251.251N22

MRSA.6251.251W50

MRSA.6370.370N52

MRSA.6324.324.1N36

MRSA.6325.325W86

MRSA.6228.228.1G14

MRSA.6367.3

67W129

MRSA.6237.237

.1N15

MRSA.6237.237W

36

MRSA.6218.218.1N7

MRSA.6423.4232663W181

MRSA.6423.4237501W180

MRSA.6

439.W2

15

MRSA.6

363.G95

MRSA.63

17.31767

44W79

MRSA.631

7.317W78

MRSA.643

5.435W18

8

MRSA.633

6.336A24

MRSA.6371.

371N53

MRSA.6371.3

71G73

MRSA.6371.3

71W133


MRSA.6308.308W73

MRSA.6385.385N58

MRSA.6385.385G76

MRSA.6385.385W143

MRSA.6385.385A38

MRSA.6385.3855370W192

MRSA.6365.365N50

MRSA.6365.365W126

MRSA.6327.327G50

MRSA.6327.327W88

MRSA.6369.369W131

MRSA.6320.320A21

MRSA.6320.320W82

MRSA.6320.320N33

MRSA.6320.320.1N34

MRSA.6320.320G46

MRSA.6396.3961643W152

MRSA.6396.396G80

MRSA.6396.396W151

MRSA.6355.355A32

MRSA.6355.355W117

MRSA.6355.355G67

MRSA.6355.355N47

MRSA.6309.309W74

MRSA.6309.309.1A18

MRSA.6309.309.1N30

MRSA.6263.2636238W63

MRSA.6263.263W62

MRSA.6328.328A23

MRSA.6327.327.1G51

MRSA.6328.328G52

MRSA.6324.324G48

MRSA.6323.323W83

MRSA.6219.W195

MRSA.6219.G90

MRSA.6219.N74

MRSA.6219.G91

MRSA.6226.226W21

MRSA.6226.226A6

MRSA.6226.226N9

MRSA.6261.261A14

MRSA.6261.261W59

MRSA.6377.377A36M

RSA.6377.377N56 M

RSA.6229.229W24

MRSA.6223.223G12

MRSA.6223.223W19

MRSA.6363.W203

MRSA.6232.2326105W29

MRSA.6232.232G16

MRSA.6232.232W28

MRSA.6388.388.1N60

MRSA.6390.3905178W147

MRSA.6390.390W146

MRSA.6236.2366121W33

MRSA.6236.236.1G20

MRSA.6236.2366527W35

MRSA.6236.2366195W34

MRSA.6236.236G19

MRSA.6236.236W32

MRSA.6215.2151332W13

MRSA.6215.215A2

MRSA.6215.215G5

MRSA.6215.215W11

MRSA.6221.221G11

MRSA.6221.221W18

MRSA.6388.W205

MRSA.6218.218G9

MRSA.6218.218.1G10

MRSA.6218.218W16

MRSA.6419.419W175

MRSA.6218.218.1A4

MRSA.6218.218N6

MRSA.6360.360N49

MRSA.6360.360W120

MRSA.6210.210G3

MRSA.6210.210W6

MRSA.6227.227N10

MRSA.6227.227.1N11

MRSA.6227.227G13

MRSA.6227.227W22

MRSA.6401.401N62

MRSA.6356.356N48

MRSA.6356.356G68

MRSA.6356.356A33

MRSA.6356.356W118

MRSA.6397.W207

MRSA.6233.233A7

MRSA.6233.233N14

MRSA.6234.234W31

MRSA.6411.4113370W168

MRSA.6412.412W169

MRSA.6412.412N68

MRSA.6431.G98

MRSA.6334.N76

MRSA.6334.W201

MRSA.6203.203G2

MRSA.6203.203W3

MRSA.6203.203A1

MRSA.6308.308A17

MRSA.6362.362G69

MRSA.6362.362W122

MRS

A.64

31.4

31A4

2

MRSA.6431.4313236W187

MRSA.6431.431N73

MRSA.6352.352G64

MRSA.6352.352W114

MRSA.6401.4016336W161

MRSA.6353.353G65MR

SA.6247.G92

MRSA.6354.G94

MRSA.6354.W202

MRSA.6247.N75

MRSA.6247.A43

MRSA.6247.W197

MRSA.6247.2947W198

MRSA.6331.3314694W94

MRSA.6368.368W130

MRSA.6217.2

17.1A3MRSA.6

217.2173205W1

89MRSA.6

344.344A27

MRSA.6344.344G

58MRSA.6396

.396.1G81

MRSA.6217.217.1N5

MRSA.6217.217.1G8

MRSA.6217.217G7

MRSA.6217.217W15

MRSA.6383.383W141

MRSA.6342.342342807W191

MRSA.6230.2302975W26

MRSA.6230.230W25

MRSA.6230.230N12

MRSA.6257.257W55

MRSA.6257.257G32

MRSA.6257.257N24MR

SA.6262.2621199W61

MRSA.6262.262G36

MRSA.6262.262N26

MRSA.6262.262W60

MRSA.6379.379N57

MRSA.6379.379W137

MRSA.6421.4217968W178

MRSA.6429.429W185

MRSA.6249.249.1G28

MRSA.6249.249G27

MRSA.6249.249W

48

MRSA.

6314.

314G44

MRSA.

6314.3

143541

W77

MRSA.6

314.3

14.1G4

5

MRSA.6

314.31

4W76

MRSA.63

75.N77MRSA

.6375.G9

7MRSA.637

5.W204MRSA

.6238.23

8A8MRSA.623

8.238W37

MRSA.6238

.238N16MRSA.

6238.2938W

38

MRSA.6

350.35

00593W

112MRS

A.6350

.350.1

A28

MRSA.6

350.35

01396W

110

MRSA.6

426.42

6W183

MRSA.6

345.34

5.1G60

MRSA.6402.4026402W162

MRSA.6424.4247207W182

MRSA.6266.A44

MRSA.6381.381W139

MRSA.6420.420W176

MRSA.6244.244.1A11MRSA.6244.2444815W44

MRSA.6244.244W43MRSA.6241.241G22MRSA.6244.244G24

MRSA.6244.244A10MRSA.6244.244N20

MRSA.6222.222A5MRSA.6353.353.1N46MRSA.6353.353.1G66MRSA.6353.353N45MRSA.6353.3535901W116

MRSA.6353.353.1A31MRSA.6330.330.1N41MRSA.6353.353A30MRSA.6353.353W115

MRSA.6417.417G88MRSA.6213.213N3

MRSA.6213.213G4

MRSA.6213.213W9

MRSA.6242.242N19

MRSA.6239.239G21

MRSA.6239.239W39

MRSA.6239.239A9

MRSA.6239.239N17

MRSA.6328.328N38

MRSA.6234.234G18

MRSA.6233.233W30

MRSA.6240.240W40

MRSA.6364.364G70

MRSA.6364.364083W124

MRSA.6364.3647947W125

MRSA.6364.364.1G71

MRSA.6364.364W123

MRSA.6201.2011355W2

MRSA.R201G

MRSA.R201W

MRSA.6201.201N1

MRSA.6201.201.1

N2

MRSA.8869

.201G1

MRSA.63

51.351A29

MRSA

.6330.

330.1G5

3

MRSA.633

0.330

W91

MRSA.6410.410N66

MRSA.6410.4102889W194

MRSA.6410.4106835W193

MRSA.6410.410G85

MRSA.6410.410.1G86

MRSA.6410.410.1N67

MRSA.6212.212W8

MRSA.6220.220W17

MRSA.6394.394W149

MRSA.6348.348G61

MRSA.6323.323G47

MRSA.6324.324W84

MRSA.6324.324.1G49

MRSA.6324.324.1A22

MRSA.6324.3247192W85

MRSA.6228.228.1G14

MRSA.6228.228W23

MRSA.6259.259W57

MRSA.R406 2601W

MRSA.R406N

MRSA.6406.406G84

MRSA.6406.406N64

MRSA.6406.4062601W165

MRSA.6406.4063760W164

NGS Application to Clinical Samples

Examples from our work


Exemestane

Anastrozole

Letrozole

SURGERY

Continued therapywith an AI wherepossible:Radiotherapy,Chemotherapydiscretionary

Accrual @ 377

Core Biopsy

PostmenopausalER+, Allred 6-8,Clinical Stage2 and 3

Genomics of Aromatase Inhibitor Response in Luminal Breast Cancer


Accrual @ 377

Tumor (Ki67)

46 “luminal” subtype breast cancer cases: 25 AI-responsive and 21 AI-unresponsive tumors (per resection Ki67 index).

46 genome pairs completed from pre-treatment core biopsies with >70% neoplastic cellularity. Manuscript in revision.

We now have sequenced the resected post-treatment tumor genomes for most of these patients.

A comprehensive mutational spectrum of ER+ Breast Cancer


• These mutations were tested in a panel of 121 additional luminal cases from the two clinical trials to ascertain significantly mutated genes.• 70/167 tumors carried PIK3CA mutations (41.9%).• TP53 was mutant in 15.7% (26 out of 167 tumors)

PIK3CA Mutations in Luminal Breast Cancer


• Two novel in-frame deletions were discovered in PIK3CA.• 20 of 26 point mutations in PIK3CA reside in the cluster with the highest allele frequency, indicating these are likely initiating mutations in the founder clones. •PIK3CA was the most frequently mutated gene at 43% of patients.

MAP3K1 Mutations in Luminal Breast Cancer


• MAP3K1 is a serine/threonine kinase that activates the ERK and JNK pathways• We identified 6 mutations in the 46 tumors (2 nonsense, 3 frameshift, 1 missense)• Four patients carried two MAP3K1 mutations, indicating bi-allelic loss

Deletion of MAP3K1

Normal


Tumor

Chromothripsis in Luminal Breast Cancer


• Twelve cases lacked any translocations.• Seven cases had multiple complex translocations with breakpoints suggestive of a single cellular catastrophe (“chromothripsis”).

Potentially Druggable Somatic Mutations


When combining common and rare mutations across 46 samples, 96% of patients have a potentially

druggable mutation identified by WGS

� Full and interpretive analysis

� Time to results in a clinical timeframe

� Demonstration of clinical efficacy (“Genome-forward medicine”)

� Adequacy of NGS + data analysis in the CLIA environment, compared to the current standard in pathology

The Challenges Ahead for Diagnostic NGS


compared to the current standard in pathology

� Uniformity of results across sample preservation methods

� Ability to inform patients about what might and might not result from genomic data interpretation

� Minimizing false negative results (meeting the need with the appropriate technology)

WGS for diagnosis

Defining actionable targets in cancer patients



Conclusion:

“…whole genome characterization will become a

routine part of cancer pathology.”

Clinical Course of a WGS Patient


In cancer, whole genome analysis will be done not once, but multiple times during the course of the disease fortumor subtyping, monitoring response to therapy and

diagnosing the reasons for recurrences or therapeutic failures.

Welch et al., JAMA April 20, 2011


Clinical case: “AML52”

Atypical APL Presentation

37 y.o. female with de novo AML; M3 morphology

Complex cytogenetics,persistent leukemia

Chemo + ATRA

Chemo only


First remission, referred to WU for SCT.rBM: normal morphology, cytogenetics; negative

for PML/RARA.

AllogeneicSCT

Consolidation + ATRA

Chemo only

???

“Genome-Guided Medicine”: An early example

Detection of PML-RARA by

WGS,Confirmed by

FISH, RT-PCR


37 y.o. female with de novo AML,

M3 morphology, CTG, no PML-

RARA.Referred to WUSM

for SCT.

Consolidation:Chemo + ATRA

Sustained remission

R.K.Wilson2011

A Paradigm for Clinical Sequencing

� Output per flow cell ~300 Gbp in a 10 day run time

� Combine the following data types per flow cell per clinical cancer case:

� Normal WGS (30X)


> Validation plus deep digital

sequencing

> Mutation expression + fusion

transcripts

� Tumor WGS (60X)

� Tumor exome (1 lane)

� Normal exome (1 lane)

� Tumor RNA-seq (1 lane)

� Integrated data analysis

� Pharmaco-oncogenomic prediction of targeted therapy

Example case

Acute lymphocytic leukemia


Example case

ALL-1: Case History

� Male patient, mid-30’s. � Initial presentation of ALL, induction therapy to remission,

BMT from sibling (blasts banked).� Patient relapsed in July 2011.� Induction therapy to apparent remission.

Using WGS of initial ALL blasts, two subclones identified,


� Using WGS of initial ALL blasts, two subclones identified, each with two distinct large deletions.

� Probes designed to detect these deletions used for flow-FISH of remission marrow. Minimal residual disease detected.

� Treatment options assessed, based on WGS and RNA-seq data analysis and DrugBank evaluation.

ALL-1: Somatic copy number alterations

2011 ASCP Annual MeetingAcute lymphocytic leukemia

ALL-1: Somatic single nucleotide variations� 91 somatic ‘tier 1’ SNVs

� 42 with evidence for expression in RNA-seq

� Variant allele frequency of the 10 most highly expressed are listed below

Gene Ref. Var. AA Type WGS Exome RNA-seq

UBXN4 T C D�D silent 51.25% 40.56% 53.40%

OGT G T C�F missense 39.22% 38.7% 35.79%


KIAA1033 C T T�I missense 38.89% 48.1% 50.75%

C15orf39 C G A�A silent 47.37% 37.5% 48.74%

SPTAN1 T C L�P missense 42.39% 49.03% 50.17%

DDX6 A G L�P missense 21.78% 14.29% 18.14%

CCDC47 C T A�T missense 22.34% 21.9% 23.36%

NF1 C T R�* nonsense 64.58% 64.04% 47.06%

TNRC6B G A S�N missense 13.25% 12.8% 19.46%

KIAA1462 G C P�A missense 70% 45.05% 41.14%


ALL-1: Tumor heterogeneity

Tumor variant allele frequency

Pro

port

ion


Read c

overa

ge (

X)

Tumor variant allele frequency

2,074 tier1-3 somatic variants. 91 are tier1 (coding exons)

NF1


RNA-seq reveals unusual FLT3 expression levels

RNA-seq Analysis

2011 ASCP Annual MeetingAcute lymphocytic leukemia

FLT3 expression is not just high, it is extreme…

Distribution of non-zero gene expression values

1000

1500

FLT3

Quantitation of FLT3 Expression

• Of 49,887 genes, 28,671 had an FPKM greater than zero.

• FLT3 ranks #229 by FPKM.• This places it inside the top

1% of all expressed genes.


Gene expression (log2(FPKM + 0.00001))

Frequency

-15 -10 -5 0 5 10 15

0500

1% of all expressed genes.

• Based on FLT3 overexpression in the tumor cells, predict the patient will respond to the FLT3 inhibitor Sunitinib

(Sutent)[DrugBank].


� Patient has been prescribed Sutent as of 2 weeks ago.

� Initial blood tests after 3 days on Sutent indicated blast production had improved by ~50% post-remission.

� Plan to re-evaluate remission/MRD status using flow-

ALL-1: Patient status


� Plan to re-evaluate remission/MRD status using flow-FISH probes and a bone marrow biopsy after 3 weeks of Sutent therapy.

� If remission is complete, a bone marrow donor has been identified and BMT will follow.

Am J Clin Pathol 2011; 135:668-672

The Future of Pathology?


� The interplay of surgery, oncology, pathology and cancer genomics is beginning to transform our understanding of the tumor genome.

� In addition to identifying targetable mutations, we can determine the mutational spectrum of the tumor cell subpopulations present, providing potentially

Conclusions


cell subpopulations present, providing potentially important information in therapeutic decisions.

� The maturity and scale of commercial NGS platforms and interpretational software are beginning to match the clinical need for WGS.

The Genome Institute

� Li Ding

� Dong Shen

� Chris Miller

� David Larson

� Malachi Griffith

Acknowledgements

Tim Ley, MDPeter Westervelt, MDJohn DiPersio, MDJohn Welch, MD, Ph.D.Tim Graubert, MDLukas Wartman, MD

Washington University


� Nathan Dees

� Christine Wylie

� Todd Wylie

� Jason Walker

� Vince Magrini

� Ryan Demeter

� Sean McGrath

Lukas Wartman, MDMatthew Ellis, MB, Ph.D.

George WeinstockRick Wilson

Whole Genome Sequencing by Next- Generation Methods...

Documents

Transcript of Whole Genome Sequencing by Next- Generation Methods...