Whole Genome Sequencing by Next- Generation Methods...
Transcript of Whole Genome Sequencing by Next- Generation Methods...
Whole Genome Sequencing by Next-Generation Methods: Genome-forward Medicine
Elaine R. Mardis, Ph.D.
2011 ASCP Annual Meeting
Elaine R. Mardis, Ph.D.
Professor of Genetics
Washington University School of Medicine
St. Louis MO
I have the following conflicts to declare:
� Speaker’s Bureau: Illumina, Inc.
� Scientific Advisory Board: Pacific Biosciences, Inc.
Conflict of Interest Slide
2011 ASCP Annual Meeting
� Stockholder: Life Technologies, Inc.
� Next-Generation Sequencers (NGS)� Roche/454� Illumina� Life Technologies
� Third Generation Sequencers� Pacific Biosciences� Ion Torrent
MiSeq
Overview
2011 ASCP Annual Meeting
� MiSeq� Oxford Nanopore
� NGS Practical Considerations� Coverage: depth and breadth� Error model� Representation bias
� NGS Applications
What is “Genome-Forward” Medicine?
� Informing the physician about genomic aspects of a patient’s diagnosis, using next-generation sequencing methods in place of conventional approaches.
� In certain applications of genome-forward
Defining “Genome-Forward” Medicine
2011 ASCP Annual Meeting
� In certain applications of genome-forward medicine, having information about the patient’s genome, or that of their microbial or viral infecting genome, may better inform the physician about treatment choices and in turn, improve patient response.
The Trajectory of Throughput: 10 years
2011 ASCP Annual Meeting
E.R. Mardis, Nature (2011) 470: 198-203
Comparative costs: sequencing a human genome
2011 ASCP Annual Meeting
Capillary technology
Applied Biosystems 3730xl
(2004)
$15,000,000
Next-gen technology
Illumina HiSeq (2011)
$10,000
Next-generation sequencers
The Fundamentals
2011 ASCP Annual Meeting
Next-generation DNA sequencing instruments
� All commercially-available sequencers have the following shared attributes:� Random fragmentation of starting DNA, ligation with custom linkers =
“a library”� Library amplification on a solid surface (either bead or glass)� Direct step-by-step detection of each nucleotide base incorporated
during the sequencing reaction
Next-generation DNA sequencing instruments
2011 ASCP Annual Meeting
during the sequencing reaction� Hundreds of thousands to hundreds of millions of reactions imaged
per instrument run = “massively parallel sequencing”� Shorter read lengths than capillary sequencers� A “digital” read type that enables direct quantitative comparisons� A sequencing mechanism that samples both ends of every fragment
sequenced (“paired end” reads)
Paired-end reads
� All next-gen platforms now offer methods to derive sequence data from each end of the library fragments.
� Differences exist in the _distance_ between read pairs, based on the approach/platform.� “paired ends” : linear fragment with ability to sample both ends in
What are paired-end reads?
2011 ASCP Annual Meeting
� “paired ends” : linear fragment with ability to sample both ends in separate reaction
� “mate pairs” : circularized fragment of >1kb, sequenced either by a single reaction read or two end read (platform dependent)
� In general, paired end reads offer advantages for human whole genome sequencing due to the repetitive nature of our DNA, and the difficulty in accurate placement (“mapping”) of NGS reads.
The Human Genome Reference• Mapping short reads to the genome reference sequence is a
required step for next-generation sequence data analysis, regardless of the preparatory steps used
• Once mapped to the genome, “localized” assembly of select reads can be used to define genomic alterations at single nucleotide resolution
• The human genome sequence serves as a “reference” to which we can compare other human genomes…
Read alignment to the Human Reference Genome
2011 ASCP Annual Meeting
we can compare other human genomes…
• SNPs (single nucleotide polymorphisms)
• Point mutations (maintain or change amino acids)
• Insertion/deletions (add or remove >1base in the sequence)
• Translocations of chromosome fragments (inter- or intra-)
• Inversions (an entire fragment swaps ends)
• Amplification (multiple repeated copies of a genome segment) or Deletion (large blocks removed)
Roche/454 Library Prep
Roche/454 Library Prep
Single Stranded Adapter Ligated Library•Random Fragmentation•Adapter Ligation
2011 ASCP Annual Meeting
•Adapter Ligation
• emPCR for clonalamplification
Roche/454 Pyrosequencing
Roche/454 Pyrosequencing
Centrifugation
Load Enzyme BeadsLoad beads into
PicoTiter™Plate
2011 ASCP Annual Meeting
DNA Capture BeadContaining Millions of
Copies of a Single Clonal Fragment
A A T C G G C A T G C T A A A A G T C A
Anneal PrimerT
PPi
ATP
Light + oxy luciferin
Sulfurylase
Luciferase
APS
luciferin
Sequencing by Synthesis
454 Instrumentation Specifics
Instrument
Run
Time
(hr)
Read
Length
(bp)
Yield
(Mb/run
)
Error
Type
Error
Rate
(%)
Purchase
Cost
(x1000)
454 FLX+ 18-20 700 900 Indel 1 $30 A
454 FLX Titanium 10 400 500 Indel 1 $500
454 GS Jr. Titanium
10 400 50 Indel 1 $108
A – Requires the 454 FLX Titanium. This is the upgrade cost.
2011 ASCP Annual Meeting
– Requires the 454 FLX Titanium. This is the upgrade cost.
Notable:
• Mate pair paired end reads of 3kb, 8kb and 20 kb separation without an increase in run time.
• Cost per run makes sequencing an entire human genome cost-prohibitive relative to other technologies (~ $20/Mbp)
• Great platform for targeted validation
Illumina Sequencing: Library Preparation
Illumina Sequencing: Library Preparation
2011 ASCP Annual Meeting
Automated ProcessingLow gDNA Inputs
Illumina Sequencing by Synthesis
Emission
Incorporate
Detect
De-block
2011 ASCP Annual Meeting
Excitation
De-block
Cleave
fluor
Illumina Instrumentation SpecificsKey updates
• 2010: HiSeq 2000
• Two flow cells per run
• 100 Gbp/FC or two genome equivalents per run
• New scanning mechanics - scans both surfaces of FC lanes
• 2011: HiSeq 2000
• Improved chemistry (v. 3): increased yield and accuracy
• 2011: MiSeq
2011 ASCP Annual Meeting
Instrument
Run
Time
(days)
Read
Length
(bp)
Yield
(Gb/run)
Error
Type
Error
Rate
(%)
Purchase
Cost
(x1000)
GAIIx 14 150 x 150 96 Sub >0.1 $525
HiSeq 2000 8 100 x 100 200 x 2 Sub >0.1 $700
HiSeq 2000 v3
10 100 x 100 <600 Sub >0.1 $700
MiSeq 1 150 x 150 1 Sub* >0.1* $125
Life Technologies: sequencing by ligation
• custom adapter library• emPCR on magnetic beads• sequencing by ligation usingfluorescent probes from acommon primer• sequential rounds of ligation
2011 ASCP Annual Meeting
• sequential rounds of ligationfrom a series of primers• fixed/known nucleotides foreach probeset identify twobases each cycle, or“two base encoding”
SOLiD Instrumentation Specifics
Instrument
Run
Time
(days)
Read
Length
(bp)
Yield
(Gb/run)
Error
Type
Error
Rate
(%)
Purchase
Cost
(x1000)
SOLiD 4 12 50 x 35 PE 71 A-T Bias >0.06 $475
SOLiD 5500 xl 875 x 35 PE
60 x 60 MP
155 A-T Bias >0.01 $595
5500 xl
•
2011 ASCP Annual Meeting
• Front-end automation addresses bottlenecks at emPCR, breaking, and enrichment of beads
• 6-lane Flow Chip with independent lanes/2 per run
• Cost per whole genome data set is predicted to be $6K by 2011
• Very high accuracy data due to two-base encoding
• ECC Module – An optional 6th primer that increases accuracy to 99.999%
• Direct conversion of color space to base space
• True paired-end chemistry enabled – Ligation reaction can be used in either direction
Third generation sequencers??
� Recently, new sequencing platforms were introduced.� The Pacific Biosciences sequencer is a single molecule
detection system that marries nanotechnology with molecular biology.
� The Ion Torrent uses pH rather than light to detect nucleotide incorporations.
Third Generation Sequencing Instruments
2011 ASCP Annual Meeting
nucleotide incorporations.� The MiSeq is a scaled down version of the HiSeq, with
faster chemistry and scanning.� All offer a faster run time, lower cost per run, reduced
amount of data generated relative to 2nd Gen platforms, and the potential to address genetic questions in the clinical setting.
Comparisons to Third-Generation Sequencers
CompanyPlatform
NameSequencing
Amplificatio
n
Run
Time
Roche 454 TiDNA Polymerase“Pyrosequencing”
emPCR 10 hours
IlluminaHi-
Seq/MiS DNA PolymeraseBridge
10 days/24
2011 ASCP Annual Meeting
Illumina Seq/MiSeq
DNA Polymeraseamplification
days/24 hours
LifeSOLiD/5
500DNA Ligase emPCR 12 days
Ion Torrent PGMSynthesis
H+ detectionemPCR 2 hours
PacificBiosciences
RS Synthesis NONE 45 min
Pacific Biosciences RS
Shearing
(Covaris/Hydroshear)
Polish ends
DNA polymerase binding
Load library/polymerase
complex onto SMRT cell
Sample Prep Library/Polymerase
Complex
Movie 1 (v1.1 & v1.2)
Raw
reads
Post-filter
reads reads
Mapped
reads
Movie 2 (only v1.2)
Raw
reads
Post-filter
reads reads
Mapped
reads
Sequencing
2011 ASCP Annual Meeting
SMRTbell™ ligation
Sequencing
primer annealing
The image part with relationship ID rId3 was not found in the file.
Zero Mode Waveguides (ZMWs)
Ver. 1.1 = 1 x 45,000 per SMRT cell
Ver. 1.2 = 2 x 75,000 per SMRT cell
SMRTbell Library Types
Circular Consensus
Small ~250bpSingle MovieMultiple passesMany sub-reads
Standard
Large ~2kbpSingle MovieFew sub-reads
VLR (“very long read”)
Larger ~6kbp
Single 45 minute Movie
Single long read provides
linking information
2011 ASCP Annual Meeting
Sub ReadsSub Reads
Reads
Consensus Read
Read
Pacific Biosciences RS Instrumentation Specifics
InstrumentRun Time
(Hours)
Read
Length
(bp)
Yield
(Mb)
Error
Type
Error
Rate
(%)
Purchase
Cost
(x1000)
RS14 (~8
SMRTCells)1500
45 per SMRTCell
Insertions 15 $695
mean mapped sub-read accuracy: 85.7% (±1.3%)
mean mapped sub-read length: 697 (± 501)
maximum mapped read length: 7,772 bp
2011 ASCP Annual Meeting
maximum mapped read length: 7,772 bp
maximum mapped sub-read length: 5,601 bp
1x45 min movie
8 SMRT cells
Strobe polymerase/strobe reagent/strobe protocol (45 min movie)
2x45 min movies
8 SMRT cells
Strobe polymerase/standard reagent/standard protocol
Ion Torrent PGM Instrument
2011 ASCP Annual Meeting
Data Output per Run Trajectory: Ion Torrent
2011 ASCP Annual Meeting
At present, only the 314 and 316 chips are commercially available
Ion Torrent Yield with Automation Modules
Ion Torrent Data Yield Improves with Automation
2011 ASCP Annual Meeting
Oxford Nanopore SequencingExonuclease-aided sequencing
2011 ASCP Annual Meeting
Pore translocation sequencing
NGS: Practical Considerations
Importance of coverage, error model, representation bias
2011 ASCP Annual Meeting
What is “coverage”?� Coverage is a general term to describe the
– fold oversampling of a DNA target by sequencing data
� In covering a target region or genome, increasing depth of coverage leads to increased certainty of
Importance of Coverage
2011 ASCP Annual Meeting
variant detection
� Coverage levels may vary due to G+C content, amplification or deletion in the region, or other biases, the uniqueness of the target region (“mappability”), and the error model of the data type
� Coverage breadth is equally important as depth!
Alignments: Coverage breadth
Breadth-of-coverage
topography
Altered greatly by the application of a minimum depth filter.
Breadth of coverage attrition through a range of minimum depth filters (1x, 5x, 10x, 15x,
Coverage breadth topography
Breadth
97.3%
67.3%
Depth
1X
5X
2011 ASCP Annual Meeting
depth filters (1x, 5x, 10x, 15x, 20x) for a given region of interest (chr7:930225-931129).
Note the breadth of coverage drops from almost 100% at 1x depth requirement to ~20% when we require >= 20x depth for coverage membership.
22.2%
43.5%
27.8%
20X
10X
15X
Bias Assessment: GC content impacts coverage
2011 ASCP Annual Meeting
� Whole Genome Sequencing (WGS)
� Hybrid Capture: Exome or Targeted sequencing
� Protein:DNA binding
� non-coding RNA sequencing: discovery/variants
Applying Next-generation Sequencing
2011 ASCP Annual Meeting
non-coding RNA sequencing: discovery/variants
� Transcriptome sequencing (RNA-seq)
� Genome-wide methylation of DNA (Methyl-seq)
� Clinical sequencing for therapeutic decisions
E.R. Mardis, Annual Reviews in Genetics & Genomics (2008)
E.R. Mardis, Nature (2011) 470: 198-203
Whole Genome Sequencing Process
• Prepare paired end libraries as whole genome fragment/shotgun by
random shearing of genomic DNA,
adapter ligation, size selection.
• Produce paired end data from each
end of billions of library fragments,
WGS: Data Production and Alignment
2011 ASCP Annual Meeting
end of billions of library fragments,
over-sampling about 30-fold to cover
at a depth sufficient to find all types
of genome alterations.
• Computer programs align the read
pair sequences onto the Human
Reference Genome and several
algorithms are used to discover
variants genome-wide.
Data Analysis Workflow for Variant Detection
30-fold coverage of tumor genomePE Illumina reads
Align read pairs to reference genome
Detect Single-Nucleotide Variantsand focused insertion/deletions
2011 ASCP Annual Meeting
Detect anomalous read pair mapping, assemble reads and identify structuralvariations (inversions, translocations)
Use normalized read coverage levels andHMM-based algorithm to identify CNA andLOH regions
Design custom capture
probes for each putative
variant site Compare to
SNP array
analysis
Somatic Mutations in 50 AML Genomes
2011 ASCP Annual Meeting
2011 ASCP Annual Meeting
DNMT3A mutations in AML
Ley et al. NEJM 2010.
Indentifying Prognostic Mutations
2011 ASCP Annual Meeting
�DNMT3A mutations are present in 22% of de novo AMLs, and 34% of cytogenetically normal patients.
�DNMT3A mutations are strongly associated with poor prognosis.
Hybrid Capture of Genomic Regions or Genes
• Hybrid capture - fragments from a whole genome library are selected by combining with probes that correspond to targeted genes.
• The probe DNAs are biotinylated, making selection from solution with streptavidin magnetic beads an effective means of purification.
• The human “exome” by definition,
2011 ASCP Annual Meeting
• The human “exome” by definition, is the exons of all ~21,000 genes annotated in the human reference genome. Commercial exome kits target most but not all genes.
• Custom capture reagents can be synthesized to target specific loci that may be of interest in a clinical context.
Comparing Exome to Whole Genome Cancer Sequencing
• Exome sequencing costs ~1/10 WGS
• Simplified analysis• Sequence more samples
• WGS captures SVs and focal amp/dels not resolved by
vs.
2011 ASCP Annual Meeting
amp/dels not resolved by SNP arrays
• Non-exonic mutations (“tier 2”) are likely to play a role in cancer
• Exome reagents do not capture all exons
• Tier 2/3 mutations facilitate heterogeneity analysis
Deep Digital Sequencing by Custom Hybrid Capture
� Custom capture probes can be designed to sample from somatic point mutations, in/dels and structural variant regions in the tumor genome.
� High depth next-generation sequencing of these captured regions provides a digital read-out of the allele frequency of each mutation in the tumor cell population.
Tumor Heterogeneity from Deep Read Count Analysis
2011 ASCP Annual Meeting
mutation in the tumor cell population.
� The prevalence of a mutation reflects its ‘history’ in the tumor evolution. Older mutations are present at higher prevalence or allele frequency, newer ones at lower allele frequency.
� Using these data and statistical approaches, we can model the tumor heterogeneity for any sequenced sample, determining the mutational profile of each subclone and it’s proportion of the tumor cell population.
Mutation
Total reads = 1000Mutant reads = 500
All cells carry a heterozygous mutation
Total reads = 1000Mutant reads = 250
Wild-type & het mutant cells
“Deep Digital Sequencing”
2011 ASCP Annual Meeting
Mutant reads = 500Mutant Allele Frequency = = 50%500
1000
Mutant reads = 250Mutant Allele Frequency = = 25%
2501000
Tumor heterogeneity: single dominant clone
2011 ASCP Annual Meeting
Single cluster of somatic mutations
Tumor heterogeneity: multiple clones
Cluster 3: 88% tumor allele frequency
Cluster 2: 57% tumor allele frequency
2011 ASCP Annual Meeting
22% Tumor Content of Tumor Sample0.3% Tumor Content of Normal Sample
Cluster 1: 35% tumor allele frequency
Tumor allele frequencies are adjusted for normal cell contaminationKernel density estimation determines the number of discrete clones
Comparing heterogeneity of primary and metastatic tumors
Primary = absentMetastasis= all cells
Primary tumor= heterozygous mutation present in all cells
Metastasis= heterozygous mutation present in all cells
Mutant Allele Frequency in Metastasis (%)
2011 ASCP Annual Meeting
Mutant Allele Frequency in Primary Tumor (%)
Mutant Allele Frequency in Metastasis (%)
Tumor subpopulations: Single dominant clone
De Novo vs. Relapse: Monoclonal Disease
Clinical DataSex: MaleAge: 53Cyto: NormalTime to relapse: 235OS months: 30Induction: 7+3+3EConsolidation: HDACSCT: No
2011 ASCP Annual Meeting
Tumor subpopulations: Multiple clones
De novo vs. Relapse: Oligoclonal Disease
Clinical DataSex: MaleAge: 68Cyto:NormalTime to relapse: 805OS months: 46.8Induction: 7+3DConsolidation: HDACSCT: No
2011 ASCP Annual Meeting
SCT: No
Clonal Progression in AML1 Relapse
Relapse is based on clonal evolution
2011 ASCP Annual Meeting
• Additional data from 7 AML tumor:relapse whole genome sequencing reveals that this scenario is common: a single dominant mutation cluster in a founder subclone persists through the chemotherapy and re-emerges with new somatic alterations.• Each tumor showed clear cut evidence of clonal evolution at relapse, and a higher frequency of transversions that were probably induced by DNA damage from chemotherapy.
Clonal Evolution: MDS to sAML
2011 ASCP Annual Meeting
By WGS of the sAML tumor, we identified somatic mutations All validated mutations were evaluated by deep digital sequencing in both banked MDS and sAML to calculate allele frequency and heterogeneity
RNA Sequencing
RNA isolateSize selection for
nc RNA classes
Fragment,
RT w/randoms
polyA priming,
RT, ds DNA SAGE tags
Adapter-ligated fragments
for next-gen sequencing
2011 ASCP Annual Meeting
for next-gen sequencing
Alignment to reference database
& discovery-Expression levels
-Novel splice isoforms
-Allelic bias in transcription
-Fusion transcripts
EN1 – a gene that is not expressed
RNA-seq informs therapeutic choices
Normal WGS
2011 ASCP Annual Meeting
EN1 is not expressed (0 RNA-seq reads out of ~146 million mapped)
Tumor WGS
Tumor RNA-seq
Metastatic breast cancer (to brain)
IGV screenshot
RNA Correlation of Somatic Copy Number Alterations
RNA-seq provides correlative data
HER2
2011 ASCP Annual MeetingMetastatic breast cancer (to brain)
HER2 / ERBB2 is heavily amplified in this tumor
RNA-seq confirms the HER2, PR, & ER statusRNA-seq correlates with IHC Diagnosis
HER2 +ive • Gene expression values from RNA-seq
• Used to confirm HER2, PR, & ER status of each patient obtained by IHC
• Tumor is• HER2+, PR-, ER-
2011 ASCP Annual MeetingMetastatic breast cancer (to brain) vs. four primary HER2 +ive breast cancers
PR- ER-
Samples Data Coordination,
Analysis and
Concordance to
SubjectsGoals:
-Defining Core Microbiomes in
“healthy” subjects
-Changes in Microbiomes with
disease
Human Microbiome Project
Community
Phenotypes
2011 ASCP Annual Meeting
Microbial
Communities
Reference
Sequences Metagenomics
16S rRNAWGS
Clinical DataSources of
strains
Phenotypes
Stability
of v
iromeover tim
e
HM
P: H
um
an V
irom
eS
tability
GI
Ora
l
2011 A
SC
P A
nnual M
eeting
Alphapapillomavirus
Polyomavirus
Roseolovirus
Lymphocryptovirus
Mastadenovirus
Alphapapillomavirus
Gammapapillomavirus
Human papillomavirus
Polyomavirus
Roseolovirus
Lymphocryptovirus
Mastadenovirus
Gammapapillomavirus
Human papillomavirus
Kris
tine W
ylie, W
UG
I Vagin
al
Nasal
400 MRSA Genomes
NGS of MRSA IsolatesLegend:
CD
MRSA.6326.326W87
MRSA.6330.330N40
MRSA.6347.347W106
MRSA.6365.G96
MRSA.6433.W211
MRSA.6433.N82
MRSA.6433.G99
MRSA.R103G
MRSA.6421.421N69
MRSA.6254.254W53
MRSA.6382.382W140
MRSA.6415.2790W210
MRSA.6415.4563W208
MRSA.6222.222N8
MRSA.6266.266.1G3
9
MRSA.6432.A4
6
MRSA.639
8.N80
MRSA.6
436.W21
2
MRSA.R1
03 714N
MRSA.R402 6488W
MRSA.R402N
MRSA.6370.370W132
MRSA.6342.342W101
MRSA.6341.341A26
MRSA.6331.331W92
MRSA.6412.4124774W170
MRSA.6224.224W20
MRSA.6329.329N39
MRSA.6329.329W90
MRSA.6245.245W45
MRSA.6245.245.1N21
MRSA.6245.245A12
MRSA.6245.245.1A13
MRSA.6245.245.1G25
MRSA.6216.216G6
MRSA.6216.216W14
MRSA.6246.246G26
MRSA.6246.246W46
MRSA.6326.326.1N37
MRSA.6231.231W27
MRSA.6422.4227588W179
MRSA.6422.422N70
MRSA.6331.331.1G55
MRSA.6332.332W95
MRSA.6209.209W5
MRSA.6386.N78
MRSA.6386.386A39
MRSA.6331.331.1G54
MRSA.6331.3314371W93
MRSA.6500.3300G82
MRSA.6501.3301A40
MRSA.6500.3300W155
MRSA.6348.348W107
MRSA.6374.374W134
MRSA.6335.335G56
MRSA.6344.344W102
MRSA.6345.345N44
MRSA.6345.345W103 M
RSA.6307.307A16
MRSA.6307.307N28
MRSA.6307.307G42
MRSA.6307.307W72
MRSA.6261.261G35
MRSA.6341.341.1N43
MRSA.6215.2157462W12
MRSA.6407.4076690W166
MRSA.6407.4074440W167
MRSA.6407.407N65
MRSA.6255.255G31
MRSA.6255.255W54
MRSA.6415.7010W209
MRSA.6415.N81
MRSA.6430.430G89
MRSA.6430.430W186
MRSA.6430.430N72
MRSA.6361.361W121
MRSA.6395.395W150
MRSA.6346.3463077W105
MRSA.6437.W213
MRSA.6437.N83
MRSA.6359.359W119
MRSA.6366.3667367W128
MRSA.6366.366W127
MRSA.6366.366G72
MRSA.6366.366N51
MRSA.6384.384A37
MRSA.6387.387G78
MRSA.6387.387N59
MRSA.6386.386W144
MRSA.6402.402N63
MRSA.6386.386G77
MRSA.6389.W206
MRSA.6389.A45
MRSA.6389.N79
MRSA.6401.4016575W160
MRSA.6349.349G62
MRSA.6258.258N25
MRSA.6258.258G33
MRSA.6258.258W56
MRSA.6266.2661353W66
MRSA.6266.2667011W67
MRSA.6266.266W65
MRSA.6266.2665921W68
MRSA.6266.266A15
MRSA.6266.266G38
MRSA.6427.427W184
MRSA.6427.427N71
MRSA.6243.W196
MRSA.6242.242W42
MRSA.6252.2527094W52
MRSA.6252.252G29
MRSA.6252.252W51
MRSA.6502.3302W157
MRSA.6501.3301W156
MRSA.6501.3301G83
MRSA.6501.3301N61
MRSA.6260.260G34
MRSA.6260.260W58
MRSA.6340.340A25
MRSA.6340.340W100
MRSA.6340.340G57
MRSA.6341.341N42
MRSA.6394.394G79
MRSA.6391.391W148
MRSA.6421.4217599W177
MRSA.6303.G93
MRSA.6303.W199
MRSA.6267.267.1N27
MRSA.6267.267W69
MRSA.6403.4036488W163
MRSA.6399.399W154
MRSA.6204.204W4
MRSA.6305.305W71
MRSA
.633
5.33
529
85W
190
MRSA.6335.335W96
MRSA.6250.250W49
MRSA.6339.339W99
MRSA.6328.328W89
MRSA.6384.384W142
MRSA.6406.406A41
MRSA.6351.351W113
MRSA.6319.319W81
MRSA.6377.3
77W136
MRSA.6376.376G74
MRSA.6398.398W153
MRSA.6
311.31
1.1A20
MRSA.6
311.31
1.1N32
MRSA.6
311.31
1N31
MRSA.6
311.31
1A19
MRSA.6
311.31
1W75
MRSA.
6438.
W214
MRSA.6
241.24
1N18
MRSA.
6241.2
41W41
MRSA.6416.4163680W172
MRSA.6416.416W171
MRSA.6416.416.1G87
MRSA.6346.346W104
MRSA.6336.336W97
MRSA.6337.337W98
MRSA.6265.265G37
MRSA.6265.265W64
MRSA.6350.350W109
MRSA.6349.349.1G63
MRSA.6349.349W108
MRSA.6321.W200
MRSA.6214.214N4
MRSA.6214.214W10
MRSA.6380.380W138
MRSA.6268.268.1G41
MRSA.6268.268G40
MRSA.6268.268W70
MRSA.6230.230N12
MRSA.6230.230.1G15
MRSA.6230.230.1N13
MRSA.6376.376.1A35
MRSA.6345.345G59
MRSA.6376.376.1N55
MRSA.6376.376A34
MRSA.6376.376W135
MRSA.6376.376.1G75
MRSA.6376.376N54
MRSA.6418.418W174
MRSA.6254.254G30
MRSA.6248.248W47
MRSA.6387.387W145
MRSA.6211.211W7
MRSA.6504.3304W158
MRSA.6506.3306W159
MRSA.6251.251N22
MRSA.6251.251W50
MRSA.6370.370N52
MRSA.6324.324.1N36
MRSA.6325.325W86
MRSA.6228.228.1G14
MRSA.6367.3
67W129
MRSA.6237.237
.1N15
MRSA.6237.237W
36
MRSA.6218.218.1N7
MRSA.6423.4232663W181
MRSA.6423.4237501W180
MRSA.6
439.W2
15
MRSA.6
363.G95
MRSA.63
17.31767
44W79
MRSA.631
7.317W78
MRSA.643
5.435W18
8
MRSA.633
6.336A24
MRSA.6371.
371N53
MRSA.6371.3
71G73
MRSA.6371.3
71W133
2011 ASCP Annual Meeting
MRSA.6308.308W73
MRSA.6385.385N58
MRSA.6385.385G76
MRSA.6385.385W143
MRSA.6385.385A38
MRSA.6385.3855370W192
MRSA.6365.365N50
MRSA.6365.365W126
MRSA.6327.327G50
MRSA.6327.327W88
MRSA.6369.369W131
MRSA.6320.320A21
MRSA.6320.320W82
MRSA.6320.320N33
MRSA.6320.320.1N34
MRSA.6320.320G46
MRSA.6396.3961643W152
MRSA.6396.396G80
MRSA.6396.396W151
MRSA.6355.355A32
MRSA.6355.355W117
MRSA.6355.355G67
MRSA.6355.355N47
MRSA.6309.309W74
MRSA.6309.309.1A18
MRSA.6309.309.1N30
MRSA.6263.2636238W63
MRSA.6263.263W62
MRSA.6328.328A23
MRSA.6327.327.1G51
MRSA.6328.328G52
MRSA.6324.324G48
MRSA.6323.323W83
MRSA.6219.W195
MRSA.6219.G90
MRSA.6219.N74
MRSA.6219.G91
MRSA.6226.226W21
MRSA.6226.226A6
MRSA.6226.226N9
MRSA.6261.261A14
MRSA.6261.261W59
MRSA.6377.377A36M
RSA.6377.377N56 M
RSA.6229.229W24
MRSA.6223.223G12
MRSA.6223.223W19
MRSA.6363.W203
MRSA.6232.2326105W29
MRSA.6232.232G16
MRSA.6232.232W28
MRSA.6388.388.1N60
MRSA.6390.3905178W147
MRSA.6390.390W146
MRSA.6236.2366121W33
MRSA.6236.236.1G20
MRSA.6236.2366527W35
MRSA.6236.2366195W34
MRSA.6236.236G19
MRSA.6236.236W32
MRSA.6215.2151332W13
MRSA.6215.215A2
MRSA.6215.215G5
MRSA.6215.215W11
MRSA.6221.221G11
MRSA.6221.221W18
MRSA.6388.W205
MRSA.6218.218G9
MRSA.6218.218.1G10
MRSA.6218.218W16
MRSA.6419.419W175
MRSA.6218.218.1A4
MRSA.6218.218N6
MRSA.6360.360N49
MRSA.6360.360W120
MRSA.6210.210G3
MRSA.6210.210W6
MRSA.6227.227N10
MRSA.6227.227.1N11
MRSA.6227.227G13
MRSA.6227.227W22
MRSA.6401.401N62
MRSA.6356.356N48
MRSA.6356.356G68
MRSA.6356.356A33
MRSA.6356.356W118
MRSA.6397.W207
MRSA.6233.233A7
MRSA.6233.233N14
MRSA.6234.234W31
MRSA.6411.4113370W168
MRSA.6412.412W169
MRSA.6412.412N68
MRSA.6431.G98
MRSA.6334.N76
MRSA.6334.W201
MRSA.6203.203G2
MRSA.6203.203W3
MRSA.6203.203A1
MRSA.6308.308A17
MRSA.6362.362G69
MRSA.6362.362W122
MRS
A.64
31.4
31A4
2
MRSA.6431.4313236W187
MRSA.6431.431N73
MRSA.6352.352G64
MRSA.6352.352W114
MRSA.6401.4016336W161
MRSA.6353.353G65MR
SA.6247.G92
MRSA.6354.G94
MRSA.6354.W202
MRSA.6247.N75
MRSA.6247.A43
MRSA.6247.W197
MRSA.6247.2947W198
MRSA.6331.3314694W94
MRSA.6368.368W130
MRSA.6217.2
17.1A3MRSA.6
217.2173205W1
89MRSA.6
344.344A27
MRSA.6344.344G
58MRSA.6396
.396.1G81
MRSA.6217.217.1N5
MRSA.6217.217.1G8
MRSA.6217.217G7
MRSA.6217.217W15
MRSA.6383.383W141
MRSA.6342.342342807W191
MRSA.6230.2302975W26
MRSA.6230.230W25
MRSA.6230.230N12
MRSA.6257.257W55
MRSA.6257.257G32
MRSA.6257.257N24MR
SA.6262.2621199W61
MRSA.6262.262G36
MRSA.6262.262N26
MRSA.6262.262W60
MRSA.6379.379N57
MRSA.6379.379W137
MRSA.6421.4217968W178
MRSA.6429.429W185
MRSA.6249.249.1G28
MRSA.6249.249G27
MRSA.6249.249W
48
MRSA.
6314.
314G44
MRSA.
6314.3
143541
W77
MRSA.6
314.3
14.1G4
5
MRSA.6
314.31
4W76
MRSA.63
75.N77MRSA
.6375.G9
7MRSA.637
5.W204MRSA
.6238.23
8A8MRSA.623
8.238W37
MRSA.6238
.238N16MRSA.
6238.2938W
38
MRSA.6
350.35
00593W
112MRS
A.6350
.350.1
A28
MRSA.6
350.35
01396W
110
MRSA.6
426.42
6W183
MRSA.6
345.34
5.1G60
MRSA.6402.4026402W162
MRSA.6424.4247207W182
MRSA.6266.A44
MRSA.6381.381W139
MRSA.6420.420W176
MRSA.6244.244.1A11MRSA.6244.2444815W44
MRSA.6244.244W43MRSA.6241.241G22MRSA.6244.244G24
MRSA.6244.244A10MRSA.6244.244N20
MRSA.6222.222A5MRSA.6353.353.1N46MRSA.6353.353.1G66MRSA.6353.353N45MRSA.6353.3535901W116
MRSA.6353.353.1A31MRSA.6330.330.1N41MRSA.6353.353A30MRSA.6353.353W115
MRSA.6417.417G88MRSA.6213.213N3
MRSA.6213.213G4
MRSA.6213.213W9
MRSA.6242.242N19
MRSA.6239.239G21
MRSA.6239.239W39
MRSA.6239.239A9
MRSA.6239.239N17
MRSA.6328.328N38
MRSA.6234.234G18
MRSA.6233.233W30
MRSA.6240.240W40
MRSA.6364.364G70
MRSA.6364.364083W124
MRSA.6364.3647947W125
MRSA.6364.364.1G71
MRSA.6364.364W123
MRSA.6201.2011355W2
MRSA.R201G
MRSA.R201W
MRSA.6201.201N1
MRSA.6201.201.1
N2
MRSA.8869
.201G1
MRSA.63
51.351A29
MRSA
.6330.
330.1G5
3
MRSA.633
0.330
W91
MRSA.6410.410N66
MRSA.6410.4102889W194
MRSA.6410.4106835W193
MRSA.6410.410G85
MRSA.6410.410.1G86
MRSA.6410.410.1N67
MRSA.6212.212W8
MRSA.6220.220W17
MRSA.6394.394W149
MRSA.6348.348G61
MRSA.6323.323G47
MRSA.6324.324W84
MRSA.6324.324.1G49
MRSA.6324.324.1A22
MRSA.6324.3247192W85
MRSA.6228.228.1G14
MRSA.6228.228W23
MRSA.6259.259W57
MRSA.R406 2601W
MRSA.R406N
MRSA.6406.406G84
MRSA.6406.406N64
MRSA.6406.4062601W165
MRSA.6406.4063760W164
NGS Application to Clinical Samples
Examples from our work
2011 ASCP Annual Meeting
Exemestane
Anastrozole
Letrozole
SURGERY
Continued therapywith an AI wherepossible:Radiotherapy,Chemotherapydiscretionary
Accrual @ 377
Core Biopsy
PostmenopausalER+, Allred 6-8,Clinical Stage2 and 3
Genomics of Aromatase Inhibitor Response in Luminal Breast Cancer
2011 ASCP Annual Meeting
Accrual @ 377
Tumor (Ki67)
46 “luminal” subtype breast cancer cases: 25 AI-responsive and 21 AI-unresponsive tumors (per resection Ki67 index).
46 genome pairs completed from pre-treatment core biopsies with >70% neoplastic cellularity. Manuscript in revision.
We now have sequenced the resected post-treatment tumor genomes for most of these patients.
A comprehensive mutational spectrum of ER+ Breast Cancer
2011 ASCP Annual Meeting
• These mutations were tested in a panel of 121 additional luminal cases from the two clinical trials to ascertain significantly mutated genes.• 70/167 tumors carried PIK3CA mutations (41.9%).• TP53 was mutant in 15.7% (26 out of 167 tumors)
PIK3CA Mutations in Luminal Breast Cancer
2011 ASCP Annual Meeting
• Two novel in-frame deletions were discovered in PIK3CA.• 20 of 26 point mutations in PIK3CA reside in the cluster with the highest allele frequency, indicating these are likely initiating mutations in the founder clones. •PIK3CA was the most frequently mutated gene at 43% of patients.
MAP3K1 Mutations in Luminal Breast Cancer
2011 ASCP Annual Meeting
• MAP3K1 is a serine/threonine kinase that activates the ERK and JNK pathways• We identified 6 mutations in the 46 tumors (2 nonsense, 3 frameshift, 1 missense)• Four patients carried two MAP3K1 mutations, indicating bi-allelic loss
Deletion of MAP3K1
Normal
2011 ASCP Annual Meeting
Tumor
Chromothripsis in Luminal Breast Cancer
2011 ASCP Annual Meeting
• Twelve cases lacked any translocations.• Seven cases had multiple complex translocations with breakpoints suggestive of a single cellular catastrophe (“chromothripsis”).
Potentially Druggable Somatic Mutations
2011 ASCP Annual Meeting
When combining common and rare mutations across 46 samples, 96% of patients have a potentially
druggable mutation identified by WGS
� Full and interpretive analysis
� Time to results in a clinical timeframe
� Demonstration of clinical efficacy (“Genome-forward medicine”)
� Adequacy of NGS + data analysis in the CLIA environment, compared to the current standard in pathology
The Challenges Ahead for Diagnostic NGS
2011 ASCP Annual Meeting
compared to the current standard in pathology
� Uniformity of results across sample preservation methods
� Ability to inform patients about what might and might not result from genomic data interpretation
� Minimizing false negative results (meeting the need with the appropriate technology)
WGS for diagnosis
Defining actionable targets in cancer patients
2011 ASCP Annual Meeting
2011 ASCP Annual Meeting
Conclusion:
“…whole genome characterization will become a
routine part of cancer pathology.”
Clinical Course of a WGS Patient
2011 ASCP Annual Meeting
In cancer, whole genome analysis will be done not once, but multiple times during the course of the disease fortumor subtyping, monitoring response to therapy and
diagnosing the reasons for recurrences or therapeutic failures.
Welch et al., JAMA April 20, 2011
2011 ASCP Annual Meeting
Clinical case: “AML52”
Atypical APL Presentation
37 y.o. female with de novo AML; M3 morphology
Complex cytogenetics,persistent leukemia
Chemo + ATRA
Chemo only
2011 ASCP Annual Meeting
First remission, referred to WU for SCT.rBM: normal morphology, cytogenetics; negative
for PML/RARA.
AllogeneicSCT
Consolidation + ATRA
Chemo only
???
“Genome-Guided Medicine”: An early example
Detection of PML-RARA by
WGS,Confirmed by
FISH, RT-PCR
2011 ASCP Annual Meeting
37 y.o. female with de novo AML,
M3 morphology, CTG, no PML-
RARA.Referred to WUSM
for SCT.
Consolidation:Chemo + ATRA
Sustained remission
R.K.Wilson2011
A Paradigm for Clinical Sequencing
� Output per flow cell ~300 Gbp in a 10 day run time
� Combine the following data types per flow cell per clinical cancer case:
� Normal WGS (30X)
2011 ASCP Annual Meeting
> Validation plus deep digital
sequencing
> Mutation expression + fusion
transcripts
� Tumor WGS (60X)
� Tumor exome (1 lane)
� Normal exome (1 lane)
� Tumor RNA-seq (1 lane)
� Integrated data analysis
� Pharmaco-oncogenomic prediction of targeted therapy
Example case
Acute lymphocytic leukemia
2011 ASCP Annual Meeting
Example case
ALL-1: Case History
� Male patient, mid-30’s. � Initial presentation of ALL, induction therapy to remission,
BMT from sibling (blasts banked).� Patient relapsed in July 2011.� Induction therapy to apparent remission.
Using WGS of initial ALL blasts, two subclones identified,
2011 ASCP Annual Meeting
� Using WGS of initial ALL blasts, two subclones identified, each with two distinct large deletions.
� Probes designed to detect these deletions used for flow-FISH of remission marrow. Minimal residual disease detected.
� Treatment options assessed, based on WGS and RNA-seq data analysis and DrugBank evaluation.
ALL-1: Somatic copy number alterations
2011 ASCP Annual MeetingAcute lymphocytic leukemia
ALL-1: Somatic single nucleotide variations� 91 somatic ‘tier 1’ SNVs
� 42 with evidence for expression in RNA-seq
� Variant allele frequency of the 10 most highly expressed are listed below
Gene Ref. Var. AA Type WGS Exome RNA-seq
UBXN4 T C D�D silent 51.25% 40.56% 53.40%
OGT G T C�F missense 39.22% 38.7% 35.79%
2011 ASCP Annual Meeting
KIAA1033 C T T�I missense 38.89% 48.1% 50.75%
C15orf39 C G A�A silent 47.37% 37.5% 48.74%
SPTAN1 T C L�P missense 42.39% 49.03% 50.17%
DDX6 A G L�P missense 21.78% 14.29% 18.14%
CCDC47 C T A�T missense 22.34% 21.9% 23.36%
NF1 C T R�* nonsense 64.58% 64.04% 47.06%
TNRC6B G A S�N missense 13.25% 12.8% 19.46%
KIAA1462 G C P�A missense 70% 45.05% 41.14%
Acute lymphocytic leukemia
ALL-1: Tumor heterogeneity
Tumor variant allele frequency
Pro
port
ion
2011 ASCP Annual Meeting
Read c
overa
ge (
X)
Tumor variant allele frequency
2,074 tier1-3 somatic variants. 91 are tier1 (coding exons)
NF1
Acute lymphocytic leukemia
RNA-seq reveals unusual FLT3 expression levels
RNA-seq Analysis
2011 ASCP Annual MeetingAcute lymphocytic leukemia
FLT3 expression is not just high, it is extreme…
Distribution of non-zero gene expression values
1000
1500
FLT3
Quantitation of FLT3 Expression
• Of 49,887 genes, 28,671 had an FPKM greater than zero.
• FLT3 ranks #229 by FPKM.• This places it inside the top
1% of all expressed genes.
2011 ASCP Annual Meeting
Gene expression (log2(FPKM + 0.00001))
Frequency
-15 -10 -5 0 5 10 15
0500
1% of all expressed genes.
• Based on FLT3 overexpression in the tumor cells, predict the patient will respond to the FLT3 inhibitor Sunitinib
(Sutent)[DrugBank].
Acute lymphocytic leukemia
� Patient has been prescribed Sutent as of 2 weeks ago.
� Initial blood tests after 3 days on Sutent indicated blast production had improved by ~50% post-remission.
� Plan to re-evaluate remission/MRD status using flow-
ALL-1: Patient status
2011 ASCP Annual Meeting
� Plan to re-evaluate remission/MRD status using flow-FISH probes and a bone marrow biopsy after 3 weeks of Sutent therapy.
� If remission is complete, a bone marrow donor has been identified and BMT will follow.
Am J Clin Pathol 2011; 135:668-672
The Future of Pathology?
2011 ASCP Annual Meeting
� The interplay of surgery, oncology, pathology and cancer genomics is beginning to transform our understanding of the tumor genome.
� In addition to identifying targetable mutations, we can determine the mutational spectrum of the tumor cell subpopulations present, providing potentially
Conclusions
2011 ASCP Annual Meeting
cell subpopulations present, providing potentially important information in therapeutic decisions.
� The maturity and scale of commercial NGS platforms and interpretational software are beginning to match the clinical need for WGS.
The Genome Institute
� Li Ding
� Dong Shen
� Chris Miller
� David Larson
� Malachi Griffith
Acknowledgements
Tim Ley, MDPeter Westervelt, MDJohn DiPersio, MDJohn Welch, MD, Ph.D.Tim Graubert, MDLukas Wartman, MD
Washington University
2011 ASCP Annual Meeting
� Nathan Dees
� Christine Wylie
� Todd Wylie
� Jason Walker
� Vince Magrini
� Ryan Demeter
� Sean McGrath
Lukas Wartman, MDMatthew Ellis, MB, Ph.D.
George WeinstockRick Wilson