Next Generation Sequencing Update
Karl V. Voelkerding, MD
Professor of Pathology, University of Utah
Medical Director for Genomics and Bioinformatics, ARUP Laboratories
AACC-AMP 2012 Molecular Pathology Course
Disclosures
• Grant/Research Support: NIH
• Salary/Consultant Fees: None
• Committees: College of American Pathologists
• Stocks/Bonds: None
• Honorarium/Expenses: None
• Intellectual Property/Royalty Income: None
Learning Objectives
• Explain Principles of NGS
• Describe Current and Future NGS Platform Options
• Discuss Spectrum of NGS Clinical Applications
Sanger Sequencing
Electrophoretic Separation of Chain Termination Products
Next Generation Sequencing
Paradigm Shift
Sequence Clonally Amplified DNA Templates in a Flow Cell
Massively Parallel Configuration
Process
Genomic DNA or Enriched Genes
Fragmentation
End Repair and Adapter Ligation
“Fragment Library” (150 – 500 bp)
Adapter – Fragment A – Adapter
Adapter – Fragment B – Adapter
Adapter – Fragment C – Adapter
Clonal Amplification of Each Fragment
Sequencing of Clonal Amplicons in a Flow Cell
[Figure: fragments A, B, and C of the “Fragment Library” clonally amplified by Emulsion Bead PCR or as Surface Clusters]
Process
Sequencing of Clonal Amplicons in a Flow Cell
Generation of Luminescent or Fluorescent Images
Conversion to Sequence
Pyrosequencing (454)
Reversible Dye Terminators (Illumina)
Sequencing by Ligation (SOLiD)
Process
454/Roche: Pyrosequencing; 200 – 400 base reads; Bead Emulsion PCR
Solexa/Illumina: Reversible dye terminators; 36 – 75 base reads; Surface Bridge PCR
Luminescence (Roche)
Fluorescence (Illumina,SOLiD)
pH Detection (Ion Torrent)
Signal to Noise Processing
Cyclic Base Calls C G A T G C - - -
Base Quality Scores C30 G28 A33 T30 G28 C30 - - -
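The base quality scores above (C30, G28, A33, ...) are conventionally Phred-scaled, where Q = -10 log10(probability the base call is wrong). The slide does not name the scale, so the sketch below assumes that convention; the function names are illustrative only.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert an error probability back to a Phred-scaled quality score."""
    return -10 * math.log10(p)

# Base calls and scores as listed on the slide (assumed to be Phred-scaled).
calls = [("C", 30), ("G", 28), ("A", 33), ("T", 30), ("G", 28), ("C", 30)]
for base, q in calls:
    print(f"{base} Q{q}: error probability ~ {phred_to_error_prob(q):.5f}")
```

Under this assumption a score of 30 corresponds to roughly one expected error per 1,000 calls, and each 10-point increase lowers the error probability tenfold.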
Next Generation Sequencing
• Sequence up to billions of fragments simultaneously
• Iterative/cyclic sequencing
Next Generation Sequencing Data
Primary Sequence Alignment: BWA
Refined Sequence Alignment: GATK/Picard
Variant Calling: SAMtools/GATK
Variant Annotation: ANNOVAR
FASTQ File Format
@HW-ST573_75:1:1:1353:4122/11
CAATCGAATGGAATTATCGAATGCAATCGAATAGAATCATCGAATGGACTCGAATGGAATCATCGAA
+
ggfggggggggggggfgggggggfgeggggfdfeefeggggggggegbgegegggdeYedgggggeg
@HW-ST573_75:1:1:1347:4151/11
ATCTGTTCTTGTCTTTAACTCTCAAGGCACCACCTTCCATGGTCAATAATGAACAACGCCAGCATGC
+
effffggggggggggggfggggggggggggdggggfgggfgdggaffffgfggffgdggggggdfg
@HW-ST573_75:1:1:1485:4153/11
GAGGAGAGATATTTTGACTTCCTCTCTTCATATTTGGATGCTTTTTACTTATCTCTCTTGACTAATT
+
dZdddbXc`_ccccbeeedbeaedeeeee^aeeedcaZca_`^c[eeeeed]eeecd[dd^eeba[d
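Each FASTQ record is four lines: an `@` header, the bases, a `+` separator, and one quality character per base. The sketch below parses such records and decodes the quality string. It assumes an ASCII offset of 64 (the older Illumina encoding), which is inferred from the characters in the example records rather than stated on the slide. The record is truncated to its first 10 bases for brevity, and `parse_fastq` and `decode_quality` are illustrative names.

```python
from typing import Iterator, List, Tuple

def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual

def decode_quality(qual: str, offset: int = 64) -> List[int]:
    """Turn quality characters into integer scores (offset 64 is an assumption)."""
    return [ord(ch) - offset for ch in qual]

# First 10 bases and quality characters of the first record shown on the slide.
example = """@HW-ST573_75:1:1:1353:4122/11
CAATCGAATG
+
ggfggggggg
"""

records = list(parse_fastq(example))
read_id, seq, qual = records[0]
scores = decode_quality(qual)
print(read_id, seq, scores)
```

Current FASTQ files more commonly use an offset of 33, so the `offset` argument would need to match the instrument and software version that produced the file.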
Variant g.34142190T>C in TPM1
Next Generation Sequencers
First Wave: 454/Roche (2004/5), Solexa/Illumina (2006/7), ABI/Life Tech (2007/8)
454/Roche instruments: GS FLX, GS Junior
Solexa/Illumina instruments: Genome Analyzer, GAIIx, GAIIe, HiScanSQ, HiSeq, MiSeq (2011)
ABI/Life Tech instruments: SOLiD, SOLiD 5500, SOLiD 5500xl
Second Wave - SMS: Helicos (HeliScope), Pacific Biosciences (SMRT)
Third Wave: Ion Torrent, Life Technologies (PGM, 2011)
Clinical Dissemination
Illumina HiSeq 2000
2 Independent Flow Cells
8 Lanes per Flow Cell
2 X 100 base pairs
540-600 Gb Output
8-11 Day Sequencing Run
Multiple Gene Panel Samples per Lane
2 Genomes per Flow Cell
2-3 Exome(s) per Lane
Reversible Dye Terminators
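The per-lane and per-flow-cell figures above can be tied together with simple coverage arithmetic: mean depth is roughly usable sequenced bases divided by target size. The sketch below uses the run output, flow cell, and lane counts from this slide and the ~30 Mb exome size quoted elsewhere in this deck. The 3.1 Gb genome size and the 50% usable (on-target) fraction for exome capture are assumptions added for illustration.

```python
def mean_depth(total_bases: float, target_bases: float,
               usable_fraction: float = 1.0) -> float:
    """Approximate mean depth = usable sequenced bases / target size."""
    return total_bases * usable_fraction / target_bases

RUN_OUTPUT_BASES = 600e9      # upper end of the 540-600 Gb run output
FLOW_CELLS = 2
LANES_PER_FLOW_CELL = 8
GENOME_BASES = 3.1e9          # assumption: approximate human genome size
EXOME_BASES = 30e6            # ~30 Mb exome target quoted in this deck

per_flow_cell = RUN_OUTPUT_BASES / FLOW_CELLS
per_lane = per_flow_cell / LANES_PER_FLOW_CELL

# Two genomes share one flow cell.
genome_depth = mean_depth(per_flow_cell / 2, GENOME_BASES)

# Three exomes share one lane; 0.5 is a hypothetical on-target fraction.
exome_depth = mean_depth(per_lane / 3, EXOME_BASES, usable_fraction=0.5)

print(f"Per lane output: {per_lane / 1e9:.1f} Gb")
print(f"Approximate genome depth: {genome_depth:.0f}x")
print(f"Approximate exome depth: {exome_depth:.0f}x")
```

These are raw averages that ignore duplicate reads, low-quality bases, and uneven coverage, so realized depth at any given position can be considerably lower.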
Illumina MiSeq
2 X 150 bp 2 X 250 bp
2.0 – 7.0 Gb Output
~27 Hrs Sequencing Run
Multi-Gene Panels: Genetics, Oncology
Microbiology: Viral and Bacterial Genomes
Transcriptomes
Ion Torrent
Monitors H+ Release
100 – 200 base pairs
10 Mb – 1.0 Gb Output
~2 Hrs Sequencing Run
Multi-Gene Panels: Genetics, Oncology
Microbiology: Viral and Bacterial Genomes
Transcriptomes
Illumina HiSeq 2000
2 X 100 base pairs
540-600 Gb Output
11 Day Sequencing Run
Upgrade Module (Late 2012)
120 Gb in 27+ Hours
Single Genome in 27+ Hours
Multiple Exomes in 27+ Hours
Oxford Nanopore Technologies
Processive Enzyme
Protein Nanopore in Polymer Membrane
Current Disruption Based Electronic Signal
MinION – Late 2012
The Meeting Place
Biotechnology: Sequence Generation
Bioinformatics: Sequence Analysis, Interpretation
Biomedical Question
What is the Genetic Landscape of a Tumor?
What Pathogen is Responsible for an Outbreak?
What Genetic Contributors Account for a Phenotype?
Multi-Gene Diagnostics
Clinical Phenotype
Multiple Genes
Locus Heterogeneity
Allelic Heterogeneity
Mutational Spectrum
Multi-Gene Diagnostics
“New First Tier” Genetic Testing
Scaling Increases Interpretive Complexity
Can Yield Non-Definitive Results
Gateway to Exome/Genome
Multi-Gene Diagnostics
Genomic DNA
Enrichment
Target Genes
NGS Library Preparation
Next Generation Sequencing
Interpretation
Bioinformatics
Gene Enrichment Approaches
Genomic DNA, then Enriched Genes, then NGS
Amplification Based: PCR or LR-PCR, RainDance ePCR, Fluidigm, HaloGenomics
Advantage: Enrichment Specificity
Drawbacks: Not as Scalable; Instrument and Chip Costs
Array Capture Based: Solid Surface or In Solution
Advantage: Scalable to Exome
Drawbacks: Homologous Sequence Capture; Manually Complex
Human Exome
“Journey to the Center of the Genome”
~ 30+ Megabases (~ 1.5% of the genome)
~ 180,000 exons (~ 20,500 genes)
Harbors “Majority” of Mendelian Mutations
Exome Sequencing History
“Genetic Diagnosis by Whole Exome Capture and Massively Parallel DNA Sequencing”
Choi et al PNAS 2009 – Congenital Chloride Diarrhea
~45 Gene Discovery Publications May 2012
Recessive, Dominant, De Novo
Genomic DNA
Library Preparation
Next Generation Sequencing Library
Hybridize to Exome Capture Probes
Exome Enriched Library
Next Generation Sequencing
Bioinformatics Analysis
[Figure: coverage, aligned reads, reference, and capture probes at Exon 1 of MAZ and HLA-DOB; Nimblegen Exome Capture and Illumina HiSeq]
Exome Sequencing - Coverage of Coding Regions is Variable
Exome Sequencing – Performance Characteristics
Define Proportion of Exome “Adequately Covered”
Dependent On
Capture Technology – Probe Design and Capture Efficiency
Sequencing Depth
Conversely
Define Proportion of Exome “Not Adequately Covered”
Exome Sequencing – Performance Characteristics
Define Proportion of Exome “Accurately Sequenced”
Co-Capture Component
Pseudogenes
Paralogs and Homologs
Repetitive Elements
Difficult to Sequence Regions
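One way to make the “Adequately Covered” proportion concrete is to count the fraction of targeted bases whose read depth meets a minimum threshold. The sketch below assumes per-base depths are already available as a list of integers. The 20x threshold and the depth values are hypothetical toy numbers, not figures from this presentation.

```python
from typing import Sequence

def fraction_covered(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of target bases with read depth at or above min_depth."""
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)

# Toy per-base depths across a small target region (hypothetical values).
depths = [0, 5, 12, 25, 40, 38, 22, 19, 30, 3]
MIN_DEPTH = 20  # hypothetical threshold; a laboratory would define its own

adequate = fraction_covered(depths, MIN_DEPTH)
not_adequate = 1.0 - adequate
print(f"Adequately covered: {adequate:.0%}")
print(f"Not adequately covered: {not_adequate:.0%}")
```

Depth alone does not establish that a region is accurately sequenced: reads co-captured from pseudogenes or paralogs can yield high depth with misleading calls, which is why the deck treats accuracy as a separate characteristic.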
Bioinformatics
Mendelian Disorders – Working Hypothesis: Seeking “Rare” Variants in a Single Gene(s)
Needle(s) in the Haystack(s)
Annotated Variants
Prioritization by Heuristic Filtering
• Filter Out Common Variants: dbSNP/1000 genomes, Variant frequency
• Pathogenicity Prediction Filtering: SIFT/PolyPhen, GERP
• Variant Binning: Missense; Nonsense/Frameshift/Splice Site/Indels
• Cross Reference Databases: HGMD/OMIM/Locus Specific
• Pedigree Information: Linkage/SGS/IBD
• Intersects
Prioritization by Likelihood Prediction
• VAAST Algorithm
Candidate Genes/Potential Causative Variants
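The heuristic filtering path can be sketched as a short chain of filters: drop variants that are common in population databases, then keep severe variant classes plus missense changes predicted to be damaging. The `Variant` fields, the 1% frequency cutoff, and the gene names below are hypothetical placeholders. This is not the output format of any specific annotation tool, and it does not represent the VAAST likelihood approach.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Variant:
    gene: str
    effect: str                 # e.g. "missense", "nonsense", "synonymous"
    pop_freq: Optional[float]   # population frequency; None if not in databases
    predicted_damaging: bool    # stand-in for a pathogenicity prediction

# Variant classes kept regardless of the pathogenicity prediction.
SEVERE_EFFECTS = {"nonsense", "frameshift", "splice_site", "indel"}

def prioritize(variants: List[Variant], max_freq: float = 0.01) -> List[Variant]:
    """Keep rare variants that are severe or predicted-damaging missense."""
    kept = []
    for v in variants:
        # Step 1: filter out common variants.
        if v.pop_freq is not None and v.pop_freq > max_freq:
            continue
        # Step 2: bin by effect and apply the prediction to missense changes.
        if v.effect in SEVERE_EFFECTS:
            kept.append(v)
        elif v.effect == "missense" and v.predicted_damaging:
            kept.append(v)
    return kept

# Toy annotated variants (hypothetical).
annotated = [
    Variant("GENE_A", "missense", 0.20, True),
    Variant("GENE_B", "missense", None, True),
    Variant("GENE_C", "nonsense", 0.001, False),
    Variant("GENE_D", "missense", 0.002, False),
    Variant("GENE_E", "synonymous", None, False),
]

candidates = prioritize(annotated)
print([v.gene for v in candidates])
```

In practice the surviving candidates would then be intersected with pedigree information and cross-referenced against disease databases, as the slide lists.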
Exome Sequencing
Genomic DNA
Library Preparation
Next Generation Sequencing Library
Hybridize to Exome Capture Probes
Exome Enriched Library
Next Generation Sequencing
Bioinformatics Analysis
vs
Genome Sequencing
Genomic DNA
Library Preparation
Next Generation Sequencing Library
Next Generation Sequencing
Bioinformatics Analysis
Cost – Coverage – Complexity
Horizon
Continued Evolution of Sequencing and Bioinformatics
College of American Pathologists Checklist Requirements for Next Generation Sequencing
Professional Societies Guidelines for Clinical Next Generation Sequencing