The Agilent Technologies
SureSelect™ Platform for Target Enrichment
Focus your next-gen sequencing on DNA that matters
Kimberly Troutman
Field Applications Scientist
January 27th, 2011
Agenda
1 Introduction: SureSelect™
2 Exome Approach for Genetic Diseases
3 Complex Diseases
4 Custom Biomarker Discovery and Profiling
5 Targeted RNA Sequencing
6 Kinome Kit
7 New SureSelect Products
Target Enrichment: A Highly Enabling Process
What?
• Also referred to as genome partitioning, targeted re-sequencing, DNA capture…
• Captures genomic material of interest for next-generation sequencers (e.g. Illumina, SOLiD, 454)
Why?
• Sequence your regions of interest!
• Enables focus on a subset of the genome
• Saves both time and money for downstream sequencing
• Identify homozygous and heterozygous variants in targets relative to the reference genome
[Figure: gDNA and the enriched library]
Agilent’s SureSelect™ Platform: Two Options
SureSelect Target Enrichment System*
• Developed in collaboration with the Broad Institute (Dr. Chad Nusbaum et al.)
• Baits: cRNA probes, long (120 bases), biotin labeled
• 3 µg gDNA
• *Flagship method, released February 2009
SureSelect DNA Capture Array
• Developed in collaboration with Cold Spring Harbor (Dr. Greg Hannon et al.)
• Agilent 60-mer array, 244k & 1M features
• 1-5 µg gDNA (with WGA) or 20 µg gDNA (unamplified)
• Released July 2009
Sequencing platforms shown: Illumina GAIIx, Illumina HiSeq, SOLiD 3, SOLiD 4, 5500, GS FLX & GS Junior
SureSelect Target Enrichment Kit Choices
Product | Target amount (Mb) | Reactions/kit | Product definition
Human X-demo | 3.05 | 5 | Human X-chr exons
Human All Exon v1 | 38 | 5-10,000 | Catalog content from CCDS 2008 plus >1000 ncRNA
Human All Exon Plus | 38-50 plus up to 6.8 of custom content | 5-10,000 | Add custom content to All Exon catalog content
Human All Exon v2 | 44 | 5-10,000 | CCDS Sept. 2009 plus additional RefSeq
Human All Exon 50Mb | 50 | 5-10,000 | GENCODE content; most comprehensive coverage; multiplexable
Kinome | 3.2 | 5-10,000 | All kinases
Indexed custom content | <0.2; 0.2 - 0.49; 0.5 - 1.49; 1.5 - 2.9; 3 - 6.8 | 10-5,000 | Custom offering: Illumina (12 indexes), SOLiD (16 barcodes)
SureSelect Kits Multiplexing Capability
Target enrichment size range | Illumina GA | Illumina HiSeq 2000 | SOLiD Octet | SOLiD Quadrant | SOLiD Flow Cell | SOLiD Full Run
<200 Kb targets | 12 | 12 | 16 | 16 | 16 | 16
200 Kb - 499 Kb targets | 12 | 12 | 16 | 16 | 16 | 16
500 Kb - 1.49 Mb targets | 12 | 12 | 5 | 10 | 16 | 16
1.5 Mb - 2.99 Mb targets | 12 | 12 | 3 | 7 | 16 | 16
3.0 Mb - 6.0 Mb targets | 8 | 12 | 2 | 3 | 16 | 16
Human All Exon 38 Mb | 1 | 4 | 0 | 1 | 3 | 7
Human All Exon 50 Mb | 1 | 3 | 0 | 0 | 3 | 5
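The multiplexing table can be turned into a small lookup for run planning. The sketch below is only an illustration: the names `max_samples`, `CUSTOM_ROWS` and `CATALOG_ROWS` are hypothetical, the numbers are transcribed from the table above, and it assumes that a design sitting exactly on a size boundary falls into the larger range.

```python
from bisect import bisect_right

# Column order follows the slide: Illumina (GA, HiSeq 2000), then AB SOLiD
# (Octet, Quadrant, Flow Cell, Full Run).
UNITS = ("GA", "HiSeq 2000", "Octet", "Quadrant", "Flow Cell", "Full Run")

# Boundaries (Mb) between the five custom size ranges.
# Assumption: a design exactly on a boundary belongs to the larger range.
CUSTOM_BOUNDS_MB = [0.2, 0.5, 1.5, 3.0]
CUSTOM_MAX_MB = 6.0

CUSTOM_ROWS = [
    (12, 12, 16, 16, 16, 16),  # <200 Kb
    (12, 12, 16, 16, 16, 16),  # 200 Kb - 499 Kb
    (12, 12, 5, 10, 16, 16),   # 500 Kb - 1.49 Mb
    (12, 12, 3, 7, 16, 16),    # 1.5 Mb - 2.99 Mb
    (8, 12, 2, 3, 16, 16),     # 3.0 Mb - 6.0 Mb
]

CATALOG_ROWS = {
    "Human All Exon 38 Mb": (1, 4, 0, 1, 3, 7),
    "Human All Exon 50 Mb": (1, 3, 0, 0, 3, 5),
}


def max_samples(design, unit):
    """Return the multiplexing level listed in the table.

    design: custom target size in Mb (number) or a catalog kit name (string).
    unit:   one of UNITS.
    """
    col = UNITS.index(unit)
    if isinstance(design, str):
        return CATALOG_ROWS[design][col]
    if not 0 < design <= CUSTOM_MAX_MB:
        raise ValueError("custom design size is outside the table (0 to 6.0 Mb)")
    row = bisect_right(CUSTOM_BOUNDS_MB, design)
    return CUSTOM_ROWS[row][col]


if __name__ == "__main__":
    print(max_samples(1.0, "Octet"))
    print(max_samples("Human All Exon 50 Mb", "Full Run"))
```

A zero in the table means the unit is too small for even one sample of that design, so callers should treat `0` as "not supported" rather than as an error.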
Agilent SureSelectXT Kits
gDNA kit + Library Prep kit + SureSelect Reagents
= SureSelectXT Kit
SureSelectXT Kit – Coupled with an optimized gDNA prep kit and library prep kit, it lets you use one kit for the entire sample-prep-to-sequencing target enrichment workflow
• Kit composition
• gDNA Isolation – Lysis buffer and enzymes required for isolation
• Library prep – Buffers, reagents, enzymes and indexes needed for prep
• SureSelect Target Enrichment Kit – Hybridization buffers and “baits”
• All kits are available in the XT format: catalog kits and custom content
• SureSelectXT All Exome & SureSelectXT All Exome Plus
• SureSelectXT Human Kinome & SureSelectXT Human X Chromosome
• SureSelectXT Custom from < 200 Kb to > 6.8 Mb (up to 34 Mb in Spring 2011)
• Illumina GAIIx and HiSeq 2000 (Protocol v1.0 Nov 2010)
• SOLiD 3 / 4 and 5500 (Available soon)
SureSelectXT – complete sample-to-sequencer solutions for your target enrichment needs
[Figure: workflow steps: genomic DNA prep; library prep (GA, SOLiD); Bioanalyzer, qPCR quant; sequencer. Manual procedure for a small number of samples.]
Exon Capture is a Powerful Tool to Study
Mendelian Diseases
• Mendelian diseases are caused by coding mutations (with some exceptions)
• Exons are only ~1-1.4 % of human genome (30-50Mb)
• Primarily protein coding regions
Advantages:
• Much less sequencing
• ~5% of WGS, so up to 20x more samples
Why coding?
• More interpretable
• Easier to follow up
• Especially adapted to study of Mendelian diseases
Exome content definitions:
• CCDS exons – 38 Mb, v1
• CCDS + RefSeq – 44 Mb, v2 (Broad)
• GENCODE – 50 Mb (Sanger)
• Includes ncRNA
• All exons on the X chromosome
• 7674 exons
• 3 Mb
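The "much less sequencing" argument on this slide is simple arithmetic, sketched below. The function names are hypothetical, and the haploid human genome size of about 3,100 Mb is an assumption that is not stated on the slide; the resulting percentage shifts slightly with whatever genome size is assumed.

```python
# Assumption (not from the slide): haploid human genome of roughly 3,100 Mb.
HUMAN_GENOME_MB = 3100.0


def exome_fraction(target_mb, genome_mb=HUMAN_GENOME_MB):
    """Fraction of the genome covered by a capture design of target_mb."""
    return target_mb / genome_mb


def sample_multiplier(sequencing_fraction):
    """How many more samples fit in the same sequencing budget if each
    sample needs only sequencing_fraction of a whole-genome run."""
    return 1.0 / sequencing_fraction


if __name__ == "__main__":
    for size_mb in (30, 50):
        print(f"{size_mb} Mb design: {exome_fraction(size_mb):.1%} of the genome")
    # The slide quotes ~5% of the whole-genome sequencing effort per exome.
    print(f"{sample_multiplier(0.05):.0f}x more samples than WGS")
```

The sequencing effort (about 5% of WGS) is larger than the target fraction (1-2% of the genome) because not every read lands on target and coverage is not perfectly uniform.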
SureSelect Human All Exon Kits
Metric | All Exon v1 | All Exon v2 | All Exon 50 Mb
Content | CCDS Sept. 2008 | CCDS Sept. 2008 + additional RefSeq content, including CCDS Sept. 2009 exons | GENCODE and Sanger (also includes CCDS and Broad-defined v2 content)
CCDS (Nov. 2010) | 89.6% | 98.2% | 99.5%
CNV (Mar. 2010) | 23.98% | 27.49% | 30.62%
Ensembl (Aug. 2010) | 79.9% | 90.9% | 96.2%
miRNA (miRBase 14) | 90.0% | 90.0% | 92.8%
GenBank (6/16/2010) | 75.96% | 89.07% | 90.74%
RefSeq Genes (Nov. 2010) | 85.0% | 96.9% | 99.0%
RefSeq Transcripts (6/16/2010) | 88.85% | 95.07% | 97.50%
Target size | 38 Mb | 44 Mb | 50 Mb
Developed with | Broad | Broad | Sanger
• Human All Exon kits can be customized (PLUS) with up to 6.8 Mb additional custom content
• Human All Exon kits can be multiplexed on SOLiD 4 and HiSeq 2000
Human All Exon 50Mb – 2x76 bp, 50-60M HQ Reads
[Figure: Human All Exon 50 Mb capture metrics, values in the order extracted: % on target +/- 100 bp: 76.32%; uniformity (3/4 mean with upper tail): 85.07%; % bases with 1x coverage: 96.65%; % bases with 10x coverage: 87.93%; % bases with 20x coverage: 77.46%]
The most comprehensive Human
All Exon content available
38 Mb design = a subset of 50 Mb
Sequencing capacity:
• 0.5-1 sample / lane GAIIx
• 1-3 samples / lane HiSeq
• 5-10 samples /full slide SOLiD4
Chemistry recommended:
• PE 2x76 bp Illumina v4
• PE 50+25 SOLiD
Multiplexing:
• Illumina
• SOLiD
Comparison of SNP Calls with HapMap
[Figure: two bar charts comparing Human All Exon v2 and Human All Exon 50Mb, titled "Genotype Concordance vs. HapMap" and "Genotype Sensitivity vs. HapMap". One chart has the series GT is REF, GT is variant HOM, GT is variant HET, with bar labels as extracted: 99.1%, 99.2%, 98.4%, 98.0%, 95.7%, 94.9%. The other chart adds an OVERALL series, with bar labels as extracted: 99.8%, 99.7%, 98.2%, 98.1%, 98.5%, 98.3%, 99.4%, 99.3%.]
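The slides do not define the two metrics, so the sketch below uses one common reading as an assumption: sensitivity is the share of reference (e.g. HapMap) sites that received a genotype call, and concordance is the share of those called sites whose genotype agrees with the reference. The function name and the example genotypes are hypothetical.

```python
def sensitivity_and_concordance(truth, calls):
    """Compare genotype calls with a reference genotype set.

    truth: dict mapping site -> genotype string such as "A/G" (reference set).
    calls: dict mapping site -> genotype string from sequencing; sites with
           no call are simply absent.
    Returns (sensitivity, concordance) as fractions.
    """
    if not truth:
        raise ValueError("truth set is empty")
    called = [site for site in truth if site in calls]
    sensitivity = len(called) / len(truth)
    # Genotypes are unordered, so "A/G" and "G/A" count as the same call.
    matches = sum(
        1
        for site in called
        if sorted(truth[site].split("/")) == sorted(calls[site].split("/"))
    )
    concordance = matches / len(called) if called else float("nan")
    return sensitivity, concordance


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the slides.
    truth = {"chr1:100": "A/A", "chr1:200": "A/G", "chr2:50": "C/T", "chr3:7": "G/G"}
    calls = {"chr1:100": "A/A", "chr1:200": "G/A", "chr2:50": "C/C"}
    sens, conc = sensitivity_and_concordance(truth, calls)
    print(f"sensitivity={sens:.2f} concordance={conc:.2f}")
```

Splitting the truth set by genotype class (reference, homozygous variant, heterozygous variant) before calling the function would give per-class figures like the series shown in the charts.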
All Exon Plus
Is the Human All Exon Kit not hitting all of your regions of interest?
• All Exon Library: CCDS exons, >1000 ncRNAs, 38 Mb
• + Your Custom Library: your regions of interest (6.8 Mb)
• Enter your custom regions in eArray
Human All Exon Plus Performance
1 tube capture, 1 lane seq. at 2x76 bp on GAIIx = ~2 Gb
[Figure: SNP Analysis vs. HapMap (Sensitivity, Concordance) for Exome + 0.87 Mb, Exome + 1.7 Mb, Exome + 3.4 Mb, Exome + 6.8 Mb and Exome Control; bar values not recoverable]
SNP Analysis vs. dbSNP (No. SNPs, values in the order extracted; Mismatched counts not recoverable)
Sample | Concordant | Novel
Exome + 0.87 Mb | 22394 | 4480
Exome + 1.7 Mb | 23224 | 5528
Exome + 3.4 Mb | 24337 | 5816
Exome + 6.8 Mb | 26352 | 6182
Exome Control | 21976 | 5173
Beyond Mendelian Diseases: Complex diseases
[Figure labels: 25 bp deletion, 7 bp deletion, 10 bp deletion, 11 bp deletion]
Other Applications of Targeted Re-Sequencing
• Capture any custom genomic regions (introns, exons, UTRs, regulatory, etc.)
• Ideal for biomarker discovery and profiling (e.g. cancer)
• Ideal for custom SNP follow-up
• Ideal for characterization of large sample cohorts
Key enabling features:
• High throughput
• 12 Illumina indexes / up to 96 samples per run
• 16 SOLiD barcodes / up to 128 samples per run
• Pay only for what you capture; scalable from 0.2 to 6.9 Mb (sweet spot for 3rd-generation sequencing)
• <0.2 Mb
• 0.2 – 0.5 Mb
• 0.5 – 1.5 Mb
• 1.5 – 3 Mb
• 3 – 6.9 Mb
• Very reproducible, excellent allelic balance for accurate heterozygote calls
• Custom and catalog content (kinome)
• Automation (library prep and capture)
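The size tiers and per-run sample counts above can be sketched as follows. This is an illustration only: `capture_tier` and `samples_per_run` are hypothetical names, a design exactly on a tier boundary is assumed to fall into the larger tier, and the factor of 8 partitions per run is inferred from the quoted totals (12 indexes to 96 samples, 16 barcodes to 128 samples) rather than stated on the slide.

```python
from bisect import bisect_right

# Tier boundaries in Mb, taken from the list above.
TIER_BOUNDS_MB = [0.2, 0.5, 1.5, 3.0]
TIER_LABELS = ["<0.2 Mb", "0.2 - 0.5 Mb", "0.5 - 1.5 Mb", "1.5 - 3 Mb", "3 - 6.9 Mb"]
MAX_CUSTOM_MB = 6.9


def capture_tier(design_mb):
    """Return the pricing tier label for a custom design of design_mb.

    Assumption: a size exactly on a boundary goes to the larger tier.
    """
    if not 0 < design_mb <= MAX_CUSTOM_MB:
        raise ValueError("design size must be between 0 and 6.9 Mb")
    return TIER_LABELS[bisect_right(TIER_BOUNDS_MB, design_mb)]


def samples_per_run(n_indexes, partitions_per_run=8):
    """Samples per run = indexes (or barcodes) x physical partitions.

    Assumption: 8 partitions per run, inferred from 12 -> 96 and 16 -> 128.
    """
    return n_indexes * partitions_per_run


if __name__ == "__main__":
    print(capture_tier(1.0))
    print(samples_per_run(12), samples_per_run(16))
```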
Target Enrichment Design Application in eArray
• eArray is a free tool to design and order custom microarrays, qPCR primers and SureSelect products
• eArray is divided into “Application Spaces”
• Allows for application specific functionality
• Target Enrichment application space features:
• Create custom baits and bait libraries
• Search existing designs/baits
– Catalog and custom
• Upload custom bait designs
• Download design files
• Share designs
• Get quotes
Customize your SureSelect™ Kit
Create your own design or add extra custom sequence to
a catalog design – up to 6.8Mb
[Figure: eArray web portal workflow. Customers (Customer A, Customer B) enter gene IDs (Gene ID 1, 2, 3); a bait design step produces baits for each gene (Baits #1, #2, #3), which are collected into a virtual bait library. Steps shown next: library design, kit size, quote, order library, DNA bait library, RNA bait library, assemble kit, ship kit to customer.]
Up to 55,000 unique baits
https://earray.chem.agilent.com
• Inherited loss-of-function mutations in the tumor suppressor genes BRCA1, BRCA2, and
multiple other genes predispose to high risks of breast and/or ovarian cancer. Cancer-
associated inherited mutations in these genes are collectively quite common, but individually
rare or even private.
• To determine whether massively parallel, “next-generation” sequencing would enable
accurate, thorough, and cost-effective identification of inherited mutations for breast and
ovarian cancer, we developed a genomic assay to capture [with Agilent’s custom SureSelect],
sequence, and detect all mutations in 21 genes, including BRCA1 and BRCA2, with inherited
mutations that predispose to breast or ovarian cancer.
• There were zero false-positive calls of nonsense mutations, frameshift mutations, or genomic
rearrangements for any gene in any test sample.
• This approach enables widespread genetic testing and personalized risk assessment for
breast and ovarian cancer.
Efficient Capture of 5 bp Deletion on Chr X: Menkes Syndrome
hg18_ChrX_77131408_77131467_+ : Wild type Bait Design
CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT
CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA
  ATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAG
   TTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGC
     GTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAG
        TATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATT
         ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
         ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
         ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
           CAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAA
              CCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAAGCT
SureSelect™ Target Enrichment Kit Efficiently Captures 5 bp Mutant
Readout on Illumina GA
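The read pile-up above can be checked programmatically. The following is a minimal sketch, not the analysis pipeline behind the slide: `call_deletion` is a hypothetical helper that assumes each read carries one contiguous run of `-` characters and that the bases left of the gap occur only once in the bait sequence. Positions are 0-based offsets within the wild-type bait as drawn on the slide.

```python
# Wild-type bait sequence and two of the gapped reads, copied from the slide.
REFERENCE = "CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT"
READS = [
    "CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA",
    "CCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAAGCT",
]


def call_deletion(reference, gapped_read):
    """Locate the deletion implied by a gapped read.

    Returns (offset_in_reference, deleted_bases), or None when the read has
    no gap or its flanks do not match the reference.
    """
    gap_len = gapped_read.count("-")
    if gap_len == 0:
        return None
    left = gapped_read[: gapped_read.index("-")]
    right = gapped_read[gapped_read.rindex("-") + 1:]
    start = reference.find(left)  # assumes the left flank is unique
    if start < 0:
        return None
    del_start = start + len(left)
    del_end = del_start + gap_len
    if not reference[del_end:].startswith(right):
        return None
    return del_start, reference[del_start:del_end]


if __name__ == "__main__":
    for read in READS:
        print(call_deletion(REFERENCE, read))
```

Because the deleted bases sit in a short repeat, an aligner may place the same 5 bp gap a base or two to either side; the offset reported here simply follows the placement drawn on the slide.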
SureSelect™ RNA Target Enrichment
FIRST IN CLASS
• First RNA Capture product on the market
Custom and catalog kits
• Design kits from 200Kb to 3.4Mb using RNA Target
Enrichment space on eArray portal (PN G7581-G7585)
• RNA Capture Kinome Kit (catalog) containing same content as
current SureSelect Kit (PN G7580)
SureSelect RNA Enrichment Protocol
Start with 0.1-0.5 µg RNA
• Similar process to DNA target enrichment, except that the input is a cDNA NGS library
• Protocol time ~ 3-4 days
• Protocols available
• Illumina and SOLiD
• Individual or multiplexed
samples
SureSelect “kinome” – Discovery and profiling of
biomarkers related to disease and/or drug response
Definition of SureSelect Human Kinome Kit: 3.2Mb (incl. UTRs)
(Original content defined by Prof. René Bernards – NKI)
• 518 putative kinases
• 12 PI3K domain-containing genes
• 13 diglyceride kinases
• 6 PI3K regulatory components
• 9 inositol polyphosphate Kinases
• 9 PIP4/PIP5 Kinases
• 28 genes frequently mutated in human cancer
• 19 genes specifically known to be mutated in breast cancer
• 612 genes total
G. Manning et al., Science 298, 1912 (2002)
Slide courtesy of Rene Bernards
Kinome Kit Performance – 3-5 samples per GAIIx lane / SOLiD quad
[Figure: Reproducible Performance Across Indexes. Bars for Kinome Index 1 to 5: % on target +/- 200bp, % bases 1X coverage, % bases 10X coverage, % bases 20X coverage; values not recoverable]
[Figure: Even Index Representation Across Single Lane. Labels as extracted for Kinome Index 1 to 5: 1.15, 0.84, 1.00, 0.98, 1.15]
[Figure: Uniform Read Depth Distribution]
SureSelect + 454
• SureSelect support for both 454 FLX and GS Junior
sequencers
• Simplified protocol with a rapid library protocol
(<3hrs), the shortest in-solution capture protocol
• Only 500 ng of starting material required
• Full SureSelect product line available: custom bait libraries from <200 kb to 6.8 Mb capture size, or catalog kits (Human All Exon, Kinome and X chromosome)
• Allows for detection of mutations, SNPs, indels, CNVs
and fusions/translocations
454 FLX SureSelect Custom Capture: 0.5 Mb
DNA: NA10831
Bait and Pond: 00.5Mb_B
Avg read length: 360.3 bases
Total number of bases mapped: 67,157,190 bases
Percentage reads in targeted regions: 57.07%
Percentage reads in regions +/- 300bp: 59.35%
Average read depth: 52.7 fold
Percentage of targeted bases covered by...
...at least 1 read: 99.34%
...at least 5 reads: 98.99%
...at least 10 reads: 98.12%
...at least 20 reads: 95.42%
...at least 30 reads: 88.83%
...at least 40 reads: 76.10%
1/4 PicoTiterPlate run, 67 Mb of sequence
>95% of capture sequenced at 20X depth or greater
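Metrics such as "average read depth" and "percentage of targeted bases covered by at least N reads" come from per-base depth over the target. A minimal sketch of that calculation follows; `coverage_summary` is a hypothetical helper and the depth values are made-up toy data, not the NA10831 run.

```python
def coverage_summary(depths, thresholds=(1, 5, 10, 20, 30, 40)):
    """Summarize per-base read depth over the targeted bases.

    depths: one integer per targeted base (number of reads covering it).
    Returns (mean_depth, {threshold: percent of bases with depth >= threshold}).
    """
    if not depths:
        raise ValueError("no targeted bases supplied")
    n = len(depths)
    mean_depth = sum(depths) / n
    breadth = {
        t: 100.0 * sum(1 for d in depths if d >= t) / n for t in thresholds
    }
    return mean_depth, breadth


if __name__ == "__main__":
    # Hypothetical depths for ten targeted bases.
    toy_depths = [0, 3, 12, 25, 40, 55, 60, 8, 19, 33]
    mean_depth, breadth = coverage_summary(toy_depths)
    print(f"average read depth: {mean_depth:.1f} fold")
    for threshold, pct in breadth.items():
        print(f"...at least {threshold} reads: {pct:.2f}%")
```

In practice the depth vector would come from an alignment tool's per-base depth output restricted to the bait or target intervals; that step is outside this sketch.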
454 FLX SureSelect Custom Capture: 0.5 Mb
SNP Detection
343 HapMap SNPs were assayed in replicate samples of NA10831
[Figure: bar labels as extracted: 98%, 95%, 97%, 100%, 100%, 100%]
Cancer Research – Gene Fusions
Problem: Genomic rearrangement in tyrosine
kinase genes
• Can lead to deregulation of cellular signaling
and cancer
• Identification of novel TK fusions is
laborious
• TKs are attractive therapeutic targets
Solution: SureSelect Custom Capture (908 Kb)
• based on known cancer-derived TK fusions
• Designed baits to a conserved GXGXXG
motif in 90 TKs + ATK and BRAF
• Regions extended to include preceding
exons/introns
• 454 long-read sequencing
SureSelect Custom Capture (908 Kb)
SureSelectXT Mouse All Exon Kit
• Agilent SureSelectXT Mouse All Exon Kit
• For SOLiD, Illumina & 454 platforms
• Available in 5 to 10,000 reactions
• Designed against UCSC mm9 / NCBI build 37 (July 2007)
Exon definition derived from Ensembl + RefSeq
• Complete Mouse exome coverage
• 49.6 Mb capture
• 221,784 exons and 24,306 genes
• Excellent coverage uniformity, on-target specificity and SNP
detection and accuracy
SureSelectXT Mouse All Exon Kit:
Illumina GAIIx, single lane 2x76 PE, 5.2 Gb
• C3H mouse genomic DNA, 49.6 Mb Mouse All Exon Capture
• On-target reads = 69%
• 98% of bases covered at 1x or greater; the average read depth was 54X
• 84% of the targeted bases were sequenced at a depth of 20X or greater, enabling high-confidence SNP calling
SureSelect Mouse All Exon Kit SNP Sensitivity
& Concordance
A) Sensitivity of SNP detection relative to the Perlegen Mouse SNP dataset was very high with the SureSelectXT Mouse All Exon Kit / Illumina GAIIx platform, with 99 percent of reference SNPs detected for the C3H and DBA samples. Variant SNPs were also detected at high rates (99 percent) for both the C3H and DBA samples.
B) Of the SNPs detected, concordance with the Perlegen Mouse SNP dataset for both the C3H and DBA samples was 98 percent for the variants and 95 percent overall.
SureSelectXT Catalog Exome Kits
• Coming soon in 2011 - SureSelectXT Exome Kits for…
• Bovine, Canine, Xenopus, and Zebrafish
Acknowledgements
• Collaborators:
• Broad Institute
• Chad Nusbaum et al.
• Stacey Gabriel et al.
• Sheila Fisher et al.
• Sanger Institute
• Daniel Turner et al.
• NKI
• Rene Bernards
• Ian Majewski
• RIKEN Institute
• Yoichi Gondo
• All our early access collaborators
(over 20 institutions worldwide)