Improved Performance of Solution-based Target Enrichment with Spike-in of Individually Synthesized Capture DNA Probes

Derek Bowen, Michele Cargill, Curtis Kautzer, Tom Landers, Gautam R. Mehta, & Eric Olivares; InVitae Team, 458 Brannan Street, San Francisco, CA 94107

INTRODUCTION

While DNA sequencing costs have fallen and throughput has risen rapidly in recent years, sequencing a complete genome at high depth is still prohibitively expensive, so the vast majority of sequencing efforts still target selected portions of the genome (exome, cancer panels, etc.). In the past few years, several commercial methods and technologies for targeted enrichment have emerged. One example is solution hybrid selection (SHS), developed by the Broad Institute and commercialized by Agilent Technologies (SureSelect), in which specific biotinylated probes hybridize to genomic DNA targets; the probe-target hybrids are then selectively enriched with streptavidin beads and finally sequenced. The process has many advantages, chiefly the ability to capture precise targets efficiently and in an automatable manner, and the recent commercial success of the companies offering it is evident. Unfortunately, SHS also has limitations and challenges: high-GC targets, homologues and paralogues, and the inability to accurately assess the quality of capture probes on a large scale. Here, we present a method to overcome some of these deficiencies by spiking individually synthesized and quality-controlled DNA oligonucleotide capture probes into our existing SureSelect probe pools prior to sequencing. The advantages are numerous: we have seen greater on-target capture, reduced GC bias, and more uniform coverage, whilst maintaining similar sensitivity and specificity. This spike-in method allows us to fine-tune our final captured product simply by adding or removing probes. Furthermore, the method is fully automatable and more scalable than PCR.

MATERIALS / METHODS

Sample Preparation - HapMap DNA samples were obtained from Coriell, and ligation-based sequencing libraries were prepared in accordance with Agilent’s SureSelectXT Target Enrichment System for Illumina Paired-End Sequencing Library protocol. Briefly, gDNA samples were sheared to ~200 bp on a Covaris E220, end-repaired, A-tailed, and ligated with SureSelect Adaptor Oligo Mix. The adaptor-ligated libraries were then amplified and prepared for hybridization.

Probe Design - 1,200 5’-biotinylated 120 nt DNA probes were designed to regions where coverage gaps were observed in our custom SureSelect panel (~28,000 120-mer RNA probes) and were manufactured by Integrated DNA Technologies (IDT).
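The poster does not describe how probes were laid out across each gap, so the following is only a minimal sketch of one plausible approach: tile 120 nt probes end to end across a gap sequence and record each probe's GC fraction. The `Probe` class, the function names, and the end-to-end tiling strategy are all assumptions made for illustration; only the 120 nt probe length comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

PROBE_LEN = 120  # nt, the probe length stated in the poster


@dataclass
class Probe:
    start: int   # 0-based offset of the probe within the gap sequence
    seq: str     # probe sequence taken from the gap
    gc: float    # GC fraction of the probe, 0.0 to 1.0


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def tile_probes(gap_seq: str, probe_len: int = PROBE_LEN,
                step: int = PROBE_LEN) -> list[Probe]:
    """Tile fixed-length probes across a gap sequence (hypothetical layout).

    The final probe is shifted left so that it ends exactly at the end of
    the gap. Gaps shorter than one probe yield no probes in this sketch.
    """
    n = len(gap_seq)
    if n < probe_len:
        return []
    starts = list(range(0, n - probe_len + 1, step))
    if starts[-1] != n - probe_len:
        starts.append(n - probe_len)
    probes = []
    for s in starts:
        seq = gap_seq[s:s + probe_len]
        probes.append(Probe(start=s, seq=seq, gc=gc_fraction(seq)))
    return probes
```

Recording GC content at design time makes it easy to flag probes that fall below the 35% GC level at which the poster reports weaker recovery.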

RESULTS

[Figure 3 plot: x-axis probe.GC (20 to 80), y-axis value (log scale, 10^-1 to 10^2); series X1, X10, X25, X50, X100.]

[Figure 2 plot: tracks IDT, SureSelect, PCR; panels PCR Only, IDT Only, SureSelect + IDT, SureSelect + PCR.]

Figure 2: Direct comparison of PCR vs. IDT fill-in. Left: raw coverage and uniformity of the existing PCR amplicons and the IDT baitsets (designed to the same region). Coverage is much more evenly distributed with IDT baits, and combining them with SureSelect gives much better overall coverage of the region (right).
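The poster compares uniformity visually and does not name a metric. As a hedged illustration, evenness of coverage over a region could be summarised with the coefficient of variation of per-base depth and the fraction of bases reaching some fraction of the mean depth. The function name, the 0.2 default, and the choice of metrics are assumptions, not the authors' analysis.

```python
from __future__ import annotations

from statistics import mean, pstdev


def uniformity_metrics(depths: list[int], fold: float = 0.2) -> dict[str, float]:
    """Summarise how evenly coverage is spread over a region.

    depths: per-base read depth across the region.
    fold:   a base counts as adequately covered when its depth is at
            least fold * mean depth (0.2 is an arbitrary illustrative default).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    m = mean(depths)
    if m == 0:
        # No reads at all: variation is undefined and nothing is covered.
        return {"mean": 0.0, "cv": float("nan"), "frac_covered": 0.0}
    cv = pstdev(depths) / m  # lower CV means more uniform coverage
    frac = sum(d >= fold * m for d in depths) / len(depths)
    return {"mean": float(m), "cv": cv, "frac_covered": frac}
```

Two regions with the same mean depth can then be told apart: an evenly covered region has a CV near zero and a covered fraction near 1, while a region dominated by a few amplicon peaks has a high CV and a low covered fraction.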

Hybridization and Sequencing - In-solution hybrid selection was performed by spiking 1 µL of various 100 ng/µL pools of IDT probes into 1 µL of SureSelect baits and proceeding with a 24-hour hybridization and capture washes according to Agilent’s protocol. Once targets were captured, samples were multiplexed by post-hybridization amplification. Libraries were quantitated by qPCR (Kapa Biosystems) and sequenced on the Illumina MiSeq platform using 2x150 bp paired-end reads.

Figure 1: In-solution hybrid selection performed by spiking-in IDT baits into SureSelect hybridization.

Figure 3: Normalized coverage of IDT probesets by GC%. The low-GC (<35%) probes do not recover as well as the high-GC (>35%) probes. The different colours indicate varying input amounts of spiked-in IDT probesets, with the X# representing the total ng of IDT probes added.
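The caption does not say how coverage was normalized. As a rough sketch under that caveat, one could divide each probe's coverage by the median coverage across all probes and then compare the low-GC and high-GC groups at the 35% cutoff. Median normalization, the function names, and the treatment of probes at exactly 35% GC are assumptions made here for illustration.

```python
from __future__ import annotations

from statistics import mean, median

LOW_GC_CUTOFF = 35.0  # percent GC, the threshold quoted in the Figure 3 caption


def normalized_coverage(coverage: list[float]) -> list[float]:
    """Divide each probe's coverage by the median coverage of all probes."""
    med = median(coverage)
    if med == 0:
        raise ValueError("median coverage is zero; cannot normalise")
    return [c / med for c in coverage]


def mean_by_gc_class(gc_percent: list[float],
                     coverage: list[float]) -> dict[str, float]:
    """Mean normalized coverage for low-GC and high-GC probes.

    Probes at exactly the cutoff are placed in the high-GC group here;
    the caption leaves that boundary case unspecified.
    """
    if len(gc_percent) != len(coverage):
        raise ValueError("gc_percent and coverage must have the same length")
    norm = normalized_coverage(coverage)
    low = [n for g, n in zip(gc_percent, norm) if g < LOW_GC_CUTOFF]
    high = [n for g, n in zip(gc_percent, norm) if g >= LOW_GC_CUTOFF]
    return {
        "low_gc": mean(low) if low else float("nan"),
        "high_gc": mean(high) if high else float("nan"),
    }
```

A low-GC mean well below the high-GC mean would correspond to the under-recovery of low-GC probes described in the caption.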

DISCUSSION

Current array-based probe synthesis methods have many limitations, including a lack of QC on probe synthesis and long turnaround times. Our initial experiments show that individually synthesized DNA probes can effectively target and capture genomic regions for sequencing and can supplement existing capture technologies to increase overall coverage and uniformity. Spiking DNA probes into our SureSelect hybridizations dramatically improved coverage with minimal GC bias. Probes with less than 35% GC do not perform as well; we attribute this to the extensive, high-stringency washes in the SureSelect protocol and expect their coverage to be recoverable with further protocol optimization.

Furthermore, by replacing our existing workaround of filling in SureSelect gaps by PCR with spiked-in DNA probes, we have dramatically improved our workflow. Previously, to adequately increase coverage in the gap regions, each sample went through a completely separate workflow in which ~700 amplicons were generated by PCR and then prepared for sequencing. With the spike-in method, this entire second workflow is replaced by a simple addition of DNA probes to the existing SureSelect hybridization.

We have not yet optimized the hybridization buffers and reagents. Because the current protocol's hybridization conditions are optimized for RNA-DNA interactions, we expect such optimization to improve the performance of the DNA probes (see Figure 4). IDT’s fast turnaround time (7-10 days), coupled with the fact that spiking in as few as 4-5 probes can significantly increase enrichment, makes it possible to fine-tune and customize any larger array or panel quickly and cheaply.

[Placeholder: coverage/GC figure needed for V2 SureSelect only and V2 + IDT.]

Figure 4: Low-GC regions recover nicely as capture wash stringency decreases. The regions marked by the arrows increase in coverage as the wash stringency is decreased. SureSelect wash buffers are designed specifically for the stronger RNA-DNA bond, so the high-stringency washes effectively wash off captured low-GC regions.

[Plot labels: y-axis SureSelect + IDT coverage / SureSelect only coverage; series SS (0.1x), SS+IDT (0.1x), SS+IDT (0.2x), SS+IDT (0.3x), SS+IDT (1x).]
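The plot axis above is a ratio of SureSelect + IDT coverage to SureSelect-only coverage. A minimal sketch of how such a per-region ratio might be computed follows. The region names, the dictionary layout, and the pseudocount (added so that gap regions with zero baseline coverage do not cause a division by zero) are assumptions, not details taken from the poster.

```python
from __future__ import annotations


def coverage_ratio(spiked: dict[str, float],
                   baseline: dict[str, float],
                   pseudocount: float = 1.0) -> dict[str, float]:
    """Per-region ratio of spiked-in coverage to baseline coverage.

    spiked:      mean depth per region for SureSelect + IDT captures.
    baseline:    mean depth per region for SureSelect-only captures.
    pseudocount: added to both depths so regions the baseline missed
                 entirely (the gaps) still give a finite ratio.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return {
        region: (spiked.get(region, 0.0) + pseudocount) / (depth + pseudocount)
        for region, depth in baseline.items()
    }


# Hypothetical example values, for illustration only.
ratios = coverage_ratio(
    spiked={"gap1": 49.0, "exon2": 99.0},
    baseline={"gap1": 0.0, "exon2": 99.0},
)
```

Ratios well above 1 mark regions where the spike-in adds coverage, and ratios near 1 mark regions already captured adequately by the baseline panel.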