Schematic of Paired End SOLiD™ Sequencing ABTSRACT F3 ...
Transcript of Schematic of Paired End SOLiD™ Sequencing ABTSRACT F3 ...
0
2
4
6
8
10
12
14
16
18
20
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Chromosome
Ave
rage
Lev
el o
f Cov
erag
e
0
5000
10000
15000
20000
25000
30000
1 2 3
Size of Insertion (bp)
Num
ber o
f Ins
ertio
ns
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%# Insertions%in dbSNP
0
5000
10000
15000
20000
25000
30000
35000
-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11
Size of Deletion (bp)
Num
ber o
f Del
etio
ns
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%# Deletions% in dbSNP
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
0 5 10 15 20 25 30 35 40 45
Level of Coverage
% o
f Gen
ome
Life Technologies • 5791 Van Allen Way • Carlsbad, CA 92008 • www.lifetechnologies.com
Paired End Sequencing on the SOLiD™ Platform1
ABTSRACTThe SOLiD™ platform is a revolutionary sequencing system that utilizes sequential ligation of fluorescently labeled oligonucleotide probes, enabling high fidelity and ultra-high throughput sequencing. Previous sequencing protocols for the SOLiD™ system have only been available in the forward direction (3’ to 5’). Sequencing in the reverse direction (5’ to 3’) is ideal as it enables fragment library paired end sequencing. To this end, novel ligation chemistries were developed to support 5’ to 3’ read lengths of up to 35 bases. This paired end sequencing technology will be incorporated into the SOLiD V4 platform, increasing effective read length, maximizing throughput per run, and meeting special research interests, such as whole transcriptome studies.
SOLiD™ OverviewSOLiD™ Sequencing involves the serial ligation of probes in which a dye reports the subset of four possible dibase pairs at the 1st and 2nd positions from the ligation junction. SOLiD™ Sequencing in the forward direction involves: A.) 5’-phosphorylated primer is hybridized to the adapter region of the templates to be sequenced; B.) Fluorophore labeled 8-mer complementary probes, containing 3 universal bases to decrease complexity, are ligated to the primer; a second round of ligation is performed with unlabeled probe to increase the amount of primer extended per bead; C.) Any remaining unextended primer is capped by dephosphorylation to prevent dephasing; beads are then imaged to record fluorophore reporter; D.) A phosphorothiolate bond in the ligated probes is cleaved with AgNO3, reducing the probe to 5 nucleotides and generating a free phosphate for the next round of ligation; E.) Additional cycles of ligation, capping, imaging, and cleavage are performed until the desired read length is obtained.
SOLiD™ Sequencing in the reverse direction involves: F.) 3’-hydroxylated primer is hybridized to the adapter region of the templates to be sequenced; G.) Fluorophore labeled 8-mer complementary probes (5’ phosphorylated), containing 3 universal bases to decrease complexity, are ligated to the primer; a second round of ligation is performed with unlabeled probe to increase the amount of primer extended per bead; H.) Any remaining unextended primer is capped by polymerase incorporation of a ddNTP to prevent dephasing; beads are then imaged to record fluorophore reporter; I.) A phosphorothiolate bond in the ligated probes is cleaved with AgNO3, reducing the probe to 5 nucleotides and generating a free 3’ phosphate; J.) The 3’ phosphate is removed for the next round of ligation; K.) Additional cycles of ligation, capping, imaging, cleavage, and dephosphorylation are performed until the desired read length is obtained.
Primer
P1 Adapter P2 AdapterTemplate Sequence
P1 Adapter P2 AdapterTemplate Sequence
P1 Adapter P2 AdapterTemplate Sequence
Ligate
Cap Unextended 5’ PhosphateImage
P1 Adapter P2 AdapterTemplate Sequence
Cleave
P1 Adapter P2 AdapterTemplate Sequence
Repeat LigateCap, ImageCleave, etc.
Primer
P1 Adapter P2 AdapterTemplate Sequence
P1 Adapter P2 AdapterTemplate Sequence
Ligate
Cap Unextended 3’ HydroxylImage
P1 Adapter P2 AdapterTemplate Sequence
Cleave
P1 Adapter P2 AdapterTemplate Sequence
P1 Adapter P2 AdapterTemplate Sequence
Dephosphorylate
P1 Adapter P2 AdapterTemplate Sequence
Repeat LigateCap, Image, CleaveDephosphorylate, etc.
Forward Sequencing Reverse Sequencing
0
20
40
60
80
100
120
140
25mer (F5) 35mer (F5) 50mer (F3) 25 x 50 35 x 50
Estim
ated
Gig
abas
es G
ener
ated
per
Run
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
0 25 50 75 100 125 150 175 200 225 250 275 300Size (bases)
Cou
nt
mbead1µmbead
F3 Adapter F5 AdapterDNA Fragment ~ 200 Bases
F5: 35 bp
5’ 3’
F3: 75bpC1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
C1C2C3C4C5C6C7
Schematic of Paired End SOLiD™ Sequencing
Figure 1. Paired end sequencing on the SOLiD™ platform. Priming from the 5’ end of the template for forward reads (F3 Adapter) and from the 3’ end for the reverse reads (F5 adapter) of a standard fragment library. Novel reverse ligation chemistry enables paired end sequencing without requiring synthesis of the complementary strand of the DNA template.
C1 C2 C3 C4 C5 C6 C7
C1 C2 C3 C4 C5 C6 C7
A F
B G
C H
D I
E J
K
Reverse Sequencing E coli DH10b “Satay” Plots
Figure 2. “Satay” Plots for an E. coli DH10b 35 base reverse sequencing run. These plots show the spectral quality and intensity of the sample. The axes correspond to the 4 different fluorochromes used in SOLiD™ sequencing: FAM, CY3, TXR, CY5. Each dot on the plot represents the fluorescent wavelength and intensity of multiple copies of the bead bound DNA template. Beads that fall on or near an axis are monoclonal (i.e. they contain multiple copies of a single DNA template), and beads that are far from the origin are high intensity beads.
Figure 3. Matching Statistics for paired end sequencing of an E. coli DH10b genome. For reverse sequencing (F5), 65% of the reads matched the genome for 35 bases allowing up to 4 mismatches; 77% of the reads matched the genome for 25 bases allowing up to 3 mismatches. For forward sequencing (F3) 76% of the reads matched the genome for 50 bases allowing up to 6 mismatches.
Matching Statistics for Paired End Sequencing
Paired End Sequencing Throughput
Figure 4. Estimated throughput of paired end sequencing based on SOLiD™ 4 bead densities. A full sequencing run can generate 28 Gigabases of data for 25mers (reverse reads), 35 Gigabases of data for 35mers (reverse reads), and 57 Gigabases of data for 50mers (forward reads). A 25mer (reverse) x 50mer (forward) can generate up to 86 Gigabases of data, and up to 120 Gigabases for 35mer (reverse) x 50mer (forward).
DNA Template Size Distribution
Figure 5. Size distribution of DNA template determined from paired reads.
Paired End Sequencing of the Human Genome
Reverse Sequencing Human “Satay” Plots
Figure 6. “Satay” Plots for a 25 base reverse sequencing run of a human genome.
Distribution of Small Insertions Distribution of Small Deletions
Human Genome Distribution of Coverage Average Coverage per Chromosome
Figure 7. Distribution of coverage of uniquely placed paired reads across the human genome. The average coverage was 14.5x.
Summary of SNPs/Indels
Table 1. Pairing rates for 25mer (reverse read) x 50mer (forward read) and 35mer (reverse read) x 50mer (forward read).
Table 2. Summary of identifiedSNPs and small indels, and theirconcordance with dbSNP.
* stringent indel calling conditions
P1
P2
P3
P4
P5
99.05%35 x 50
98.94%25 x 50
Pairing Rate
Read Length
99.05%35 x 50
98.94%25 x 50
Pairing Rate
Read Length
E coli DH10b Pairing Rates
C1 C2 C3 C4 C5C1 C2 C3 C4 C5C1 C2 C3 C4 C5C1 C2 C3 C4 C5C1 C2 C3 C4 C5
P1
P2
P3
P4
P5
DNA extracted from buccal swab
Constructed paired end library using 1 µg of DNA 1 day library protocol
2 SOLiD Sequencing Runs 50 bp Forward Reads 25 bp Reverse Reads
Generated over 88 Gigabases of aligned reads
ACKNOWLEDGEMENTS
Christopher Clouser Nick Fantin Rachel Kasinskas Bashar Mullah Jessica Spangler
Brittney Coleman Tamara Gilbert Mike Laptewicz Dick Noble Dean Tsou
Gina Costa Sherry Hansen Clarence Lee Ken Otteson Tristen Weaver
Tenzin Dawoe Stephen Hendricks Liz Levandowsky Vaishnavi Panchapakesa Allen Wong
Dave Dupont Jeffrey Ichikawa Stephen McLaughlin Andrew Sheridan Joon Yang
Count % in dbSNPTotal SNPS 2972853 82.16%Heterozygous SNPs 1963475 73.70%Homozygous SNPs 1213219 94.42%Small Indels * 103027 73.80%
Figure 9. Distribution of small insertions (1-3 bases); 76.9%were in dbSNP.
Figure 10. Distribution of small deletions (1-11 bp); 71.2% were in dbSNP.
Figure 8. Average coverage for each chromosome. Coverage ranged from 14.7x to 17.4x for the autosomal chromosomes.
Eileen T. Dimalanta1, Lei Zhang1, Lele Sun1, Kara S. Eusko1, Dalia M. Dhingra1, Damon S. Perez1, Michael R. Lyons1, Adam E. Sannicandro2, Jonathan M. Manning1, Heather E. Peckham1, Eric Tsung1, Steve M. Menchen2, Alan P. Blanchard1, and Kevin J. McKernan11Life Technologies, 500 Cummings Center, Suite 2400, Beverly, MA 01915 2Life Technologies, 850 Lincoln Centre Drive, Foster City, CA 94404
TRADEMARKS/LICENSING© 2010 Life Technologies Corporation. All rights reserved.
The trademarks mentioned herein are the property of Life Technologies Corporation or their respective owners.
Cy3 and Cy5 are registered trademarks of GE Healthcare.