Poster PAG2013 GreaterThan10kbReads
Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.
Greater than 10 kb Read Lengths Routine when
Sequencing with Pacific Biosciences’ XL Release
Cheryl Heiner, Primo Baybayan, Susana Wang, Yan Guo, Meredith Ashby, Joan Wilson, Kevin Travers, Jason Chin, and Jason Underwood
Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025
PacBio’s SMRT® Sequencing produces the longest read
lengths of any sequencing technology currently available.
There have been a number of recent improvements to
further extend the length of PacBio® RS reads. With an
exponential read length distribution, there are many reads
greater than 10 kb, and some reads at or beyond 20 kb.
These improvements include library prep methods for
generating >10 kb libraries, a new XL polymerase, magnetic
bead loading, stage start, new XL sequencing kits, and
increasing data collection time to 120 minutes per SMRT
Cell. Each of these features will be described, with data
illustrating the associated gains in performance.
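To make the remark about an exponential read length distribution concrete: under an idealized exponential distribution with mean m, the fraction of reads longer than L is exp(-L/m). The sketch below evaluates this for an assumed mean of 4,500 bp. Both the mean and the helper name are chosen only for illustration, and real read length distributions deviate from this idealization.

```python
import math

def fraction_longer_than(length_bp: float, mean_bp: float) -> float:
    """Survival function of an exponential distribution with the given mean."""
    return math.exp(-length_bp / mean_bp)

# ASSUMPTION: a 4,500 bp mean read length, used here only as an illustration.
ASSUMED_MEAN_BP = 4500

for threshold_bp in (10_000, 20_000):
    frac = fraction_longer_than(threshold_bp, ASSUMED_MEAN_BP)
    print(f"reads > {threshold_bp} bp: {frac:.1%}")
```

Under these assumptions roughly one read in ten exceeds 10 kb and about one in a hundred exceeds 20 kb, which is the qualitative pattern the abstract describes.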
With these developments, we are able to obtain greatly
improved and, in some cases, completed assemblies for
genomes that have been considered impossible to
assemble in the past, because they include repeats or low
complexity regions spanning many kilobases. Long read
lengths are valuable in other areas as well. In a single read,
we can obtain sequence covering an entire viral segment,
read through multi-kilobase amplicons with expanded
repeats, and identify splice variants in long, full-length cDNA
sequences. Examples of these applications will be shown.
Introduction
Applications of SMRT® Sequencing
Very Large Insert SMRTbell™ Library Prep
Key steps in preparing very large insert libraries
Extended collection times maximize read length or throughput
Conclusion
Recent Developments in SMRT® Sequencing
New XL polymerase extends read lengths, while
maintaining high consensus accuracy
Combination of new features yields long subreads, some beyond 10 kb:
Sequencing full-length cDNA transcripts
Recent improvements in SMRT® Sequencing provide a wide
range of options, including the capability to sequence over 10 kb
fragments in a single read, enabling the sequencing community to
answer biological questions at a level never before possible.
Figure: pulsed-field gels of an ideal sample and a non-ideal sample, each with size markers at 5, 10, 20, and 30 kb.
Figure: gel lanes 10-14, with size markers at 10, 20, and 30 kb.
Samples:
10. K12 gDNA (dil.11/1/2012)
11. K12 shear, regular g-TUBE, 5500 rpm, 50 µL @ 100 ng/µL
12. K12 shear, regular g-TUBE, 5000 rpm, 50 µL @ 100 ng/µL
13. K12 shear, regular g-TUBE, 4500 rpm, 50 µL @ 100 ng/µL
14. K12 shear, regular g-TUBE, 4000 rpm, 50 µL @ 100 ng/µL
Lane 11 = 18.8 kbp
Lane 14 = 30.5 kbp
Left: ideal sample, nearly all high molecular weight. Right: the sample has a high molecular
weight band, but shorter fragments will dominate loading and sequence data.
Start with high quality input DNA: pulsed-field gel QC
Shearing to 10-20 kb: Covaris® g-TUBE® devices
Results from varying spin speed with g-TUBE fragmentation using the Eppendorf®
MiniSpin® plus. The lower the speed, the larger the fragment size, but also the more likely
it is that sample will remain in the upper reservoir and be lost or not sheared.
Fragment size decreases after shearing due to handling
during library prep; gentle handling helps but does not
eliminate this issue.
Converting to SMRTbell™ libraries: large DNA fragments are fragile
Samples:
1. Input E. coli K12 gDNA
2. Sheared E. coli K12 gDNA
3. E. coli K12 SMRTbell Library
Gel lanes: 1 2 3
Shear = 22.1 kbp
Library = 16.1 kbp
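The size loss between the sheared DNA and the finished library can be expressed as a percentage. The short calculation below uses the two gel sizes reported above; the function name is only illustrative.

```python
def percent_size_loss(before_kbp: float, after_kbp: float) -> float:
    """Percentage reduction in fragment size from `before_kbp` to `after_kbp`."""
    return 100.0 * (before_kbp - after_kbp) / before_kbp

shear_kbp = 22.1    # sheared E. coli K12 gDNA, from the gel
library_kbp = 16.1  # E. coli K12 SMRTbell library, from the gel

loss = percent_size_loss(shear_kbp, library_kbp)
print(f"Fragment size dropped by {loss:.1f}% during library prep")
```

For these two lanes the reduction is about 27%.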
Stage start for longer subread lengths
Figure panels: Cell Prep Station Start Coverage (left), Stage Start Coverage (right)
Sequencing the 9,749 bp HIV genome
Left, cell prep station start excludes first and last 1000 bases.
Right, stage start increases coverage range nearly to ends of genome. Along with XL
polymerase and 120 minute movies, the entire genome can be covered in a single read.
Workflow for full-length cDNA sequencing
Detection of novel splice forms of a cyclin-dependent kinase
Flowchart boxes:
Total RNA
polyA+ RNA
SuperScript Full Length cDNA
SMARTer PCR cDNA
PCR Optimization
Large Scale PCR
Agarose Size Selection: <1kb, 1-2kb, 2-3kb, >3kb
SMRTbell™ Template Preparation
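The agarose size selection itself is a physical gel step, but the same four size fractions can be mirrored in software, for example to check which fraction a transcript of known length would fall into. The sketch below is only an illustration: the names are hypothetical, and the handling of exact boundaries (a 1,000 bp transcript goes into "1-2kb") is an assumption, since the workflow does not specify it.

```python
from bisect import bisect_right

# Size fractions named in the workflow; boundary handling is an ASSUMPTION.
BIN_EDGES_BP = [1000, 2000, 3000]
BIN_LABELS = ["<1kb", "1-2kb", "2-3kb", ">3kb"]

def size_bin(length_bp: int) -> str:
    """Return the size fraction a transcript of this length would fall into."""
    return BIN_LABELS[bisect_right(BIN_EDGES_BP, length_bp)]

# Hypothetical transcript lengths, for illustration only.
for length_bp in (650, 1500, 2999, 4200):
    print(length_bp, size_bin(length_bp))
```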
PacBio’s draft cDNA sequencing protocol is now
available as a Shared Protocol on SampleNet:
http://www.smrtcommunity.com/Share/Protocol/List
Chicken transcript library: full-pass subreads
correspond with full-length reference sequences
>10 kb read joins 17 contigs
Example from Gbase genome assembly project
Very long inserts can join regions of long repeats, greatly improving problematic assemblies.
For more information on assembly methods, see poster P0998, Towards Finished Genome
Assemblies using SMRT® Sequencing.
PacBio variants confirmed by PCR-Sanger
Figure panels: Diffusion Loading, MagBead Loading
Magnetic bead loading for more efficient sample utilization,
removal of small fragments with large insert libraries
Problematic sample with many
small fragments <1 kb
Fragments <1 kb are
excluded with MagBead
loading
Sequence through 12-base homopolymer
Many reads span entire multi-kb transcripts
Sequencing through >2000 bases of pure CGG repeats
Collaboration with UC Davis:
Expanded CGG-Repeat Alleles of the Fragile X Gene
Loomis et al. (2012) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene.
Genome Research, accepted for publication.
Regions of long, difficult sequence context are covered in single reads
High consensus accuracy of >Q50 obtainable with PacBio sequencing
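For readers less familiar with Phred-scaled quality values, the standard relationship is Q = -10 * log10(p), where p is the error probability. The helpers below (names are illustrative) convert in both directions; Q50 corresponds to one expected error per 100,000 bases.

```python
import math

def phred_to_error(q: float) -> float:
    """Error probability corresponding to a Phred quality value."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred quality value corresponding to an error probability."""
    return -10 * math.log10(p)

q = 50
p = phred_to_error(q)
print(f"Q{q}: error probability {p:.0e}, accuracy {100 * (1 - p):.3f}%")
```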
Clone   Reference  PacBio  Sanger
BAC 1   T          G       G
        T          --      --
        T          A       A
        G          T       T
        C          T       T
        C          T       T
        G          C       C
        C          G       G
BAC 2   C          T       T
        C          G       G
        A          G       G
        G          A       A
        T          C       C
        T          C       C
        C          T       T
BAC 3   T          --      --
        T          --      --
        T          C       C
        T          C       C
        G          T       T
        A          C       C
BAC 4   A          G       G
        T          C       C
        A          T       T
        T          --      --
        T          G       G
22 SNPs and 4 indels in human BACs confirmed by
PCR-Sanger
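The agreement between the PacBio and Sanger columns can be checked mechanically. The sketch below transcribes the table rows as reference/PacBio/Sanger triples, keeping "--" exactly as it appears in the table, and counts the positions where the two methods give the same call. The data layout and function name are illustrative choices, not part of the original analysis.

```python
# Rows transcribed from the table as reference/PacBio/Sanger.
CALLS = {
    "BAC 1": "T/G/G T/--/-- T/A/A G/T/T C/T/T C/T/T G/C/C C/G/G",
    "BAC 2": "C/T/T C/G/G A/G/G G/A/A T/C/C T/C/C C/T/T",
    "BAC 3": "T/--/-- T/--/-- T/C/C T/C/C G/T/T A/C/C",
    "BAC 4": "A/G/G T/C/C A/T/T T/--/-- T/G/G",
}

def concordance(calls: dict) -> tuple:
    """Return (positions where PacBio == Sanger, total positions)."""
    total = 0
    agree = 0
    for rows in calls.values():
        for row in rows.split():
            _reference, pacbio, sanger = row.split("/")
            total += 1
            agree += pacbio == sanger
    return agree, total

agree, total = concordance(CALLS)
print(f"PacBio and Sanger agree at {agree} of {total} positions")
```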
400 Mb rice genome, CSHL,
17 kb library
Polymerase             C2               C2              C2              XL
Loading                Diffusion        MBS             MBS             MBS
Input into DNA Repair  5 μg (minimum)   5 μg            1 μg (minimum)  1 μg (minimum)
15% recovery           750 ng           750 ng          150 ng          150 ng
Primer Annealing       5 nM             5 nM            0.8333 nM       0.8333 nM
Polymerase Binding     3 nM             3 nM            0.5 nM          0.5 nM
Loading (on cell)      150 pM           150 pM          10 pM           5.5 pM
Total # SMRT Cells     52 (with reuse)  184 (no reuse)  36 (no reuse)   68 (no reuse)
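The "15% recovery" row follows directly from the input amounts. A minimal check of that arithmetic, using the input masses from the table and an illustrative function name:

```python
def recovered_ng(input_ug: float, recovery_fraction: float = 0.15) -> float:
    """Mass of library recovered, in ng, from an input mass given in micrograms."""
    return input_ug * 1000 * recovery_fraction

for input_ug in (5, 1):
    print(f"{input_ug} ug input -> {recovered_ng(input_ug):.0f} ng at 15% recovery")
```

This reproduces the 750 ng and 150 ng values in the table.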
• >10 kb library prep recommendations
• XL polymerase, C2 sequencing chemistry
• 1 x 120 minute collection time
• Stage start
• MagBead loading
2 x 55 min movies
11 kb plasmidbell
1 x 120 min movie
Average: 4,200 bp
95th Percentile: 9,500 bp
Max: 13,000 bp
2 kb lambda library
120 minute movies maximize number of 10-20 kb reads
2 x 55 minute movies maximize the total number of reads and Mb per sample
Average: 4,500 bp
95th Percentile: 12,000 bp
Max: 21,000 bp
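Summary statistics of this kind (average, 95th percentile, maximum) can be computed from a list of subread lengths. The sketch below uses the nearest-rank definition of a percentile, which is an assumption because the poster does not say which definition was used. The example lengths are hypothetical and are not data from the poster.

```python
import math
from statistics import mean

def percentile_nearest_rank(values, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(subread_lengths):
    """Average, 95th percentile, and maximum of a list of subread lengths (bp)."""
    return {
        "average": mean(subread_lengths),
        "p95": percentile_nearest_rank(subread_lengths, 95),
        "max": max(subread_lengths),
    }

# HYPOTHETICAL subread lengths in bp, for illustration only.
lengths = [1200, 2500, 3100, 4000, 4800, 5200, 6500, 8000, 9500, 13000]
print(summarize(lengths))
```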
Axis label: PacBio Reads
Subread lengths, plant and microbial libraries
Template input reduced, number of SMRT Cells increased with MagBead loading and XL polymerase
Acknowledgements
The authors would like to thank Jonathan Bingham, Kathryn Keho, Wendy Wise, Jenny Gu, and the
many contributors in the PacBio community, including CSHL, UC Davis, and U Washington.
50% of sequence from
subreads >4800 bases
Axis labels: Fraction of sequence from subreads > x; Read Length
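The statement that 50% of sequence comes from subreads >4800 bases is an N50-style statistic: sort subreads from longest to shortest and find the length at which the running total of bases reaches half of all bases. A minimal sketch of that calculation follows, with a hypothetical function name and hypothetical input lengths.

```python
def length_at_sequence_fraction(lengths, fraction: float = 0.5) -> int:
    """Length L such that subreads of length >= L hold at least `fraction` of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must not be empty")

# HYPOTHETICAL subread lengths in bp, for illustration only.
example = [2000, 3000, 4800, 6000, 9000]
print(length_at_sequence_fraction(example))
```

Plotting this value for every fraction, rather than only 50%, gives a curve of the kind named by the axis labels above.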
Single-Pass Accuracy panels: XL/C2 and C2/C2
High consensus accuracy due to randomness of errors in individual reads
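One way to see why random errors lead to high consensus accuracy is a simple majority-vote model: if each read is independently correct at a position with probability p, the chance that more than half of n reads are correct rises quickly with n. The sketch below is a deliberately simplified illustration and not PacBio's consensus algorithm; the 85% single-pass accuracy is an assumed value, and ties at even coverage are counted as errors.

```python
from math import comb

def majority_vote_accuracy(single_pass_accuracy: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads are correct."""
    p = single_pass_accuracy
    needed = coverage // 2 + 1  # strict majority; ties count as wrong
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(needed, coverage + 1)
    )

# ASSUMPTION: 85% single-pass accuracy, chosen only for illustration.
ASSUMED_SINGLE_PASS = 0.85

for coverage in (1, 5, 15, 31):
    acc = majority_vote_accuracy(ASSUMED_SINGLE_PASS, coverage)
    print(f"coverage {coverage:>2}: per-position consensus accuracy {acc:.6f}")
```

The model depends on errors being independent between reads; systematic errors that recur at the same position would not average out in this way.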
Axis label: # of subreads per SMRT Cell
Consensus Accuracy
10 kb libraries