Poster PAG2013 GreaterThan10kbReads

Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.

Greater than 10 kb Read Lengths Routine when Sequencing with Pacific Biosciences’ XL Release

Cheryl Heiner, Primo Baybayan, Susana Wang, Yan Guo, Meredith Ashby, Joan Wilson, Kevin Travers, Jason Chin, and Jason Underwood

Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025

PacBio’s SMRT® Sequencing produces the longest read lengths of any sequencing technology currently available. A number of recent improvements further extend the length of PacBio® RS reads. With an exponential read length distribution, there are many reads greater than 10 kb, and some reads at or beyond 20 kb. These improvements include library prep methods for generating >10 kb libraries, a new XL polymerase, magnetic bead loading, stage start, new XL sequencing kits, and an increase in data collection time to 120 minutes per SMRT Cell. Each of these features will be described, with data illustrating the associated gains in performance.
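The statement about an exponential read length distribution can be made concrete with a short calculation. The sketch below assumes an idealized exponential distribution and an illustrative mean of 4,500 bp; the function names and the chosen mean are assumptions for illustration, not values taken from a specific run.

```python
import math


def fraction_longer_than(threshold_bp: float, mean_bp: float) -> float:
    """P(read length > threshold) under an exponential distribution with the given mean."""
    if mean_bp <= 0:
        raise ValueError("mean_bp must be positive")
    return math.exp(-threshold_bp / mean_bp)


def percentile_length(p: float, mean_bp: float) -> float:
    """Read length below which a fraction p of reads fall (exponential model)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -mean_bp * math.log(1.0 - p)


MEAN_BP = 4500.0  # assumed, illustrative mean read length

for threshold in (10_000, 20_000):
    share = fraction_longer_than(threshold, MEAN_BP)
    print(f"Expected share of reads > {threshold} bp: {share:.1%}")

print(f"Modelled 95th percentile: {percentile_length(0.95, MEAN_BP):,.0f} bp")
```

Under this model a long tail is expected: a modest mean still leaves a meaningful share of reads above 10 kb and a small share above 20 kb.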

With these developments, we are able to obtain greatly improved and, in some cases, completed assemblies for genomes previously considered impossible to assemble because they include repeats or low complexity regions spanning many kilobases. Long read lengths are valuable in other areas as well. In a single read, we can obtain sequence covering an entire viral segment, read through multi-kilobase amplicons with expanded repeats, and identify splice variants in long, full-length cDNA sequences. Examples of these applications will be shown.

Introduction

Applications of SMRT® Sequencing

Very Large Insert SMRTbell™ Library Prep

Key steps in preparing very large insert libraries

Extended collection times maximize read length or throughput

Conclusion

Recent Developments in SMRT® Sequencing

New XL polymerase extends read lengths, while maintaining high consensus accuracy

Combination of new features yields long subreads, some beyond 10 kb:

Sequencing full-length cDNA transcripts

Recent improvements in SMRT® Sequencing provide a wide range of options, including the capability to sequence over 10 kb fragments in a single read, enabling the sequencing community to answer biological questions at a level never before possible.

[Figure: pulsed-field gel, ideal sample (left) and not ideal sample (right); size markers at 5, 10, 20, and 30 kb]

[Figure: gel lanes 10-14; size markers at 10, 20, and 30 kb]

Samples:

10. K12 gDNA (dil.11/1/2012)

11. K12 shear, regular g-TUBE, 5500 rpm, 50 µL @ 100 ng/µL




Lane 11 = 18.8 kbp

Lane 14 = 30.5 kbp

Left, ideal sample, nearly all high molecular weight; right, sample has high molecular weight band, but shorter fragments will dominate loading and sequence data

Start with high quality input DNA: pulsed-field gel QC

Shearing to 10-20 kb: Covaris® g-TUBE® devices

Results from varying spin speed with g-TUBE fragmentation using the Eppendorf® MiniSpin® plus. The lower the speed, the larger the fragment size, but also the more likely the sample is to remain in the upper reservoir and be lost or not sheared.

Fragment size decreases after shearing due to handling during library prep; gentle handling helps but does not eliminate this issue

Converting to SMRTbell™ libraries: large DNA fragments are fragile

Samples:

1. Input E. coli K12 gDNA

2. Sheared E. coli K12 gDNA

3. E. coli K12 SMRTbell Library

[Figure: gel lanes 1-3]

Shear = 22.1 kbp

Library = 16.1 kbp

Stage start for longer subread lengths

[Figure panels: Cell Prep Station Start Coverage (left); Stage Start Coverage (right)]

Sequencing the 9,749 bp HIV genome

Left, cell prep station start excludes first and last 1,000 bases. Right, stage start increases coverage range nearly to ends of genome. Along with XL polymerase and 120 minute movies, the entire genome can be covered in a single read.

Workflow for full-length cDNA sequencing

Detection of novel splice forms of a cyclin-dependent kinase

[Workflow diagram; boxes: Total RNA; polyA+ RNA; SuperScript Full Length cDNA; SMARTer PCR cDNA; PCR Optimization; Large Scale PCR; Agarose Size Selection (<1 kb, 1-2 kb, 2-3 kb, >3 kb); SMRTbell™ Template Preparation]

PacBio’s draft cDNA sequencing protocol is now available as a Shared Protocol on SampleNet: http://www.smrtcommunity.com/Share/Protocol/List

Chicken transcript library: full-pass subreads correspond with full-length reference sequences

>10 kb read joins 17 contigs

Example from Gbase genome assembly project

Very long inserts can join regions of long repeats, greatly improving problematic assemblies.

For more information on assembly methods, see poster P0998, Towards Finished Genome Assemblies using SMRT® Sequencing.

PacBio variants confirmed by PCR-Sanger

[Figure panels: Diffusion Loading; MagBead Loading]

Magnetic bead loading for more efficient sample utilization and removal of small fragments with large insert libraries

Problematic sample with many small fragments <1 kb

Fragments <1 kb are excluded with MagBead loading

Sequence through 12-base homopolymer

Many reads span entire multi-kb transcripts

Sequencing through >2000 bases of pure CGG repeats

Collaboration with UC Davis:

Expanded CGG-Repeat Alleles of the Fragile X Gene

Loomis et al. (2012) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene. Genome Research, accepted for publication.

Regions of long, difficult sequence context are covered in single reads

High consensus accuracy of >Q50 obtainable with PacBio sequencing
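To put the Q50 figure in context, the standard Phred relationship Q = -10 * log10(p) converts between a quality value and an error probability. The sketch below is a minimal illustration of that conversion; the function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality value to an error probability."""
    return 10.0 ** (-q / 10.0)


def error_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality value."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


p_q50 = phred_to_error(50)
print(f"Q50 error probability: {p_q50:.0e}")
print(f"Roughly 1 error per {round(1 / p_q50):,} bases")
```

On this scale, Q50 corresponds to about one consensus error per 100,000 bases.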

Clone    Reference    PacBio    Sanger
BAC 1    T            G         G
         T            --        --
         T            A         A
         G            T         T
         C            T         T
         C            T         T
         G            C         C
         C            G         G
BAC 2    C            T         T
         C            G         G
         A            G         G
         G            A         A
         T            C         C
         T            C         C
         C            T         T
BAC 3    T            --        --
         T            --        --
         T            C         C
         T            C         C
         G            T         T
         A            C         C
BAC 4    A            G         G
         T            C         C
         A            T         T
         T            --        --
         T            G         G

22 SNPs and 4 indels in human BAC confirmed by PCR-Sanger

400 Mb rice genome, CSHL, 17 kb library

Polymerase               C2                 C2                C2                 XL
Loading                  Diffusion          MBS               MBS                MBS
Input into DNA Repair    5 μg (minimum)     5 μg              1 μg (minimum)     1 μg (minimum)
15% recovery             750 ng             750 ng            150 ng             150 ng
Primer Annealing         5 nM               5 nM              0.8333 nM          0.8333 nM
Polymerase Binding       3 nM               3 nM              0.5 nM             0.5 nM
Loading (on cell)        150 pM             150 pM            10 pM              5.5 pM
Total # SMRT Cells       52 (with reuse)    184 (no reuse)    36 (no reuse)      68 (no reuse)
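The 15% recovery row follows directly from the input mass, and mass can be related to molar concentration with the usual dsDNA approximation. The sketch below is illustrative only: the function names are hypothetical, the 650 g/mol per base pair figure is a common approximation rather than a value from this poster, and the hairpin adapters on SMRTbell templates are ignored.

```python
def recovered_ng(input_ug: float, recovery_fraction: float = 0.15) -> float:
    """Mass of library (ng) expected after prep, given input DNA in micrograms."""
    if not 0.0 <= recovery_fraction <= 1.0:
        raise ValueError("recovery_fraction must be between 0 and 1")
    return input_ug * 1000.0 * recovery_fraction


def dsdna_nanomolar(ng_per_ul: float, length_bp: float,
                    g_per_mol_per_bp: float = 650.0) -> float:
    """Approximate molar concentration (nM) of dsDNA fragments of a given length.

    Assumes an average mass of about 650 g/mol per base pair (an approximation).
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return ng_per_ul * 1e6 / (g_per_mol_per_bp * length_bp)


for input_ug in (5.0, 1.0):
    print(f"{input_ug} ug input at 15% recovery: {recovered_ng(input_ug):.0f} ng")

# Hypothetical example: a 17 kb library at an assumed 11.05 ng/uL
print(f"Approximate concentration: {dsdna_nanomolar(11.05, 17_000):.2f} nM")
```

Because molarity scales inversely with fragment length, a given mass of a very large insert library contains far fewer molecules than the same mass of a short insert library.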

• >10 kb library prep recommendations

• XL polymerase, C2 sequencing chemistry

• 1 x 120 minute collection time

• Stage start

• MagBead loading

2 x 55 min movies

11 kb plasmidbell

1 x 120 min movie

Average: 4,200 bp; 95th Percentile: 9,500 bp; Max: 13,000 bp

2 kb lambda library

120 minute movies maximize number of 10-20 kb reads

2 x 55 minute movies maximize total number of reads and Mb per sample

Average: 4,500 bp; 95th Percentile: 12,000 bp; Max: 21,000 bp

PacBio Reads

Subread lengths, plant and microbial libraries

Template input reduced, number of SMRT Cells increased with MagBead loading and XL polymerase

Acknowledgements

The authors would like to thank Jonathan Bingham, Kathryn Keho, Wendy Wise, Jenny Gu, and the many contributors in the PacBio community, including CSHL, UC Davis, and U Washington.

50% of sequence from subreads >4800 bases
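A statistic of this kind (the subread length above which half of all sequenced bases lie) can be computed from a list of subread lengths in the same way as an N50. The sketch below is a minimal illustration; the function name and the sample lengths are hypothetical, and it uses "greater than or equal to" at the threshold.

```python
from typing import Sequence


def length_at_base_fraction(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Largest length L such that subreads of length >= L hold at least
    `fraction` of all sequenced bases (N50-style statistic for fraction=0.5)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths)
    running = 0
    # Walk from the longest subread downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return min(lengths)  # not reached for valid input; kept for completeness


# Hypothetical subread lengths in bases
example = [1200, 2500, 3100, 4800, 5200, 7600, 9900, 12500]
print(length_at_base_fraction(example))
```

Weighting by bases rather than by read count is what makes this number larger than the average subread length: a few very long subreads contribute a large share of the total sequence.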

[Plot axes: Fraction of sequence from subreads > x; Read Length]

[Plot panels: Single-Pass Accuracy, XL/C2 and C2/C2]

High consensus accuracy due to randomness of errors in individual reads
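The link between random errors and high consensus accuracy can be illustrated with a deliberately simplified model: if each read is wrong at a given position independently with probability e, a majority vote is wrong only when at least half of the reads err at once. The sketch below computes that binomial tail. The per-read error rate of 0.15 and the function name are assumptions for illustration, and real consensus callers are more sophisticated than a majority vote.

```python
from math import comb


def majority_error(per_read_error: float, n_reads: int) -> float:
    """Probability that at least half of n independent reads are wrong at one
    position (ties are counted as consensus errors, which is conservative)."""
    if not 0.0 <= per_read_error <= 1.0:
        raise ValueError("per_read_error must be between 0 and 1")
    if n_reads < 1:
        raise ValueError("n_reads must be at least 1")
    k_min = (n_reads + 1) // 2  # smallest number of wrong reads that breaks the vote
    return sum(
        comb(n_reads, k)
        * per_read_error ** k
        * (1.0 - per_read_error) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


ASSUMED_ERROR = 0.15  # illustrative per-read error rate, not a measured value

for coverage in (1, 5, 15, 31):
    print(f"coverage {coverage:>2}: consensus error ~ {majority_error(ASSUMED_ERROR, coverage):.2e}")
```

The key assumption is independence: errors that recur at the same position in every read (systematic errors) would not average out this way, whereas randomly distributed errors fall off rapidly with coverage.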

# of subreads per SMRT Cell

Consensus Accuracy

10kb libraries


