Poster PAG2013 GreaterThan10kbReads





Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.

Greater than 10 kb Read Lengths Routine when Sequencing with Pacific Biosciences’ XL Release

Cheryl Heiner, Primo Baybayan, Susana Wang, Yan Guo, Meredith Ashby, Joan Wilson, Kevin Travers, Jason Chin, and Jason Underwood

Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025

PacBio’s SMRT® Sequencing produces the longest read lengths of any sequencing technology currently available. A number of recent improvements further extend the length of PacBio® RS reads. With an exponential read length distribution, there are many reads greater than 10 kb, and some reads at or beyond 20 kb. These improvements include library prep methods for generating >10 kb libraries, a new XL polymerase, magnetic bead loading, stage start, new XL sequencing kits, and an increase in data collection time to 120 minutes per SMRT Cell. Each of these features will be described, with data illustrating the associated gains in performance.
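The statement about the tail of an exponential read length distribution can be made concrete with a short calculation. The sketch below assumes an idealized exponential distribution with a mean of 4,500 bp (one of the averages reported on this poster); real read length distributions deviate from this model, so the output is a rough guide only. The function name is illustrative.

```python
import math


def fraction_longer_than(length_bp: float, mean_bp: float) -> float:
    """Fraction of reads longer than length_bp under an exponential model."""
    return math.exp(-length_bp / mean_bp)


# Assumption: mean read length of 4,500 bp, idealized exponential distribution.
MEAN_BP = 4500

for threshold_bp in (10_000, 20_000):
    frac = fraction_longer_than(threshold_bp, MEAN_BP)
    print(f"reads > {threshold_bp} bp: {frac:.1%}")
```

Under these assumptions, roughly one read in ten exceeds 10 kb and roughly one in a hundred exceeds 20 kb.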

With these developments, we are able to obtain greatly improved and, in some cases, completed assemblies for genomes that have been considered impossible to assemble in the past, because they include repeats or low complexity regions spanning many kilobases. Long read lengths are valuable in other areas as well. In a single read, we can obtain sequence covering an entire viral segment, read through multi-kilobase amplicons with expanded repeats, and identify splice variants in long, full-length cDNA sequences. Examples of these applications will be shown.

Introduction

Applications of SMRT® Sequencing

Very Large Insert SMRTbell™ Library Prep

Key steps in preparing very large insert libraries

Extended collection times maximize read length or throughput

Conclusion

Recent Developments in SMRT® Sequencing

New XL polymerase extends read lengths, while maintaining high consensus accuracy

Combination of new features yields long subreads, some beyond 10 kb:

Sequencing full-length cDNA transcripts

Recent improvements in SMRT® Sequencing provide a wide range of options, including the capability to sequence over 10 kb fragments in a single read, enabling the sequencing community to answer biological questions at a level never before possible.

Start with high quality input DNA: pulsed-field gel QC

[Figure: gels of two input DNA samples, labeled "Ideal sample" and "Not ideal", each with 5 kb, 10 kb, 20 kb, and 30 kb size markers]

Left, ideal sample, nearly all high molecular weight; right, sample has high molecular weight band, but shorter fragments will dominate loading and sequence data.

Shearing to 10-20 kb: Covaris® g-TUBE® devices

[Figure: gel of lanes 10-14 with 10 kb, 20 kb, and 30 kb size markers]

Samples:

10. K12 gDNA (dil.11/1/2012)
11. K12 shear, regular g-TUBE, 5500 rpm, 50 µL @ 100 ng/µL
12. K12 shear, regular g-TUBE, 5000 rpm, 50 µL @ 100 ng/µL
13. K12 shear, regular g-TUBE, 4500 rpm, 50 µL @ 100 ng/µL
14. K12 shear, regular g-TUBE, 4000 rpm, 50 µL @ 100 ng/µL

Lane 11 = 18.8 kbp
Lane 14 = 30.5 kbp

Results from varying spin speed with g-TUBE fragmentation using the Eppendorf® MiniSpin® plus. The lower the speed, the larger the size, but also the more likely sample will remain in the upper reservoir and be lost or not sheared.

Converting to SMRTbell™ libraries: large DNA fragments are fragile

[Figure: gel of lanes 1-3]

Samples:

1. Input E. coli K12 gDNA
2. Sheared E. coli K12 gDNA
3. E. coli K12 SMRTbell Library

Shear = 22.1 kbp
Library = 16.1 kbp

Fragment size decreases after shearing due to handling during library prep; gentle handling helps but does not eliminate this issue.
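The size loss between the sheared DNA and the finished SMRTbell library can be expressed as a percentage. This is a minimal sketch using the two sizes reported above (22.1 kbp and 16.1 kbp); the function name is illustrative.

```python
def percent_size_loss(before_kbp: float, after_kbp: float) -> float:
    """Percentage decrease in fragment size from before_kbp to after_kbp."""
    return 100.0 * (before_kbp - after_kbp) / before_kbp


# Sizes reported on the poster: sheared gDNA vs. SMRTbell library.
loss = percent_size_loss(22.1, 16.1)
print(f"fragment size decreased by {loss:.1f}% during library prep")
```

For this sample the decrease is a little over a quarter of the sheared size, which is why shearing to a larger target than the desired library size may be worth considering.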

Stage start for longer subread lengths

Sequencing the 9,749 bp HIV genome

[Figure: two coverage plots, "Cell Prep Station Start Coverage" (left) and "Stage Start Coverage" (right)]

Left, cell prep station start excludes first and last 1000 bases. Right, stage start increases coverage range nearly to ends of genome. Along with XL polymerase and 120 minute movies, the entire genome can be covered in a single read.

Detection of novel splice forms of a cyclin-dependent kinase

Workflow for full-length cDNA sequencing

[Figure: workflow diagram. Box labels: polyA+ RNA; PCR Optimization; SMARTer PCR cDNA; Agarose Size Selection: <1kb, 1-2kb, 2-3kb, >3kb; Large Scale PCR; SuperScript Full Length cDNA; SMRTbell™ Template Preparation; Total RNA]
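The four agarose size-selection bins named in the workflow (<1kb, 1-2kb, 2-3kb, >3kb) can be mirrored in software when sorting transcripts or reads by length. The sketch below is illustrative only: the handling of exact boundary values (for example, whether 3,000 bp falls in the last bin) and the example lengths are assumptions, not part of the protocol.

```python
from collections import Counter


def size_bin(length_bp: int) -> str:
    """Assign a length to one of the four size-selection bins.

    Assumption: each bin includes its lower bound and excludes its upper bound.
    """
    if length_bp < 1000:
        return "<1kb"
    if length_bp < 2000:
        return "1-2kb"
    if length_bp < 3000:
        return "2-3kb"
    return ">3kb"


# Hypothetical cDNA lengths, for illustration only.
example_lengths = [850, 1500, 2400, 3100, 5200]
counts = Counter(size_bin(n) for n in example_lengths)
for label in ("<1kb", "1-2kb", "2-3kb", ">3kb"):
    print(label, counts[label])
```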

PacBio’s draft cDNA sequencing protocol is now available as a Shared Protocol on SampleNet: http://www.smrtcommunity.com/Share/Protocol/List

Chicken transcript library: full-pass subreads correspond with full-length reference sequences

>10 kb read joins 17 contigs

Example from Gbase genome assembly project

Very long inserts can join regions of long repeats, greatly improving problematic assemblies. For more information on assembly methods, see poster P0998, Towards Finished Genome Assemblies using SMRT® Sequencing.

PacBio variants confirmed by PCR-Sanger

Magnetic bead loading for more efficient sample utilization, removal of small fragments with large insert libraries

[Figure: two panels, "Diffusion Loading" and "MagBead Loading"]

Problematic sample with many small fragments <1 kb. Fragments <1 kb are excluded with MagBead loading.

Sequence through 12-base homopolymer

Many reads span entire multi-kb transcripts

Sequencing through >2000 bases of pure CGG repeats

Collaboration with UC Davis: Expanded CGG-Repeat Alleles of the Fragile X Gene

Loomis et al. (2012) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene. Genome Research, accepted for publication.
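Once a read spans a repeat expansion, the length of the pure repeat tract can be measured directly from the sequence. The following is a minimal sketch of one way to do that with a regular expression; it is not the analysis used in the cited study, and the function name and toy sequence are hypothetical.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CGG") -> int:
    """Length in bases of the longest uninterrupted tandem run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [m.end() - m.start() for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Toy sequence: a 5-unit CGG run, an AGG interruption, then a 3-unit CGG run.
toy_read = "ATG" + "CGG" * 5 + "AGG" + "CGG" * 3 + "TAA"
print(longest_repeat_run(toy_read))
```

Because the function reports only uninterrupted runs, an interruption such as AGG splits the tract into separate runs; that choice matches the "pure" repeats described above but would need adjusting for interrupted alleles.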

Regions of long, difficult sequence context are covered in single reads

High consensus accuracy of >Q50 obtainable with PacBio sequencing

Clone    Reference    PacBio    Sanger
BAC 1    T            G         G
         T            --        --
         T            A         A
         G            T         T
         C            T         T
         C            T         T
         G            C         C
         C            G         G
BAC 2    C            T         T
         C            G         G
         A            G         G
         G            A         A
         T            C         C
         T            C         C
         C            T         T
BAC 3    T            --        --
         T            --        --
         T            C         C
         T            C         C
         G            T         T
         A            C         C
BAC 4    A            G         G
         T            C         C
         A            T         T
         T            --        --
         T            G         G

22 indels and 4 SNPs in human BAC confirmed by PCR-Sanger

400 Mb rice genome, CSHL, 17 kb library

Polymerase               C2                 C2                 C2                 XL
Loading                  Diffusion          MBS                MBS                MBS
Input into DNA Repair    5 μg (minimum)     5 μg               1 μg (minimum)     1 μg (minimum)
15% recovery             750 ng             750 ng             150 ng             150 ng
Primer Annealing         5 nM               5 nM               0.8333 nM          0.8333 nM
Polymerase Binding       3 nM               3 nM               0.5 nM             0.5 nM
Loading (on cell)        150 pM             150 pM             10 pM              5.5 pM
Total # SMRT Cells       52 (with reuse)    184 (no reuse)     36 (no reuse)      68 (no reuse)
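The "15% recovery" row follows directly from the DNA repair input, and the on-cell loading concentrations can be compared as fold reductions relative to the 150 pM value. The sketch below reproduces that arithmetic using only values from the table; the function name is illustrative.

```python
def recovered_ng(input_ug: float, recovery: float = 0.15) -> float:
    """Mass of library recovered, in ng, for a given input in micrograms."""
    return input_ug * 1000.0 * recovery


# Inputs into DNA repair taken from the table above.
for input_ug in (5.0, 1.0):
    print(f"{input_ug} ug input -> {recovered_ng(input_ug):.0f} ng at 15% recovery")

# On-cell loading concentrations (pM) from the table, relative to 150 pM.
for on_cell_pm in (10.0, 5.5):
    print(f"{on_cell_pm} pM is a {150.0 / on_cell_pm:.1f}-fold reduction from 150 pM")
```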

• >10 kb library prep recommendations

• XL polymerase, C2 sequencing chemistry

• 1 x 120 minute collection time

• Stage start

• MagBead loading

[Figure: panels labeled "2 x 55 min movies" and "1 x 120 min movie"; libraries labeled "11 kb plasmidbell" and "2 kb lambda library"]

Average: 4,200 bp; 95th Percentile: 9,500 bp; Max: 13,000 bp

120 minute movies maximize number of 10-20 kb reads. 2 x 55 minute movies maximize total number of reads and Mb / sample.

Average: 4,500 bp; 95th Percentile: 12,000 bp; Max: 21,000 bp
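The three summary values quoted for each run (average, 95th percentile, and maximum) can be computed from a list of read lengths. The sketch below uses the nearest-rank definition of the percentile, which is an assumption, since the poster does not say which definition was used; the example lengths are hypothetical.

```python
import statistics


def summarize_read_lengths(lengths):
    """Return the average, nearest-rank 95th percentile, and maximum length."""
    if not lengths:
        raise ValueError("at least one read length is required")
    ordered = sorted(lengths)
    n = len(ordered)
    rank = -(-95 * n // 100)  # ceil(0.95 * n) using integer arithmetic
    return {
        "average": statistics.mean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical read lengths in bp: 500, 1000, ..., 10000.
example_lengths = list(range(500, 10_500, 500))
summary = summarize_read_lengths(example_lengths)
print(summary)
```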

[Figure: Subread lengths, plant and microbial libraries; axis label: PacBio Reads]

Template input reduced, number of SMRT Cells increased with MagBead loading and XL polymerase

Acknowledgements

The authors would like to thank Jonathan Bingham, Kathryn Keho, Wendy Wise, Jenny Gu, and the many contributors in the PacBio community, including CSHL, UC Davis, and U Washington.

[Figure: Fraction of sequence from subreads > x, plotted against Read Length. Annotation: 50% of sequence from subreads >4800 bases]
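The "fraction of sequence from subreads > x" curve, and the length at which that fraction reaches 50% (commonly called N50), can be computed from subread lengths as sketched below. The function names and the toy lengths are illustrative assumptions, not data from the poster.

```python
def fraction_of_bases_above(lengths, threshold_bp):
    """Fraction of all sequenced bases that lie in subreads longer than threshold_bp."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("no bases to summarize")
    return sum(n for n in lengths if n > threshold_bp) / total


def n50(lengths):
    """Smallest length L such that subreads of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("at least one subread length is required")
    half = sum(lengths) / 2
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running >= half:
            return n


# Toy subread lengths in bp.
toy_lengths = [1000, 2000, 3000, 4000, 10000]
print(fraction_of_bases_above(toy_lengths, 4000))
print(n50(toy_lengths))
```

Note that this statistic weights each subread by its length, so a handful of very long subreads can account for a large share of the sequence even when most subreads are short.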

[Figure: panels labeled "Single-Pass Accuracy" for XL/C2 and C2/C2; other recoverable labels: # of subreads per SMRT Cell, Consensus Accuracy, 10kb libraries]

High consensus accuracy due to randomness of errors in individual reads
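The link between random single-pass errors and high consensus accuracy can be illustrated with a simple majority-vote model. This is a deliberately simplified sketch, not PacBio's consensus algorithm: it assumes errors are independent between reads, treats all erroneous calls at a position as agreeing with each other (a pessimistic simplification), and uses an illustrative per-read error rate of 15% that is not taken from the poster.

```python
import math


def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that at least half of `coverage` independent reads are wrong."""
    k_min = (coverage + 1) // 2  # ties are counted as consensus errors
    return sum(
        math.comb(coverage, k)
        * per_read_error ** k
        * (1.0 - per_read_error) ** (coverage - k)
        for k in range(k_min, coverage + 1)
    )


def phred(error_probability: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_probability)


ASSUMED_ERROR = 0.15  # illustrative single-pass error rate (assumption)
for coverage in (1, 5, 15, 31):
    p_wrong = majority_error(ASSUMED_ERROR, coverage)
    print(f"coverage {coverage:>2}: consensus error {p_wrong:.2e}, Q{phred(p_wrong):.0f}")
```

On the Phred scale, Q50 corresponds to an error probability of 1 in 100,000. In this model the consensus error falls rapidly with coverage only because the errors are random; systematic errors shared by all reads would not average out.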