Optimizing Blue Pippin Size-Selection for Increased SubRead Lengths on the RSII

Robert P. Sebra, Ph.D.

Icahn School of Medicine at Mount Sinai

Icahn Institute for Genomics & Multi-scale Biology

@IcahnInstitute

Applications in Clinical Genomics


An Integrated Omics Approach


Requires a Multi-faceted Sequencing Technology Sandbox

▶ Microarray Genotyping and Expression

• 1 x Illumina Bead Array platform

▶ Liquid Handling Automation

• 1 x Agilent Bravo

• 2 x Tecan EBO

• 1 x Beckman FX Robot

▶ Local IT Infrastructure

• 100 Tb mirrored primary storage

• 1,500 Tb secondary storage

Multi-Platform DNA Sequencing

First Generation

• 1 x Applied Biosystems 3730xl

Second Generation

• 5 x Illumina HiSeq 2500

• 1 x MiSeq

• 1 x Ion Proton

• 2 x Ion PGM

Third Generation

• 2 x PacBio RS II


The SMRT Sequencing Program at Mount Sinai GCF

Human Sequencing

-Complex genetic loci (TR, CNV, rearrangements, etc.)

-Exploring intronic space

-Clinical validation

-Allelic phasing

-High resolution genotyping

-Targeted sequencing

Infectious Disease

-Hospital surveillance

-Rapid microbial finishing

-Phasing plasmids

-Building phylogenies

-Metagenomics

-Understanding co-infection

Epigenetics

-mtDNA disease

-Discovering novel motifs

-Growth phase comparisons

-Virulence factors

-Oxidative or photochemical damage associated w/ cancer

Basic Research

-Full length transcriptome

-Basic genomics research

-Novel Bifx pipelines

-Reducing DNA input

-Methods development for targeted/capture capability

12-month Highlights

~1800 SMRTcells sequenced in the past 14 months

-Throughput has increased by ~4X, but as much as 10X in some cases

-subRL has increased by ~2X, but up to 5X

RSII Upgrades & Sage Blue Pippin SS


Size Selection using the Blue Pippin Technology I

Shear to 10kb avg., size select >5kb

Shear to 20kb avg., size select >10kb

[Figure: PFGE view, [DNA] vs. size (kb), scale 1 to 40 kb. Lanes: Input 10 kb shear; Input 20 kb shear; >5kb; >10kb. Traces: Sheared DNA; Pippin High-Pass Selected. Size labels as extracted: 48502, 10086, 12220, 8271, 8614.]


Size Selection using the Blue Pippin Technology II

4 samples per run.

Typical runtimes for DNA >10kb: 2-8 hours.


Applications Requiring Blue Pippin Size Selection at Sinai

Large Insert Size Selection (0.75% Cassette)

-Removal of fragments <10kb

-Purification of specific DNA (mtDNA)

-Selection of large plasmids

Short Insert Size Selection (2.0% Cassette)

-Amplicon or digested DNA selection (by size)

-Purification of short libraries

-Plasmid selection < 10kb

[Figure panels: Removal of <10kb library; Selected 225-275bp. Captions: Maximize library size for longest N50 subreads; Sample purity.]


2kb SMRTbell Library Construction

Obtain tissue, cells, or swab for generating gDNA

Culture colony to generate the appropriate mass

Isolate 6ug gDNA (various methods)

OR

Use 5ug gDNA or Shear to ~20kb

Shear 1ug to 2kb-6kb

Tet Convert, if desired

10ng to WGA if control needed

Blue Pippin Size Select if needed

Sequencing using 20 pM P4-C2, 120’ MB collection

20kb+ SMRTbell Library Construction

Blue Pippin Size Select at >7 or 10kb

Sequencing using 50 pM P4-C2, 120’ MB collection

Option: Mix w/ Plasmid Library

Pipelines: Standard Pipeline; Plasmid (<10kb) Pipeline

HGAP De Novo Assembly & Variant Analysis and/or Base Modification Profile

Plasmids Sequenced


Blue Pippin Size Selection Performance: Yield & Capacity

[Figure: yield of 67 BPSS libraries, with capacity levels marked at 2, 10, 40, and 100 SMRTcells]

-Only 1 SS library has insufficient yield for 2 chips

-Average % yield (by mass) = 24%

-Yield is dependent on input DNA size distribution but parameters can be adjusted accordingly (7kb vs 10kb)

Tips for Maximizing Yield:

1. Use DNA isolation techniques with gentle steps (Qiagen Tip works well)

2. Conduct AMPure prior to shearing to remove small fragments for true assessment of input mass.

3. Remove any bubbles in cassette by tapping, etc.

4. Wash all loading & elution wells with E-buffer prior to placing sample

5. If bubble is in well, don’t use that well.

6. After sample collection, rinse all elution wells w/ 40uL of E-buffer to collect DNA
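The 24% average mass yield reported on this slide can be turned into a quick planning estimate. The sketch below is only an illustration: the function names are hypothetical, and the 5 ug input is borrowed from the library construction slide as an example value. Real yields depend on the input size distribution, as noted above.

```python
# Rough planning arithmetic for Blue Pippin size selection (BPSS) yield.
# Assumption: recovered mass scales linearly with input mass at the
# average yield quoted on the slide (24% by mass).

AVERAGE_YIELD = 0.24  # from the slide; individual libraries will vary


def recovered_mass_ug(input_ug: float, yield_fraction: float = AVERAGE_YIELD) -> float:
    """Estimate the DNA mass (ug) recovered after size selection."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return input_ug * yield_fraction


def input_needed_ug(target_ug: float, yield_fraction: float = AVERAGE_YIELD) -> float:
    """Estimate the input mass (ug) needed to recover target_ug."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return target_ug / yield_fraction


# Example: 5 ug of input at the average yield recovers about 1.2 ug.
example_recovered = recovered_mass_ug(5.0)
```

Running the estimate in both directions is useful because the size cutoff (7kb vs 10kb) changes the yield fraction, so the same target mass can require quite different inputs.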


Blue Pippin Size Selection Impact on SubRL Distributions

Impact of BPSS on N50 Length (MRSA)

w/o BPSS:

95th % ~ 12,400bp

N50 ~ 5000bp

Best Read ~21,000bp

~240Mb mapped

w/ BPSS:

95th % ~ 19,700bp

N50 ~ 12,500bp

Best Read ~34,500bp

~325Mb mapped / cell

[Figure: % sequenced bases vs. subRL threshold; series: 2 SMRTcells, 1 SMRTcell; N50 marked at ~12.5kb and ~5kb]
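For readers unfamiliar with the statistics on this slide, here is a minimal sketch of how a subread N50 and a 95th percentile can be computed from a list of read lengths. It uses the common definition of N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases) and a nearest-rank percentile. The toy lengths are made up for illustration and are not data from the talk, and the exact method used to produce the slide's figures is not stated.

```python
import math


def n50(lengths):
    """Return the N50: the length L such that reads >= L hold >= half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_of_bases:
            return length
    raise ValueError("no read lengths given")


def percentile_nearest_rank(lengths, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Toy subread lengths in bp (illustrative only).
toy_lengths = [2000, 4000, 6000, 8000, 10000]
toy_n50 = n50(toy_lengths)                          # 8000
toy_p95 = percentile_nearest_rank(toy_lengths, 95)  # 10000

# Fold change in N50 using the approximate values quoted on the slide.
n50_fold_change = 12500 / 5000  # 2.5
```

Because N50 weights reads by the bases they contribute, removing short fragments raises it much more than it raises a plain mean, which is consistent with the jump from ~5 kb to ~12.5 kb reported here.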


Rapid & Cost Effective Infectious Disease Surveillance

Example: MRSA

-Main culprit in healthcare-associated infections

-Regulatory agencies now require mandatory reporting, reimbursements held

-At Sinai, each positive blood stream culture requires a consult due to:

High mortality rates

High treatment failure rates/relapses

Drug resistance

-Accounts for nearly half of all inpatient Infectious Diseases consultations

Pipeline: From Bedside to Sequencing

1. Patient Cultures Drawn

2. + Cultures inoculated on agar plates in MicroLab

3. Colonies identified by routine MicroLab techniques

4. Colonies streaked for single colony & single colony grown

5. Cells harvested, spun, and lysed for DNA isolation

6. DNA isolated using Qiagen DNATip (or similar kit)

7. gDNA QC & sample preparation (previously shown)

8. PacBio RSII sequencing & HGAP assembly

~ 48 hours & <$300


Examples of Microbial Assemblies after BPSS / RSII

Sample      # of SMRTcells  # of Reads  Mean subRL  95th % RL  Coverage  Assembled Bases  N50 of Contigs  Largest Contig  # Contigs
----------  --------------  ----------  ----------  ---------  --------  ---------------  --------------  --------------  ---------
MRSA 1      1               50,000      2,986       12,448     83 X      2.8 Mb           47,597          173,012         87
MRSA 2      1               55,010      3,787       13,038     88 X      2.98 Mb          300,802         619,820         24
MRSA 1’     2               79,032      7,088       19,680     110 X     2.90 Mb          1.96 Mb         1.96 Mb         11
MRSA 3      2               22,106      5,787       15,959     40 X      2.92 Mb          2.92 Mb         2.92 Mb         2
MRSA 4      2               42,253      7,485       20,771     105 X     2.94 Mb          2.94 Mb         2.94 Mb         1
Bacteria 1  4               197,239     5,507       14,785     239 X     4.75 Mb          4.73 Mb         4.73 Mb         3
Bacteria 2  4               267,140     5,424       14,795     355 X     4.53 Mb          4.53 Mb         4.53 Mb         1
Bacteria 3  4               187,322     4,868       13,940     210 X     4.83 Mb          3.46 Mb         3.46 Mb         10
Bacteria 4  4               168,612     5,578       13,981     204 X     4.78 Mb          799,485         2.09 Mb         18
Bacteria 5  4               192,349     5,758       13,909     238 X     4.71 Mb          4.69 Mb         4.69 Mb         2

-Complete microbial genomes in as few as 1-4 SMRT Cells

-Some contigs may be plasmids

-MRSA 1 is reduced from 87 to 11 contigs w/ BPSS (others to a single contig)

-DNA quality (from bead beating conditions and/or prep) is important

-All assembly stats from HGAP 2.0.x (no partial alignments)


10X Mapped Coverage of Human Genome NA12878

454 Reads

Number of reads ~100M

Mapped coverage ~15X

Single and Paired End Illumina Reads

Number of reads ~100s of M

Mapped coverage ~30X

PacBio Reads

Number of Reads ~12M

Mapped coverage 10X+

Mean sub-read length 2,766

Mean unrolled read length 4,066

95th Percentile 11,630

Accuracy (error-corrected reads) >99%
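The PacBio figures on this slide are roughly self-consistent, which a back-of-the-envelope check makes visible. The sketch below multiplies read count by mean sub-read length and divides by genome size. The ~3.1 Gb human genome size is an assumption not stated on the slide, and the estimate treats every sub-read base as mapped, so it is an upper bound rather than the author's calculation.

```python
# Back-of-the-envelope coverage estimate: total bases / genome size.
# Assumption: a haploid human genome of about 3.1 Gb (not from the slide).
HUMAN_GENOME_BP = 3.1e9


def estimated_coverage(n_reads: float, mean_length_bp: float,
                       genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Return approximate fold coverage if every base in every read maps."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return n_reads * mean_length_bp / genome_bp


# Slide values: ~12M reads with a mean sub-read length of 2,766 bp.
pacbio_coverage = estimated_coverage(12_000_000, 2766)  # a little under 11X
```

The result lands just above 10X, in line with the "10X+" mapped coverage quoted above.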


Hard to Sequence Human Genes w/ Disease Association

Panel of genes we’re interested in for inherited disease screening

Genes involving TNR expansions


CACNA1A gene TNRs Associated w/ Spinocerebellar Ataxia 6

CAG ACC

Calcium channel, voltage-dependent, P/Q type, alpha 1A subunit gene

chr19:13318673-13318711

chr19:13319695-13319721

~10-15X reads span both TNRs

Arrow denotes read spanning exons to link separate TNRs

Variants associated with familial hemiplegic migraine and episodic ataxia
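The two chr19 intervals on this slide give a sense of how short the repeats are and how far a single read must reach to link them. The sketch below parses the region strings and does the arithmetic. It assumes 1-based, inclusive coordinates, which the slide does not state, and the helper names are hypothetical. It does not claim which interval carries which repeat motif.

```python
def parse_region(region: str):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.split(":")
    start_text, end_text = span.split("-")
    return chrom, int(start_text), int(end_text)


def region_length(region: str) -> int:
    """Length in bp, assuming 1-based inclusive coordinates (an assumption)."""
    _, start, end = parse_region(region)
    return end - start + 1


def spanning_length(region_a: str, region_b: str) -> int:
    """Minimum length of one read covering both regions on the same chromosome."""
    chrom_a, start_a, end_a = parse_region(region_a)
    chrom_b, start_b, end_b = parse_region(region_b)
    if chrom_a != chrom_b:
        raise ValueError("regions are on different chromosomes")
    return max(end_a, end_b) - min(start_a, start_b) + 1


first_tnr = "chr19:13318673-13318711"
second_tnr = "chr19:13319695-13319721"

first_len = region_length(first_tnr)     # 39 bp, i.e. 13 three-base units
second_len = region_length(second_tnr)   # 27 bp, i.e. 9 three-base units
min_read_to_link = spanning_length(first_tnr, second_tnr)  # 1049 bp
```

Under these assumptions a read needs to cover only about 1 kb of the reference to link both repeats, which is well within the sub-read lengths reported in this talk.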


Long Reads Suggest Under Representation of TR Spans in Reference

Reference suggests bias towards repeat length compression


Impact of BPSS & Longer ReadLength Chemistry on NA12878

CLR Means w/ & w/o Blue Pippin Size Select

Mean sub-RL w/o Pippin SS 2,766 bp

Mean sub-RL w/ Pippin SS 4,491 bp

95th Percentile w/o Pippin SS 11,630 bp

95th Percentile w/ Pippin SS 13,266 bp

~22Mb > 10kbp

~82Mb > 5kbp

Fragments < 7kb eliminated

Size Select Protocol was set to select 7kb-50kb

[Figure panels: Standard 10-20kb library; Size Selected 10-20kb library]

-Blue Pippin electrophoretic size selection improves mean subRL by 62% (2,766 bp to 4,491 bp) by removing DNA fragments < 7kb to avoid small molecule loading bias.
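The 62% figure follows directly from the mean sub-read lengths listed on this slide. A short worked check, with a hypothetical helper name, also shows that the gain at the 95th percentile is considerably smaller:

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from before to after, expressed as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100


# Values from the slide (bp), without and with Pippin size selection.
mean_gain = percent_gain(2766, 4491)    # about 62%
p95_gain = percent_gain(11630, 13266)   # about 14%
```

The contrast between the two numbers fits the explanation given on the slide: size selection mostly removes the short end of the distribution, so the mean shifts far more than the upper tail.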


Impact of BPSS & Longer ReadLength Chemistry on NA12878

[Figure: SubRL (bp) distributions for February, August, and Present; x-axis 0 to 30000 bp, y-axis 0 to 1.0]

SubRL Statistics (bp)

          February  August  Present
--------  --------  ------  -------
Min.      40        40      40
1st Qu.   974       1324    1907
Mean      2165      4019    5520
3rd Qu.   2739      5855    8211
Max.      21458     30474   34216

Currently, we generate ~1X NA12878 coverage in 8-10 SMRTcells for ~$1000
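The closing claim (~1X coverage in 8-10 SMRTcells for ~$1000) implies a per-cell yield and a per-cell cost that can be worked out directly. The sketch below assumes a ~3.1 Gb human genome, which the slide does not state, and the function names are hypothetical.

```python
# Per-cell yield and cost implied by "~1X in 8-10 SMRTcells for ~$1000".
# Assumption: a haploid human genome of about 3.1 Gb (not from the slide).
HUMAN_GENOME_BP = 3.1e9


def implied_mb_per_cell(n_cells: int, coverage: float = 1.0,
                        genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Mapped yield per SMRTcell (Mb) needed to reach the given coverage."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return coverage * genome_bp / n_cells / 1e6


def cost_per_cell(total_cost: float, n_cells: int) -> float:
    """Average cost per SMRTcell."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return total_cost / n_cells


yield_at_10_cells = implied_mb_per_cell(10)  # about 310 Mb per cell
yield_at_8_cells = implied_mb_per_cell(8)    # about 388 Mb per cell
cost_at_10_cells = cost_per_cell(1000, 10)   # about $100 per cell
cost_at_8_cells = cost_per_cell(1000, 8)     # about $125 per cell
```

The implied range of roughly 310 to 390 Mb per cell is in the same neighborhood as the ~325 Mb mapped per cell reported earlier for the size-selected MRSA library, although that figure comes from a different sample type.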


Thanks!

Sinai Team

Deena Altman

Ali Bashir

Imane Bourgui

Gintaras Deikus

Andrew Kasarskis

Alona Keren-Paz

Milind Mahajan

Eric Schadt

Anne Schaefer

Harm van Bakel

Ajay Ummat

Cornell

Roger Altman

Russell Durrett

Chris Mason

CSHL

Eric Antoniou

Richard McCombie

Patricia Mocombe

NYGC/Rockefeller

Bob Darnell

NYU

Bo Shopsin

PacBio

Jason Chin

Ellen Paxinos

PacBio Field Staff

Sage Science


Stay Connected with Us!


@IcahnInstitute