
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. All other trademarks are the sole property of their respective owners.

SMRT Sequencing: Enter a New Realm of Genome, Epigenome and Transcriptome Analyses

Zuwei Qian, Ph.D. HKU Customer Sharing Workshop Nov 27, 2017

AGENDA

- Case studies for human genome sequencing

- Structural variation and human disease

- Targeted sequencing approaches

- Target capture by pull-down enrichment

- Pseudogene sequencing

- Sequencing low-complexity regions

- Full length RNA transcript sequencing

- Epigenetic modification

- PacBio development update

SINGLE MOLECULE, REAL-TIME (SMRT) DNA SEQUENCING

SMRT SEQUENCING CHARACTERISTICS

Long Reads

- Average >10,000 bases

High Consensus Accuracy

- >99.999%

Uniform, Unbiased Coverage

- Lack of GC% or sequence complexity bias

DNA Modification Detection

- Epigenome characterization

[Figure: 20 kb PacBio read length compared with 250 bp Illumina read length]

PACBIO RS II AND SEQUEL SYSTEMS

                         PacBio RS II    Sequel
Machine Launch           2013            2015
Number of ZMWs per Cell  150,000         1,000,000
Output per Cell          0.5 – 1 Gb      ~5 – 8 Gb
Input DNA per Cell       ~250 ng         ~250 ng
N50 Read Length          15 – 20 kb      ~15 kb
Consensus Accuracy       QV50            QV50
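The QV50 entry in the comparison above matches the >99.999% consensus accuracy quoted elsewhere in the deck, under the standard Phred convention QV = -10 log10(error probability). A minimal Python sketch of that conversion follows; the function names are illustrative and are not part of any PacBio software.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to accuracy (1 - error probability)."""
    return 1.0 - 10 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # QV20 = 99%, QV30 = 99.9%, QV40 = 99.99%, QV50 = 99.999%
    for qv in (20, 30, 40, 50):
        print(f"QV{qv}: {qv_to_accuracy(qv):.5%}")
```

Each step of 10 QV units removes one order of magnitude of error, so QV50 corresponds to roughly one error per 100,000 bases.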

CONSENSUS ACCURACY VS. RAW ACCURACY

ATCCGGAGCGACGCGTACGATTAAAGCACGTACTGCGTATGCGTATCCCTAGCTTGCTAGGCTAGTATGCTAGATTAAAGCTCGTACTGCG

ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCGTATGCGTATGCCTAACTAGATAGGCTAGTTTGCTAGATTAAAGCTCGTTCTGCG

ATCCGTATCGACACGCACGACTAAAGCTCGTACTGCATATGTGTATGCCTAGCTAGCTAGGATAGCATGCTAGATTAAAGCTCGTACTG

ATCCGGATCGCCGCGTATGATTAAAGCTCGTACCGCGTATGCGTATGCCCAGGTAGCTAGGCTAGTATGCTAGATTAAAGTTCGTACTGCG

ATTCGGATCGACGCGTACGATTAAAGCTCGTACTGCGCATGCGTATGCCTAGCTAGCTAGGCTAGTATTCTAGATTAAAGCTCGTAATGCG

ATCCGGATCTACGCGTACGATTAAAGCTAGTACTGCGTATGCGTTTGCCTATGTAGCTAGTCTAGTATGCTAGATTAAAGCTCGTACTGCG

ATCCGGATCGACGTGTACGATTATAGCTCTTACTGCGTATACGTATGCCTAGGTAGCTAGGCTAGTATGCTAGATTAAAGCTCGAACTT

ATCTGGATCGACGCGTACGATCAAAGCTCGTACTGTGTATGCGTATGCCTAGCTCGCTACGCTAGTATGCTCGATTATAGCTCGTACTGCG

ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCGTATGCGTATGCCTAGGTAGCTAGGCTAGTATGCTAGATTAAAGCTCGTACTGCG

ATCCGGTTCGAAGCGTACGTTTAAAGCTCGTACTACGTATGCGTATGTCTAGCTAGCTATGCTATTATGCTAGTTTAAAGCTCGTACTGCG

ATCCCGATCGACGCGTTCGATTAAAGCTCGTCCTGCGTATGCTTATGCCTAGGCAGCTAGGCTAGTATGCTAGATTAAAGCTCTTACTG

ATCCGGATCGGCGCGTACGATTAAAGCTCGTACTGCGGATGCGTATGCCTAGCTGGCTAGGCGAGTATGCTAGATGAAAGGTCGTACTGCG

ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCGTATGCGTATGCCTAGCTAGCTAGGCTAGTATGCTAGATTAAAGCTCGTACTGCG

Long Reads with Random Error:

HIGH CONSENSUS ACCURACY, GREATER THAN 99.999%

[Figure: consensus accuracy plot with the 99.99% and 99.999% levels marked]

E. coli 20kb-insert library, resequencing analysis with SMRT Analysis v2.3

HIGH CONSENSUS ACCURACY

“SMRT sequencing exceeds the consensus accuracy achieved by other sequencing methods because of the random nature of the errors. The SMRT sequencing achieves results with >99.999% accuracy [28].”
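The point about random errors can be illustrated with a toy per-column majority vote. This is a simplified sketch, not the consensus algorithm used by SMRT Analysis: it assumes the reads are already aligned to the same length and contain substitutions only, whereas real reads also contain insertions and deletions and need a proper alignment step. All names and sequences below are made up for the example.

```python
from collections import Counter


def majority_consensus(reads):
    """Per-column majority vote over equal-length, already-aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one tuple of bases per alignment column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


def accuracy(sequence, truth):
    """Fraction of positions at which sequence agrees with truth."""
    return sum(a == b for a, b in zip(sequence, truth)) / len(truth)


if __name__ == "__main__":
    truth = "ACGTACGTAC"
    # Each toy read carries one substitution, each at a different position.
    reads = ["TCGTACGTAC", "ACCTACGTAC", "ACGTAGGTAC", "ACGTACGAAC", "ACGTACGTAT"]
    for read in reads:
        print(read, f"raw accuracy {accuracy(read, truth):.0%}")
    consensus = majority_consensus(reads)
    print(consensus, f"consensus accuracy {accuracy(consensus, truth):.0%}")
```

Because no two toy reads are wrong at the same position, every column is outvoted by the correct base. Errors that recur at the same position in many reads would not cancel this way.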

NEB ADAPTS PACBIO SEQUENCING TO CHARACTERIZE SOURCE OF ERROR IN PCR

https://www.neb.com/tools-and-resources/video-library?device=modal&videoid=C4EC7AC8-C541-4FFA-8987-7138B51D8288

US COMPANY INVITAE ADOPTS PACBIO AS VARIANT VALIDATION TOOL FOR DTC TESTING

Identify Human Mutations Related to Diseases

ORIGINS OF REDUCED ACCURACY IN CLINICAL GENOMICS FROM SHORT SEQUENCING READS

Structural Variant Detection in Understanding Human Diseases

STRUCTURAL VARIANT = DIFFERENCE ≥50 BP

deletion insertion duplication

inversion translocation

STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME

- Short reads: 4,000
- PacBio: 20,000 (repeats + GC-rich + large insertions)

Huddleston et al. (2017) Genome Research 27(5):677-85.

Seo et al. (2016) Nature 538:243-7.

Sudmant et al. (2016) Nature 526:75-81.

VARIATION BETWEEN TWO HUMAN GENOMES: variants vs. basepairs affected

                      SNVs      indels    structural variants
variants              5×10^6    4×10^5    2×10^4
basepairs affected    5 Mb      3 Mb      10 Mb

Huddleston et al. (2017) Genome Research 27(5):677-85.

Any study of genetic variation is incomplete until structural variation is detected with PacBio sequencing.

HOW MUCH TO SEQUENCE?

[Figure: % SVs detected (0% to 100%) vs. coverage (0- to 50-fold). Plot labels: Het, Hom, short read.]

- 30- to 40-fold: saturate discovery
  - de novo (spontaneous) variant discovery
  - single, high-value sample
- 5- to 10-fold: optimal tradeoff of cost vs. performance
  - disease gene discovery
  - rare disease diagnosis (research)
  - population genetics survey

- Human HG00733, Sequel System, 211 Gb (70-fold)
- Standard is full dataset
- Subsample / titrate to lower coverage
- Evaluate overlap with standard call set
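The coverage titration described above can be sketched in a few lines of Python. This is a minimal illustration, assuming reads are subsampled uniformly at random and call sets are compared by exact identifier. The function names and the identifier-based matching are assumptions made for this example, not the procedure used to produce the slide.

```python
import random


def subsample_fraction(target_fold, full_fold):
    """Fraction of reads to keep to go from full_fold to target_fold coverage."""
    if not 0 < target_fold <= full_fold:
        raise ValueError("target coverage must be in (0, full coverage]")
    return target_fold / full_fold


def subsample_reads(read_ids, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return [read_id for read_id in read_ids if rng.random() < fraction]


def fraction_recovered(subset_calls, standard_calls):
    """Share of the standard (full-dataset) call set also found in the subset."""
    standard = set(standard_calls)
    if not standard:
        raise ValueError("standard call set is empty")
    return len(standard & set(subset_calls)) / len(standard)


if __name__ == "__main__":
    # 211 Gb at 70-fold implies a genome of about 3 Gb.
    print(f"implied genome size: {211 / 70:.2f} Gb")
    for fold in (5, 10, 30):
        print(f"{fold}-fold: keep {subsample_fraction(fold, 70):.1%} of reads")
```

In practice, structural variant calls from two runs rarely share identical coordinates, so a real comparison would match calls by type and approximate position rather than by identifier.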

CLINICAL CASE HISTORY

Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.

- 7 yrs: left atrial myxoma resection, atrial repair
- 10 yrs: testicular mass, right orchiectomy
- 13 yrs: pituitary tumor
- 16 yrs: recurrence of myxomata, resection, adrenal microadenoma
- 18 yrs: recurrence of ventricular myxomata, resection, VT
- 19 yrs: ACTH-independent Cushing’s disease, thyroid nodules
- 21 yrs: transsphenoidal resection of pituitary
- present (26 yrs): recurrence of myxomata, consideration for heart transplant

- genetics suggests Carney complex
- PRKAR1A testing negative
- short-read whole genome sequencing negative


EVALUATING STRUCTURAL VARIANTS

- SEQUENCE (8-FOLD): Sequel System
- MAP READS: NGMLR
- CALL VARIANTS: PBHoney

Rescheneder, Sedlazeck, and Schatz. https://github.com/philres/ngmlr/.

English et al. (2014) BMC Bioinformatics 15:180.

                                       Deletions ≥50 bp    Insertions ≥50 bp
Initial call set                       6,971               6,821
Not in segdup                          5,893               6,254
Not in NA12878 “healthy” control       2,476               3,171
Overlaps RefSeq coding exon            39                  16
Gene linked to some disease in OMIM    3                   3
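The successive filters in the table above form a simple cascade: each step keeps only the calls that pass, and the surviving count is reported. A minimal Python sketch follows, assuming each call has already been annotated with the four yes/no properties. The `SVCall` record and its field names are hypothetical and do not correspond to PBHoney output or any other tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """Hypothetical pre-annotated structural variant call."""
    sv_id: str
    sv_type: str                 # "DEL" or "INS"
    in_segdup: bool              # lies in a segmental duplication
    in_control: bool             # also seen in the healthy control
    overlaps_coding_exon: bool   # overlaps a coding exon
    omim_disease_gene: bool      # gene is linked to a disease


# Ordered (label, predicate) pairs mirroring the rows of the table.
FILTERS = [
    ("Not in segdup", lambda call: not call.in_segdup),
    ("Not in control", lambda call: not call.in_control),
    ("Overlaps coding exon", lambda call: call.overlaps_coding_exon),
    ("Gene linked to disease", lambda call: call.omim_disease_gene),
]


def filter_cascade(calls):
    """Apply the filters in order; return per-step counts and the survivors."""
    remaining = list(calls)
    steps = [("Initial call set", len(remaining))]
    for label, keep in FILTERS:
        remaining = [call for call in remaining if keep(call)]
        steps.append((label, len(remaining)))
    return steps, remaining


if __name__ == "__main__":
    calls = [
        SVCall("a", "DEL", True, False, False, False),
        SVCall("b", "INS", False, True, False, False),
        SVCall("c", "DEL", False, False, False, False),
        SVCall("d", "DEL", False, False, True, True),
    ]
    steps, survivors = filter_cascade(calls)
    for label, count in steps:
        print(f"{label}: {count}")
```

Running the cascade separately on deletions and insertions would give the two columns of the table.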


HETEROZYGOUS 2.2 KB DELETION IN PRKAR1A


RNA-SEQ SHOWS REDUCED EXPRESSION AND NOVEL EXON-EXON JUNCTION

[Figure: normalized read counts, control (n=16) vs. case]

DELETION CLASSIFIED AS PATHOGENIC


- PVS1: null variant (nonsense, frameshift, canonical splice sites, initiation codon, single or multiexon deletion) in a gene where loss of function is a known mechanism of disease.
- PS2: de novo (both maternity and paternity confirmed) in a patient with the disease and no family history.

CLINICAL CASE HISTORY



PacBio sequencing identifies causative structural variant in PRKAR1A

COMPARISON WITH OTHER TECHNOLOGIES

Same sample (NA12878):

                           Deletions    Insertions
10-fold PacBio (Sequel)    6,798        11,252
Illumina (1000G)           1,910        1,090
30-fold 10xGenomics        3,166        -

PacBio: NGM-LR (GRCh37/hg19) & PBHoney; Illumina: 1000genomes.org; 10X: http://www.slideshare.net/GenomeInABottle/sept2016-sv-10x
COMMUNITY OBSERVATIONS

Michael Schatz “Personalized Phased Diploid Genomes of the EN-TEx Samples”, AGBT 2017

- Poor sensitivity for insertions

tandem repeat expansions and contractions

insertions

- Spurious false positive deletions at 180-250bp
- Effectively no tandem repeat expansions
- Balanced number of insertions and deletions
- Biologically plausible length distribution (peak at ALU)
- Long tandem repeat expansions / contractions

- PacBio callsets are self-consistent (7,857 SVs)
- 10X LongRanger only calls deletions and very large events (>30 kb)
- 10X callsets have little overlap (1,486 SVs)
- Illumina callsets have even less overlap (687 SVs)



                 Deletions    Insertions
20-fold ONT      28,791       3,900

ONT: https://github.com/nanopore-wgs-consortium/NA12878, 12/5/2016 release

COMMUNITY OBSERVATIONS

Oxford Nanopore SV callset

Mark Chaisson “SV Calling in the cliveome”, http://lateholocene.blogspot.com/2017/01/sv-calling-in-cliveome.html

- “The count of the deletion variants is very high, driven by the STR and Complex callsets”
- “The validation rate for Nanopore-specific SVs is 35%”

Oxford Nanopore SV callset

Wigard Kloosterman “Characterization of structural variations and chromothripsis in a nanopore sequencing of human genomes”, Nanopore Community Meeting 2016.

- 53,403 deletions




                      Deletions    Insertions
Bionano Genomics      522          769

Bionano: http://www.genetics.org/content/202/1/351

HUMAN GENOME SV SIZE DISTRIBUTION

Bionano:

- SV sensitivity: >1.5 kb
- Does not provide exact SV breakpoints
- Does not provide the sequence for insertions

[Figure: SV size distribution, with a region marked “Not accessible with Bionano”]





- PacBio has highest sensitivity and specificity
- Other technologies have poor sensitivity and/or high false positive rates

VARIANT DETECTION IN NA12878

10X Genomics: http://www.slideshare.net/GenomeInABottle/sept2016-sv-10x

Bionano: Hastie AR et. al. (2017) http://dx.doi.org/10.1101/102764

Illumina: http://1000genomes.org

Oxford Nanopore: http://github.com/nanopore-wgs-consortium/NA12878; Kloosterman W (2016).

PacBio: http://www.pacb.com/blog/identifying-structural-variants-na12878-low-fold-coverage-sequencing-pacbio-sequel-system/

[Figure: sensitivity (0% to 100%) vs. false discovery rate (0% to 100%) for PacBio (10-fold), PacBio (30-fold), Oxford Nanopore (30-fold), Bionano, Illumina, and 10X Genomics. “Best” is marked on both axes; size of dot is relative price.]
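The two axes of this comparison follow the usual definitions: sensitivity is the share of true variants that were called, and false discovery rate is the share of calls that are wrong. A minimal Python sketch with made-up counts follows; the numbers in the example are hypothetical and are not taken from any of the call sets cited above.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real variants that the caller detected: TP / (TP + FN)."""
    total_real = true_positives + false_negatives
    if total_real == 0:
        raise ValueError("no real variants to detect")
    return true_positives / total_real


def false_discovery_rate(true_positives, false_positives):
    """Fraction of reported calls that are not real: FP / (TP + FP)."""
    total_calls = true_positives + false_positives
    if total_calls == 0:
        raise ValueError("no calls were made")
    return false_positives / total_calls


if __name__ == "__main__":
    # Hypothetical call set: 80 real variants found, 20 missed, 20 spurious.
    tp, fn, fp = 80, 20, 20
    print(f"sensitivity: {sensitivity(tp, fn):.0%}")
    print(f"false discovery rate: {false_discovery_rate(tp, fp):.0%}")
```

A technology in the best corner of such a plot has sensitivity near 100% and false discovery rate near 0%.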

Targeted Long Read Sequencing by Pull-Down Enrichment

TARGET ENRICHMENT METHODS

ADVANTAGE OF LONG READ CAPTURE SEQUENCING

Fragment 6-7 Kb Fragment 200 bp

Sequencing and map

TRUE FULL-GENE ANALYSIS

Example: MUTYH

Sequence data generated on PacBio RS II and MiSeq from cell line NA12762 captured with standard NimbleGen Oncology Panel

[Figure tracks: PacBio (~5 kb fragments); MiSeq (200 bp fragments)]

RESOLUTION OF STRUCTURAL VARIATION

PDE4DIP:

[Figure tracks: PacBio (~5 kb fragments); MiSeq (200 bp fragments); Allele 1 reads; Allele 2 reads]

PacBio resolves a heterozygous ~740 bp deletion containing an entire exon, missed by Illumina

Gene/Pseudogene Discrimination

CYP2D6

SEQUENCING REGIONS OF HIGH SEQUENCE HOMOLOGY

- GENCODE project identified 11,216 unique pseudogenes to date
- Challenging for short-read NGS & Sanger sequencing
- Risk of false-positive and false-negative variant calls resulting from inaccurate mapping of short reads to highly homologous regions, including pseudogenes
- 193 medically relevant genes in “NGS Problem List” [1]
- 163 (84%) of these resolvable with third-generation sequencing [2]

[1] Mandelker et al. (2016) Genet Med 18: 1282-1289

[2] PacBio WGS on NA19240 (supplemented by NA19239, HG00512 & HG00731 for chrY since NA19240 is female)

PMS2 EXAMPLE

- The PMS2 gene is associated with autosomal dominant Lynch syndrome (also called hereditary nonpolyposis colorectal cancer syndrome, or HNPCC)

- Identifying variants in PMS2 is hampered by the presence of a pseudogene, PMS2CL, which is nearly identical in sequence to PMS2 across the final four exons of the gene (exons 12–15)

- 99.2% identical (exons) vs. 98.2% identical (gene)

- Sequence reads derived from hybridization capture and short read sequencing methods cannot be unambiguously aligned to PMS2 or PMS2CL

[Figure: PMS2 (exons 10 11 12 13 14 15) and PMS2CL (exons 1 2 3 4 5 6), 98.2% identical]
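The identity figures above explain why fragment length matters for telling PMS2 from PMS2CL. The Python sketch below computes percent identity for two aligned sequences and estimates how many distinguishing positions a fragment of a given length would contain. It assumes an ungapped alignment and differences spread evenly along the region, which is a simplification; the function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two equal-length aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def expected_differences(length_bp, identity_percent):
    """Expected number of differing bases in a fragment of length_bp."""
    return length_bp * (1.0 - identity_percent / 100.0)


if __name__ == "__main__":
    print(f"{percent_identity('ACGTACGTAC', 'ACGTACGTAT'):.1f}% identical")
    # Fragment sizes taken from elsewhere in the deck: 200 bp and ~17 kb.
    for length in (200, 17000):
        diffs = expected_differences(length, 98.2)
        print(f"{length} bp at 98.2% identity: ~{diffs:.1f} differing bases")
```

Under these assumptions a 200 bp fragment carries only a handful of distinguishing bases, while a 17 kb amplicon carries a few hundred, which is why long reads can be assigned to the gene or the pseudogene far more reliably.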

TARGETING APPROACH – LONG RANGE PCR

- Design primers so that only PMS2 will amplify
- This generates a ~17 kb amplicon that can be turned into a SMRTbell and sequenced on PacBio

[Figure: PMS2 (exons 10 11 12 13 14 15) and PMS2CL (exons 1 2 3 4 5 6), with the 17 kb amplicon from PMS2]

PMS2 RESULTS

- After making a library and sequencing, data is run through Long Amplicon Analysis

- Detection of all variants (exonic & intronic)

- Results in fully phased haplotypes

[Figure: wt and mut haplotypes; scale bar 5 kb]

Sequencing the unsequenceable: PCR-free enrichment of repeat expansion genes

DISEASES THAT ARE CAUSED BY UNSTABLE REPEATS

Nature Reviews Genetics, September 2016, Euan A. Ashley, “Towards precision medicine”

Low complexity, high GC regions are intractable to short read sequencing

Short tandem repeats cannot be PCR amplified faithfully

UNIFORMITY: LACK OF CONTEXT BIAS MEANS FEWER INFORMATION GAPS

RPGR

ClinVar pathogenic poly-T/rs10524523

CCTTCTCCTTCCTCCTCTTCTCCCTCCCCTTCTCCTTCCTCTTCTCCCTC

CCCTTCTCCTTCCTCCTCTTCCCCCTCCCCTTCTCCTTCCTCCCCTTCTT

CCTCCCCTTCTCCTTCTTCCCCTTCTTCCTCCCCTTTCCCTTCTCCTTCC

TCCTCTTCCCCCTCCCCTTCCTCCTCTTCCCCCTCCCCTTCCTCCTCTTC

CCCCTCACCCTCCTCCTCTTCCTCTTCCCTCTCTCCTTTCCCCTCCTCTA

CTTCCCCTCCCTCTACTTCCCCTCCCTCCTCTTTTTCCTCCCCTCTCCCC

TCTGTTTCCTCCTCTTCCCCCTCTCCTTGGTCTCCTTCTTCCTCTCCTTT

CTCCTCCTTCCCCGCTCTTTCCTCCTTTTTCCTCTCTCCTTCCTCCTTTT

CACGTTCTCCCTCCACTTCTTCCCCTTCTCCTTCCTCTTTCCCTTCTCCC

TCCTTCTCTTCTTCCTCTTCTCTGTCTCCCTCCTCTTCTTCTCCTTCTCC

ATGCTCCTCCTCCCCTCCCTCCTCCATCTCTTGGTTTCTTTCCTTCTGAT

PACBIO

ILLUMINA

TARGET ENRICHMENT VIA CAS9 DIGESTION

ABLE TO DETECT ALLELE-SPECIFIC NUMBER OF REPEATS, INTERRUPTIONS AND DIFFERENT REPEATS

GC-RICH REGIONS

Example: FMR1 (CGG repeat, fragile X syndrome)

Loomis et al. (2013) Genome Research 23: 121-128

100% GC; ~2.5kb

AGG “INTERRUPTIONS” REDUCE THE CHANCES OF PRE- TO FULL-MUTATION TRANSMISSION

APPLICATION TO THE GENETICS OF PARKINSON’S

PACIFIC BIOSCIENCES® CONFIDENTIAL

PARKINSON’S DISEASE ASSOCIATED WITH PURE ATXN10

REPEAT EXPANSION

- SCA10 is caused by an ATTCT repeat expansion located within the 66.4 kb intron 9 of the ATXN10 gene
- Normal population: allele size ranges from 10-32 ATTCT repeats
- SCA10 disease phenotype: allele size ranges from 800-4,500 ATTCT repeats
- Birgitt Schüle et al. found that SCA10 and Parkinson’s disease segregated in the same family
- Study goal: genetically characterize the ATXN10 repeat expansion and better understand the phenotypic differences between progressive cerebellar ataxia with seizures and parkinsonism

Schüle B. et al (2017) Parkinson's disease associated with pure ATXN10 repeat expansion. NPJ Parkinsons Dis;3:27

Pedigree of ATXN10 family

https://www.nature.com/articles/s41531-017-0029-x.pdf


- The genetic composition of the complete repeat expansion revealed a novel phenotype-genotype correlation for Parkinson’s disease and SCA10
- Family members with ataxia: 480 ATTCT repeats followed by 920 ATTCC repeat interruptions
- Family member with clinically defined Parkinson’s disease: >1,300 ATTCT repeats but no ATTCC repeat interruptions
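The repeat compositions listed above (runs of ATTCT followed by ATTCC interruptions) can be read off a sequence that spans the whole expansion. The Python sketch below splits a repeat tract into fixed-length units and run-length encodes them. It assumes the tract starts in frame and contains no insertions or deletions, which real reads would violate without prior error correction. The function name and the toy sequences are made up for illustration.

```python
from itertools import groupby


def repeat_unit_runs(sequence, unit_length=5):
    """Split a repeat tract into fixed-length units and run-length encode them.

    Trailing bases that do not fill a whole unit are ignored.
    """
    usable = len(sequence) - len(sequence) % unit_length
    units = [sequence[i:i + unit_length] for i in range(0, usable, unit_length)]
    # groupby merges consecutive identical units into (unit, count) pairs.
    return [(unit, sum(1 for _ in run)) for unit, run in groupby(units)]


if __name__ == "__main__":
    # Toy allele: a short pure ATTCT run followed by ATTCC interruptions.
    toy_allele = "ATTCT" * 4 + "ATTCC" * 3
    for unit, count in repeat_unit_runs(toy_allele):
        print(f"{unit} x {count}")
```

The same function applies to trinucleotide repeats by setting `unit_length=3`, for example to locate AGG interruptions within a CGG tract.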

CRISPR/CAS9 AND SMRT SEQUENCING YIELDS NEW PHENOTYPE ASSOCIATION FOR SCA10 REPEAT EXPANSION DISORDER

Schüle B. et al (2017) Parkinson's disease associated with pure ATXN10 repeat expansion. NPJ Parkinsons Dis;3:27

[Figure: pedigree with family members labeled unaffected, PD, and ataxia]

https://www.nature.com/articles/s41531-017-0029-x.pdf

Full-Length Transcripts, No Assembly Required with Iso-Seq™ Sequencing

DETERMINATION OF TRANSCRIPT ISOFORMS

[Figure: a gene and its mRNA isoforms]

Short-read technologies: reads spanning splice junctions

- Insufficient Connectivity
- Splice Isoform Uncertainty

PacBio’s Iso-Seq solution: Full-length cDNA Sequence Reads

- Splice Isoform Certainty – No Assembly Required
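When every read covers a transcript end to end, isoforms can be told apart by the ordered chain of splice junctions each read contains, with no assembly step. The Python sketch below groups reads by that chain. It is a simplified illustration and not the Iso-Seq analysis pipeline; the input format (a mapping from read name to sorted exon coordinates) and all names are assumptions for this example.

```python
from collections import Counter


def junction_chain(exons):
    """Return the ordered (donor, acceptor) pairs between consecutive exons.

    `exons` is a list of (start, end) tuples sorted by genomic position.
    Transcript start and end coordinates are deliberately ignored, so reads
    that differ only at their outer ends still share a chain.
    """
    return tuple(
        (exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)
    )


def count_isoforms(reads):
    """Count full-length reads per distinct splice-junction chain."""
    return Counter(junction_chain(exons) for exons in reads.values())


if __name__ == "__main__":
    reads = {
        "read1": [(100, 200), (300, 400), (500, 600)],
        "read2": [(105, 200), (300, 400), (500, 590)],  # same junctions as read1
        "read3": [(100, 200), (500, 600)],              # skips the middle exon
    }
    for chain, n_reads in count_isoforms(reads).items():
        print(f"{n_reads} read(s): junctions {chain}")
```

Short reads each see at most a junction or two, so the chain as a whole has to be inferred; a full-length read observes it directly.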

PACBIO LONG READS COVER NEARLY ALL TRANSCRIPTS

Average PacBio Read: 12,000 bases

[Figure: transcripts of IL2, CD4, CYP2D6, CFTR, and BRCA1 shown against the average PacBio read]

SMRT SEQUENCING OF ANDROGEN RECEPTOR VARIANTS

Full-length RNA-seq (Iso-Seq) data revealed that the previously reported AR-V9 structure is incorrect

Kohli et al. (2017) Clinical Cancer Research, in press.

WITH THIS NEW INFORMATION, CAN WE RESOLVE WHETHER AR-V7 OR AR-V9 IS THE BETTER BIOMARKER?

Kohli et al. (2017) Clinical Cancer Research, accepted for publication.

- AR-V9 expression in pre-treatment biopsies correlates with AR-V7 in a cohort of 46 responders and 32 non-responders
- Univariate Cox regression analysis for 12-week composite PFS
- An AR-V9 level in the highest quartile is the best predictor of primary resistance to therapy

TARGETED ISO-SEQ ANALYSIS OF FMR1 GENE

Tseng et al., Altered expression of the FMR1 splicing variants landscape in premutation carriers, to appear in BBA – Gene Regulatory Mechanisms (2017)

- Follow up on 2014 study (Pretto et al.)
  - Previous study identified 16 isoforms
- Matching premutation carriers vs controls
- Identified 49 unique isoforms
  - 30 present in premutation group only
- New splicing patterns found
  - Group “E” skips exons 11-14
  - Validated by qRT-PCR & Sanger
- New exon found
  - A 140-bp exon between exon 9 and 10
  - Validated by qRT-PCR & Sanger
  - Possibly results in truncated ORF

[Figure: Iso4, Iso4b (group B); new exon “9.5”; skipping exon 3]

NOVEL COLORECTAL CANCER BIOMARKERS

SQANTI (STRUCTURAL AND QUALITY ANNOTATION OF NOVEL TRANSCRIPT ISOFORMS): ISOFORM IDENTIFICATION AND QUANTIFICATION ANALYSIS SOFTWARE

Is long read sequencing improving transcriptome quantification?

“...non-annotated variability at 3’ ends of expressed transcripts might be confounding transcriptome quantification by short-reads alone. Our results indicate that using a full-length, experiment specific transcriptome as a reference solves this problem and improves accuracy of quantification estimates.”

UNRELIABLE QUANTIFICATION BY SHORT READ NGS DUE TO 3’ END VARIABILITY OF EXPRESSED TRANSCRIPT

Epigenetic base modification

METHYLATION DETECTION

Flusberg et al. (2010) Nature Methods 7: 461-465

Detectable by other Sequencing Methods

SIGNATURES OF DIFFERENT DNA BASE MODIFICATIONS


Prokaryotic Eukaryotic DNA Damage

m6A FOUND IN MAMMALS, SHOWN IN REGULATORY ROLES

EPIGENOME CHARACTERIZATION

- Methylation status of CpG islands (https://github.com/hacone/AgIn)

- Chr4: ANKRD17 (breast cancer)

Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)


DIRECT METHYLATION DETECTION OF FMR1 PREMUTATION SAMPLE

- CGG repeat region appears to be heavily methylated (5mC)

2017 SEQUEL PERFORMANCE IMPROVEMENTS

2.1 Sequencing Chemistry

SMRT Link 5.0

Library Prep Improvements

Sequencing Yield Improvements

Analysis Improvements (SMRT Link 5.1)

ACCELERATED LIBRARY PREP

- Significantly faster (3 hours vs days)
- Less starting material (2.5x less DNA input)
- More efficient (~5x increase in size selected library yield)
- Less damage to DNA (improved sequencing performance)
- Amenable to automation

Release by end of the year

New prep for >30 kb library (~3 hours):

- ExoVII – 15 min
- DDR – 30 min
- ER/A-tail – 35 min
- Ligation – 30 min
- Proteinase K – 15 min
- AMPure

* Size Selection still recommended for genomic libraries

- Single tube
- Additive workflow
- ~3 hours


SEQUEL ROADMAP

- 2017: ~2-fold increase in throughput per Sequel SMRT Cell
- 2018: ~2-fold increase in throughput per Sequel SMRT Cell
- End of 2018: ~8-fold increase in throughput per Sequel SMRT Cell

Combined (2 x 2 x 8): ~30-fold increase

~150 Gb / Sequel SMRT Cell (~50x human genome coverage for de novo assembly)
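The roadmap figures can be checked with simple arithmetic. The sketch below multiplies the three stated throughput increases and converts the projected yield into fold coverage. The 3.0 Gb genome size and the ~5 Gb starting yield per cell are assumptions for this example: the first is consistent with the 211 Gb = 70-fold dataset mentioned earlier, the second with the ~5-8 Gb per cell quoted for Sequel. Function names are illustrative.

```python
def compound_fold(*fold_increases):
    """Multiply successive fold increases into one overall factor."""
    total = 1.0
    for fold in fold_increases:
        total *= fold
    return total


def fold_coverage(yield_gb, genome_size_gb):
    """Average coverage obtained from yield_gb of sequence."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_size_gb


if __name__ == "__main__":
    overall = compound_fold(2, 2, 8)   # 32, which the slide rounds to ~30
    projected_gb = 5 * 30              # assumed ~5 Gb baseline x ~30-fold
    print(f"overall increase: {overall:.0f}-fold")
    print(f"projected yield: {projected_gb} Gb per SMRT Cell")
    print(f"human coverage: {fold_coverage(projected_gb, 3.0):.0f}x")
```

Under these assumptions the ~150 Gb and ~50x figures on the slide are mutually consistent.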

2017 PACBIO DISTRIBUTOR MEETING | PACIFIC BIOSCIENCES® CONFIDENTIAL

2017 GOAL: SEQUENCING YIELD IMPROVEMENT

2017 - Double Gb per SMRT cell (> 2-fold increase)

- New process SMRT Cell 1M
- Sequel Sequencing Kits 2.1
- SMRTbell Express Template Prep Kit
- Upcoming releases

RECENT SMRT CELL 1M RUNS IN PACBIO R&D

- ~30 kb insert shear library:

- Yield: 12 Gb

- Average RL: 25 kb

- Longest read: 160 kb

- 5 kb amplicon library:

- Yield: 16.5 Gb

- Average RL: 33.5 kb


- Pooled mixed amplicon library:

- Yield: 22 Gb

- Average RL: 37 kb


http://www.pacb.com/publications/

[Figure: number of publications per year, 2003 to 2016 (axis 0 to 1000): 1, 0, 0, 0, 0, 4, 1, 7, 5, 40, 103, 296, 591, ~1020]

>2000 publications

~3-4 new papers per day

http://www.pacb.com/smrt-science/smrt-resources/scientific-publications/

For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio,

SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx.

All other trademarks are the sole property of their respective owners.

www.pacb.com
