SMRT Sequencing: Enter a New Realm of Genome, Epigenome...
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo,
PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. All other trademarks are the sole property of their respective owners
SMRT Sequencing: Enter a New Realm of Genome, Epigenome and Transcriptome Analyses
Zuwei Qian, Ph.D. HKU Customer Sharing Workshop Nov 27, 2017
AGENDA
- Case studies for human genome sequencing
- Structural variation and human disease
- Targeted sequencing approaches
- Target capture by pull-down enrichment
- Pseudogene sequencing
- Sequencing low-complexity regions
- Full length RNA transcript sequencing
- Epigenetic modification
- PacBio development update
SINGLE MOLECULE, REAL-TIME (SMRT) DNA SEQUENCING
SMRT SEQUENCING CHARACTERISTICS
Long Reads
- Average >10,000 bases
High Consensus Accuracy
- >99.999%
Uniform, Unbiased Coverage
- Lack of GC% or sequence complexity bias
DNA Modification Detection
- Epigenome characterization
20 kb PacBio read length
250 bp Illumina read length
PACBIO RS II AND SEQUEL SYSTEMS
                          PacBio RS II   Sequel
Machine Launch            2013           2015
Number of ZMWs per Cell   150,000        1,000,000
Output per Cell           0.5-1 Gb       ~5-8 Gb
Input DNA per Cell        ~250 ng        ~250 ng
N50 Read Length           15-20 kb       ~15 kb
Consensus Accuracy        QV50           QV50
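The QV50 consensus accuracy listed for both systems lines up with the >99.999% figure quoted on the characteristics slide, if QV is read as a Phred-scaled value (Q = -10 * log10 of the error probability). A minimal sketch of that conversion, with function names chosen here for illustration rather than taken from the slides:

```python
import math

def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to fractional accuracy."""
    error_probability = 10 ** (-qv / 10)
    return 1.0 - error_probability

def accuracy_to_qv(accuracy: float) -> float:
    """Convert fractional accuracy back to a Phred-scaled quality value."""
    return -10 * math.log10(1.0 - accuracy)

# QV50 is one error in 100,000 bases; QV40 is one error in 10,000 bases.
print(f"QV50 -> {qv_to_accuracy(50):.5%} accuracy")
print(f"99.99% accuracy -> QV{accuracy_to_qv(0.9999):.0f}")
```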
CONSENSUS ACCURACY VS. RAW ACCURACY
ATCCGGAGCGACGCGTACGATTAAAGCACGTACTGCGTATGCGTATCCCTAGCTTGCTAGGCTAGTATGCTAGATTAAAGCTCGTACTGCG
ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCGTATGCGTATGCCTAACTAGATAGGCTAGTTTGCTAGATTAAAGCTCGTTCTGCG
ATCCGTATCGACACGCACGACTAAAGCTCGTACTGCATATGTGTATGCCTAGCTAGCTAGGATAGCATGCTAGATTAAAGCTCGTACTG
ATCCGGATCGCCGCGTATGATTAAAGCTCGTACCGCGTATGCGTATGCCCAGGTAGCTAGGCTAGTATGCTAGATTAAAGTTCGTACTGCG
ATTCGGATCGACGCGTACGATTAAAGCTCGTACTGCGCATGCGTATGCCTAGCTAGCTAGGCTAGTATTCTAGATTAAAGCTCGTAATGCG
ATCCGGATCTACGCGTACGATTAAAGCTAGTACTGCGTATGCGTTTGCCTATGTAGCTAGTCTAGTATGCTAGATTAAAGCTCGTACTGCG
ATCCGGATCGACGTGTACGATTATAGCTCTTACTGCGTATACGTATGCCTAGGTAGCTAGGCTAGTATGCTAGATTAAAGCTCGAACTT
ATCTGGATCGACGCGTACGATCAAAGCTCGTACTGTGTATGCGTATGCCTAGCTCGCTACGCTAGTATGCTCGATTATAGCTCGTACTGCG
ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCGTATGCGTATGCCTAGGTAGCTAGGCTAGTATGCTAGATTAAAGCTCGTACTGCG
ATCCGGTTCGAAGCGTACGTTTAAAGCTCGTACTACGTATGCGTATGTCTAGCTAGCTATGCTATTATGCTAGTTTAAAGCTCGTACTGCG
ATCCCGATCGACGCGTTCGATTAAAGCTCGTCCTGCGTATGCTTATGCCTAGGCAGCTAGGCTAGTATGCTAGATTAAAGCTCTTACTG
ATCCGGATCGGCGCGTACGATTAAAGCTCGTACTGCGGATGCGTATGCCTAGCTGGCTAGGCGAGTATGCTAGATGAAAGGTCGTACTGCG
ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCGTATGCGTATGCCTAGCTAGCTAGGCTAGTATGCTAGATTAAAGCTCGTACTGCG
Long Reads with Random Error:
HIGH CONSENSUS ACCURACY, GREATER THAN 99.999%
99.99%
99.999%
E. coli 20kb-insert library, resequencing analysis with SMRT Analysis v2.3
HIGH CONSENSUS ACCURACY
“SMRT sequencing exceeds the consensus accuracy achieved by
other sequencing methods because of the random nature of the errors.
The SMRT sequencing achieves results with >99.999% accuracy [28].”
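The argument that random errors wash out in the consensus can be illustrated with a toy majority vote. This is only a sketch under simplifying assumptions: reads are already aligned, equal in length, and carry substitutions only, whereas real consensus callers also handle insertions and deletions. The helper names are hypothetical, and the errors are placed deterministically at a different position in each read as a stand-in for uncorrelated random errors.

```python
from collections import Counter

# Arbitrary substitution used to inject an error at a chosen position.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}

def make_noisy_reads(truth: str, n_reads: int) -> list[str]:
    """Return n_reads copies of truth, each with errors at different positions.

    Read k is mutated at every position i where i % n_reads == k, so no two
    reads share an error position.
    """
    reads = []
    for k in range(n_reads):
        bases = [
            SUBSTITUTE[base] if i % n_reads == k else base
            for i, base in enumerate(truth)
        ]
        reads.append("".join(bases))
    return reads

def consensus(reads: list[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def accuracy(read: str, truth: str) -> float:
    """Fraction of positions at which read matches truth."""
    return sum(a == b for a, b in zip(read, truth)) / len(truth)

truth = "ATCCGGATCGACGCGTACGATTAAAGCTCGTACTGCG"
reads = make_noisy_reads(truth, n_reads=7)
print("worst single-read accuracy:", min(accuracy(r, truth) for r in reads))
print("consensus accuracy:", accuracy(consensus(reads), truth))
```

Because each column has at most one erroneous base out of seven, the vote recovers the original sequence even though every individual read contains errors. Systematic errors, which recur at the same position across reads, would not cancel this way.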
NEB ADAPTS PACBIO SEQUENCING TO CHARACTERIZE SOURCE OF
ERROR IN PCR
https://www.neb.com/tools-and-resources/video-library?device=modal&videoid=%7BC4EC7AC8-C541-4FFA-8987-7138B51D8288%7D
US COMPANY INVITAE ADOPTS PACBIO AS VARIANT
VALIDATION TOOL FOR DTC TESTING
Identify Human Mutations Related to Diseases
ORIGINS OF REDUCED ACCURACY IN CLINICAL
GENOMICS FROM SHORT SEQUENCING READS
Structural Variant Detection in Understanding Human Diseases
STRUCTURAL VARIANT = DIFFERENCE ≥50 BP
deletion insertion duplication
inversion translocation
STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME
[Figure: structural variants detected per genome. Short reads: 4,000; PacBio: 20,000 (repeats + GC-rich + large insertions)]
Huddleston et al. (2017) Genome Research 27(5):677-85.
Seo et al. (2016) Nature 538:243-7.
Sudmant et al. (2016) Nature 526:75-81.
VARIATION BETWEEN TWO HUMAN GENOMES: VARIANTS vs. BASEPAIRS AFFECTED
                      SNVs     indels   structural variants
variants              5×10^6   4×10^5   2×10^4
basepairs affected    5 Mb     3 Mb     10 Mb
Huddleston et al. (2017) Genome Research 27(5):677-85.
Any study of genetic variation is
incomplete until structural variation is
detected with PacBio sequencing.
[Figure: % SVs detected (0-100%) vs. coverage (0- to 50-fold), plotted for heterozygous (Het) and homozygous (Hom) variants]
HOW MUCH TO SEQUENCE?
Figure label: short read
- 30- to 40-fold: saturate discovery
  - de novo (spontaneous) variant discovery
  - single, high-value sample
- 5- to 10-fold: optimal tradeoff of cost vs. performance
  - disease gene discovery
  - rare disease diagnosis (research)
  - population genetics survey
Human HG00733, Sequel System, 211 Gb (70-fold)
- Standard is full dataset
- Subsample / titrate to lower coverage
- Evaluate overlap with standard call set
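The coverage titration can be sanity-checked with simple arithmetic. The sketch below assumes a haploid human genome of about 3.0 Gb, which is not stated on the slide but is consistent with 211 Gb being reported as 70-fold. The function names are illustrative only.

```python
HUMAN_GENOME_BP = 3.0e9  # assumption: ~3 Gb haploid human genome

def fold_coverage(total_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Average fold coverage given total sequenced bases."""
    return total_bases / genome_size

def subsample_fraction(target_fold: float, full_fold: float) -> float:
    """Fraction of reads to keep so the subsample reaches target_fold."""
    if not 0 < target_fold <= full_fold:
        raise ValueError("target coverage must be positive and no higher than full coverage")
    return target_fold / full_fold

full = fold_coverage(211e9)
print(f"full dataset: {full:.1f}-fold")
for target in (5, 10, 30, 40):
    print(f"{target}-fold: keep {subsample_fraction(target, full):.1%} of reads")
```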
CLINICAL CASE HISTORY
Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.
7 yrs: left atrial myxoma resection, atrial repair
10 yrs: testicular mass, right orchiectomy
13 yrs: pituitary tumor
16 yrs: recurrence of myxomata, resection, adrenal microadenoma
18 yrs: recurrence of ventricular myxomata, resection, VT
19 yrs: ACTH-independent Cushing’s disease, thyroid nodules
21 yrs: transsphenoidal resection of pituitary
present (26 yrs): recurrence of myxomata, consideration for heart transplant
- genetics suggests Carney complex
- PRKAR1A testing negative
- short-read whole genome sequencing negative
EVALUATING STRUCTURAL VARIANTS
Workflow: SEQUENCE (8-fold, Sequel System) -> MAP READS (NGMLR) -> CALL VARIANTS (PBHoney)
                                       Deletions ≥50 bp   Insertions ≥50 bp
Initial call set                       6,971              6,821
Not in segdup                          5,893              6,254
Not in NA12878 “healthy” control       2,476              3,171
Overlaps RefSeq coding exon            39                 16
Gene linked to some disease in OMIM    3                  3
Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.
Rescheneder, Sedlazeck, and Schatz. https://github.com/philres/ngmlr/.
English et al. (2014) BMC Bioinformatics 15:180.
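The filtering funnel on this slide applies each criterion to the survivors of the previous one. A minimal sketch of that cumulative filtering follows. The `SVCall` record, its field names, and the toy calls are all hypothetical; they are not the output format of NGMLR or PBHoney, and the annotations (segdup, control, exon, OMIM) are assumed to have been attached beforehand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SVCall:
    """Hypothetical, pre-annotated structural variant call."""
    sv_type: str                 # "DEL" or "INS"
    length: int                  # size in bp
    in_segdup: bool              # falls in a segmental duplication
    in_control: bool             # also seen in the NA12878 control
    overlaps_coding_exon: bool   # overlaps a RefSeq coding exon
    omim_disease_gene: bool      # gene linked to a disease in OMIM

# Filters in the order shown on the slide; each keeps calls that pass.
FILTERS = [
    ("Initial call set (>=50 bp)", lambda c: c.length >= 50),
    ("Not in segdup", lambda c: not c.in_segdup),
    ("Not in control", lambda c: not c.in_control),
    ("Overlaps coding exon", lambda c: c.overlaps_coding_exon),
    ("Gene linked to disease in OMIM", lambda c: c.omim_disease_gene),
]

def filter_funnel(calls: list[SVCall]) -> list[tuple[str, int]]:
    """Apply the filters cumulatively; return (label, surviving count) per step."""
    remaining = list(calls)
    counts = []
    for label, keep in FILTERS:
        remaining = [c for c in remaining if keep(c)]
        counts.append((label, len(remaining)))
    return counts

# Toy data for illustration only; not taken from the study.
toy_calls = [
    SVCall("DEL", 2200, False, False, True, True),
    SVCall("DEL", 300, True, False, False, False),
    SVCall("INS", 120, False, True, False, False),
    SVCall("INS", 75, False, False, False, False),
    SVCall("DEL", 30, False, False, False, False),
]
for label, count in filter_funnel(toy_calls):
    print(f"{label}: {count}")
```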
HETEROZYGOUS 2.2 KB DELETION IN PRKAR1A
Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.
RNA-SEQ SHOWS REDUCED EXPRESSION AND
NOVEL EXON-EXON JUNCTION
Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.
control (n=16)
case
normalized read counts
DELETION CLASSIFIED AS PATHOGENIC
Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.
PVS1 null variant (nonsense, frameshift, canonical splice sites, initiation codon,
single or multiexon deletion) in a gene where loss of function is a known
mechanism of disease.
PS2 de novo (both maternity and paternity confirmed) in a patient with the disease
and no family history
CLINICAL CASE HISTORY
Merker et al. (2017) Genetics in Medicine. doi:10.1038/gim.2017.86.
PacBio sequencing identifies causative
structural variant in PRKAR1A
COMPARISON WITH OTHER TECHNOLOGIES
Same sample (NA12878):
                          Deletions   Insertions
10-fold PacBio (Sequel)   6,798       11,252
Illumina (1000G)          1,910       1,090
30-fold 10xGenomics       3,166       -
Sources: PacBio: NGM-LR (GRCh37/hg19) & PBHoney; Illumina: 1000genomes.org; 10X: http://www.slideshare.net/GenomeInABottle/sept2016-sv-10x
COMMUNITY OBSERVATIONS
Michael Schatz “Personalized Phased Diploid Genomes of the EN-TEx Samples”, AGBT 2017
- Poor sensitivity for insertions
- Poor sensitivity for tandem repeat expansions and contractions

- Poor sensitivity for insertions
- Spurious false positive deletions at 180-250 bp
- Effectively no tandem repeat expansions

- Balanced number of insertions and deletions
- Biologically plausible length distribution (peak at ALU)
- Long tandem repeat expansions / contractions

- PacBio callsets are self-consistent (7,857 SVs)
- 10X LongRanger only calls deletions and very large events (>30 kb)
- 10X callsets have little overlap (1,486 SVs)
- Illumina callsets have even less overlap (687 SVs)
COMMUNITY OBSERVATIONS
Mark Chaisson “SV Calling in the cliveome”, http://lateholocene.blogspot.com/2017/01/sv-calling-in-cliveome.html
Oxford Nanopore SV callset:
- “The count of the deletion variants is very high, driven by the STR and Complex callsets”
- “The validation rate for Nanopore-specific SVs is 35%”
Wigard Kloosterman “Characterization of structural variations and chromothripsis in a nanopore sequencing of human genomes”, Nanopore Community Meeting 2016.
Oxford Nanopore SV callset:
- 53,403 deletions
HUMAN GENOME SV SIZE DISTRIBUTION
Bionano:
- SV sensitivity: >1.5 kb
- Does not provide exact SV breakpoints
- Does not provide the sequence for insertions
Figure annotation: Not accessible with Bionano
COMPARISON WITH OTHER TECHNOLOGIES
Same sample (NA12878):
                          Deletions   Insertions
10-fold PacBio (Sequel)   6,798       11,252
Illumina (1000G)          1,910       1,090
30-fold 10xGenomics       3,166       -
20-fold ONT               28,791      3,900
Bionano Genomics          522         769
Sources: PacBio: NGM-LR (GRCh37/hg19) & PBHoney; Illumina: 1000genomes.org; 10X: http://www.slideshare.net/GenomeInABottle/sept2016-sv-10x; ONT: https://github.com/nanopore-wgs-consortium/NA12878, 12/5/2016 release; Bionano: http://www.genetics.org/content/202/1/351
PacBio has highest sensitivity and specificity
Other technologies have poor sensitivity and/or high false positive rates
VARIANT DETECTION IN NA12878
10X Genomics: http://www.slideshare.net/GenomeInABottle/sept2016-sv-10x
Bionano: Hastie AR et al. (2017) http://dx.doi.org/10.1101/102764
Illumina: http://1000genomes.org
Oxford Nanopore: http://github.com/nanopore-wgs-consortium/NA12878; Kloosterman W (2016).
PacBio: http://www.pacb.com/blog/identifying-structural-variants-na12878-low-fold-coverage-sequencing-pacbio-sequel-system/
[Figure: sensitivity (0-100%) vs. false discovery rate (0-100%) for PacBio (10-fold), PacBio (30-fold), Oxford Nanopore (30-fold), Bionano, Illumina, and 10X Genomics. The "Best" direction is marked on both axes; size of dot is relative price.]
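The two axes of this comparison can be computed from a call set and a truth set. The sketch below assumes calls have already been matched to truth variants and reduced to shared identifiers; real SV benchmarking needs tolerant breakpoint and size matching, which is not shown. All names and the toy sets are illustrative.

```python
def sensitivity(calls: set[str], truth: set[str]) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    true_positives = len(calls & truth)
    false_negatives = len(truth - calls)
    return true_positives / (true_positives + false_negatives)

def false_discovery_rate(calls: set[str], truth: set[str]) -> float:
    """Fraction of calls that are not true variants: FP / (TP + FP)."""
    true_positives = len(calls & truth)
    false_positives = len(calls - truth)
    return false_positives / (true_positives + false_positives)

# Toy identifiers for illustration only.
truth_set = {"sv1", "sv2", "sv3", "sv4", "sv5"}
call_set = {"sv1", "sv2", "sv3", "sv9"}

print(f"sensitivity: {sensitivity(call_set, truth_set):.0%}")
print(f"false discovery rate: {false_discovery_rate(call_set, truth_set):.0%}")
```

The best corner of the plot is high sensitivity combined with low false discovery rate.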
Targeted Long Read Sequencing by Pull-Down Enrichment
TARGET ENRICHMENT METHODS
ADVANTAGE OF LONG READ CAPTURE SEQUENCING
Fragment 6-7 Kb Fragment 200 bp
Sequencing and map
TRUE FULL-GENE ANALYSIS
Example: MUTYH
Sequence data generated on PacBio RS II and MiSeq from cell line NA12762 captured with standard NimbleGen Oncology Panel
PacBio (~5 kb fragments)
MiSeq (200 bp fragments)
RESOLUTION OF STRUCTURAL VARIATION
PDE4DIP:
PacBio (~5 kb fragments)
MiSeq (200 bp fragments)
[Figure labels: Allele 1 reads; Allele 2 reads]
PacBio resolves a heterozygous ~740 bp deletion containing an entire exon, missed by Illumina
Gene/Pseudogene Discrimination
CYP2D6
SEQUENCING REGIONS OF HIGH SEQUENCE HOMOLOGY
- GENCODE project identified 11,216 unique pseudogenes to date
- Challenging for short-read NGS & Sanger sequencing
- Risk of false-positive and false-negative variant calls resulting from inaccurate
mapping of short reads to highly homologous regions, including pseudogenes
Mandelker et al. (2016) Genet Med 18: 1282-1289
SEQUENCING REGIONS OF HIGH SEQUENCE HOMOLOGY
193 medically relevant genes in “NGS Problem List” [1]
163 (84%) of these resolvable with third-generation sequencing [2]
[1] Mandelker et al. (2016) Genet Med 18: 1282-1289
[2] PacBio WGS on NA19240 (supplemented by NA19239, HG00512 & HG00731 for chrY since NA19240 is female)
PMS2 EXAMPLE
- The PMS2 gene is associated with autosomal dominant Lynch syndrome (also called hereditary nonpolyposis colorectal cancer syndrome, or HNPCC)
- Identifying variants in PMS2 is hampered by the presence of a pseudogene, PMS2CL, which is nearly identical in sequence to PMS2 across the final four exons of the gene (exons 12-15)
- 99.2% identical (exons) vs. 98.2% identical (gene)
- Sequence reads derived from hybridization capture and short read sequencing methods cannot be unambiguously aligned to PMS2 or PMS2CL
[Figure: PMS2 exons 10-15 aligned against PMS2CL exons 1-6; 98.2% identical]
TARGETING APPROACH – LONG RANGE PCR
- Design primers so that only PMS2 will amplify
- This generates a ~17 kb amplicon that can be turned into a SMRTbell and sequenced on PacBio
[Figure: PMS2 exons 10-15 vs. PMS2CL exons 1-6; 17 kb amplicon from PMS2]
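A rough back-of-the-envelope calculation shows why read length matters here. Taking the identity figures quoted for PMS2 vs. PMS2CL and the 250 bp Illumina read length quoted earlier in the deck, the expected number of distinguishing positions per read can be estimated as length × (1 − identity). This assumes differences are spread evenly, which real sequence is not, so it is an average and not a guarantee for any given read.

```python
def expected_differences(length_bp: int, identity: float) -> float:
    """Average number of positions that differ between gene and pseudogene."""
    return length_bp * (1.0 - identity)

cases = [
    ("250 bp short read, exons (99.2% identical)", 250, 0.992),
    ("250 bp short read, gene (98.2% identical)", 250, 0.982),
    ("17 kb amplicon, gene (98.2% identical)", 17_000, 0.982),
]
for label, length_bp, identity in cases:
    print(f"{label}: ~{expected_differences(length_bp, identity):.1f} differences")
```

A short read carries only a handful of distinguishing positions on average, and some will carry none, while a read spanning the full amplicon carries hundreds.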
PMS2 RESULTS
- After making a library and sequencing, data is run through Long Amplicon Analysis
- Detection of all variants (exonic & intronic)
- Results in fully phased haplotypes:
Mandelker et al. (2016) Genet Med 18: 1282-1289
[Figure: fully phased wt and mut haplotypes; scale bar 5 kb]
Sequencing the Unsequenceable: PCR-Free Enrichment of Repeat Expansion Genes
DISEASES THAT ARE CAUSED BY UNSTABLE REPEATS
Nature Reviews Genetics, September 2016, Euan A. Ashley, “Towards precision medicine”
Low complexity, high GC regions are intractable to short read sequencing
Short tandem repeats cannot be PCR amplified faithfully
UNIFORMITY: LACK OF CONTEXT BIAS MEANS FEWER
INFORMATION GAPS
RPGR
ClinVar pathogenic poly-T/rs10524523
CCTTCTCCTTCCTCCTCTTCTCCCTCCCCTTCTCCTTCCTCTTCTCCCTC
CCCTTCTCCTTCCTCCTCTTCCCCCTCCCCTTCTCCTTCCTCCCCTTCTT
CCTCCCCTTCTCCTTCTTCCCCTTCTTCCTCCCCTTTCCCTTCTCCTTCC
TCCTCTTCCCCCTCCCCTTCCTCCTCTTCCCCCTCCCCTTCCTCCTCTTC
CCCCTCACCCTCCTCCTCTTCCTCTTCCCTCTCTCCTTTCCCCTCCTCTA
CTTCCCCTCCCTCTACTTCCCCTCCCTCCTCTTTTTCCTCCCCTCTCCCC
TCTGTTTCCTCCTCTTCCCCCTCTCCTTGGTCTCCTTCTTCCTCTCCTTT
CTCCTCCTTCCCCGCTCTTTCCTCCTTTTTCCTCTCTCCTTCCTCCTTTT
CACGTTCTCCCTCCACTTCTTCCCCTTCTCCTTCCTCTTTCCCTTCTCCC
TCCTTCTCTTCTTCCTCTTCTCTGTCTCCCTCCTCTTCTTCTCCTTCTCC
ATGCTCCTCCTCCCCTCCCTCCTCCATCTCTTGGTTTCTTTCCTTCTGAT
PACBIO
ILLUMINA
TARGET ENRICHMENT VIA CAS9 DIGESTION
ABLE TO DETECT ALLELE-SPECIFIC NUMBER OF REPEATS,
INTERRUPTIONS AND DIFFERENT REPEATS
GC-RICH REGIONS
Example: FMR1 (CGG repeat, fragile X syndrome)
Loomis et al. (2013) Genome Research 23: 121-128
100% GC; ~2.5kb
AGG “INTERRUPTIONS” REDUCE THE CHANCES OF PRE- TO FULL-MUTATION TRANSMISSION
APPLICATION TO THE GENETICS OF PARKINSON’S
PACIFIC BIOSCIENCES® CONFIDENTIAL
PARKINSON’S DISEASE ASSOCIATED WITH PURE ATXN10
REPEAT EXPANSION
- SCA10 is caused by an ATTCT repeat expansion located within the 66.4 kb intron 9 of the ATXN10 gene
- Normal population: allele size ranges from 10-32 ATTCT repeats
- SCA10 disease phenotype: allele size ranges from 800-4,500 ATTCT repeats
- Birgitt Schüle et al. found that SCA10 and Parkinson’s disease segregated in the same family
- Study goal: genetically characterize the ATXN10 repeat expansion and better understand the phenotypic differences of progressive cerebellar ataxia with seizures and parkinsonism
Schüle B. et al (2017) Parkinson's disease associated with pure ATXN10 repeat expansion. NPJ Parkinsons Dis;3:27
Pedigree of ATXN10 family
- The genetic composition of the complete repeat expansion revealed a novel phenotype-genotype correlation for Parkinson’s disease and SCA10
- Family members with ataxia: 480 ATTCT repeats followed by 920 ATTCC repeat interruptions
- Family member with clinically defined Parkinson’s disease: >1,300 ATTCT repeats but no ATTCC repeat interruptions
CRISPR/CAS9 AND SMRT SEQUENCING YIELDS NEW
PHENOTYPE ASSOCIATION FOR SCA10 REPEAT EXPANSION
DISORDER
Schüle B. et al (2017) Parkinson's disease associated with pure ATXN10 repeat expansion. NPJ Parkinsons Dis;3:27
[Figure: pedigree of the ATXN10 family, with members labeled unaffected, PD, or ataxia]
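The distinction between a pure ATTCT expansion and one interrupted by ATTCC units comes down to reading the repeat unit by unit across the whole tract. A minimal sketch of that tally is below. It assumes an error-free tract that starts in frame with the 5 bp unit; real reads would need error-tolerant motif matching, which is not shown. The function name is hypothetical.

```python
from itertools import groupby

def repeat_composition(tract: str, unit_length: int = 5) -> list[tuple[str, int]]:
    """Split a repeat tract into fixed-size units and run-length encode them."""
    usable = len(tract) - len(tract) % unit_length  # ignore a trailing partial unit
    units = [tract[i:i + unit_length] for i in range(0, usable, unit_length)]
    return [(motif, sum(1 for _ in run)) for motif, run in groupby(units)]

# Toy tract for illustration: a short ATTCT run with an ATTCC interruption.
toy_tract = "ATTCT" * 4 + "ATTCC" * 3 + "ATTCT" * 2
for motif, count in repeat_composition(toy_tract):
    print(f"{motif} x {count}")
```

Applied to a tract built like the ataxia allele described above (480 ATTCT units followed by 920 ATTCC units), the same function returns two runs with those counts.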
Full-Length Transcripts, No Assembly Required with Iso-Seq™ Sequencing
DETERMINATION OF TRANSCRIPT ISOFORMS
[Figure: gene and its mRNA isoforms]
Short-read technologies: reads spanning splice junctions
- Insufficient Connectivity
- Splice Isoform Uncertainty
PacBio’s Iso-Seq solution: full-length cDNA sequence reads
- Splice Isoform Certainty: No Assembly Required
PACBIO LONG READS COVER NEARLY ALL TRANSCRIPTS
Average PacBio Read: 12,000 bases
IL2
CD4
CYP2D6
CFTR
BRCA1
SMRT SEQUENCING OF ANDROGEN RECEPTOR VARIANTS
Full-length RNA-seq (Iso-Seq) data revealed that the previously reported
AR-V9 structure is incorrect
Kohli et al. (2017) Clinical Cancer Research, in press.
WITH THIS NEW INFORMATION, CAN WE RESOLVE WHETHER
AR-V7 OR AR-V9 IS THE BETTER BIOMARKER?
Kohli et al. (2017) Clinical Cancer Research, Accepted for publication.
- AR-V9 expression in pre-treatment biopsies correlates with AR-V7 in a cohort of 46 responders and 32 non-responders
- Univariate Cox regression analysis for 12-week composite PFS
- AR-V9 levels in the highest quartile are the best predictor of primary resistance to therapy
TARGETED ISO-SEQ ANALYSIS OF FMR1 GENE
Tseng et al., Altered expression of the FMR1 splicing variants landscape in premutation
carriers, to appear in BBA – Gene Regulatory Mechanisms (2017)
- Follow up on 2014 study (Pretto et al.)
  - Previous study identified 16 isoforms
- Matching premutation carriers vs controls
- Identified 49 unique isoforms
  - 30 present in premutation group only
- New splicing patterns found
  - Group “E” skips exon 11-14
  - Validated by qRT-PCR & Sanger
- New exon found
  - A 140-bp exon between exon 9 and 10
  - Validated by qRT-PCR & Sanger
  - Possibly results in truncated ORF
TARGETED ISO-SEQ ANALYSIS OF FMR1 GENE
Tseng et al., Altered expression of the FMR1 splicing variants landscape in premutation
carriers, to appear in BBA – Gene Regulatory Mechanisms (2017)
Iso4, Iso4b (group B) New exon “9.5” skipping exon 3
NOVEL COLORECTAL CANCER BIOMARKERS
SQANTI (STRUCTURAL AND QUALITY ANNOTATION OF NOVEL
TRANSCRIPT ISOFORMS): ISOFORM IDENTIFICATION AND
QUANTIFICATION ANALYSIS SOFTWARE
Is long read sequencing improving transcriptome quantification?
“...non-annotated variability at 3’ ends of expressed transcripts might be confounding transcriptome quantification by short-reads alone. Our results indicate that using a full-length, experiment specific transcriptome as a reference solves this problem and improves accuracy of quantification estimates.”
UNRELIABLE QUANTIFICATION BY SHORT READ NGS DUE TO
3’ END VARIABILITY OF EXPRESSED TRANSCRIPTS
Epigenetic base modification
METHYLATION DETECTION
Flusberg et al. (2010) Nature Methods 7: 461-465
Detectable by other Sequencing Methods
SIGNATURES OF DIFFERENT DNA BASE MODIFICATIONS
Prokaryotic Eukaryotic DNA Damage
m6A FOUND IN MAMMALS, SHOWN IN REGULATORY ROLES
EPIGENOME CHARACTERIZATION
- Methylation status of CpG islands (https://github.com/hacone/AgIn)
- Chr4: ANKRD17 (breast cancer)
Collaboration with M. Classon, V. Janakiraman, E. Stawiski, S. Durinck, S. Seshagiri (Genentech) & Y. Suzuki, S. Morishita (U of Tokyo)
- CGG repeat region appears to be heavily methylated (5mC)
DIRECT METHYLATION DETECTION OF FMR1 PREMUTATION
SAMPLE
2017 SEQUEL PERFORMANCE IMPROVEMENTS
2.1 Sequencing Chemistry
SMRT Link 5.0
Library Prep Improvements
Sequencing Yield Improvements
Analysis Improvements (SMRT Link 5.1)
ACCELERATED LIBRARY PREP
- Significantly faster (3 hours vs days)
- Less starting material (2.5x less DNA input)
- More efficient (~5x increase in size selected library yield)
- Less damage to DNA (improved sequencing performance)
- Amenable to automation
Release by end of the year
New prep for >30 kb library:
- ExoVII: 15 min
- DDR: 30 min
- ER/A-tail: 35 min
- Ligation: 30 min
- Proteinase K: 15 min
- AMPure
Total: ~3 hours
- Single tube
- Additive workflow
- ~3 hours
* Size Selection still recommended for genomic libraries
SEQUEL ROADMAP
2017: ~2-fold increase in throughput per Sequel SMRT Cell
2018: ~2-fold increase in throughput per Sequel SMRT Cell
End of 2018: ~8-fold increase in throughput per Sequel SMRT Cell
Combined (2 x 2 x 8): ~30-fold increase
~150 Gb / Sequel SMRT Cell
(~50x human genome coverage for de novo assembly)
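The roadmap figures can be cross-checked with a few lines of arithmetic. The sketch below multiplies the three stated fold increases, then relates the ~150 Gb target to human genome coverage. The ~3 Gb genome size is an assumption not stated on the slide, and the implied starting yield is derived here from the slide's rounded ~30-fold figure.

```python
import math

ROADMAP_FOLD_INCREASES = [2, 2, 8]  # 2017, 2018, end of 2018 (from the roadmap)
TARGET_YIELD_GB = 150               # per Sequel SMRT Cell (from the roadmap)
ROUNDED_FOLD = 30                   # the slide's rounded combined increase
HUMAN_GENOME_GB = 3.0               # assumption: ~3 Gb haploid human genome

combined_fold = math.prod(ROADMAP_FOLD_INCREASES)
implied_baseline_gb = TARGET_YIELD_GB / ROUNDED_FOLD
coverage = TARGET_YIELD_GB / HUMAN_GENOME_GB

print(f"combined increase: {combined_fold}-fold (slide rounds to ~{ROUNDED_FOLD}-fold)")
print(f"implied starting yield: ~{implied_baseline_gb:.0f} Gb per SMRT Cell")
print(f"human genome coverage at target yield: ~{coverage:.0f}x")
```

The implied starting yield of about 5 Gb sits at the low end of the ~5-8 Gb per cell listed for the Sequel earlier in the deck.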
2017 PACBIO DISTRIBUTOR MEETING | PACIFIC BIOSCIENCES® CONFIDENTIAL
2017 - Double Gb per SMRT cell
- New process SMRT Cell 1M
- Sequel Sequencing Kits 2.1
- SMRTbell Express Template Prep Kit
- Upcoming releases
2017 GOAL: SEQUENCING YIELD IMPROVEMENT
> 2-fold increase
RECENT SMRT CELL 1M RUNS IN PACBIO R&D
- ~30 kb insert shear library:
  - Yield: 12 Gb
  - Average RL: 25 kb
  - Longest read: 160 kb
- 5 kb amplicon library:
  - Yield: 16.5 Gb
  - Average RL: 33.5 kb
  - Longest read: 135 kb
- Pooled mixed amplicon library:
  - Yield: 22 Gb
  - Average RL: 37 kb
  - Longest read: 200 kb
http://www.pacb.com/publications/
[Figure: PacBio publications per year. 2003: 1; 2004: 0; 2005: 0; 2006: 0; 2007: 0; 2008: 4; 2009: 1; 2010: 7; 2011: 5; 2012: 40; 2013: 103; 2014: 296; 2015: 591; 2016: ~1020]
>2000 publications
~3-4 new papers per day
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio,
SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx.
All other trademarks are the sole property of their respective owners.
www.pacb.com