The Iso-Seq Method...2015/08/24  · The term “Iso-Seq method” can refer to any transcriptome...


FIND MEANING IN COMPLEXITY

© Copyright 2015 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures.

Elizabeth Tseng, Ph.D.

Senior Staff Scientist

The Iso-Seq™ Method: Transcriptome Sequencing Using Long Reads


Transcription Variation Proteomic/Gene Complexity

Slide from G. Sheynkman, ASMS talk 2014


A Single Gene Locus, Many Transcripts

Slide from G. Sheynkman, ASMS talk 2014


Short reads cannot accurately assemble complex transcripts

Steijger et al. (2013) Assessment of transcript reconstruction methods for RNA-Seq. Nature Methods. doi:10.1038/nmeth.2714.

…the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination…

…assembly of complete isoform structures poses a major challenge even when all constituent elements are identified…

…Ultimately, the evolution of RNA-seq will move toward single-pass determination of intact transcripts….


Iso-Seq™ Method: PacBio Transcriptome Sequencing

• Single-molecule observation

– one read = one transcript

• Sequence transcript in full length

– 0 – 15 kb full-length transcripts

– no assembly required

The term “Iso-Seq method” can refer to any transcriptome (cDNA) sequencing using the PacBio System, including sequencing that does not follow the recommended library preparation or the Iso-Seq bioinformatics pipeline (ICE + Quiver, covered in later slides).


Iso-Seq Library Workflow

• Total RNA

• Optional Poly-A Selection → polyA+ RNA

• Reverse Transcription (SMARTScribe RT) → Full-Length 1st Strand cDNA

• PCR Optimization

• Large-Scale Amplification (Phusion DNA Polymerase) → Amplified cDNA

• Size Selection (gel / BluePippin / SageELF) into fractions: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb

• Re-Amplification of each fraction (Phusion DNA Polymerase)

• SMRTbell Template Preparation

• Optional Size Selection (BluePippin) for the larger fractions (3-6 kb, 5-10 kb)

• SMRT Sequencing

Size cuts can be arbitrary. Current max FL transcript seen: 15 kb.


Full-Length (FL) read identification

Full-Length = 5’ primer seen, polyA tail seen, 3’ primer seen

• Identify and remove primers and polyA/T tail

• Identify transcript strandedness
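The full-length criterion above can be sketched in Python. This is a simplified illustration, not the RS_IsoSeq classify step: the primer sequences (`PRIMER_5`, `PRIMER_3`), the minimum polyA length, and the helper names are hypothetical, and the exact-match search stands in for the error-tolerant primer detection a real pipeline needs on raw long reads. A read is called full-length if, on either strand, it contains the 5' primer, then a polyA tail, then the 3' primer; the strand on which that layout is found gives the transcript's orientation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical adapter sequences, written as they appear on the sense strand of a read.
PRIMER_5 = "GACTCGATCG"
PRIMER_3 = "CTAGCTAGTC"
MIN_POLYA = 8  # hypothetical minimum polyA tail length

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Classified:
    full_length: bool
    strand: Optional[str]  # "+" or "-" for FL reads, None otherwise
    insert: str            # transcript with primers and polyA removed (raw read if not FL)


def _trim_sense(read: str) -> Optional[str]:
    """Return the transcript if read looks like PRIMER_5 + transcript + polyA + PRIMER_3."""
    start = read.find(PRIMER_5)
    end = read.rfind(PRIMER_3)
    if start < 0 or end < start + len(PRIMER_5):
        return None
    middle = read[start + len(PRIMER_5):end]
    tail = re.search(rf"A{{{MIN_POLYA},}}$", middle)  # polyA run must end at the 3' primer
    if tail is None:
        return None
    return middle[:tail.start()]


def classify(read: str) -> Classified:
    """Try the read as given ("+") and reverse-complemented ("-")."""
    for strand, seq in (("+", read), ("-", revcomp(read))):
        transcript = _trim_sense(seq)
        if transcript is not None:
            return Classified(True, strand, transcript)
    # Simplification: non-FL reads are returned untrimmed and unstranded.
    return Classified(False, None, read)
```

Checking both orientations is what lets a single test also report strandedness: an antisense read only matches the expected primer/polyA layout after reverse complementing.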


Bioinformatics Challenge

SAMPLE INPUT (original molecules): ATTTAAGGCC (3 copies), GCCATG (2 copies), TATAGGCAAGTAACGTT (2 copies)

SEQUENCING OUTPUT (noisy reads): ATTCAAGGCC AATTAGGGC TTTAGGCC AAT GGCCATTG GCCATG TATAGGCAAGTACGTT TATAGGGGCAAGTAACGTT

Need to recover the original sequences: Error Correction


POST-ERROR CORRECTION

ATTTAAGGCC: 3

GCCATG: 2

TATAGGCAAGTAACGTT: 2
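The per-sequence counts above can be reproduced with a toy calculation. This sketch assumes the corrected sequences are already known and simply assigns each noisy read to the closest one by Levenshtein edit distance; it is not how ToFU builds its clusters. The reads are the slide's sequencing output, leaving out the short "AAT" fragment whose placement is unclear in the extracted slide; `edit_distance` and `count_by_nearest` are hypothetical helper names.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def count_by_nearest(reads, corrected):
    """Assign each noisy read to its closest corrected sequence and tally the assignments."""
    counts = Counter({seq: 0 for seq in corrected})
    for read in reads:
        nearest = min(corrected, key=lambda seq: edit_distance(read, seq))
        counts[nearest] += 1
    return counts


corrected = ["ATTTAAGGCC", "GCCATG", "TATAGGCAAGTAACGTT"]
reads = ["ATTCAAGGCC", "AATTAGGGC", "TTTAGGCC", "GGCCATTG",
         "GCCATG", "TATAGGCAAGTACGTT", "TATAGGGGCAAGTAACGTT"]

for seq, n in count_by_nearest(reads, corrected).items():
    print(f"{seq}: {n}")
```

Run as written, this prints the same tallies as the slide: 3, 2, and 2 reads for the three corrected sequences.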


Error Correction: Three Approaches

Tool             | Author               | Genome-Guided | Hybrid (long + short reads) | Abundance Inference
ToFU (RS_IsoSeq) | Liz T.               | N             | N                           | (not really)
CONVEX           | Meisam R. (David T.) | N             | N                           | Y
LSC + IDP        | Kin Fai A.           | Y             | Y                           | Y


ToFU: The ICE + Quiver error correction pipeline



Transcript isOforms: Full-length and Unassembled

ToFU is available through SMRT Analysis (RS_IsoSeq) and GitHub (ToFU).

Methods are available in the paper's supplementary material.

• de novo (no ref genome required)

• no assembly

• can handle any read length

• works with reads of mixed accuracy

• post-Quiver: 99-100% accuracy


ToFU pipeline: classify → cluster (ICE) → Quiver polishing

• Per-molecule reads (ReadsOfInsert, aka CCS reads)

• Classify: full-length (FL) reads and non-FL (nFL) reads

• ICE: isoform-level clusters (Transcript 1, Transcript 2, Transcript 3)

• Quiver: clusters of transcript alignments using FL + nFL reads

• Final transcript consensus (Transcript 1, Transcript 2, Transcript 3)
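The cluster-then-polish idea can be illustrated with two deliberately simple stand-ins. Neither is the real algorithm: ICE refines its clusters iteratively and Quiver is a model-based consensus caller, whereas this sketch uses a one-pass greedy clustering on a normalized edit-distance threshold (`max_diff`, a hypothetical parameter) and a column-wise majority vote over reads that are assumed to be already aligned with `-` gaps.

```python
from collections import Counter
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def greedy_cluster(reads: List[str], max_diff: float = 0.1) -> List[List[str]]:
    """Add each read to the first cluster whose seed (first member) is within
    max_diff, measured as edit distance / longer length; else start a new cluster."""
    clusters: List[List[str]] = []
    for read in reads:
        for cluster in clusters:
            seed = cluster[0]
            if edit_distance(read, seed) / max(len(read), len(seed)) <= max_diff:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def majority_consensus(aligned_rows: List[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned rows; '-' marks a gap.
    Columns whose majority symbol is a gap are dropped from the consensus."""
    consensus = []
    for column in zip(*aligned_rows):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)
```

Reads from different isoforms (for example, one skipping an exon) differ by a large indel, so their normalized distance exceeds the threshold and they land in separate clusters; each cluster is then collapsed to one consensus sequence, mirroring the one-consensus-per-isoform output on the slide.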


ToFU reveals transcriptional complexity in P. crispa

Gray: single-gene transcripts

Green: polycistronic transcripts that span 2+ genes

Top: short-read mapping

Bottom: PacBio transcripts

Gordon & Tseng, 2015
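Calling a transcript polycistronic (green) comes down to counting how many annotated genes its genomic alignment overlaps. A minimal sketch, assuming half-open `(start, end)` intervals on the same contig and strand; the gene names, coordinates, and the `min_overlap` parameter are hypothetical, and a real analysis would likely demand a more substantial overlap than a single base.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one contig and strand


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def genes_spanned(transcript: Interval, genes: Dict[str, Interval],
                  min_overlap: int = 1) -> List[str]:
    """Names of genes the transcript overlaps by at least min_overlap bases."""
    return [name for name, gene in genes.items()
            if overlap(transcript, gene) >= min_overlap]


def is_polycistronic(transcript: Interval, genes: Dict[str, Interval],
                     min_overlap: int = 1) -> bool:
    """True when a single transcript spans two or more genes."""
    return len(genes_spanned(transcript, genes, min_overlap)) >= 2


# Hypothetical annotation and transcript coordinates.
genes = {"geneA": (100, 900), "geneB": (1000, 1800), "geneC": (2500, 3200)}
print(genes_spanned((150, 1700), genes))  # a transcript reaching from geneA into geneB
```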


From Novel Transcripts to Novel Proteins

Sheynkman, ASMS talk 2014

PacBio public MCF-7 dataset

• ~90% of predicted ORFs matched mass spec peptides

• 251 novel ORFs found, unique to MCF-7
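The slide does not say how ORFs were predicted from the PacBio transcripts. As a generic illustration only, a longest-ORF scan over the three forward frames looks like this; it assumes the standard genetic code's stop codons, scans only the sense strand (full-length reads are already stranded), and uses a hypothetical `min_codons` cutoff.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str, min_codons: int = 100) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG-to-stop ORF in the three forward
    frames, with end exclusive and the stop codon included. Codon counts include
    the stop. ORFs without an in-frame stop are ignored. Returns None if no ORF
    reaches min_codons."""
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later in-frame ATGs are internal
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                length = end - start
                if length // 3 >= min_codons and (best is None or length > best[1] - best[0]):
                    best = (start, end)
                start = None
    return best


transcript = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(longest_orf(transcript, min_codons=3))
```

The predicted ORFs would then be translated and matched against mass spec peptides, which is the comparison behind the ~90% figure on the slide.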


For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.