Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data

Alex Zelikovsky Department of Computer Science

Georgia State University

Joint work with Serghei Mangul, Sahar Al Seesi, Adrian Caciula, Dumitru Brinza, Ion Mandoiu

Advances in Next Generation Sequencing

http://www.economist.com/node/16349358

Roche/454 FLX Titanium400-600 million reads/run

400bp avg. length

Illumina HiSeq 2000Up to 6 billion PE reads/run

35-100bp read length

SOLiD 4/55001.4-2.4 billion PE reads/run

35-50bp read length

Ion Proton Sequencer

RNA-SeqRNA-Seq

A B C D E

Make cDNA & shatter into fragments

Sequence fragment ends

Map reads

Gene Expression

Transcriptome Reconstruction Isoform Expression

Transcriptome Assembly

• Given partial or incomplete information about something, use that information to make an informed guess about the missing or unknown data.

Transcriptome Assembly Types

• Genome-independent reconstruction (de novo)– de Brujin k-mer graph

• Genome-guided reconstruction (ab initio)– Spliced read mapping – Exon identification– Splice graph

• Annotation-guided reconstruction– Use existing annotation (known transcripts) – Focus on discovering novel transcripts

Previous approaches

• Genome-independent reconstruction – Trinity(2011), Velvet(2008), TransABySS(2008)

• Genome-guided reconstruction – Scripture(2010)

• Reports “all” transcripts– Cufflinks(2010), IsoLasso(2011), SLIDE(2012),

CLIIQ(2012), TRIP(2012), Traph (2013)• Minimizes set of transcripts explaining reads

• Annotation-guided reconstruction– RABT(2011), DRUT(2011)

Gene representation

• Pseudo-exons - regions of a gene between consecutive transcriptional or splicing events

• Gene - set of non-overlapping pseudo-exons

e1 e3 e5

e2 e4 e6

Spse1Epse1

Epse2Spse3

Epse5 Spse6

Spse7Epse7

Pseudo-exons:

pse1 pse2 pse3 pse4 pse5 pse6 pse7

Splice GraphGenome

1 42 3 5 6 7 8 9

TSSpseudo-exons

• Map the RNA-Seq reads to genome

• Construct Splice Graph - G(V,E)– V : exons– E: splicing events

• Candidate transcripts– depth-first-search (DFS)

• Select candidate transcripts– IsoEM– greedy algorithm

Genome

MaLTA Maximum Likelihood Transcriptome Assembly

How to select?

• Select the smallest set of candidate transcripts • covering all transcript variants

Transcript : set of transcript variants

Sharmistha Pal, Ravi Gupta, Hyunsoo Kim, et al., Alternative transcription exceeds alternative splicing in generating the transcriptome diversity of cerebellar development, Genome Res. 2011 21: 1260-1272

alternative first exon alternative last exon exon skipping intron retention

alternative 5' splice junction alternative 5' splice junction splice junction

IsoEM: Isoform Expression Level Estimation

• Expectation-Maximization algorithm• Unified probabilistic model incorporating

– Single and/or paired reads– Fragment length distribution– Strand information– Base quality scores– Repeat and hexamer bias correction

Read-isoform compatibility graphirw ,

aaair FQOw ,

Fragment length distribution

Series1

Fa (j)

Greedy algorithm

1. Sort transcripts by inferred IsoEM expression levels in decreasing order

2. Traverse transcripts – Select transcripts if it contains novel transcript

variant– Continue traversing until all transcript variant

are covered

Greedy algorithm Transcript Variants:Transcripts sorted by expression levels

STOP. All transcript variant are covered.

MaLTA results on GOG-350 dataset

• 4.5M single Ion reads with average read length 121 bp, aligned using TopHat2• Number of assembled transcripts

– MaLTA : 15385 – Cufflinks : 17378

• Number of transcripts matching annotations– MaLTA : 4555(26%) – Cufflinks : 2031(13%)

Expression Estimation on Ion Torrent reads

IsoEM HBR Cufflinks HBR IsoEM UHR Cufflinks UHR0

ufflin

• Squared correlation– IsoEM / Cufflinks FPKMs vs qPCR values for 800 genes – 2 MAQC samples : Human Brain and Universal

Conclusions

• Novel method for transcriptome assembly • Validated on Ion Torrent RNA-Seq Data• Comparing with Cufflinks:

– similar number of assembled transcripts– 2x more previously annotated transcripts

• Transcript quantification is useful for transcript assembly better quantification?

Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data

Documents

Transcript of Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data

UNRAVEILING BIT-TORRENT

Torrent 91

RNA sequencing, transcriptome and expression quantification Henrik Lantz, BILS/SciLifeLab.

MedaPorts16 | Jordi Torrent

Final Torrent

El torrent 2014

RNA Quantification and Illumina Library Generation for RNA Seq RNA QC and...Transcriptome Seq Illumina TruseqRNA Exome 20ng-100ng Library Generation Methods. Illumina TruseqLibrary

Bit torrent seminar

Torrent Lists Master

Torrent Suite™ Software 5.6 Release Notes (Pub. No ... · Torrent Suite™ Software verification service Beginning with Torrent Suite™ Software 5.4., Torrent Suite™ Software

Torrent power presentation

Bit Torrent

Torrent Cables Ltd

Final Bit Torrent

Torrent Pharma

Ion Torrent

TORRENT 30-29

Bit Torrent Protocol

TORRENTS – (BIT TORRENT)

Torrent Pharmaceuticals Limited Levofloxacin Tablets, USP · Torrent Pharmaceuticals Limited Levofloxacin Tablets, USP ... contact Torrent Pharma Inc. at 1 ... Torrent Pharmaceuticals