Estimation of alternative splicing isoform frequencies from RNA-Seq data
Marius Nicolae
Computer Science and Engineering Department
University of Connecticut
Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky

Outline
Introduction
EM Algorithm
Results
Conclusions and future work

RNA-Seq
Make cDNA & shatter into fragments
Sequence fragment ends
Map reads
Gene Expression (GE)
Isoform Discovery (ID)
Isoform Expression (IE)
Figure labels: A B C D E; A B C; A C; D E

Gene Expression Challenges
Read ambiguity (multireads)
What is the gene length?
Figure labels: A B C D E

Previous approaches to GE
Ignore multireads
[Mortazavi et al. 08]
◦ Fractionally allocate multireads based on unique read estimates
[Pasaniuc et al. 10]
◦ EM algorithm for solving ambiguities
Gene length: sum of lengths of exons that appear in at least one isoform
◦ Underestimates expression levels for genes with 2 or more isoforms [Trapnell et al. 10]
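
The fractional allocation of multireads listed above can be sketched as follows. This is a minimal illustration, not the published procedure: the function name `allocate_multiread`, the gene names, the counts and the lengths are all made up, and splitting in proportion to unique-read density (unique reads per base) is an assumption about what "based on unique read estimates" means.

```python
def allocate_multiread(candidate_genes, unique_counts, gene_lengths):
    """Split one multiread among the genes it maps to.

    Assumption: each gene's share is proportional to its unique-read
    density (unique reads per base). Falls back to an even split when
    no candidate gene has any unique reads.
    """
    densities = {
        g: unique_counts.get(g, 0) / gene_lengths[g] for g in candidate_genes
    }
    total = sum(densities.values())
    if total == 0:
        return {g: 1.0 / len(candidate_genes) for g in candidate_genes}
    return {g: d / total for g, d in densities.items()}


# Hypothetical data: two genes of equal length with different unique-read counts.
unique_counts = {"g1": 90, "g2": 10}
gene_lengths = {"g1": 1000, "g2": 1000}

shares = allocate_multiread(["g1", "g2"], unique_counts, gene_lengths)
print(shares)
```

The gene with more unique-read support receives the larger share of the ambiguous read, which is the intuition behind this family of approaches.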

Read Ambiguity in IE
Figure labels: A B C D E; A C

Previous approaches to IE
[Jiang & Wong 09]
◦ Poisson model, single reads only
[Li et al. 10]
◦ EM Algorithm, single reads only
[Feng et al. 10]
◦ Convex quadratic program, pairs used only for ID
[Trapnell et al. 10]
◦ Extends Jiang’s model to paired reads
◦ Fragment length distribution

Our contributions
EM Algorithm for IE
◦ Single and paired reads
◦ Fragment length distribution
◦ Strand information
◦ Base quality scores
Solving GE by adding isoform levels
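
"Solving GE by adding isoform levels" amounts to summing the estimated isoform levels of each gene. The sketch below shows only that aggregation step; the function name `gene_levels`, the isoform identifiers and the numbers are hypothetical and not taken from the talk.

```python
from collections import defaultdict


def gene_levels(isoform_levels, isoform_to_gene):
    """Sum per-isoform expression levels into per-gene expression levels."""
    totals = defaultdict(float)
    for isoform, level in isoform_levels.items():
        totals[isoform_to_gene[isoform]] += level
    return dict(totals)


# Hypothetical isoform frequencies, as an IE method might estimate them.
isoform_levels = {"g1.ABC": 0.5, "g1.AC": 0.25, "g2.DE": 0.25}
isoform_to_gene = {"g1.ABC": "g1", "g1.AC": "g1", "g2.DE": "g2"}

levels = gene_levels(isoform_levels, isoform_to_gene)
print(levels)
```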
Read-Isoform Compatibility

Fragment length distribution
Paired reads
Single reads
Figure: reads aligned against isoforms A B C and A C, with fragment length distribution curves
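
One way to see why the fragment length distribution helps with paired reads: a pair with one end in exon A and the other in exon C implies a different fragment length in isoform A B C than in isoform A C, so the two isoforms are not equally plausible sources. The sketch below is an illustration under stated assumptions, not the IsoEM weighting itself: exon lengths and read positions are invented, the helper names are hypothetical, and the normal distribution with mean 250 and standard deviation 25 is borrowed from the experimental setup slide.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def implied_fragment_length(isoform, exon_lengths, first_exon, bases_in_first,
                            last_exon, bases_in_last):
    """Fragment length implied by a read pair when spliced as `isoform`.

    The fragment covers `bases_in_first` bases at the end of `first_exon`,
    every exon the isoform places in between, and `bases_in_last` bases at
    the start of `last_exon`.
    """
    i, j = isoform.index(first_exon), isoform.index(last_exon)
    inner = sum(exon_lengths[e] for e in isoform[i + 1:j])
    return bases_in_first + inner + bases_in_last


# Hypothetical exon lengths and read pair placement.
exon_lengths = {"A": 200, "B": 100, "C": 200}
isoforms = {"ABC": ["A", "B", "C"], "AC": ["A", "C"]}

raw = {}
for name, exons in isoforms.items():
    length = implied_fragment_length(exons, exon_lengths, "A", 50, "C", 100)
    raw[name] = normal_pdf(length, mean=250.0, std=25.0)

total = sum(raw.values())
weights = {name: w / total for name, w in raw.items()}
print(weights)
```

Here the pair implies a 250-base fragment under A B C and a 150-base fragment under A C, so nearly all of the weight goes to A B C.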
IsoEM algorithm
E-step
M-step
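
The slide text preserves only the step names, so the following is a generic expectation-maximization loop for assigning ambiguous reads, offered as a sketch rather than the IsoEM update rules. It assumes each read comes with precomputed compatibility weights per isoform (hypothetical input), estimates the fraction of reads originating from each isoform, and omits the length normalization a full method would need to turn read fractions into isoform frequencies.

```python
def em_read_fractions(read_weights, n_isoforms, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    read_weights: one dict per read, mapping isoform index to a
    compatibility weight (assumed to be given).
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms in
        # proportion to weight * current estimate.
        for weights in read_weights:
            total = sum(w * theta[j] for j, w in weights.items())
            if total == 0.0:
                continue
            for j, w in weights.items():
                counts[j] += w * theta[j] / total
        # M-step: re-estimate fractions from the expected read counts.
        n_assigned = sum(counts)
        theta = [c / n_assigned for c in counts]
    return theta


# Hypothetical reads: 30 unique to isoform 0, 10 unique to isoform 1,
# and 60 equally compatible with both.
reads = [{0: 1.0}] * 30 + [{1: 1.0}] * 10 + [{0: 1.0, 1: 1.0}] * 60
theta = em_read_fractions(reads, n_isoforms=2)
print(theta)
```

In this toy input the ambiguous reads end up split in the same 3:1 ratio as the unique reads, which is the fixed point of the two update steps.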

Experimental setup
Human genome UCSC known isoforms
GNFAtlas2 gene expression levels
◦ Uniform/geometric expression of gene isoforms
Normally distributed fragment lengths
◦ Mean 250, std. dev. 25
Figure: number of genes (log scale) vs. number of isoforms
Figure: number of isoforms vs. isoform length
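
The simulation choices listed for the experimental setup can be sketched in a few lines. This is an assumption-laden illustration: the function names are hypothetical, the geometric ratio of 2 is a guess (the slide does not state it), and rounding fragment lengths to whole bases is a convenience; only the mean of 250 and standard deviation of 25 come from the slide.

```python
import random


def isoform_fractions(n_isoforms, model="uniform", ratio=2.0):
    """Split a gene's expression among its isoforms.

    "uniform" gives every isoform the same share; "geometric" gives each
    successive isoform 1/ratio of the previous one (ratio is an assumption).
    """
    if model == "uniform":
        weights = [1.0] * n_isoforms
    elif model == "geometric":
        weights = [ratio ** -k for k in range(n_isoforms)]
    else:
        raise ValueError("model must be 'uniform' or 'geometric'")
    total = sum(weights)
    return [w / total for w in weights]


def sample_fragment_lengths(n, mean=250.0, std=25.0, seed=0):
    """Draw n fragment lengths from a normal distribution, rounded to bases."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, std))) for _ in range(n)]


uniform = isoform_fractions(3, "uniform")
geometric = isoform_fractions(3, "geometric")
lengths = sample_fragment_lengths(1000)
print(uniform, geometric, sum(lengths) / len(lengths))
```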

Accuracy measurements
Error Fraction (EF)
◦ Percentage of isoforms (or genes) with relative error larger than given threshold t
Median Percent Error (MPE)
◦ Threshold t for which EF is 50%
r²
◦ Coefficient of determination
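
The three accuracy measures can be computed as sketched below. Some details are assumptions, since the slide does not specify them: relative error is taken as |estimate - truth| / truth, items with a true value of zero are skipped, MPE is computed as the median relative error expressed in percent (which is the threshold where EF crosses 50%), and r² is computed as the squared Pearson correlation. The sample values are made up.

```python
import statistics


def relative_errors(true, est):
    """Relative error per item; items with true value 0 are skipped (assumption)."""
    return [abs(e - t) / t for t, e in zip(true, est) if t > 0]


def error_fraction(true, est, threshold):
    """Percentage of items whose relative error exceeds the threshold."""
    errs = relative_errors(true, est)
    return 100.0 * sum(e > threshold for e in errs) / len(errs)


def median_percent_error(true, est):
    """Median relative error, in percent."""
    return 100.0 * statistics.median(relative_errors(true, est))


def r_squared(true, est):
    """Squared Pearson correlation between true and estimated values."""
    n = len(true)
    mean_t, mean_e = sum(true) / n, sum(est) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true, est))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov * cov / (var_t * var_e)


# Hypothetical true and estimated frequencies (arbitrary units).
true_levels = [10, 20, 30, 40]
estimates = [11, 18, 30, 60]

ef = error_fraction(true_levels, estimates, threshold=0.05)
mpe = median_percent_error(true_levels, estimates)
r2 = r_squared(true_levels, estimates)
print(ef, mpe, r2)
```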

Isoform Error Fraction Curves
30M single reads of length 25
Main difference between IsoEM and RSEM is fragment length modeling
Figure: % of isoforms over threshold vs. relative error threshold, for Uniq, Rescue, UniqLN, RSEM and IsoEM

Gene Error Fraction Curves
30M single reads of length 25
Figure: % of genes over threshold vs. relative error threshold, for Uniq, Rescue, GeneEM, RSEM and IsoEM

Read Length Effect
Fixed sequencing throughput (750Mb)
50bp reads better than 100bp!
Figure: Median Percent Error vs. read length (25 to 95), for paired reads and single reads
Figure: r² vs. read length (25 to 95), for paired reads and single reads
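
At a fixed sequencing throughput, longer reads mean fewer reads, which is the trade-off behind the read length comparison. The arithmetic below assumes that 750Mb means 750 million sequenced bases and that throughput is simply read count times read length; the function name is hypothetical. Under that reading, 25bp reads give 30M reads, matching the "30M single reads of length 25" used in the error fraction experiments.

```python
THROUGHPUT_BASES = 750_000_000  # assumption: 750Mb = 750 million bases


def reads_at_fixed_throughput(read_length, throughput=THROUGHPUT_BASES):
    """Number of reads of a given length that fit in a fixed base budget."""
    return throughput // read_length


read_counts = {length: reads_at_fixed_throughput(length) for length in (25, 50, 75, 100)}
for length, count in read_counts.items():
    print(f"{length}bp reads: {count:,}")
```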

Effect of Pairs & Strand Information
1-60M 75bp reads
Pairs help, strand info doesn’t
[Trapnell et al. 10]: r² = .95 for 13M PE reads
Figure: r² vs. number of reads (0 to 60M), for RandomStrand-Pairs-PerfectMapping, RandomStrand-Pairs, CodingStrand-pairs, RandomStrand-Single and CodingStrand-single

Conclusions & Future Work
Presented EM algorithm for isoform frequency estimation that exploits fragment length distribution for both single and paired reads
◦ Significant accuracy improvement over existing methods
◦ Code and datasets to be released publicly soon
Ongoing extensions
◦ Confidence intervals
◦ Allele-specific isoform expression
◦ Testing for novel isoforms
◦ Integration with isoform discovery
Questions?