Differential Alternative Splicing - an in silico detection approach
Gero Doose
Bled Presentation
Department of Computer Science and Interdisciplinary Center for Bioinformatics,
University of Leipzig
February 19, 2016
Table of Contents
1 Motivation
2 Method
3 Results
Section 1
Motivation
Differential alternative splicing
Alternative splicing
Section 2
Method
Overview of the algorithm
Algorithmic workflow
Core algorithm is based on splice junction supports per sample
Scanning routine for standard format (e.g. TCGA)
Pre-calculation procedure for BAM files (segemehl)
Pre-calculation procedure for BAM files (segemehl)
Starting with a list of samples (location of the mapped RNA-seq files)
Building the union of all split-mapped reads
Calling splice junctions (cluster splice sites)
Calculating table of individual sample supports
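The pre-calculation steps above can be sketched roughly as follows. This is a minimal illustration, not the presented implementation: the input layout (a dict mapping sample names to lists of split-read junction tuples) and the function name `support_table` are assumptions, and the clustering of nearby splice sites is replaced by exact coordinate matching.

```python
from collections import Counter


def support_table(split_reads_per_sample):
    """Build a junction x sample support table.

    split_reads_per_sample: dict mapping a sample name to a list of
    junction tuples, e.g. (chrom, donor_pos, acceptor_pos), one tuple per
    split-mapped read (hypothetical input layout).
    """
    # Count how many split reads support each junction in each sample.
    counts = {s: Counter(reads) for s, reads in split_reads_per_sample.items()}
    # Union of all junctions seen in any sample (exact matching only;
    # splice-site clustering is not modelled here).
    junctions = sorted(set().union(*counts.values()))
    samples = sorted(counts)
    # Counter returns 0 for junctions a sample does not support.
    table = {j: [counts[s][j] for s in samples] for j in junctions}
    return samples, table


samples, table = support_table({
    "sample_a": [("chr1", 100, 200), ("chr1", 100, 200)],
    "sample_b": [("chr1", 100, 200), ("chr1", 300, 400)],
})
```

Each row of `table` holds the supports of one junction in the order given by `samples`; junctions absent from a sample get a support of zero.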
Intersecting with gene annotation
Normal-spliced (co-linear) junctions
Circular-spliced junctions
In/out going vs. within the gene body
Not restricted to known exon boundaries
Matrix reduction
Minimum splice junction support value (J)
Minimum number of samples (S)
Removing samples not showing (J) in at least one splice junction
Removing junctions with fewer than (S) samples showing (J)
Removing genes not containing at least 2 junctions and (S) samples per condition
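A minimal sketch of the first two reduction steps, assuming the supports are held in a junction x sample NumPy array. The function name `reduce_matrix`, the order in which the filters are applied, and counting (S) over all samples rather than per condition are assumptions; the gene-level filter is omitted.

```python
import numpy as np


def reduce_matrix(counts, J, S):
    """Filter a junction x sample support matrix.

    J: minimum splice junction support value.
    S: minimum number of samples that must reach J.
    """
    counts = np.asarray(counts)
    # Keep samples that reach J in at least one junction.
    sample_keep = (counts >= J).any(axis=0)
    reduced = counts[:, sample_keep]
    # Keep junctions where at least S of the remaining samples reach J.
    junction_keep = (reduced >= J).sum(axis=1) >= S
    return reduced[junction_keep], junction_keep, sample_keep


reduced, junction_keep, sample_keep = reduce_matrix(
    [[5, 0, 6],
     [1, 0, 0],
     [7, 0, 9]],
    J=5, S=2,
)
```

In this toy input the second sample never reaches J and is dropped, and the second junction is dropped because no remaining sample supports it with at least J reads.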
Zero-replacement
Accounting for technical and biological variance
Sampling junction supports from a negative binomial distribution
Tracking junctions where more than 50% of a condition are replaced
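The slide does not give the parameters of the negative binomial distribution, so the following is only an illustrative sketch. The parameters `n` and `p`, the `+ 1` shift that keeps replacements strictly positive, and the function name `replace_zeros` are all assumptions.

```python
import numpy as np


def replace_zeros(supports, n=1.0, p=0.5, seed=0):
    """Replace zero supports of one junction within one condition.

    Returns the filled vector and a flag that is True when more than
    50% of the samples in the condition had to be replaced.
    """
    rng = np.random.default_rng(seed)
    supports = np.asarray(supports, dtype=float)
    zero_mask = supports == 0
    # Hypothetical choice: negative binomial draws shifted by one so that
    # every replacement is positive (needed for later log-ratios).
    draws = rng.negative_binomial(n, p, size=supports.shape) + 1
    filled = np.where(zero_mask, draws, supports)
    flagged = bool(zero_mask.mean() > 0.5)
    return filled, flagged


filled, flagged = replace_zeros([0, 0, 3, 0])
```

Observed non-zero supports are left untouched; only the flag and the replaced positions depend on the zero pattern.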
Compositional data approach
Normalizing raw counts to ratios per gene
C1 = [40, 30, 10] → C1 = [0.5, 0.375, 0.125]
C2 = [120, 30, 90] → C2 = [0.5, 0.125, 0.375]
Simplex as the appropriate sample space:
$$ S^D = \left\{ [x_1, \dots, x_D] : x_i \ge 0 \text{ for } i = 1, \dots, D \text{ and } \sum_{i=1}^{D} x_i = 1 \right\} $$
Aitchison (1986)
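The normalization of raw counts to ratios per gene can be written in a few lines. The function name `closure` is borrowed from compositional data terminology and is not taken from the slides.

```python
def closure(counts):
    """Scale the junction supports of one gene so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


c1 = closure([40, 30, 10])    # [0.5, 0.375, 0.125]
c2 = closure([120, 30, 90])   # [0.5, 0.125, 0.375]
```

Both results lie on the simplex, so the two genes become comparable even though their total supports differ by a factor of three.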
[Figure: ternary diagram showing C1 = [0.5, 0.375, 0.125] and C2 = [0.5, 0.125, 0.375]]
Compositional data approach
Aitchison distance
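$$ d(x_a, x_b) = \left[ \sum_{i=1}^{D} \left( \log\frac{x_{ai}}{g(x_a)} - \log\frac{x_{bi}}{g(x_b)} \right)^2 \right]^{\frac{1}{2}} \quad \text{with } g(x) = \left( \prod_{i=1}^{D} x_i \right)^{\frac{1}{D}} $$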
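A direct transcription of the Aitchison distance into Python, assuming strictly positive components (zeros are expected to have been replaced beforehand). The helper names are illustrative.

```python
import math


def geometric_mean(x):
    """g(x): geometric mean of the components of x."""
    return math.exp(sum(math.log(v) for v in x) / len(x))


def aitchison_distance(xa, xb):
    """Euclidean distance between the log-ratios of xa and xb."""
    ga, gb = geometric_mean(xa), geometric_mean(xb)
    return math.sqrt(sum(
        (math.log(a / ga) - math.log(b / gb)) ** 2
        for a, b in zip(xa, xb)
    ))


d = aitchison_distance([0.5, 0.375, 0.125], [0.5, 0.125, 0.375])
```

For the two example compositions the first component contributes nothing and the other two each contribute (log 3)^2, so the distance equals sqrt(2) * log(3), about 1.55. Because only ratios to the geometric mean enter, the raw counts [40, 30, 10] and [120, 30, 90] give the same value.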
Compositional data approach
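Central tendency
Let $C = (x_{ij}, \dots, x_{ND})$ be a set of N compositional vectors with D components:
$$ \mathrm{cen}(C) = \left[ \frac{g(x_{i1})}{\sum_{j=1}^{D} g(x_{ij})}, \dots, \frac{g(x_{iD})}{\sum_{j=1}^{D} g(x_{ij})} \right] \quad \text{with } g(x_{ij}) = \left( \prod_{i=1}^{N} x_{ij} \right)^{\frac{1}{N}} $$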
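The centre can be computed as the component-wise geometric mean over all N vectors, rescaled to sum to 1. This sketch assumes strictly positive components; the function name `center` is illustrative.

```python
import math


def center(compositions):
    """Central tendency of N compositional vectors with D components."""
    n = len(compositions)
    d = len(compositions[0])
    # Geometric mean of component j over all N vectors.
    gmeans = [
        math.exp(sum(math.log(x[j]) for x in compositions) / n)
        for j in range(d)
    ]
    total = sum(gmeans)
    # Rescale so that the centre is itself a composition.
    return [g / total for g in gmeans]


cen = center([[0.5, 0.375, 0.125], [0.5, 0.125, 0.375]])
```

For the two example compositions the second and third components of the centre are equal, since their geometric means over the two vectors coincide.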
Compositional data approach
Abundance change
$$ \mathrm{abc}(x_i) = d\left( \mathrm{cen}(C_{\mathrm{base}})_i, \mathrm{cen}(C_{\mathrm{compare}})_i \right) $$
C1 = [0.5, 0.375, 0.125]
C2 = [0.5, 0.125, 0.375]
Detection of differential alternative splicing
For all components with |abc| ≥ 1:
Centered log-ratio transformation (clr):
$$ \mathrm{clr}(x) = \left[ \log\frac{x_1}{g(x)}, \dots, \log\frac{x_D}{g(x)} \right] \quad \text{with } g(x) = \left( \prod_{i=1}^{D} x_i \right)^{\frac{1}{D}} $$
Non-parametric test statistic (Wilcoxon rank-sum)
Multiple testing correction (Benjamini-Hochberg)
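The multiple testing step can be illustrated with a plain-Python Benjamini-Hochberg adjustment. This sketch assumes the Wilcoxon rank-sum p-values have already been computed on the clr-transformed values (for example with `scipy.stats.ranksums`) and that the |abc| ≥ 1 pre-filter has been applied; the function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The q-values are returned in the input order, so they can be compared directly against a threshold per junction.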
Clustering and outlier detection
Finding the upper quartile of all genes with regard to the average distance between the centre and each of the n compositional vectors
Calculating the distances of all $\binom{n}{2}$ pairwise sample combinations, averaged over this set of genes
Hierarchical agglomerative clustering
[Figure: dendrogram of GCB and BL samples, height axis from 0.0 to 3.0]
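The pairwise step can be sketched as follows, assuming the selected genes are given as a dict mapping each gene to a dict of sample name to composition. The data layout and function names are assumptions, and the linkage method of the subsequent hierarchical clustering is not specified on the slide, so it is left out.

```python
import math
from itertools import combinations


def aitchison_distance(xa, xb):
    """Aitchison distance, using log(x_i / g(x)) = log x_i - mean(log x)."""
    la = [math.log(v) for v in xa]
    lb = [math.log(v) for v in xb]
    ma, mb = sum(la) / len(la), sum(lb) / len(lb)
    return math.sqrt(sum(((a - ma) - (b - mb)) ** 2 for a, b in zip(la, lb)))


def pairwise_sample_distances(genes):
    """Average the distance of every sample pair over all given genes."""
    samples = sorted(next(iter(genes.values())))
    dist = {}
    for s, t in combinations(samples, 2):
        per_gene = [aitchison_distance(g[s], g[t]) for g in genes.values()]
        dist[(s, t)] = sum(per_gene) / len(per_gene)
    return dist


dist = pairwise_sample_distances({
    "gene1": {"a": [0.5, 0.5], "b": [0.5, 0.5], "c": [0.9, 0.1]},
    "gene2": {"a": [0.2, 0.8], "b": [0.2, 0.8], "c": [0.6, 0.4]},
})
```

The resulting dictionary holds one averaged distance per sample pair and could be passed to any agglomerative clustering routine that accepts a precomputed distance matrix.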
Section 3
Results
First glimpse at the hidden treasures within the ICGC RNA-seq data
ICGC data set: Germinal Center B-cell Derived Lymphomas
6 comparisons (4 conditions: GCB, BL, FL, DLBCL)
126 samples (5 GCB, 18 BL, 46 FL, 47 DLBCL)
Outlier detection
Dendrogram for BL samples
Distance distribution for BL samples
Outliers: BL_4166151, DLBCL_4181460, DLBCL_4193638, FL_4199996
BL 4166151: age 74 (median ≈ 10)
DLBCL 4193638: also an outlier in terms of expression
FL 4199996 and DLBCL 4181460: not conspicuous in terms of methylation or expression
[Figure: expression correlation of the samples; legend: normal, BL, FL, DLBCL, others]
Subgroup detection
Dendrogram for GCB and BL samples
Differential Alternative Splicing - a statistical overview
6 comparisons (4 conditions: GCB, BL, FL, DLBCL)
126 samples (5 GCB, 18 BL, 46 FL, 47 DLBCL)
442,738 supported splice junctions
18,700 supported genes (16,208 protein coding, 2,492 lincRNA)
Significant if |abundance change| ≥ 1 and q-value < 0.01

                   GCB-BL   GCB-FL   GCB-DLBCL  BL-FL    BL-DLBCL  FL-DLBCL
splice junctions
  tested           100,315  106,890  107,498    126,667  127,919   138,212
  sig.             1,012    1,629    1,499      1,089    610       132
  percent          1.01     1.52     1.39       0.86     0.48      0.10
genes
  tested           8,229    8,419    8,450      10,304   10,426    11,418
  sig.             756      1,077    1,016      738      424       108
  percent          9.19     12.79    12.02      7.16     4.07      0.95

Randomizing group info ⇒ no significant results
Lymphoma common genes
Overlap of sig. genes
Network of overlap
Lymphoma common genes
High protein expression in lymph nodes and other immune-related tissues
[Figure: genome browser view of MYO9B (hg19, positions 17,320,500 to 17,322,000) with expression tracks con.expr, BL.expr and FL.expr]
Lymphoma common genes
Plays a role in the invasiveness of cancer cells and the formation of metastases
[Figure: genome browser view of CTTN (hg19, positions 70,266,000 to 70,270,000) with expression tracks con.expr and BL.expr]
Enriched pathways (GCB vs BL)
B-cell receptor
Apoptosis
Apoptosis pathway (GCB vs. BL)
[Figure: genome browser view of CASP8 (hg19, tick marks at 202,145,000 and 202,150,000) with expression tracks con.expr and BL.expr]
[Figure: genome browser view of IRAK4 (hg19, positions 44,166,700 to 44,167,800) with expression tracks con.expr and BL.expr]
Validation by qPCR
PRDM10
Connection between diff. alt. splicing and diff. expression
Intersection with sig. genes based on DESeq
[Figure: bar chart of the amount of significant genes (axis 0 to 900) per comparison (BL_DLBCL, BL_FL, FL_DLBCL, GCB_BL, GCB_DLBCL, GCB_FL); legend: not differentially expressed, differentially expressed]
Connection between diff. alt. splicing and RNA editing
Calling sig. differential RNA editing sites between conditions (with Methylene)
Intersected with both splice sites of all sig. diff. alt. splice junctions
[Figure: genome browser view of COLGALT1 (hg19, positions 17,690,400 to 17,691,700) with tracks GCB vs. BL, GCB vs. FL, Txn Factor ChIP, BL.expr, con.expr and FL.expr]
RNA editing at position chr19:17,691,052
Connection between diff. alt. splicing and DMRs
Building 2 x 2 contingency tables of gene counts in regard to diff. alt. splicing and DMR overlap.
Odds ratios and p-values from Fisher's exact test:

                 GCB-BL   GCB-FL     BL-FL
odds ratio       1.3      1.48       1.84
p-value          0.00075  6.273e-09  1.719e-15
genes yes yes    336      449        400
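For reference, a two-sided Fisher's exact test on a single 2 x 2 table can be written with the standard library alone. The slides report only the "yes/yes" cell of each table, so the counts used below are a textbook toy example, not the study data; the function name and the sample odds ratio (ad/bc) are assumptions about how the reported values were obtained.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at most as likely as the observed one.
    p_value = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Toy table (hypothetical counts, not taken from the slides).
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
```

In the setting above, the four cells would be the numbers of genes with and without diff. alt. splicing, split by whether they overlap a DMR.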
Abundance changes
Abundance change as a measure for biological relevance
[Figure: log2 abundance change versus junctions per gene, and density of log2 abundance change, for the junction sets BL_vs_FL, GCB_vs_FL, GCB_vs_DLBCL, BL_vs_DLBCL, GCB_vs_BL and FL_vs_DLBCL]
Junctions per gene
Natural bias towards genes with more junctions
[Figure: junctions per gene (axis 0 to 60) for not significant versus significant genes, one panel per comparison: BL vs. DLBCL, BL vs. FL, FL vs. DLBCL, GCB vs. BL, GCB vs. DLBCL, GCB vs. FL]
Zero-replacement portion
Tested instances vs. significant instances
[Figure: amount of significant instances (axis 0 to 1500) and amount of tested instances (axis 0 to 1e+05) for genes and junctions, one panel per comparison: BL_DLBCL, BL_FL, FL_DLBCL, GCB_BL, GCB_DLBCL, GCB_FL; legend: genes, junctions with zero-replacement, junctions without zero-replacement]
Summary
Simple and robust method
Based only on direct splice evidence
Not restricted to current annotations
Fast core algorithm (3h for 419 samples)
Plausible results
Thanks to
Peter F. Stadler
Steve Hoffmann
Stephan Bernhart
Helene Kretzmer
Reiner Siebert
Rabea Wagener
Hoffmann Group
Thank you for your attention!
Questions?!