Erik Sonnhammer
Stockholm Bioinformatics Centre, Science for Life Laboratory
Dept. of Biochemistry and Biophysics, Stockholm University
Comparative Interactomics with FunCoup 2.0
How to map the human interactome?
• Genes: ~22000
• Interactions: 100000-300000?
• Known direct interactions: ~74000 (IntAct)
• Experiments have high false negative and false positive rates.
• → Most interactions need to be inferred combinatorially
FunCoup:
Predicting Functional Coupling Between Genes/Proteins
Using Genomics Data and Orthology
• Alexeyenko et al., NAR 40:D821 (2012)
• Alexeyenko & Sonnhammer, Genome Research 19:1107 (2009)
[Figure: evidence types integrated by FunCoup]
• Protein-protein interactions
• Co-expression patterns
• Phylogenetic profiles
• Domain interactions
• Shared transcription factor binding
• Shared miRNA targeting
• Subcellular co-localisation
• Genetic interactions
• Orthology (evidence from other organisms)
Naïve Bayesian training
Continuous variable
Discrete categories
Extract links
Test against positive and "negative" reference datasets
Calculate enrichment as likelihood ratio = P(+) / P(-)
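The training steps above can be sketched in a few lines of Python. This is a minimal illustration, not the FunCoup implementation: it assumes that a continuous evidence value is binned at fixed edges and that each bin's score is the log of P(bin | +) / P(bin | -). The function names, the bin edges, the toy values and the pseudocount are all hypothetical.

```python
import math
from bisect import bisect_right

def bin_index(value, edges):
    """Map a continuous value to a discrete bin, given sorted interior edges."""
    return bisect_right(edges, value)

def train_llrs(pos_values, neg_values, edges, pseudocount=1.0):
    """Return one log likelihood ratio per bin: log(P(bin | +) / P(bin | -)).

    The pseudocount is an assumption added here to avoid division by zero
    in empty bins; the slides do not say how empty bins are handled.
    """
    n_bins = len(edges) + 1
    pos_counts = [pseudocount] * n_bins
    neg_counts = [pseudocount] * n_bins
    for v in pos_values:
        pos_counts[bin_index(v, edges)] += 1
    for v in neg_values:
        neg_counts[bin_index(v, edges)] += 1
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    return [
        math.log((p / pos_total) / (n / neg_total))
        for p, n in zip(pos_counts, neg_counts)
    ]

# Toy correlation values for gene pairs in the two reference sets (made up).
positives = [0.9, 0.8, 0.7, 0.65, 0.2]
negatives = [-0.5, 0.0, 0.1, 0.3, 0.7]
edges = [0.0, 0.6]  # bins: value < 0.0, 0.0 <= value < 0.6, value >= 0.6
llrs = train_llrs(positives, negatives, edges)
```

With these toy numbers the highest bin gets a positive LLR (it is enriched in positive reference links) and the two lower bins get negative LLRs.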
[Figure: positive (+) vs negative (-) reference links per bin; axis ranges -1.0 to 1.0 and 0.6 to 1.0.]
FunCoup prediction of 1 link
• Raw data → Bayesian LLR score (once per evidence dataset)
• Sum of LLR scores
• Confidence value pfc
Naïve Bayesian training
• Training:
  – Learn log likelihood ratios (LLRs) for each individual evidence bin
  – When predicting, sum all the LLRs to a full Bayesian score (FBS).
$$\mathrm{FBS}(\varepsilon) = \sum_{i=1}^{|\varepsilon|} \log \frac{P(E_{ij} \mid FC)}{P(E_{ij})}$$

FC: functional coupling
ε: set of evidences
E_ij: evidence i, bin j
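Prediction for a single gene pair then amounts to a table lookup and a sum. The sketch below assumes a hypothetical trained model stored as bin edges plus one LLR per bin for each evidence dataset. The dataset names, edges and LLR values are invented for illustration, and treating missing evidence as contributing 0 is also an assumption.

```python
from bisect import bisect_right

# Hypothetical trained model: per evidence dataset, sorted bin edges and
# one LLR per bin. All numbers are illustrative, not real FunCoup values.
MODEL = {
    "human_MEX": {"edges": [0.0, 0.6], "llrs": [-0.6, 0.2, 1.2]},
    "human_PPI": {"edges": [0.1], "llrs": [0.0, 6.3]},
    "mouse_SCL": {"edges": [0.5], "llrs": [-0.4, 1.4]},
}

def full_bayesian_score(raw_evidence, model):
    """Sum the LLR of the bin that each raw evidence value falls into."""
    total = 0.0
    for name, value in raw_evidence.items():
        entry = model.get(name)
        if entry is None:
            continue  # no trained LLRs for this dataset: contributes 0
        j = bisect_right(entry["edges"], value)
        total += entry["llrs"][j]
    return total

# Raw metrics for one gene pair (toy values).
pair_evidence = {"human_MEX": 0.60, "human_PPI": 0.17, "mouse_SCL": 0.04}
fbs = full_bayesian_score(pair_evidence, MODEL)  # 1.2 + 6.3 - 0.4
```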
4 training datasets → 4 different types of functional coupling
• Metabolic pathway (KEGG)
• Signalling pathway (KEGG)
• Physical protein-protein interaction
• Complex member
FunCoup training
[Figure: overview of FunCoup training, in three panels.
INPUT DATA: species (Human, Mouse, Rat, Fly, Worm, Yeast, Plant) by evidence type (MEX, MIR, SCL, PPI, PEX, PHP, TFB, DOM); log-scale axis from 10^3 to 10^7.
TRAINING SETS: the same species by training class (FC-PI, FC-CM, FC-ML, FC-SL); axis from 5000 to 25000.
BAYESIAN FRAMEWORK: evidence features ƒx, ƒy, ƒz, … are scored by each model, and the LLRs are summed (Σ) per model, shown for the FC-SL and FC-ML models.]
Raw data metrics on CDC2 – KPNB1
• Fly MEX (Li and White, 2003) PLC=0.42
• Rat MEX (Di Giovanni et al., 2004) PLC=0.48
• Mouse SCL (UniProt, ESLDB) WMI=0.04
• Mouse MEX (Zapala et al., 2005) PLC=0.70
• Mouse MEX (Su et al., 2004) PLC=-0.01
• Mouse MEX (Siddiqui et al., 2005) PLC=0.56
• Mouse MEX (Hutton et al., 2004) PLC=0.61
• Human PPI (IntAct, HPRD, BIND) PPI score=0.17
• Human MEX (Su et al., 2004) PLC=0.60
• …
FC-PI model: FBS_PI = 0 + 0 - 0.6 + 1.2 - 0.4 + 0.2 + 1.2 + 6.3 + 1.4 … = 11.2
FC-CM, FC-SL and FC-ML models: each sums its own LLRs for the same raw data.
The FBS of each model is converted to a confidence value (pfc scores).
FBS score and pfc confidence
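$$pfc(\varepsilon) = \frac{P(FC)\prod_{i=1}^{|\varepsilon|} P(E_{ij} \mid FC)}{P(FC)\prod_{i=1}^{|\varepsilon|} P(E_{ij} \mid FC) + \prod_{i=1}^{|\varepsilon|} P(E_{ij})}$$

$$\mathrm{FBS}(\varepsilon) = \sum_{i=1}^{|\varepsilon|} \log \frac{P(E_{ij} \mid FC)}{P(E_{ij})}$$

FC: functional coupling
ε: set of evidences
E_ij: evidence i, bin j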
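If pfc is read as P(FC)·∏P(E_ij|FC) divided by P(FC)·∏P(E_ij|FC) + ∏P(E_ij), then dividing numerator and denominator by ∏P(E_ij) gives pfc = P(FC)·exp(FBS) / (P(FC)·exp(FBS) + 1), provided the log in the FBS is the natural log. The sketch below checks that identity numerically. The log base, the prior value and the function names are assumptions made for illustration; the slides do not state them.

```python
import math

def pfc_from_probabilities(prior_fc, p_e_given_fc, p_e):
    """pfc computed directly from the product form of the formula."""
    num = prior_fc * math.prod(p_e_given_fc)
    return num / (num + math.prod(p_e))

def pfc_from_fbs(fbs, prior_fc):
    """pfc computed from the summed score, assuming natural-log LLRs.

    Equivalent to prior * exp(FBS) / (prior * exp(FBS) + 1), written as a
    logistic function so that large scores do not overflow.
    """
    x = fbs + math.log(prior_fc)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Toy probabilities for two evidence bins and a hypothetical prior of 0.01.
p_e_given_fc = [0.3, 0.2]
p_e = [0.1, 0.05]
fbs = sum(math.log(a / b) for a, b in zip(p_e_given_fc, p_e))
direct = pfc_from_probabilities(0.01, p_e_given_fc, p_e)
via_fbs = pfc_from_fbs(fbs, 0.01)
```

Both routes give the same value, so under this reading a network can store only the FBS per link and derive pfc when it is needed.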
The total human FunCoup 2.0 network
[Figure: number of links in the human network (axis 0 to 5,000,000) at confidence cutoffs 0.1, 0.25 and 0.75.]
Nr of links at pfc cutoffs
[Figure: number of links (axis 0 to 10,000,000) against pfc cutoff (0 to 0.95) for H. sapiens, M. musculus, R. norvegicus, C. familiaris, D. rerio, C. intestinalis, D. melanogaster, C. elegans, G. gallus and A. thaliana.]
Comparison to STRING
• FunCoup on average 75% larger (based on all links)
[Figure: number of links per species (axis 0 to 5,000,000), FunCoup 2.0 vs STRING 9.0, for A. thaliana, C. elegans, C. familiaris, C. intestinalis, D. melanogaster, D. rerio, G. gallus, H. sapiens, M. musculus, R. norvegicus and S. cerevisiae.]
Support from species and evidence type
MEX: mRNA co-expression
PHP: phylogenetic profile similarity
PPI: protein–protein interaction
SCL: sub-cellular co-localization
MIR: co-miRNA regulation by shared miRNA targeting
DOM: domain interactions
PEX: protein co-expression
TFB: shared transcription factor binding
GIN: genetic interaction profile similarity
Validation: Recovering cancer pathways
• 36 signalling links in RTK/RAS/PI(3)K, p53, and RB signalling pathways (TCGARN, Science 2008).
• FunCoup predicted 29 of 36 links.
• 25 more links found.
Independent validation: Recovering tumour mutation sets
• Lists of genes co-mutated in glioblastoma tumours (The Cancer Genome Atlas).
• 6 of 9 lists (>= 10 genes) enriched (p < 10^-3) with internal FunCoup connections compared to random networks (preserving degree distribution).
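A test of this kind can be sketched as follows: count the links inside the gene list, then compare that count with the counts in many rewired networks in which every gene keeps its degree. This is a generic double-edge-swap randomization, not necessarily the procedure used for the results above. The function names, the number of swaps and the number of random networks are hypothetical choices.

```python
import random

def internal_links(edges, genes):
    """Count edges with both endpoints inside the gene set."""
    return sum(1 for a, b in edges if a in genes and b in genes)

def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by double edge swaps.

    Two edges (a, b) and (c, d) become (a, d) and (c, b). Swaps that would
    create a self-loop or a duplicate edge are skipped.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def enrichment_p_value(edges, genes, n_random=1000, swaps_per_edge=10, seed=0):
    """Empirical p-value for seeing at least the observed internal links."""
    rng = random.Random(seed)
    genes = set(genes)
    observed = internal_links(edges, genes)
    hits = 0
    for _ in range(n_random):
        randomized = rewire(edges, swaps_per_edge * len(edges), rng)
        if internal_links(randomized, genes) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)

# Toy network: a triangle a-b-c attached to a ring of other genes.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"),
             ("e", "f"), ("f", "g"), ("g", "h"), ("h", "a")]
observed, p_value = enrichment_p_value(toy_edges, {"a", "b", "c"}, n_random=200)
```

With n_random random networks the smallest reportable p-value is 1 / (n_random + 1), so resolving p < 10^-3 needs at least 1000 randomizations.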
FunCoup applications
• Cross-talk between groups
• Find novel interactions
• Find network modules
• Extend pathways
• Find novel disease genes
http://FunCoup.sbc.su.se
[Screenshot: FunCoup query for ASPM (Abnormal spindle-like microcephaly-associated protein), with the "Data details" view.]
Klammer M, Roopra S, Sonnhammer EL. "jSquid: a Java applet for graphical on-line network exploration". Bioinformatics 2008, 24:1467
Comparative interactomics
New in FunCoup 2.0 – ensures true conservation
Human presenilin in worm
RNA-polymerase II subunits: yeast-all
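One plausible reading of a conserved link is a coupled gene pair in one species whose two orthologs are also coupled in the other species. The sketch below finds such pairs from two link sets and an ortholog map. It illustrates the idea only and is not FunCoup's own algorithm; the data layout, the function name and the placeholder gene identifiers are assumptions.

```python
def conserved_links(links_a, links_b, orthologs):
    """Find links in species A whose ortholog pair is also linked in species B.

    links_a, links_b: sets of frozenset({gene1, gene2}).
    orthologs: dict mapping a species-A gene to a set of species-B genes.
    Returns a sorted list of ((a1, a2), (b1, b2)) tuples.
    """
    conserved = []
    for pair in links_a:
        g1, g2 = sorted(pair)
        for o1 in orthologs.get(g1, ()):
            for o2 in orthologs.get(g2, ()):
                if o1 != o2 and frozenset((o1, o2)) in links_b:
                    conserved.append(((g1, g2), (o1, o2)))
    return sorted(conserved)

# Placeholder identifiers: H* for one species, W* for the other.
links_h = {frozenset(("H1", "H2")), frozenset(("H2", "H3"))}
links_w = {frozenset(("W1", "W2"))}
orth = {"H1": {"W1"}, "H2": {"W2"}, "H3": set()}
result = conserved_links(links_h, links_w, orth)
```

Here only the H1–H2 link is reported, because H3 has no ortholog in the second species.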
Comparative interactomics: Applications
• Hypothesis testing
  – Is a given pathway/complex conserved in another species?
• New discoveries
  – Finding ortholog pairs with conserved functional coupling – very strong evidence for functional conservation
  – Can also find conservation that is not strictly 4-way: