Integrating protein-protein interaction data: navigating the maze
Shoshana J. Wodak
VIB Structural Biology Research Center, VUB, Brussels Belgium
Omics data integration, Gent Nov 19, 2018
Genome-scale protein interaction (PPI) networks: an embarrassment of riches
Hairy monster: Typical PPI network Yeast, Human, Fly..
Over 30 PPI networks derived from experiments (yeasts, human, E. coli, D. melanogaster, C. elegans, P. falciparum and more)
25 PPI networks (and counting) inferred by computational methods
In the last 15 years
Predict protein function
Model evolutionary processes
Predict disease associations
Interpret information on mutations
Interpret information on phenotype perturbations
Use as restraints in multiscale modeling
Build 3D models
No information on stoichiometry, limited or absent temporal spatial and functional information… MUST MAKE MEANINGFUL USE OF THE DATA
Interactions explain everything, or do they?
$$A + B \underset{k_d}{\overset{k_a}{\rightleftharpoons}} C$$
$$K_d = \frac{k_d}{k_a} = \frac{[A][B]}{[C]}$$
$K_d$: equilibrium dissociation constant (molar units)
$\Delta G_d = -RT \ln(K_d / c^\circ)$: Gibbs free energy of dissociation ($RT$: thermal energy; standard state $c^\circ = 1$ M)
$K_d$ and $\Delta G_d$ quantify the binding affinity. Their values determine whether the complex is formed given the component concentrations.
The dynamics and time scales are governed by the rate constants $k_a$ (bimolecular) and $k_d$ (monomolecular):
• it takes $\tau_a = 1/(k_a [A])$ to form a complex ([A] in excess)
• the complex has a life-time $\tau_d = 1/k_d$
Adapted from J. Janin, 2014
Binding affinities and rates
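To make these definitions concrete, here is a minimal Python sketch that derives Kd, ΔGd, τa and τd from a pair of rate constants. The function name, the temperature default and the numerical values are illustrative assumptions, not values from the talk.

```python
import math

R_GAS = 8.314462618  # gas constant in J/(mol*K)

def binding_summary(ka, kd, conc_a, temp_k=298.15):
    """Summarize affinity and time scales for A + B <-> C.

    ka: association rate constant in 1/(M*s)
    kd: dissociation rate constant in 1/s
    conc_a: concentration of A in M (A assumed to be in excess)
    """
    k_diss = kd / ka                                   # Kd in M
    standard_conc = 1.0                                # c0 = 1 M
    dg_diss = -R_GAS * temp_k * math.log(k_diss / standard_conc)  # J/mol
    tau_a = 1.0 / (ka * conc_a)                        # time to form the complex, s
    tau_d = 1.0 / kd                                   # life-time of the complex, s
    return {"Kd": k_diss, "dGd": dg_diss, "tau_a": tau_a, "tau_d": tau_d}

# Hypothetical rate constants chosen only for illustration.
summary = binding_summary(ka=1e6, kd=1e-3, conc_a=1e-6)
print(summary)
```

With these assumed inputs, Kd is about 1 nM and the complex lives for roughly 1000 s, which puts it in the stable range of the scale discussed on the next slide.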
Genome-wide studies give a YES / NO answer to the question: do proteins A and B form a complex?
Yet PPIs are dynamic and subject to the law of mass action!
Measurable range:
Kd: 1 M | 1 mM | 1 µM | 1 nM | 1 pM
τd: < microsecond | millisecond | second | hour | days
Life-time class: random | short-lived | transient | stable | permanent
Specificity: non-specific to specific
Type of assembly (placed along the scale in the figure): crystal packing, weak dimers, cell adhesion, redox complexes, signal transduction, antigen-antibody, enzyme-substrate, enzyme-inhibitor, oligomeric proteins
The functional role of a PPI depends on Kd and the life-time τd = 1/kd
PPI in the cell: wide range of binding affinities & life-times
Adapted from J. Janin, 2014
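The law of mass action can be illustrated with a short Python sketch that computes how much complex forms at equilibrium for given total concentrations and Kd. The function name and the concentrations below are assumptions chosen for illustration; the formula is the standard solution of the 1:1 binding quadratic.

```python
import math

def complex_concentration(total_a, total_b, k_diss):
    """Equilibrium [C] for A + B <-> C, all quantities in M.

    Solves Kd = (At - C)(Bt - C) / C for the physically meaningful root,
    written in a form that avoids subtracting two nearly equal numbers.
    """
    s = total_a + total_b + k_diss
    return 2.0 * total_a * total_b / (s + math.sqrt(s * s - 4.0 * total_a * total_b))

# Same (hypothetical) 1 µM of each partner, two very different affinities.
tight = complex_concentration(1e-6, 1e-6, 1e-9)  # Kd = 1 nM
weak = complex_concentration(1e-6, 1e-6, 1e-3)   # Kd = 1 mM
print(f"fraction bound at Kd = 1 nM: {tight / 1e-6:.3f}")
print(f"fraction bound at Kd = 1 mM: {weak / 1e-6:.4f}")
```

The same pair of proteins is almost fully complexed in one case and almost entirely free in the other, which is why a YES / NO answer hides most of the picture.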
Experimentally derived genome-scale PPI datasets: prominent examples
[Figure: experimental methods. Co-complex: AP/MS (LC/MS), co-fractionation/MS + massive data integration. Binary: Y2H, split ubiquitin (membrane Y2H), PCA. Panel labels: nucleus, cytosol, ≤80 Å.]
Rolland et al. | human | 2014 | Y2H | 4300 | 14000 | NA
Hein et al. | human | 2015 | AP-MS | 5400 | 28500 | 195
Yang et al. | human | 2016 | Y2H (248-if & 381) | 629 | 1043 | NA
[Figure: Venn diagrams of the overlap in interactions and in proteins between yeast networks from Y2H (union), PCA (2008), AP-MS (Babu et al., 2012) and BioGRID HC]
Limited overlap of interaction networks from different experimental methods (yeast)
Why is the overlap so limited ?
Different methods probe complementary subspaces of the interactome: does AP-MS probe mainly 'stable' interactions, and Y2H/PCA more transient ones? Are there biases for proteins in different cellular processes?
Network quality and coverage vary for different methods: does AP-MS have a higher rate of FP, and Y2H a higher rate of FN?
Co-complex associations (AP-MS) ≠ Binary interactions (Y2H/PCA)
Is there a sampling problem? If so, why?
Vlasblom et al., Curr. Opin. Struct. Biol. (2013); Pu et al., J. Proteomics (2015)
The challenge of deriving the network (AP-MS)
Raw co-complex data (~700,000 PPI)
Scoring methods: a plethora of methods (HGScore, SAINT, PE, ComPASS, HART, Dice, etc.)
High Confidence (HC) co-complex network (~13,000 PPI)
(Soluble PPI, yeast)
[Figure: AP/MS and LC/MS workflow]
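As a rough illustration of how raw co-purification data can be scored and filtered, the Python sketch below applies a plain Dice coefficient to protein pairs across purifications. It is a toy version under stated assumptions: the purification data, the 0.6 cutoff and the function name are hypothetical, and the published scoring methods named above are considerably more elaborate.

```python
from collections import Counter
from itertools import combinations

def dice_scores(purifications):
    """Score each protein pair by 2 * n_both / (n_first + n_second),
    where the counts are numbers of purifications containing the proteins."""
    seen_alone = Counter()
    seen_together = Counter()
    for detected in purifications:
        members = sorted(detected)  # sorted so pair keys are consistent
        seen_alone.update(members)
        seen_together.update(combinations(members, 2))
    return {
        pair: 2 * n_both / (seen_alone[pair[0]] + seen_alone[pair[1]])
        for pair, n_both in seen_together.items()
    }

# Toy data: each set lists the proteins detected in one purification.
toy_purifications = [
    {"A", "B", "C"},
    {"A", "B"},
    {"C", "D"},
    {"A", "B", "E"},
]
scores = dice_scores(toy_purifications)
CUTOFF = 0.6  # hypothetical threshold, for illustration only
high_confidence = {pair for pair, score in scores.items() if score >= CUTOFF}
print(sorted(high_confidence))
```

Pairs that merely co-occur once with a frequently seen protein score low, which is the basic idea behind reducing a large raw dataset to a much smaller high-confidence network.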
‘Quality’ assessment of a PPI network (yeast membrane, 2012)
Babu et al. Nature 2012
- Comparison to gold standard PPI (GO annotations); legend: TAP-MS, Y2H, Random
- Correlation of mRNA expression profiles
- Experimental verification by other methods
Yeast integrated PPI network, Babu et al. Nature 2012
[Figure: log10(relative annotation frequency) and density of log10(protein abundance) for the Y2H, PCA and AP-MS networks]
Biases of different methods
Biases towards different cellular processes, or in sampling co-complex associations, can be rationalized.
The bias towards high-abundance proteins (PCA & AP-MS) is expected in the raw data (long history of contaminants), but not in the HC networks!
It is by far the most consequential, since abundant proteins are more likely to form non-specific interactions.
Wodak et al., Curr. Opin. Struct. Biol. (2013)
Different PPI networks may yield different results.
PPIs of yeast soluble proteins
[Figure: networks compared: HC-Yeast BioGRID network; integrated HC-Yeast HTP network. Labels: Hub, End, ?]
Mauricio Macossay et al., submitted
Over 100 databases specialize in curating information on functional and physical interactions from publications describing small-scale and large-scale studies
- Contain unique as well as redundant information
- May focus on different areas of biology
- Different coverage of the literature
- Differences in cross-referencing genes & proteins
- Different conventions for representing interactions
How can one obtain a comprehensive view of currently known interactions?
Literature curated protein-protein interaction data
iRefWeb: consolidated PPI data
Source databases shown: OPHID, CORUM, InnateDB, MatrixDB, MPIDB
iRefIndex consolidation: Ian Donaldson, UK (VIB-Bioinfo core)
iRefWeb portal (URL: Wodaklab.org/irefweb)
iRefWeb (iRefIndex V13)
Interactions: total 509,876; human 222,465
Proteins: total 91,645; human 18,841
Total of 81,132 PubMed publications
- Tracks source DBs and PubMed IDs for each PPI
- Matches proteins on the basis of amino-acid sequence + taxon
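One way to picture sequence-plus-taxon matching is a key that ignores database-specific identifiers altogether. The Python sketch below is an assumption-laden illustration: the slide only states that matching uses amino-acid sequence and taxon, so the SHA-1 digest, the key layout and the function name are hypothetical choices, not the consolidation pipeline itself.

```python
import hashlib

def protein_key(sequence, taxon_id):
    """Build an identifier-independent key from sequence and taxon.

    Whitespace and letter case are normalized first so that the same
    sequence formatted differently by two databases yields the same key.
    """
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).hexdigest()
    return f"{digest}:{taxon_id}"

# Made-up peptide; 9606 and 10090 are the NCBI taxonomy IDs for human and mouse.
key_db1 = protein_key("MKTAYIAK", 9606)
key_db2 = protein_key("mktay iak\n", 9606)
key_other_taxon = protein_key("MKTAYIAK", 10090)
print(key_db1 == key_db2, key_db1 == key_other_taxon)
```

Two records then refer to the same protein exactly when their keys are equal, regardless of which accession each source database used.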
PSICQUIC: ‘real time’ database federator
iRefIndex V15
The importance of standards for data representation
PSI-MI TAB 2.5 format
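For readers unfamiliar with the format, a MITAB 2.5 record is one tab-separated line with 15 columns, where "-" marks an empty field and "|" separates multiple values. The Python sketch below parses one such line; the column names are descriptive labels chosen here, the example record uses placeholder identifiers, and the naive split on "|" is a simplification that would mis-handle a "|" inside quoted text.

```python
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
]

def parse_mitab25_line(line):
    """Turn one MITAB 2.5 line into a dict of lists, one list per column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, found {len(fields)}")
    return {
        name: [] if value == "-" else value.split("|")
        for name, value in zip(MITAB25_COLUMNS, fields)
    }

# Placeholder record for illustration; identifiers are not real entries.
example_line = "\t".join([
    "uniprotkb:EXAMPLE_A", "uniprotkb:EXAMPLE_B", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe J (2010)", "pubmed:00000000",
    "taxid:559292(yeast)", "taxid:559292(yeast)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0000"(exampledb)', "exampledb:1", "-",
])
record = parse_mitab25_line(example_line)
print(record["id_a"], record["id_b"], record["detection_methods"])
```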
How consistent is the information curated by different databases?
Turinsky A. et al. Donaldson I. and Wodak SJ. Database (Oxford) 2010
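Measuring consistency between pairwise co-citations
Sørensen-Dice similarity coefficient: size of overlap over average set size, $S = 2|X \cap Y| / (|X| + |Y|)$
Example: two databases (DB1, DB2) curate the same publication
Sets of annotated protein-protein interactions: one database records A-B and A-C, the other records A-B and D-C
PPI overlap: A-B is shared, out of A-B, A-C, D-C, so Sppi = 1/2
Protein overlap: A, B, C are shared, out of A, B, C, D, so Sprot = 6/7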
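The worked example above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, interactions are treated as unordered pairs, and exact fractions are used so the result matches the slide's values directly.

```python
from fractions import Fraction

def dice_similarity(x, y):
    """Sorensen-Dice coefficient: overlap size over average set size."""
    if not x and not y:
        return Fraction(1)
    return Fraction(2 * len(x & y), len(x) + len(y))

def proteins_in(interactions):
    """All proteins that appear in a set of pairwise interactions."""
    return set().union(*interactions)

# The two curated interaction sets from the slide's example.
first_db = {frozenset({"A", "B"}), frozenset({"A", "C"})}
second_db = {frozenset({"A", "B"}), frozenset({"D", "C"})}

s_ppi = dice_similarity(first_db, second_db)
s_prot = dice_similarity(proteins_in(first_db), proteins_in(second_db))
print(s_ppi, s_prot)
```

Note that the protein-level score is much higher than the interaction-level score for the same pair of records, a pattern that also shows up in the database-wide averages reported next.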
Analyzed 15,471 shared publications co-curated by two or more of 9 major public PPI DBs.
When curating the same publication, two databases fully agree, on average, on 42% of the interactions and 62% of the proteins.
Agreement levels vary widely across organism categories.
Agreement and overlap between databases
Turinsky et al., Nat. Biotech, 2011
[Figure panels: interactions curated from shared publications, with both proteins from the same organism vs. one protein from other organisms]
The Babel tower of organism assignment
Turinsky et al., Nat. Biotech, 2011
Inconsistencies in Recording PPIs From HTP Studies
Disagreement between databases: main factors
- Problems with mapping protein/gene IDs, and divergent assignments of splice isoforms: ~10% of the data
- Divergent assignments of organisms: ~21% of the data
- Different ways of representing protein complexes: ~12% of the data
- Inconsistent curation of HTP data: ~1-2% of the data
Most of these factors can be attributed to different curation policies by DBs
(Issues being addressed by PSI standards & IMEx consortium)
PPI data consumers, beware!
- Not all PPI data are created equal
- Different methods probe different types of interactions (e.g. binary/co-complex)
- Double-check data quality claims
- Literature-curated PPIs are a mixed bag: filtering needs to be applied, and there are no global reliability scores!
Acknowledgements
Andrei Turinsky (HSC, Toronto), Brian Turner (HSC, Toronto), Shuye Pu (HSC, Toronto), James Vlasblom (HSC, Toronto), Systems Support team (HSC, Toronto)
Andrew Emili (UoT), Jack Greenblatt (UoT), Edyta Marcon (UoT), Sadhna Phanse (UoT), Ruth Isserlin (UoT), Jonathan Olsen (UoT), Mohan Babu (UoT), Hyungwon Choi (NUS), Anne-Claude Gingras (SLRI, Toronto), Mathew E. Sowa (Harvard), Emmanuel Levy (Weizmann), Joel Janin (Orsay)
Funding Sources:
http://wodaklab.org