Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University
![Page 1: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/1.jpg)
Inferring strengths of protein-protein interactions from experimental data using linear programming
Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu
Bioinformatics Center, Kyoto University
![Page 2: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/2.jpg)
Overview
- Background
- Probabilistic model
- Related work
- Biological experimental data
- Proposed methods
  - For binary data
  - For numerical data
- Results of computational experiments
- Conclusion
![Page 3: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/3.jpg)
Background (1/3)
- Understanding protein-protein interactions is useful for understanding protein functions.
  - Transcription factors: proteins interact with a factor and regulate the gene.
  - Receptors, etc.
![Page 4: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/4.jpg)
Background (2/3)
- Various methods have been developed for inference of protein-protein interactions.
  - Gene fusion/Rosetta stone (Enright et al. and Marcotte et al., 1999)
    - The number of genes to which it can be applied is limited.
  - Molecular dynamics
    - Long CPU time
    - Difficult to predict precisely
![Page 5: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/5.jpg)
Background (3/3)
- A model based on domain-domain interactions has been proposed.
  - Use domains defined by databases like InterPro or Pfam.

[Figure labels: Domain, Domain]
![Page 7: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/7.jpg)
Probabilistic model of interaction (1/2)
- Model (Deng et al., 2002)
  - Two proteins interact if and only if at least one pair of their domains interacts.
  - Interactions between domains are independent events.

[Figure: proteins P1 and P2 with domains labeled D1, D2, D3, D2, D4]
![Page 8: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/8.jpg)
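Probabilistic model of interaction (2/2)

$$\Pr(P_{ij}=1) = 1 - \prod_{D_{mn} \in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr)$$

- Pij = 1: Proteins Pi and Pj interact
- Dmn = 1: Domains Dm and Dn interact
- Dmn ∈ Pij: Domain pair (Dm, Dn) is included in protein pair Pi × Pj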
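The model can be illustrated with a minimal sketch, assuming the product form that follows from the two stated assumptions (proteins interact when at least one domain pair interacts, and domain interactions are independent): Pr(Pij=1) is one minus the probability that no contained domain pair interacts. The function names, the dictionary layout, the assignment of domains to P1 and P2, and the example probabilities are hypothetical.

```python
from math import prod


def domain_pairs(domains_i, domains_j):
    """Unordered domain pairs (Dm, Dn) contained in the protein pair Pi x Pj."""
    return {tuple(sorted((dm, dn))) for dm in domains_i for dn in domains_j}


def protein_interaction_prob(domains_i, domains_j, domain_prob):
    """Pr(Pij=1) = 1 - product over domain pairs of (1 - Pr(Dmn=1)).

    domain_prob maps a sorted domain pair to Pr(Dmn=1); pairs missing from
    the mapping are treated as non-interacting (probability 0).
    """
    none_interact = prod(
        1.0 - domain_prob.get(pair, 0.0)
        for pair in domain_pairs(domains_i, domains_j)
    )
    return 1.0 - none_interact


# Hypothetical example: P1 has domains D1, D2, D3 and P2 has D2, D4.
domain_prob = {("D1", "D2"): 0.5, ("D3", "D4"): 0.2}
p12 = protein_interaction_prob({"D1", "D2", "D3"}, {"D2", "D4"}, domain_prob)
print(round(p12, 6))  # 1 - (1 - 0.5) * (1 - 0.2)
```

Domain pairs are stored as sorted tuples so that (Dm, Dn) and (Dn, Dm) count as the same pair.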
![Page 9: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/9.jpg)
Overview
- Background
- Probabilistic model
- Related work
  - Association method (Sprinzak et al., 2001)
  - EM method (Deng et al., 2002)
- Biological experimental data
- Proposed methods
- Results of computational experiments
- Conclusion
![Page 10: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/10.jpg)
Related work
- INPUT:
  - interacting protein pairs (positive examples)
  - non-interacting protein pairs (negative examples)
- OUTPUT: Pr(Dmn=1) for all domain pairs
![Page 11: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/11.jpg)
Association method (Sprinzak et al., 2001)
- Inference of probabilities of domain-domain interactions using ratios of frequencies

$$\Pr(D_{mn}=1) = \frac{I_{mn}}{N_{mn}}$$

- Imn: Number of interacting protein pairs that include (Dm, Dn)
- Nmn: Number of protein pairs that include (Dm, Dn)
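The frequency ratio can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it writes I_mn for the number of interacting protein pairs that include (Dm, Dn) and N_mn for the number of all protein pairs that include it, and estimates Pr(Dmn=1) as I_mn / N_mn. The function name, the data layout, and the toy proteins are hypothetical.

```python
from collections import Counter


def association_scores(protein_domains, pairs, interacting):
    """Estimate Pr(Dmn=1) as I_mn / N_mn for every observed domain pair.

    protein_domains: protein name -> set of domain names
    pairs:           list of (Pi, Pj) protein pairs in the data set
    interacting:     set of (Pi, Pj) pairs observed to interact
    """
    n_mn = Counter()  # protein pairs that include (Dm, Dn)
    i_mn = Counter()  # interacting protein pairs that include (Dm, Dn)
    for pi, pj in pairs:
        contained = {
            tuple(sorted((dm, dn)))
            for dm in protein_domains[pi]
            for dn in protein_domains[pj]
        }
        for dp in contained:
            n_mn[dp] += 1
            if (pi, pj) in interacting:
                i_mn[dp] += 1
    return {dp: i_mn[dp] / n_mn[dp] for dp in n_mn}


# Hypothetical toy data.
protein_domains = {"P1": {"D1"}, "P2": {"D2"}, "P3": {"D2", "D3"}}
pairs = [("P1", "P2"), ("P1", "P3")]
interacting = {("P1", "P2")}
scores = association_scores(protein_domains, pairs, interacting)
print(scores)
```

In this toy data (D1, D2) appears in two protein pairs, one of which interacts, while (D1, D3) appears only in a non-interacting pair.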
![Page 12: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/12.jpg)
EM method (Deng et al., 2002)
- Probability (likelihood L) that experimental data {Oij = {0,1}} are observed.
- Use the EM algorithm in order to (locally) maximize L.
- Estimate Pr(Dmn=1).
![Page 14: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/14.jpg)
Biological experimental data
- Related methods (Association and EM) use only binary data (interact or not).
- Experimental data using yeast two-hybrid
  - Ito et al. (2000, 2001)
  - Uetz et al. (2001)
- For many protein pairs, different results (Oij = {0,1}) were observed.
- We developed new methods using raw numerical data.
![Page 15: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/15.jpg)
Numerical data
- Ito et al. (2000, 2001)
  - For each protein pair, experiments were performed multiple times.
  - IST (Interaction Sequence Tag)
    - Number of observed interactions
  - By using a threshold, we obtain binary data.
![Page 17: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/17.jpg)
Proposed methods
- It seems difficult to modify the EM method for numerical data.
- Linear Programming
- For binary data
  - LPBN
  - Combined methods
    - LPEM
    - EMLP
  - SVM-based method
- For numerical data
  - ASNM
  - LPNM
![Page 19: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/19.jpg)
LPBN (LP-based method) (1/2)
- Transformation into linear inequalities
  - Pi and Pj interact
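One way such a transformation can work is sketched below. The inequalities from the original slide were not preserved in this transcript, so this is an illustration under assumptions: the threshold Θ (with Θ < 1) and the variables γ_mn are names introduced here, and Pi and Pj are assumed to be predicted as interacting when Pr(Pij=1) ≥ Θ. With the product form of the probabilistic model and Pr(Dmn=1) < 1, taking logarithms turns the condition into a linear inequality in γ_mn:

```latex
\begin{align*}
\Pr(P_{ij}=1) \ge \Theta
&\iff 1 - \prod_{D_{mn} \in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \ge \Theta \\
&\iff \prod_{D_{mn} \in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \le 1 - \Theta \\
&\iff \sum_{D_{mn} \in P_{ij}} \gamma_{mn} \le \ln(1 - \Theta),
\qquad \gamma_{mn} = \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le 0 .
\end{align*}
```

For a non-interacting pair the same steps give the reversed inequality, so each protein pair contributes one linear constraint over the γ_mn of the domain pairs it contains.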
![Page 20: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/20.jpg)
LPBN (LP-based method) (2/2)
- Linear programming for inference of protein-protein interactions
![Page 21: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/21.jpg)
Combination of EM and LPBN
- LPEM method
  - Use the results of LPBN as initial parameter values for EM.
- EMLP method
  - Add constraints to LPBN with the following inequalities so that LP solutions are close to EM solutions.
![Page 22: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/22.jpg)
Simple SVM-based method
- Feature vector
- Simple linear kernel with
  - Interacting pairs = Positive examples
  - Non-interacting pairs = Negative examples
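The feature vector definition from the slide was not preserved in this transcript. As a hedged sketch, assume one binary feature per domain pair, set to 1 when the protein pair contains that domain pair; the linear kernel is then the dot product of two such vectors. The function names and the toy domain-pair ordering are hypothetical.

```python
def pair_feature_vector(domains_i, domains_j, all_domain_pairs):
    """Binary vector: 1 if the protein pair contains the domain pair, else 0.

    all_domain_pairs fixes the feature order (an assumption of this sketch).
    """
    present = {tuple(sorted((dm, dn))) for dm in domains_i for dn in domains_j}
    return [1 if dp in present else 0 for dp in all_domain_pairs]


def linear_kernel(x, y):
    """Plain dot product, i.e. the number of shared domain pairs here."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical feature order and protein pairs.
all_domain_pairs = [("D1", "D2"), ("D1", "D3"), ("D2", "D3")]
x = pair_feature_vector({"D1"}, {"D2", "D3"}, all_domain_pairs)
y = pair_feature_vector({"D2"}, {"D3"}, all_domain_pairs)
print(x, y, linear_kernel(x, y))
```

Vectors built this way for interacting pairs would serve as positive examples and those for non-interacting pairs as negative examples.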
![Page 24: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/24.jpg)
Strength of protein-protein interaction
- For each protein pair, experiments were performed multiple times.
- The ratio Kij/Mij can be considered as the strength of the interaction.
  - Kij: Number of observed interactions for a protein pair (Pi, Pj)
  - Mij: Number of experiments for (Pi, Pj)
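A short sketch of how the numerical strength differs from the binary label obtained by thresholding. The counts below are made up for illustration, and the threshold value and the use of "greater than or equal" are assumptions; the transcript does not state the threshold that was used.

```python
def interaction_strength(k_ij, m_ij):
    """Strength of a protein pair: observed interactions over experiments."""
    if m_ij <= 0:
        raise ValueError("the number of experiments must be positive")
    return k_ij / m_ij


def binarize(k_ij, threshold):
    """Binary label from the observation count (threshold rule is assumed)."""
    return 1 if k_ij >= threshold else 0


# Hypothetical counts: protein pair -> (Kij, Mij).
counts = {("P1", "P2"): (3, 4), ("P1", "P3"): (1, 4), ("P2", "P3"): (0, 4)}
strengths = {pair: interaction_strength(k, m) for pair, (k, m) in counts.items()}
labels = {pair: binarize(k, threshold=3) for pair, (k, m) in counts.items()}
print(strengths)
print(labels)
```

The pair observed once in four experiments and the pair never observed both receive label 0 after thresholding, while their strengths remain distinct.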
![Page 25: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/25.jpg)
LPNM method (1/2)
- Minimize the gap between Pr(Pij=1) and the observed ratio Kij/Mij using LP.
![Page 26: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/26.jpg)
LPNM method (2/2)
- Linear programming for inference of strengths of protein-protein interactions
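One possible LP of this kind is sketched below. It is an assumption, not the formulation from the original slide, which was not preserved in this transcript. The symbols ρ_ij (the observed ratio Kij/Mij), γ_mn = ln(1 − Pr(Dmn=1)), and the slack variables α_ij and β_ij are introduced here for illustration, and the gap is measured in log space so that the constraints stay linear.

```latex
\begin{align*}
\text{minimize}\quad & \sum_{(i,j)} \bigl(\alpha_{ij} + \beta_{ij}\bigr) \\
\text{subject to}\quad
& \sum_{D_{mn} \in P_{ij}} \gamma_{mn} + \alpha_{ij} - \beta_{ij} = \ln\bigl(1 - \rho_{ij}\bigr)
  && \text{for every training pair } (P_i, P_j), \\
& \gamma_{mn} \le 0, \quad \alpha_{ij} \ge 0, \quad \beta_{ij} \ge 0 .
\end{align*}
```

At an optimum, α_ij + β_ij equals the absolute log-space gap for the pair, and Pr(Dmn=1) is recovered as 1 − exp(γ_mn). A ratio ρ_ij equal to 1 would need to be capped slightly below 1 so that the logarithm is defined.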
![Page 27: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/27.jpg)
ASNM
- Modified Association method for numerical data
- For binary data (Sprinzak et al., 2001)
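The ASNM formula itself was not preserved in this transcript. A plausible sketch, stated as an assumption, replaces the count of interacting protein pairs in the Association ratio with the sum of observed strengths Kij/Mij, so that each domain pair receives the average strength of the protein pairs that contain it. The function name, the data layout, and the counts are hypothetical.

```python
from collections import defaultdict


def asnm_scores(protein_domains, counts):
    """Average observed strength over the protein pairs containing (Dm, Dn).

    protein_domains: protein name -> set of domain names
    counts:          (Pi, Pj) -> (Kij, Mij), with Mij > 0
    """
    strength_sum = defaultdict(float)
    n_mn = defaultdict(int)
    for (pi, pj), (k, m) in counts.items():
        rho = k / m  # observed strength of the protein pair
        contained = {
            tuple(sorted((dm, dn)))
            for dm in protein_domains[pi]
            for dn in protein_domains[pj]
        }
        for dp in contained:
            strength_sum[dp] += rho
            n_mn[dp] += 1
    return {dp: strength_sum[dp] / n_mn[dp] for dp in n_mn}


# Hypothetical toy data.
protein_domains = {"P1": {"D1"}, "P2": {"D2"}, "P3": {"D2", "D3"}}
counts = {("P1", "P2"): (3, 4), ("P1", "P3"): (1, 4)}
asnm = asnm_scores(protein_domains, counts)
print(asnm)
```

With binary strengths (0 or 1) this reduces to the frequency ratio of the Association method.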
![Page 29: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/29.jpg)
Computational experiments for binary data
- DIP database (Xenarios et al., 2002)
  - 1767 protein pairs as positive examples
  - 2/3 of the pairs for training, 1/3 for test
- Computational environment
  - Xeon processor 2.8 GHz
  - LP solver: loqo
![Page 30: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/30.jpg)
Results on training data (binary data)

[Figure legend: SVM, EM, LPBN, Association]
![Page 31: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/31.jpg)
Results on test data (binary data)

[Figure legend: SVM, EM, EMLP, Association, LPEM]
![Page 32: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/32.jpg)
Computational experiments for numerical data
- YIP database (Ito et al., 2001, 2002)
  - IST (Interaction Sequence Tag)
  - 1586 protein pairs
  - 4/5 for training, 1/5 for test
- Computational environment
  - Xeon processor 2.8 GHz
  - LP solver: lp_solve
![Page 33: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/33.jpg)
Results on test data (numerical data)

[Figure legend: ASNM, EM, LPNM, Association]
![Page 34: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/34.jpg)
Results on test data (numerical data)
- LPNM is the best.
- EM and Association methods classify Pr(Pij=1) into either 0 or 1.

| | LPNM | ASNM | EM | ASSOC |
|---|---|---|---|---|
| Ave. Error | 0.0308 | 0.0405 | 0.295 | 0.277 |
| CPU (sec.) | 1.20 | 0.0077 | 1.62 | 0.0088 |
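The transcript does not define "Ave. Error". A natural reading, assumed here, is the mean absolute difference between the predicted Pr(Pij=1) and the observed ratio Kij/Mij over the test pairs. The function name and the numbers below are hypothetical and unrelated to the reported results.

```python
def average_error(predicted, observed):
    """Mean absolute difference between predicted and observed strengths.

    Both arguments map a protein pair to a value in [0, 1]; the error is
    averaged over the pairs listed in `observed`.
    """
    if not observed:
        raise ValueError("no test pairs given")
    total = sum(abs(predicted[pair] - observed[pair]) for pair in observed)
    return total / len(observed)


# Hypothetical predictions and observed ratios for two test pairs.
predicted = {("P1", "P2"): 0.70, ("P1", "P3"): 0.10}
observed = {("P1", "P2"): 0.75, ("P1", "P3"): 0.00}
err = average_error(predicted, observed)
print(round(err, 4))
```

Under this reading, a method that outputs only values near 0 or 1 is penalized on pairs with intermediate observed strengths.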
![Page 35: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/35.jpg)
Conclusion
- We have defined a new problem: inferring the strengths of protein-protein interactions.
- We have proposed LP-based methods.
  - For binary data
    - LPBN, LPEM, EMLP
    - SVM-based method
  - For numerical data
    - ASNM
    - LPNM
- LPNM outperformed the other methods.
![Page 36: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University](https://reader036.fdocuments.in/reader036/viewer/2022062521/56814c47550346895db94c88/html5/thumbnails/36.jpg)
Future work
- Improve the methods to avoid overfitting.
- Improve the probabilistic model to understand protein-protein interactions more accurately.