TORQUE: TOPOLOGY-FREE QUERYING OF PROTEIN INTERACTION NETWORKS
Sharon Bruckner1, Falk Hüffner1, Richard M. Karp2, Ron Shamir1, and Roded Sharan1
1 School of Computer Science, Tel Aviv University; 2 International Computer Science Institute, Berkeley, CA
OUR GOAL: NETWORK QUERYING
Start with a protein-protein interaction network of some species A. We seek subnetworks that match complexes or pathways.
Network Querying: Given a protein complex from another species B, identify the subnetwork of A that is most similar to it.
Why network querying? A match hints at an evolutionarily conserved region, so we can infer the function of the matched region.
Previous Methods
Assume knowledge of the interactions within the query complex (the topology), and look for a match in the network with the same topology. Examples: QNet (Dost et al., 2008), GraphFind (Ferro et al., 2008).
NO NEED FOR TOPOLOGY!
Interaction information is noisy and incomplete, and for some species it is not available at all.
THE PROBLEM
Input: a graph G = (V, E) with |V| = n and |E| = m; a color set {1, 2, ..., k}; and a coloring of the network vertices.
We seek: Is there a connected subgraph of G that has exactly one vertex of each color?
Call such a subgraph "colorful".
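To make the definition concrete, here is a minimal Python sketch of a checker for a candidate vertex set. It assumes (our choice, not the poster's) that the graph is an adjacency dict mapping each vertex to its neighbors and that colors are numbered 0 to k-1 rather than 1 to k; the function name is hypothetical.

```python
from collections import deque

def is_colorful_connected(nodes, adj, color, k):
    """True if `nodes` induces a connected subgraph of `adj` that has
    exactly one vertex of each color 0..k-1 (hypothetical helper)."""
    nodes = set(nodes)
    if not nodes:
        return False
    # Colorful: k vertices whose colors are exactly {0, ..., k-1},
    # so no color repeats and none is missing.
    if len(nodes) != k or {color[v] for v in nodes} != set(range(k)):
        return False
    # Connected: a BFS that only steps onto vertices of `nodes`
    # must reach all of them.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == nodes

# Example: a path 1-2-3-4 whose vertices are colored 0, 1, 0, 2.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
color = {1: 0, 2: 1, 3: 0, 4: 2}
print(is_colorful_connected({2, 3, 4}, adj, color, 3))  # True
print(is_colorful_connected({1, 2, 4}, adj, color, 3))  # False: not connected
```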
ABOUT THE PROBLEM
NP-complete, and hard even when the graph is a tree with maximum degree 3 (via a reduction from 3SAT; Fellows et al., 2007).
Our Contributions:
A fixed-parameter dynamic programming algorithm.
An integer linear program (ILP).
Fast heuristics.
An implementation using a combination of the above.
DEFINING THE BASIC DP ALGORITHM
Input: A graph where each vertex is colored by one of k colors. Output: A colorful tree.
Every connected subgraph has a spanning tree, and that tree covers the same vertices, so every colorful connected subgraph has a colorful spanning tree.
Instead of looking for a colorful subgraph, look for a colorful tree.
Input: A graph where each vertex is colored by one of k colors. Output: The highest-scoring colorful tree.
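The spanning-tree argument is easy to check in code. The sketch below (same assumed adjacency-dict representation; `bfs_spanning_tree` is a hypothetical name) builds a BFS spanning tree of a connected vertex set. Because the tree keeps every vertex of the set, it carries the same colors, so a colorful connected subgraph always yields a colorful tree.

```python
from collections import deque

def bfs_spanning_tree(nodes, adj):
    """Edges of a BFS spanning tree of the connected subgraph induced by
    `nodes`. The tree spans every vertex of `nodes`, so it carries exactly
    the same colors as the subgraph."""
    nodes = set(nodes)
    root = min(nodes)  # any root works; min() just makes the result deterministic
    seen, tree = {root}, []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u in sorted(adj[v]):
            if u in nodes and u not in seen:
                seen.add(u)
                tree.append((v, u))
                queue.append(u)
    return tree

# A 4-cycle 1-2-3-4 with the chord 1-3: five edges, but a tree needs only three.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(bfs_spanning_tree({1, 2, 3, 4}, adj))  # [(1, 2), (1, 3), (1, 4)]
```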
DYNAMIC PROGRAMMING ALGORITHM (FELLOWS ET AL, 2008)
Row for each vertex, column for each subset of colors, in increasing size.
Vertex  S1    S2    S3    S4
v1      0     0     None  3.4
v2      0     None  2.3   2
v3      None  0     3.15  None
v4      None  None  13.5  7.42
v5      0     0     6.4   8.1
(None: no such tree.) Cell (v3, S3): the score of the best tree rooted in v3 that is colored exactly by S3.
IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets.
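The columns of the table can be indexed by bitmasks over the colors. The sketch below (the bitmask encoding, 0-based colors, skipping the empty set, and the function names are our assumptions) lists the 2^k - 1 non-empty color subsets in increasing size, the order in which the table is filled, since each entry depends only on entries for strictly smaller subsets.

```python
def color_subsets_by_size(k):
    """All non-empty subsets of the colors {0, ..., k-1}, encoded as
    bitmasks (bit c set <=> color c in the subset), ordered by size."""
    return sorted(range(1, 1 << k), key=lambda s: (bin(s).count("1"), s))

def members(mask, k):
    """Decode a bitmask back into the set of colors it contains."""
    return {c for c in range(k) if mask >> c & 1}

cols = color_subsets_by_size(3)
print(cols)                                # [1, 2, 4, 3, 5, 6, 7]
print([members(s, 3) for s in cols][:4])   # [{0}, {1}, {2}, {0, 1}]
```

With k = 10 query proteins this is only 1023 columns, regardless of how many vertices the network has.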
DYNAMIC PROGRAMMING ALGORITHM
The last column contains, for every vertex v, the highest-scoring tree rooted in v that is colored by all the colors of the query!
Running time: O(3^k |E|).
$$T(v, S) = \max_{u \in N(v),\; S_1 \uplus S_2 = S} \big\{ T(v, S_1) + T(u, S_2) + w(u, v) \big\}$$
(S_1 and S_2 are disjoint and together make up S.)
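A compact Python sketch of this recurrence follows. Assumptions beyond the slide: the graph is a dict mapping each vertex to {neighbor: w(u, v)}, colors are 0 to k-1, subsets are bitmasks, and T(v, {c(v)}) = 0 is the base case (matching the 0 entries in the table above). Enumerating the submasks of every subset touches 3^k (S1, S2) splits in total, and each split scans the edges at v, which is where the O(3^k |E|) bound comes from.

```python
NEG = float("-inf")

def best_colorful_tree(adj, color, k):
    """Return (score, root) of a maximum-weight colorful tree using all k
    colors, or None. T[v][S] is the best score of a tree rooted at v that is
    colored exactly by the bitmask S (NEG if there is none)."""
    full = (1 << k) - 1
    T = {v: [NEG] * (full + 1) for v in adj}
    for v in adj:
        T[v][1 << color[v]] = 0.0  # the single-vertex tree {v}
    # Increasing subset size: T(v, S1) and T(u, S2) are final before T(v, S).
    for S in sorted(range(1, full + 1), key=lambda s: bin(s).count("1")):
        for v in adj:
            cv = 1 << color[v]
            if not S & cv or S == cv:
                continue  # v's color must be in S; singletons are base cases
            S1 = (S - 1) & S  # proper non-empty submasks of S, largest first
            while S1:
                if S1 & cv and T[v][S1] > NEG:
                    for u, w in adj[v].items():
                        # S2 = S ^ S1 is disjoint from S1, so the two trees
                        # share no color and hence no vertex.
                        T[v][S] = max(T[v][S], T[v][S1] + T[u][S ^ S1] + w)
                S1 = (S1 - 1) & S
    score, root = max(((T[v][full], v) for v in adj), default=(NEG, None))
    return (score, root) if score > NEG else None

# b (color 1) is adjacent to a and d (both color 0) and to c (color 2).
adj = {"a": {"b": 2.0}, "b": {"a": 2.0, "c": 3.0, "d": 5.0},
       "c": {"b": 3.0}, "d": {"b": 5.0}}
color = {"a": 0, "b": 1, "c": 2, "d": 0}
print(best_colorful_tree(adj, color, 3))  # (8.0, 'd'): the tree d-b-c
```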
EXAMPLE
[Figure: computing T(v, S) on a small example graph with vertices v, u, and w.]
EXTENSION 1: ALLOWING DELETIONS – MATCHING WITH FEWER COLORS
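Because the table already holds, for every color subset S, the best tree colored exactly by S, deletions can be read off the columns for smaller subsets. The sketch below assumes, since the slide does not say, that up to d query colors may go unmatched with no extra penalty; the table is rebuilt with the same recurrence so the block runs on its own, and all names are hypothetical.

```python
NEG = float("-inf")

def colorful_tree_table(adj, color, k):
    """T[v][S]: best score of a tree rooted at v colored exactly by bitmask S."""
    T = {v: [NEG] * (1 << k) for v in adj}
    for v in adj:
        T[v][1 << color[v]] = 0.0
    for S in sorted(range(1, 1 << k), key=lambda s: bin(s).count("1")):
        for v in adj:
            cv = 1 << color[v]
            if not S & cv or S == cv:
                continue
            S1 = (S - 1) & S
            while S1:
                if S1 & cv and T[v][S1] > NEG:
                    for u, w in adj[v].items():
                        T[v][S] = max(T[v][S], T[v][S1] + T[u][S ^ S1] + w)
                S1 = (S1 - 1) & S
    return T

def best_with_deletions(T, k, d):
    """Best (score, colors, root) over all columns that use at least k - d of
    the k colors, i.e. at most d deleted colors; None if no column qualifies."""
    candidates = [(T[v][S], S, v) for v in T for S in range(1, 1 << k)
                  if bin(S).count("1") >= k - d and T[v][S] > NEG]
    return max(candidates, default=None)

# No vertex carries color 2, so an exact match is impossible.
adj = {"a": {"b": 2.0}, "b": {"a": 2.0, "d": 5.0}, "d": {"b": 5.0}}
color = {"a": 0, "b": 1, "d": 0}
T = colorful_tree_table(adj, color, 3)
print(best_with_deletions(T, 3, 0))  # None
print(best_with_deletions(T, 3, 1))  # (5.0, 3, 'd'): tree d-b, color 2 deleted
```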
EXTENSION 2: ALLOWING INSERTIONS: SPECIAL NON-COLORED VERTICES, ARBITRARY VERTICES
ALLOWING NON-COLORED INSERTIONS
For j insertions, we would expect a running time of O(3^{k+j} m).
Can show: O(3^k m j). Make j copies of each column, and recursively solve:
B(v, S, j') = the highest score of a tree rooted in v, colored by S, using exactly j' insertions.
FORMULA & EXAMPLE
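$$B(v, S, j') = \begin{cases} 0 & \text{if the single-vertex tree } \{v\} \text{ is colored by } S \text{ and uses exactly } j' \text{ insertions} \\ \max\limits_{u \in N(v),\; S_1 \uplus S_2 = S,\; j_1 + j_2 = j'} \big\{ B(v, S_1, j_1) + B(u, S_2, j_2) + w(u, v) \big\} & \text{otherwise} \end{cases}$$
[Figure: worked example on a small graph with vertices a, b, c, d, e, f, and g.]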
Running time: O(3^k m j)
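The recurrence for B can be sketched the same way. Assumptions (ours): non-colored vertices have color None, the table is indexed as B[v][j'][S], and states are processed in order of |S| + j', the number of vertices in the tree, so both halves of every split are already computed. This naive version enumerates every (j1, j2) pair and is not tuned to the running time on the slide. It also does not check that the two merged subtrees use different non-colored vertices, so for j' >= 2 a non-colored vertex could be counted twice; this sketch leaves that case unhandled.

```python
NEG = float("-inf")

def submasks(S):
    """Yield every submask of bitmask S, from S itself down to 0."""
    S1 = S
    while True:
        yield S1
        if S1 == 0:
            return
        S1 = (S1 - 1) & S

def best_with_insertions(adj, color, k, J):
    """Best score of a tree using all k colors and at most J non-colored
    vertices (color None), or None. B[v][j][S] is the best tree rooted at v,
    colored exactly by bitmask S, with exactly j insertions."""
    full = (1 << k) - 1
    B = {v: [[NEG] * (1 << k) for _ in range(J + 1)] for v in adj}
    for v in adj:
        if color[v] is None:
            if J >= 1:
                B[v][1][0] = 0.0          # v alone: no colors, one insertion
        else:
            B[v][0][1 << color[v]] = 0.0  # v alone: its color, no insertions
    # |S| + j is the tree's vertex count; both halves of a split are smaller.
    states = sorted(((S, j) for S in range(1 << k) for j in range(J + 1)),
                    key=lambda t: bin(t[0]).count("1") + t[1])
    for S, j in states:
        for v in adj:
            for S1 in submasks(S):
                for j1 in range(j + 1):
                    S2, j2 = S ^ S1, j - j1
                    if (S1, j1) == (S, j) or (S2, j2) == (0, 0):
                        continue  # one side of the split would be empty
                    left = B[v][j1][S1]
                    if left == NEG:
                        continue
                    for u, w in adj[v].items():
                        B[v][j][S] = max(B[v][j][S], left + B[u][j2][S2] + w)
    best = max((B[v][j][full] for v in adj for j in range(J + 1)), default=NEG)
    return best if best > NEG else None

# Path a - x - b: a and b are colored, x is not.
adj = {"a": {"x": 1.0}, "x": {"a": 1.0, "b": 2.0}, "b": {"x": 2.0}}
color = {"a": 0, "x": None, "b": 1}
print(best_with_insertions(adj, color, 2, 0))  # None: a and b are not adjacent
print(best_with_insertions(adj, color, 2, 1))  # 3.0: the tree a-x-b
```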
EXTENSION 3: ALLOWING MULTIPLE COLORS PER VERTEX
PUTTING IT TOGETHER…
A SECOND APPROACH
Formulate the problem as an integer linear program (ILP).
Use efficient ILP solvers.
ILP at a glance
Want: a subset T of the vertices.
Formulate colorfulness: Only vertices in T are colored. Every vertex gets at most one color. Every color is given to at most one vertex.
Formulate connectivity: Find a flow such that only vertices in T are involved in the flow; the flow has value k-1, with a single sink and k-1 sources; and every source is connected to the sink via flow edges.
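One way to turn these bullet points into linear constraints, for the exact case |T| = k, is sketched below. The variable names are ours and the objective is omitted; the poster's actual program was shown as an image and may differ. Here x_v = 1 iff v is in T, y_{v,c} = 1 iff v takes color c (C(v) is the set of query colors v may take), s_v = 1 iff v is the sink, and f_{uv} is the flow on edge {u, v} in the direction u to v. Every vertex of T other than the sink emits one unit, the sink absorbs k - 1 units, and flow may only pass through vertices of T, so a feasible flow exists exactly when T is connected.

$$
\begin{aligned}
& \sum_{c \in C(v)} y_{v,c} \le x_v && \forall v \in V && \text{(only vertices in } T \text{ are colored, at most one color each)} \\
& \sum_{v \,:\, c \in C(v)} y_{v,c} \le 1 && \forall c && \text{(each color used at most once)} \\
& \sum_{v, c} y_{v,c} = k, \qquad \sum_{v \in V} x_v = k && && \text{(exact case: all } k \text{ colors, } |T| = k) \\
& \sum_{v \in V} s_v = 1, \qquad s_v \le x_v && \forall v \in V && \text{(a single sink, inside } T) \\
& 0 \le f_{uv} \le (k-1)\,x_u, \quad f_{uv} \le (k-1)\,x_v && \forall (u,v) \text{ with } \{u,v\} \in E && \text{(flow only through } T) \\
& \sum_{u \in N(v)} f_{vu} - \sum_{u \in N(v)} f_{uv} = x_v - k\,s_v && \forall v \in V && \text{(sources emit 1, the sink absorbs } k-1) \\
& x_v,\ s_v,\ y_{v,c} \in \{0,1\}
\end{aligned}
$$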
The Integer Linear Program
Heuristic Speedups
First, do data reduction: only 5% of the vertices are associated with one or more query colors, and many non-colored vertices are too far from any colored vertex to be useful.
Then, for each remaining connected component:
Try a shortest-paths-based heuristic that does not allow mismatches.
If this fails: if there are few colors but the instance is large, use dynamic programming; otherwise, use the ILP.
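The data-reduction step can be illustrated under one concrete, hypothetical rule. If a solution contains at least one colored vertex and at most j non-colored vertices, then each non-colored vertex in it lies within j hops of a colored vertex (the tree path to the nearest colored vertex passes only through non-colored vertices), so vertices farther away can be dropped before splitting the rest into connected components. The poster does not state its exact distance rule; `reduce_and_split` and the adjacency-dict input are assumptions.

```python
from collections import deque

def reduce_and_split(adj, color, j):
    """Drop non-colored vertices (color None) more than j hops from every
    colored vertex, then return the connected components of what remains."""
    # Multi-source BFS from all colored vertices: distance to nearest one.
    dist = {v: 0 for v in adj if color[v] is not None}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    keep = {v for v in adj if dist.get(v, j + 1) <= j}
    # Split the kept vertices into connected components (iterative DFS).
    components, seen = [], set()
    for s in keep:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u in keep and u not in seen:
                    seen.add(u)
                    comp.add(u)
                    stack.append(u)
        components.append(comp)
    return components

# a and b are colored; x, y, z form a non-colored tail; c is an isolated colored vertex.
adj = {"a": {"b", "x"}, "b": {"a"}, "x": {"a", "y"}, "y": {"x", "z"},
       "z": {"y"}, "c": set()}
color = {"a": 0, "b": 1, "c": 2, "x": None, "y": None, "z": None}
print(sorted(sorted(c) for c in reduce_and_split(adj, color, 1)))
# [['a', 'b', 'x'], ['c']]
```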
IMPLEMENTATION, EXPERIMENTS & RESULTS
Experiments
We applied our method to query complexes within yeast (5430 proteins, 39936 interactions), fly (6650 proteins, 21275 interactions), and human (7915 proteins, 28972 interactions).
Queries: complexes from yeast, fly, human, bovine, mouse, and rat.
COMPARISON WITH OTHER METHODS
Most previous work tested queries with a known topology.
We compare our results with those of QNet (Dost et al., 2008), which was designed to tackle topology-based queries.
QNet uses color coding to tackle the subgraph homeomorphism problem, allowing insertions and deletions.
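Color coding [AYZ95] colors the network at random with k colors and then searches only for colorful solutions. The arithmetic behind it: a fixed set of k vertices receives k distinct colors with probability k!/k^k, which is at least e^{-k}, so the number of random colorings needed to catch a given match grows like e^k. The sketch below only evaluates these quantities; it says nothing about QNet's actual parameters, and the function names are hypothetical.

```python
import math

def colorful_probability(k):
    """Probability that k fixed vertices get k distinct colors when each is
    colored independently and uniformly from k colors: k!/k^k."""
    return math.factorial(k) / k ** k

def trials_for_confidence(k, eps):
    """Smallest number of independent colorings after which a fixed k-set has
    been colorful at least once with probability >= 1 - eps."""
    p = colorful_probability(k)
    if p == 1.0:
        return 1  # k = 1: every coloring is colorful
    # Need (1 - p)^t <= eps, i.e. t >= log(eps) / log(1 - p).
    return math.ceil(math.log(eps) / math.log(1 - p))

print(colorful_probability(3))          # 6/27, about 0.222
print(trials_for_confidence(3, 0.01))   # 19
```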
Comparison with QNet
Results Evaluation
Functional coherence: used GO TermFinder to test T for functional enrichment.
Specificity: looked at the overlap between T and known complexes in the target species, and compared it to the overlap between random subgraphs and the known complexes.
Corrected for multiple testing using FDR (q < 0.05).
Quality match: functionally coherent and specific.
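The specificity test can be sketched as an empirical p-value against random subgraphs followed by FDR control. The slide does not name the procedure; the sketch assumes the standard Benjamini-Hochberg step-up procedure at q = 0.05 and a simple +1-corrected empirical p-value, and the function names are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of random-subgraph overlaps at least as large as the observed
    overlap, with a +1 correction so the p-value is never zero."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: indices of the hypotheses
    rejected at false discovery rate q, in increasing order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank r (1-based) with p_(r) <= r * q / m; reject ranks 1..r.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])

print(empirical_p_value(5, [1, 2, 5, 7]))            # 0.6
print(benjamini_hochberg([0.02, 0.024, 0.9, 0.03]))  # [0, 1, 3]
```

Note the step-up behavior: 0.02 on its own is above its rank-1 threshold 0.05/4 = 0.0125, but it is still rejected because 0.03 (rank 3) passes its threshold 3 × 0.05/4 = 0.0375.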
SELECTED RESULTS
Thanks: Nir Yosef, the TAU Computational Genomics group, and the Computational System Biology group.
Israel Science Foundation; Edmond J. Safra Bioinformatics Program, Tel Aviv University.
SUMMARY
The PPI network querying problem motivates the colorful connected subgraph problem. A fixed-parameter dynamic programming algorithm that allows insertions, deletions, and multiple colors per vertex, together with an ILP formulation and heuristics, obtains good results.
REFERENCES
[FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP'07, volume 4596 of LNCS, pages 340-351. Springer-Verlag, 2007.
[N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006.
[BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31-43. Springer, 2008.
[AYZ95] N. Alon, R. Yuster, and U. Zwick. Color-coding. Journal of the ACM, 42:844-856, 1995.
[DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. QNet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7):913-925, 2008.