Fast Direction-Aware Proximity for Graph Mining
![Page 1: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/1.jpg)
Fast Direction-Aware Proximity for Graph Mining
KDD 2007, San Jose
Hanghang Tong, Yehuda Koren, Christos Faloutsos
![Page 2: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/2.jpg)
Defining Direction-Aware Proximity (DAP): escape probability
• Define a random walk (RW) on the graph
• Esc_Prob(A->B)
  – Prob(starting at A, the walk reaches B before returning to A)
Esc_Prob = Pr(smile before cry) (smile: reaching B; cry: returning to A)
[Figure: nodes A and B connected through the remaining graph]
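The definition can be checked empirically by simulating walks and counting how often B is reached before the walk comes back to A. The sketch below is only an illustration: the function name `simulate_escape_probability`, the adjacency format (node to list of `(neighbor, probability)` pairs) and the toy graph are assumptions, not material from the slides.

```python
import random

def simulate_escape_probability(graph, start, target, trials=20000,
                                max_steps=10000, seed=0):
    """Estimate Pr(walk from `start` reaches `target` before returning to `start`)."""
    rng = random.Random(seed)
    escapes = 0
    for _ in range(trials):
        node = start
        for _ in range(max_steps):
            neighbors = graph.get(node, [])
            if not neighbors:
                break  # dead end: the walk can neither escape nor return
            choices, weights = zip(*neighbors)
            node = rng.choices(choices, weights=weights)[0]
            if node == target:
                escapes += 1  # reached the target first
                break
            if node == start:
                break  # returned to the start first: not an escape
    return escapes / trials

# Hypothetical toy graph: A steps to B or C with equal probability,
# and both B and C lead straight back to A.
toy_graph = {
    "A": [("B", 0.5), ("C", 0.5)],
    "B": [("A", 1.0)],
    "C": [("A", 1.0)],
}
estimate = simulate_escape_probability(toy_graph, "A", "B")
```

On this toy graph the exact value is 0.5, because the first step decides the outcome, so the estimate should land close to that.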
![Page 3: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/3.jpg)
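Esc_Prob(1->5) = $P(1,\mathcal{I})\,\bigl(I - P(\mathcal{I},\mathcal{I})\bigr)^{-1}\,P(\mathcal{I},5) + p_{1,5}$, where $\mathcal{I} = \{2, 3, 4, 6\}$ indexes the remaining nodes.

$$
P = \begin{bmatrix}
p_{1,1} & p_{1,2} & p_{1,3} & p_{1,4} & p_{1,5} & p_{1,6} \\
p_{2,1} & p_{2,2} & p_{2,3} & p_{2,4} & p_{2,5} & p_{2,6} \\
p_{3,1} & p_{3,2} & p_{3,3} & p_{3,4} & p_{3,5} & p_{3,6} \\
p_{4,1} & p_{4,2} & p_{4,3} & p_{4,4} & p_{4,5} & p_{4,6} \\
p_{5,1} & p_{5,2} & p_{5,3} & p_{5,4} & p_{5,5} & p_{5,6} \\
p_{6,1} & p_{6,2} & p_{6,3} & p_{6,4} & p_{6,5} & p_{6,6}
\end{bmatrix}
$$

[Figure: example graph on nodes 1-6 with edge weights 0.5 and 1]
P: transition matrix (row-normalized)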
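For a small graph, the escape probability can be evaluated directly from the transition matrix by solving one linear system over the remaining nodes. The following is a minimal sketch, assuming 0-based node indices and assuming that from every remaining node the walk can eventually reach i or j (otherwise the system is singular). The name `escape_probability` and the matrix `P_toy` are illustrative only.

```python
import numpy as np

def escape_probability(P, i, j):
    """Esc_Prob(i->j) from a row-normalized transition matrix P (0-based indices)."""
    n = P.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]  # the remaining graph
    if not rest:
        return float(P[i, j])
    P_rest = P[np.ix_(rest, rest)]
    # hit_j[k]: probability of reaching j before i when starting at remaining node k
    hit_j = np.linalg.solve(np.eye(len(rest)) - P_rest, P[rest, j])
    # Either step to j directly, or step into the remaining graph and reach j from there
    return float(P[i, rest] @ hit_j + P[i, j])

# Hypothetical 3-node example: node 0 steps to node 1 or node 2 with equal
# probability; nodes 1 and 2 both step back to node 0.
P_toy = np.array([[0.0, 0.5, 0.5],
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
value = escape_probability(P_toy, 0, 1)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience; the quantity computed is the same.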
![Page 4: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/4.jpg)
Intuition of Formula
1. $Q = (I - P)^{-1} = I + P + P^2 + P^3 + \cdots$
2. $P^2_{i,j}$ tells the probability that a walk starting from $i$ takes two steps to arrive at $j$.
3. $Q$ gives the stationary distribution.
4. $Q_{i,j}$ accumulates, over walks of every length, the probability that we started from $i$ and ended at $j$.

[Figure: the product P*P of two copies of the 6x6 transition matrix $P = [p_{i,j}]$]
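The series view of Q can be checked numerically. The sketch below uses a hypothetical 3-node walk damped by a factor of 0.9, an assumption made so that the series converges (for an undamped row-stochastic matrix, I - P is singular). It compares partial sums of I + P + P^2 + ... with the inverse and reads off one two-step probability.

```python
import numpy as np

# Hypothetical 3-node walk, damped by 0.9 so that the series converges.
P = 0.9 * np.array([[0.0, 0.5, 0.5],
                    [1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.0]])

Q = np.linalg.inv(np.eye(3) - P)

# Partial sums I + P + P^2 + ... approach Q.
partial_sum = np.zeros((3, 3))
power = np.eye(3)
for _ in range(500):
    partial_sum += power
    power = power @ P

# two_step[i, j]: (damped) probability of going from i to j in exactly two steps
two_step = P @ P
```

With the damping in place, each entry of `two_step` is the two-step probability multiplied by the chance of surviving both steps.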
![Page 5: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/5.jpg)
Esc_Prob(1->5) = $P(1,\mathcal{I})\,\bigl(I - P(\mathcal{I},\mathcal{I})\bigr)^{-1}\,P(\mathcal{I},5) + p_{1,5}$, where $\mathcal{I} = \{2, 3, 4, 6\}$ and $P = [p_{i,j}]$ is the 6x6 transition matrix.

[Figure: example graph on nodes 1-6 with edge weights 0.5 and 1]
P: transition matrix (row-normalized)
![Page 6: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/6.jpg)
Challenges
• Case 1: Medium-size graph
  – Matrix inversion is feasible, but…
  – What if we want many proximities?
  – Q: How to get all ($n^2$) proximities efficiently?
  – A: FastAllDAP!
• Case 2: Large graph
  – Matrix inversion is infeasible
  – Q: How to get one proximity efficiently?
  – A: FastOneDAP!
![Page 7: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/7.jpg)
FastAllDAP
• Q1: How to efficiently compute all possible proximities on a medium-size graph?
  – a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce the number of matrix inversions!
![Page 8: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/8.jpg)
FastAllDAP: Observation
[Figure: the example graph on nodes 1-6 and two copies of the 6x6 transition matrix P]
Need two different matrix inversions!
![Page 9: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/9.jpg)
FastAllDAP: Rescue
[Figure: two copies of the 6x6 transition matrix P with gray parts marked, one for Prox(1->5) and one for Prox(1->6)]
Overlap between two gray parts!
Redundancy among different linear systems!
![Page 10: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/10.jpg)
FastAllDAP: Theorem
• Theorem:
• Proof: by the Sherman-Morrison (SM) lemma
• Example:
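The theorem's formula did not survive in this transcript. One identity that follows from standard absorbing-chain arguments, offered here as an assumption about the kind of statement involved rather than as the slide's exact wording, is Prox(i->j) = Q[i,j] / (Q[i,i] Q[j,j] - Q[i,j] Q[j,i]) with Q = (I - cP)^-1. The sketch below checks it numerically against the direct sub-matrix computation on a random graph. The damping factor c = 0.9 and both function names are assumptions.

```python
import numpy as np

def prox_by_submatrix(P, i, j, c=0.9):
    """Direct route: invert the sub-block that excludes nodes i and j."""
    n = P.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]
    inner = np.linalg.inv(np.eye(len(rest)) - c * P[np.ix_(rest, rest)])
    return float(c * c * P[i, rest] @ inner @ P[rest, j] + c * P[i, j])

def prox_from_full_inverse(Q, i, j):
    """Assumed closed form that reuses one shared inverse Q = (I - cP)^-1."""
    return float(Q[i, j] / (Q[i, i] * Q[j, j] - Q[i, j] * Q[j, i]))

# Random 6-node graph (hypothetical data), row-normalized into a transition matrix.
rng = np.random.default_rng(0)
W = rng.random((6, 6))
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)
Q = np.linalg.inv(np.eye(6) - 0.9 * P)

max_gap = max(
    abs(prox_by_submatrix(P, i, j) - prox_from_full_inverse(Q, i, j))
    for i in range(6) for j in range(6) if i != j
)
```

The direct route needs a different inversion for every pair, while the closed form reads four entries of a single shared inverse; `max_gap` measures how closely the two agree.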
![Page 11: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/11.jpg)
FastAllDAP: Algorithm
• Alg.
  – Compute Q
  – For i, j = 1, …, n, compute Prox(i->j)
• Computational savings: O(1) matrix inversions instead of O($n^2$)!
• Example
  – With 1,000 nodes:
  – 1M matrix inversions vs. 1 matrix inversion!
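The two steps of the algorithm can be sketched with NumPy broadcasting. The per-pair formula is not shown in the transcript, so this sketch assumes the closed form Prox(i->j) = Q[i,j] / (Q[i,i] Q[j,j] - Q[i,j] Q[j,i]) with Q = (I - cP)^-1 and an assumed damping factor c = 0.9. The name `fast_all_dap` is illustrative. For 1,000 nodes there are 1,000 x 999 = 999,000 ordered pairs, which is the "1M" in the example.

```python
import numpy as np

def fast_all_dap(P, c=0.9):
    """All-pairs proximities from a single matrix inversion (sketch)."""
    n = P.shape[0]
    Q = np.linalg.inv(np.eye(n) - c * P)   # step 1: the only inversion
    d = np.diag(Q)
    # denom[i, j] = Q[i,i] * Q[j,j] - Q[i,j] * Q[j,i]
    denom = np.outer(d, d) - Q * Q.T
    prox = np.zeros((n, n))
    off_diag = ~np.eye(n, dtype=bool)      # the diagonal is left at zero
    prox[off_diag] = Q[off_diag] / denom[off_diag]   # step 2: O(1) work per pair
    return prox

# Hypothetical 2-node graph whose nodes point at each other.
P_pair = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
prox_pair = fast_all_dap(P_pair)
```

On the 2-node graph the walk must step to the other node immediately, so the proximity reduces to the damping factor itself.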
![Page 12: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/12.jpg)
FastOneDAP
• Q1: How to efficiently compute one single proximity on a large graph?
  – a.k.a. how to solve one linear system efficiently?
• Goal: avoid matrix inversion!
![Page 13: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/13.jpg)
FastOneDAP: Observation
[Figure: example graph on nodes 1-6 with edge weights 0.5 and 1]
Partial information (4 elements in 2 columns) of Q is enough!
![Page 14: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/14.jpg)
FastOneDAP: Observation
• Q: How to compute one column of Q?
• A: Taylor expansion
Reminder: $Q = (I - P)^{-1} = I + P + P^2 + \cdots$
$i$-th column of $Q$ = $Q\,[0, \ldots, 0, 1, 0, \ldots, 0]^T$
![Page 15: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/15.jpg)
FastOneDAP: Observation
$i$-th column of $Q$ = $e_i + P e_i + P (P e_i) + P (P (P e_i)) + \cdots$, where $e_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T$
Sparse matrix-vector multiplications!
![Page 16: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/16.jpg)
FastOneDAP: Iterative Alg.
• Algorithm to estimate the $i$-th column of Q
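The algorithm itself is not reproduced in this transcript. A minimal sketch of one way to estimate a column of Q by repeated sparse matrix-vector products is shown below. It assumes Q = (I - cP)^-1 with damping c = 0.9, stores only the nonzero entries of P as `out_edges[row] = [(col, p), ...]`, and defaults to 50 iterations. All names here are hypothetical.

```python
def estimate_q_column(out_edges, i, n, c=0.9, iterations=50):
    """Approximate column i of Q = (I - cP)^-1 by summing (cP)^k e_i."""
    term = [0.0] * n
    term[i] = 1.0          # e_i, the i-th unit vector
    column = term[:]       # running sum, starts at e_i
    for _ in range(iterations):
        nxt = [0.0] * n
        # nxt = c * P @ term, touching only the stored (nonzero) entries of P
        for row, edges in out_edges.items():
            nxt[row] = c * sum(p * term[col] for col, p in edges)
        term = nxt
        column = [a + b for a, b in zip(column, term)]
    return column

# Hypothetical 2-node graph whose nodes point at each other.
pair_edges = {0: [(1, 1.0)], 1: [(0, 1.0)]}
column_0 = estimate_q_column(pair_edges, 0, 2, iterations=500)
```

Each iteration costs one pass over the stored edges, so the work grows with the number of edges rather than with the square of the number of nodes. The truncation error shrinks geometrically with the damping factor.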
![Page 17: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/17.jpg)
FastOneDAP: Property
• Convergence guaranteed!
• Computational savings
  – Example:
    • 100K nodes and 1M edges (50 iterations)
    • 10,000,000x faster!
• Footnote: 1 column is enough!
  – (details in paper)
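A back-of-the-envelope count suggests where a speedup of this magnitude can come from. It rests on two assumptions that are not stated on the slide: a dense inversion costs about $n^3$ operations, and each iteration costs about one operation per edge.

```latex
\underbrace{n^{3}}_{\text{dense inversion}} = (10^{5})^{3} = 10^{15},
\qquad
\underbrace{50 \times 10^{6}}_{\text{50 sparse mat-vec products}} = 5 \times 10^{7},
\qquad
\frac{10^{15}}{5 \times 10^{7}} = 2 \times 10^{7}
```

Under those assumptions the ratio is of the same order of magnitude as the 10,000,000x quoted in the example.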
![Page 18: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/18.jpg)
Esc_Prob is good, but…
• Issue #1:
  – ‘Degree-1 node’ effect
• Issue #2:
  – Weakly connected pair
Need some practical modifications!
![Page 19: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/19.jpg)
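Issue #1: ‘degree-1 node’ effect [Faloutsos+] [Koren+]
• Degree-1 nodes (E, F) have no influence!
  – known as the ‘pizza delivery guy’ problem in undirected graphs
• Solution: Universal Absorbing Boundary!
[Figure: two graphs. First: nodes A, D, B with edge weights 1, 1; Esc_Prob(a->b) = 1. Second: the same nodes plus degree-1 nodes E and F, with edge weights 1, 1/3, 1/3, 1/3, 1, 1; Esc_Prob(a->b) = 1.]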
![Page 20: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/20.jpg)
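Universal Absorbing Boundary
U-A-B is a black hole!
[Figure: the A, D, B graph before (edge weights 1, 1) and after adding the U-A-B node (edge weights 0.9, 0.9, three fly-out edges of 0.1 into U-A-B, and a weight of 1 at U-A-B)]
Footnote: fly-out probability = 0.1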
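One way to picture the construction is as an extra node appended to the transition matrix. The sketch below is an assumption-laden illustration, not the authors' code: it scales every original transition by (1 - fly_out), sends the remaining probability to the boundary node, and makes that node absorbing. The function name and the example matrix are hypothetical.

```python
import numpy as np

def add_universal_absorbing_boundary(P, fly_out=0.1):
    """Return an (n+1)x(n+1) matrix whose last node is the absorbing boundary."""
    n = P.shape[0]
    P_uab = np.zeros((n + 1, n + 1))
    P_uab[:n, :n] = (1.0 - fly_out) * P   # original moves, scaled down
    P_uab[:n, n] = fly_out                # every node leaks into the boundary
    P_uab[n, n] = 1.0                     # black hole: once inside, never leaves
    return P_uab

# Hypothetical 3-node cycle 0 -> 1 -> 2 -> 0.
P_cycle = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])
P_with_boundary = add_universal_absorbing_boundary(P_cycle)
```

Restricted to the original nodes, the walk is then governed by (1 - fly_out) times P, which corresponds to a damping factor of 0.9 when the fly-out probability is 0.1.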
![Page 21: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/21.jpg)
Introducing Universal-Absorbing-Boundary
[Figure: the two graphs from Issue #1, shown without and with the U-A-B node]
• Without U-A-B (edge weights 1 and 1/3): Esc_Prob(a->b) = 1 for both graphs
• With U-A-B (edge weights 0.9 and 0.3, fly-out edges of 0.1): Prox(a->b) = 0.91 for the A, D, B graph and Prox(a->b) = 0.74 for the graph with E and F
Footnote: fly-out probability = 0.1
![Page 22: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/22.jpg)
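Issue #2: Weakly connected pair
[Figure: A and B joined by a weakly connected path with edge weights 1, 1, 1]
Prox(A->B) = Prox(B->A) = 0
Solution: Partial symmetry!
[Figure: an edge i -> j with weight w becomes an edge i -> j with weight a·w plus a reverse edge j -> i with weight (1-a)·w]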
![Page 23: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/23.jpg)
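Practical Modifications: Partial Symmetry
[Figure: before: the path between A and B with edge weights 1, 1, 1]
Prox(A->B) = Prox(B->A) = 0
[Figure: after: the same path with weights 0.9, 0.9, 0.9 on the original edges and 0.1, 0.1, 0.1 on the added reverse edges]
Prox(A->B) = 0.081 > Prox(B->A) = 0.009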
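The edge-splitting rule can be sketched in a few lines. The topology below is an assumption: the transcript loses the arrow directions, so the sketch guesses a path A -> X <- Y -> B (X and Y are hypothetical node names), which is weakly connected but has no directed route in either direction. The product of edge weights along the path is used as a simple stand-in, not as the paper's proximity computation.

```python
import math

def partial_symmetry(edges, a=0.9):
    """Split each directed edge (i, j, w) into i->j with a*w and j->i with (1-a)*w."""
    weights = {}
    for i, j, w in edges:
        weights[(i, j)] = weights.get((i, j), 0.0) + a * w
        weights[(j, i)] = weights.get((j, i), 0.0) + (1.0 - a) * w
    return weights

def path_product(weights, path):
    """Product of edge weights along consecutive nodes of `path` (0 if an edge is missing)."""
    return math.prod(weights.get(edge, 0.0) for edge in zip(path, path[1:]))

# Assumed topology (X and Y are hypothetical names): A -> X <- Y -> B
edges = [("A", "X", 1.0), ("Y", "X", 1.0), ("Y", "B", 1.0)]
original = {(i, j): w for i, j, w in edges}
softened = partial_symmetry(edges)

forward = path_product(softened, ["A", "X", "Y", "B"])    # 0.9 * 0.1 * 0.9
backward = path_product(softened, ["B", "Y", "X", "A"])   # 0.1 * 0.9 * 0.1
```

Under this assumed topology the two products come out at about 0.081 and 0.009, the same values as on the slide, while both products are 0 on the original edges.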
![Page 24: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/24.jpg)
Efficiency: FastAllDAP
[Figure: time (sec) vs. size of graph, Straight-Solver compared with FastAllDAP]
1,000x faster!
![Page 25: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/25.jpg)
Efficiency: FastOneDAP
[Figure: time (sec) vs. size of graph, FastOneDAP compared with Straight-Solver]
10,000x faster!
![Page 26: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/26.jpg)
Link Prediction: direction
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare Prox(i->j) and Prox(j->i)
[Figure: density of Prox(i->j) - Prox(j->i), annotated with >70%]
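The comparison can be written as a small decision rule. This sketch assumes the link is predicted to point in the direction with the larger proximity. The function name and the nested-dictionary layout of `prox` are illustrative, and the sample scores reuse the two values from the partial-symmetry slide.

```python
def predict_direction(prox, i, j):
    """Guess the direction of a known link between i and j from two proximity scores."""
    forward, backward = prox[i][j], prox[j][i]
    if forward > backward:
        return (i, j)
    if backward > forward:
        return (j, i)
    return None  # tie: the scores give no preference

# Sample scores taken from the partial-symmetry slide.
prox = {"A": {"B": 0.081}, "B": {"A": 0.009}}
guess = predict_direction(prox, "A", "B")
```

Returning `None` on a tie keeps the rule from inventing a direction when the two scores are equal.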
![Page 27: Fast D irection- A ware P roximity for Graph Mining](https://reader035.fdocuments.in/reader035/viewer/2022062811/56815fe8550346895dceebae/html5/thumbnails/27.jpg)