SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.
-
Upload
marian-chase -
Category
Documents
-
view
214 -
download
1
Transcript of SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.
SCS CMU
Proximity on Large Graphs
Speaker:
Hanghang Tong
2008-4-10 15-826 Guest Lecture
SCS CMU
2
Graphs are everywhere!
SCS CMU
4
Graph Mining: the big picture Graph/Global Level
Subgraph/Community Level
Node Level We are here!
SCS CMU
5
Proximity on Graph: What?
A BH1 1
D1 1
E
F
G1 11
I J1
1 1
a.k.a Relevance, Closeness, ‘Similarity’…
SCS CMU
6
Proximity is the main tool behind…
• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…
Will return to this later
SCS CMU
7
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
SCS CMU
8
Why not shortest path?
‘pizza delivery guy’ problem
A BD1 1
A BD1 1
E
F
G1 11
A BD1 1
E F1
1 1
A BD1 1
‘multi-facet’ relationship
Some ``bad’’ proximities
SCS CMU
9
A BD1 1
A BD1 11 E
Why not max. netflow?
No punishment on long paths
Some ``bad’’ proximities
SCS CMU
11
What is a ``good’’ Proximity?
A BH1 1
D1 1
E
F
G1 11
I J1
1 1
• Multiple Connections
• Quality of connection
•Direct & In-directed Conns
•Length, Degree, Weight…
…
SCS CMU
12
1
4
3
2
56
7
910
8
11
12
Random walk with restart
SCS CMU
13
Random walk with restart
Node 4
Node 1Node 2Node 3Node 4Node 5Node 6Node 7Node 8Node 9Node 10Node 11Node 12
0.130.100.130.220.130.050.050.080.040.030.040.02
1
4
3
2
56
7
910
811
120.13
0.10
0.13
0.13
0.05
0.05
0.08
0.04
0.02
0.04
0.03
Ranking vector More red, more relevant
Nearby nodes, higher scores
4r
SCS CMU
Why RWR is a good score?
14
2c 3cc ...
W 2W 3W
all paths from i to j with length 1
all paths from i to j with length 2
all paths from i to j with length 3
SCS CMU
15
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
SCS CMU
16
Variant: escape probability
• Define Random Walk (RW) on the graph• Esc_Prob(AB)
– Prob (starting at A, reaches B before returning to A)
Esc_Prob = Pr (smile before cry)
A Bthe remaining graph
SCS CMU
17
Other Variants
• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]
• Equivalence of Random Walks– Electric Networks:
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]
– String Systems
• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]
SCS CMU
18
Other Variants
• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]
• Equivalence of Random Walks– Electric Networks:
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]
– String Systems
• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]
All are related to, or similar to random walk with restart!
SCS CMU
19
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
SCS CMU
20
Asymmetry of Proximity [Tong+ KDD07 a]
A B
1 1
1
111
1 A B
1 1
1
10.51
0.5
What is Prox from A to B?What is Prox from B to A?
What is Prox between A and B?
SCS CMU
21
Asymmetry also exists in un-directed graphs
• Hanghang’s most important conf. is KDD
• The most important author in KDD is ...
So is love…
HanghangKDD
SCS CMU
22
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
SCS CMU
23
Group Proximity [Tong+ 2007]
• Q: How close are Accountants to SECs?
• A: Prob (starting at any RED, reaches any GREEN before touching any RED again)
Accountant
CEO
Manager
SEC
SCS CMU
24
Proximity on Attribute Graphs
What is the proximity from node 7 to 10?If we know that…
Accountant
CEO
Manager
SEC
SCS CMU
25
Sol: Augmented graphs
12
1311
9
10
5
6
7
8
1
2
3
4
SCS CMU
26
Attributes on nodes/edges (ER graph) [Chakrabarti+ WWW07]
skip
Wrote Sent Received
In-Replied-toCited
Works
SCS CMU
27
Proximity w/ Time
• Sol #1: treat time an categorical attr. [Minkov+]
• Sol #2: aggregate slice matrices [Tong+]
Time
•Global aggregation
•Slide window
•Exponential emphasis
SCS CMU
28
Summary of Part I
• Goal: Summarize multiple … relationships
• Solutions– Basic: Random Walk with Restart– Property: Asymmetry– Variants: Esc_Prob and many others.– Generalization: Group Prox.; w/ Attr.; w/ Time
SCS CMU
29
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
SCS CMU
Preliminary: Sherman–Morrison Lemma
30
Tv
uAA =
If:
1 11 1 1
1( )
1
TT
T
A u v AA A u v A
v A u
Then:
SCS CMU
SM Lemma: Applications
• RLS – and almost any algorithm in time series!
• Leave-one-out cross validation for LS
• Kalman filtering
• Incremental matrix decomposition
• … and all the fast sols we will introduce!
31
SCS CMU
32
Computing RWR
1
43
2
5 6
7
9 10
811
12
0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
0.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0
0.13
0.22
0.13
0.050.9
0.05
0.08
0.04
0.03
0.04
0.02
0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4 0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0 0 0 1/4 0 1/3 0 1/2
0 0 0 0 0 0 0 0 0 1/3 1/3 0
0.13 0
0.10 0
0.13 0
0.22
0.13 0
0.05 00.1
0.05 0
0.08 0
0.04 0
0.03 0
0.04 0
2 0
1
0.0
n x n n x 1n x 1
Ranking vector Starting vectorAdjacent matrix
1
(1 )i i ir cWr c e
Restart p
SCS CMU
33
Beyond RWR
P-PageRank[Haveliwala]
PageRank[Haveliwala]
RWR[Pan, Sun]
SM Learning[Zhou, Zhu]
RL in CBIR[He]
Fast RWR (B_Lin) Finds the Root Solution !
: Maxwell Equation for Web![Chakrabarti]
SCS CMU
34
• RWR is the building block for computing…– Escape Probability (augmented w/sink)
[Tong+]– ..Effective ConductancResistance Dist.
Commute Time– MRF (special structure) [Cohen]
• Similar Idea of B_Lin to compute other measurements
Beyond RWR
SCS CMU
35
• Q: Given query i, how to solve it?
0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4
0.9
0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
0
0
0
0
00.1
0
0
0
0
0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
??
Adjacent matrix Starting vector
SCS CMU
36
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4
0.9
0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
0
0
0
0
00.1
0
0
0
0
0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
0
0
0
1
0
0
0
0
0
0
0
0
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
1
43
2
5 6
7
9 10
811
12
0.3
0
0.3
0.1
0.3
0
0
0
0
0
0
0
0.12
0.18
0.12
0.35
0.03
0.07
0.07
0.07
0
0
0
0
0.19
0.09
0.19
0.18
0.18
0.04
0.04
0.06
0.02
0
0.02
0
0.14
0.13
0.14
0.26
0.10
0.06
0.06
0.08
0.01
0.01
0.01
0
0.16
0.10
0.16
0.21
0.15
0.05
0.05
0.07
0.02
0.01
0.02
0.01
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
No pre-computation/ light storage
Slow on-line response O(mE)
ir
ir
SCS CMU
37
0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34
0.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45
0.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33
0.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32
0.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56
0.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13
0.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79
0.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80
0.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72
0.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
4
PreCompute
1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
13
2
5 6
7
9 10
811
12
[Haveliwala]
R:
SCS CMU
38
2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34
1.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45
1.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33
1.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32
0.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56
0.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13
0.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79
0.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80
0.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72
0.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
PreCompute:
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
1
43
2
5 6
7
9 10
811
12
Fast on-line response
Heavy pre-computation/storage costO(n ) O(n )
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
3 2
SCS CMU
39
Q: How to Balance?
On-line Off-line
SCS CMU
40
B_Lin: Basic Idea[Tong+]
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
1
43
2
5 6
7
9 10
811
12
Find Community
Fix the remaining
Combine1
43
2
5 6
7
9 10
8 11
12
1
43
2
5 6
7
9 10
8 11
12
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
8 11
12
1
43
2
5 6
7
9 10
8 11
12
1
43
2
SCS CMU
41
Pre-computational stage
• Q: • A: A few small, instead of ONE BIG, matrices inversions
Efficiently compute and store Q-1
SCS CMU
42
• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector multiplication
On-Line Query Stage
+
0
0
0
0
0
0
1
0
0
0
0
0
-1
ie ir
SCS CMU
43
Pre-compute Stage
• p1: B_Lin Decomposition– P1.1 partition– P1.2 low-rank approximation
• p2: Q matrices– P2.1 computing (for each partition)– P2.2 computing (for concept space)
11Q
SCS CMU
44
P1.1: partition
1
43
2
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
811
12
Within-partition links cross-partition links
skip
SCS CMU
45
P1.1: block-diagonal
1
43
2
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
811
12
skip
SCS CMU
46
P1.2: LRA for
31
4
2
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
811
12
|S| << |W2|~
skip
SCS CMU
47
+
=
skip
SCS CMU
48
p2.1 Computing
c
skip
SCS CMU
49
Comparing and
• Computing Time– 100,000 nodes; 100 partitions– Computing 100,00x is Faster!
• Storage Cost – 100x saving!
Q 1,1
Q 1,2
Q 1,k
11Q
=
11Q
11Q
skip
SCS CMU
50
• Q: How to fix the green portions?
W +~
~~
11Q
+ ?
skip
SCS CMU
51
p2.2 Computing:
1S UV=_
-1
1
43
2
5 6
7
9 10
811
12
Q 1,1
Q 1,2
Q 1,k
skip
SCS CMU
52
SM Lemma says:
We have:
Communities Bridges
1 1 11 1
11U VcQQ Q Q
skip
SCS CMU
53
On-Line Stage
• Q
+
Query
0
0
0
0
0
0
1
0
0
0
0
0
Result
?
• A (SM lemma)
Pre-Computation
ie ir
skip
SCS CMU
54
On-Line Query Stage
q1:q2:q3:q4:q5:q6:
skip
SCS CMU
55
skip
SCS CMU
56
Query Time vs. Pre-Compute Time
Log Query Time
Log Pre-compute Time
•Quality: 90%+ •On-line:
•Up to 150x speedup•Pre-computation:
•Two orders saving
SCS CMU
57
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
SCS CMU
58
FastAllDAP [Tong+]
Footnote: augmented w/ universal sink as practical modification
A Bthe remaining graph
• Q: How to compute – Esc_Prob = Pr (smile before cry)?
SCS CMU
59
Solving DAP (Straight-forward way)
One matrix inversion, one proximity!
2 1,
ˆProx( )=c ( )ti j i ji j p I cP p c p
1 x (n-2) 1 x (n-2)(n-2) x (n-2)
1-c: fly-out probability (to black-hole)
SCS CMU
60
Esc_Prob(1->5) =
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
P=
I - +
-1
1 5
3
2
6
4
0.5 0.5
0.5
0.50.5
0.5
0.5
1
0.5 1
P: Transition matrix (row norm.)
2cc
SCS CMU
61
• Case 1, Medium Size Graph– Matrix inversion is feasible, but…– What if we want many proximities?– Q: How to get all (n ) proximities efficiently?– A: FastAllDAP!
• Case 2: Large Size Graph – Matrix inversion is infeasible– Q: How to get one proximity efficiently?– A: FastOneDAP!
Challenges
2
skip
SCS CMU
62
FastAllDAP
• Q1: How to efficiently compute all possible proximities on a medium size graph?– a.k.a. how to efficiently solve multiple
linear systems simultaneously?
• Goal: reduce # of matrix inversions!
SCS CMU
63
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
FastAllDAP: Observation
1 5
3
2
6
4
0.5 0.5
0.5
0.50.5
0.5
0.5
1
0.5 1
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
Need two different matrix inversions!
P=
P=
SCS CMU
64
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
FastAllDAP: Rescue
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
Redundancy among different linear systems!
P=
P=
Overlap between two gray parts!
Prox(1 5)
Prox(1 6)
SCS CMU
65
FastAllDAP: Theorem
• Theorem:
• Proof: by SM Lemma
• Example:
SCS CMU
66
FastAllDAP: Algorithm
• Alg.– Compute Q– For i,j =1,…, n, compute
• Computational Save O(1) instead of O(n )!
• Example– w/ 1000 nodes, – 1m matrix inversion vs. 1 matrix!
2
SCS CMU
67
FastAllDAP
Size of Graph
Time (sec)
Straight-Solver
FastAllDAP
1,000xfaster!
SCS CMU
68
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
SCS CMU
RWR on Bipartite Graph
69
n
m
authors
Conferences
Author-Conf. Matrix
Observation: n >> m!
Examples: 1. DBLP: 400k aus, 3.5k confs 2. NetFlix: 2.7M usrs, 18k mvs
SCS CMU
70
• Q: Given query i, how to solve it?
0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4
0.9
0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
0
0
0
0
00.1
0
0
0
0
0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
RWR on Skewed bipartite graphs
??
… . . .. . …. .. .
. …. . ..
0
0
n m
Ar
… .
. .. . …. .. .
. …. . .. Ac
SCS CMU
• Step 1:
• Step 2:
• Cost:
• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes
71
BB_Lin: Pre-Computation [Tong+ 06]
M = Ac ArX
1( 0.9 )I M 3( )O m m E
2-step RWR for Conferences
All Conf-Conf Prox. Scores
SCS CMU
72
BB_Lin: Pre-Computation [Tong+ 06]
• Step 1:
• Step 2:
M = Ac ArX
1( 0.9 )I M
2-step RWR for Conferences
All Conf-Conf Prox. Scores
SCS CMU
73
BB_Lin: Pre-Computation [Tong+ 06]
• Step 1:
• Step 2:
• Cost:
• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes
M = Ac ArX
1( 0.9 )I M 3( )O m m E
2-step RWR for Conferences
All Conf-Conf Prox. Scores
Ac/ArE edges
m x m
SCS CMU
BB_Lin: On-Line Stage
74Ac/Ar
E edges
Case 1: - Conf - Conf
authors
Conferences
Read out !
SCS CMU
BB_Lin: On-Line Stage
75Ac/Ar
E edges
Case 2: - Au - Conf
authors
Conferences
1 matrix-vec!
SCS CMU
BB_Lin: On-Line Stage
76Ac/Ar
E edges
Case 3: - Au - Au
authors
Conferences
2 m atrix-vec!
SCS CMU
BB_Lin: Examples
• NetFlix dataset (2.7m user x 18k movies)– 1.5hr for pre-computation; – <1 sec for on-line
• DBLP dataset (400k authors x 3.5k confs)– A few minutes for pre-computation– <0.01 sec for on-line
77
SCS CMU
78
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
SCS CMU
79
Challenges
• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– w/ 1.5 hr pre-computation for m x m core matrix– fraction of seconds for on-line query
• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– 1.5hr itself becomes a part of on-line cost!
SCS CMU
80
1( 0.9 )I M 3( )O m m E t=0
Q: How to update the core matrix?
t=11( 0.9 )I M
~ ~3( )O m m E
?
SCS CMU
Update the core matrix
• Step 1:
• Step 2:
81
M = Ac ArX
1( 0.9 )I M ~
~
~ ?
M= X+
Rank 2 update
= + X
2( )O m 3( )O m m E
SCS CMU
Update : General Case [Tong+ 2008]
• E’ edges changed
• Involves n’ authors, m’ confs.
• Observation
82
M = Ac ArX~
min( ', ') 'n m E
n authors
m Conferences
SCS CMU
83
• Observation: – the rank of update is small!
• Algorithm:– E’ edges changed– Involves n’ authors, m’ confs.
– our Alg.
– (details in the paper)
Update : General Case
2(min( ', ') ')O n m m E3( )O m m E
min( ', ') 'n m E
83
n authors
m Conferences
SCS CMU
84
FastOneUpdate
176x speedup
40x speedup
Time (Seconds)
Datasets
SCS CMU
85
Fast-Batch-Update
Min (n’, m’)E’
Time (Seconds)Time (Seconds)
15x speed-up on average!
SCS CMU
86
Summary of Part II
• Goal: Efficiently Solve Linear System(s)
• Sols.– B_Lin: Approximate one large linear system– FastAllDAP: multiple inner-related linear
systems– BB_Lin: the intrinsic complexity is small– FastUpdate: (smooth) dynamic linear system
SCS CMU
87
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
B_Lin
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
FastAllDAP
…
BB_Lin…FastUpdate
SCS CMU
88
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• Link Prediction
• NF
• gCap
• CePS
• G-Ray
• pTrack/cTrack
SCS CMU
890 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0
0.05
0.1
0.15
0.2
0.25
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18 Link Prediction: existence
no link
with link
density
density
Prox (ij)+Prox (ji)
Prox (ij)+Prox (ji)
Prox. is effective to distinguish red and blue!
SCS CMU
90
Link Prediction: direction
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare prox(ij) and prox(ji)>70%
Prox (ij) - Prox (ji)
density
SCS CMU
91
Neighborhood Formulation
ICDM
KDD
SDM
Philip S. Yu
IJCAI
NIPS
AAAI M. Jordan
Ning Zhong
R. Ramakrishnan
…
…
… …
Conference Author
A: RWR! [Sun ICDM2005]
Q: what is most related conference to ICDM
SCS CMU
92
NF: example
ICDM
KDD
SDM
ECML
PKDD
PAKDD
CIKM
DMKD
SIGMOD
ICML
ICDE
0.009
0.011
0.0080.007
0.005
0.005
0.005
0.0040.004
0.004
SCS CMU
93
gCaP: Automatic Image Caption• Q
…
Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger
{?, ?, ?,}
?A: RWR! [Pan KDD2004]
SCS CMU
94
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Image
Keyword
Region
SCS CMU
95
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Image
Keyword
Region
{Grass, Forest, Cat, Tiger}
SCS CMU
96
Center-Piece Subgraph(CePS)
A C
B
A C
B
?
Original GraphBlack: query nodes
CePS
Q
A: RWR! [Tong KDD 2006]
Red: Max (Prox(Red, A) x Prox(Red, B) x Prox(Red, C))
CePS guy
SCS CMU
97
CePS: Example
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
H.V. Jagadish
Laks V.S. Lakshmanan
Heikki Mannila
Christos Faloutsos
Padhraic Smyth
Corinna Cortes
15 1013
1 1
6
1 1
4 Daryl Pregibon
10
2
11
3
16
SCS CMU
98
K_SoftAnd: Relaxation of AND
Asking AND query? No Answer!
Disconnected Communities
Noise
SCS CMU
99
1 7
5
10
11
9 8
12
13
4
3
62
0.0103
0.1617
0.1387
0.1617
0.0849
0.1617
0.1617
0.0103
0.1387
0.13870.16170.1617
0.0103
2_SoftAnd
And
1_SoftAnd (OR)
x 1e-4
1 7
5
10
11
9 8
12
13
4
3
62
0.4505
0.1010
0.0710
0.1010
0.2267
0.1010
0.1010
0.4505
0.0710
0.07100.10100.1010
0.4505
1 7
5
10
11
9 8
12
13
4
3
62
0.0103
0.0046
0.0019
0.0046
0.0024
0.0046
0.0046
0.0103
0.0019
0.00190.00460.0046
0.0103
SCS CMU
100
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
H.V. Jagadish
Laks V.S. Lakshmanan
Umeshwar Dayal
Bernhard Scholkopf
Peter L. Bartlett
Alex J. Smola
1510
13
3 3
5 2 2
327
4
CePS: 2 Soft_AND
Stat.
DB
SCS CMU
101
OutputInput
Attributed Data Graph
Query Graph
Matching SubgraphAccountant
CEO
Manager
SEC
Graph X-Ray
SCS CMU
102
G-Ray: How to?
matching node
matching node
matching node
matching node
Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)
SCS CMU
103
Effectiveness: star-query
Query Result
SCS CMU
104
Effectiveness: line-query
Query
Result
SCS CMU
105
Query
Result
Effectiveness: loop-query
SCS CMU
106
pTrack
• [Given] – (1) a large, skewed time-evolving bipartite
graphs, – (2) the query nodes of interest
• [Track] – (1) top-k most related nodes for each query
node at each time step t; – (2) the proximity score (or rank of proximity)
between any two query nodes at each time step t
Author A’ Rank in KDD
Year
SCS CMU
107
Philip S. Yu’s Top-5 conferences up to each year
ICDE
ICDCS
SIGMETRICS
PDIS
VLDB
CIKM
ICDCS
ICDE
SIGMETRICS
ICMCS
KDD
SIGMOD
ICDM
CIKM
ICDCS
ICDM
KDD
ICDE
SDM
VLDB
1992 1997 2002 2007
DatabasesPerformanceDistributed Sys.
DatabasesData Mining
SCS CMU
108
KDD’s Rank wrt. VLDB over years
Rank
Year
Data Mining and Databases are more and more relavant!
SCS CMU
109
cTrack
• [Given] – (1) a large, skewed time-evolving graphs, – (2) the query nodes of interest
• [Track] – (1) top-k most central nodes at each time step t; – (2) the centrality score (or rank of centrality) for
each query node at each time step t
SCS CMU
110
Ranking of Centrality up to each year (in NIPS)
M. Jordan
G.HintonC. Koch
T. Sejnowski
Year
Rank of Influential-ness
SCS CMU
111
10 most influential authors up to each year
Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years
T. Sejnowski
M. Jordan
SCS CMU
112
A BH1 1
D1 1
E
F
G1 11
I J1
1 1
RWR
Varia
nts
w/ Tim
e
w/ Attr
ibut
e
Gro
up P
orx.
Definitions
B_Lin
FastAllDAP
BB_Lin
FastUpdate
Computations
Link Prediction
NF
gCap
CePS
G-Ray
pTrack
cTrack
Applications
ProximityOn
Graphs
Weighted Multiple
Relationship
EfficientlySolve Linear
System(s)
Use Proximity as
Building block
SCS CMU
Take-home Messages
• Proximity Definitions – RWR
– and a lot of variants
• Computations– SM Lemma
113
(1 )i i ir cWr c e
1 11 1 1
1( )
1
TT
T
A u v AA A u v A
v A u
SCS CMU
References
• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.
• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002
• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.
• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.
• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.
• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.114
SCS CMU
References
• P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association America, New York.
• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.
• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.
• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.
• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.
115
SCS CMU
References
• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006.
• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613-622, 2006.
• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007.
• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.
• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.
116