Communities, Spectral Clustering, and Random Walksbindel/present/2012-07-mmds.pdf · Random Walks...
Transcript of Communities, Spectral Clustering, and Random Walksbindel/present/2012-07-mmds.pdf · Random Walks...
Communities, Spectral Clustering, andRandom Walks
David Bindel
Department of Computer ScienceCornell University
13 Jul 2012
Spectral clustering recipe
Ingredients:1. A subspace basis with useful information
(eigenvectors of L, L̄, B, etc; NMF factors)2. A way to extract clusters from the basis
(sign patterns; (scaled) latent coordinates)
What works for small communities? With overlap?
Classic spectral bisection
Goal: Split graph into equal parts, minimizing edges cut.
Strategy: turn into a constrained integer QPI Identify partition with label vector s̄ ∈ {±1}n.I Objective: s̄T Ls̄ = 4× |edges cut|I Constraint: eT s̄ = 0
This is NP hard, but...
Classical spectral bisection
2 4 6 8 10 12 14 16 18 20−1
−0.5
0
0.5
1
Hard: min s̄T Ls̄ s.t. eT s̄ = 0, s̄ ∈ {±1}n.Easy: min vT Lv s.t. eT v = 0, v ∈ Rn, ‖v‖2 = n.
v is the Fiedler vector (second eigenvector of L)
Three clusters?
Take three eigenvectors as latent coordinates + cluster:
V ≈
r1...r1r2...r2r3...r3
=
1...1
1...1
1...1
r1r2r3
= SR.
Equivalent to a matrix factorization!
Overlapping communities
0 20 40 60 80 100 120 140 160 180
0
20
40
60
80
100
120
140
160
180
nz = 3996
≈
A ≈ S diag(β)ST , S ∈ {0,1}n×c
Spectrum for A with overlap
20 40 60 80 100 120 140 160 180
0
10
20
Index
Eig
enva
lue
Clustering and overlap
Dominant eigenvectors for A:
012
Alternate basis for the space:
0
0.5
1
How do we get the latter basis?
Recovering indicators
minimize ‖s̃‖1 (proxy for sparsity of s̃)s.t. s̃ = Uy (s̃ in the right space)
s̃i ≥ 1 (“seed” constraint)s̃ ≥ 0 (componentwise nonnegativity)
Given a basis U, want to extract a vector s̃ s.t.I s̃ lies close to the span of UI s̃ is almost an indicator for a communityI Some known “seeds” are in the community
Indicators from subspaces: LP edition
minimize ‖s̃‖1 (proxy for sparsity of s̃)s.t. s̃ = Uy (s̃ in the right space)
s̃i ≥ 1 (“seed” constraint)s̃ ≥ 0 (componentwise nonnegativity)
Recovers smallest set containing node i ifI U = SY−1 exactly.I Each set contains at least one element only in that set.
(Frequently works if there is not “too much” overlap.)
Got noise? Need thresholding!
Alas
The method is willing, but the space is weak:I Eigenvectors of A may not be ideal – what about L, L̄,
transition matrices for random walks?I Many small communities =⇒ many eigenvectors?
So consider:I Why eigenvectors?
I Optimization connectionI Dynamics connection
I What might serve better?
Optimization and quadratic forms
Indicate V ′ ⊆ V by s ∈ {0,1}n. Measure subgraph:
sT As = |E ′| = internal edges
sT Ds = edges incident on subgraph
sT Ls = edges between V ′ and V̄ ′
sT Bs = “surprising” internal edges
Rayleigh quotients
sT AssT s
= mean internal degree in subgraph
sT LssT s
= edges cut between V ′ and V̄ ′ relative to |V ′|
sT AssT Ds
= fraction of incident edges internal to V ′
sT LssT Ds
= fraction of incident edges cut
sT BssT s
= mean “surprising” internal degree in subgraph
sT BssT Ds
= mean fraction of internal degree that is surprising
Rayleigh quotients and eigenvalues
Basic connection (M spd):
xT KxxT Mx
stationary at x ⇐⇒ Kx = λMx
Easy despite lack of convexity.
Rayleigh quotients big and small
Decompose:
W T MW = I and W T KW = Λ = diag(λ1, . . . , λn).
For any x 6= 0, xT Mx = 1, have
x =∑
j
wjzj
xT Kx =∑
j
λjz2j
SosT KssT Ms
≈ λmax =⇒ s ≈∑
λj≈λmax
wjzj .
So look at invariant subspaces for extreme eigenvalues.
The dynamics perspective
The random walker
Basic idea: extract structure from random walk.
Start at seed and walk forwardDay 1: I came up with a funny joke!Day 2: I tell everyone in my familyDay 3: My mother tells a friend?
Or look at how quickly source is forgottenDay 1: David came up with a funny joke!Day 2: There’s a joke going around Cornell CS.Day 3: I read this bad joke on the web...
The random walker
Lazy random walk with transition matrix
S = (αI + AD−1)/(1 + α).
1. Start at p0, take k steps. Distribution:
pk = Skp0 (→ d/m as k →∞)
2. End at q0 after k steps. Conditional distribution on start:
qk ∝(
ST)k
q0 (→ e/n as k →∞)
Notes:I If the graph is undirected, ST = D−1SD.I If the graph is also regular, ST = S.
Simon-Ando theory
Markov chain with loosely-coupled subchains:I Rapid local mixing: after a few steps
pk ≈c∑
j=1
αj,kp(j)∞
where p(j)∞ is a local equilibrium for the j th subchain
I Slow equilibration: αj,k → αj,∞.
Alternately, rapid local mixing looks like:
qk ≈c∑
j=1
γj,ksj
where sj is an indicator for nodes in one subchain.
Spectral Simon-Ando picture
Exactly decoupled case (c decoupled chains):I Eigenvalue one has multiplicity c.I Eigenvectors of S are local equilibria.I Eigenvectors of ST are indicators for chains.I Rapid mixing =⇒ large gap to λc+1.
Weakly coupled case:I Cluster of c eigenvalues near 1.I Eigenvectors of S are combinations of local equilibria.I Eigenvectors of ST are combinations of indicators.I Large gap between λc and λc+1.
Beyond eigenvectors
Connections that suggest using dominant eigenvectors:I Optimization of quadratic formsI Asymptotic dynamics of random walks
Works well for a few big communities – what about local ones?
Ritz vectors
Variational characterization of eigenpairs:
xT KxxT Mx
stationary at x ⇐⇒ Kx = λMx
Rayleigh-Ritz approximation: x = Vy
(Vy)T K (Vy)
(Vy)T M(Vy)stationary at y ⇐⇒ (V T KV )y = λ̂(V T MV )y
Krylov + Rayleigh-Ritz
Usual approach to large-scale symmetric eigenproblems:1. Generate a basis for a Krylov subspace
Kk (A, x0) = span{x0,Ax0,A2x0, . . . ,Ak−1x0}
2. Use Rayleigh-Ritz on subpace3. Re-start if subspace gets too large
Krylov + Rayleigh-Ritz + dynamics
Eigendecomposition picture:
pk =N∑
j=1
αjλkj vj
Krylov + Rayleigh-Ritz:
{(µj ,wj)}rj=1 = Ritz pairs from Kr (S,p0)
pk =r∑
j=1
βjµkj wj , k < r
The idea: Ritz subspaces
Idea: Use unconverged subspace computations
Invariant subspace Ritz subspaceLong random walks Short random walksGlobal information Local informationPotentially expensive Cheap to computeMay need large space Small space okay
Current favorite method
1. Pick “seed” nodes j1, j2, . . .2. Take short random walks (length k ) from each seed3. Extract few Ritz vectors from span{q0,q1, . . . ,qk−1}.4. Approx indicators in span of all Ritz vectors.5. Possibly add more seeds and return to step 1.6. Convert raw “score” vector to a {0,1} indicator.
Wang test graph
1
2
3
4
5
6 7
8
9
10
11
12
13
14
15
16
17
1819
20
21
22
23
24
25
26
27
28
29
30
Spectrum for Wang test graph
5 10 15 20 25 30−0.5
0
0.5
1
Index
Eig
enva
lue
Zachary Karate graph
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
Spectrum for Karate
5 10 15 20 25 30
−0.5
0
0.5
1
Index
Eig
enva
lue
Football graph
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
3031
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
5960
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
Spectrum for Football
20 40 60 80 100
0
0.5
1
Index
Eig
enva
lue
Dolphin graph
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24 25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
Spectrum for Dolphin
10 20 30 40 50 60
−0.5
0
0.5
1
Index
Eig
enva
lue
Non-overlapping synthetic benchmark (µ = 0.5)
0 100 200 300 400 500 600 700 800 900 1000
0
100
200
300
400
500
600
700
800
900
1000
nz = 15746
Spectrum for synthetic benchmark
20 40 60 80 100
0.4
0.6
0.8
1
Index
Eig
enva
lue
Score vector
100 200 300 400 500 600 700 800 900 1,0000
0.2
0.4
0.6
0.8
1
Index
Sco
re
Score vector for the two-node seed of 492 and 513 in the firstLFR benchmark graph. Ten steps, three Ritz vectors.
Non-overlapping synthetic benchmark (µ = 0.6)
0 100 200 300 400 500 600 700 800 900 1000
0
100
200
300
400
500
600
700
800
900
1000
nz = 15316
Spectrum for synthetic benchmark
20 40 60 80 100
0.4
0.6
0.8
1
Index
Eig
enva
lue
Score vector
100 200 300 400 500 600 700 800 900 1,0000
0.5
1
Index
Sco
re
Score vector for the two-node seed of 492 and 513 in the firstLFR benchmark graph. Ten steps, three Ritz vectors.
Overlapping synthetic benchmark (µ = 0.3)
I 1000 nodesI 47 communitiesI 500 nodes belong to two communities
Spectrum for synthetic benchmark
20 40 60 80 100
0.4
0.6
0.8
1
Index
Eig
enva
lue
Score vector
100 200 300 400 500 600 700 800 900 1,0000
0.5
1
Index
Sco
re
Score vector for the two-node seed of 521 and 892.The desired indicator is in red.
Score vector
100 200 300 400 500 600 700 800 900 1,0000
0.5
1
1.5
Index
Sco
re
Score vector for the two-node seed of 521 and 892 +twelve reseeds. The desired indicator is in red.
Conclusions
Classic spectral methods use eigenvectors to cluster, but:I We don’t need to stop at partitioning!
I Overlap is okayI Key is how we mine the subspace
I We don’t need to stop at eigenvectors!I Can also use Ritz vectorsI Computation is cheap: short random walks
Still very much need:I Better theoretical groundingI Better ideas for thresholdingI Testing on large-scale examples