Transcript of Communities, Spectral Clustering, and Random Walks (bindel/present/2012-07-mmds.pdf)

Communities, Spectral Clustering, and Random Walks

David Bindel

Department of Computer Science, Cornell University

13 Jul 2012

Spectral clustering recipe

Ingredients:
1. A subspace basis with useful information
   (eigenvectors of L, L̄, B, etc.; NMF factors)
2. A way to extract clusters from the basis
   (sign patterns; (scaled) latent coordinates)

What works for small communities? With overlap?

Classic spectral bisection

Goal: Split graph into equal parts, minimizing edges cut.

Strategy: turn it into a constrained integer QP
- Identify the partition with a label vector s̄ ∈ {±1}^n
- Objective: s̄ᵀ L s̄ = 4 × |edges cut|
- Constraint: eᵀ s̄ = 0

This is NP-hard, but...

Classical spectral bisection

[Plot: Fiedler vector entries vs. node index for a 20-node example]

Hard: min s̄ᵀ L s̄  s.t.  eᵀ s̄ = 0,  s̄ ∈ {±1}^n.
Easy: min vᵀ L v  s.t.  eᵀ v = 0,  v ∈ R^n,  ‖v‖² = n.

v is the Fiedler vector (second eigenvector of L)
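The relaxation above fits in a few lines of numpy. A minimal sketch (the toy graph of two cliques joined by one edge is my own illustration, not from the slides): build L = D − A, take the eigenvector for the second-smallest eigenvalue, and split on its sign pattern.

```python
import numpy as np

# Toy graph: two 4-node cliques joined by a single edge (nodes 3 and 4).
n = 8
A = np.zeros((n, n))
for block in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0          # the one edge that must be cut

L = np.diag(A.sum(axis=1)) - A   # combinatorial Laplacian L = D - A

# Fiedler vector: eigenvector for the second-smallest eigenvalue of L.
vals, vecs = np.linalg.eigh(L)
v = vecs[:, 1]

# The sign pattern of v recovers the two cliques.
part = v > 0
```

Rounding v back to a ±1 label vector s, the objective sᵀ L s equals 4 × |edges cut|, which is 4 here since only one edge crosses the split.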

Three clusters?

Take three eigenvectors as latent coordinates + cluster:

V ≈ [r₁; …; r₁; r₂; …; r₂; r₃; …; r₃] = S R,

where each row of V repeats the latent coordinate rᵢ of its cluster, S ∈ {0,1}^{n×3} is the cluster indicator matrix, and R stacks the rows r₁, r₂, r₃.

Equivalent to a matrix factorization!
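This can be tried numerically. A sketch (the three-clique toy graph and the hand-rolled Lloyd iteration are my choices for illustration): rows of the top-eigenvector matrix V are nearly constant within each block, so clustering the rows recovers the factor S.

```python
import numpy as np

# Three 5-node cliques with weak ties; latent coordinates come from the
# top three eigenvectors of A (toy illustration, not the talk's data).
n, k = 15, 3
A = np.zeros((n, n))
for b in range(k):
    for i in range(5 * b, 5 * b + 5):
        for j in range(5 * b, 5 * b + 5):
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = A[9, 10] = A[10, 9] = 1.0   # weak inter-block ties

vals, vecs = np.linalg.eigh(A)
V = vecs[:, -k:]                 # n x k latent coordinates (rows of V)

# Rows of V are nearly constant within each block, so V ~ S R: cluster
# the rows with a few Lloyd (k-means) iterations, one center per block.
C = V[[0, 5, 10]].copy()
for _ in range(20):
    labels = ((V[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
    C = np.array([V[labels == c].mean(axis=0) for c in range(k)])
```

Each block of five rows lands in its own cluster, i.e. `labels` is (up to relabeling) the indicator structure S.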

Overlapping communities

[Figure: spy plot of an adjacency matrix with overlapping diagonal blocks, ~180 nodes, nz = 3996]

A ≈ S diag(β) Sᵀ,  S ∈ {0,1}^{n×c}

Spectrum for A with overlap

[Plot: eigenvalues vs. index]

Clustering and overlap

Dominant eigenvectors for A: [plot of eigenvector entries]

Alternate basis for the space: [plot of near-indicator vectors with entries in [0, 1]]

How do we get the latter basis?

Recovering indicators

Given a basis U, want to extract a vector s̃ s.t.
- s̃ lies close to the span of U
- s̃ is almost an indicator for a community
- Some known “seeds” are in the community

minimize ‖s̃‖₁      (proxy for sparsity of s̃)
s.t.     s̃ = U y    (s̃ in the right space)
         s̃_i ≥ 1    (“seed” constraint)
         s̃ ≥ 0      (componentwise nonnegativity)

Indicators from subspaces: LP edition

minimize ‖s̃‖₁      (proxy for sparsity of s̃)
s.t.     s̃ = U y    (s̃ in the right space)
         s̃_i ≥ 1    (“seed” constraint)
         s̃ ≥ 0      (componentwise nonnegativity)

Recovers the smallest set containing node i if
- U = S Y⁻¹ exactly.
- Each set contains at least one element only in that set.

(Frequently works if there is not “too much” overlap.)

Got noise? Need thresholding!
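The LP above is small enough for an off-the-shelf solver. A sketch with scipy.optimize.linprog in the exact setting U = S Y⁻¹ (the 9-node example, the two overlapping communities, and the mixing matrix Y are made-up illustrations). Since s̃ = U y ≥ 0, the objective ‖s̃‖₁ equals 1ᵀ U y, so the problem is linear in y alone:

```python
import numpy as np
from scipy.optimize import linprog

# Exact setting U = S Y^{-1}: 9 nodes, two overlapping communities
# (node 4 belongs to both); sizes and the mixing Y are made up.
S = np.zeros((9, 2))
S[:5, 0] = 1.0                   # community 0: nodes 0-4
S[4:, 1] = 1.0                   # community 1: nodes 4-8
Y = np.array([[2.0, 1.0], [1.0, 3.0]])
U = S @ np.linalg.inv(Y)         # basis that hides the indicators

seed = 1                         # known to lie only in community 0
# Since s = U y >= 0, ||s||_1 = 1' U y, so the LP is linear in y:
#   minimize 1' U y   s.t.   (U y)_seed >= 1,   U y >= 0
c = U.sum(axis=0)
A_ub = np.vstack([-U, -U[seed]])             # -Uy <= 0, -(Uy)_seed <= -1
b_ub = np.concatenate([np.zeros(9), [-1.0]])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
s_tilde = U @ res.x              # recovered indicator for community 0
```

Here each community has a node belonging only to it, so the LP recovers the exact {0,1} indicator of the seed's community.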

Alas

The method is willing, but the space is weak:
- Eigenvectors of A may not be ideal – what about L, L̄, transition matrices for random walks?
- Many small communities =⇒ many eigenvectors?

So consider:
- Why eigenvectors?
  - Optimization connection
  - Dynamics connection
- What might serve better?

Optimization and quadratic forms

Indicate V′ ⊆ V by s ∈ {0,1}^n. Measure the subgraph:

sᵀ A s = |E′| = internal edges
sᵀ D s = edges incident on the subgraph
sᵀ L s = edges between V′ and V̄′
sᵀ B s = “surprising” internal edges

Rayleigh quotients

sᵀ A s / sᵀ s   = mean internal degree in the subgraph
sᵀ L s / sᵀ s   = edges cut between V′ and V̄′ relative to |V′|
sᵀ A s / sᵀ D s = fraction of incident edges internal to V′
sᵀ L s / sᵀ D s = fraction of incident edges cut
sᵀ B s / sᵀ s   = mean “surprising” internal degree in the subgraph
sᵀ B s / sᵀ D s = mean fraction of internal degree that is surprising
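These quotient interpretations are easy to sanity-check numerically. A sketch on a toy graph of my choosing (two triangles joined by one edge, with s indicating the first triangle):

```python
import numpy as np

# Two triangles joined by one edge; s indicates the first triangle.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A
m = d.sum() / 2                       # |E| = 7
B = A - np.outer(d, d) / (2 * m)      # modularity matrix

s = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

mean_internal_deg = (s @ A @ s) / (s @ s)     # 2 internal neighbors each
cut_per_node      = (s @ L @ s) / (s @ s)     # 1 cut edge over 3 nodes
frac_internal     = (s @ A @ s) / (s @ D @ s) # 6 of 7 incident endpoints
frac_cut          = (s @ L @ s) / (s @ D @ s) # the remaining 1 of 7
surprising        = (s @ B @ s) / (s @ s)     # (6 - 7**2 / 14) / 3
```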

Rayleigh quotients and eigenvalues

Basic connection (M spd):

(xᵀ K x)/(xᵀ M x) stationary at x ⇐⇒ K x = λ M x

Easy despite lack of convexity.

Rayleigh quotients big and small

Decompose:

Wᵀ M W = I and Wᵀ K W = Λ = diag(λ_1, …, λ_n).

For any x ≠ 0 with xᵀ M x = 1, we have

x = Σ_j w_j z_j,    xᵀ K x = Σ_j λ_j z_j².

So

sᵀ K s / sᵀ M s ≈ λ_max =⇒ s ≈ Σ_{λ_j ≈ λ_max} w_j z_j.

So look at invariant subspaces for extreme eigenvalues.

The dynamics perspective

The random walker

Basic idea: extract structure from random walk.

Start at the seed and walk forward
Day 1: I came up with a funny joke!
Day 2: I tell everyone in my family.
Day 3: My mother tells a friend?

Or look at how quickly the source is forgotten
Day 1: David came up with a funny joke!
Day 2: There’s a joke going around Cornell CS.
Day 3: I read this bad joke on the web...

The random walker

Lazy random walk with transition matrix

S = (αI + A D⁻¹)/(1 + α).

1. Start at p_0, take k steps. Distribution:

   p_k = S^k p_0   (→ d/m as k → ∞)

2. End at q_0 after k steps. Conditional distribution on the start:

   q_k ∝ (Sᵀ)^k q_0   (→ e/n as k → ∞)

Notes:
- If the graph is undirected, Sᵀ = D⁻¹ S D.
- If the graph is also regular, Sᵀ = S.
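A minimal numpy sketch of the lazy walk (the 4-node graph and the choice α = 1 are assumptions for illustration): the forward iteration converges to a distribution proportional to the degree vector, and for this undirected graph Sᵀ = D⁻¹ S D.

```python
import numpy as np

# Lazy walk S = (alpha I + A D^{-1}) / (1 + alpha) on a small toy graph.
alpha = 1.0
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)
S = (alpha * np.eye(4) + A / d) / (1 + alpha)   # A/d = A D^{-1}, columns sum to 1

p = np.zeros(4)
p[0] = 1.0                        # point mass on the seed node
for _ in range(500):
    p = S @ p                     # p_k = S^k p_0 -> proportional to d
```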

Simon-Ando theory

Markov chain with loosely-coupled subchains:
- Rapid local mixing: after a few steps

  p_k ≈ Σ_{j=1}^c α_{j,k} p_∞^{(j)},

  where p_∞^{(j)} is a local equilibrium for the jth subchain
- Slow equilibration: α_{j,k} → α_{j,∞}.

Alternately, rapid local mixing looks like:

q_k ≈ Σ_{j=1}^c γ_{j,k} s_j,

where s_j is an indicator for the nodes in one subchain.

Spectral Simon-Ando picture

Exactly decoupled case (c decoupled chains):
- Eigenvalue one has multiplicity c.
- Eigenvectors of S are local equilibria.
- Eigenvectors of Sᵀ are indicators for chains.
- Rapid mixing =⇒ large gap to λ_{c+1}.

Weakly coupled case:
- Cluster of c eigenvalues near 1.
- Eigenvectors of S are combinations of local equilibria.
- Eigenvectors of Sᵀ are combinations of indicators.
- Large gap between λ_c and λ_{c+1}.
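The weakly coupled picture is easy to reproduce. A toy example of my own (two 5-cliques with one weak tie; the coupling weight is arbitrary): the transition matrix has c = 2 eigenvalues near 1 and a visible gap to the rest.

```python
import numpy as np

# Two 5-cliques with one weak tie: a cluster of c = 2 eigenvalues
# near 1, then a gap down to the fast local-mixing modes.
n = 10
A = np.zeros((n, n))
for block in (range(5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 0.05          # weak coupling between the chains
P = A / A.sum(axis=0)             # column-stochastic transition matrix
vals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
# vals[0] = 1 and vals[1] sits just below it; then a gap to vals[2]
```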

Beyond eigenvectors

Connections that suggest using dominant eigenvectors:
- Optimization of quadratic forms
- Asymptotic dynamics of random walks

Works well for a few big communities – what about local ones?

Ritz vectors

Variational characterization of eigenpairs:

(xᵀ K x)/(xᵀ M x) stationary at x ⇐⇒ K x = λ M x

Rayleigh-Ritz approximation: x = V y

((V y)ᵀ K (V y))/((V y)ᵀ M (V y)) stationary at y ⇐⇒ (Vᵀ K V) y = λ̂ (Vᵀ M V) y

Krylov + Rayleigh-Ritz

Usual approach to large-scale symmetric eigenproblems:
1. Generate a basis for a Krylov subspace

   K_k(A, x_0) = span{x_0, A x_0, A² x_0, …, A^{k−1} x_0}

2. Use Rayleigh-Ritz on the subspace
3. Re-start if the subspace gets too large

Krylov + Rayleigh-Ritz + dynamics

Eigendecomposition picture:

p_k = Σ_{j=1}^N α_j λ_j^k v_j

Krylov + Rayleigh-Ritz:

{(μ_j, w_j)}_{j=1}^r = Ritz pairs from K_r(S, p_0)

p_k = Σ_{j=1}^r β_j μ_j^k w_j,   k < r

The idea: Ritz subspaces

Idea: Use unconverged subspace computations

Invariant subspace    | Ritz subspace
--------------------- | ------------------
Long random walks     | Short random walks
Global information    | Local information
Potentially expensive | Cheap to compute
May need large space  | Small space okay

Current favorite method

1. Pick “seed” nodes j_1, j_2, …
2. Take short random walks (length k) from each seed
3. Extract a few Ritz vectors from span{q_0, q_1, …, q_{k−1}}
4. Approximate indicators in the span of all Ritz vectors
5. Possibly add more seeds and return to step 1
6. Convert the raw “score” vector to a {0,1} indicator

Wang test graph

[Figure: Wang test graph layout, nodes 1–30]

Spectrum for Wang test graph

[Plot: eigenvalues vs. index]

Zachary Karate graph

[Figure: Zachary karate club graph layout, nodes 1–34]

Spectrum for Karate

[Plot: eigenvalues vs. index]

Football graph

[Figure: college football graph layout, nodes 1–115]

Spectrum for Football

[Plot: eigenvalues vs. index]

Dolphin graph

[Figure: dolphin social network layout, nodes 1–62]

Spectrum for Dolphin

[Plot: eigenvalues vs. index]

Non-overlapping synthetic benchmark (µ = 0.5)

[Figure: spy plot of the benchmark adjacency matrix, 1000 nodes, nz = 15746]

Spectrum for synthetic benchmark

[Plot: eigenvalues vs. index]

Score vector

[Plot: score vs. node index]

Score vector for the two-node seed of 492 and 513 in the first LFR benchmark graph. Ten steps, three Ritz vectors.

Non-overlapping synthetic benchmark (µ = 0.6)

[Figure: spy plot of the benchmark adjacency matrix, 1000 nodes, nz = 15316]

Spectrum for synthetic benchmark

[Plot: eigenvalues vs. index]

Score vector

[Plot: score vs. node index]

Score vector for the two-node seed of 492 and 513 in the first LFR benchmark graph. Ten steps, three Ritz vectors.

Overlapping synthetic benchmark (µ = 0.3)

- 1000 nodes
- 47 communities
- 500 nodes belong to two communities

Spectrum for synthetic benchmark

[Plot: eigenvalues vs. index]

Score vector

[Plot: score vs. node index]

Score vector for the two-node seed of 521 and 892. The desired indicator is in red.

Score vector

[Plot: score vs. node index]

Score vector for the two-node seed of 521 and 892 plus twelve reseeds. The desired indicator is in red.

Conclusions

Classic spectral methods use eigenvectors to cluster, but:
- We don’t need to stop at partitioning!
  - Overlap is okay
  - Key is how we mine the subspace
- We don’t need to stop at eigenvectors!
  - Can also use Ritz vectors
  - Computation is cheap: short random walks

Still very much need:
- Better theoretical grounding
- Better ideas for thresholding
- Testing on large-scale examples