Graph Wavelets and Multiscale Community Mining in networks

61
Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion + Graph Wavelets and Multiscale Community Mining in networks Pierre Borgnat, Nicolas Tremblay CR1 CNRS – Laboratoire de Physique, ENS de Lyon, Université de Lyon Équipe S I S Y P HE : Signaux, Systèmes et Physique 08/2014 p. 1

Transcript of Graph Wavelets and Multiscale Community Mining in networks

Page 1: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Graph Wavelets and Multiscale CommunityMining in networks

Pierre Borgnat, Nicolas Tremblay

CR1 CNRS – Laboratoire de Physique, ENS de Lyon, Université de Lyon

Équipe SISYPHE : Signaux, Systèmes et Physique

08/2014

p. 1

Page 2: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Content of the talk

• General objective: revisit the classical question of findingcommunities in networks using multiscale processingmethods on graphs.

• The things that will be discussed:

1. Recall the notion of community in networks2. Recall spectral graph wavelets3. Multiscale community mining with graph wavelets

p. 2

Page 3: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Examples of networks from our digital world

LinkedIn Network Citation Graph Vehicle Network

USA Power grid Web Graph Protein Network

p. 3

Page 4: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Communities in networks

• Observed, real-world, networks are often inhomogeneous,made of communities (or modules):groups of nodes having a larger proportion of links insidethe group than with the outside

• This is observed in various types of networks: social,technological, biological,...

• There exist several extensive surveys:

[S. Fortunato, Physic Reports, 2010]

[von Luxburg, Statistics and Computating, 2007]

...

p. 4

Page 5: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Purpose of community detection?

p. 5

Page 6: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Purpose of community detection?

p. 5

Page 7: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Purpose of community detection?

1) It gives us a sketch of the network:

p. 6

Page 8: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Purpose of community detection?1) It gives us a sketch of the network:

2) It gives us intuition about its components:

p. 6

Page 9: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Some examples of networks with communities ormodules

• Social face-to-face interaction networks [Sociopatterns;Barrat, Cattuto, et al.]

(Lab. physique, ENSL, 2013) (école primaire; Sociopatterns, 2011)

• Brain networks [Bullmore, Achard, 2006]

10 neurons11

fMRI10 voxels

0.3 Hz5

Parcellation

Time series

Connectivityusing wavelets

Graphs of cerebral connections

GRAPHSIP j t h ll

p. 7

Page 10: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Classical methods to find communities in networks• I will not pretend to make a full survey... Some important

steps are:• Cut algorithms (legacy from computer science)• Spectral clustering (relaxed cut problem)• Modularity optimization (physicists’ contribution) [Newman,

Girvan , 2004]• Greedy modularity optimization a la Louvain (computer

science strikes back) [Blondel et al., 2008]

• Using information compression [Rosvall, Bergstrom, 2008]• Inference for stochastic-block models (e.g. with BP

[Decelle et al., 2012]; with spectral approach [Lelarge,Massoulié,... 2012, 2014])

p. 8

Page 11: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Parenthesis: Stochastic Block Model• Representation: as a matrix, as a network

• Conjectured phase diagram of identifiability

[Decelle, Krzakala, Moore, Zdeborova, 2011][Lelarge, Massoulié, Xu, 2013]p. 9

Page 12: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Spectral analysis of networksSpectral theory for networkThis is the study of graphs through the spectral analysis(eigenvalues, eigenvectors) of matrices related to the graph:the adjacency matrix, the Laplacian matrices,....

NotationsG = (V ,E ,w) a weighted graph

N = |V | number of nodesA adjacency matrix Aij = wijd vector of strengths di =

∑j∈V wij

D matrix of strengths D = diag(d)f signal (vector) defined on V

p. 10

Page 13: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Definition of the Laplacian matrix of graphs

Laplacian matrixL laplacian matrix L = D − A

(λi) L’s eigenvalues 0 = λ0 < λ1 ≤ λ2 ≤ ... ≤ λN − 1

(χi) L’s eigenvectors Lχi = λi χi

Note: χ0 = 1.

A simple example: the straight line

←→ L =

⎛⎜⎜⎜⎝

...

... −1 0 0 0 0

... 2 −1 0 0 0−1 2 −1 0 00 −1 2 −1 00 0 −1 2 −10 0 0 −1 2 ...0 0 0 0 −1 ...

...

⎞⎟⎟⎟⎠

For this regular line graph, L is the 1-D classical laplacian operator(i.e. double derivative operator).

p. 11

Page 14: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Spectral clustering vs. Modularity• Spectral clustering: relaxation of the optimization of the

minimal cut. The cut size between groups of assignment

si = ±1 is: R =12

∑i,j

Aij(1− sisj) =14

s�Ls

• By spectral decomposition of L, Lij =∑N−1

k=1 λk (χk )i(χk )j ,the minimum is for si = (χ1)i → relaxed in si = sign((χ1)i).

• Problems with spectral clustering:1) No assessment of the quality of the partitions2) No reference to comparison to some null hypothesis

• Modularity [Newman, 2003] (with 2m =∑

i di )

Q =1

2m

∑ij

[Aij −

didj

2m

]δ(ci , cj)

• Null model: Bernoulli random graph with prob. di dj2m

• Q is between −1 and +1 (≤ 1− 1/nc if nc groups)p. 12

Page 15: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Spectral clustering vs. Modularity

• Comparison of optimization of cut and optimization of Q• Modularity works well, better than spectral clustering

• More efficient algorithm: the greedy (ascending) Louvainapproach (ok for millions of nodes !) [Blondel et al., 2008]

p. 13

Page 16: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Existence of multiscale community structure in a graph16 com. Q=0.80

4 com. Q=0.74

8 com. Q=0.83

2 com. Q=0.50

• All representations correct; modularity favours one• Note: one could integrate a ad-hoc scale into modularity

[Arenas et al., 2008; Reichardt and Bornholdt, 2006]p. 14

Page 17: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Relating the Laplacian of graphs to Signal Processing

Laplacian matrix

L or L laplacian matrix L = D − A or L = I − D−1/2AD−1/2

(λi) L’s eigenvalues 0 = λ0 < λ1 ≤ λ2 ≤ ... ≤ λN − 1

(χi) L’s eigenvectors Lχi = λi χi

A simple example: the straight line

←→ L =

⎛⎜⎜⎜⎝

...

... −1 0 0 0 0

... 2 −1 0 0 0−1 2 −1 0 00 −1 2 −1 00 0 −1 2 −10 0 0 −1 2 ...0 0 0 0 −1 ...

...

⎞⎟⎟⎟⎠

For this regular line graph, L is the 1-D classical laplacian operator(i.e. double derivative operator):

its eigenvectors are the Fourier vectors, and its eigenvalues theassociated (squared) frequenciesp. 15

Page 18: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Objective and Fundamental analogy[Shuman, Vandergheynst et al., IEEE SP Mag, 2013]

Objective: Definition of a Fourier Transform adapted tograph signals

f : signal defined on V ←→ f : Fourier transform of f

Fundamental analogyOn any graph, the eigenvectors χi of the Laplacian matrix L orL will be considered as the Fourier vectors, and itseigenvalues λi the associated (squared) frequencies.

• Works exactly for all regular graphs (+ Beltrami-Laplace)• Conduct to natural generalizations of signal processing

p. 16

Page 19: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

The graph Fourier transform

• f is obtained from f ’s decomposition on the eigenvectors χi :

f =

⎛⎜⎜⎜⎜⎝

< χ0, f >< χ1, f >< χ2, f >

...< χN − 1, f >

⎞⎟⎟⎟⎟⎠

Define χ = (χ0|χ1|...|χN − 1) : f = χ� f

• Reciprocally, the inverse Fourier transform reads: f = χ f

• Parseval theorem: ∀(g, h) < g, h >=< g, h >

• Filtering: apply g(λi) in the Fourier domain on the f (i).

p. 17

Page 20: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Fourier modes: examples in 1D and in graphsLOW FREQUENCY: HIGH FREQUENCY:

• Alternative Fourier transform: use the adjacency matrix A[Sandryhaila, Moura, IEEE TSP, 2013]

p. 18

Page 21: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Spectral analysis: the χi and λi of a multiscale toy graph

Mode #

node

s

20 40 60 80 100 120

20406080

100120 −0.5

0

0.5

0 20 40 60 80 100 120 1400

5

10

15

Mode #

λi

p. 19

Page 22: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Spectral Graph Wavelets[Hammond et al., ACHA 2011]

• Fourier is a global analysis. Fourier modes (eigenvectors ofthe laplacian) are used in classical spectral clustering, butdo not enable a jointly local and scale dependent analysis.

• For that classical signal processing (or harmonic analysis)teach us that we need wavelets.

• Wavelets : local functions that act as well as a filter arounda chosen scale.A wavelet:

– Translated:

– Scaled• Classical wavelets

by analogy−−−−−−→ Graph waveletsp. 20

Page 23: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Classical waveletsby analogy−−−−−−→ Graph wavelets

Classical (continuous) world Graph world

Real domain x node a

Fourier domain ω eigenvalues λi

Filter kernel ψ(ω) g(λi)⇔ G

Filter bank ψ(sω) g(sλi)⇔ Gs

Fourier modes exp−iωx eigenvectors χi

Fourier transf. of f f (ω) =∫∞−∞ f (x) exp−iωx dx f = χ� f

The wavelet at scale s centered around a is given by:

ψs,a(x) =1sψ

(x − a

s

)=

∫ ∞

−∞δa(ω)ψ(sω) expiωx dω

In the graph world: ψs,a = χ Gsδa = χ Gsχ� δap. 21

Page 24: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Examples of graph waveletsA WAVELET:

TRANSLATING: SCALING:

p. 22

Page 25: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Examples of wavelets: they encode the local topology

ψs=1,a

ψs=35,a

ψs=25,a

ψs=50,a

p. 23

Page 26: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Example of wavelet filters• More precisely, we will use the following kernel:

g(x ;α, β, x1, x2) =

⎧⎪⎨⎪⎩

x−α1 xα for x < x1

p(x) for x1 ≤ x ≤ x2

xβ2 x−β for x > x2.

• To emphasize χ1, the parameters are:

smin =1λ2

, x2 =1λ2

, smax =1λ2

2, x1 = 1, β = 1/log10

(λ3

λ2

)

• This leads to: (choice α = 2)

0 1 20

2

4

6

8

λ

g(sλ

)

s=7

s=13

s=25

s=47

0

2

4

6

8

10

x1

x2 x

g(x)

α=1

α=2

α=10

α=50

p. 24

Page 27: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

A new method for multiscale community detection[N. Tremblay, P. Borgnat, 2013]

General Ideas• Take advantage of local topological information encoded in

Graph Wavelets.Wavelet = ego-centered vision from a node

• Group together nodes whose local environments aresimilar at the description scale

• This will naturally offer a multiscale vision of communities

The method is based on:1. wavelets (resp. scaling functions) as feature vectors2. the correlation distance to compare them3. the complete linkage clustering algorithm

p. 25

Page 28: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

1) Wavelets as featuresEach node a has feature vector ψs,a.Globally, one will need Ψs, all wavelets at a given scale s, i.e.

Ψs =(ψs,1|ψs,2| . . . |ψs,N

)= χGsχ

�.

NODE

A:

NODE

B:

AT SMALL SCALE: AT LARGE SCALE:

p. 26

Page 29: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

2) Correlation distances

Ds(a, b) = 1− ψ�s,aψs,b

||ψs,a||2 ||ψs,b||2.

NODE

A:

NODE

B:CORR.COEF.: -0.50 0.97

p. 27

Page 30: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

3) Complete linkage clustering and dendrogram

• Bottom to top hierarchical algorithm:start with as many clusters as nodes and work the way upto fewer clusters (by linking subclusters together) untilreaching one global cluster.

• Computation of the distance between two subclusters:the maximum distance between all pairs of nodes, takingone from each cluster

• Output: a dendrogram

p. 28

Page 31: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Example of a dendrogram at a given scale s

0

0.5

1

1.5

2

corr

elat

ion

dist

ance

The big question: where should we cut the dendrogram?

p. 29

Page 32: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal gap

Simplest method: cut the dendrogram at its maximal gap.At small scale:

0

0.5

1

1.5

2

corr

elat

ion

dist

ance

At large scale:

0

0.5

1

1.5

2

corr

elat

ion

dist

ance

Note: we use the toy graphp. 30

Page 33: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal gap

Simplest method: cut the dendrogram at its maximal gap.At small scale:

0

0.5

1

1.5

2

corr

elat

ion

dist

ance

At large scale:

0

0.5

1

1.5

2

corr

elat

ion

dist

ance

Note: we use the toy graphp. 30

Page 34: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal gap

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

scale number

node

s

10 20 30 40

20

40

60

80

100

120

• Improvement: cut at average maximal gap

p. 31

Page 35: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

The Sales-Pardo benchmark• Three community structures nested in one another• Parameters:

• sizes of the communities (N = 640)• ρ tunes how well separated the different scales are• k is the average degree; the sparser is the graph, the

harder it is to recover the communities.

p. 32

Page 36: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Results on the Sales-Pardo benchmark

10 15.8 25.1 39.80

0.5

1

scale s

Adj

. Ran

d in

dex

Large Scale

Medium Scale

Small Scale

23 1

p. 33

Page 37: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

The case of larger networks

• Limit of the method: computation of the N ×N matrix of thewavelets Ψs.

• Improvement: use of random features.• Let r ∈ R

N be a random vector on the nodes of the graph,composed of N independent normal random variables ofzero mean and finite variance σ2.

• Define the feature fs,a ∈ R at scale s associated to node aas

fs,a = ψ�s,ar =

N∑k=1

ψs,a(k)r(k).

p. 34

Page 38: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

The case of larger networks• Let us define the correlation between features

Cor(fs,a, fs,b)=E((fs,a − E(fs,a))(fs,b − E(fs,b)))√

Var(fs,a)Var(fs,b).

• It is easy to show that:

Cor(fs,a, fs,b) =ψ�

s,aψs,b

||ψs,a||2 ||ψs,b||2.

• Therefore, the sample correlation estimator Cab,η satisfies:

limη→+∞ Cab,η =

ψ�s,aψs,b

||ψs,a||2 ||ψs,b||2= 1− Ds(a, b).

• This leads to a faster algorithm.

p. 35

Page 39: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Results on the Sales-Pardo benchmark

• As a function of η, the number of random vectors used

20 40 60 80 1000

0.5

1

η

<ra

tios>

LS RecallMS RecallSS Recall

20 40 60 80 1003.23.43.63.8

44.2

η

<co

mp.

tim

e> (

sec)

p. 36

Page 40: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Stability of the communities• Not all partitions are relevant: only those stable enough

convey information about the network• Lambiotte’s approach to stability:

Create B resampled graphs by randomly adding ±p%(typically p = 10) to the weight of each link and computingthe corresponding B sets of partitions {Pb

s }b∈[1,B],s∈S .Then, stability:

γr (s) =2

B(B − 1)

∑(b,c)∈[1,B]2,b �=c

ari(Pbs ,P

cs ), (1)

• New approach: we have a stochastic algorithm.Consider J sets of η random signals and compute theassociated sets of partitions {Pj

s}j∈[1,J],s∈S . Let stability be:

γa(s) =2

J(J − 1)

∑(i,j)∈[1,J]2,i �=j

ari(Pis,P

js). (2)

p. 37

Page 41: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Results with stabilities on the Sales-Pardo benchmark

10 15.8 25.1 39.80

0.5

1

scale s

Adj

. Ran

d in

dex

Large Scale

Medium Scale

Small Scale

23 1

10 15.8 25.1 39.80

0.5

1

scale s

Inst

abili

ties

1−γr

1−γa

123

p. 38

Page 42: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

In addition: statistical test of relevance of thecommunities

• It is possible to design a data-driven test on γa(computation of a numerical threshold for the configuration(or Chung-Lu) model).

• Result: threshold for 1− γa above which the partition incommunities is irrelevant.

Sales-Pardo graph Chung-Lu graph

10 15.8 25.1 39.80

0.5

1

scale s

1−γa

3 2 1

1.9 2.2 2.50

0.5

1

scale s

1−γa

p. 39

Page 43: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Comparison on larger Sales-Pardo graphs

N = 6400 nodes

Schaub-Delvenne

0 0.1 10

0.5

1

Markov time

Adj

. Ran

d in

dex

LSMSSS

0 0.1 10

0.5

1

Markov time

Var

. Inf

orm

atio

n

Wavelets

6.3 10 15.8 25.10

0.5

1

scale s

Adj

. Ran

d in

dex

LSMSSS

6.3 10 15.8 25.10

0.5

1

scale s1−

γa

p. 40

Page 44: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Sensor network on the swiss roll manifold• Three scale ranges of relevant community structure

10000 100000 1e+06 1e+070

0.1

0.2

0.3

0.4

scale s

1−γa

p. 41

Page 45: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

The dynamic social network of a primary school

Collaboration with A. Barrat (CPT Marseille), C. Cattuto (ISI, Turin)Sociopatterns project

• Acquisition of face-to-face human contacts (resolved intime) using active RFID tags and + fixed antenna

• Interest: social studies, spreading processes (ofinformation, of epidemic,...), contact dynamics,...

• Time for a movie!p. 42

Page 46: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Multi-scale Communities in Primary School

scale s20 28 37 51 74 103

1st a1st b

2nd a2nd b3rd a3rd b4th a4th b5th a5th b

20 28 37 51 74 1030

0.5

1

scale s1−

γa

p. 43

Page 47: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Multi-scale Communities in Primary School

p. 44

Page 48: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Conclusion• Wavelet ψs,a gives an ”egocentered view“ of the network

seen from node a at scale s• Correlation between these different views gives us a

distance between nodes at scale s• This enables multi-scale clustering of nodes in

communities• Associated to a notion of stability and of statistical

detection of relevance

• I hope also that you were interested inthe emerging field of graph signal processing for networks.

http://perso.ens-lyon.fr/pierre.borgnat

Acknowledgements: thanks to Nicolas Tremblay for borrowingmany of his figures or slides.

p. 45

Page 49: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

A toy graph for introducing the methodsmallest scale (16 com.): small scale (8 com.):

medium scale (4 com.): large scale (2 com.):

p. 46

Page 50: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut with prior knowledgeLet us cheat by using prior knowledge on the number ofcommunities we are looking for.If we cut each dendrogram in two clusters

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

Using wavelets as featuresConclusion: the dendrograms at different scales contain thecommunity structure at various scales.

p. 47

Page 51: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut with prior knowledgeLet us cheat by using prior knowledge on the number ofcommunities we are looking for.If we cut each dendrogram in four clusters

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

Using wavelets as featuresConclusion: the dendrograms at different scales contain thecommunity structure at various scales.

p. 47

Page 52: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut with prior knowledgeLet us cheat by using prior knowledge on the number ofcommunities we are looking for.If we cut each dendrogram in eight clusters

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

Using wavelets as featuresConclusion: the dendrograms at different scales contain thecommunity structure at various scales.

p. 47

Page 53: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut with prior knowledgeLet us cheat by using prior knowledge on the number ofcommunities we are looking for.If we cut each dendrogram in sixteen clusters

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

Using wavelets as featuresConclusion: the dendrograms at different scales contain thecommunity structure at various scales.

p. 47

Page 54: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut with prior knowledgeLet us cheat by using prior knowledge on the number ofcommunities we are looking for.The four levels of communities.

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

Using wavelets as features

Conclusion: the dendrograms at different scales contain thecommunity structure at various scales.

p. 47

Page 55: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut with modularity• By max. of with classical modularity Q• or by max. of a filtered modularity [Arenas, Delvenne,...]

Classical Modu Opt. Filtered Modu Opt.

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

scale number

AR

coe

ffici

ent

• The solutions are not really good at all scale.

p. 48

Page 56: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal gap: non robust to outliers

0

0.5

1

1.5

corr

elat

ion

dist

ance

nodes

p. 49

Page 57: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal average gap

0 0.2 0.4 0.6 0.8 10

0.5

1

correlation distance

Γa

Γ =1

Nmax(corr. dist.)

∑a∈V

Γa

At small scale

0 0.2 0.4 0.6 0.8 10

0.5

1

correlation distance

Γ

0

0.5

1co

rrel

atio

n di

stan

ce

nodes

p. 50

Page 58: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal average gap

0 0.2 0.4 0.6 0.8 10

0.5

1

correlation distance

Γa

Γ =1

Nmax(corr. dist.)

∑a∈V

Γa

At larger scale

0 0.5 1 1.50

0.5

1

correlation distance

Γ

0

0.5

1

1.5co

rrel

atio

n di

stan

ce

nodes

p. 51

Page 59: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Dendrogram cut at maximal average gap

For the previous graph:

0 0.5 10

0.5

1

correlation distance

Γ

p. 52

Page 60: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

Recall: The Adjusted Rand IndexLet:

• C and C′ be two partitions we want to compare.• a be the # of pairs of nodes that are in the same

community in C and in the same community in C′• b be the # of pairs of nodes that are in different

communities in C and in different communities in C′• c be the # of pairs of nodes that are in the same

community in C and in different communities in C′• d be the # of pairs of nodes that are in different

communities in C and in the same community in C′

a + b is the number of “agreements“ between C and C′.c + d is the number of “disagreements“ between C and C′.

p. 53

Page 61: Graph Wavelets and Multiscale Community Mining in networks

Introduction Communities in networks Wavelets on graphs Multiscale community mining Conclusion +

The Adjusted Rand Index

The Rand index, R, is:

R =a + b

a + b + c + d=

a + b(n2

)The Adjusted Rand index AR is the corrected-for-chanceversion of the Rand index:

AR =R − ExpectedIndex

MaxIndex − ExpectedIndex

p. 54