Thrasyvoulos Spyropoulos / spyropou@eurecom.fr, Eurecom, Sophia-Antipolis

Random Walk on Graph

Random Walk
• Start from a given node at time t = 0
• Choose a neighbor uniformly at random (including the previous node) and move there
• Repeat until time t = n

Q1. Where does this converge to as n → ∞?

Q2. How fast does it converge?

Q3. What are the implications for different applications?
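A minimal sketch of the walk described above, assuming a small hypothetical undirected graph stored as adjacency lists. The graph, the seed and the names `random_walk`, `adj` and `freq` are illustrative and not taken from the slides. It records how often each node is visited, which gives an empirical first look at Q1.

```python
import random
from collections import Counter

def random_walk(adj, start, n, rng):
    """Return the nodes visited by an n-step simple random walk from `start`."""
    path = [start]
    node = start
    for _ in range(n):
        node = rng.choice(adj[node])  # uniform over neighbors; may step back
        path.append(node)
    return path

# Hypothetical 5-node undirected graph, for illustration only.
adj = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2, 5],
    4: [2, 5],
    5: [3, 4],
}

rng = random.Random(0)  # fixed seed so the run is repeatable
path = random_walk(adj, start=1, n=100_000, rng=rng)
visits = Counter(path)
freq = {v: visits[v] / len(path) for v in adj}  # empirical visit frequencies
```

On a long run, nodes with more neighbors should show higher visit frequencies, which is the pattern the following slides explain.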


Random Walks on Graphs

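• Node i has degree ki: move to any neighbor with probability 1/ki
• This is a Markov chain! Start at node i: p(0) = (0,0,…,1,…,0,0), and p(n) = p(0) A^n
• π = π A [where π = lim n→∞ p(n)]

Adjacency matrix of the example 5-node graph (left), and the transition matrix A obtained by dividing row i by ki (right):

\[
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 0
\end{pmatrix}
\qquad
A =
\begin{pmatrix}
0 & 1/k_1 & 1/k_1 & 0 & 0 \\
1/k_2 & 0 & 1/k_2 & 1/k_2 & 0 \\
1/k_3 & 1/k_3 & 0 & 0 & 1/k_3 \\
0 & 1/k_4 & 0 & 0 & 1/k_4 \\
0 & 0 & 1/k_5 & 1/k_5 & 0
\end{pmatrix}
\]

Q: What is π for a random walk on a graph?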
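The iteration p(n) = p(0) A^n can be sketched numerically. The block below assumes the 5-node 0/1 adjacency matrix shown on this slide; the helper names `transition_matrix` and `step` and the choice of 200 steps are illustrative.

```python
def transition_matrix(adjacency):
    """Row-normalize a 0/1 adjacency matrix: A[i][j] = 1/k_i if i and j are linked."""
    return [[a / sum(row) for a in row] for row in adjacency]

def step(p, A):
    """One step of the chain: the row vector p times the matrix A."""
    n = len(A)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]

adjacency = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
]
A = transition_matrix(adjacency)

p = [1.0, 0.0, 0.0, 0.0, 0.0]  # p(0): the walk starts at node 1
for _ in range(200):           # after the loop, p holds p(200) = p(0) A^200
    p = step(p, A)
```

For this graph the vector settles near (2, 3, 3, 2, 2)/12, i.e. proportional to the node degrees.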


Random Walks on Undirected Graphs

• Stationarity: π(z) = Σx π(x)p(x,z), with p(x,y) = 1/kx for every neighbor y of x
• Could try to solve these equations, or global balance. Not easy!!
• Define N(z) = {neighbors of z}. Then Σx∈N(z) kx⋅p(x,z) = Σx∈N(z) kx⋅(1/kx) = Σx∈N(z) 1 = kz
• Normalize by dividing both sides by Σx kx, where Σx kx = 2|E| (|E| = m = # of edges):
  Σx∈N(z) (kx/2|E|)⋅p(x,z) = kz/2|E|
• So π(x) = kx/2|E| is the stationary distribution: it always satisfies the stationarity equation π = πP
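The claim π(x) = kx/2|E| can be checked exactly on a small example. This sketch assumes a hypothetical undirected graph given as symmetric adjacency lists and uses exact fractions, so the comparison involves no rounding; the variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical undirected graph; the lists must be symmetric (x in adj[z] iff z in adj[x]).
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}

two_E = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2|E|
pi = {x: Fraction(len(adj[x]), two_E) for x in adj}  # candidate: k_x / 2|E|

# Right-hand side of the stationarity equation: sum over x in N(z) of pi(x) * 1/k_x
rhs = {z: sum(pi[x] * Fraction(1, len(adj[x])) for x in adj[z]) for z in adj}

is_stationary = all(rhs[z] == pi[z] for z in adj)
```

Each neighbor x of z contributes (kx/2|E|)⋅(1/kx) = 1/2|E|, and z has kz neighbors, which is the cancellation used in the derivation.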


What about Random Walks on Directed Graphs?

Assign each node centrality 1/n (for n nodes)

[Figure: an 8-node directed graph shown twice, first with every node at centrality 1/8, then with node values 4/13, 2/13, 1/13, 1/13, 1/13, 1/13, 2/13, 1/13]


A Problematic Graph

Q: What is the problem with this graph?
A: All centrality “points” will eventually go to F and G.

Solution: when at node i
1) With probability β, jump to any of the N nodes in total
2) With probability 1-β, jump to a random neighbor of i

Q: Does this remind you of something?
A: The PageRank algorithm! The PageRank of node i is the stationary probability of a random walk on this (modified) directed graph.

The factor β in the PageRank function avoids this problem by “leaking” a small amount of centrality from each node to all other nodes.
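The effect of β can be sketched with a plain power iteration. The directed graph below is hypothetical (the slide's figure is not recoverable), built so that F and G only link to each other; the function name `pagerank`, the value β = 0.15 and the assumption that every node has at least one out-link are illustrative choices.

```python
def pagerank(out_links, beta, iters=200):
    """Power iteration for the walk: jump anywhere w.p. beta, follow a link w.p. 1-beta."""
    nodes = list(out_links)
    N = len(nodes)
    x = {v: 1.0 / N for v in nodes}  # start with centrality 1/N everywhere
    for _ in range(iters):
        nxt = {v: beta / N for v in nodes}  # share received from random jumps
        for u in nodes:
            # Assumes every node has at least one out-link (no dangling nodes).
            share = (1.0 - beta) * x[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        x = nxt
    return x

# Hypothetical graph: once the walk reaches F or G it can never leave.
out_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A", "F"],
    "D": ["A", "G"],
    "E": ["F"],
    "F": ["G"],
    "G": ["F"],
}

trapped = pagerank(out_links, beta=0.0)   # no jumps: F and G absorb everything
leaky = pagerank(out_links, beta=0.15)    # jumps keep every node above beta/N
```

With β = 0 the combined score of F and G approaches 1, while with β > 0 every node keeps at least β/N.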


PageRank Centrality

PageRank as a Random Walk

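A (bored) web surfer:
• Either surfs to a linked webpage, with probability 1-β
• Or surfs to a random page (e.g. new search), with probability β

The probability of ending up at page X after a large enough time = PageRank of page X!

\[
x_i = (1-\beta) \sum_j A_{ij}\, \frac{x_j}{k_j^{\text{out}}} + \frac{\beta}{N}
\]

(A_ij = 1 if page j links to page i; the β/N term is the uniform random jump to one of the N pages.)

• Can generalize PageRank with a general vector β = (β1,β2,…,βn)
• Undirected network: removing β ⇒ degree centrality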


Applications of RW: Measuring Large Networks

• We are interested in studying the properties (degree distribution, path lengths, clustering, connectivity, etc.) of many real networks (Internet, Facebook, YouTube, Flickr, etc.), as these contain a lot of important ($$$) information
• E.g. to plot the degree distribution, we need to crawl the whole network and obtain a “degree value” for each node
• These networks might contain millions of nodes!!


Online Social Networks (OSNs)

> 1 billion users (over 15% of world’s population, and over 50% of world’s Internet users!)

October 2010

[Table: individual OSNs, identified by logo on the slide]

Size            Traffic
500 million     2
200 million     9
130 million     12
100 million     43
75 million      10
75 million      29


Measuring Facebook

Facebook:
• 500+M users
• 130 friends each (on average)
• 8 bytes (64 bits) per user ID

The raw connectivity data, with no attributes:
• 500M x 130 x 8B = 520 GB

To get this data, one would have to download:
• 100+ TB of (uncompressed) HTML data!

This is neither feasible nor practical. Solution: Sampling!


Measuring Large Networks (for the mere mortals)

Obtaining the complete dataset is difficult:
• companies are usually unwilling to share data, for privacy and performance reasons (e.g. Facebook will ban accounts if it sees extensive crawling)
• tremendous overhead to measure everything (~100TB for Facebook)

Representative samples are desirable, to:
• study properties
• test algorithms


Sampling

What:
• Topology?
• Nodes?

How:
• Directly?
• Exploration?


(1) Breadth-First-Search (BFS)

[Figure: example graph on nodes A–H; legend: Unexplored, Explored, Visited]

Starting from a seed, explores all neighbor nodes. Process continues iteratively without replacement.

BFS leads to bias towards high degree nodes

Lee et al, “Statistical properties of Sampled Networks”, Phys Review E, 2006

Early measurement studies of OSNs use BFS as primary sampling technique

e.g. [Mislove et al.], [Ahn et al.], [Wilson et al.]
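A minimal sketch of BFS sampling under a node budget, assuming a hypothetical graph on nodes A to H (the edges of the slide's figure are not recoverable). The name `bfs_sample` and the `budget` parameter are illustrative.

```python
from collections import deque

def bfs_sample(adj, seed, budget):
    """Return up to `budget` nodes in the order BFS discovers them from `seed`."""
    visited = [seed]
    seen = {seed}           # without replacement: each node is queued at most once
    queue = deque([seed])
    while queue and len(visited) < budget:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
                visited.append(nbr)
                if len(visited) == budget:
                    break
    return visited

# Hypothetical undirected graph, for illustration only.
adj = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "E", "F", "G", "H"],
    "D": ["A"],
    "E": ["B", "C"],
    "F": ["C"],
    "G": ["C"],
    "H": ["C"],
}

sample = bfs_sample(adj, seed="A", budget=5)
```

A node with many neighbors has many chances to be discovered early, which is one way to see where the bias towards high-degree nodes in an incomplete BFS comes from.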


(2) Random Walk (RW)

[Figure: example graph on nodes A–H; from the current node, each of its three neighbors is the next candidate with probability 1/3]

Explores graph one node at a time with replacement

Restart from different seeds

Or multiple seeds in parallel

Does this lead to a good sample??


Implications for Random Walk Sampling

Say we collect a small part of the Facebook graph using a Random Walk (RW):
• Higher chance to visit high-degree nodes
• High-degree nodes are over-represented
• Low-degree nodes are under-represented

[Figure: the real degree distribution, with two candidate curves labeled “sampled degree distribution 1?” and “sampled degree distribution 2?”]

[1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM 2010.


Random Walk Sampling of Facebook

Real average node degree: 94
Observed average node degree: 338

[Figure: sampled vs. real degree distribution]

Q: How can we fix this?
A: Intuition: we need to reduce (increase) the probability of visiting high (low) degree nodes
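The gap between the real average degree (94) and the observed one (338) is the kind of overestimate that the stationary distribution π(x) = kx/2|E| of the simple random walk predicts. A short worked derivation, where ⟨k⟩ and ⟨k²⟩ denote the mean and the mean square of the node degrees over the N nodes (notation introduced here, not on the slides):

```latex
\[
\mathbb{E}_{\pi}[k]
  = \sum_x k_x\,\pi(x)
  = \sum_x \frac{k_x^2}{2|E|}
  = \frac{N\,\langle k^2\rangle}{N\,\langle k\rangle}
  = \frac{\langle k^2\rangle}{\langle k\rangle}
  \;\ge\; \langle k\rangle ,
\qquad
\langle k\rangle = \frac{1}{N}\sum_x k_x = \frac{2|E|}{N},
\quad
\langle k^2\rangle = \frac{1}{N}\sum_x k_x^2 .
\]
```

The inequality holds because ⟨k²⟩ - ⟨k⟩² is the variance of the degrees, so the two averages agree only when all nodes have the same degree.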


Markov Chain Monte Carlo (MCMC)

Q: How should we modify the Random Walk?
A: Markov Chain Monte Carlo theory

• Original chain: move x → y with probability Q(x,y); stationary distribution π(x)
• Desired chain: stationary distribution w(x) (for uniform sampling: w(x) = 1/N)
• New transition probabilities:

\[
P(x,y) =
\begin{cases}
Q(x,y)\,a(x,y) & \text{if } y \neq x \\
1 - \sum_{z \neq x} Q(x,z)\,a(x,z) & \text{if } y = x
\end{cases}
\]


MCMC (2)

\[
P(x,y) =
\begin{cases}
Q(x,y)\,a(x,y) & \text{if } y \neq x \\
1 - \sum_{z \neq x} Q(x,z)\,a(x,z) & \text{if } y = x
\end{cases}
\]

a(x,y): probability of accepting the proposed move

Q: How should we choose a(x,y) so as to converge to the desired stationary distribution w(x)?

A: w(x) is the stationary distribution if w(x)P(x,y) = w(y)P(y,x) (for all x,y)

Q: Why? These are the local balance (time-reversibility) equations:
• w(x)Q(x,y)a(x,y) = w(y)Q(y,x)a(y,x) (denote both sides by b(x,y) = b(y,x))
• a(x,y) ≤ 1 (probability) ⇒ b(x,y) ≤ w(x)Q(x,y)
• b(x,y) = b(y,x) ≤ w(y)Q(y,x)

\[
a(x,y) = \min\left\{1, \frac{w(y)\,Q(y,x)}{w(x)\,Q(x,y)}\right\}
\]


MCMC for Uniform Sampling

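• w(x) = w(y) (= 1/n … doesn’t really matter)
• Q(y,x)/Q(x,y) = kx/ky

\[
a(x,y) = \min\left\{1, \frac{w(y)\,Q(y,x)}{w(x)\,Q(x,y)}\right\}
\quad\Longrightarrow\quad
a(x,y) = \min\left\{1, \frac{k_x}{k_y}\right\}
\]

Metropolis-Hastings random walk:
• A move to a lower-degree node is always accepted
• A move to a higher-degree node is rejected with a probability related to the degree ratio (1 - kx/ky)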
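The acceptance rule a(x,y) = min{1, kx/ky} can be turned into a full transition matrix and checked against the uniform target. The sketch below assumes a hypothetical graph with no self-loops (node A has degree 3 and its neighbor C has degree 5); `mh_transition` is an illustrative name and exact fractions are used so the checks involve no rounding.

```python
from fractions import Fraction

def mh_transition(adj):
    """P[x][y] for the Metropolis-Hastings walk targeting the uniform distribution."""
    P = {x: {y: Fraction(0) for y in adj} for x in adj}
    for x in adj:
        kx = len(adj[x])
        for y in adj[x]:
            ky = len(adj[y])
            accept = min(Fraction(1), Fraction(kx, ky))  # a(x,y) = min{1, kx/ky}
            P[x][y] = Fraction(1, kx) * accept           # Q(x,y) a(x,y)
        P[x][x] = 1 - sum(P[x][y] for y in adj[x])       # rejected proposals stay at x
    return P

# Hypothetical undirected graph, for illustration only.
adj = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "E", "F", "G", "H"],
    "D": ["A"],
    "E": ["B", "C"],
    "F": ["C"],
    "G": ["C"],
    "H": ["C"],
}

P = mh_transition(adj)
rows_ok = all(sum(P[x].values()) == 1 for x in adj)        # each row is a distribution
cols_ok = all(sum(P[x][y] for x in adj) == 1 for y in adj)  # uniform w satisfies w = wP
```

Here P(x,y) = min(1/kx, 1/ky) for neighbors, so the matrix is symmetric, and a symmetric stochastic matrix always has the uniform distribution as a stationary distribution.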


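Metropolis-Hastings (MH) Random Walk

• Explore graph one node at a time with replacement

\[
P^{MH}_{w,v} =
\begin{cases}
\dfrac{1}{k_w}\min\left(1, \dfrac{k_w}{k_v}\right) & \text{if } v \text{ is a neighbor of } w \\[2ex]
1 - \sum_{y \neq w} P^{MH}_{w,y} & \text{if } v = w
\end{cases}
\]

• In the stationary distribution:

\[
\pi_v = \frac{1}{|V|}
\]

[Figure: example graph on nodes A–H with the current node and the next candidate marked; the value 2/15 is marked on the graph]

\[
P^{MH}_{AC} = \frac{1}{3}\cdot\frac{3}{5} = \frac{1}{5},
\qquad
P^{MH}_{AA} = 1 - \left(\frac{1}{3} + \frac{1}{3} + \frac{1}{5}\right) = \frac{2}{15}
\]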


Degree Distribution of FB with MHRW

Sampled degree distribution almost identical to real one

MCMC methods have MANY other applications:
• Sampling
• Optimization


Node Importance: Who is most “central”?

[Figure: example network with three nodes marked “?”]


Node Centrality: Depends on Application

• Influence: Which social network nodes should I pick to advertise/spread a video/product/opinion?
• Resilience: Which node(s) should I attack to disconnect the network?
• Malware/Virus Infection: Which nodes should I immunize (e.g. upload a patch) to stop a given Internet “worm” from spreading quickly?
• Performance: Which nodes are the bottleneck in a network?
• Search Engines: Which nodes contain the most relevant information?

A centrality measure implicitly solves some optimization problem


Centrality: Importance based on network position

In each of the following networks, X has higher centrality than Y according to a particular measure

[Figure: four example networks, labeled indegree, outdegree, betweenness, closeness]


Degree Centrality

He who has many friends is most important.

When is the number of connections the best centrality measure?
• people who will do favors for you
• people you can talk to (influence set, information access, …)
• influence of an article in terms of citations (using in-degree)


Normalized Degree Centrality

Divide by the max. possible, i.e. (N-1)


Betweenness Centrality: Definition

\[
C_B(i) = \sum_{j,k} \frac{g_{jk}(i)}{g_{jk}}
\]

• C_B(i): betweenness of vertex i
• g_jk(i): the number of shortest paths between j and k that pass through i
• g_jk: the number of all shortest paths between j and k

Usually normalized by:

\[
C_B'(i) = \frac{C_B(i)}{n^2}
\]
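The definition can be evaluated directly on small graphs by counting shortest paths with BFS. The sketch below makes two assumptions that the slide does not spell out: the sum runs over unordered pairs j, k with both endpoints different from i, and the result is left non-normalized. The function names and the path graph are illustrative.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

def bfs_counts(adj, s):
    """Distances from s, and the number of shortest paths from s to each node."""
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """C_B(i) = sum over pairs j<k (j, k != i) of g_jk(i) / g_jk, non-normalized."""
    info = {s: bfs_counts(adj, s) for s in adj}
    C = {i: Fraction(0) for i in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:                # j and k are disconnected
            continue
        for i in adj:
            if i in (j, k):
                continue
            dist_i, sigma_i = info[i]
            # i is on a shortest j-k path iff d(j,i) + d(i,k) = d(j,k)
            if i in dist_j and k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                C[i] += Fraction(sigma_j[i] * sigma_i[k], sigma_j[k])
    return C

# Hypothetical path graph 1 - 2 - 3 - 4 - 5.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
C = betweenness(adj)
```

On the path graph the middle node 3 lies on the shortest paths of the pairs (1,4), (1,5), (2,4) and (2,5), so its score is 4, while the end nodes score 0. This brute-force version is cubic in the number of nodes and is only meant for toy examples.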


Betweenness on toy networks

Non-normalized version:

[Figure: toy networks annotated with node betweenness; one element is labeled “bridge”]


Betweenness vs. Degree Centrality

Nodes are sized by degree, and colored by betweenness.

Can you spot nodes with high betweenness but relatively low degree?

What about high degree but relatively low betweenness?


Why is Betweenness Centrality Important?

Connectivity
a) Remove random node
b) Remove high degree node
c) Remove high betweenness node


Why is Betweenness Centrality Important?

• The network below is a wireless network (e.g. sensor network)
• Nodes run on battery, with total energy Emax
• Each node picks a destination randomly and sends data at a constant rate
• Every packet going through a node spends E of its energy

Q: How long would it take until the first node runs out of battery?

[Figure: wireless network with source-destination pairs S1-D1 and S2-D2]


How About in This Network?


Why is Betweenness Centrality Important?

Monitoring
a) Where would you place a traffic monitor in order to track the maximum number of packets (if this was your university network)?
b) Where would you place traffic cameras if this was a street network?


Why is Betweenness Centrality Important?

Traffic Flow: each link has capacity 1
Q: What is the maximum throughput between S-D?
A: Max Flow – Min Cut theorem: the max flow is equal to the min number of links that must be removed to disconnect S-D ⇒ S-D throughput = 1

[Figure: network with source S and destination D]


Spectral Analysis of (ergodic) Markov Chains

• If a Markov Chain (defined by transition matrix P) is ergodic (irreducible, aperiodic, and positive recurrent), then P^(n)_ik → πk as n → ∞, with π = [π1, π2,…, πn]

Q: But how fast does the chain converge? E.g. how many steps until we are “close enough” to π?

A: This depends on the eigenvalues of P. The convergence time is also called the mixing time.


Eigenvalues and Eigenvectors of matrix P

Left Eigenvectors
• A row vector π is a left eigenvector for eigenvalue λ of matrix P iff πP = λπ ⇔ Σk πk pki = λπi for all i

Right Eigenvectors
• A column vector v is a right eigenvector for eigenvalue λ of matrix P iff Pv = λv ⇔ Σk pik vk = λvi for all i

Q: What eigenvalues and eigenvectors can we guess already?

A:
• λ = 1 is a left eigenvalue with eigenvector π, the stationary distribution
• λ = 1 is a right eigenvalue with eigenvector v = 1 (all 1s)


Eigenvalues and Eigenvectors for 2-state Chains

• Both sets of equations have non-zero solutions ⇔ (P-λI) is singular ⇔ there exists v ≠ 0 such that (P-λI)v = 0
⇒ Determinant |P-λI| = 0
⇒ (p11-λ)(p22-λ) - p12p21 = 0
⇒ λ1 = 1, λ2 = 1 – p12 – p21 (substitute above and confirm using some algebra)
• |λ2| < 1 for an ergodic chain
• Eigenvectors are normalized so that π^(1) is a stationary distribution AND π^(i)⋅v^(i) = 1, ∀i
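A small numerical check of the 2-state results, assuming hypothetical transition probabilities p12 = 0.3 and p21 = 0.1. The closed form used for the stationary distribution, π = (p21, p12)/(p12 + p21), is a standard fact for 2-state chains that is not stated on the slide, and all names are illustrative.

```python
def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

p12, p21 = 0.3, 0.1                      # hypothetical transition probabilities
P = [[1 - p12, p12], [p21, 1 - p21]]

lam1, lam2 = 1.0, 1 - p12 - p21          # the two eigenvalues
pi = [p21 / (p12 + p21), p12 / (p12 + p21)]  # stationary distribution (assumed closed form)

def char_poly(lam):
    """Determinant |P - lam I| = (p11 - lam)(p22 - lam) - p12 p21."""
    return (P[0][0] - lam) * (P[1][1] - lam) - P[0][1] * P[1][0]

# Distance of the (1,1) entry of P^n from its limit pi[0], for n = 1..10.
Pn = [[1.0, 0.0], [0.0, 1.0]]
errors = []
for n in range(1, 11):
    Pn = matmul2(Pn, P)
    errors.append(abs(Pn[0][0] - pi[0]))
```

Both eigenvalues make the determinant vanish, and each entry of `errors` shrinks by the factor |λ2| = 0.6 from one step to the next.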


Diagonalization

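⇒ Eigenvalue decomposition: P = U Λ U^{-1}

\[
P = U \Lambda U^{-1} =
\begin{pmatrix} v_1^{(1)} & v_1^{(2)} \\ v_2^{(1)} & v_2^{(2)} \end{pmatrix}
\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
\begin{pmatrix} \pi_1^{(1)} & \pi_2^{(1)} \\ \pi_1^{(2)} & \pi_2^{(2)} \end{pmatrix}
\]

Q: What is P^n?

A:

\[
P^n = U \Lambda^n U^{-1} =
\begin{pmatrix} v_1^{(1)} & v_1^{(2)} \\ v_2^{(1)} & v_2^{(2)} \end{pmatrix}
\begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix}
\begin{pmatrix} \pi_1^{(1)} & \pi_2^{(1)} \\ \pi_1^{(2)} & \pi_2^{(2)} \end{pmatrix}
\]

Q: How fast does the chain converge to the stationary distribution?

A: It converges exponentially fast in n, as (λ2)^n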


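Generalization for M-state Markov Chains

• We’ll assume that there are M distinct eigenvalues (see notes for repeated ones)
• Matrix P is stochastic ⇒ all eigenvalues satisfy |λi| ≤ 1

Q: Why?

A: Every entry of P^n is a probability, so no term λi^n in the expansion below can grow without bound.

\[
P^n = U \Lambda^n U^{-1} = \sum_{i=1}^{M} \lambda_i^n\, v^{(i)} \pi^{(i)}
\]

With λ1 = 1, v^(1) = 1 (all 1s) and π^(1) = π:

\[
P^n = \mathbf{1}\,\pi + \lambda_2^n\, v^{(2)} \pi^{(2)} + \dots + \lambda_M^n\, v^{(M)} \pi^{(M)}
\]

Q: How fast does an (ergodic) chain converge to the stationary distribution?

A: Exponentially, with rate equal to the 2nd largest eigenvalue (in magnitude)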


Speed of Sampling on this Network?

λ2 (2nd largest eigenvalue) related to (balanced) min-cut of the graph

The more “partitioned” a graph is into clusters with few links between them ⇒ the longer the convergence time for the respective MC ⇒ the slower the random walk search


Community Detection - Clustering


Laplacian

L = D - A, where D is the diagonal degree matrix, dii = di

[Figure: example graph on nodes 1–4 and its Laplacian]


Weighted Laplacian

[Figure: example graph on nodes 1–4 with edge weights 10, 0.3, 2 and 4, and its weighted Laplacian]


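Laplacian: fast facts

• Every row of L sums to zero, i.e. L·1 = 0; so, zero is an eigenvalue
• If the graph has k connected components, zero is an eigenvalue with multiplicity k
• Fiedler (‘73) called the second-smallest eigenvalue the “algebraic connectivity of a graph”. The further from 0, the more connected.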


Connected Components

[Figure: graph G(V,E) on nodes 1–7, its Laplacian L and the eigenvalues eig(L)]

#zeros = #components
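The rule “#zeros = #components” can be sketched without an eigen-solver. Because L is symmetric, the multiplicity of the eigenvalue 0 equals the dimension of its null space, n - rank(L), and the block below computes that rank by exact Gaussian elimination. The 7-node edge list is hypothetical (nodes are numbered 0 to 6 here, and the slide's edges are not recoverable), and all function names are illustrative.

```python
from fractions import Fraction

def laplacian(n, edges):
    """L = D - A for an undirected, unweighted graph on nodes 0..n-1."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L

def rank(M):
    """Matrix rank by Gaussian elimination over exact fractions."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            if factor != 0:
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def zero_eigenvalues(n, edges):
    """Multiplicity of eigenvalue 0 of L, computed as n - rank(L)."""
    return n - rank(laplacian(n, edges))

# Hypothetical graph with two components: {0, 1, 2} and {3, 4, 5, 6}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (5, 6), (3, 6)]
two_parts = zero_eigenvalues(7, edges)
joined = zero_eigenvalues(7, edges + [(2, 3)])  # one extra link merges the components
```

This counts exact zeros only; spotting a near-zero eigenvalue, as for a weakly linked graph, would need a numerical eigen-solver instead.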


Connected Components

[Figure: graph G(V,E) on nodes 1–7 with one link of weight 0.01, its Laplacian L and the eigenvalues eig(L); one eigenvalue is annotated: indicates a “good cut”]

#zeros = #components


Spectral Image Segmentation (Shi-Malik ‘00)


The second eigenvector


Second Eigenvector’s sparsest cut