Eurecom, Sophia-Antipolis Thrasyvoulos Spyropoulos / [email protected] Random Graph Models:...


Random Graph Models: Create/Explain Complex Network Properties


Random Graph Models: Why Do We Need Them?

- The networks discussed are quite large: impossible to describe or visualize explicitly.
- Consider this example:
  - You have a new Internet routing algorithm.
  - You want to evaluate it, but do not have a trace of the Internet topology.
  - You decide to create an “Internet-like” graph on which you will run your algorithm.
  - How do you describe/create this graph?
- Random graphs: local and probabilistic rules by which vertices are connected.
- Goal: from simple probabilistic rules to observed complexity.
- Q: Which rules give us (most of) the observed properties?


Emergence of Complexity



Emergent Complexity in Cellular Automata

This is “Conway’s Game of Life” (there are many other automata).
- http://www.youtube.com/watch?v=ma7dwLIEiYU&feature=related (demo)
- http://www.bitstorm.org/gameoflife/ (try your own)

Local Rules
- Each cell is either white or blue.
- Each cell interacts with its 8 neighbors.
- Time is discrete (rounds).
1. Any blue cell with fewer than two blue neighbors becomes white.
2. Any blue cell with two or three blue neighbors lives on to the next round.
3. Any blue cell with more than three blue neighbors becomes white.
4. Any white cell with exactly three blue neighbors becomes blue.
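
The four local rules can be sketched in a few lines of Python. This is only an illustration, assuming an unbounded grid and representing the board as a set of blue (row, col) cells; the name `life_step` and the blinker pattern are choices made here, not part of the slides.

```python
from collections import Counter

def life_step(blue):
    """Advance one round. `blue` is a set of (row, col) cells that are blue."""
    # For every cell adjacent to at least one blue cell, count its blue neighbors.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in blue
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Rules 1-3: a blue cell stays blue only with 2 or 3 blue neighbors.
    # Rule 4: a white cell turns blue with exactly 3 blue neighbors.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in blue)
    }

# A horizontal row of three blue cells (the "blinker") oscillates with period 2.
blinker = {(1, 0), (1, 1), (1, 2)}
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Nothing in the update looks beyond a cell's 8 neighbors, yet patterns such as the blinker already show global, time-periodic structure.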


Back to Networks: (Erdős-Rényi) Random Graphs

- A very (very!) simple local rule: (any) two vertices are connected with probability p.
- Only inputs: number of vertices n and probability p.
- Denote this class of graphs as G(n,p).

Erdős-Rényi model (1960): connect each pair of nodes with probability p.

Example: N = 10, p = 1/6, average degree <k> ≈ 1.5
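
A minimal sketch of drawing one realization of G(n,p), using the slide's example values N = 10 and p = 1/6. The function name `sample_gnp` and the seed are illustrative assumptions.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Return the edge list of one G(n, p) realization on nodes 0..n-1."""
    rng = random.Random(seed)
    # Every unordered pair is linked independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 10, 1 / 6
edges = sample_gnp(n, p, seed=1)
# Each link contributes to the degree of two nodes.
mean_degree = 2 * len(edges) / n
print(len(edges), "links, average degree", mean_degree)
```

Different seeds give different graphs, so the measured average degree fluctuates around p(N-1) = 1.5 rather than matching it exactly.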


How Many Networks in G(n,p)?

N and p do not uniquely define the network: we can have many different realizations of it. How many? Each of the N(N-1)/2 possible links is either present or absent, so there are

$$2^{N(N-1)/2}$$

possible graphs. Example: G(10, 1/6), i.e. N = 10, p = 1/6.

G(N,L): a graph with N nodes and L links. The probability to form a particular graph G(N,L) is

$$P(G(N,L)) = p^{L}\,(1-p)^{\frac{N(N-1)}{2}-L}$$

That is, each graph G(N,L) appears with probability P(G(N,L)).


Relation of G(N,p) to G(N,L)

P(L): the probability to have exactly L links in a network of N nodes and probability p:

$$P(L) = \binom{\binom{N}{2}}{L}\, p^{L}\,(1-p)^{\frac{N(N-1)}{2}-L}$$

- $\binom{N}{2} = N(N-1)/2$: the maximum number of links in a network of N nodes.
- $\binom{\binom{N}{2}}{L}$: the number of different ways we can choose L links among all potential links.

This is a binomial distribution.


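G(N,p) statistics

P(L): the probability to have a network of exactly L links:

$$P(L) = \binom{\binom{N}{2}}{L}\, p^{L}\,(1-p)^{\frac{N(N-1)}{2}-L}$$

The average number of links <L> in a random graph:

$$\langle L\rangle = \sum_{L=0}^{N(N-1)/2} L\,P(L) = p\,\frac{N(N-1)}{2}$$

The standard deviation σ is given by:

$$\sigma^{2} = p(1-p)\,\frac{N(N-1)}{2}$$

Average node degree <k>:

$$\langle k\rangle = \frac{2\langle L\rangle}{N} = p(N-1)$$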
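
As a sanity check on these formulas, the sketch below evaluates them for the earlier example (N = 10, p = 1/6) and compares the mean number of links against a small simulation. The helper names, trial count, and seed are assumptions made for illustration.

```python
import random

def link_stats(n, p):
    """Closed-form <L>, variance of L, and <k> for G(n, p)."""
    pairs = n * (n - 1) // 2          # maximum number of links
    mean_links = p * pairs            # <L> = p N(N-1)/2
    var_links = p * (1 - p) * pairs   # sigma^2 = p(1-p) N(N-1)/2
    mean_degree = p * (n - 1)         # <k> = p(N-1)
    return mean_links, var_links, mean_degree

def simulate_link_counts(n, p, trials, seed=0):
    """Number of links in `trials` independent G(n, p) realizations."""
    rng = random.Random(seed)
    pairs = n * (n - 1) // 2
    return [sum(rng.random() < p for _ in range(pairs)) for _ in range(trials)]

mean_links, var_links, mean_degree = link_stats(10, 1 / 6)
counts = simulate_link_counts(10, 1 / 6, trials=2000)
empirical_mean = sum(counts) / len(counts)
print(mean_links, var_links, mean_degree, empirical_mean)
```

For these values the formulas give <L> = 7.5, σ² = 6.25, and <k> = 1.5; the simulated mean should land close to 7.5.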


G(N,p) as N → ∞

As the network size increases, the distribution becomes increasingly narrow, which means that we are increasingly confident that the number of links the graph has is in the vicinity of <L>:

$$\frac{\sigma}{\langle L\rangle} = \left[\frac{1-p}{p}\,\frac{2}{N(N-1)}\right]^{1/2} = O\!\left(\frac{1}{N}\right)$$


Random Graphs: Degree Distribution

The degree distribution is binomial:

$$p(k) = B(N,k,p) = \binom{N-1}{k}\, p^{k}\,(1-p)^{(N-1)-k}$$

- average degree: <k> = p(N-1)
- variance: σ² = p(1-p)(N-1)
- relative width (for fixed p):

$$\frac{\sigma_k}{\langle k\rangle} = \left[\frac{1-p}{p}\,\frac{1}{N-1}\right]^{1/2} \propto \frac{1}{(N-1)^{1/2}}$$

Assuming z = Np is fixed, as N → ∞, B(N,k,p) is approximated by a Poisson distribution:

$$p(k) = P(k;z) = e^{-z}\,\frac{z^{k}}{k!}$$

As N → ∞:
- Highly concentrated around the mean
- Probability of very high node degrees is exponentially small
- Very different from a power law!
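
To see the Poisson approximation numerically, the sketch below fixes z = Np and measures the largest pointwise gap between the binomial and Poisson degree probabilities as N grows. The value z = 4, the range of k inspected, and the function names are illustrative assumptions.

```python
import math

def binomial_degree_pmf(k, n, p):
    """P(degree = k) in G(n, p): k successes among the n-1 other nodes."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pmf(k, z):
    """Poisson probability e^{-z} z^k / k!."""
    return math.exp(-z) * z**k / math.factorial(k)

z = 4.0
gaps = []
for n in (10, 100, 10000):
    p = z / n  # keep z = N p fixed while N grows
    gap = max(
        abs(binomial_degree_pmf(k, n, p) - poisson_pmf(k, z))
        for k in range(min(n, 20))
    )
    gaps.append(gap)
    print(n, gap)
```

The gap shrinks as N increases, which is why sparse Erdős-Rényi graphs are often called Poisson graphs.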


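Are Erdős-Rényi (Poisson) Graphs Small-World?

The secret behind the small-world effect: looking at the network volume, i.e. the number of nodes N(d) within distance d of a node. In the slide's geometric example, the number of nodes at distance exactly d is

$$S(d) = 4d$$

The Volume of Geometric Graphs

$$N(d) = \sum_{x=1}^{d} 4x = 2d(d+1) \sim d^{2}$$

Polynomial growth.

The Exploding Volume of Random Graphs

$$N(d) = \sum_{x=0}^{d} \langle k\rangle^{x} = \frac{\langle k\rangle^{d+1}-1}{\langle k\rangle-1} \sim \langle k\rangle^{d}$$

Exponential growth.

The Exploding Volume of Random Graphs (2)

The whole network is covered once the volume reaches N:

$$\langle k\rangle^{d} \approx N \;\Rightarrow\; d = \log_{\langle k\rangle} N = \frac{\ln N}{\ln\langle k\rangle}$$

$$l_{max} = \frac{\log N}{\log\langle k\rangle}$$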
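
The contrast between polynomial and exponential volume growth, and the resulting distance estimate, can be tabulated directly. This is a sketch of the formulas only; the function names and the example values (N = 10^6, <k> = 10) are assumptions chosen for illustration.

```python
import math

def lattice_volume(d):
    """Nodes within distance d when S(x) = 4x: 2 d (d + 1)."""
    return 2 * d * (d + 1)

def random_graph_volume(mean_degree, d):
    """Nodes within distance d when S(x) = <k>^x (geometric sum from x = 0)."""
    return (mean_degree ** (d + 1) - 1) / (mean_degree - 1)

def distance_estimate(n, mean_degree):
    """Distance at which <k>^d reaches N: ln N / ln <k>."""
    return math.log(n) / math.log(mean_degree)

for d in (1, 2, 4, 8):
    print(d, lattice_volume(d), random_graph_volume(10, d))

print(distance_estimate(10**6, 10))
```

With a million nodes and average degree 10, the estimate is about 6 hops, whereas the lattice volume at d = 6 is only 84 nodes.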

Distance in Random Graphs: Compare with Real Data

Given the huge differences in scope, size, and average degree, the agreement is excellent!


Random Graphs: Clustering Coefficient

Consider a random graph G(n,p).

Q: What is the probability that two of your neighbors are also neighbors?
A: It is equal to p, independent of local structure.

- clustering coefficient C = p
- when z is fixed (sparse networks): C = z/n = O(1/n)
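
A small experiment can illustrate C = p. The sketch below samples one G(n,p) graph and measures the fraction of neighbor pairs that are themselves linked, pooled over all nodes. The pooled definition, the names, and the parameters n = 200, p = 0.1 are assumptions made here.

```python
import random
from itertools import combinations

def sample_gnp_adjacency(n, p, seed=None):
    """One G(n, p) realization as a dict: node -> set of neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def clustering_coefficient(adj):
    """Fraction of pairs of neighbors (of any node) that are linked to each other."""
    closed = 0
    total = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            total += 1
            if b in adj[a]:
                closed += 1
    return closed / total if total else 0.0

p = 0.1
c = clustering_coefficient(sample_gnp_adjacency(200, p, seed=42))
print("measured C:", c, "expected about p =", p)
```

Because each link is drawn independently, knowing that two nodes share a neighbor says nothing about whether they are linked, so the measured value stays near p.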


Clustering in Random Graphs: Compare with Real Data

Given the huge differences in scope, size, and average degree, there is a clear disagreement.


Summary: Are Real Networks Random Graphs?

- (√) Erdős-Rényi graphs are “small world”: path lengths are O(log n).
- (X) Erdős-Rényi graphs are not “scale-free”:
  - Degree distribution is binomial and highly concentrated (no power law)
  - Exponentially small probability to have “hubs” (no heavy tail)
- (X) Erdős-Rényi graphs are not “clustered”: C → 0 as N becomes larger.

Conclusion: ER random graphs are not a good model of real networks. BUT: they still provide a great deal of insight!


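Clustering inhibits the small-worldness

Some of your neighbors' neighbors are also your own. Ignoring this overlap, the number of nodes at each distance is

$$S(1) = \langle k\rangle,\quad S(2) = \langle k\rangle^{2},\quad \dots,\quad S(d) = \langle k\rangle^{d}$$

Exponential growth: $d = \ln N / \ln\langle k\rangle$

Poisson Graph Diameter: Growth is slightly slower

$$S(0) = 1,\quad S(1) = \langle k\rangle$$

[The slide's expressions for S(2), S(3), and the general S(d) in terms of N, p, and <k> are not recoverable from the extracted text.]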


Small World Graphs: Watts-Strogatz Model

- Short paths must be combined with a high clustering coefficient.
- Watts and Strogatz model [WS98]:
  - Start with a ring, where every node is connected to the next k nodes.
  - With probability p, rewire every edge (or, add a shortcut) to a random node.
- p = 0: order; 0 < p < 1: in between; p = 1: randomness.
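
The rewiring procedure might be sketched as follows. Assumptions made here, not taken from the slide: k is even and each node starts linked to its k/2 nearest neighbors on each side of the ring, n is larger than k, and a rewired edge keeps one endpoint and avoids self-loops and duplicate links. The name `watts_strogatz` is illustrative.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Return a set of undirected edges (frozensets) after Watts-Strogatz rewiring."""
    rng = random.Random(seed)
    half = k // 2
    # Ring lattice: node u is linked to u+1, ..., u+k/2 (mod n).
    edges = {
        frozenset((u, (u + step) % n))
        for u in range(n)
        for step in range(1, half + 1)
    }
    # Visit every lattice edge once; with probability p move its far end.
    for u in range(n):
        for step in range(1, half + 1):
            old = frozenset((u, (u + step) % n))
            if old in edges and rng.random() < p:
                candidates = [
                    w for w in range(n)
                    if w != u and frozenset((u, w)) not in edges
                ]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((u, rng.choice(candidates))))
    return edges

ring = watts_strogatz(20, 4, 0.0)
small_world = watts_strogatz(20, 4, 0.1, seed=5)
random_like = watts_strogatz(20, 4, 1.0, seed=5)
```

Rewiring moves edges rather than adding them, so the number of links, and hence the average degree, is the same for every p.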


Small World Graphs (2)

Clustering Coefficient and Characteristic Path Length (plotted against p on a log scale)

- When p = 0: C = 3(k-2)/4(k-1) ~ ¾, L = n/k
- For small p: C ~ ¾, L ~ log n

The Watts-Strogatz model: it takes a lot of randomness to ruin the clustering, but a very small amount to overcome locality.


Online Social Networks

- Nodes: online users
- Links: email contact, tweet, or friendship

All distributions show a fat-tail behavior: there are orders of magnitude spread in the degrees.

Source: Alan Mislove, Measurement and Analysis of Online Social Networks


World Wide Web


Scale-free Graphs: What About Power Laws?

The configuration model
- input: the degree sequence [d1, d2, …, dn]
- process:
  - Create di copies of node i; link them randomly
  - Take a random matching (pairing) of the copies
    - self-loops and multiple edges are allowed

Example degree sequence: 4, 1, 3, 2

But: Too artificial!
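
A minimal sketch of the configuration model, applied to the example degree sequence 4, 1, 3, 2. The function name and seed are illustrative; as on the slide, self-loops and multiple edges are allowed.

```python
import random
from collections import Counter

def configuration_model(degrees, seed=None):
    """Pair up node copies ("stubs") uniformly at random; returns an edge list."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # Create d_i copies of node i.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    # A random shuffle followed by pairing neighbors is a uniform random matching.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

degrees = [4, 1, 3, 2]
edges = configuration_model(degrees, seed=3)
# Every edge end counts once, so a self-loop adds 2 to its node's degree.
realized = Counter(node for edge in edges for node in edge)
print(edges, dict(realized))
```

Whatever the random pairing, each node ends up with exactly its prescribed degree, which is the point of the model: the degree sequence is imposed rather than explained.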


One Explanation of Scale-Free(ness): Growth

- ER, WS models: the number of nodes, N, is fixed (static models).
- Networks continuously expand by the addition of new nodes.

Barabási & Albert, Science 286, 509 (1999)


Growth Models

(1) Networks continuously expand by the addition of new nodes: add a new node with m links.

Barabási & Albert, Science 286, 509 (1999)


Growth Models: Preferential Attachment

Q: Where will the new node link to?
- ER, WS models: choose randomly.

A: New nodes prefer to link to highly connected nodes.

PREFERENTIAL ATTACHMENT: the probability that a node connects to a node with k links is proportional to k.

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$$

Barabási & Albert, Science 286, 509 (1999)


Preferential Attachment in Networks

“The rich get richer”

First considered by [Price 65] as a model for citation networks:
- each new paper is generated with m citations (on average)
- new papers cite previous papers with probability proportional to their indegree (citations)
- what about papers without any citations?
  - each paper is considered to have a “default” citation
  - probability of citing a paper with degree k is proportional to k+1

Power law with exponent α = 2 + 1/m


Barabási-Albert model

The BA model (undirected graph)
- input: some initial subgraph G0, and m, the number of edges per new node
- the process:
  - nodes arrive one at a time
  - each node connects to m other nodes, selecting them with probability proportional to their degree
  - if [d1, …, dt] is the degree sequence at time t, node t+1 links to node i with probability

$$\frac{d_i}{\sum_j d_j} = \frac{d_i}{2mt}$$

Results in a power law with exponent α = 3.

Various problems: cannot account for every power law observed (Web), correlates age with degree, etc.
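
The growth process might be sketched as below. Two assumptions are made here that the slide leaves open: G0 is taken to be a complete graph on m+1 nodes, and the m targets of a new node are forced to be distinct by redrawing duplicates, which deviates slightly from exact degree-proportional sampling. The name `barabasi_albert` is illustrative.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches to m distinct existing nodes."""
    rng = random.Random(seed)
    # Assumed G0: complete graph on nodes 0..m.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Node i appears d_i times in this list, so a uniform draw from it
    # selects node i with probability d_i / sum_j d_j.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = barabasi_albert(200, 3, seed=11)
degree = Counter(node for edge in edges for node in edge)
print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```

The endpoint list is a common trick for degree-proportional sampling without recomputing probabilities at every step. Early nodes tend to accumulate the largest degrees, which is the age-degree correlation listed among the model's problems.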


Comparison with Real Networks (B-A Model)

- Path length: $\ln N / \ln(\ln N)$
- Clustering coefficient: larger than ER, but still goes to 0 as N → ∞


Network Resilience or How to Break a Network


Phase Transitions in Random Graphs

- We saw that increasing p → denser networks.
- In the large-N case, we increase z = Np → the average degree.
- But what really happens as p (or z) increases?

A random network on 50 nodes:
- p = 0.01 → disconnected, largest component = 3


Phase Transitions in Random Graphs (2)

- p = 0.03 → a large component appears, but almost 40% of nodes are still disconnected
- p = 0.05 → a “giant” component emerges; only 3 nodes are disconnected. Giant component → the graph “percolates”
- p = 0.10 → all nodes connected
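
The 50-node experiment is easy to repeat. The sketch below draws one G(50, p) graph for each p used on the slides and reports the size of its largest connected component, using a union-find structure. The seed and names are assumptions, and the exact sizes depend on the random draw, so they will not match the slide's figures.

```python
import random
from collections import Counter
from itertools import combinations

def largest_component_size(n, p, seed=None):
    """Size of the largest connected component in one G(n, p) realization."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(u)] = find(v)  # merge the two components
    return max(Counter(find(v) for v in range(n)).values())

sizes = {p: largest_component_size(50, p, seed=7) for p in (0.01, 0.03, 0.05, 0.10)}
print(sizes)
```

With N = 50, the values p = 0.01, 0.03, 0.05, 0.10 correspond to average degrees of roughly 0.5, 1.5, 2.5, and 5, which straddle the transition at <k> = 1.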


S: the fraction of nodes in the giant component, S = N_GC / N

There is a phase transition at <k> = 1:
- for <k> < 1 there is no giant component
- for <k> > 1 there is a giant component
- for large <k> the giant component contains all nodes (S = 1)
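
The transition can also be computed without simulation. The sketch below relies on a standard result for Poisson random graphs that is not stated on the slide: in the large-N limit, S satisfies S = 1 − exp(−<k> S). It solves this by fixed-point iteration; the function name and iteration count are assumptions.

```python
import math

def giant_component_fraction(mean_degree, iterations=10000):
    """Largest solution of S = 1 - exp(-<k> S), found by fixed-point iteration."""
    s = 1.0  # start from "everything is in the giant component"
    for _ in range(iterations):
        s = 1.0 - math.exp(-mean_degree * s)
    return s

for k in (0.5, 1.5, 2.0, 4.0, 10.0):
    print(k, round(giant_component_fraction(k), 4))
```

Below <k> = 1 the iteration collapses to S = 0; above it, S rises quickly (about 0.80 at <k> = 2) and approaches 1 for large <k>.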

Figure: Connectivity (“Percolation”) of Random Graphs, S versus <k> (http://linbaba.files.wordpress.com/2010/10/erdos-renyi.png)


Network Resilience

- Q1: How does the degree distribution affect resilience?
- Q2: How does the removal strategy affect resilience?

Def: a network is still “functional” as long as there is a “giant component”.

Def: a “giant component” S contains a finite percentage of all nodes n as n → ∞: S = c•n (or Θ(n)).


Uniform removal of vertices

Def: φ = probability that a vertex has not been removed, i.e. the percentage of vertices present.

Figure panels: φ = 1, φ = 0.7, φ = 0.3, φ = 0


Uniform removal of vertices

- Assume degree distribution p_k: the probability of a uniformly chosen node having k neighbors.
- Step 1: pick a random vertex i.
- Step 2: i is not in the giant cluster ⇔ none of its neighbors is in the giant cluster.
- Def: u = average probability that a neighbor j does not connect i to the giant component.
- Result: if i has degree k → Prob(i not in S) = u^k


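Graphical Solution for u

- u = 1 is always a solution: it gives S = 0.
- 2 solutions for u → a giant component S exists.
- Threshold: φc is given at the point at which the curve is tangent at u = 1:

$$\left[\frac{d}{du}\Bigl(1-\varphi+\varphi\,g_1(u)\Bigr)\right]_{u=1} = 1$$

where g_1(u) is the generating function of the excess degree distribution.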

Critical Threshold for Percolation φc

The critical threshold depends on the mean degree (<k>) and the degree variability (<k²>):

$$\varphi_c = \frac{\langle k\rangle}{\langle k^{2}\rangle-\langle k\rangle}$$
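
The threshold formula can be evaluated for any degree distribution given as a table of probabilities. The sketch below is illustrative: the function names are assumptions, and the Poisson distribution is truncated at an arbitrary k_max = 100 where the remaining tail is negligible for the mean used.

```python
import math

def critical_threshold(pk):
    """phi_c = <k> / (<k^2> - <k>) for a degree distribution {k: p_k}."""
    k1 = sum(k * p for k, p in pk.items())      # <k>
    k2 = sum(k * k * p for k, p in pk.items())  # <k^2>
    return k1 / (k2 - k1)

def poisson_pk(c, k_max=100):
    """Poisson degree distribution with mean c, truncated at k_max."""
    return {k: math.exp(-c) * c**k / math.factorial(k) for k in range(k_max + 1)}

phi_c_poisson = critical_threshold(poisson_pk(4.0))
phi_c_regular = critical_threshold({3: 1.0})  # every node has degree 3
print(phi_c_poisson, phi_c_regular)
```

For a Poisson distribution with mean 4 this gives φc ≈ 0.25, and for a graph where every node has degree 3 it gives 0.5. A larger <k²> at fixed <k> lowers the threshold, i.e. more nodes can be removed before the giant component disappears.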


Critical Threshold for Poisson Graphs

- Degree distribution: $p_k = e^{-c}\,c^{k}/k!$
- Average degree <k> = c; <k²> = c(c+1)
- Critical threshold: $\varphi_c = 1/c$
- Example: for average degree c = 4 → φc = 0.25, so 75% of the vertices must be removed to “kill” the network.


Critical Threshold for Power-Law Graphs

- Degree distribution: $p_k = c\,k^{-\alpha}$
- Fact: most networks exhibit a power-law degree distribution with α in the range (2,3).

Q: What are <k> and <k²> for these networks?
A: <k> is finite but <k²> is infinite!

Q: What is the critical threshold?
A: It is 0!! Power-law graphs can “survive” any number of failures.


Size of giant cluster

Figure panels: exponential degree distribution; power-law degree distribution


Scale-free networks: resilient to random attack

- Gnutella (P2P) network: 574 nodes in giant component
- 20% of nodes removed: 427 nodes in giant component


Applications

Network attacks (good news)
- Real networks (social networks, Internet, P2P networks) are extremely robust to uniform removal attacks.
- The higher the variance of the degree distribution, the better.

Malware infections and immunization (bad news)
- An epidemic occurs if a majority of nodes gets infected.
- Stop a virus/worm/etc. from spreading → vaccinate/fix a number of nodes. Goal: disconnect the “contact” graph.
- Need to immunize a large majority of nodes to avoid spread.


Non-uniform Removal of Nodes

A more efficient attack: remove the highest degree nodes.

Figure panels: exponential degree distribution; power-law degree distribution


Targeted attacks are effective against scale-free nets

- Gnutella network: 574 nodes in giant component
- 22 most connected nodes removed (2.8% of the nodes): 301 nodes in giant component


Betweenness Centrality: Definition

$$C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$

where g_jk = the number of shortest paths connecting j and k, and g_jk(i) = the number of those paths that node i is on. In words, each term is

betweenness of vertex i = (paths between j and k that pass through i) / (all paths between j and k)

Usually normalized by:

$$C'_B(i) = \frac{C_B(i)}{n^{2}}$$
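
A direct, unoptimized sketch of the definition for small undirected, unweighted graphs. It assumes the sum runs over unordered pairs j, k different from i and counts shortest paths with one breadth-first search per node; the names are illustrative, and faster algorithms exist for large graphs.

```python
from collections import deque

def bfs_counts(adj, source):
    """Distances from `source` and the number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]  # every shortest path to v extends to w
    return dist, sigma

def betweenness(adj):
    """C_B(i): sum over pairs j, k of g_jk(i) / g_jk."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = {v: 0.0 for v in nodes}
    for a, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[a + 1:]:
            if k not in dist_j:
                continue  # j and k are not connected
            dist_k, sigma_k = info[k]
            for i in nodes:
                if i == j or i == k or i not in dist_j or i not in dist_k:
                    continue
                # i is on a shortest j-k path iff the two distances add up.
                if dist_j[i] + dist_k[i] == dist_j[k]:
                    score[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return score

path = {0: [1], 1: [0, 2], 2: [1]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness(path), betweenness(cycle), betweenness(star))
```

On the 3-node path the middle node scores 1, in the 4-cycle every node scores 0.5 because opposite nodes have two equally short routes, and the center of the 3-leaf star scores 3. Dividing each score by n² gives the normalized value.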


betweenness on toy networks


bridge


Betweenness vs. Degree Centrality

Nodes are sized by degree, and colored by betweenness.

- Can you spot nodes with high betweenness but relatively low degree?
- What about high degree but relatively low betweenness?


Why is Betweenness Centrality Important?

Connectivity
a) Remove random node
b) Remove high degree node
c) Remove high betweenness node


Why is Betweenness Centrality Important?

- The network below is a wireless network (e.g. sensor network). [Figure: network with sources S1, S2 and destinations D1, D2]
- Nodes run on battery → total energy Emax.
- Each node picks a destination randomly and sends data at a constant rate → every packet going through a node spends E of its energy.
- Q: How long would it take until the first node runs out of battery?


How About in This Network?



Why is Betweenness Centrality Important?

Monitoring
a) Where would you place a traffic monitor in order to track the maximum number of packets (if this was your university network)?
b) Where would you place traffic cameras if that was a street network?


Why is Betweenness Centrality Important?

Traffic Flow: each link has capacity 1. [Figure: network with source S and destination D]

Q: What is the maximum throughput between S and D?
A: Max-Flow Min-Cut theorem → the max flow is equal to the minimum number of links that must be removed to disconnect S from D → S-D throughput = 1
