ECS 289 / MAE 298, Lecture 4
April 10, 2014

“Power laws, network robustness, and Small World Networks”

Network properties studied so far

• Node degree

• Edges: directed, weighted

• Adjacency Matrix

• Degree distribution

• Components

• Diameter

• Clustering coefficient (next slide)

Clustering Coefficient for a node i

• $C_i$ for a vertex i is the number of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them.

• $C_i$ = (number of links between neighbors of i, excluding links to i) / (total number of links that could exist between neighbors)

$C_i = \frac{2|e_{jk}|}{k_i(k_i - 1)}$

• where $|e_{jk}|$ is the total number of links between all pairs of nodes j and k that are connected to node i (NOT including links to i itself), and $k_i$ is the degree of node i.

• Note: the factor of 2 comes from the fact that we are considering undirected edges, so the total number of edges that could exist between neighbors of i is $k_i(k_i - 1)/2$.

Example of $C_i$ for the blue node

• What is $C_i$ if the node is disconnected, $k_i = 0$?

$C_i = \frac{2|e_{jk}|}{k_i(k_i - 1)}$

(For $k_i = 0$ or $k_i = 1$ the ratio is 0/0, which is undefined; typically assume $C_i = 0$.)

• Network average clustering coefficient

$C = \frac{1}{n} \sum_{i=1}^{n} C_i$
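The two definitions above can be checked on a small example. The following is a minimal sketch, assuming the graph is stored as a dictionary of neighbor sets; the function names and the example graph are illustrative and not part of the lecture.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2|e_jk| / (k_i (k_i - 1)); taken to be 0 when k_i < 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # the ratio is 0/0 here, so follow the "assume zero" convention
    # Count links between pairs of neighbors of i (links to i itself are excluded).
    links = sum(1 for j, m in combinations(neighbors, 2) if m in adj[j])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Network average: C = (1/n) * sum of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical example: a triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))   # one link (1-2) out of three possible
print(average_clustering(adj))
```

Node 0 has three neighbors with only one link among them, so its coefficient is 1/3, while nodes 1 and 2 each have a fully linked neighborhood.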

What is a small-world network?

[Watts and Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature, 393 (1998)]

• Start with a regular 1D lattice, with each node connected to its k nearest neighbors.

• Randomly re-wire each link independently with probability p.
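The two steps above can be sketched in code. This is a minimal sketch, assuming k is even, the graph is stored as a dictionary of neighbor sets, and one endpoint of each rewired link stays fixed while self-loops and duplicate links are forbidden (as in the Watts-Strogatz figure caption). The function names are illustrative.

```python
import random

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors (k even, k < n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, k, p, rng):
    """Visit each lattice link (i, i + step) once; rewire it with probability p."""
    n = len(adj)
    for step in range(1, k // 2 + 1):      # nearest neighbors first, then outward
        for i in range(n):
            if rng.random() >= p:
                continue                    # leave this link in place
            j = (i + step) % n
            # New endpoint: uniform over nodes that avoid self-loops and duplicates.
            candidates = [v for v in range(n) if v != i and v not in adj[i]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            adj[i].remove(j)
            adj[j].remove(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj

graph = rewire(ring_lattice(20, 4), 4, 0.3, random.Random(0))
print(sum(len(nbrs) for nbrs in graph.values()) // 2)  # link count is unchanged: n*k/2
```

Rewiring moves links but never creates or destroys them, so the number of nodes and links stays fixed for every p.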

Average shortest path L(p) and clustering coefficient C(p)

Nature © Macmillan Publishers Ltd 1998

letters to nature

NATURE | VOL 393 | 4 JUNE 1998 441

removed from a clustered neighbourhood to make a short cut has, at most, a linear effect on C; hence C(p) remains practically unchanged for small p even though L(p) drops rapidly. The important implication here is that at the local level (as reflected by C(p)), the transition to a small world is almost undetectable. To check the robustness of these results, we have tested many different types of initial regular graphs, as well as different algorithms for random rewiring, and all give qualitatively similar results. The only requirement is that the rewired edges must typically connect vertices that would otherwise be much farther apart than $L_{random}$.

The idealized construction above reveals the key role of short cuts. It suggests that the small-world phenomenon might be common in sparse networks with many vertices, as even a tiny fraction of short cuts would suffice. To test this idea, we have computed L and C for the collaboration graph of actors in feature films (generated from data available at http://us.imdb.com), the electrical power grid of the western United States, and the neural network of the nematode worm C. elegans [17]. All three graphs are of scientific interest. The graph of film actors is a surrogate for a social network [18], with the advantage of being much more easily specified. It is also akin to the graph of mathematical collaborations centred, traditionally, on P. Erdős (partial data available at http://www.acs.oakland.edu/~grossman/erdoshp.html). The graph of the power grid is relevant to the efficiency and robustness of power networks [19]. And C. elegans is the sole example of a completely mapped neural network.

Table 1 shows that all three graphs are small-world networks. These examples were not hand-picked; they were chosen because of their inherent interest and because complete wiring diagrams were available. Thus the small-world phenomenon is not merely a curiosity of social networks [13, 14] nor an artefact of an idealized model; it is probably generic for many large, sparse networks found in nature.

We now investigate the functional significance of small-world connectivity for dynamical systems. Our test case is a deliberately simplified model for the spread of an infectious disease. The population structure is modelled by the family of graphs described in Fig. 1. At time t = 0, a single infective individual is introduced into an otherwise healthy population. Infective individuals are removed permanently (by immunity or death) after a period of sickness that lasts one unit of dimensionless time. During this time, each infective individual can infect each of its healthy neighbours with probability r. On subsequent time steps, the disease spreads along the edges of the graph until it either infects the entire population, or it dies out, having infected some fraction of the population in the process.

[Figure 1: three ring graphs labelled Regular (p = 0), Small-world, and Random (p = 1), ordered by increasing randomness.]

Figure 1 Random rewiring procedure for interpolating between a regular ring lattice and a random network, without altering the number of vertices or edges in the graph. We start with a ring of n vertices, each connected to its k nearest neighbours by undirected edges. (For clarity, n = 20 and k = 4 in the schematic examples shown here, but much larger n and k are used in the rest of this Letter.) We choose a vertex and the edge that connects it to its nearest neighbour in a clockwise sense. With probability p, we reconnect this edge to a vertex chosen uniformly at random over the entire ring, with duplicate edges forbidden; otherwise we leave the edge in place. We repeat this process by moving clockwise around the ring, considering each vertex in turn until one lap is completed. Next, we consider the edges that connect vertices to their second-nearest neighbours clockwise. As before, we randomly rewire each of these edges with probability p, and continue this process, circulating around the ring and proceeding outward to more distant neighbours after each lap, until each edge in the original lattice has been considered once. (As there are nk/2 edges in the entire graph, the rewiring process stops after k/2 laps.) Three realizations of this process are shown, for different values of p. For p = 0, the original ring is unchanged; as p increases, the graph becomes increasingly disordered until for p = 1, all edges are rewired randomly. One of our main results is that for intermediate values of p, the graph is a small-world network: highly clustered like a regular graph, yet with small characteristic path length, like a random graph. (See Fig. 2.)

Table 1 Empirical examples of small-world networks

                L_actual   L_random   C_actual   C_random
Film actors     3.65       2.99       0.79       0.00027
Power grid      18.7       12.4       0.080      0.005
C. elegans      2.65       2.25       0.28       0.05

Characteristic path length L and clustering coefficient C for three real networks, compared to random graphs with the same number of vertices (n) and average number of edges per vertex (k). (Actors: n = 225,226, k = 61. Power grid: n = 4,941, k = 2.67. C. elegans: n = 282, k = 14.) The graphs are defined as follows. Two actors are joined by an edge if they have acted in a film together. We restrict attention to the giant connected component [16] of this graph, which includes ≈90% of all actors listed in the Internet Movie Database (available at http://us.imdb.com), as of April 1997. For the power grid, vertices represent generators, transformers and substations, and edges represent high-voltage transmission lines between them. For C. elegans, an edge joins two neurons if they are connected by either a synapse or a gap junction. We treat all edges as undirected and unweighted, and all vertices as identical, recognizing that these are crude approximations. All three networks show the small-world phenomenon: $L \gtrsim L_{random}$ but $C \gg C_{random}$.

[Figure 2: plot of L(p)/L(0) and C(p)/C(0) (vertical axis, 0 to 1) against p (horizontal axis, logarithmic, 0.0001 to 1).]

Figure 2 Characteristic path length L(p) and clustering coefficient C(p) for the family of randomly rewired graphs described in Fig. 1. Here L is defined as the number of edges in the shortest path between two vertices, averaged over all pairs of vertices. The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has $k_v$ neighbours; then at most $k_v(k_v - 1)/2$ edges can exist between them (this occurs when every neighbour of v is connected to every other neighbour of v). Let $C_v$ denote the fraction of these allowable edges that actually exist. Define C as the average of $C_v$ over all v. For friendship networks, these statistics have intuitive meanings: L is the average number of friendships in the shortest chain connecting two people; $C_v$ reflects the extent to which friends of v are also friends of each other; and thus C measures the cliquishness of a typical friendship circle. The data shown in the figure are averages over 20 random realizations of the rewiring process described in Fig. 1, and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n = 1,000 vertices and an average degree of k = 10 edges per vertex. We note that a logarithmic horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of the small-world phenomenon. During this drop, C(p) remains almost constant at its value for the regular lattice, indicating that the transition to a small world is almost undetectable at the local level.

• Small-world networks have small diameter and large clustering coefficient.

• They are remarkably easy to generate (just a tiny p required).

Network models studied so far

• Erdős-Rényi random graphs, G(N, p)
– Initialized with N isolated nodes
– Edges arrive in a discrete-time process with uniform probability
– Poisson degree distribution
– No clustering
– Emergence of a giant component

• Preferential attachment graphs
– Initialized with one seed node (or a small set of seed nodes)
– Nodes arrive and attach with m edges, choosing each “parent” with probability proportional to its degree, $q_{k,t} = k/(2mt)$
– Power-law degree distribution with γ = 3
– Clustering tuned by setting m
– Fully connected network (a single component) by construction

• Small-world networks
Much more on this later

Difference between ER and PA is not due to edge versus node arrival

• Node-arrival “Erdős-Rényi graph”
Callaway, Hopcroft, Kleinberg, Newman, Strogatz. Phys Rev E 64 (2001).
– At each discrete time step a new node arrives, and with probability δ a new randomly selected edge arrives.
– Emergence of giant component only if δ ≥ 1/8.
– (That “giant” is finite even as n → ∞).
– Positive degree-degree correlations (higher degree by virtue of age).

• Edge-arrival PA graph
K.-I. Goh, B. Kahng, D. Kim, Phys. Rev. Lett. 87, 278701 (2001).
F. Chung and L. Lu, Annals of Combinatorics 6, 125 (2002).
– Initialized with N isolated nodes, labeled i ∈ {1, 2, ..., N}, where each node i has a weight $w_i = (i + i_0 - 1)^{-\mu}$.
– Two vertices (i, j) are selected with probability $w_i / \sum_k w_k$ and $w_j / \sum_k w_k$ respectively, and connected by an edge.
– (Master eqn analysis: Lee, Goh, Kahng and Kim, Nucl. Phys. B 696, 351 (2004).)
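The edge-arrival construction above can be sketched as follows. This is a minimal sketch, not the cited authors' code: the function name and the parameter values are illustrative, and rejecting self-loops and repeated edges is an assumption made here to keep the graph simple.

```python
import random

def edge_arrival_pa(n, n_edges, mu, i0, rng):
    """Add n_edges edges between nodes 1..n, picking each endpoint
    with probability w_i / sum_k w_k, where w_i = (i + i0 - 1) ** (-mu)."""
    nodes = list(range(1, n + 1))
    weights = [(i + i0 - 1) ** (-mu) for i in nodes]
    edges = set()
    while len(edges) < n_edges:
        i, j = rng.choices(nodes, weights=weights, k=2)
        if i != j:                                # assumption: no self-loops
            edges.add((min(i, j), max(i, j)))     # assumption: no duplicate edges
    return edges

# Illustrative parameter values, not taken from the cited papers.
edges = edge_arrival_pa(n=200, n_edges=400, mu=0.5, i0=1, rng=random.Random(3))
degree_of_first = sum(1 for e in edges if 1 in e)
print(len(edges), degree_of_first)
```

Node 1 carries the largest weight, so it ends up with a degree well above the network average of 2 × 400 / 200 = 4, even though all nodes are present from the start.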

Summary of “Master eqn” / rate eqn approach

• Let $n_{k,t}$ denote the expected (i.e. average) number of nodes of degree k at time t into the process. (Note $n_{k,t}$ is a real number, not an integer.)

• Write $n_{k,t+1}$ in terms of the $n_{k,t}$’s, accounting for the rates at which node degree is expected to change.

• Translate from $n_{k,t}$ to $p_{k,t} = n_{k,t}/n_t$, which is equal to $n_{k,t}/t$ for PA.

• Assume $p_{k,t} \to p_k$.

• Solve for a recurrence relation for the $p_k$’s.
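As a worked instance of these steps, consider PA where each new node brings m edges and a node of degree k is chosen with probability k/(2mt). Under those assumptions the rate equation is $n_{k,t+1} = n_{k,t} + \frac{k-1}{2t} n_{k-1,t} - \frac{k}{2t} n_{k,t}$ for k > m, and substituting $n_{k,t} = t\,p_k$ gives the recurrence $p_k = \frac{k-1}{k+2}\, p_{k-1}$ with $p_m = \frac{2}{m+2}$. The sketch below iterates that recurrence numerically; the function names are illustrative, and the closed form it is compared against is the standard solution of this recurrence.

```python
import math

def pa_degree_distribution(m, k_max):
    """Iterate p_k = (k - 1) / (k + 2) * p_{k-1}, starting from p_m = 2 / (m + 2)."""
    p = {m: 2.0 / (m + 2)}
    for k in range(m + 1, k_max + 1):
        p[k] = p[k - 1] * (k - 1) / (k + 2)
    return p

def closed_form(m, k):
    """Solution of the recurrence: p_k = 2 m (m + 1) / (k (k + 1) (k + 2))."""
    return 2.0 * m * (m + 1) / (k * (k + 1) * (k + 2))

m = 2
p = pa_degree_distribution(m, 2000)

# Local slope of the tail on a log-log plot, estimated between k = 500 and k = 1000.
gamma_estimate = math.log(p[500] / p[1000]) / math.log(2.0)
print(p[10], closed_form(m, 10))
print(gamma_estimate)   # close to 3, the power-law exponent quoted for PA
```

For large k the closed form decays as $k^{-3}$, which is where the exponent γ = 3 comes from.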

Preferential Attachment and “Scale-free networks”

Why a power law is “scale-free”

• Power law for “x” means “scale-free” in x:

$p(bx) = (bx)^{-\gamma} = b^{-\gamma} p(x)$

$\frac{p(bk)}{p(k)} = b^{-\gamma}$, regardless of k.

In contrast, consider $p(k) = A \exp(-k)$.

So $p(bk) = A \exp(-bk)$.

$\frac{p(bk)}{p(k)} = \exp[-k(b - 1)]$, dependent on k.
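A quick numerical check of the two ratios, as a minimal sketch: the values γ = 3, A = 1, and b = 2 are chosen here purely for illustration.

```python
import math

def power_law(k, gamma=3.0):
    return k ** (-gamma)

def exponential(k, A=1.0):
    return A * math.exp(-k)

b = 2.0
for k in (1.0, 10.0, 100.0):
    ratio_power = power_law(b * k) / power_law(k)       # b ** (-gamma) for every k
    ratio_exp = exponential(b * k) / exponential(k)     # exp(-k (b - 1)), depends on k
    print(k, ratio_power, ratio_exp)
```

The power-law ratio is the same at every k, while the exponential ratio collapses toward zero as k grows, so the exponential picks out a characteristic scale.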

Self-similar/scale-free fractal structures

Sierpinski Sieve/Gasket/Fractal, $N \sim r^d$.

When r doubles, N triples: $3 = 2^d$

$d = \log N / \log r = \log 3 / \log 2 \approx 1.585$
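The doubling rule can be checked by counting. The sketch below uses one convenient discrete stand-in for the gasket, which is an assumption of this example rather than something from the slides: the odd entries of Pascal's triangle, where the binomial coefficient C(r, c) is odd exactly when (r & c) == c.

```python
import math

def sierpinski_cells(rows):
    """Count the odd entries in the first `rows` rows of Pascal's triangle."""
    return sum(1 for r in range(rows) for c in range(r + 1) if (r & c) == c)

sizes = [2 ** n for n in range(1, 8)]              # r = 2, 4, ..., 128
counts = [sierpinski_cells(r) for r in sizes]      # N triples each time r doubles

# Estimate d from the last doubling: d = log(N2 / N1) / log(r2 / r1).
d = math.log(counts[-1] / counts[-2]) / math.log(sizes[-1] / sizes[-2])
print(counts)
print(d)
```

Each doubling of the linear size triples the number of filled cells, which reproduces d = log 3 / log 2.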

“Fractal dimension” of a network

http://en.wikipedia.org/wiki/Fractal_dimension_on_networks

Power law degree distribution ≠ “scale-free network”

• Power law for “x”, means “scale-free” in x.

• BUT only for that aspect, “x”. May have a lot of different structures at different scales.

• More precise: “network with scale-free degree distribution”

“Scale-rich” networks

• L. Li, D. Alderson, W. Willinger and J. Doyle, Proceedings of ACM SIGCOMM, 2004;

• Doyle, Alderson, Li, Low, Roughan, Shalunov, Tanaka, Willinger, PNAS, 2005.

the resulting models are widely conjectured to be asymptotically equivalent (e.g., see ref. 6 and references therein).

In particular, for a graph g having degree sequence D, we define the purely graph-theoretic quantity $s(g) = \sum_{(i,j) \in E(g)} d_i d_j$, where E(g) is the set of edges in the graph. It is easy to check that high s(g) requires high-degree vertices to connect to other high-degree vertices. Normalizing against $s_{max} = \max\{s(g) : g \in G(D)\}$, we define the measure $0 \le S(g) \le 1$ of the graph g as $S(g) = s(g)/s_{max}$. Although s(g) and S(g) can be computed for any graph and do not depend on any particular construction mechanism, they have a special meaning in the context of ensembles of graphs. Specifically, S(g) has a direct interpretation as the relative log-likelihood of a graph resulting from the generalized random-graph construction (17); thus, all of the SF-model–generation mechanisms generate essentially only high S graphs. The S-metric also potentially unifies other aspects of SF graphs, because it is closely related to betweenness, degree correlation (6), and graph assortativity (18) and captures several notions of self-similarity related to graph trimming, coarse graining, and random rewiring (6).

The focus on ensemble-based methods means that the analysis in SF models has implicitly ignored those graphs that are unlikely to result from such constructions, in particular graphs with small S. Thus, although power-law degree distributions are unlikely under some traditional random graph constructions [e.g., Erdős–Rényi random graphs (19)], there are a multitude of other model-generation mechanisms that give rise to power laws (20). The SF-generating mechanisms are only one kind, but they tend to generate only high S graphs, which leaves unexplored an enormous diversity of low S graphs, as seen in Fig. 1. The graphs in Fig. 1 a and b are relatively likely to result from probabilistic construction, whereas the graphs in Fig. 1 c and d are vanishingly unlikely. The PA-type graph shown in Fig. 1a has $S(g_a) = 0.61$ and is typical of the graphs that are likely under a variety of random-generation methods. The graph shown in Fig. 1b is the $s_{max}$ graph and thus by definition has $S(g_b) = 1.0$. It can be thought of both as the most likely graph and also (uniquely) as the most “perfectly” SF graph with this degree sequence. Of course, the sheer enormity of the number of different high S graphs means that any particular one graph, even the relatively most likely, is actually unlikely in absolute terms to be selected. The graphs in Fig. 1 c and d have the values $S(g_c) = 0.33$ and $S(g_d) = 0.34$, respectively; furthermore, there are relatively few graphs with S values this low, and thus any graphs similar to these are vanishingly unlikely to arise at random (6). The remainder of this article explains in more detail why the underlying forces at work in the evolution of the real router-level Internet avoid the generation of high S graphs and how this feature can be captured in an optimization-based design framework. We also consider what, if anything, this framework has to say about the RYF nature of the Internet.

A Look at the Actual Internet

An obvious starting point for investigating the structure and underlying forces at work in the Internet is to inspect detailed router-level maps from Internet service providers (ISPs). Abilene, the backbone for the Internet2 academic network, is illustrated in Fig. 1 and is an ideal example for many reasons that will be exploited throughout this analysis.** Abilene publishes detailed hardware specifications for each router and link, so Fig. 1 is exact, not an approximation based on indirect measurements. Abilene is also a state-of-the-art network with essentially no difference between physical (i.e., layer two) and Internet-protocol (IP) (i.e., layer three) connectivity. This simplifies the exposition without loss of generality and also eliminates a source of confusion in measured data from networks that use older legacy technologies. Using regional academic networks and commercial ISPs, we verified that all the inferences and conclusions based on Abilene hold in general. Commercial ISPs do not allow publishing such details because of proprietary considerations, but router-level measurement studies (21, 22, ††) further confirm our analysis (7, 23, 24), although this requires additional statistical and Internet-specific expertise beyond the intended scope of this article.

**Detailed information about the objectives, organization, and development of the Abilene network are available from www.internet2.edu/abilene.

††SKITTER Project. Cooperative Association for Internet Data Analysis, University of California San Diego Supercomputing Center (www.caida.org).

Fig. 1. Diversity among graphs having the same degree sequence D. (a) RNDnet: a network consistent with construction by PA. The two networks represent the same graph, but the figure on the right is redrawn to emphasize the role that high-degree hubs play in overall network connectivity. (b) SFnet: a graph having the most preferential connectivity, again drawn both as an incremental growth type of network and in a form that emphasizes the importance of high-degree nodes. (c) BADNet: a poorly designed network with overall connectivity constructed from a chain of vertices. (d) HOTnet: a graph constructed to be a simplified version of the Abilene network shown in Fig. 2. (e) Power-law degree sequence D for networks shown in a–d. Only $d_i > 1$ is shown.

14498 | www.pnas.org/cgi/doi/10.1073/pnas.0501426102    Doyle et al.

All these networks have the same degree distribution, but very different internal structures.
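The quantity s(g), defined in the excerpt above as the sum of $d_i d_j$ over edges, is easy to compute and already separates graphs that share a degree sequence. The sketch below is illustrative only: the two eight-node graphs are hypothetical examples, and the normalized S(g) is not computed because $s_{max}$ requires a maximization over all graphs with the given degree sequence.

```python
def degrees(edges):
    """Degree of every node appearing in an undirected edge list."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree

def s_metric(edges):
    """s(g): sum of d_i * d_j over all edges (i, j)."""
    d = degrees(edges)
    return sum(d[i] * d[j] for i, j in edges)

# Hypothetical graphs with the same degree sequence: two hubs "a" and "b" of
# degree 3, two nodes "c" and "d" of degree 2, and four leaves of degree 1.
hubs_adjacent = [("a", "b"), ("a", "c"), ("b", "d"), ("c", 1), ("d", 2), ("a", 3), ("b", 4)]
hubs_apart = [("a", "c"), ("c", "d"), ("d", "b"), ("a", 1), ("a", 2), ("b", 3), ("b", 4)]

print(sorted(degrees(hubs_adjacent).values()) == sorted(degrees(hubs_apart).values()))
print(s_metric(hubs_adjacent), s_metric(hubs_apart))
```

Linking the two hubs directly raises s(g), which matches the statement that high s(g) requires high-degree vertices to connect to other high-degree vertices.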

Robustness of a network

• Robustness/Resilience: A network should be able to absorb disturbance, undergo change and essentially maintain its functionality despite failure of individual components of the network.

• Often studied as maintaining connectivity despite node and edge deletion.

Albert, Jeong and Barabási, “Error and attack tolerance of complex networks”, Nature 406 (27 July 2000).

letters to nature

NATURE | VOL 406 | 27 JULY 2000 | www.nature.com 379

called scale-free networks, which include the World-Wide Web [3–5], the Internet [6], social networks [7] and cells [8]. We find that such networks display an unexpected degree of robustness, the ability of their nodes to communicate being unaffected even by unrealistically high failure rates. However, error tolerance comes at a high price in that these networks are extremely vulnerable to attacks (that is, to the selection and removal of a few nodes that play a vital role in maintaining the network’s connectivity). Such error tolerance and attack vulnerability are generic properties of communication networks.

The increasing availability of topological data on large networks, aided by the computerization of data acquisition, had led to great advances in our understanding of the generic aspects of network structure and development [9–16]. The existing empirical and theoretical results indicate that complex networks can be divided into two major classes based on their connectivity distribution P(k), giving the probability that a node in the network is connected to k other nodes. The first class of networks is characterized by a P(k) that peaks at an average $\langle k \rangle$ and decays exponentially for large k. The most investigated examples of such exponential networks are the random graph model of Erdős and Rényi [9, 10] and the small-world model of Watts and Strogatz [11], both leading to a fairly homogeneous network, in which each node has approximately the same number of links, $k \approx \langle k \rangle$. In contrast, results on the World-Wide Web (WWW) [3–5], the Internet [6] and other large networks [17–19] indicate that many systems belong to a class of inhomogeneous networks, called scale-free networks, for which P(k) decays as a power-law, that is $P(k) \sim k^{-\gamma}$, free of a characteristic scale. Whereas the probability that a node has a very large number of connections ($k \gg \langle k \rangle$) is practically prohibited in exponential networks, highly connected nodes are statistically significant in scale-free networks (Fig. 1).

We start by investigating the robustness of the two basic connectivity distribution models, the Erdős–Rényi (ER) model [9, 10] that produces a network with an exponential tail, and the scale-free model [17] with a power-law tail. In the ER model we first define the N nodes, and then connect each pair of nodes with probability p. This algorithm generates a homogeneous network (Fig. 1), whose connectivity follows a Poisson distribution peaked at $\langle k \rangle$ and decaying exponentially for $k \gg \langle k \rangle$.

The inhomogeneous connectivity distribution of many real networks is reproduced by the scale-free model [17, 18] that incorporates two ingredients common to real networks: growth and preferential attachment. The model starts with $m_0$ nodes. At every time step t a new node is introduced, which is connected to m of the already-existing nodes. The probability $\Pi_i$ that the new node is connected to node i depends on the connectivity $k_i$ of node i such that $\Pi_i = k_i / \sum_j k_j$. For large t the connectivity distribution is a power-law following $P(k) = 2m^2/k^3$.

The interconnectedness of a network is described by its diameter d, defined as the average length of the shortest paths between any two nodes in the network. The diameter characterizes the ability of two nodes to communicate with each other: the smaller d is, the shorter is the expected path between them. Networks with a very large number of nodes can have quite a small diameter; for example, the diameter of the WWW, with over 800 million nodes [20], is around 19 (ref. 3), whereas social networks with over six billion individuals

[Figure 1: two network drawings, panel a labelled Exponential and panel b labelled Scale-free.]

Figure 1 Visual illustration of the difference between an exponential and a scale-free network. a, The exponential network is homogeneous: most nodes have approximately the same number of links. b, The scale-free network is inhomogeneous: the majority of the nodes have one or two links but a few nodes have a large number of links, guaranteeing that the system is fully connected. Red, the five nodes with the highest number of links; green, their first neighbours. Although in the exponential network only 27% of the nodes are reached by the five most connected nodes, in the scale-free network more than 60% are reached, demonstrating the importance of the connected nodes in the scale-free network. Both networks contain 130 nodes and 215 links ($\langle k \rangle = 3.3$). The network visualization was done using the Pajek program for large network analysis: <http://vlado.fmf.uni-lj.si/pub/networks/pajek/pajekman.htm>.

[Figure 2: three panels plotting diameter d against the fraction f of removed nodes. Panel a: exponential (E) and scale-free (SF) models under Failure and Attack. Panel b: Internet under Failure and Attack. Panel c: WWW under Failure and Attack.]

Figure 2 Changes in the diameter d of the network as a function of the fraction f of the removed nodes. a, Comparison between the exponential (E) and scale-free (SF) network models, each containing N = 10,000 nodes and 20,000 links (that is, $\langle k \rangle = 4$). The blue symbols correspond to the diameter of the exponential (triangles) and the scale-free (squares) networks when a fraction f of the nodes are removed randomly (error tolerance). Red symbols show the response of the exponential (diamonds) and the scale-free (circles) networks to attacks, when the most connected nodes are removed. We determined the f dependence of the diameter for different system sizes (N = 1,000; 5,000; 20,000) and found that the obtained curves, apart from a logarithmic size correction, overlap with those shown in a, indicating that the results are independent of the size of the system. We note that the diameter of the unperturbed (f = 0) scale-free network is smaller than that of the exponential network, indicating that scale-free networks use the links available to them more efficiently, generating a more interconnected web. b, The changes in the diameter of the Internet under random failures (squares) or attacks (circles). We used the topological map of the Internet, containing 6,209 nodes and 12,200 links ($\langle k \rangle = 3.4$), collected by the National Laboratory for Applied Network Research <http://moat.nlanr.net/Routing/rawdata/>. c, Error (squares) and attack (circles) survivability of the World-Wide Web, measured on a sample containing 325,729 nodes and 1,498,353 links [3], such that $\langle k \rangle = 4.59$.

© 2000 Macmillan Magazines Ltd

N=130 nodes, E=215 links. Red: the five highest-degree nodes; green: their neighbors.

• In the exponential network 27% of the nodes are green; in the scale-free network more than 60% are.

• PLRG: Connectivity extremely robust to random failure.

• PLRG: Connectivity extremely fragile to targeted attack (removal of highest degree nodes).

Exponential vs scale-free: Robustness

[Figure 2 of Albert, Jeong and Barabasi: diameter d versus fraction f of removed nodes. Panels: a (SF and E models), b (Internet), c (WWW); curves: Attack and Failure.]



• (Remember, bigger diameter is worse.)

• SF networks are extremely robust to random failure (blue squares): remove a fraction of nodes at random and the diameter barely changes.

• SF are very fragile to targeted attack (removal of highest degree nodes).
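The contrast between random failure and targeted attack can be reproduced in a small simulation. The following is a minimal sketch, not the procedure of the paper: it assumes a preferential-attachment graph with N = 400 and m = 2 (so roughly ⟨k⟩ = 4), removes 5% of the nodes in one shot, and reports the mean shortest-path length inside the largest remaining component as the "diameter" d. All function names and parameter values are illustrative choices.

```python
import random
from collections import deque

def ba_graph(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes, drawn with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    stubs = []  # node i appears deg(i) times, so a uniform draw is degree-biased
    for i in range(m + 1):              # seed: complete graph on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            stubs += [new, t]
    return adj

def remove_nodes(adj, victims):
    """Return a copy of the graph without the given nodes."""
    victims = set(victims)
    return {u: {v for v in nbrs if v not in victims}
            for u, nbrs in adj.items() if u not in victims}

def bfs_dist(adj, src):
    """Hop distances from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_path_length(adj):
    """Mean shortest-path length over node pairs in the largest component."""
    seen, best = set(), {}
    for s in adj:
        if s not in seen:
            comp = bfs_dist(adj, s)
            seen.update(comp)
            if len(comp) > len(best):
                best = comp
    nodes = list(best)
    total = sum(sum(bfs_dist(adj, s).values()) for s in nodes)
    pairs = len(nodes) * (len(nodes) - 1)
    return total / pairs if pairs else 0.0

rng = random.Random(0)
N, M, F = 400, 2, 0.05                  # assumed sizes, far smaller than the paper's
g = ba_graph(N, M, rng)
n_removed = round(F * N)

random_victims = rng.sample(sorted(g), n_removed)                              # failure
hub_victims = sorted(g, key=lambda u: len(g[u]), reverse=True)[:n_removed]     # attack

d_intact = mean_path_length(g)
d_failure = mean_path_length(remove_nodes(g, random_victims))
d_attack = mean_path_length(remove_nodes(g, hub_victims))
print(f"intact {d_intact:.2f}  failure {d_failure:.2f}  attack {d_attack:.2f}")
```

Removing hubs by their initial degree in a single step is a simplification; recomputing degrees after each removal is another possible attack strategy.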

Histogram of a typical PA run

Degree distribution (Here N=500)

[Histogram: x-axis in-degree k (0 to 70); y-axis density (0.0 to 0.6).]

• Choosing a node at random overwhelmingly leads to a low-degree node.
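A single preferential-attachment (PA) run makes this concrete. The sketch below assumes one particular PA variant, which may differ from the one used for the histogram: nodes arrive one at a time, and each new node sends one directed link to an existing node chosen with probability proportional to (in-degree + 1). The function name and the seed are hypothetical.

```python
import random
from collections import Counter

def pa_in_degrees(n, rng):
    """Grow a directed PA network and return the in-degree of every node."""
    in_deg = [0] * n
    urn = [0]  # node i appears (in_deg[i] + 1) times in the urn
    for new in range(1, n):
        target = rng.choice(urn)   # uniform draw from urn = degree-biased choice
        in_deg[target] += 1
        urn.append(target)         # target gained one in-link
        urn.append(new)            # the newcomer enters with weight 1
    return in_deg

rng = random.Random(1)
in_deg = pa_in_degrees(500, rng)           # N = 500, as on the slide
hist = Counter(in_deg)
low_fraction = sum(1 for k in in_deg if k <= 1) / len(in_deg)

print("fraction of nodes with in-degree 0 or 1:", round(low_fraction, 3))
print("largest in-degree:", max(in_deg))
```

Because most nodes sit at in-degree 0 or 1, a uniformly random removal almost always hits one of them and rarely a hub.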

Degree-targeted removal on real sample topologies

[Figure 2 of Albert, Jeong and Barabasi, repeated: see panels b (Internet) and c (WWW), diameter d versus fraction f of removed nodes under Attack and Failure.]



• Used the topological map of the Internet, containing 6,209 nodes and 12,200 links ($\langle k \rangle = 3.4$), collected (in 1999 or 2000) by the National Laboratory for Applied Network Research: http://moat.nlanr.net/Routing/rawdata/

• World-Wide Web data measured on a sample containing 325,729 nodes and 1,498,353 links, such that $\langle k \rangle = 4.59$.

Albert, Jeong and Barabasi, Nature, 406 (27) 2000

“The Achilles Heel of the Internet”

• “How robust is the Internet?” Yuhai Tu, Nature (News and Views) 406 (27) 2000.

• “Scientists spot Achilles heel of the Internet”, CNN, July 26, 2000.

Percolation theory shows that similar results follow from an analytic mathematical formulation

• R. Cohen, K. Erez, D. ben-Avraham, and S. Havlin, “Resilience of the Internet to Random Breakdowns”, Phys. Rev. Lett. 85, 4626 (2000).

• D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, “Network Robustness and Fragility: Percolation on Random Graphs”, Phys. Rev. Lett. 85, 5468–5471 (2000).

• $\langle k \rangle$ is finite, but $\langle k^2 \rangle \to \infty$ for $2 < \gamma < 3$: this is the cornerstone of the arguments.
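The divergence of the second moment can be checked numerically. The sketch below assumes a discrete power law $p(k) \propto k^{-\gamma}$ truncated at a maximum degree k_max, with γ = 2.5 and minimum degree 1 (all illustrative choices). It also evaluates the critical fraction of randomly removed nodes, $f_c = 1 - 1/(\kappa - 1)$ with $\kappa = \langle k^2 \rangle / \langle k \rangle$, which is the random-breakdown threshold derived in the Cohen et al. paper listed above; the function names are hypothetical.

```python
def moments(gamma, k_max, k_min=1):
    """First and second moments of p(k) ~ k^(-gamma) on k_min..k_max."""
    ks = range(k_min, k_max + 1)
    z = sum(k ** -gamma for k in ks)                   # normalization
    mean = sum(k ** (1 - gamma) for k in ks) / z       # <k>
    second = sum(k ** (2 - gamma) for k in ks) / z     # <k^2>
    return mean, second

def critical_fraction(mean, second):
    """Fraction of randomly removed nodes at which the giant component vanishes."""
    kappa = second / mean
    return 1 - 1 / (kappa - 1)

GAMMA = 2.5
rows = []
for k_max in (10, 100, 1_000, 10_000, 100_000):
    mean, second = moments(GAMMA, k_max)
    f_c = critical_fraction(mean, second)
    rows.append((k_max, mean, second, f_c))
    print(f"k_max={k_max:>6}  <k>={mean:.3f}  <k^2>={second:8.2f}  f_c={f_c:.4f}")
```

As the cutoff grows, ⟨k⟩ settles to a finite value while ⟨k²⟩ keeps growing, so f_c approaches 1: nearly all nodes must fail at random before the network falls apart.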

Does the ensemble of random graphs really model engineered or biological systems?

(Is the Internet a random scale-free graph?)

Random vs engineered vs evolved (e.g. biological) systems

• REDUNDANCY!!! A key principle in engineering (and evolution?).

• “The ‘robust yet fragile’ nature of the Internet”, Doyle, Alderson, Li, Low, Roughan, Shalunov, Tanaka, Willinger, PNAS 102(41) 2005.

the resulting models are widely conjectured to be asymptotically equivalent (e.g., see ref. 6 and references therein).

In particular, for a graph g having degree sequence D, we define the purely graph-theoretic quantity $s(g) = \sum_{(i,j) \in E(g)} d_i d_j$, where E(g) is the set of edges in the graph. It is easy to check that high s(g) requires high-degree vertices to connect to other high-degree vertices. Normalizing against $s_{\max} = \max\{s(g) : g \in G(D)\}$, we define the measure $0 \le S(g) \le 1$ of the graph g as $S(g) = s(g)/s_{\max}$. Although s(g) and S(g) can be computed for any graph and do not depend on any particular construction mechanism, they have a special meaning in the context of ensembles of graphs. Specifically, S(g) has a direct interpretation as the relative log-likelihood of a graph resulting from the generalized random-graph construction (17); thus, all of the SF-model–generation mechanisms generate essentially only high S graphs. The S-metric also potentially unifies other aspects of SF graphs, because it is closely related to betweenness, degree correlation (6), and graph assortativity (18) and captures several notions of self-similarity related to graph trimming, coarse graining, and random rewiring (6).

The focus on ensemble-based methods means that the analysis in SF models has implicitly ignored those graphs that are unlikely to result from such constructions, in particular graphs with small S. Thus, although power-law degree distributions are unlikely under some traditional random graph constructions [e.g., Erdős–Rényi random graphs (19)], there are a multitude of other model-generation mechanisms that give rise to power laws (20). The SF-generating mechanisms are only one kind, but they tend to generate only high S graphs, which leaves unexplored an enormous diversity of low S graphs, as seen in Fig. 1. The graphs in Fig. 1 a and b are relatively likely to result from probabilistic construction, whereas the graphs in Fig. 1 c and d are vanishingly unlikely. The PA-type graph shown in Fig. 1a has $S(g_a) = 0.61$ and is typical of the graphs that are likely under a variety of random-generation methods. The graph shown in Fig. 1b is the $s_{\max}$ graph and thus by definition has $S(g_b) = 1.0$. It can be thought of both as the most likely graph and also (uniquely) as the most ‘‘perfectly’’ SF graph with this degree sequence. Of course, the sheer enormity of the number of different high S graphs means that any particular one graph, even the relatively most likely, is actually unlikely in absolute terms to be selected. The graphs in Fig. 1 c and d have the values $S(g_c) = 0.33$ and $S(g_d) = 0.34$, respectively; furthermore, there are relatively few graphs with S values this low, and thus any graphs similar to these are vanishingly unlikely to arise at random (6). The remainder of this article explains in more detail why the underlying forces at work in the evolution of the real router-level Internet avoid the generation of high S graphs and how this feature can be captured in an optimization-based design framework. We also consider what, if anything, this framework has to say about the RYF nature of the Internet.

A Look at the Actual Internet

An obvious starting point for investigating the structure and underlying forces at work in the Internet is to inspect detailed router-level maps from Internet service providers (ISPs). Abilene, the backbone for the Internet2 academic network, is illustrated in Fig. 1 and is an ideal example for many reasons that will be exploited throughout this analysis.** Abilene publishes detailed hardware specifications for each router and link, so Fig. 1 is exact, not an approximation based on indirect measurements. Abilene is also a state-of-the-art network with essentially no difference between physical (i.e., layer two) and Internet-protocol (IP) (i.e., layer three) connectivity. This simplifies the exposition without loss of generality and also eliminates a source of confusion in measured data from networks that use older legacy technologies. Using regional academic networks and commercial ISPs, we verified that all the inferences and conclusions based on Abilene hold in general. Commercial ISPs do not allow publishing such details because of proprietary considerations, but router-level measurement studies (21, 22, ††) further confirm our analysis (7, 23, 24), although this requires additional statistical and Internet-specific expertise beyond the intended scope of this article.

**Detailed information about the objectives, organization, and development of the Abilene network are available from www.internet2.edu/abilene.

††SKITTER Project. Cooperative Association for Internet Data Analysis, University of California San Diego Supercomputing Center (www.caida.org).

Fig. 1. Diversity among graphs having the same degree sequence D. (a) RNDnet: a network consistent with construction by PA. The two networks represent the same graph, but the figure on the right is redrawn to emphasize the role that high-degree hubs play in overall network connectivity. (b) SFnet: a graph having the most preferential connectivity, again drawn both as an incremental growth type of network and in a form that emphasizes the importance of high-degree nodes. (c) BADnet: a poorly designed network with overall connectivity constructed from a chain of vertices. (d) HOTnet: a graph constructed to be a simplified version of the Abilene network shown in Fig. 2. (e) Power-law degree sequence D for networks shown in a–d. Only $d_i > 1$ is shown.

14498 | www.pnas.org/cgi/doi/10.1073/pnas.0501426102 Doyle et al.

• Degree distribution is not the whole story.
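The point that two graphs can share a degree sequence and still differ structurally can be illustrated with the quantity $s(g) = \sum_{(i,j) \in E(g)} d_i d_j$ from the excerpt. The sketch below uses two hypothetical eight-node trees, invented for this illustration, with the same degree sequence: in one the two degree-3 hubs are linked directly, in the other they attach to leaves and only meet through lower-degree nodes. Computing $s_{\max}$ (and therefore S(g)) requires a search over all graphs with the given degree sequence and is not attempted here.

```python
from collections import Counter

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def s_metric(edges):
    """s(g): sum over edges of the product of the endpoint degrees."""
    deg = degrees(edges)
    return sum(deg[u] * deg[v] for u, v in edges)

# Hypothetical example graphs with degree sequence (3, 3, 2, 2, 1, 1, 1, 1).
hub_to_hub = [(0, 1), (0, 2), (0, 4), (1, 3), (1, 5), (2, 6), (3, 7)]
hub_to_leaf = [(0, 2), (0, 4), (0, 5), (1, 3), (1, 6), (1, 7), (2, 3)]

same_sequence = (sorted(degrees(hub_to_hub).values())
                 == sorted(degrees(hub_to_leaf).values()))
print("same degree sequence:", same_sequence)     # True
print("s(hub_to_hub)  =", s_metric(hub_to_hub))   # 31
print("s(hub_to_leaf) =", s_metric(hub_to_leaf))  # 28
```

The graph that wires its hubs together scores higher, matching the remark in the excerpt that high s(g) requires high-degree vertices to connect to other high-degree vertices.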

Wikipedia entry on “scale-free networks”

• Good discussion of the history and controversy

– Faloutsos SIGCOMM 1999 paper on power law in Internet based on traceroute sampling.

[Four log-log plots (x-axis 1 to 100, y-axis 1 to 10000), each with a fitted power law:
"971108.out": exp(7.68585) * x ** (-2.15632)
"980410.out": exp(7.89793) * x ** (-2.16356)
"981205.out": exp(8.11393) * x ** (-2.20288)
"routes.out": exp(8.52124) * x ** (-2.48626)]

– Although many real-world networks are thought to be scale-free, the evidence often remains inconclusive, primarily due to the developing awareness of more rigorous data analysis techniques.

Effectively breaking up different networks

What other types of nodes play key roles?

Other types of important nodes

A classic example from Social Network Analysis (SNA)

[http://www.fsu.edu/~spap/water/network/intro.htm]

The “Kite Network”

Who is important and why?

The Kite Network

• Degree – Diane looks important (a “hub”).

• Betweenness – Heather looks important (a “connector”/“broker”).

• Closeness – Fernando and Garth can access anyone via a short path.

• Boundary spanners – Fernando, Garth, and Heather are well-positioned to be “innovators”.

• Peripheral players – Ike and Jane may be important resources for fresh information.
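The degree and closeness rankings can be checked directly. The sketch below assumes the standard edge list of Krackhardt's kite graph with the names used on the slide; the edge list is not given in the slides themselves, so treat it as an assumption. Closeness is taken here as (n − 1) divided by the sum of hop distances to all other nodes, which is one common convention.

```python
from collections import deque

# Assumed edge structure: the standard Krackhardt kite (10 nodes, 18 links).
KITE = {
    "Andre":    ["Beverly", "Carol", "Diane", "Fernando"],
    "Beverly":  ["Andre", "Diane", "Ed", "Garth"],
    "Carol":    ["Andre", "Diane", "Fernando"],
    "Diane":    ["Andre", "Beverly", "Carol", "Ed", "Fernando", "Garth"],
    "Ed":       ["Beverly", "Diane", "Garth"],
    "Fernando": ["Andre", "Carol", "Diane", "Garth", "Heather"],
    "Garth":    ["Beverly", "Diane", "Ed", "Fernando", "Heather"],
    "Heather":  ["Fernando", "Garth", "Ike"],
    "Ike":      ["Heather", "Jane"],
    "Jane":     ["Ike"],
}

def closeness(adj, src):
    """(n - 1) / (sum of shortest-path lengths from src); assumes a connected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())

degree = {v: len(nbrs) for v, nbrs in KITE.items()}
close = {v: closeness(KITE, v) for v in KITE}

top_degree = max(degree, key=degree.get)
best = max(close.values())
top_close = sorted(v for v, c in close.items() if abs(c - best) < 1e-12)

print("highest degree:", top_degree, degree[top_degree])
print("highest closeness:", top_close, round(best, 3))
```

Under these assumptions Diane has the most links, yet Fernando and Garth sit closer, on average, to everyone else.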

A contemporary social network

(Taken from http://www.thenetworkthinkers.com/)

Betweenness Centrality

[Freeman, L. C. “A set of measures of centrality based on betweenness.” Sociometry 40, 1977]

A measure of how many shortest paths between all other vertices pass through a given vertex.

Betweenness (formal definition)

For a given vertex i:

$$B(i) = \sum_{s \neq t \neq i} \frac{\sigma_{st}(i)}{\sigma_{st}}$$

• where $\sigma_{st}$ is the number of shortest (geodesic) paths between s and t,

• and $\sigma_{st}(i)$ is the number of those paths that pass through vertex i.

(Calculating shortest paths efficiently: http://en.wikipedia.org/wiki/Dijkstra's_algorithm)
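The definition translates almost line by line into code for a small unweighted graph, where breadth-first search replaces Dijkstra's algorithm. The sketch below is a direct (not optimized) implementation: a vertex i lies on a shortest s–t path exactly when d(s,i) + d(i,t) = d(s,t), and then $\sigma_{st}(i) = \sigma_{si}\,\sigma_{it}$. It sums over unordered pairs {s, t}, an assumed convention for undirected graphs. The test graph is the kite network, with each person abbreviated to an initial and the standard kite edge list assumed.

```python
from collections import deque
from itertools import combinations

# Assumed kite edges; A=Andre, B=Beverly, C=Carol, D=Diane, E=Ed,
# F=Fernando, G=Garth, H=Heather, I=Ike, J=Jane.
EDGES = "AB AC AD AF BD BE BG CD CF DE DF DG EG FG FH GH HI IJ".split()
adj = {}
for a, b in EDGES:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def bfs_counts(adj, src):
    """Return (dist, sigma): hop distance and number of shortest paths from src."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u is a predecessor of v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """B(i) = sum over unordered pairs {s, t} of sigma_st(i) / sigma_st."""
    info = {v: bfs_counts(adj, v) for v in adj}
    B = dict.fromkeys(adj, 0.0)
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:              # s and t are disconnected
            continue
        for i in adj:
            if i in (s, t) or i not in dist_s or i not in dist_t:
                continue
            if dist_s[i] + dist_t[i] == dist_s[t]:
                B[i] += sigma_s[i] * sigma_t[i] / sigma_s[t]
    return B

B = betweenness(adj)
for name, value in sorted(B.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 2))
```

With this edge list Heather (H) comes out on top: every shortest path from Ike or Jane to the other seven people runs through her.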

Betweenness and eigenvalues (bottlenecks)

[Figure: three scatter plots of node positions in the X–Y plane (both axes 0 to 20), with panel labels “R = 7.513τ[min] = 109 to”, “R = 4.61τ[med] = 604 to”, and “R = 5.315τ[max] = 5314 to”.]

• Bottlenecks have large betweenness values.

• In social networks, betweenness is a measure of a node's “centrality” and importance (could be a proxy for influence).

• In a road network, high betweenness could indicate where alternate routes are needed.

• Also a measure of the resilience of a network (next page).

Targeted attack by different metrics

Holme P, Kim BJ, Yoon CN, Han SK (2002) “Attack vulnerability of complex networks”. Phys. Rev. E 65:056109

• Degree centrality

• Betweenness centrality

Typically (but not always) high-degree nodes also have high betweenness.

Removing high-betweenness nodes is the more effective strategy for breaking up a network’s connectivity.
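A toy example shows how the two rankings can disagree. The graph below is hypothetical, built for this illustration and not taken from Holme et al.: two 5-node cliques joined only through a bridge node that links to two members of each clique. Betweenness is computed with Brandes' accumulation scheme, a standard faster alternative to applying the definition pair by pair; all function names are illustrative.

```python
from collections import deque

def two_cliques_with_bridge():
    """Nodes 0-4 and 5-9 form cliques; node 10 bridges to 0, 1, 5, and 6."""
    adj = {v: set() for v in range(11)}
    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)
    for clique in (range(0, 5), range(5, 10)):
        for u in clique:
            for v in clique:
                if u < v:
                    link(u, v)
    for v in (0, 1, 5, 6):
        link(10, v)
    return adj

def brandes_betweenness(adj):
    """Betweenness over unordered pairs (unweighted graph), Brandes-style."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                          # BFS, counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):             # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen, best = {removed}, 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

g = two_cliques_with_bridge()
bc = brandes_betweenness(g)
top_degree = max(g, key=lambda v: len(g[v]))   # a clique node with 5 links
top_betweenness = max(bc, key=bc.get)          # the bridge, with only 4 links

size_after_degree_attack = largest_component(g, top_degree)
size_after_betweenness_attack = largest_component(g, top_betweenness)
print("remove top-degree node:", size_after_degree_attack)            # 10
print("remove top-betweenness node:", size_after_betweenness_attack)  # 5
```

Here the highest-degree node can be removed without disconnecting anything, while removing the lower-degree bridge cuts the network in half.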

But back to Albert, Jeong and Barabasi

[Figure 2 of Albert, Jeong and Barabasi, repeated: diameter d versus fraction f of removed nodes. Panels: a (SF and E models), b (Internet), c (WWW); curves: Attack and Failure.]



So why did Albert, Jeong and Barabasi find that their sample of the Internet topology was vulnerable to degree-targeted attack?

How to measure the structure of the Internet?

The focus of the next lecture (Lecture 5)

“Scale-free”??

• Power law degree distribution → “scale-free” degree distribution.

• But does that mean there are no scales at all in the system?

• Real-world data is high-dimensional.

• “Scale-free” in one attribute DOES NOT mean scale-free in all attributes.

• Doyle et al, “scale-rich” networks.

Power law degree dist DOES NOT imply scale-free in all attributes!!

Power Law random graphs/Scale-free random graph

• These are random except for degree dist, so scale-free.

• Extremely resilient to random failure.

• Extremely sensitive to targeted attack.
