Yandex wg-talk

Random graph process models of large networks

Colin Cooper

Department of Informatics

King’s College London

28th October 2013

Yandex

Random graph process

Graph process: at each step the existing graph is modified by making a small number of structural changes, e.g.

- Add a new vertex with edges incident to the existing graph
- Add edges within the existing graph
- Delete some edges or vertices
- Exchange some existing edges for others

If these changes are random, then some asymptotic structural properties may emerge as the process evolves. For example:

The degree sequence has a power law with parameter γ

Outline

Introduction

Various web graph models

Degree distribution: Undirected model

Hub-Authority model: Directed

Web-graphs of increasing degree

Experimental studies

Large-scale dynamic networks such as the Internet and the World Wide Web

- Barabási and Albert, Emergence of scaling in random networks, (1999).

- Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins and Wiener, Graph Structure in the Web, (2000).

- M. Faloutsos, P. Faloutsos and C. Faloutsos, On Power-law Relationships of the Internet Topology, (1999).

Power law degree sequence

The proportion of vertices of a given degree k follows an approximate inverse power law

$n_k \sim C k^{-\gamma}$

for some constants C, γ

Various explanatory models, e.g.

- Bollobás, Riordan, Spencer and Tusnády, The degree sequence of a scale-free random graph process, (2001)

- Aiello, Chung and Lu, A random graph model for massive graphs, (2000)

- Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins and Upfal, Stochastic models for the web graph, (2000)

- Dorogovtsev, Mendes and Samukhin, Structure of growing networks with preferential linking, (2000)

Preferential attachment

One approach: generate graphs via preferential attachment (PA)

PA: attach to a vertex with probability proportional to its degree

PA gives a power law degree distribution with parameter γ = 3

The preferential attachment model dates back to Yule

G. Yule. A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis, Philosophical Transactions of the Royal Society of London (Series B) (1924).

Yule model: Random tree. Each point independently generates children with rate 1 in time interval ∆t. Early points have the most children

PA was proposed as a random graph model for the web by Barabási and Albert, Emergence of scaling in random networks, (1999)

Publications relevant to this talk

Cooper and Frieze, A general model of web graphs, RSA (2003)
An analysis of the recurrence for the expected number of vertices of degree k, combined with concentration results and bounds for the maximum degree. Uses Laplace’s method to solve recurrences with rational coefficients

Cooper, The age specific degree distribution of web-graphs, CPC (2006)
Derives the degree distribution directly, and uses this to obtain the expected number of vertices of degree k

Cooper, Pralat, Scale-free graphs of increasing degree, RSA (2011)
Adapts the degree distribution method to obtain results for a growth model

Web-graph models

Simple undirected or directed process models where a mixture of vertices and edges are added at each step, either preferentially or uniformly at random

For undirected web-graph processes, as the degree k tends to infinity, the expected proportion of vertices of degree k tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by

γ = 1 + 1/η.

Here η is the limiting ratio of the expected number of edge endpoints inserted in the process by preferential attachment to the expected total degree

The maximum degree ∆ in this model is a.s.

$\Delta = O(n^{\eta})$

where n is the number of vertices

Surprisingly, these results seem to hold for other types of process model and can be useful as a general heuristic

Some examples of the power law heuristic

Standard preferential attachment: Make G(t) from G(t − 1) by adding a new vertex v_t with (an average of) m neighbours chosen preferentially from G(t − 1)

$\eta = \frac{m}{2m} = \frac{1}{2}$

Power law $\gamma = 1 + \frac{1}{\eta} = 1 + 2 = 3$

Maximum degree $\Delta = O(n^{1/2})$
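The heuristic above is simple arithmetic on η, and a minimal Python sketch makes that explicit. The function names are illustrative assumptions and do not come from the talk.

```python
def power_law_exponent(eta: float) -> float:
    """Heuristic degree exponent gamma = 1 + 1/eta, for 0 < eta <= 1."""
    if not 0 < eta <= 1:
        raise ValueError("eta must lie in (0, 1]")
    return 1.0 + 1.0 / eta


def max_degree_exponent(eta: float) -> float:
    """Heuristic: the maximum degree grows like n**eta, so the exponent is eta."""
    if not 0 < eta <= 1:
        raise ValueError("eta must lie in (0, 1]")
    return eta


# Standard preferential attachment: m of the 2m new endpoints are preferential.
m = 3
eta_pa = m / (2 * m)
gamma_pa = power_law_exponent(eta_pa)
delta_exponent_pa = max_degree_exponent(eta_pa)
```

For any m this gives η = 1/2, hence γ = 3 and a maximum degree of order n^(1/2), as on the slide.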

Experimental evidence: PA model

Rapid convergence for PA graphs, γ = 3

20,000 vertices is enough (see light blue plot data)

Thanks to Yiannis Siantos for the figure
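An experiment of this kind can be sketched in a few lines of Python. This is not the code behind the figure: the starting graph (one vertex with m self-loops), the allowance for multi-edges, and the function name are all assumptions made for illustration.

```python
import random
from collections import Counter


def preferential_attachment_degrees(n, m, seed=0):
    """Grow a PA multigraph on n vertices and return {vertex: degree}.

    Assumption: vertex 0 starts with m self-loops. Each vertex occurs in
    `endpoints` once per unit of degree, so a uniform draw from the list
    is a degree-proportional (preferential) choice.
    """
    rng = random.Random(seed)
    endpoints = [0] * (2 * m)
    for v in range(1, n):
        # Choose all m targets before inserting v, so v cannot pick itself.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for w in targets:
            endpoints.extend((v, w))
    return Counter(endpoints)


degrees = preferential_attachment_degrees(n=20_000, m=3, seed=1)
# histogram[k] = number of vertices of degree k; plot on log-log axes.
histogram = Counter(degrees.values())
```

Plotting `histogram` on log-log axes and reading off the slope of the tail gives a rough estimate of γ to compare with the predicted value 3.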

Non-standard triangle closing model

Make G(t) from G(t − 1) by adding a new vertex v_t with one neighbour u chosen u.a.r. from G(t − 1) and one edge from v_t to a random neighbour w of u

$\Pr(w \text{ chosen}) \propto d(w)$

One edge endpoint in 4 is chosen preferentially: each step adds 2 edges (4 endpoints), and only the endpoint at w is a preferential choice

Proportion of edge endpoints added preferentially is

$\eta = \frac{1}{4}$

So heuristically

Power law $\gamma = 1 + \frac{1}{\eta} = 1 + 4 = 5$

Maximum degree $\Delta = O(n^{1/4})$

Experimentally this seems to be true in the limit (see next slide)

The model seems difficult to analyze formally
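The triangle closing process is easy to simulate even though it is hard to analyze. The sketch below assumes a triangle as the starting graph, which the talk does not specify, and the function name is hypothetical.

```python
import random


def triangle_closing_graph(n, seed=0):
    """Return adjacency lists of a triangle-closing graph on n >= 3 vertices.

    Assumption: the process starts from a triangle on vertices 0, 1, 2.
    Each new vertex v joins a uniformly random existing vertex u and a
    uniformly random neighbour w of u, closing the triangle (v, u, w).
    """
    rng = random.Random(seed)
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    for v in range(3, n):
        u = rng.randrange(v)      # uniform over existing vertices 0..v-1
        w = rng.choice(adj[u])    # uniform over the neighbours of u
        adj[v] = [u, w]
        adj[u].append(v)
        adj[w].append(v)
    return adj


adj = triangle_closing_graph(5000, seed=1)
max_degree = max(len(nbrs) for nbrs in adj.values())
```

At this size the run is far too small to see the limiting exponent. It only illustrates the generative step.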

Heuristic gives no information on convergence rate

Slow convergence: Large experiments up to $4 \times 10^8$ vertices

Still not quite arrived at γ = 5, $\Delta = O(n^{1/4})$

Thanks to Yiannis Siantos for the figure

Web-graph model generative choices

Web-graph model: Power law degree sequence

For an undirected web-graph process, as the vertex degree k tends to infinity, the expected proportion of vertices of degree k tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by

γ = 1 + 1/η

where η is the limiting ratio of the expected number of edge endpoints inserted by preferential attachment to the expected total degree

Any γ > 2 can be obtained by suitable choices of parameters

Undirected Web-graph model parameters

- At each step either a NEW vertex (+edges) is added with probability α, or extra edges are added between OLD vertices with prob. β = 1 − α. For convenience edges are regarded as "directed out" from the new vertex

- The number of edges is sampled from a distribution depending on the choice made (NEW, OLD)

- Each edge endpoint makes independent UAR or PA choices:

  A. New vertex v, choice for edges directed OUT from v
  B. Old vertex v, choice for extra edge directed OUT from v
  C. Old vertex v, choice for extra edge directed IN to v

Undirected model continued

NEW procedure.
All edges are "directed out" from the new vertex.
Each edge of v chooses independently using a probability mixture (parameter A)

$\Pr(w \text{ is selected}) = A_1 \frac{d(w,t)}{2|E(t)|} + A_2 \frac{1}{|V(t)|}$

where $\sum_w \Pr(w \text{ is selected by } e_i) = A_1 + A_2 = 1$

In all OLD cases Z = A, B, C we have

$p_Z(v,t) = Z_1 \frac{d(v,t-1)}{2|E(t-1)|} + Z_2 \frac{1}{|V(t-1)|}$
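The mixture of preferential and uniform choice can be written out directly. In this minimal sketch the graph is summarized by a list of vertex degrees, and the function name is an illustrative assumption.

```python
def selection_probabilities(degrees, a1, a2):
    """Pr(w is selected) = a1 * d(w) / (2|E|) + a2 / |V| for every vertex w.

    `degrees` lists the current vertex degrees, so sum(degrees) = 2|E|.
    """
    if abs(a1 + a2 - 1.0) > 1e-12:
        raise ValueError("mixture weights must satisfy a1 + a2 = 1")
    total_degree = sum(degrees)
    num_vertices = len(degrees)
    return [a1 * d / total_degree + a2 / num_vertices for d in degrees]


# Star on 4 vertices: centre of degree 3, three leaves of degree 1.
probs = selection_probabilities([3, 1, 1, 1], a1=0.5, a2=0.5)
```

The probabilities sum to 1 because the degree terms sum to a1 and the uniform terms sum to a2.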

Result of these choices

- At each step with prob. α, a NEW vertex (+edges) is added; with prob. β = 1 − α extra edges are added between OLD vertices

- The number of edges (NEW, OLD) is sampled from a probability distribution, with expected number of edges m, M respectively

- A. New vertex v, edges directed OUT from v
  B. Old vertex v, edges directed OUT from v
  C. Old vertex v, edges directed IN to v

- Degree distribution depends on two parameters η, ν

PA: $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$

UAR: $\nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha}$
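The two parameters can be computed from the model inputs as follows. The sketch assumes that each choice type is a two-way mixture, so that B2 = 1 − B1 and C2 = 1 − C1 just as A1 + A2 = 1. The function name is illustrative.

```python
def eta_nu(alpha, m, M, A1, B1, C1):
    """Return (eta, nu) for the undirected web-graph model.

    Assumption: Z2 = 1 - Z1 for each choice type Z in {A, B, C}.
    alpha = Pr(NEW step); m, M = expected edges per NEW / OLD step.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    beta = 1.0 - alpha
    A2, B2, C2 = 1.0 - A1, 1.0 - B1, 1.0 - C1
    eta = (alpha * m * A1 + beta * M * (B1 + C1)) / (2 * (alpha * m + beta * M))
    nu = (alpha * m * A2 + beta * M * (B2 + C2)) / alpha
    return eta, nu


# Pure preferential attachment: only NEW steps, all choices preferential.
eta_pa, nu_pa = eta_nu(alpha=1.0, m=3, M=0, A1=1.0, B1=1.0, C1=1.0)
# Mixed example: NEW edges preferential, OLD edges uniform at both ends.
eta_mix, nu_mix = eta_nu(alpha=0.5, m=2, M=2, A1=1.0, B1=0.0, C1=0.0)
```

The pure PA setting gives η = 1/2 and ν = 0, and the mixed example gives η = 1/4 and ν = 4.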

Degree distribution: Undirected model

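PA: $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$

UAR: $\nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha}$

Vertex v of initial degree m added at step v

Distribution of degree d(v, t) of v at step t

$P(d(v,t) = m + \ell \mid m) \sim \binom{\ell + m + \frac{\nu}{\eta} - 1}{\ell} \left(\frac{v}{t}\right)^{m\eta + \nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{\ell}$

Assumes t → ∞ and v is added after time v_0 → ∞, and $\ell = o(t^{1/4})$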
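As a numerical sanity check, the asymptotic formula can be compared with an exact computation. The sketch assumes the degree of v rises by one at step s with probability (η(m + j) + ν)/s when its current increase is j, and the parameter values are arbitrary illustrative choices. The function names are hypothetical.

```python
import math


def asymptotic_pmf(l, m, eta, nu, v, t):
    """Binomial-type formula for Pr(d(v, t) = m + l | m)."""
    r = m + nu / eta
    # Generalized binomial C(l + r - 1, l) with non-integer r, via log-gamma.
    log_binom = math.lgamma(l + r) - math.lgamma(r) - math.lgamma(l + 1)
    x = v / t
    return math.exp(log_binom) * x ** (m * eta + nu) * (1 - x ** eta) ** l


def exact_pmf(max_l, m, eta, nu, v, t):
    """Exact Pr(increase = l), l = 0..max_l, by stepping through v+1..t."""
    probs = [1.0] + [0.0] * max_l
    for step in range(v + 1, t + 1):
        new = [0.0] * (max_l + 1)
        for j, mass in enumerate(probs):
            p = (eta * (m + j) + nu) / step
            new[j] += mass * (1 - p)
            if j < max_l:          # mass leaving the top state is dropped
                new[j + 1] += mass * p
        probs = new
    return probs


m, eta, nu, v, t = 2, 0.5, 0.3, 1000, 4000
exact = exact_pmf(5, m, eta, nu, v, t)
ratios = [exact[l] / asymptotic_pmf(l, m, eta, nu, v, t) for l in range(4)]
```

The entries of `ratios` should be close to 1, with the discrepancy shrinking as v grows.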

Illustration: Pr(degree increases by 2)

Prob. of change p, no change q at step t

$p(j,t) \sim \frac{\eta(m+j)}{t} + \frac{\nu}{t}, \qquad q(j,t) = 1 - p(j,t)$

Change points τ1, τ2

v |------| τ1 |------| τ2 |--------| t

Prob. of exactly 2 changes at τ1, τ2

$q(0, v+1) \cdots q(0, \tau_1 - 1)\, p(0, \tau_1)$ (first change at τ1)

$\times\; q(1, \tau_1 + 1) \cdots q(1, \tau_2 - 1)\, p(1, \tau_2)$ (second change at τ2)

$\times\; q(2, \tau_2 + 1) \cdots q(2, t)$ (no further changes)

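This evaluates to

$F(\tau_1, \tau_2) \sim \frac{(\eta m + \nu)(\eta(m+1) + \nu)}{\eta^2} \left(\frac{v}{t}\right)^{m\eta + \nu} \left(\frac{\eta \tau_1^{\eta - 1}}{t^{\eta}}\right) \left(\frac{\eta \tau_2^{\eta - 1}}{t^{\eta}}\right)$

Add over all possible τ1 < τ2

$\sum F(\tau_1, \tau_2) \sim \frac{(\eta m + \nu)(\eta(m+1) + \nu)}{2!\, \eta^2} \left(\frac{v}{t}\right)^{m\eta + \nu} \left(\int_v^t \frac{\eta \tau^{\eta - 1}}{t^{\eta}}\, d\tau\right)^2$

$\sim \frac{(\eta m + \nu)(\eta(m+1) + \nu)}{2!\, \eta^2} \left(\frac{v}{t}\right)^{m\eta + \nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^2$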
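The last two steps rest on one elementary integral and a symmetry argument, written out here as a worked sketch. The shorthand g(τ) for the per-change-point factor is introduced only for this illustration.

```latex
% g(\tau) is the factor contributed by a change point at \tau
g(\tau) = \frac{\eta\,\tau^{\eta-1}}{t^{\eta}}, \qquad
\int_v^t g(\tau)\,d\tau
  = \left[\frac{\tau^{\eta}}{t^{\eta}}\right]_{\tau=v}^{\tau=t}
  = 1 - \left(\frac{v}{t}\right)^{\eta}

% ordered pairs \tau_1 < \tau_2 cover half of the square [v,t]^2
\sum_{v < \tau_1 < \tau_2 \le t} g(\tau_1)\,g(\tau_2)
  \approx \frac{1}{2!}\left(\int_v^t g(\tau)\,d\tau\right)^{2}
  = \frac{1}{2!}\left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{2}
```

For ℓ change points the same argument gives a factor 1/ℓ! and the ℓ-th power of the integral.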

From the degree distribution we can obtain...

- n(ℓ | m), the expected proportion of vertices of degree m + ℓ

$n(\ell \mid m) = \frac{((\ell + m - 1)\eta + \nu) \cdots (m\eta + \nu)}{((\ell + m)\eta + \nu + 1) \cdots (m\eta + \nu + 1)}$

- The proportion N_t(ℓ | m) of vertices of degree m + ℓ is concentrated around n(ℓ | m), provided t → ∞ and ℓ is not too large

- As ℓ → ∞, $n(\ell \mid m) \sim K \ell^{-(1 + 1/\eta)}$
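The product formula for n(ℓ | m) can be evaluated directly. The sketch below reads the numerator as ℓ factors and the denominator as ℓ + 1 factors, and checks the result in the pure PA case (η = 1/2, ν = 0, m = 2). The function name is an illustrative assumption.

```python
import math


def expected_proportion(l, m, eta, nu):
    """n(l | m) = prod_{j<l} a_j / prod_{j<=l} (a_j + 1), a_j = (m+j)*eta + nu."""
    value = 1.0 / ((m + l) * eta + nu + 1.0)
    for j in range(l):
        a_j = (m + j) * eta + nu
        value *= a_j / (a_j + 1.0)
    return value


m, eta, nu = 2, 0.5, 0.0
# The proportions over all l should sum to (nearly) 1.
total = sum(expected_proportion(l, m, eta, nu) for l in range(501))
# Log-log slope between l = 200 and l = 400 estimates the exponent 1 + 1/eta.
slope = -math.log(
    expected_proportion(400, m, eta, nu) / expected_proportion(200, m, eta, nu)
) / math.log(2.0)
```

For these parameters the product collapses to 2m(m + 1)/(k(k + 1)(k + 2)) with k = m + ℓ, so `slope` comes out close to 3.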

Range of η is 0 < η < 1. Power law coefficient γ ≥ 2

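$\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$

- As η → 0: geometric degree sequence (random graph)

$\lim_{\eta \to 0} n_{\eta}(\ell \mid m) \sim \frac{1}{\nu + 1} \left(\frac{\nu}{\nu + 1}\right)^{\ell}$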

Hub-Authority model: Directed

Hub: Vertex with a lot of edges directed out (opinionated page)
Authority: Vertex with a lot of edges directed in (popular page)
The initial in- and out-degree is given by a distribution (P−, P+)

How does a new vertex v added at step t + 1 choose its IN-neighbours?

$\Pr(w \text{ points to } v) = D_1 \frac{d^+(w,t)}{|E(t)|} + D_2 \frac{1}{|V(t)|}$

It is most likely that a hub vertex will point an edge to v

How does a new vertex v added at step t + 1 choose its OUT-neighbours?

$\Pr(v \text{ points to } w) = A_1 \frac{d^-(w,t)}{|E(t)|} + A_2 \frac{1}{|V(t)|}$,

it is most likely v will point to an authority vertex

Results summary

Undirected model
(✓) Age dependent degree distribution
(✓) Number of vertices with given degree
(✓) Asymptotic degree sequence $n(k) \sim k^{-x}$

Hub-Authority model
(✓) Age dependent in- and out-degree distribution
(✓, ×) Number of vertices with given in- & out-degree (as an integral)
(✓) Asymptotic degree sequence $n(k, \ell) \sim k^{-x^-} \ell^{-x^+}$, x = x(k, ℓ)

General Directed model
(×) The in- and out-degree distribution is not obtainable explicitly
Sum of path dependent integrals (order of events matters)

Directed model. Definition only

In general, the choice type can be made on a mixture of IN and OUT degree

E.g. How does a new vertex added at step t choose its OUT-neighbours?

$\Pr(v \text{ points to } w) = A_{(1,+)} \frac{d^+(w,t-1)}{|E(t-1)|} + A_{(1,-)} \frac{d^-(w,t-1)}{|E(t-1)|} + A_2 \frac{1}{|V(t-1)|}$,

where $A_{(1,+)} + A_{(1,-)} + A_2 = 1$

An in-degree of 2 at w could be made up of various choices (++), (+−), (−+), (−−) at w by subsequent vertices t > w

Results: Hub-Authority model

Degree distribution: Explicit distribution (similar to undirected)

Power law: Number of vertices n(r, s) of in-degree r, out-degree s is of the form

$n(r, s \mid m^-, m^+) = C_{r,s}\, r^{-x^-} s^{-x^+}$

The parameters x−, x+ depend on the relative sizes of r, s
They change as s increases from 1 to $s = \Theta(r^{\eta^+/\eta^-})$
Functional form x = f(η+, η−, ν, m+, m−) quotient

η+, η− are the preferential attachment parameters
The parameter η− is the limiting ratio of the expected number of edges whose terminal vertex was chosen by preferential attachment, to the expected number of edges of the process

$\eta^- = \frac{\alpha m^+ A_1 + \beta M C_1}{\alpha m^+ + \gamma m^- + \beta M}$

How does degree sequence differ from Undirected?

$\Pr(d^-(v,t) = r,\ d^+(v,t) = s) \sim \Pr(d^-(v,t) = r)\, \Pr(d^+(v,t) = s)$

Expected proportion of vertices of degree (r, s)

$n(r, s) = C\, r^{-(1 - \xi^-)}\, s^{-(1 - \xi^+)}\, J(r, s)$

where $\xi^+ = m^+ + \nu^+/\eta^+$ and

$J(r, s) = \int_0^1 x^a (1 - x)^r (1 - x^b)^s\, dx$

where $b = \eta^+/\eta^-$ and $a = \frac{\eta^+}{\eta^-}\,\xi^+ + \frac{1}{\eta^-} + \xi^- - 1$

Asymptotics for J(r, s) depend on relative sizes of r, s

Increasing degree model: Preferential Attachment

Can we escape from power law γ = 3 by increasing the number of edges added at each step?

At each step t add a NEW vertex with f(t) edges

$f(t) = [t^c], \quad 0 < c < 1$

For $k \gg t^c$ the power law we get is

$n_k = C \left(\frac{t}{k^{\frac{3-c}{1+c}}}\right)^{\frac{1+c}{1-c}}$

Need c > 0 constant to escape power law γ = 3 given by PA models
When c = 1 all vertices have degree ∼ t so no power law anymore
For 0 < c < 1 the power law is γ(c) = 1 + 2/(1 − c) > 3
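The exponent γ(c) can be read off from the formula for n_k by collecting the powers of k. The short worked sketch below assumes the inner exponent of k is (3 − c)/(1 + c) and the outer exponent is (1 + c)/(1 − c).

```latex
n_k = C\left(\frac{t}{k^{\frac{3-c}{1+c}}}\right)^{\frac{1+c}{1-c}}
    = C\, t^{\frac{1+c}{1-c}}\; k^{-\frac{3-c}{1+c}\cdot\frac{1+c}{1-c}}
    = C\, t^{\frac{1+c}{1-c}}\; k^{-\frac{3-c}{1-c}}

\gamma(c) = \frac{3-c}{1-c} = \frac{(1-c)+2}{1-c} = 1 + \frac{2}{1-c}

% c -> 0 recovers \gamma = 3; for example c = 1/2 gives \gamma = 5
```

As c → 1 the exponent diverges, consistent with the remark that no power law remains when c = 1.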

Concluding remarks

Good points of web-graph model

- Method works well for undirected models
- Provides a heuristic for predicting degree sequence power law and maximum degree in unrelated models
- Generalizes to hypergraph models (not covered in this talk)
- If $1 \le m(t) = t^{o(1)}$ edges added at step t, power law is 3

Not so good points of web-graph model

- Directed models less pleasing, as power law varies as a function of relative sizes of in-degree and out-degree
- General directed model: no closed form for degree distribution?
- Model does not explain/predict power laws with parameter γ < 2 (As η ≤ 1 it must be that γ = 1 + 1/η ≥ 2)

THANK YOU

QUESTIONS
