Yandex wg-talk

  • date post

    29-Jun-2015
  • Category

    Technology

description

Scientific and technical seminar at Yandex, 28 October (Colin Cooper)

Transcript of Yandex wg-talk

  • 1. Random graph process models of large networks. Colin Cooper, Department of Informatics, King's College London. 28th October 2013, Yandex

2. Random graph process
Graph process: at each step the existing graph is modified by making a small number of structural changes, e.g.
  • Add a new vertex with edges incident to the existing graph
  • Add edges within the existing graph
  • Delete some edges or vertices
  • Exchange some existing edges for others
If these changes are random, then some asymptotic structural properties may emerge as the process evolves. For example: the degree sequence has a power law with parameter $\gamma$.

3. Outline
  • Introduction
  • Various web graph models
  • Degree distribution: Undirected model
  • Hub-Authority model: Directed
  • Web-graphs of increasing degree

4. Experimental studies
Large-scale dynamic networks such as the Internet and the World Wide Web.
  • Barabási and Albert, Emergence of scaling in random networks (1999)
  • Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins and Wiener, Graph Structure in the Web (2000)
  • M. Faloutsos, P. Faloutsos and C. Faloutsos, On Power-law Relationships of the Internet Topology (1999)

5. Power law degree sequence
The proportion of vertices of a given degree $k$ follows an approximate inverse power law $n_k \sim C k^{-\gamma}$ for some constants $C$, $\gamma$.
Various explanatory models, e.g.
  • Bollobás, Riordan, Spencer and Tusnády, The degree sequence of a scale-free random graph process (2001)
  • Aiello, Chung and Lu, A random graph model for massive graphs (2000)
  • Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins and Upfal, Stochastic models for the web graph (2000)
  • Dorogovtsev, Mendes and Samukhin, Structure of growing networks with preferential linking (2000)

6. Preferential attachment
One approach: generate graphs via preferential attachment (PA).
PA: attach to a vertex with probability proportional to its degree.
PA gives a power law distribution with parameter $\gamma = 3$.
The preferential attachment model dates back to Yule: G. Yule, A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis, Philosophical Transactions of the Royal Society of London (Series B) (1924).
Yule model: random tree.
Each point independently generates children with rate 1 in time interval t. Early points have the most children.
PA was proposed as a random graph model for the web by Barabási and Albert, Emergence of scaling in random networks (1999).

7. Publications relevant to this talk
  • Cooper and Frieze, A general model of web graphs, RSA (2003). An analysis of the recurrence for the expected number of vertices of degree $k$, combined with concentration results and bounds for the maximum degree. Uses Laplace's method to solve recurrences with rational coefficients.
  • Cooper, The age specific degree distribution of web-graphs, CPC (2006). Derives the degree distribution directly, and uses this to obtain the expected number of vertices of degree $k$.
  • Cooper and Pralat, Scale-free graphs of increasing degree, RSA (2011). Adapts the degree distribution method to obtain results for a growth model.

8. Web-graph models
Simple undirected or directed process models where a mixture of vertices and edges is added at each step, either preferentially or uniformly at random.
For undirected web-graph processes, as the degree $k$ tends to infinity, the expected proportion of vertices of degree $k$ tends to $N_k \sim k^{-\gamma}$.
The power law parameter is given by $\gamma = 1 + 1/\eta$. Here $\eta$ is the limiting ratio of the expected number of edge endpoints inserted in the process by preferential attachment to the expected total degree.
The maximum degree in this model is a.s. $\Delta = O(n^{\eta})$, where $n$ is the number of vertices.
Surprisingly, these results seem to hold for other types of process model and can be useful as a general heuristic.

9. Some examples of the power law heuristic
Standard preferential attachment: make $G(t)$ from $G(t-1)$ by adding a new vertex $v_t$ with (an average of) $m$ neighbours chosen preferentially from $G(t-1)$.
$$\eta = \frac{m}{2m} = \frac{1}{2}$$
Power law: $\gamma = 1 + 1/\eta = 1 + 2 = 3$. Maximum degree: $\Delta = O(n^{1/2})$.

10. Experimental evidence PA model
Rapid convergence for PA graphs to $\gamma = 3$: 20,000 vertices is enough (see light blue plot data).
Thanks to Yiannis Siantos for the figure.
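The power law heuristic for standard preferential attachment can be tried numerically. The following is a minimal sketch, not the code behind the talk's experiments: the names `heuristic_exponent` and `pa_degrees` are hypothetical, and the starting graph (two vertices joined by m parallel edges) and the use of multi-edges are assumptions the slides do not fix. The comparison column uses the limiting proportion 2m(m+1)/(k(k+1)(k+2)) known for this model from the Bollobás, Riordan, Spencer and Tusnády analysis.

```python
import random
from collections import Counter


def heuristic_exponent(eta):
    """Exponent predicted by the heuristic gamma = 1 + 1/eta."""
    return 1.0 + 1.0 / eta


def pa_degrees(n, m, seed=0):
    """Degrees after growing a preferential attachment multigraph to n vertices.

    Each new vertex sends m edges to existing vertices, each target chosen
    with probability proportional to its current degree.
    """
    rng = random.Random(seed)
    # Each edge stores both endpoints here, so a uniform draw from the list
    # selects a vertex with probability proportional to its degree.
    endpoints = [0, 1] * m   # assumed start: two vertices, m parallel edges
    degree = [m, m]
    for v in range(2, n):
        # Targets are drawn before the new vertex enters the endpoint list.
        targets = [rng.choice(endpoints) for _ in range(m)]
        degree.append(m)
        for w in targets:
            degree[w] += 1
            endpoints.extend((v, w))
    return degree


if __name__ == "__main__":
    m, n = 2, 20000
    counts = Counter(pa_degrees(n, m))
    print("heuristic gamma:", heuristic_exponent(m / (2 * m)))
    for k in (2, 4, 8, 16):
        limit = 2 * m * (m + 1) / (k * (k + 1) * (k + 2))
        print(k, counts[k] / n, round(limit, 5))
```

Calling `heuristic_exponent(1 / 4)` gives the exponent that the same heuristic predicts when one edge endpoint in four is chosen preferentially.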
11. Non-standard triangle closing model
Make $G(t)$ from $G(t-1)$ by adding a new vertex $v_t$ with one neighbour $u$ chosen u.a.r. from $G(t-1)$, and one edge from $v_t$ to a random neighbour $w$ of $u$, so that $\Pr(w \text{ chosen}) \propto d(w)$.
One edge endpoint in 4 is chosen preferentially.

12. The proportion of edge endpoints added preferentially is $\eta = 1/4$. So heuristically:
Power law: $\gamma = 1 + 1/\eta = 1 + 4 = 5$. Maximum degree: $\Delta = O(n^{1/4})$.
Experimentally this seems to be true in the limit (see next slide).
The model seems difficult to analyze formally.

13. Heuristic gives no information on convergence rate
Slow convergence: large experiments up to $4 \times 10^8$ vertices.
Still not quite arrived at $\gamma = 5$, $\Delta = O(n^{1/4})$.
Thanks to Yiannis Siantos for the figure.

14. Web-graph model generative choices

15. Web-graph model: Power law degree sequence
For an undirected web-graph process, as the vertex degree $k$ tends to infinity, the expected proportion of vertices of degree $k$ tends to $N_k \sim k^{-\gamma}$.
The power law parameter is given by $\gamma = 1 + 1/\eta$, where $\eta$ is the limiting ratio of the expected number of edge endpoints inserted by preferential attachment to the expected total degree.
Any $\gamma > 2$ can be obtained by suitable choices of parameters.

16. Undirected Web-graph model parameters
At each step either a NEW vertex (+edges) is added with probability $\alpha$, or extra edges are added between OLD vertices with prob. $\beta = 1 - \alpha$.
For convenience edges are regarded as "directed out" from the new vertex.
The number of edges is sampled from a distribution depending on the choice made (NEW, OLD).
Each edge endpoint makes independent UAR or PA choices:
  A. New vertex v, choice for edges directed OUT from v
  B. Old vertex v, choice for extra edge directed OUT from v
  C. Old vertex v, choice for extra edge directed IN to v

17. Undirected model continued
NEW procedure. All edges are "directed out" from the new vertex.
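Each edge of $v$ chooses independently using a probability mixture (parameter A):
$$\Pr(w \text{ is selected}) = A_1 \frac{d(w,t)}{2|E(t)|} + A_2 \frac{1}{|V(t)|},$$
where
$$\sum_{w} \Pr(w \text{ is selected by } e_i) = A_1 + A_2 = 1.$$
In all OLD cases $Z = A, B, C$ we have
$$p_Z(v,t) = Z_1 \frac{d(v,t-1)}{2|E(t-1)|} + Z_2 \frac{1}{|V(t-1)|}.$$

18. Result of these choices
At each step, with prob. $\alpha$ a NEW vertex (+edges) is added; with prob. $\beta = 1 - \alpha$ extra edges are added between OLD vertices.
The number of edges $m$, $M$ (NEW, OLD) is sampled from a probability distribution. Expected number of edges: $\bar m$, $\bar M$.
  A. New vertex v, edges directed OUT from v
  B. Old vertex v, edges directed OUT from v
  C. Old vertex v, edges directed IN to v
The degree distribution depends on two parameters $\eta$, $\nu$:
$$\text{PA:}\quad \eta = \frac{\alpha \bar m A_1 + \beta \bar M (B_1 + C_1)}{2(\alpha \bar m + \beta \bar M)}, \qquad \text{UAR:}\quad \nu = \frac{\alpha \bar m A_2 + \beta \bar M (B_2 + C_2)}{\alpha}$$

19. Degree distribution: Undirected model
With $\eta$ (PA) and $\nu$ (UAR) as defined on the previous slide, consider a vertex $v$ of initial degree $m$ added at step $v$. The distribution of the degree $d(v,t)$ of $v$ at step $t$ is
$$P(d(v,t) = m + \ell \mid m) \sim \binom{\ell + m + \nu/\eta - 1}{\ell} \left(\frac{v}{t}\right)^{\eta m + \nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{\ell}.$$
Assumes $t \to \infty$, $v$ is added after time $v_0$, and $\ell = o(t^{1/4})$.

20. Illustration: Pr(degree increases by 2)
Prob. of change $p$, no change $q$ at step $t$:
$$p(j,t) = \frac{\eta(m+j)}{t} + \frac{\nu}{t}, \qquad q(j,t) = 1 - p(j,t).$$
Change points $\tau_1$, $\tau_2$, with $v < \tau_1 < \tau_2 \le t$.
Prob. of exactly 2 changes at $\tau_1$, $\tau_2$:
$$F(\tau_1,\tau_2) = \underbrace{q(0,v+1) \cdots q(0,\tau_1-1)\, p(0,\tau_1)}_{\text{first change at } \tau_1} \; \underbrace{q(1,\tau_1+1) \cdots q(1,\tau_2-1)\, p(1,\tau_2)}_{\text{second change at } \tau_2} \; \underbrace{q(2,\tau_2+1) \cdots q(2,t)}_{\text{no further changes}}$$
This evaluates to
$$F(\tau_1,\tau_2) \sim (\eta m + \nu)(\eta(m+1) + \nu) \left(\frac{v}{t}\right)^{\eta m + \nu} \frac{1}{\tau_1}\left(\frac{\tau_1}{t}\right)^{\eta} \frac{1}{\tau_2}\left(\frac{\tau_2}{t}\right)^{\eta}.$$

21. Add over all possible $\tau_1$, $\tau_2$:
$$\sum_{\tau_1 < \tau_2} F(\tau_1,\tau_2) \sim \frac{(\eta m + \nu)(\eta(m+1) + \nu)}{2!} \left(\frac{v}{t}\right)^{\eta m + \nu} \left(\int_v^t \frac{1}{\tau}\left(\frac{\tau}{t}\right)^{\eta} d\tau\right)^{2} = \frac{(\eta m + \nu)(\eta(m+1) + \nu)}{2!\,\eta^{2}} \left(\frac{v}{t}\right)^{\eta m + \nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{2}.$$

22. From the degree distribution we can obtain $n(\ell \mid m)$, the expected proportion of vertices of degree $m + \ell$:
$$n(\ell \mid m) = \frac{(\eta(\ell + m - 1) + \nu) \cdots (\eta m + \nu)}{(\eta(\ell + m) + \nu + 1) \cdots (\eta m + \nu + 1)}.$$
The proportion $N_t(\ell \mid m)$ of vertices of degree $m + \ell$ is concentrated around $n(\ell \mid m)$, provided $t \to \infty$ and $\ell$ is not too large.
As $\ell \to \infty$, $n(\ell \mid m) \sim K \ell^{-(1 + 1/\eta)}$.
The range of $\eta$ is $0 < \eta < 1$, so the power law coefficient $1 + 1/\eta$ is greater than 2.
As $\eta \to 0$: geometric degree sequence random graph,
$$\lim_{\eta \to 0} n(\ell \mid m) = \frac{1}{1 + \nu} \left(\frac{\nu}{1 + \nu}\right)^{\ell}.$$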
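The limiting behaviour of n(ℓ | m) can be checked numerically. The sketch below assumes the product form n(ℓ | m) = ∏_{j<ℓ}(η(m+j)+ν) / ∏_{j≤ℓ}(η(m+j)+ν+1), which is how the formula on the degree-distribution slide is read here; `n_ell` and `tail_exponent` are hypothetical helper names, and the parameter values in the demo are arbitrary.

```python
import math


def n_ell(ell, m, eta, nu):
    """Expected proportion of vertices of degree m + ell (assumed product form)."""
    def c(j):
        return eta * (m + j) + nu

    value = 1.0 / (c(0) + 1.0)
    for j in range(ell):
        # Multiply in one numerator factor and the next denominator factor.
        value *= c(j) / (c(j + 1) + 1.0)
    return value


def tail_exponent(m, eta, nu, ell=10_000):
    """Estimate x in n(ell | m) ~ K * ell**(-x) by doubling ell."""
    ratio = n_ell(ell, m, eta, nu) / n_ell(2 * ell, m, eta, nu)
    return math.log(ratio) / math.log(2.0)


if __name__ == "__main__":
    for eta, nu in ((0.5, 0.0), (0.25, 1.0)):
        print(eta, nu, tail_exponent(2, eta, nu), 1.0 + 1.0 / eta)
    # Small eta: compare with the geometric form nu**ell / (1 + nu)**(ell + 1).
    print(n_ell(3, 1, 1e-9, 1.0), 1.0 / 16.0)
```

With η = 1/2 and ν = 0 the product collapses to 2m(m+1)/(k(k+1)(k+2)) for k = m + ℓ, the usual preferential attachment degree sequence, which gives a quick sanity check on the reading of the formula.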
23. Hub-Authority model: Directed
Hub: vertex with a lot of edges directed out (opinionated page).
Authority: vertex with a lot of edges directed in (popular page).
The initial in- and out-degree is given by a distribution $(P^-, P^+)$.
How does a new vertex $v$ added at step $t+1$ choose its IN-neighbours?
$$\Pr(w \text{ points to } v) = D_1 \frac{d^+(w,t)}{|E(t)|} + D_2 \frac{1}{|V(t)|}$$
It is most likely that a hub vertex will point an edge to $v$.
How does a new vertex added at step $t+1$ choose its OUT-neighbours?
$$\Pr(v \text{ points to } w) = A_1 \frac{d^-(w,t)}{|E(t)|} + A_2 \frac{1}{|V(t)|}$$
It is most likely that $v$ will point to an authority vertex.

24. Results summary
Undirected model
  • Age dependent degree distribution
  • Number of vertices with given degree
  • Asymptotic degree sequence $n(k) \sim k^{-x}$
Hub-Authority model
  • Age dependent in- and out-degree distribution
  • Number of vertices with given in- & out-degree (as an integral)
  • Asymptotic degree sequence $n(k,\ell) \sim k^{-x^-} \ell^{-x^+}$, $x = x(k,\ell)$
General Directed model
  • The in- and out-degree distribution is not obtainable explicitly
  • Sum of path dependent integrals (order of events matters)

25. Directed model. Definition only
In general, the choice type can be made on a mixture of IN and OUT degree.
E.g. how does a new vertex added at step $t$ choose its OUT-neighbours?
$$\Pr(v \text{ points to } w) = A_{(1,+)} \frac{d^+(w,t-1)}{|E(t-1)|} + A_{(1,-)} \frac{d^-(w,t-1)}{|E(t-1)|} + A_2 \frac{1}{|V(t-1)|},$$
where $A_{(1,+)} + A_{(1,-)} + A_2 = 1$.
An in-degree of 2 at $w$ could be made up of various choices $(++)$, $(+-)$, $(-+)$, $(--)$ at $w$ by subsequent vertices $t > w$.
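The OUT-neighbour rule of the directed model is a three-way mixture, and it is a valid probability distribution because the in-degrees and the out-degrees each sum to |E|. The sketch below illustrates this; the function name `out_neighbour_probs`, the argument names, and the pairing of each coefficient with the degree of the same sign are assumptions made for illustration. Setting `a_plus = 0` gives the Hub-Authority rule for OUT-neighbours.

```python
import random


def out_neighbour_probs(in_deg, out_deg, a_plus, a_minus):
    """Probability that a new vertex points to each existing vertex w.

    Assumed mixture:
        a_plus * d+(w)/|E| + a_minus * d-(w)/|E| + a_unif / |V|
    with a_unif = 1 - a_plus - a_minus.
    """
    n_edges = sum(in_deg)
    if n_edges == 0 or n_edges != sum(out_deg):
        raise ValueError("in- and out-degrees must both sum to |E| > 0")
    n_vertices = len(in_deg)
    a_unif = 1.0 - a_plus - a_minus
    return [
        a_plus * d_out / n_edges + a_minus * d_in / n_edges + a_unif / n_vertices
        for d_in, d_out in zip(in_deg, out_deg)
    ]


if __name__ == "__main__":
    in_deg = [2, 1, 0]    # vertex 0 is the authority
    out_deg = [0, 1, 2]   # vertex 2 is the hub
    probs = out_neighbour_probs(in_deg, out_deg, a_plus=0.0, a_minus=0.5)
    print(probs, sum(probs))
    # Draw one OUT-neighbour for a new vertex according to these weights.
    rng = random.Random(0)
    print(rng.choices(range(len(probs)), weights=probs, k=1)[0])
```

In the demo the vertex with the largest in-degree receives the largest probability, matching the remark that a new vertex is most likely to point to an authority.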
26. Results: Hub-Authority model
Degree distribution: explicit distribution (similar to undirected).
Power law: the number of vertices $n(r,s)$ of in-degree $r$, out-degree $s$ is of the form
$$n(r, s \mid m^-, m^+) = C_{r,s}\, r^{-x^-} s^{-x^+}.$$
The parameters $x^-$, $x^+$ depend on the relative sizes of $r$, $s$.
They change as s increases from 1 to s = (r / )
Functional form x = f ( + , , , m+ , m ) quotient
$\eta^+$, $\eta^-$ are the preferential attachment parameters.
The parameter is the limiting ratio of the expected number of edges whose termi