
2012 15th International Conference on Computer and Information Technology (ICCIT), Chittagong, Bangladesh, 22-24 December 2012

Stochastic Kronecker Graph Revisited

Ahmed Mehedi Nizam Department of Computer Science and Engineering

Bangladesh University of Engineering and Technology Dhaka, Bangladesh

[email protected]

Md. Nasim Adnan, Md. Rashedul Islam Department of Computer Science and Engineering

University of Liberal Arts Bangladesh Dhaka, Bangladesh

{nasim.adnan, rashedul.islam}@ulab.edu.bd

Mohammad Akbar Kabir Department of Economics

University of Dhaka Dhaka, Bangladesh

[email protected]

Abstract—Here we calculate the expected number of isolated vertices, edges, self loops and triangles in a random realization of a stochastic Kronecker graph. We then establish bounds on the parameters of the stochastic Kronecker graph that are sufficient to generate large random graphs with no isolated vertices, edges, self loops or triangles. Finally, we show two phase transitions under the stochastic Kronecker model of graph generation: one for the emergence of edges and the other for the emergence of self loops.

Keywords—Stochastic Kronecker graph; Isolated vertex; Edge count; Self loops; Triangles; Phase transitions.

I. INTRODUCTION

Networks in the real world tend to share a number of characteristic properties. For example, they exhibit the small-world phenomenon, also known as six degrees of separation [9]. The small-world phenomenon claims that any two persons x and y in the world can be connected through a chain of the form x (is a friend of a friend of ... of a friend of) y, where the number of intermediaries between x and y can be (approximately) as small as six. This phenomenon has been observed in many large-scale networks, including the internet, the web and various online social networks [6].

Moreover, as a real-world network evolves over time, its diameter is known to shrink, which implies that the graph becomes more and more connected as new nodes and edges come into existence [10, 11]. Let us briefly examine this property. We construct a graph by introducing one vertex for every person in the world and connecting two vertices by an edge if and only if the corresponding persons know each other. We thus have a graph of over 7 billion vertices and an enormous number of edges. We denote this graph by G2012, as it is based on data from the year 2012. The graph G2013 will certainly differ from G2012, since more people will be born in the meantime and people will make new friends. Adding new nodes and edges without deleting existing ones cannot lengthen the shortest path between any two people already present, so the distances in G2013 between the people of G2012 can only stay the same or shrink. To be more precise, we should say acquaintance instead of friendship: people may unfriend each other (which would mean deleting existing edges), but an acquaintance, once established, lasts forever, whether as friend or foe.

Another interesting feature of many real-world networks is their degree distribution, which follows a power law: the number of nodes $N_d$ having degree d is proportional to a negative power of d, i.e., $N_d \propto d^{-c}$ for some constant $c > 0$. Power laws have been found in the internet [2], the web [3], citation graphs [4], online social networks [5] and many others. Meanwhile, the number of edges E(t) and the number of nodes N(t) of an evolving network at time t obey the so-called Densification Power Law (DPL), which states that $E(t) \propto N(t)^{a}$. The densification exponent a is typically greater than 1, which implies that the average degree of a node in an evolving network increases over time and the network densifies as it grows [10, 11]. The scree plot (the eigenvalues of the adjacency matrix of the graph versus their rank, on logarithmic scales) is also found to follow a power law [5, 12]. Finally, the distribution of the number of triangles Δ in which a node participates is highly skewed [13].

978-1-4673-4836-2/12/$31.00 ©2012 IEEE 90


So we need a suitable graph generator that can generate graphs with this diverse set of desirable properties. Unfortunately, the earliest and most thoroughly investigated model of graph generation, namely the Erdős-Rényi model [14], admits none of these properties. So, as soon as these properties were discovered, numerous other graph generation models were proposed, each admitting one or two properties of real-world networks at the cost of losing others. For example, the models proposed in [6, 15, 16, 18] follow some form of preferential attachment and thus attain power-law tails in the degree distribution at the expense of the shrinking diameter property (the diameter of the generated graph tends to increase over time). A different family of graph generators strives for small diameter and local clustering in the network. Examples of such models include the small-world model of Watts and Strogatz [19] and the Waxman generator [20]. In a nutshell, each of the above models focuses on only one of the desirable properties while neglecting the others. The authors of [1] therefore proposed a graph model that attains multiple desirable properties of massive real-world networks at the same time while remaining analytically tractable. The model starts with an initiator graph G1 having N1 nodes and E1 edges and computes its kth Kronecker power Gk. The graph Gk has $N_1^k$ nodes and $E_1^k$ edges and thus exhibits a version of the Densification Power Law, since $E_1^k = (N_1^k)^{a}$ with $a = \log E_1 / \log N_1$. Additionally, it has a small effective diameter.

While the Kronecker power construction in the deterministic case yields graphs with a range of desired properties, its discrete nature produces staircase effects in the degrees and spectral quantities, simply because individual values have large multiplicities [1]. So the authors of [1] proposed a stochastic version of the Kronecker product and showed empirically that it creates smoother and more realistic graphs than its deterministic counterpart. Some basic properties of stochastic Kronecker graphs (such as connectivity, the existence of a giant component and small diameter) have been thoroughly investigated in [7]. But we believe the theory of stochastic Kronecker graphs is still very young, and many natural questions about it remain unanswered. Here we try to answer a few questions regarding the number of isolated vertices, edges, self loops and triangles. In Section III, we find the expected number of each of these features as a function of the parameters of the stochastic Kronecker graph. Next, as direct corollaries, we establish sufficient conditions to generate graphs having no isolated vertices, no edges or no self loops.
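The counts $N_1^k$ and $E_1^k$, and the densification exponent they imply, can be checked with a small plain-Python sketch. The initiator below (a 3-node path with a self loop on every node) is a hypothetical example chosen only for illustration, and each nonzero adjacency entry is counted as one directed edge, which is the convention under which the edge count $E_1^k$ is exact; the helper names are ours.

```python
import math

def kron(a, b):
    """Kronecker product of two square 0/1 matrices stored as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kron_power(g1, k):
    """k-th Kronecker power G_k of the initiator G_1 (k >= 1)."""
    g = g1
    for _ in range(k - 1):
        g = kron(g, g1)
    return g

def edge_count(g):
    """Number of nonzero adjacency entries, each treated as one directed edge."""
    return sum(sum(row) for row in g)

# Hypothetical initiator: a 3-node path with a self loop on every node.
G1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
N1, E1 = len(G1), edge_count(G1)

for k in range(1, 5):
    Gk = kron_power(G1, k)
    # N_k = N1^k nodes and E_k = E1^k edges, as stated in the text.
    assert len(Gk) == N1 ** k and edge_count(Gk) == E1 ** k
    print(f"k={k}: N_k={len(Gk)}, E_k={edge_count(Gk)}")

# E_k = N_k^a with densification exponent a = log(E1) / log(N1).
a = math.log(E1) / math.log(N1)
print(f"densification exponent a = {a:.3f}")
```

Here $N_1 = 3$ and $E_1 = 7$, so $a = \log 7 / \log 3 \approx 1.77$, which lies between 1 and 2.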

II. STOCHASTIC KRONECKER GRAPH MODEL

The Kronecker graph model is defined in its full generality in [1]. Here, however, we concentrate on a specific variant of the stochastic Kronecker graph with a 2 × 2 initiator matrix. We adopt the definition provided in [7].

Definition: A (stochastic) Kronecker graph is defined by
• An integer k.
• A symmetric 2 × 2 matrix θ: $\theta[1,1] = \alpha$, $\theta[1,0] = \theta[0,1] = \beta$, $\theta[0,0] = \gamma$, where $0 \le \gamma \le \beta \le \alpha \le 1$. We call θ the base or initiator matrix.
• The graph has $2^k$ vertices, each labeled with a unique bit vector of length k. Given two vertices u and v with labels $(u_1, u_2, \ldots, u_k)$ and $(v_1, v_2, \ldots, v_k)$, the probability that the edge (u, v) exists is $P[u,v] = \prod_{i=1}^{k} \theta[u_i, v_i]$, independently of all other edges.
• The weight of a vertex is the number of 1s in its label.

The restrictions on the parameters of the base matrix θ have been verified empirically in [1]: when they are maintained, namely when $0 \le \gamma \le \beta \le \alpha \le 1$, the resulting stochastic Kronecker graph statistically resembles real-world networks.
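The definition translates directly into a sampler. The following is a minimal sketch, assuming vertex labels are stored as k-bit integers (bit i of u is $u_i$) and that, because θ is symmetric, the graph is undirected, so each unordered pair {u, v} and each self loop (u, u) is drawn once; all function names are hypothetical. The loop over all pairs is quadratic in $2^k$, so it is only practical for small k.

```python
import random

def check_params(alpha, beta, gamma):
    """Enforce the restriction 0 <= gamma <= beta <= alpha <= 1."""
    if not 0 <= gamma <= beta <= alpha <= 1:
        raise ValueError("need 0 <= gamma <= beta <= alpha <= 1")

def weight(u):
    """Weight of a vertex: the number of 1s in its bit label."""
    return bin(u).count("1")

def edge_prob(u, v, k, alpha, beta, gamma):
    """P[u, v] = product over the k bit positions of theta[u_i, v_i]."""
    theta = {(1, 1): alpha, (1, 0): beta, (0, 1): beta, (0, 0): gamma}
    p = 1.0
    for i in range(k):
        p *= theta[((u >> i) & 1, (v >> i) & 1)]
    return p

def sample_skg(k, alpha, beta, gamma, seed=None):
    """One realization: every pair u <= v (v == u is a self loop) drawn independently."""
    check_params(alpha, beta, gamma)
    rng = random.Random(seed)
    n = 2 ** k
    edges = set()
    for u in range(n):
        for v in range(u, n):
            if rng.random() < edge_prob(u, v, k, alpha, beta, gamma):
                edges.add((u, v))
    return edges

edges = sample_skg(8, 0.9, 0.6, 0.2, seed=1)
print(len(edges), "edges on", 2 ** 8, "vertices")
```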

III. EXPECTED FEATURE COUNTS

In this section we find the expected number of isolated vertices, edges, self loops and triangles. As direct corollaries of these expected feature counts, we then establish sufficient conditions to generate large random graphs with no isolated vertices, no edges or no self loops. We start this section with a theorem proved in [7].

Theorem 1: The expected degree of a vertex of weight l is $(\alpha+\beta)^{l}(\beta+\gamma)^{k-l}$.
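Theorem 1 can be checked by brute force for small k. The sketch below (plain Python; helper names are ours, parameters illustrative) sums $P[u,v]$ over all $2^k$ vertices v, including v = u, which is the convention under which the closed form is exact: the sum factorizes bit by bit into $\alpha+\beta$ for each 1 in the label of u and $\beta+\gamma$ for each 0.

```python
from math import isclose, prod

def edge_prob(u, v, k, alpha, beta, gamma):
    """P[u, v] = product over bit positions of theta[u_i, v_i]."""
    theta = {(1, 1): alpha, (1, 0): beta, (0, 1): beta, (0, 0): gamma}
    return prod(theta[((u >> i) & 1, (v >> i) & 1)] for i in range(k))

def expected_degree_bruteforce(u, k, alpha, beta, gamma):
    """Sum of P[u, v] over all 2^k vertices v, with v = u included."""
    return sum(edge_prob(u, v, k, alpha, beta, gamma) for v in range(2 ** k))

def expected_degree_formula(l, k, alpha, beta, gamma):
    """Theorem 1: (alpha + beta)^l * (beta + gamma)^(k - l) for a vertex of weight l."""
    return (alpha + beta) ** l * (beta + gamma) ** (k - l)

k, params = 6, (0.9, 0.5, 0.2)
for u in range(2 ** k):
    weight = bin(u).count("1")
    assert isclose(expected_degree_bruteforce(u, k, *params),
                   expected_degree_formula(weight, k, *params))
print("Theorem 1 matches brute force for every vertex when k =", k)
```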

A. Expected Number of Isolated Vertices

Theorem 2: The expected number of isolated vertices in a stochastic Kronecker graph with parameters α, β, γ is at most
$$\sum_{l=0}^{k}\binom{k}{l}\exp\!\left(-(\alpha+\beta)^{l}(\beta+\gamma)^{k-l}\right).$$

Proof: For any vertex u of weight l and any vertex v, let i be the number of bit positions t where $u_t = v_t = 1$, and let j be the number of bit positions where $u_t = 0, v_t = 1$. So there will be $l - i$ bit positions where $u_t = 1, v_t = 0$ and $k - l - j$ bit positions where $u_t = v_t = 0$. As a result, the probability that the edge (u, v) is present is
$$P[u,v] = \alpha^{i}\beta^{l-i+j}\gamma^{k-l-j},$$
and the probability that (u, v) is not present is $1 - \alpha^{i}\beta^{l-i+j}\gamma^{k-l-j}$. Vertex v is a member of a class of vertices (let this class be $C_{i,j}$), and this class contains $\binom{l}{i}\binom{k-l}{j}$ vertices that are identical with respect to u. So the probability that u is connected to no vertex of class $C_{i,j}$ is
$$\left(1 - \alpha^{i}\beta^{l-i+j}\gamma^{k-l-j}\right)^{\binom{l}{i}\binom{k-l}{j}}.$$
The value of i varies from 0 to l and the value of j varies from 0 to k − l. As the edges in a stochastic Kronecker graph exist (or not) independently of all other edges, the probability that u is connected to none of the vertices, over all possible classes, is given by:


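$$\begin{aligned}
\Pr[u \text{ is isolated}] &= \prod_{i=0}^{l}\prod_{j=0}^{k-l}\left(1-\alpha^{i}\beta^{l-i+j}\gamma^{k-l-j}\right)^{\binom{l}{i}\binom{k-l}{j}} \\
&\le \prod_{i=0}^{l}\prod_{j=0}^{k-l}\exp\!\left(-\binom{l}{i}\binom{k-l}{j}\alpha^{i}\beta^{l-i+j}\gamma^{k-l-j}\right) && [\text{as } 1-x \le e^{-x}] \\
&= \exp\!\left(-\sum_{i=0}^{l}\binom{l}{i}\alpha^{i}\beta^{l-i}\sum_{j=0}^{k-l}\binom{k-l}{j}\beta^{j}\gamma^{k-l-j}\right) \\
&= \exp\!\left(-(\alpha+\beta)^{l}(\beta+\gamma)^{k-l}\right) && [\text{by the binomial theorem}].
\end{aligned}$$

Now we define indicator random variables $X_1, X_2, \ldots, X_{2^k}$, where $X_u = 1$ if vertex u is isolated and $X_u = 0$ otherwise, and let $X = X_1 + X_2 + \cdots + X_{2^k}$ be the total number of isolated vertices in a random realization. Since there are $\binom{k}{l}$ vertices of weight l, linearity of expectation gives
$$E[X] = \sum_{u} E[X_u] = \sum_{l=0}^{k}\binom{k}{l}\Pr[\text{a vertex of weight } l \text{ is isolated}] \le \sum_{l=0}^{k}\binom{k}{l}\exp\!\left(-(\alpha+\beta)^{l}(\beta+\gamma)^{k-l}\right).$$

Corollary 1: If $(\beta+\gamma)^{k} - k\ln 2 \to \infty$ as $k \to \infty$ (in particular, if $\beta+\gamma > 1$), then the stochastic Kronecker graph will have no isolated vertex with high probability¹. Indeed, since $\alpha \ge \gamma$, every exponent in Theorem 2 is at least $(\beta+\gamma)^{k}$, so $E[X] \le 2^{k}e^{-(\beta+\gamma)^{k}} \to 0$, and by Markov's inequality $\Pr[X \ge 1] \le E[X]$.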
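For small k, the exact probability that a vertex is isolated, its class-by-class grouping, and the bound of Theorem 2 can be compared directly. This is a sketch with illustrative parameters $(\alpha, \beta, \gamma) = (0.9, 0.6, 0.5)$, chosen so that $\beta+\gamma > 1$; as in the proof, the product runs over every vertex, including u itself. Helper names are hypothetical.

```python
from math import comb, exp, isclose, prod

def edge_prob(u, v, k, alpha, beta, gamma):
    """P[u, v] = product over bit positions of theta[u_i, v_i]."""
    theta = {(1, 1): alpha, (1, 0): beta, (0, 1): beta, (0, 0): gamma}
    return prod(theta[((u >> i) & 1, (v >> i) & 1)] for i in range(k))

def p_isolated_exact(u, k, alpha, beta, gamma):
    """Pr[u is isolated] = product over all v (u included) of 1 - P[u, v]."""
    return prod(1 - edge_prob(u, v, k, alpha, beta, gamma) for v in range(2 ** k))

def p_isolated_by_class(l, k, alpha, beta, gamma):
    """Same probability, grouped into the classes C_{i,j} of the proof."""
    return prod((1 - alpha ** i * beta ** (l - i + j) * gamma ** (k - l - j))
                ** (comb(l, i) * comb(k - l, j))
                for i in range(l + 1) for j in range(k - l + 1))

def isolated_bound(k, alpha, beta, gamma):
    """Theorem 2: sum_l C(k, l) * exp(-(alpha+beta)^l * (beta+gamma)^(k-l))."""
    return sum(comb(k, l) * exp(-(alpha + beta) ** l * (beta + gamma) ** (k - l))
               for l in range(k + 1))

k, params = 8, (0.9, 0.6, 0.5)
exact = 0.0
for u in range(2 ** k):
    p = p_isolated_exact(u, k, *params)
    assert isclose(p, p_isolated_by_class(bin(u).count("1"), k, *params))
    exact += p
assert exact <= isolated_bound(k, *params)
print(f"k={k}: exact E[X]={exact:.4g}, bound={isolated_bound(k, *params):.4g}")

# Corollary 1: with beta + gamma > 1 the bound vanishes as k grows.
for big_k in (20, 40, 60):
    print(big_k, isolated_bound(big_k, *params))
```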

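B. Expected Number of Edges

Theorem 3: The expected number of edges in a stochastic Kronecker graph is $\frac{1}{2}(\alpha+2\beta+\gamma)^{k}$.

Proof: From Theorem 1, the expected degree of a vertex of weight l is $(\alpha+\beta)^{l}(\beta+\gamma)^{k-l}$, and there are $\binom{k}{l}$ vertices of weight l. Summing over all possible values of l, the expected total degree of the resulting graph is
$$\sum_{l=0}^{k}\binom{k}{l}(\alpha+\beta)^{l}(\beta+\gamma)^{k-l} = (\alpha+2\beta+\gamma)^{k}.$$
Now, from the degree sum formula [8], the total (expected) number of edges is half the total (expected) degree. So the expected number of edges is $\frac{1}{2}(\alpha+2\beta+\gamma)^{k}$.

Corollary 2: If $\alpha+2\beta+\gamma < 1$, then the stochastic Kronecker graph will have no edges with high probability, since the expected number of edges then tends to 0 and, by Markov's inequality, the probability that at least one edge exists is at most this expectation.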
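The identity behind Theorem 3, that the expected degrees sum to $(\alpha+2\beta+\gamma)^k$, can be confirmed by brute force, and the decay in Corollary 2 can be tabulated. This sketch assumes the bit-label convention of Section II and sums $P[u,v]$ over all ordered pairs, with each self loop counted once in the degree of its vertex, exactly as in Theorem 1; the parameters and helper names are illustrative.

```python
from math import isclose, prod

def edge_prob(u, v, k, alpha, beta, gamma):
    """P[u, v] = product over bit positions of theta[u_i, v_i]."""
    theta = {(1, 1): alpha, (1, 0): beta, (0, 1): beta, (0, 0): gamma}
    return prod(theta[((u >> i) & 1, (v >> i) & 1)] for i in range(k))

def total_expected_degree(k, alpha, beta, gamma):
    """Sum of P[u, v] over all ordered pairs (u, v), i.e. the sum of expected degrees."""
    n = 2 ** k
    return sum(edge_prob(u, v, k, alpha, beta, gamma)
               for u in range(n) for v in range(n))

def expected_edges(k, alpha, beta, gamma):
    """Theorem 3: half the total expected degree."""
    return (alpha + 2 * beta + gamma) ** k / 2

assert isclose(total_expected_degree(6, 0.9, 0.5, 0.2), (0.9 + 2 * 0.5 + 0.2) ** 6)

# Corollary 2: with alpha + 2*beta + gamma = 0.9 < 1 the expectation tends to 0.
for k in (10, 20, 40):
    print(k, expected_edges(k, 0.4, 0.2, 0.1))
```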

C. Expected Number of Self Loops

Theorem 4: The expected number of self loops in a stochastic Kronecker graph is $(\alpha+\gamma)^{k}$.

Proof: Every bit position of a vertex pairs with itself as (1,1) or (0,0), so the probability that a vertex of weight l is connected to itself is $\alpha^{l}\gamma^{k-l}$. There are $\binom{k}{l}$ vertices of weight l, and by linearity of expectation we can simply sum these individual probabilities over all vertices to get the total expected number of self loops:
$$\sum_{l=0}^{k}\binom{k}{l}\alpha^{l}\gamma^{k-l} = (\alpha+\gamma)^{k}.$$

Corollary 3: If $\alpha+\gamma < 1$, then the stochastic Kronecker graph will have no self loops with high probability.

¹ With high probability, we mean with probability $1 - o(1)$ as $k \to \infty$.
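Theorem 4 can be checked the same way: a self loop at u uses $\theta[1,1] = \alpha$ at each 1 bit and $\theta[0,0] = \gamma$ at each 0 bit. The sketch below (illustrative parameters, hypothetical helper names) compares the vertex-by-vertex sum, the weight-class sum and the closed form, and then shows the expectation vanishing when $\alpha+\gamma < 1$.

```python
from math import comb, isclose

def loop_prob(u, k, alpha, gamma):
    """P[u, u] = alpha^(number of 1 bits) * gamma^(number of 0 bits)."""
    l = bin(u).count("1")
    return alpha ** l * gamma ** (k - l)

def expected_loops_bruteforce(k, alpha, gamma):
    """Linearity of expectation: sum the loop probability over all 2^k vertices."""
    return sum(loop_prob(u, k, alpha, gamma) for u in range(2 ** k))

def expected_loops_by_weight(k, alpha, gamma):
    """C(k, l) vertices of weight l, each with loop probability alpha^l gamma^(k-l)."""
    return sum(comb(k, l) * alpha ** l * gamma ** (k - l) for l in range(k + 1))

for k in (4, 8, 12):
    brute = expected_loops_bruteforce(k, 0.9, 0.2)
    assert isclose(brute, expected_loops_by_weight(k, 0.9, 0.2))
    assert isclose(brute, (0.9 + 0.2) ** k)  # Theorem 4: (alpha + gamma)^k

# Corollary 3: alpha + gamma = 0.9 < 1, so the expected number of self loops vanishes.
for k in (10, 20, 40):
    print(k, expected_loops_by_weight(k, 0.6, 0.3))
```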

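D. Expected Number of Triangles

Theorem 5: The expected number of triangles in a stochastic Kronecker graph is less than
$$\frac{\alpha^{k}}{2}\sum_{l_1=0}^{k}\binom{k}{l_1}W(l_1),$$
where, with $l_2 = i_1 + j_1$,
$$W(l_1)=\sum_{i_1=0}^{l_1}\sum_{j_1=0}^{k-l_1}\binom{l_1}{i_1}\binom{k-l_1}{j_1}\alpha^{i_1}\beta^{l_1-i_1+j_1}\gamma^{k-l_1-j_1}\sum_{i_2=0}^{l_2}\sum_{j_2=0}^{k-l_2}\binom{l_2}{i_2}\binom{k-l_2}{j_2}\alpha^{i_2}\beta^{l_2-i_2+j_2}\gamma^{k-l_2-j_2}.$$

Proof: Let us consider three arbitrary vertices $v_1, v_2, v_3$ of weights $l_1, l_2, l_3$, respectively. We define four variables $i_1, i_2, j_1, j_2$ as follows: $i_1$ is the number of bit positions where both $v_1$ and $v_2$ have ‘1’ in their labels; $i_2$ is the number of bit positions where both $v_2$ and $v_3$ have ‘1’; $j_1$ is the number of bit positions where $v_1 = 0$ and $v_2 = 1$; $j_2$ is the number of bit positions where $v_2 = 0$ and $v_3 = 1$. Then there are $l_1 - i_1$ bit positions where $v_1 = 1, v_2 = 0$ and $l_2 - i_2$ bit positions where $v_2 = 1, v_3 = 0$; there are $k - l_1 - j_1$ bit positions where both $v_1$ and $v_2$ have ‘0’ and $k - l_2 - j_2$ bit positions where both $v_2$ and $v_3$ have ‘0’. So the probability that both edges $(v_1, v_2)$ and $(v_2, v_3)$ are present is
$$P[v_1,v_2]\,P[v_2,v_3]=\alpha^{i_1}\beta^{l_1-i_1+j_1}\gamma^{k-l_1-j_1}\cdot\alpha^{i_2}\beta^{l_2-i_2+j_2}\gamma^{k-l_2-j_2}.$$
We also notice that $l_2 = i_1 + j_1$, and that for a fixed $v_1$ there are $\binom{l_1}{i_1}\binom{k-l_1}{j_1}$ choices of $v_2$ with given $(i_1, j_1)$, while for a fixed $v_2$ there are $\binom{l_2}{i_2}\binom{k-l_2}{j_2}$ choices of $v_3$ with given $(i_2, j_2)$. Now, summing over all possible values of $i_1, j_1, i_2, j_2$, we get the total expected number of two-length paths starting at $v_1$, which is exactly $W(l_1)$ as defined in the statement of the theorem (the inner double sum equals $(\alpha+\beta)^{l_2}(\beta+\gamma)^{k-l_2}$, the expected degree of $v_2$ given by Theorem 1).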


A two-length path can easily be extended to a three-length cycle by connecting its two end vertices by an edge. From the definition of the stochastic Kronecker graph (with $\gamma \le \beta \le \alpha$), the maximum probability that any edge exists is $\alpha^{k}$. So the expected number of triangles starting at $v_1$ satisfies
$$E[\text{triangles starting at } v_1] \le \alpha^{k}\,W(l_1),$$
where $W(l_1)$ is the expected number of two-length paths starting at $v_1$ obtained above.

Now we note that the above quantity also counts some fictitious triangles of the form $v_1v_1v_1$ or $v_1v_2v_2$, so the sign ‘≤’ can be replaced by ‘<’. We also note that every triangle through $v_1$ is counted twice above: once for the clockwise and once for the counter-clockwise ordering of its vertices. Incorporating these facts and summing over all possible choices of $v_1$ (there are $\binom{k}{l_1}$ vertices of weight $l_1$), we obtain the following bound on the total expected number of triangles T in a random realization of the stochastic Kronecker graph model; since the sum over $v_1$ counts each triangle once at each of its three vertices, this bound is conservative:
$$E[T] < \frac{\alpha^{k}}{2}\sum_{l_1=0}^{k}\binom{k}{l_1}W(l_1).$$
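Because the triangle bound is assembled from several sums, a brute-force comparison for small k is a useful sanity check. The sketch below, with illustrative parameters and hypothetical helper names, (i) evaluates $W(l_1)$, the expected number of two-length paths starting at a vertex of weight $l_1$, by the four-fold sum over $i_1, j_1, i_2, j_2$ and compares it with a direct sum of $P[v_1,v_2]P[v_2,v_3]$ over all $v_2, v_3$, and (ii) checks that the exact expected number of triangles on distinct vertices, which relies on the independence of edges, stays below $\frac{\alpha^k}{2}\sum_{l_1}\binom{k}{l_1}W(l_1)$.

```python
from itertools import combinations
from math import comb, isclose, prod

def edge_prob(u, v, k, alpha, beta, gamma):
    """P[u, v] = product over bit positions of theta[u_i, v_i]."""
    theta = {(1, 1): alpha, (1, 0): beta, (0, 1): beta, (0, 0): gamma}
    return prod(theta[((u >> i) & 1, (v >> i) & 1)] for i in range(k))

def class_term(l, i, j, k, alpha, beta, gamma):
    """Number of partners in class (i, j) times their common edge probability."""
    return (comb(l, i) * comb(k - l, j)
            * alpha ** i * beta ** (l - i + j) * gamma ** (k - l - j))

def two_paths(l1, k, alpha, beta, gamma):
    """W(l1): four-fold sum over i1, j1, i2, j2 with l2 = i1 + j1."""
    total = 0.0
    for i1 in range(l1 + 1):
        for j1 in range(k - l1 + 1):
            l2 = i1 + j1
            inner = sum(class_term(l2, i2, j2, k, alpha, beta, gamma)
                        for i2 in range(l2 + 1) for j2 in range(k - l2 + 1))
            total += class_term(l1, i1, j1, k, alpha, beta, gamma) * inner
    return total

def two_paths_bruteforce(v1, k, alpha, beta, gamma):
    """Direct sum of P[v1, v2] * P[v2, v3] over all vertices v2 and v3."""
    n = 2 ** k
    return sum(edge_prob(v1, v2, k, alpha, beta, gamma)
               * edge_prob(v2, v3, k, alpha, beta, gamma)
               for v2 in range(n) for v3 in range(n))

def triangles_exact(k, alpha, beta, gamma):
    """E[number of triangles] over unordered triples of distinct vertices."""
    def p(a, b):
        return edge_prob(a, b, k, alpha, beta, gamma)
    return sum(p(a, b) * p(b, c) * p(a, c)
               for a, b, c in combinations(range(2 ** k), 3))

def triangles_bound(k, alpha, beta, gamma):
    """Triangle bound: (alpha^k / 2) * sum over l1 of C(k, l1) * W(l1)."""
    return alpha ** k / 2 * sum(comb(k, l1) * two_paths(l1, k, alpha, beta, gamma)
                                for l1 in range(k + 1))

params = (0.9, 0.6, 0.3)
for k in (3, 4, 5):
    for v1 in range(2 ** k):
        assert isclose(two_paths(bin(v1).count("1"), k, *params),
                       two_paths_bruteforce(v1, k, *params))
    exact, bound = triangles_exact(k, *params), triangles_bound(k, *params)
    assert exact < bound
    print(f"k={k}: exact E[T]={exact:.4g}, bound={bound:.4g}")
```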

IV. PHASE TRANSITIONS

Mahdian and Xu [7] provided two phase transitions in stochastic Kronecker graphs: one for the emergence of the giant component and another for connectivity. To the best of our knowledge, no other phase transitions have been reported in the literature. In this section we show two more phase transitions in stochastic Kronecker graphs. To prove their existence we resort to the second moment argument, which says that if X is a random variable with $E[X] > 0$, then
$$\Pr[X = 0] \le \frac{\mathrm{Var}[X]}{E[X]^{2}};$$
in particular, when $\mathrm{Var}[X] = o\!\left(E[X]^{2}\right)$, we have $X > 0$ with high probability.
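In both proofs that follow, X is a sum of independent indicators, so $\Pr[X=0]$, $E[X]$ and $\mathrm{Var}[X]$ can all be computed exactly. The sketch below checks the chain $\Pr[X=0] \le \mathrm{Var}[X]/E[X]^2 \le 1/E[X]$ on arbitrary illustrative probabilities (not Kronecker edge probabilities); the second inequality uses $\mathrm{Var}[X_i] = p_i(1-p_i) \le p_i$. The function name is hypothetical.

```python
import random
from math import prod

def second_moment_terms(ps):
    """For X = sum of independent Bernoulli(p_i): Pr[X=0], Var[X]/E[X]^2 and 1/E[X]."""
    mean = sum(ps)
    var = sum(p * (1 - p) for p in ps)   # independence: variances add
    p_zero = prod(1 - p for p in ps)     # independence: every indicator is 0
    return p_zero, var / mean ** 2, 1 / mean

rng = random.Random(0)
for size in (10, 100, 1000):
    ps = [rng.uniform(0.0, 0.05) for _ in range(size)]
    p_zero, ratio, inv_mean = second_moment_terms(ps)
    assert p_zero <= ratio <= inv_mean
    print(f"{size:5d} indicators: Pr[X=0]={p_zero:.3g}, "
          f"Var/E^2={ratio:.3g}, 1/E={inv_mean:.3g}")
```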

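A. Appearance of Edges

Theorem 6: The appearance of edges in a stochastic Kronecker graph exhibits a threshold at $\alpha+2\beta+\gamma = 1$.

Proof: Let X be the total number of edges in a random realization of a stochastic Kronecker graph, and let $X_1, X_2, \ldots, X_m$ be indicator random variables, where $X_i$ indicates the existence of the i-th potential edge. From Theorem 3, we know that when $\alpha+2\beta+\gamma < 1$, $E[X] \to 0$, so $X = 0$ with high probability. So when $\alpha+2\beta+\gamma > 1$ we only need to show that $\mathrm{Var}[X] = o(E[X]^{2})$, which completes the proof of a phase transition at $\alpha+2\beta+\gamma = 1$ via the second moment argument. Here all the $X_i$ are independent by the definition of the Kronecker graph. Hence,
$$\mathrm{Var}[X] = \mathrm{Var}[X_1 + \cdots + X_m] = \sum_{i}\mathrm{Var}[X_i] \le \sum_{i}E[X_i^{2}] = \sum_{i}E[X_i] = E[X],$$
so $\mathrm{Var}[X]/E[X]^{2} \le 1/E[X] = 2/(\alpha+2\beta+\gamma)^{k} \to 0$.

B. Appearance of Self Loops

Theorem 7: The appearance of self loops in a stochastic Kronecker graph exhibits a threshold at $\alpha+\gamma = 1$.

Proof: Let Y be the total number of self loops in a random realization, and let $Y_1, Y_2, \ldots, Y_{2^k}$ be indicator random variables, where $Y_u$ indicates the existence of the self loop at vertex u. From Theorem 4, we know that when $\alpha+\gamma < 1$, $E[Y] = (\alpha+\gamma)^{k} \to 0$. So to complete the proof of a threshold at $\alpha+\gamma = 1$, we need to show that when $\alpha+\gamma > 1$, $\mathrm{Var}[Y] = o(E[Y]^{2})$. All the self loops exist independently of one another, so
$$\mathrm{Var}[Y] = \sum_{u}\mathrm{Var}[Y_u] \le \sum_{u}E[Y_u^{2}] = \sum_{u}E[Y_u] = E[Y],$$
and therefore $\mathrm{Var}[Y]/E[Y]^{2} \le 1/E[Y] = (\alpha+\gamma)^{-k} \to 0$.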
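The two thresholds can be illustrated numerically from the closed forms of Theorems 3 and 4. The sketch below evaluates, on each side of each threshold, the Markov bound $\Pr[X \ge 1] \le E[X]$ (informative below the threshold) and the second-moment bound $\Pr[X = 0] \le 1/E[X]$ derived in the proofs (informative above it). The parameter triples are illustrative and satisfy $0 \le \gamma \le \beta \le \alpha \le 1$; the function names are hypothetical.

```python
def bounds(mean):
    """(Markov bound on Pr[X >= 1], second-moment bound on Pr[X = 0]), capped at 1."""
    return min(1.0, mean), min(1.0, 1.0 / mean)

def edge_mean(k, alpha, beta, gamma):
    return (alpha + 2 * beta + gamma) ** k / 2   # Theorem 3

def loop_mean(k, alpha, beta, gamma):
    return (alpha + gamma) ** k                  # Theorem 4 (beta unused)

cases = {
    "edges, a+2b+g = 0.9": (edge_mean, (0.4, 0.2, 0.1)),
    "edges, a+2b+g = 1.3": (edge_mean, (0.5, 0.3, 0.2)),
    "loops, a+g = 0.9":    (loop_mean, (0.6, 0.4, 0.3)),
    "loops, a+g = 1.1":    (loop_mean, (0.9, 0.5, 0.2)),
}
for name, (mean_fn, params) in cases.items():
    for k in (10, 30, 60):
        p_any, p_none = bounds(mean_fn(k, *params))
        print(f"{name}, k={k}: Pr[X>=1] <= {p_any:.3g}, Pr[X=0] <= {p_none:.3g}")
```

Below each threshold the first bound shrinks toward 0; above it, the second one does.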


V. CONCLUSION

The stochastic Kronecker model of graph generation is very new, and most properties of the graphs it generates are yet to be investigated. Here we explored some of these properties, namely the expected number of isolated vertices, edges, self loops and triangles, along with two phase transitions. Based on these expected feature counts, we then established some sufficient conditions to generate graphs with an interesting set of properties.

REFERENCES

[1] J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Knowledge Discovery in Databases: PKDD 2005.

[2] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, pages 251–262, 1999.

[3] J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing, 1999.

[4] S. Redner. How popular is your paper? An empirical study of the citation distribution. European Physical Journal B, 4:131–134, 1998.

[5] D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM, 2004.

[6] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 2002.

[7] M. Mahdian and Y. Xu. Stochastic Kronecker graphs. In 5th International Workshop on Algorithms and Models for the Web-Graph (WAW 2007), 2007.

[8] L. Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii academiae scientiarum Petropolitanae, 1741.

[9] S. Milgram. The small-world problem. Psychology Today, 2:60–67, 1967.

[10] J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2

[11] J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In KDD ’05: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 177–187.

[12] I. Farkas, I. Derényi, A.-L. Barabási, and T. Vicsek. Spectra of “real-world” graphs: Beyond the semicircle law. Physical Review E, 64(026704), 2001.

[13] C. E. Tsourakakis. Fast counting of triangles in large real networks, without counting: Algorithms and laws. In ICDM ’08: IEEE International Conference on Data Mining, 2008.

[14] P. Erdős and A. Rényi. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5:17–61, 1960.

[15] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.

[16] J. M. Kleinberg. The small-world phenomenon: An algorithmic perspective. Technical Report 99-1776, Cornell Computer Science Department, 1999.

[17] S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. In Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.

[18] A. D. Flaxman, A. M. Frieze, and J. Vera. A geometric preferential attachment model of networks II. In WAW ’07: Proceedings of the 5th Workshop on Algorithms and Models for the Web-Graph, pages 41–55, 2007.

[19] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.

[20] B. M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, December 1988.

[21] P. Erdős and A. Rényi. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5:17–61, 1960.
