The Block Two-Level Erdös-Rényi (BTER) Graph Model

21
The Block Two-Level Erdös-Rényi (BTER) Graph Model Ali Pinar, C. Seshadri, and Tamara G. Kolda Sandia National Laboratories July 13, 2012 Pinar - MMDS12 1 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. U.S. Department of Energy Office of Advanced Scientific Computing Research U.S. Department of Defense Defense Advanced Research Projects Agency

description

The Block Two-Level Erdös-Rényi (BTER) Graph Model. Ali Pinar, C. Seshadri , and Tamara G. Kolda Sandia National Laboratories. U.S. Department of Energy Office of Advanced Scientific Computing Research. U.S. Department of Defense Defense Advanced Research Projects Agency. - PowerPoint PPT Presentation

Transcript of The Block Two-Level Erdös-Rényi (BTER) Graph Model

Page 1: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

The Block Two-Level Erdös-Rényi (BTER) Graph Model

Ali Pinar, C. Seshadri, and Tamara G. KoldaSandia National Laboratories

July 13, 2012 Pinar - MMDS12 1

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration

under contract DE-AC04-94AL85000.

U.S. Department of EnergyOffice of Advanced Scientific Computing Research

U.S. Department of DefenseDefense Advanced Research Projects Agency

Page 2: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 2

Why model massive graphs? • Enable sharing of surrogate data

– Computer network traffic– Social networks– Financial transactions

• Testing graph algorithms– Scalability– Versatility (e.g., vary degree distributions)– Characterizing algorithm performance– Verification & validation

• Insight into…– Generative process– Community structure– Comparison– Evolution– Uncertainty

July 13, 2012

Block Two-Level Erdös-Rényi (BTER) graph; image courtesy of Nurcan Durak.

Page 3: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 3

Model Desiderata• Captures heavy-tailed degree

distribution– Not necessarily power law

• Captures community structure– Measured indirectly through

clustering coefficient, and other measures

• Able to “fit” real-world data– Reproduce degree distribution– Reproduce community structure

• Scales up to ~240 nodes – Motivated by GRAPH500

benchmark

July 13, 2012

A.-L. Barabasi and R. Albert. Emergence of scalingin random networks. Science, 286(5349):509-512, 1999.

Actor Collaboration WWW Power Grid

M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks,

Phys. Rev. E 69, 026113, 2004.http://www.graph500.org/

Page 4: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Preserving Degree Distribution

July 13, 2012 Pinar - MMDS12 4

• Majority of models are amazingly bad at matching degree distributions

– Tend to focus on matching “spirit” of the distribution– May not even have the capability of matching some

distributions

• Exception is the Chung-Lu (CL) model [Chung & Lu, 2002]

– Also known as configuration model [Newman, 2003] or weighted Erdős-Rényi (ER) model

• Premise: Probability of edge is proportional to degrees of endpoints

– Pr(eij) = didj/(2m)

• Can be implemented in a scalable way by repeated edge insertions.

The CL model is nearly perfect at matching desired degree

distributions.

100 101 102 103100

101

102

103

104

Degree

Cou

nt

Degree Distribution

cit-HepPhCL

Page 5: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 5

Community Structure in Graphs• Numerous community finding algorithms exist

– Difficult to validate– Trouble in finding full range of sizes

• Instead, use related measures like clustering coefficient– Triangles arise because of community structure

• What “community” structure must be presentto ensure a high clustering coefficient, especially for low-degree nodes?– How many communities?– What do they look like?

July 13, 2012

M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113,

2004.

Page 6: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 6

Clustering Coefficients

July 13, 2012

ti = # triangles at vertex idi = degree of vertex i

Clustering Coefficient

Global Clustering Coeff.

Page 7: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 7

Clustering coefficients are higher for lower degree vertices

July 13, 2012

Page 8: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Model building approach • Find features that restrict the space and identify the

structure imposed by these features • Given

• a skewed degree distribution• high clustering coefficients

• What structure should we expect? • Let’s take it to an extreme, i.e., clustering coefficients ~1. • Implication 1: The graph must be a union of cliques.• Implication 2: The sizes of the cliques (communities) will have the

same distribution as the degree distribution. • The challenge is in characterizing the gray area:

clustering coefficients are high but not 1

July 13, 2012 Pinar - MMDS12 8

Page 9: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 9

Building the basis for a modelEmpirical Observations• Degree distributions are heavy tailed.• Clustering coefficients are highest for

small degree vertices.

July 13, 2012

We are not only trying to build a formal model, we are trying to formalize the model building process itself.

Seshadhri, Kolda, & Pinar, Phys. Rev. Let. E 2012

Theoretical Analysis Theorem: If a community has s edges then there must be Ω(√s) vertices with degree Ω(√s).

Hypothesis: Real-world interaction networks consist of a scale –free collection of relatively dense Erdős-Rényi graphs.

Page 10: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 10

Verifying the model

July 13, 2012

Prediction: Within a community, the degree variance should be small.

Prediction: the number of communities grows with the number of nodes.

Verification: Vertex degree is correlated to the average degree of neighbors.

And degrees of triangle vertices are close.

Verification: The sizes of the communities are small, and do not grow (or grow very slowly) for larger graphs. (e.g., Dunbar number).

Hypothesis: Real-world interaction networks consist of a scale –free collection of dense Erdős-Rényi graphs.

Page 11: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Matching the Clustering Coefficients

July 13, 2012 Pinar - MMDS12 11

• Random pairings cannot close wedges!– Models that achieve high clustering coefficients require

history to guide future edge insertions

• To get high clustering coefficient without history, we have to pre-group the nodes into affinity blocks

– Our theory: CL graph with high clustering coefficient must have dense ER block at its core

– Each affinity block is a near-clique

• Simplifying assumption: Non-overlapping blocks– Implies blocks must comprise nodes of similar degree

– Can tune clustering coefficient to be appropriate for the degree by changing connectivity of the block If nodes of different degrees are

in the same block, then the high-degree nodes will have low

clustering coefficients.

4

2

3

1

Page 12: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 12

Preprocessing: Determining Blocks

July 13, 2012

• Input: Desired degree-distribution– Assume nodes sorted by degree, least to greatest

– Ignore degree-1 vertices in this phase

• Algorithm:– Group nodes in order

– Bulk assign, if possible

• Output: Compressed information about blocks– Size of each block

– Starting index of each block

– “Weight” of each block (depends on connectivity)

– O(duniq) information, if compressed

• Observe: If degree distribution is power law, then so is the distribution of the block sizes

– Evocative of Dunbar’s number: Max block size = 100 for γ=2

2

2

2

2

2

2

2

2

2

2

2

2

2

3

3

3

3

3

3

3

3

3

3

3

homogeneous

heterogeneous

Page 13: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

BTER: Block Two-Level Erdős-Rényi

July 13, 2012 Pinar - MMDS12 13

Preprocessing:Create explicit

affinity blocks of nodes with same (or nearly same)

degree. The blocks are

determined by the degree

distribution.

Phase 1: Erdős-Rényi graphs in

each block. User-specified connectivity determines number of links within each block. A formula is given that

depends on lowest-degree node in block.

Phase 2: CL (aka weighted ER)

model on “excess” degree, creates

connections across blocks. Can run in tandem to Phase 1

by worked with expected excess

degree.

Page 14: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Scalable BTER: Independent Edges

July 13, 2012 Pinar - MMDS12 14

Choose phase 1 or 2?

Create Phase 2 edgeusing CL model on

“excess degree”

Create Block 1 edge per ER model with

connectivity ρ1

Choose 1st endpoint

Choose 2nd endpoint

Create Block K edge per ER model with

connectivity ρK

Choose 1st endpoint

Choose 2nd endpoint

Choose 1st endpoint

Choose 2nd endpoint

Choose block proportional to number of “samples” per block

Requires O(n) data to determine the various probabilities, and can be compressed to O(duniq).

Page 15: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Visualization of BTER Adjacency Matrix

July 13, 2012 Pinar - MMDS12 15

Red = Phase 1Blue = Phase 2

Page 16: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Visualization of BTER Graph

July 13, 2012 Pinar - MMDS12 16

Nodes colored by degree

(darker=higher)

Phase 1 = BluePhase 2 = Green

Image courtesy of Nurcan

Durak.

Page 17: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 17

Co-authorship (ca-AstroPh)

July 13, 2012

• 18,771 nodes 396,100 edges; based on arxiv• BTER

Phase 1 edges: 290,268 Phase 2 edges: 112,808

• Global clustering coefficient Original: 0.32, BTER: 0.31, CL: 0.01

• Normalized size of the largest connected component: Original: 0.95, BTER: 0.86, CL: 1.0

Eigenvalues are not determined by the degree distribution.

Page 18: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 18

Trust Network

July 13, 2012

• 75,879 vertices, 811,480 edges; based on Epinion web site; edges represent trust between two users

• BTER Phase 1 Edges: 300,162 Phase 2 Edges: 515,192

• Global clustering coefficientOriginal: 0.07, BTER: 0.07, CL: 0.03

• Normalized size of the largest connected componentOriginal: 1.0, BTER: 0.96, CL: 0.98

BTER provides good matches to real data, even when the clustering coefficients are not too high

Page 19: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 19

Concluding Remarks• Models are a critical challenge in network analysis• The challenge is not in building a formal model, but

formalizing the modeling process itself • We propose the Block Two-Level Erdös-Rényi (BTER)

– New theory says there must be many dense subgraphs for high clustering coefficient

– New BTER model explicitly creates dense communities using ER

– Exceptional similarities to real data in terms of clustering coefficients and eigenvalues

• The code is available at http://www.sandia.gov/~tgkolda/bter_supplement/.• Scalable versions (Hadoop and MPI) will be

available soon• For more information,

– Ali Pinar [email protected]

July 13, 2012

Page 20: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

A new workshop• SIAM Workshop on Network Science

– Proposed; under review• Dates: July 7-8, 2013• Place: San Diego, CA• Co-located with SIAM Annual Meeting • Contact:

– Ali Pinar ([email protected]), Sandia National Labs– Madhav Marathe ([email protected]), Virginia Tech

July 13, 2012 Pinar - MMDS12 20

Page 21: The Block Two-Level  Erdös-Rényi  (BTER) Graph Model

Pinar - MMDS12 21

Relevant Publications• Modeling

– C. Seshadhri, T. Kolda, and A. Pinar, “The Blocked Two-Level Erdos Renyi Graph Model,” Phys. Rev. E 2012.

– C. Seshadhri, A. Pinar, and T. Kolda, “An In Depth analysis of Stochastic Kronecker Graphs," submitted.– A. Pinar, C. Seshadhri, and T. Kolda, “The Similarity of Stochastic Kronecker Graphs to Edge-

Configuration Models,” SDM’12– C. Seshadhri, A. Pinar, and T. Kolda, “An In Depth study of Stochastic Kronecker Graphs,” ICDM’12

• Generating a random graph– J. Ray, A. Pinar, and C. Sehadhri, Are we there yet? When to stop a Markov chain while generating

random graphs," submitted.– I. Stanton and A. Pinar, “Constructing and uniform sampling graphs with prescribed joint degree

distribution using Markov Chains,” to appear in ACM JEA.– I. Stanton and A. Pinar, “Sampling graphs with prescribed joint degree distribution using Markov

Chains,” ALENEX’11.• Community structure

– C. Seshadhri, A. Pinar, and T. Kolda, “Fast Triangle Counting through Wedge Sampling," submitted. – M. Rocklin and A. Pinar, “On Clustering on Graphs with Multiple Edge Types,” Internet Math. 2012. – M. Rocklin and A. Pinar, “Latent Clustering on Graphs with Multiple Edge Types,” Proc. 8th Workshop on

Algorithms and Models for the Web Graph WAW’ 11.– M. Rocklin and A. Pinar, “Computing an Aggregate Edge-weight function for Clustering Graphs with

Multiple Edge Types,” WAW’10.July 13, 2012