2.5K-Graphs: from Sampling to Generation


Page 1: 2.5K-Graphs: from Sampling   to Generation

2.5K-Graphs: from Sampling to Generation

Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou

UC Irvine, ‡ ETH Zurich

Page 2: 2.5K-Graphs: from Sampling   to Generation

Motivation

• Generate synthetic topologies that resemble real ones
  – development and testing of algorithms, protocols
  – anonymization

• Obtaining full topology is not always possible
  – data owner not willing to release it
  – exhaustive measurement impractical due to size

Page 3: 2.5K-Graphs: from Sampling   to Generation

Our methodology

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → Estimated Graph Properties → Generate → Synthetic Graph]

Page 4: 2.5K-Graphs: from Sampling   to Generation

Which graph properties?

• dk-series framework [Mahadevan et al., Sigcomm ’06]
  – 0K specifies the average node degree
  – 1K specifies the node degree distribution
  – 2K specifies the joint node degree distribution (JDD)
  – 3K specifies the distribution of subgraphs of 3 nodes
  – ...
  – nK specifies the entire graph

• Accuracy vs practicality: 2.5K
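As a rough illustration of the two lowest levels of the dk-series, the sketch below computes 0K (average degree) and 1K (degree distribution) for a small undirected graph. The adjacency-dict representation, the function names `zero_k` and `one_k`, and the toy graph are assumptions made for this example only.

```python
from collections import Counter

def zero_k(adj):
    """0K: average node degree of an undirected graph {node: set(neighbors)}."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def one_k(adj):
    """1K: number of nodes having each degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(zero_k(toy), dict(one_k(toy)))
```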

Page 5: 2.5K-Graphs: from Sampling   to Generation

2.5K-Graphs

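• Joint Node Degree Distribution (2K): $\mathrm{JDD}(k, l)$ = number of edges connecting nodes of degree $k$ with nodes of degree $l$

• Degree-dependent Average Clustering Coefficient Distribution: $c(k) = \frac{1}{|V_k|} \sum_{u \in V_k} c_u$, where $V_k$ is the set of all nodes of degree $k$ and $c_u = \frac{\text{all triangles using node } u}{\text{all possible triangles}}$

[Figure: example graph with nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b and its JDD matrix over degrees k = 1..4; in the example, $c_{3a} = 2/3$]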
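The two quantities that make up a 2.5K description can be computed directly when the full graph is known. The following is a minimal sketch under assumptions of this example: an adjacency-dict graph, each edge counted once in the JDD (keyed by the sorted degree pair), and c_u taken as 0 for nodes of degree below 2. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def jdd(adj):
    """2K: {(k, l): number of edges joining a degree-k and a degree-l node}, k <= l."""
    counts = defaultdict(int)
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge exactly once
                k, l = sorted((len(adj[u]), len(adj[v])))
                counts[(k, l)] += 1
    return dict(counts)

def clustering(adj, u):
    """c_u = triangles using u / possible triangles (0.0 if degree < 2)."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    triangles = sum(1 for a, b in combinations(sorted(adj[u]), 2) if b in adj[a])
    return triangles / (k * (k - 1) / 2)

def c_of_k(adj):
    """c(k): average of c_u over all nodes of degree k."""
    by_degree = defaultdict(list)
    for u in adj:
        by_degree[len(adj[u])].append(clustering(adj, u))
    return {k: sum(vals) / len(vals) for k, vals in by_degree.items()}

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

On the toy graph, node c has three neighbor pairs of which only (a, b) is connected, so its clustering coefficient is 1/3.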

Page 6: 2.5K-Graphs: from Sampling   to Generation

Related Work

• Network Modeling
  – Exponential Random Graph Models [Handcock et al., ’10]
  – Kronecker graphs [Kim & Leskovec, ’11]
  – dk-Series [Mahadevan et al., Sigcomm ’06]

• Graph Generation
  – Simple 1K graphs [Molloy & Reed ’95]
  – 1K + [Bansal et al. ’09]
  – 1K + Triangle Sequence [Newman ’09]
  – 1K + [Serrano & Boguna, ’05]
  – 2K [Krioukov et al. ’06, Stanton & Pinar ’11]
  – 3K [Krioukov et al. ’06]

[Slide annotations: c, 2.5K]

Page 7: 2.5K-Graphs: from Sampling   to Generation

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → Estimated c(k) & JDD → Generate → Synthetic Graph]

Page 8: 2.5K-Graphs: from Sampling   to Generation

Estimation: Node Samples

• UIS – Uniform Independence Sample
• WIS – Weighted Independence Sample: node v is sampled with probability proportional to its weight w(v)
• RW – Random Walk Sample: w(v) is proportional to node degree
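The three sampling designs can be sketched on a fully known graph as follows. This is an illustrative simulation only: sampling with replacement, the adjacency-dict representation, and the function names `uis`, `wis`, and `random_walk` are assumptions of the example, and a real crawl would query the unknown graph instead of a local dictionary.

```python
import random

def uis(adj, n, rng):
    """Uniform independence sample: n nodes drawn uniformly, with replacement."""
    nodes = sorted(adj)
    return [rng.choice(nodes) for _ in range(n)]

def wis(adj, n, weight, rng):
    """Weighted independence sample: node v drawn with probability proportional to weight(v)."""
    nodes = sorted(adj)
    return rng.choices(nodes, weights=[weight(v) for v in nodes], k=n)

def random_walk(adj, n, start, rng):
    """Simple random walk of n nodes; in the long run it visits v proportionally to its degree."""
    walk = [start]
    while len(walk) < n:
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
rng = random.Random(0)
walk = random_walk(toy, 10, "a", rng)
```

Passing `weight=lambda v: len(toy[v])` to `wis` gives an independence sample with the same degree-proportional weights that the random walk reaches asymptotically.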

Page 9: 2.5K-Graphs: from Sampling   to Generation

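Estimation: c(k)

c_a = (all triangles using node a) / (all possible triangles)

[Estimator formulas for UIS and WIS; the legend entries below refer to their symbols]

– all nodes of degree k
– neighbors of node a
– # shared partners for nodes (a,b)
– all sampled nodes of degree k
– number of samples for node b
– sampling weight of node a (proportional to node degree when the sample is RW)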
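To show how sampling weights enter such an estimate, here is a generic re-weighted ratio estimator (Hansen-Hurwitz style) of c(k). It is not necessarily the estimator presented on the slide: it assumes that the clustering coefficient of every sampled node can be observed exactly, and the callables `degree`, `c_node`, and `weight` are hypothetical interfaces introduced for the example.

```python
from collections import defaultdict

def estimate_c_of_k(samples, degree, c_node, weight):
    """Weighted average of c_a over sampled nodes of each degree k.

    samples: list of sampled nodes (repeats allowed)
    degree(a), c_node(a), weight(a): assumed lookups for a sampled node a
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for a in samples:
        k = degree(a)
        num[k] += c_node(a) / weight(a)  # down-weight nodes that are over-sampled
        den[k] += 1.0 / weight(a)
    return {k: num[k] / den[k] for k in num}
```

With uniform weights (UIS) this reduces to the plain sample mean of c_a within each degree class.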

Page 10: 2.5K-Graphs: from Sampling   to Generation

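Estimation: JDD

[Estimator formulas for UIS and WIS; the legend entries below refer to their symbols]

– all nodes of degree k
– all edges
– all sampled nodes of degree k
– sampling weight of node a (proportional to node degree when the sample is RW)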

Page 11: 2.5K-Graphs: from Sampling   to Generation

Estimation: Random Walks

• RW
  – Traversed Edges
  – Induced Edges
  – Induced Edges with safety margin M=1

• Induced Edges with safety margin M=0 = WIS

• Hybrid estimator combines Traversed Edges + Induced Edges with margin M
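The difference between the edge sets named above can be sketched as follows. The sketch assumes that a safety margin M means "only node pairs whose positions in the walk differ by more than M are considered", which is one reading of the slide, and that the neighbor lists of sampled nodes are known. It collects edge sets only and ignores the multiplicities a real estimator would track.

```python
def traversed_edges(walk):
    """Edges actually crossed by the walk (consecutive positions)."""
    return {frozenset((walk[i], walk[i + 1])) for i in range(len(walk) - 1)}

def induced_edges(walk, adj, margin=0):
    """Graph edges between sampled nodes more than `margin` steps apart in the walk."""
    found = set()
    for i in range(len(walk)):
        for j in range(i + margin + 1, len(walk)):
            u, v = walk[i], walk[j]
            if u != v and v in adj[u]:
                found.add(frozenset((u, v)))
    return found

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
walk = ["a", "b", "c", "d"]
```

For this walk, margin 0 considers every pair of sampled nodes (as an independence sample would), while margin 1 skips consecutive positions and therefore leaves only the edge a-c.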

Page 12: 2.5K-Graphs: from Sampling   to Generation

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → Estimated c(k) & JDD → Generate → Synthetic Graph]

Page 13: 2.5K-Graphs: from Sampling   to Generation

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → Target JDD and Target c(k) → Generate → 2.5K Graph]

Generate:
• 2K Construction (from Target JDD): simple graph, exact JDD, many triangles
• Smart Double Edge Swaps (toward Target c(k))

Page 14: 2.5K-Graphs: from Sampling   to Generation

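2K Construction (1): Step 1

2K (JDD matrix; "." marks an empty cell):

 k |  1  2  3  4
---+------------
 1 |  .  .  1  1
 2 |  .  .  1  1
 3 |  1  1  .  4
 4 |  1  1  4  2

1K (number of nodes of each degree), derived from 2K as $\# = \frac{1}{k} \sum_i 2K(k, i)$:

 k | \sum_i 2K(k,i) | #
---+----------------+---
 1 |       2        | 2
 2 |       2        | 1
 3 |       6        | 2
 4 |       8        | 2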
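The 2K-to-1K step on this slide can be checked with a few lines of code. The sketch below assumes the JDD is stored as a symmetric nested dict whose diagonal entries count each edge twice (so that a row sum equals the total number of stubs of that degree). The cell positions in `example` follow one reading of the slide's matrix and are an assumption.

```python
def degree_counts(jdd):
    """Number of nodes of each degree k: (sum_i jdd[k][i]) / k."""
    counts = {}
    for k, row in jdd.items():
        stubs = sum(row.values())  # total edge endpoints at degree-k nodes
        if stubs % k != 0:
            raise ValueError(f"row {k} is not realizable: {stubs} stubs")
        counts[k] = stubs // k
    return counts

# JDD of the slide's example (assumed layout), rows and columns indexed by degree.
example = {
    1: {3: 1, 4: 1},
    2: {3: 1, 4: 1},
    3: {1: 1, 2: 1, 4: 4},
    4: {1: 1, 2: 1, 3: 4, 4: 2},
}
print(degree_counts(example))
```

The result corresponds to the seven nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b used in the following slides.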

Page 15: 2.5K-Graphs: from Sampling   to Generation

2K Construction (2): Step 1

1K:

 k | #
---+---
 1 | 2
 2 | 1
 3 | 2
 4 | 2

Nodes created from 1K:

 v  | k_v
----+-----
 1a |  1
 1b |  1
 2a |  2
 3a |  3
 3b |  3
 4a |  4
 4b |  4

[Figure: the 2K matrix and the seven nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b]

Page 16: 2.5K-Graphs: from Sampling   to Generation

2K Construction (1): Step 2

Each node v receives a coordinate r_v:

 v  | k_v | r_v
----+-----+-----
 1a |  1  | 20
 1b |  1  | 90
 2a |  2  | 15
 3a |  3  | 70
 3b |  3  |  5
 4a |  4  | 25
 4b |  4  | 50

[Figure: the 2K matrix and the nodes placed on a ring with ticks at 0, 25, 50, 75 according to r_v]

Page 17: 2.5K-Graphs: from Sampling   to Generation

2K Construction (2): Step 2

Candidate node pairs: (1a,1b), (1a,2a), (1a,3a), (1a,3b), (1a,4a), (1a,4b), (1b,2a), (1b,3a), (1b,3b), (1b,4a), (1b,4b), (2a,3a), (2a,3b), (2a,4a), (2a,4b), (3a,3b), (3a,4a), (3a,4b), (3b,4a), (3b,4b), (4a,4b)

[Figure: the nodes on the ring (ticks at 0, 25, 50, 75) with the candidate node pairs, the 2K matrix, and the node table (v, k_v, r); counters are decremented as edges are added, e.g. 4 → 3, 3 → 2, 2 → 1, 1 → 0]
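One way to read Step 2 is as a greedy pass over node pairs in order of increasing ring distance, adding an edge whenever both endpoints still have free stubs and the corresponding 2K entry is not yet saturated. The sketch below implements that reading. The ordering rule, the ring size of 100, and all names are assumptions of this example rather than details confirmed by the slides. Here the JDD quota counts each edge once and is keyed by the sorted degree pair.

```python
from itertools import combinations

def ring_distance(x, y, size=100):
    """Shortest distance between two coordinates on a ring of the given size."""
    d = abs(x - y) % size
    return min(d, size - d)

def greedy_2k(degree, coord, quota):
    """Greedy 2K construction sketch.

    degree: {node: target degree}, coord: {node: ring coordinate},
    quota: {(k, l) with k <= l: edges allowed between degrees k and l}.
    Returns (edges, remaining free stubs, remaining quota).
    """
    quota = dict(quota)
    free = dict(degree)
    edges = []
    pairs = sorted(combinations(sorted(degree), 2),
                   key=lambda p: ring_distance(coord[p[0]], coord[p[1]]))
    for u, v in pairs:  # each pair is visited once, so the graph stays simple
        key = tuple(sorted((degree[u], degree[v])))
        if free[u] > 0 and free[v] > 0 and quota.get(key, 0) > 0:
            edges.append((u, v))
            free[u] -= 1
            free[v] -= 1
            quota[key] -= 1
    return edges, free, quota

degree = {"1a": 1, "1b": 1, "2a": 2, "3a": 3, "3b": 3, "4a": 4, "4b": 4}
coord = {"1a": 20, "1b": 90, "2a": 15, "3a": 70, "3b": 5, "4a": 25, "4b": 50}
quota = {(1, 3): 1, (1, 4): 1, (2, 3): 1, (2, 4): 1, (3, 4): 4, (4, 4): 1}
edges, free, left = greedy_2k(degree, coord, quota)
```

A single greedy pass need not realize the whole JDD. Any stubs left in `free` correspond to unsaturated degree pairs in `left`, which is the situation a repair step has to resolve.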

Page 18: 2.5K-Graphs: from Sampling   to Generation

2K Construction: Step 3

Degree pair (4,3)

• Construction stuck.
• Iterate over all unsaturated degree pairs
• Iterate over all unsaturated node pairs for the degree pair (4,3)
• Move 1: Add edge (4b, 3a)
• Move 2: Add edge (4b,3*) or (3a,4*)
• Move 3: Add edge (4*, 3*)

[Figure: the 2K matrix, the node table (v, k_v, r) with per-node counters, and the degree groups k=3 (3a, 3b) and k=4 (4a, 4b)]

Page 19: 2.5K-Graphs: from Sampling   to Generation

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → Target JDD and Target c(k) → Generate → 2.5K Graph]

Generate:
• 2K Construction (from Target JDD): simple graph, exact JDD, many triangles
• Smart Double Edge Swaps (toward Target c(k))

Page 20: 2.5K-Graphs: from Sampling   to Generation

“Smart” double edge swaps: 2K-preserving, c(k)-targeting

• Double edge swap [Mahadevan et al., Sigcomm ’06]
  – applied to an edge with endpoint degrees (k1,k2) and an edge with endpoint degrees (k1,k3)
  – JDD(k1,k2) and JDD(k1,k3) unchanged

• How we select edges for double edge swaps
  – Triangle creation: select edges with low number of shared partners
  – Triangle destruction: select random edges

[Figure: c(k) versus degree k, showing the Target curve and the Current curve at end of 2K, with regions labeled "Destroy triangles" and "Create triangles"]
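A 2K-preserving double edge swap can be sketched as below: given edges (a, b) and (c, d) where a and c have the same degree k1, the endpoints b and d are exchanged, so one (k1,k2) edge and one (k1,k3) edge remain after the swap. The in-place adjacency-dict representation and the function name are assumptions of the example. The "smart" selection of which edges to swap is not implemented here.

```python
def jdd_preserving_swap(adj, e1, e2):
    """Replace edges (a, b), (c, d) by (a, d), (c, b) when deg(a) == deg(c).

    Returns True if the swap was applied, False if it would change the JDD
    or create a self-loop or multi-edge. Modifies adj in place.
    """
    (a, b), (c, d) = e1, e2
    if len(adj[a]) != len(adj[c]):
        return False  # endpoints kept in place must share degree k1
    if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
        return False  # keep the graph simple
    adj[a].remove(b); adj[b].remove(a)
    adj[c].remove(d); adj[d].remove(c)
    adj[a].add(d); adj[d].add(a)
    adj[c].add(b); adj[b].add(c)
    return True

# Hypothetical example: path b-a-c-d, where a and c both have degree 2.
g = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
applied = jdd_preserving_swap(g, ("a", "b"), ("c", "d"))
```

Every node keeps its degree and the two new edges have the same degree pairs as the two removed ones, so both 1K and 2K are unchanged while the triangle count may change.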

Page 21: 2.5K-Graphs: from Sampling   to Generation

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → Estimated c(k) & JDD → Generate → 2.5K Graph → Evaluation]

Page 22: 2.5K-Graphs: from Sampling   to Generation

Evaluation

[Table of evaluated graphs; columns: Number of Nodes, Number of Edges, Average Degree, Total Number of Triangles, Average Clustering Coefficient]

Page 23: 2.5K-Graphs: from Sampling   to Generation

Evaluation: Efficiency of Estimators (RW)

[Plot: Error Metric of the RW estimators on Facebook New Orleans]

Hybrid combines advantages of IE (Induced Edges) & TE (Traversed Edges)

Page 24: 2.5K-Graphs: from Sampling   to Generation

Evaluation: Speed of 2.5K Graph Generation

[Plot: our 2.5K vs. prior 2.5K]

Higher gain for graphs with more triangles

Page 25: 2.5K-Graphs: from Sampling   to Generation

Evaluation: From Sampling to Generation

[Plot: Facebook New Orleans]

Page 26: 2.5K-Graphs: from Sampling   to Generation

Conclusion

• Complete and practical methodology to generate synthetic graphs from node samples
  – novel estimators that measure c(k) and JDD
  – novel and fast 2.5K generator

• Python implementation
  – http://tinyurl.com/2-5K-Generator

[Diagram: Real Graph (unknown) → Crawl using UIS, WIS, RW → Node Samples → Estimate → c(k) & JDD → Generate → Synthetic Graph]