A Sampling-Based Tool for Scaling Graph Datasets

ICPE 2020: 11th ACM/SPEC International Conference on Performance Engineering

Ahmed Musaafir, Alexandru Uta, Henk Dreuning, Ana-Lucia Varbanescu

Vrije Universiteit Amsterdam & University of Amsterdam

Context

- Graph datasets
  - Used in different domains (e.g., logistics, biology, social networks, infrastructure networks)
- Graph processing
  - Different graph processing platforms: Giraph, GraphMat, Gunrock, etc.
- Graph analytics benchmarking
  - Platform, Algorithm, Dataset, Hardware
  - No in-depth evaluation or performance analysis
- Which properties of the graph dataset affect performance?

[Figure not recovered; panels: Correlated datasets, Uncorrelated datasets]

Problem

- Lack of representative graph datasets
- Synthetic graph generators
  - Generate a graph from scratch
  - Allow controlling specific graph properties only
- Graph archives
  - Few types of graphs
  - Small collection and size

Solution

- Graph scaling
- Control certain graph properties
  - Predict and tune the properties of scaled-up graphs based on models and guidelines
- Tool to generate diverse families of graphs fast

Graph Scaling Tool

- Input
  - Graph G
  - Scaling factor s
  - Additional parameters
- Output
  - Scaled graph G_e (s times)

Scaling Down

Scaling Down: Graph Sampling

- Node-based Sampling
  - Node Sampling
- Edge-based Sampling
  - Random Edge Sampling
  - Totally-Induced Edge Sampling (TIES)
- Traversal-based Sampling
  - Random Walk
  - Forest Fire
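As an illustration of one of the listed algorithms, here is a minimal Python sketch of Totally-Induced Edge Sampling as it is commonly described: draw edges uniformly at random until enough nodes are collected, then keep every original edge between the collected nodes. The function name `ties_sample`, the edge-list representation, and the stopping rule are assumptions for this sketch and are not taken from the authors' tool.

```python
import random


def ties_sample(edges, fraction, seed=None):
    """Sketch of TIES on an undirected edge list (assumed representation).

    edges:    list of (u, v) tuples
    fraction: target share of the original nodes to keep (0 < fraction <= 1)
    Returns (sampled_nodes, induced_edges).
    """
    rng = random.Random(seed)
    all_nodes = {v for edge in edges for v in edge}
    target = max(1, round(fraction * len(all_nodes)))

    # Step 1: edge-based node selection. Each drawn edge contributes both
    # endpoints, so the result may exceed the target by one node.
    sampled = set()
    while len(sampled) < target:
        u, v = rng.choice(edges)
        sampled.add(u)
        sampled.add(v)

    # Step 2: total induction. Keep every original edge whose endpoints
    # were both selected, not only the edges that were drawn.
    induced = [(u, v) for u, v in edges if u in sampled and v in sampled]
    return sampled, induced
```

The induction step is what distinguishes TIES from plain random edge sampling: the sample keeps all edges among the selected nodes rather than only the edges that happened to be drawn.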

[Table not recovered. Caption: Property preservation quality per sampling algorithm, represented as likelihood from low (--) to high (++)]

Scaling Down: Results

Com-Orkut               G (original)   Gs 0.8        Gs 0.5       Gs 0.3
#Nodes                  3,072,441      2,457,952     1,536,220    921,733
#Edges                  117,185,083    108,686,099   73,626,482   42,194,208
Avg. degree             76.28          88.44         95.85        91.55
Diameter                9              9             10           8
Density                 2.48e-05       3.59e-05      6.24e-05     9.93e-05
Components              1              7             17           36
Avg. clustering coeff.  0.16           0.15          0.15         0.14
Avg. shortest path      4.19           4.05          3.97         3.95
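The average degree and density rows can be reproduced from the node and edge counts alone. The sketch below assumes the usual definitions for an undirected simple graph (average degree 2E/N, density 2E/(N(N-1))); the slides do not state the formulas, but these definitions match the reported values for the original graph and for Gs 0.3.

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * num_edges / num_nodes


def density(num_nodes, num_edges):
    """Share of possible undirected edges that are present."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Com-Orkut, original graph (counts taken from the table above)
print(f"{average_degree(3_072_441, 117_185_083):.2f}")  # 76.28
print(f"{density(3_072_441, 117_185_083):.2e}")         # 2.48e-05
```

Read this way, the table shows density rising as the sample shrinks, because the edge count falls more slowly than the number of possible node pairs.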

Scaling Up

Scaling Up: Existing work

- Graph generators
  - Datagen, Graph500, R-MAT
- Graph evolution algorithms
  - Focus on evolving the graph
- Graph scalers
  - GScaler, ReCoN, Musketeer

Scaling Up: Method

- Obtain samples Gi of the original graph G
- Interconnect the different samples
- Example: scale up a graph 4.5 times
  - Sample size: 0.5
  - Results in 9 different samples

[Figure: example of scaling up a graph; Gs 0...8 = sampled versions of the graph]
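The arithmetic behind the example (4.5 / 0.5 = 9 samples) can be sketched as follows. The function name `sample_plan` is hypothetical, and the handling of a scaling factor that is not a multiple of the sample size (one smaller final sample) is an assumption of this sketch, not something the slides specify.

```python
from fractions import Fraction


def sample_plan(scaling_factor, sample_size):
    """Return the list of sample sizes whose sum equals the scaling factor.

    Fractions avoid floating-point surprises such as 0.3 / 0.1 != 3.
    """
    s = Fraction(str(scaling_factor))
    f = Fraction(str(sample_size))
    if s <= 0 or not 0 < f <= 1:
        raise ValueError("need scaling_factor > 0 and 0 < sample_size <= 1")

    full_samples = s // f      # number of full-size samples (an int)
    remainder = s % f          # leftover share, if any
    sizes = [f] * full_samples
    if remainder:
        # Assumption: cover the leftover with one smaller sample.
        sizes.append(remainder)
    return sizes


plan = sample_plan(4.5, 0.5)
print(len(plan))  # 9 samples, matching the slide's example
```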

Scaling Up: Method

- Interconnection topologies
  - Star; Chain; Ring; Fully-connected
- Selecting bridge vertices
  - Random; High-degree
- Multi-edge interconnections
  - n number of interconnections
  - Directed; undirected
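The interconnection options on this slide can be sketched in Python as below. All names (`sample_pairs`, `pick_bridges`, `interconnect`) are hypothetical, each sample is represented only by a vertex-to-degree dictionary, and the way n interconnections are paired up (i-th pick of one sample with i-th pick of the other) is an assumption; the authors' tool may do this differently.

```python
import random
from itertools import combinations


def sample_pairs(k, topology):
    """Which of the k samples get connected to each other."""
    if topology == "star":
        return [(0, i) for i in range(1, k)]          # sample 0 is the hub
    if topology == "chain":
        return [(i, i + 1) for i in range(k - 1)]
    if topology == "ring":                            # meaningful for k >= 3
        return [(i, (i + 1) % k) for i in range(k)]
    if topology == "fully-connected":
        return list(combinations(range(k), 2))
    raise ValueError(f"unknown topology: {topology}")


def pick_bridges(degrees, n, strategy, rng):
    """Choose n bridge vertices from one sample (dict: vertex -> degree)."""
    if strategy == "random":
        vertices = sorted(degrees)
        return [rng.choice(vertices) for _ in range(n)]
    if strategy == "high-degree":
        ranked = sorted(degrees, key=lambda v: (-degrees[v], v))
        return [ranked[i % len(ranked)] for i in range(n)]
    raise ValueError(f"unknown strategy: {strategy}")


def interconnect(samples, topology, strategy="random", n=1, seed=None):
    """Return bridge edges as ((sample_index, vertex), (sample_index, vertex))."""
    rng = random.Random(seed)
    bridges = []
    for a, b in sample_pairs(len(samples), topology):
        us = pick_bridges(samples[a], n, strategy, rng)
        vs = pick_bridges(samples[b], n, strategy, rng)
        bridges += [((a, u), (b, v)) for u, v in zip(us, vs)]
    return bridges
```

For instance, `interconnect([{"a": 3, "b": 1}, {"x": 2, "y": 5}], "chain", "high-degree")` connects the highest-degree vertex of each sample with a single bridge. Tagging vertices with their sample index keeps identifiers unique when the same original vertex appears in several samples.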

Scaling Up: Impact on properties

- Different parameters
  - Interconnection topologies
  - Selecting bridge vertices
  - Multi-edge interconnections
- Sampling algorithm
  - Sample size
- Scaling factor
- Dataset

Scaling Up: Measuring the quality of graph output

- Given the same parameters, the properties of the expanded graph should be predictable.
- Models & guidelines
- "In case you want to have the scaled-up graph with a larger diameter, choose a chain topology with a single random bridge".

Maximum diameter: [formula not recovered in this transcript]
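The diameter guideline can be checked on a toy example. The sketch below is not the authors' model and does not reconstruct the missing formula: it connects five copies of a 6-node cycle with one bridge per connected pair and measures the diameter by breadth-first search. For determinism the bridge always attaches to the first node of each copy, whereas the guideline speaks of a random bridge; all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def cycle_copy(offset, n):
    """Edges of an n-node cycle whose node ids start at `offset`."""
    return [(offset + i, offset + (i + 1) % n) for i in range(n)]


def topology_pairs(k, topology):
    if topology == "chain":
        return [(i, i + 1) for i in range(k - 1)]
    if topology == "star":
        return [(0, i) for i in range(1, k)]
    if topology == "fully-connected":
        return list(combinations(range(k), 2))
    raise ValueError(f"unknown topology: {topology}")


def build(k, n, topology):
    """Adjacency sets for k cycle copies joined by one bridge per pair."""
    edges = []
    for c in range(k):
        edges += cycle_copy(c * n, n)
    # Single bridge between the first node of each connected copy.
    edges += [(a * n, b * n) for a, b in topology_pairs(k, topology)]
    adj = {v: set() for v in range(k * n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def diameter(adj):
    """Longest shortest path in a connected graph, via BFS from every node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


for topo in ("chain", "star", "fully-connected"):
    print(topo, diameter(build(5, 6, topo)))
# chain 10, star 8, fully-connected 7
```

In this toy case the chain has the largest diameter because a path between the two end copies must cross every intermediate bridge.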

Scaling Up: Results

FB                      G (original)  G x3      G x3      G x3             G x3      G x3
Sample size             -             0.5       0.5       0.5              0.5       0.5
Topology                -             Star      Chain     Fully connected  Star      Star
Bridge                  -             Random    Random    Random           Random    High-degree
#Interconnections       -             1         1         1                45,000    45,000
#Nodes                  4,039         12,117    12,114    12,114           12,114    12,115
#Edges                  88,234        339,497   340,091   339,777          559,798   560,168
Avg. degree             43.69         56.04     56.15     56.09            92.42     92.48
Diameter                8             19        31        15               6         6
Density                 1.10e-2       4.62e-3   4.63e-3   4.63e-3          7.62e-3   7.63e-3
Components              1             7         9         7                2         10
Avg. clustering coeff.  0.62          0.63      0.63      0.63             0.31      0.46
Avg. shortest path      3.69          9.26      11.79     6.35             2.65      2.92

Case studies

- Generate families of scaled-up graphs
  - Used for performance analysis
  - How do properties of the scaled-up graphs impact performance?
- Processing time of Breadth-First Search and PageRank
  - Using GraphMat and the Graphalytics benchmark suite
- Two datasets
  - Available in graph archives: "com-livejournal" and "12month1"

Case studies: Scale-up Configuration

- Fixed parameters
  - 0.5 sample size
  - Random bridge interconnections
- Combining parameters
  - Scaling factor: 2, 4, 8*
  - Topologies: star, chain, ring, fully-connected
  - Number of interconnections: 1, 20,000
  - Different sampling algorithms and graph copies
- Sampling algorithms
  - Totally-Induced Edge Sampling (TIES)
  - Forest Fire

*Scaling factor 8 is used only for graph copies.
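The configuration space above can be enumerated with a short Python sketch. It assumes a full cross product of the listed parameters, with scaling factor 8 restricted to graph copies as the footnote states; whether the study actually ran every combination is not stated on the slide. The dictionary keys are illustrative names, not options of the authors' tool.

```python
from itertools import product

METHODS = ["TIES", "Forest Fire", "graph copy"]
SCALING_FACTORS = [2, 4, 8]
TOPOLOGIES = ["star", "chain", "ring", "fully-connected"]
INTERCONNECTIONS = [1, 20_000]


def configurations():
    """Yield one dict per scale-up configuration (assumed full cross product)."""
    for method, factor, topology, n in product(
        METHODS, SCALING_FACTORS, TOPOLOGIES, INTERCONNECTIONS
    ):
        # Footnote on the slide: scaling factor 8 only for graph copies.
        if factor == 8 and method != "graph copy":
            continue
        yield {
            "method": method,
            "scaling_factor": factor,
            "topology": topology,
            "interconnections": n,
            "bridge": "random",  # fixed parameter
            # Assumption: the fixed 0.5 sample size applies to sampling only.
            "sample_size": None if method == "graph copy" else 0.5,
        }


configs = list(configurations())
print(len(configs))  # 56 under these assumptions
```

Under these assumptions each dataset yields 32 sampled graphs and 24 copy-based graphs.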

Case studies: Results BFS

[Plots not recovered; panels: Forest Fire, TIES, Graph Copy]

Case studies: Results PR

[Plots not recovered; panels: Forest Fire, TIES, Graph Copy]

Tool Implementation

- Performance
  - Single node/thread
  - Parallel & Distributed Implementation
- Auto-tuner
- Tool publicly available: github.com/amusaafir/graph-scaling

Conclusion

- Sampling-based method for scaling graph datasets
  - Obtain families of similar graphs
  - Certain properties controlled by user requirements
- Validated our tool on a set of different graph datasets
- Diverse graph families used for understanding graph processing behavior