Evaluation of Parallel Graph Loading Techniques
Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann
Technical University of Munich
Chair of Database Systems
Manuel Then (TUM) | Evaluation of Parallel Graph Loading Techniques
General Graph Loading Pipeline
Goal: Efficiently load a given graph dataset for explorative analytics
Read
• Parse edges and create relabeling
• Write edges to worker-local buffer
Sync
• Find unique vertices
• Count neighbors
Write
• Create final graph data structure
• Apply final relabeling
Analytics
• The actual analytics work
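The Read, Sync, and Write stages can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Edge`, `Csr`, and `loadGraph` are hypothetical, the input is assumed to be already parsed into edges with dense identifiers, and the chunks are processed one after another where a real loader would run one worker thread per chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Edge { uint32_t src, dst; };

// CSR-style result: the neighbors of v are targets[offsets[v]] .. targets[offsets[v + 1] - 1].
struct Csr {
    std::vector<std::size_t> offsets;
    std::vector<uint32_t> targets;
};

// Hypothetical three-stage loader. `chunks` stands in for the per-worker input;
// identifiers are assumed to be dense already (0 .. numVertices - 1).
Csr loadGraph(const std::vector<std::vector<Edge>>& chunks, uint32_t numVertices) {
    // Read: each worker copies its edges into a worker-local buffer.
    std::vector<std::vector<Edge>> localBuffers(chunks.size());
    for (std::size_t w = 0; w < chunks.size(); ++w) {
        for (const Edge& e : chunks[w]) {
            assert(e.src < numVertices && e.dst < numVertices);
            localBuffers[w].push_back(e);
        }
    }

    // Sync: count the neighbors of every vertex across all buffers.
    Csr g;
    g.offsets.assign(static_cast<std::size_t>(numVertices) + 1, 0);
    for (const auto& buf : localBuffers)
        for (const Edge& e : buf) ++g.offsets[e.src + 1];
    // A prefix sum turns the counts into start offsets.
    for (std::size_t v = 0; v < numVertices; ++v) g.offsets[v + 1] += g.offsets[v];

    // Write: place every edge target at its final position.
    g.targets.resize(g.offsets[numVertices]);
    std::vector<std::size_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& buf : localBuffers)
        for (const Edge& e : buf) g.targets[cursor[e.src]++] = e.dst;
    return g;
}
```

Calling `loadGraph` with two chunks holding the edges (0,1), (2,0) and (0,2), (1,2) and `numVertices = 3` gives offsets 0, 2, 3, 4 and the packed target array 1, 2, 2, 0.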
Scenario-specific Graph Loading
Problem: The optimal way of loading the graph depends on various factors:
• Format of the graph data
• Source of the data
• Properties of the input data
• Target graph data structure
• Execution machine
The graph loading pipeline must be adapted to the scenario at hand
General Graph Loading Pipeline
Questions that shape the pipeline for a given scenario:
• Identifier data type: binary, decimal, or string?
• Is random access to the input possible?
• Can the input data be read multiple times?
• Is an explicit vertex list available?
• Which data structure should be generated?
Parsers
Binary reader
• No parsing necessary => directly copy vertex identifiers
• Every edge has the same size => work splitting is trivial
Library-provided decimal parsing
• Readily available for many languages
• We evaluated C++’s stream operator and strtol
• Varying edge length => work splitting is more complex
Iterative decimal parsing
• Multiply by ten and add the digit value of each character
Vectorized decimal parsing
• Leverage wide vector units for identifier parsing
• T. Mühlbauer, W. Rödiger, R. Seilbeck, A. Reiser, A. Kemper, and T. Neumann. Instant loading for main memory databases. Proceedings of the VLDB Endowment, 2013.
Parser code generation
[Figure: parser comparison on a scale marked 2x, 20x, 200x]
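Iterative decimal parsing can be sketched in a few lines of C++. The sketch below is an illustration, not the evaluated code: the function names are hypothetical, the input is assumed to be a NUL-terminated text buffer with one edge per line in the form "<src> <dst>", and overflow and sign handling are left out. For comparison, a library-based variant uses std::strtoull, the unsigned 64-bit sibling of the strtol call mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <utility>

// Iterative decimal parsing: multiply the running value by ten and add the
// digit value of the next character. Advances `p` past the consumed digits.
// No overflow or sign handling; identifiers are assumed to fit in 64 bits.
inline uint64_t parseDecimal(const char*& p) {
    uint64_t value = 0;
    while (*p >= '0' && *p <= '9') {
        value = value * 10 + static_cast<uint64_t>(*p - '0');
        ++p;
    }
    return value;
}

// Parses one edge of the assumed form "<src> <dst>" followed by a line ending.
inline std::pair<uint64_t, uint64_t> parseEdge(const char*& p) {
    uint64_t src = parseDecimal(p);
    while (*p == ' ' || *p == '\t') ++p;   // skip the separator
    uint64_t dst = parseDecimal(p);
    while (*p == '\r' || *p == '\n') ++p;  // skip the line ending
    return {src, dst};
}

// Library-based counterpart. strtoull skips leading whitespace on its own.
inline std::pair<uint64_t, uint64_t> parseEdgeLibrary(const char*& p) {
    char* end = nullptr;
    uint64_t src = std::strtoull(p, &end, 10);
    uint64_t dst = std::strtoull(end, &end, 10);
    p = end;
    return {src, dst};
}
```

Both functions return the same pairs for well-formed input. The iterative version avoids the locale, sign, and base handling that a general-purpose library routine has to perform.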
Data Structures and Identifier Relabeling
Closely related areas
No relabeling (Identity) => Map of Neighbor Lists
• Directly use dataset identifiers
• Runtime overhead for neighbor and property accesses
• Simple and efficient to load
Dense relabeling => Compressed Sparse Row (CSR)
• Dense identifiers [0, |V|-1]
• Packed, sequential memory layout
• Allows offset-based data structure access
  • e.g. for neighbor lists or properties
• Overhead during loading
[Figure: neighbor lists (1), (1 2), (0 2) reached through hash-based access, next to the packed CSR neighbor array 1 1 2 0 2 reached through offset-based access]
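The difference between hash-based and offset-based access can be made concrete with a small C++ sketch. The type and function names are hypothetical, and the data in the usage note mirrors the three neighbor lists from the slide figure: (1), (1 2), and (0 2).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Identity labeling: original identifiers are hash-map keys, so every
// neighbor lookup hashes the identifier first.
using NeighborMap = std::unordered_map<uint64_t, std::vector<uint64_t>>;

std::size_t degree(const NeighborMap& g, uint64_t v) {
    auto it = g.find(v);
    return it == g.end() ? 0 : it->second.size();
}

// Dense labeling: identifiers are 0 .. |V|-1 and index directly into arrays.
struct Csr {
    std::vector<std::size_t> offsets;  // |V| + 1 entries
    std::vector<uint32_t> targets;     // |E| entries, grouped by source vertex
};

std::size_t degree(const Csr& g, uint32_t v) {
    assert(static_cast<std::size_t>(v) + 1 < g.offsets.size());
    return g.offsets[v + 1] - g.offsets[v];  // two array reads, no hashing
}

// The neighbors of v form the contiguous range [neighborsBegin, neighborsEnd).
const uint32_t* neighborsBegin(const Csr& g, uint32_t v) {
    return g.targets.data() + g.offsets[v];
}
const uint32_t* neighborsEnd(const Csr& g, uint32_t v) {
    return g.targets.data() + g.offsets[v + 1];
}
```

For the figure's example the CSR arrays are offsets 0, 1, 3, 5 and targets 1, 1, 2, 0, 2. Per-vertex properties such as a PageRank value can use the same dense identifier as an index into a plain array, whereas the identity labeling needs another hash lookup.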
Relabeling Strategies
Mapping
• Assign dense identifiers while reading the input data
• Global: All workers use a shared map
• Local: Each worker creates a local relabeling
Collection
• Gather unique identifiers while reading the input
• Assign dense identifiers at the end
• Global: Shared identifier set for all workers
• Local: Use a local set per worker
Relabeling is finalized/applied when the graph data structure is written
[Figure: worker-local identifier sets combined by union (∪)]
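Two of the four variants, global mapping and local collection, might look as follows in C++. This is a sketch under assumptions rather than the evaluated implementation: the names `GlobalMapping` and `assignDenseIds` are hypothetical, the shared map is protected by a plain mutex for simplicity, and the collection variant sorts the merged identifiers, which is one possible way to form the union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Mapping, global variant: a shared map hands out the next dense identifier
// the first time an original identifier is seen. The mutex is the simplest
// way to make this safe for concurrent workers.
class GlobalMapping {
public:
    uint32_t denseId(uint64_t original) {
        std::lock_guard<std::mutex> lock(mutex_);
        // emplace leaves an existing entry untouched and returns it.
        auto result = map_.emplace(original, static_cast<uint32_t>(map_.size()));
        return result.first->second;
    }

private:
    std::mutex mutex_;
    std::unordered_map<uint64_t, uint32_t> map_;
};

// Collection, local variant: every worker gathers identifiers in its own set;
// dense identifiers are assigned only after all sets have been merged.
std::unordered_map<uint64_t, uint32_t>
assignDenseIds(const std::vector<std::unordered_set<uint64_t>>& localSets) {
    std::vector<uint64_t> all;
    for (const auto& s : localSets) all.insert(all.end(), s.begin(), s.end());
    std::sort(all.begin(), all.end());
    all.erase(std::unique(all.begin(), all.end()), all.end());  // union of the sets
    std::unordered_map<uint64_t, uint32_t> relabel;
    for (uint32_t i = 0; i < all.size(); ++i) relabel.emplace(all[i], i);
    return relabel;
}
```

With mapping, dense identifiers depend on the order in which workers encounter vertices. With this collection sketch they follow the sorted order of the original identifiers, at the cost of a merge step at the end.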
Relabeling Strategies - Measurements
Graph loading times for various relabeling strategies
No further dataset properties leveraged
Leveraging Dataset Properties
Explicit vertex lists
• All unique vertices in the dataset are known beforehand
• No need to find and count vertices => improves loading efficiency
Partitioned edge list
• Edge list partitioned by source vertex
• Each source vertex has a responsible worker thread
  • determined by the input data chunk
• Significantly reduces worker communication overhead

Partitioned   Unpartitioned
1 2           4 3
1 3           1 3
1 4           3 1
2 1           1 4
2 4           2 1
3 1           1 2
3 2           3 2
4 3           2 4
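Why a partitioned edge list reduces worker communication can be shown with a short C++ sketch. It assumes that edges with the same source vertex are stored contiguously, as in the partitioned example; the function names and the chunking rule are hypothetical, and the per-worker loop is written sequentially for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<uint32_t, uint32_t>;  // (source, target)

// Splits an edge list that is grouped by source vertex into `workers` chunks
// without separating edges of the same source. Returns the chunk start
// indices followed by the end index.
std::vector<std::size_t> splitBySource(const std::vector<Edge>& edges, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::size_t> bounds{0};
    for (std::size_t w = 1; w < workers; ++w) {
        std::size_t pos = edges.size() * w / workers;  // even split point
        if (pos < bounds.back()) pos = bounds.back();
        // Move forward until the source vertex changes.
        while (pos > 0 && pos < edges.size() && edges[pos].first == edges[pos - 1].first) ++pos;
        bounds.push_back(pos);
    }
    bounds.push_back(edges.size());
    return bounds;
}

// Each chunk only touches the degree slots of its own source vertices, so the
// chunks could be handled by separate threads without atomics or locks.
std::vector<uint32_t> countDegrees(const std::vector<Edge>& edges,
                                   const std::vector<std::size_t>& bounds,
                                   uint32_t numVertices) {
    std::vector<uint32_t> degree(numVertices, 0);
    for (std::size_t w = 0; w + 1 < bounds.size(); ++w)  // one iteration per worker
        for (std::size_t i = bounds[w]; i < bounds[w + 1]; ++i)
            ++degree[edges[i].first];
    return degree;
}
```

For the eight partitioned edges above and four workers, the chunk boundaries fall at indices 0, 3, 5, 7, 8, so every source vertex belongs to exactly one worker. With the unpartitioned order, any worker may see any source vertex, and the degree counters would need synchronization.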
Leveraging Dataset Properties - Measurements
Graphs
• LDBC-1000, |V| = 3.6M, |E| = 447M
• Twitter, |V| = 41.6M, |E| = 1.5B
Comparison with Existing Systems

System                 Twitter         LDBC
Oracle PGX             2153s           632s
GraphBIG               out of memory   1682s
Ours non-partitioned   88s             24s
Ours partitioned       34s             7s

Graphs
• LDBC-1000, |V| = 3.6M, |E| = 447M
• Twitter, |V| = 41.6M, |E| = 1.5B
Machine:
• 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
• 256GB, Ubuntu 15.10, kernel 4.2.0
Influence on Analytics

                    CSR (relabeled)         Neighbors Map (identity)
                    Load + Run = Total      Load + Run = Total
PageRank            37s    33s   70s        25s    194s  219s
Triangle Counting   37s    49s   86s        25s    66s   92s

Graphs
• Twitter, |V| = 41.6M, |E| = 1.5B
Machine:
• 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
• 256GB, Ubuntu 15.10, kernel 4.2.0
Summary
The optimal loading pipeline for a graph dataset depends strongly on the
• Data format
• Source of the data
• Properties of the dataset
• Algorithm-dependent graph data structure
• Target machine
Custom iterative identifier parsing is always beneficial
Concurrent identifier relabeling is mostly beneficial
• More challenging than identity mapping, but usually worth it
Leveraging properties of the dataset can lead to enormous speedups