GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

33
Marcus Paradies , Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores GRADES’14 Workshop June 22, 2014

Transcript of GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Page 1: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner

GRATIN: Accelerating Graph Traversalsin Main-Memory Column StoresGRADES’14 Workshop

June 22, 2014

Page 2: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graphs from an Enterprise View

Relational+ Application Logic

Application Layer

RDBMS

Graph Processing

Data Data • Recursive Queries

• Chained Joins

: Data already in RDBMS

6 SQL as interface

6 Data transfer to application

Relational + Graph+ Application Logic

Application Layer

RDBMS GDBMS

Replicate

Data Data Data Data

: Efficient processing in GDBMS

6 Processing on replicated data

6 No combination with relational data

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2

Page 3: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graphs from an Enterprise View

Relational+ Application Logic

Application Layer

RDBMS

Graph Processing

Data Data • Recursive Queries

• Chained Joins

: Data already in RDBMS

6 SQL as interface

6 Data transfer to application

Relational + Graph+ Application Logic

Application Layer

RDBMS GDBMS

Replicate

Data Data Data Data

: Efficient processing in GDBMS

6 Processing on replicated data

6 No combination with relational data

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2

Page 4: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graphs from an Enterprise View

Relational+ Application Logic

Application Layer

RDBMS

Graph Processing

Data Data • Recursive Queries

• Chained Joins

: Data already in RDBMS

6 SQL as interface

6 Data transfer to application

Relational + Graph+ Application Logic

Application Layer

RDBMS GDBMS

Replicate

Data Data Data Data

: Efficient processing in GDBMS

6 Processing on replicated data

6 No combination with relational data

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2

Page 5: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graphs from an Enterprise View

Relational+ Application Logic

Application Layer

RDBMS

Graph Processing

Data Data • Recursive Queries

• Chained Joins

: Data already in RDBMS

6 SQL as interface

6 Data transfer to application

Relational + Graph+ Application Logic

Application Layer

RDBMS GDBMS

Replicate

Data Data Data Data

: Efficient processing in GDBMS

6 Processing on replicated data

6 No combination with relational data

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2

Page 6: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graphs from an Enterprise View

Relational+ Application Logic

Application Layer

RDBMS

Graph Processing

Data Data • Recursive Queries

• Chained Joins

: Data already in RDBMS

6 SQL as interface

6 Data transfer to application

Relational + Graph+ Application Logic

Application Layer

RDBMS GDBMS

Replicate

Data Data Data Data

: Efficient processing in GDBMS

6 Processing on replicated data

6 No combination with relational data

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2

Page 7: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Graph Representation

id=1name=Johntype=User

id=2title=The Shiningtype=Product

id=3title=The Standtype=Product

id=4name=Horrortype=Category

id=5name=Literaturetype=Category

type=category

type=similar

type=belongs

type=belongs

type=ratedrating=4.0

type=ratedrating=5.0

Example graph

id type name . . . title

1 User John . . . ?2 Product ? . . . The Shining3 Product ? . . . The Stand4 Category Horror . . . ?5 Category Literature . . . ?

Vertex table

Vs Vt type . . . rating

3 2 similar . . . ?2 3 similar . . . ?2 4 belongs . . . ?3 4 belongs . . . ?1 3 rated . . . 5.01 2 rated . . . 4.04 5 category . . . ?

Edge table

• Each vertex/edge represented as a single record in universal tables

• Support for transactions and compression

• Combination with other data models (spatial, text, temporal)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3

Page 8: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Graph Representationid=1name=Johntype=User

id=2title=The Shiningtype=Product

id=3title=The Standtype=Product

id=4name=Horrortype=Category

id=5name=Literaturetype=Category

type=category

type=similar

type=belongs

type=belongs

type=ratedrating=4.0

type=ratedrating=5.0

Example graph

id type name . . . title

1 User John . . . ?2 Product ? . . . The Shining3 Product ? . . . The Stand4 Category Horror . . . ?5 Category Literature . . . ?

Vertex table

Vs Vt type . . . rating

3 2 similar . . . ?2 3 similar . . . ?2 4 belongs . . . ?3 4 belongs . . . ?1 3 rated . . . 5.01 2 rated . . . 4.04 5 category . . . ?

Edge table

• Each vertex/edge represented as a single record in universal tables

• Support for transactions and compression

• Combination with other data models (spatial, text, temporal)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3

Page 9: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Graph Representationid=1name=Johntype=User

id=2title=The Shiningtype=Product

id=3title=The Standtype=Product

id=4name=Horrortype=Category

id=5name=Literaturetype=Category

type=category

type=similar

type=belongs

type=belongs

type=ratedrating=4.0

type=ratedrating=5.0

Example graph

id type name . . . title

1 User John . . . ?2 Product ? . . . The Shining3 Product ? . . . The Stand4 Category Horror . . . ?5 Category Literature . . . ?

Vertex table

Vs Vt type . . . rating

3 2 similar . . . ?2 3 similar . . . ?2 4 belongs . . . ?3 4 belongs . . . ?1 3 rated . . . 5.01 2 rated . . . 4.04 5 category . . . ?

Edge table

• Each vertex/edge represented as a single record in universal tables

• Support for transactions and compression

• Combination with other data models (spatial, text, temporal)© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3

Page 10: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 11: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 12: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 13: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 14: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 15: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 16: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 17: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 18: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Graph Processing (The New World)

Query Execution

EDGES

σ (Selection)

� (Traversal)

π (Projection)

EDGES

σ (Selection)

� (Traversal)

π (Projection)

Graph TraversalsExample query:{ id:D }-a->(1,*)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4

Page 19: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Edge Clustering

• Clustering by edge type preservessubgraph meaning

• Clustering by edge source preserves vertexneighborhood

• Increases spatial locality in memory

• Allows reducing scan to range in column

Type ClusteringEdge Clustering

Vs Vt Type

D F aA D aA B aA C aE B aE G aD B bB E bF G b

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 5

Page 20: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

GRATIN

Column

2

2

2

1

3

1

2

3

4

5

B1

B2

Minimal blocksize: 2

2

1

2

1

2

3

Value Blocks

Block ranges

2

1 3

5

• gratin replaces full column scans by block scans

• gratin indexes column with variable block size

• Allows efficient handling of vertices with high outdegree

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6

Page 21: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

GRATIN

Column

2

2

2

1

3

1

2

3

4

5

B1

B2

Minimal blocksize: 2

2

1

2

1

2

3

Value Blocks

Block ranges

2

1 3

5

• gratin replaces full column scans by block scans

• gratin indexes column with variable block size

• Allows efficient handling of vertices with high outdegree

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6

Page 22: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

GRATIN

Column

2

2

2

1

3

1

2

3

4

5

B1

B2

Minimal blocksize: 2

2

1

2

1

2

3

Value Blocks

Block ranges

2

1 3

5

• gratin replaces full column scans by block scans

• gratin indexes column with variable block size

• Allows efficient handling of vertices with high outdegree

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6

Page 23: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

GRATIN

Column

2

2

2

1

3

1

2

3

4

5

B1

B2

Minimal blocksize: 2

2

1

2

1

2

3

Value Blocks

Block ranges

2

1 3

5

• gratin replaces full column scans by block scans

• gratin indexes column with variable block size

• Allows efficient handling of vertices with high outdegree

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6

Page 24: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

GRATIN

Column

2

2

2

1

3

1

2

3

4

5

B1

B2

Minimal blocksize: 2

2

1

2

1

2

3

Value Blocks

Block ranges

2

1 3

5

• gratin replaces full column scans by block scans

• gratin indexes column with variable block size

• Allows efficient handling of vertices with high outdegree

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6

Page 25: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

GRATIN

Column

2

2

2

1

3

1

2

3

4

5

B1

B2

Minimal blocksize: 2

2

1

2

1

2

3

Value Blocks

Block ranges

2

1 3

5

• gratin replaces full column scans by block scans

• gratin indexes column with variable block size

• Allows efficient handling of vertices with high outdegree

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6

Page 26: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Experiments on Static Graphs

ID |V | |E | d̄out

Amazon 0.4 M 3.3 M 16.8California-Roads 1.9 M 2.7 M 2.8

1 2 3 4 5 6 70

5

10

15

20

# of Traversal Iterations

Exe

cutio

nTi

me

(ms)

Amazon

SCAN gratin-512

gratin-4096 gratin-32768

Figure: Comparison for different block sizes.

2 4 6 8 10

10

20

30

40

50

Traversal Iteration

Que

rytim

e(m

s)

Amazon

2 4 6 8 10

5

10

15

20

25

Traversal Iteration

California-Roads

Figure: Query time for scan-based traversal ( ) andgratin-based traversal ( ).

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 7

Page 27: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Handling Updates

Column

21

22

23

14

35

26

Minimal blocksize: 2

B1

B2

B3

2

1

2

3

1

2

3

Value Blocks

Health factor

hB1 = 1.0

hB2 = 1.0hB2 = 0.5

hB3 = 1.0

GRATIN health

h = 1.0h = 0.83

Block ranges

3

2

1 3

5

6

• Health factor describes viability of gratin

• If global health factor below threshold→ rebuild index

• gratin allows updates in constant time (append-only)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8

Page 28: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Handling Updates

Column

21

22

23

14

35

26

Minimal blocksize: 2

B1

B2

B3

2

1

2

3

1

2

3

Value Blocks Health factor

hB1 = 1.0

hB2 = 1.0

hB2 = 0.5

hB3 = 1.0

GRATIN health

h = 1.0

h = 0.83

Block ranges

3

2

1 3

5

6

• Health factor describes viability of gratin

• If global health factor below threshold→ rebuild index

• gratin allows updates in constant time (append-only)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8

Page 29: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Handling Updates

Column

21

22

23

14

35

26

Minimal blocksize: 2

B1

B2

B3

2

1

2

3

1

2

3

Value Blocks Health factor

hB1 = 1.0

hB2 = 1.0

hB2 = 0.5

hB3 = 1.0

GRATIN health

h = 1.0

h = 0.83

Block ranges

3

2

1 3

5

6

• Health factor describes viability of gratin

• If global health factor below threshold→ rebuild index

• gratin allows updates in constant time (append-only)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8

Page 30: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Handling Updates

Column

21

22

23

14

35

26

Minimal blocksize: 2

B1

B2

B3

2

1

2

3

1

2

3

Value Blocks Health factor

hB1 = 1.0

hB2 = 1.0

hB2 = 0.5

hB3 = 1.0

GRATIN health

h = 1.0

h = 0.83

Block ranges

3

2

1 3

5

6

• Health factor describes viability of gratin

• If global health factor below threshold→ rebuild index

• gratin allows updates in constant time (append-only)

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8

Page 31: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Experiments on Dynamic Graphs

0 +20K +20K +20K +20K +20K

1

1.2

1.4

1.6

1.8

2

2.2

∆Batch Insertions

Slo

wdo

wn

Fact

or(

)

Amazon

0

0.2

0.4

0.6

0.8

1

Hea

lthFa

ctor

()

0 +20K +20K +20K +20K +20K

1

1.2

1.4

1.6

∆Batch Insertions

Slo

wdo

wn

Fact

or(

)

California-Roads

0

0.2

0.4

0.6

0.8

1

Hea

lthFa

ctor

()

Figure: Query time for gratin-based traversal on dynamic graphs. Slowdown factor describesthe relative execution time in multiples of the execution time on a static graph.

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 9

Page 32: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Summary

• Tight integration of traversal operatorinto main-memory column store

• gratin is a lightweight secondaryindex structure

• Handles dynamic graphs inpredictable time

• Experiments show a diversespectrum of performanceimprovements

• Performance of gratin depends ongraph topology

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 10

Page 33: GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Contact

Marcus ParadiesPhD Student at Database Technology Group, TU Dresdenhttps://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/

[email protected]

© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 11