Hornet: An Efficient Data Structure for Dynamic Sparse...

27
Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices Oded Green

Transcript of Hornet: An Efficient Data Structure for Dynamic Sparse...

Hornet: An Efficient Data Structure for

Dynamic Sparse Graphs and Matrices

Oded Green

Hornet

• A scalable and dynamic data structure for

– Sparse data

– Graph algorithms

– Linear algebra based problems

• Formerly known as cuSTINGER

– Hornet initialization is hundreds of times faster

– Hornet updates are 4X-10X faster

– The Hornet data structure offers is more robust and

scalable than cuSTINGER.

• Essentially a dynamic CSR data structure

• Easy to use

Oded Green, GTC-18 2

“Separation of powers”

• Dynamic graph data structure and dynamic

graph algorithms are in two different

repositories

– Easy to integrate with external library

– Can also be used with matrices

• This talk focuses on the data structure

Oded Green, GTC-18 3

Graph Primitives – Upfront summary

• Great performance for static and dynamic

graph algorithms

• Scalable

• Simple to use

• Will discuss algorithm framework later today

– 1:00pm

– Same room as this talk

Oded Green, GTC-18 4

Hornet – Upfront Summary

• Can support over 150 million updates per second

• Can easily scale to graphs with billions of vertices

• CSR comparison

– Initializing is also relatively in-expensive – usually less than

3X slower

– Hornet requires 30% more storage

– Identical performance

• COO (edge-list) comparison

– Hornet requires 20% less storage

– Hornet has better locality

Oded Green, GTC-18 5

Big Data problems

need Graph Analysis

Communication networks:

• World-wide connectivity

• High velocity changes

• Different types of extracted

data:– Physical communication network.

– Person-to-person communication

network.

Oded Green, GTC-18 6

Health-Care networks:

• Various players.

• Pattern matching and

epidemic monitoring.

• Problem sizes have

doubled in last 5 years.

Financial networks:

• Transactions between

players.

• Different transactions

types (property graph)

Hornet Properties

✓ A Simple programming model

✓ Enable algorithm designers to implement dynamic & streaming graph

algorithms with ease.

✓ Can easily grows 1000X initial size (no restart needed)

✓ Millions of updates per second to graph

✓Updates are not bottlenecks for analytics.

✓ Automated data management

✓ Transfers data between host and device automatically

✓ Reduces fragmentation

✓ Supports memory reclamation

• Scalable data structurecuSTINGER paper: [Green&Bader; HPEC, 2016]:

cuSTINGER: Supporting dynamic graph algorithms for GPUs

Oded Green, GTC-18 7

Definitions

• Dynamic graphs

– Graph can change over time.

– Changes can be to topology, edges, or vertices.

• For example new edges between two vertices.

– Changes to edge or vertex weights

• Streaming graphs:

– Graphs changing at high rates.

– 100s of thousands of updates per second.

• Dynamic matrices

– Adding a perturbation to the matrix

Oded Green, GTC-18 8

Dynamic graph example

• Only a subset of the entire

graph…

• Dynamic:

– At time 𝑡:• 𝑣 and 𝑤 become friends.

• 𝑖𝑛𝑠𝑒𝑟𝑡_𝑒𝑑𝑔𝑒 (𝑣, 𝑤)

– At time Ƹ𝑡:• 𝑢 and 𝑣 no longer friends

• d𝑒𝑙𝑒𝑡𝑒𝑒𝑑𝑔𝑒 𝑢,𝑣

• Additional operations

include vertex insertions &

deletions

Oded Green, GTC-18 9

𝑣𝑢

𝑤

Widely used graph data structures

10

Names Pros ConsDense Adjacency

Matrix

• Supports updates • Poor locality

• Massive storage

requirements

Linked lists • Flexible • Poor locality

• Limited parallelism

• Allocation time is costly

COO (Edge list) -

unsorted

• Has some flexibility

• Updates are simple

• Lots of parallelism

• Poor locality

• Stores both the source and

destination

CSR • Uses exact amount of

memory

• Good locality

• Lots of parallelism

• Inflexible

These data structures don’t cut itOded Green, GTC-18

Compressed Sparse Row (CSR)

Pros:

• Uses precise storage

requirements

• Great locality

– Good for GPUs

• Handful of arrays

– Simple to use and

manage

Cons:

• Inflexible.

• Network growth

unsupported

• Topology changes

unsupported

• Property graphs not

supportedOded Green, GTC-18 11

0 1 2 3 4 5 6 7

0 2 4 7 9 11 13 14 14

Src/Row

Offset

1 2 0 5 0 3 4 2 6 2 5 1 4 3

2 5 2 7 4 1 4 1 2 4 1 7 1 2

Dest./Col.

Value

Hornet – A High Level View

• Every vertex points at its own array

• Many edges array (blocks)

• Block size is determined by the number of neighbors (always powers of 2)

• Extra space left at the end of the blockOded Green, GTC-18 12

1 2

2 5

0 5

2 7

0 3 4

4 1 4

2 6

1 2

2 5

4 1

1 4

7 1

3

2

Over-allocated space

Dest./Col.

Value

0 1 2 3 4 5 6 7

2 2 3 2 2 2 1 0

Vertex Id

Used

Pointer

USER-INTERFACE

Hornet – Property Graph Support

Oded Green, GTC-18 13

0 1 2 3 4 5 6 7

2 2 3 2 2 2 1 0

Vertex Id

Used

Pointer

1 2

2 5

0 5

2 7

0 3 4

4 1 4

2 6

1 2

2 5

4 1

1 4

7 1

3

2

USER-INTERFACE

Dest./Col.

Weight

Type

Time 1

User 1

User 2

….

• Programmers can add fields per edge

• Easy to mange for static graph data structures

• Hornet manages the data movement

Hornet in Detail

14Oded Green, GTC-18

1 00 1 1 10 0 0 00 1 1 1 1 1 1 1

0 1 2 3 4 5 6 7

2 2 3 2 2 2 1 0

Vertex Id

Used (#Neighbors/nnz)

Pointer

1 2

5 2

0 5

5 7

0 3 4

2 1 4

2 6

1 2

2 5

4 1

1 4

7 1

3

2

𝑩𝑨𝟎,𝟏 𝑩𝑨𝟏,𝟏 𝑩𝑨𝟏,𝟐 𝑩𝑨𝟐,𝟏

Bit status

Over-allocated spacefor vertex insertions

USER-INTERFACE

Dest./Col.

Weight

MEMORY MANAGER

bsi

ze=1

bsi

ze=2

bsi

ze=2

bsi

ze=4

Vec-Tree

Over-allocated spacefor power-of-two rule

Hornet Performance

• Memory Utilization

– Independent of the GPU being used

• Initialization overhead

• Update rate

15Oded Green, GTC-18

Hornet Performance Analysis

• All performance analysis is for the P100

– 56 SMs

– 3584 SPs

– 16GB HBM2 memory

Oded Green, GTC-18 16

Inputs Graphs

• DIMACS 10 Graph Implementation Challenge

• SNAP – Stanford Network Analysis Project

• Florida Matrix Collection

The following is only a subset of these graphs:

Oded Green, GTC-18 17

Name Type |𝑽| |𝑬|* Source

𝑐𝑜𝐴𝑢𝑡ℎ𝑜𝑟𝑠𝐷𝐵𝐿𝑃 Collaboration 299𝑘 1.95𝑀 DIMACS

𝑎𝑠 − 𝑠𝑘𝑖𝑡𝑡𝑒𝑟 Trace route 1.69𝑀 11.1𝑀 SNAP

𝑘𝑟𝑜𝑛_21 Random 2𝑀 201𝑀 DIMACS

𝑐𝑖𝑡 − 𝑝𝑎𝑡𝑒𝑛𝑡𝑠 Citation 3.77𝑀 16.5𝑀 SNAP

𝑐𝑎𝑔𝑒15 Matrix 5.15𝑀 94𝑀 DIMACS

𝑢𝑘 − 2002 Webcrawl 18.52𝑀 523𝑀 DIMACS

Memory Utilization - Overall

• BlockArrays of size 216

• 70% average utilization of CSR

• Better utilization then: COO, cuSTINGER, AIM

– AIM allocates all GPU memory

Oded Green, GTC-18 18

0%

20%

40%

60%

80%

100%

Spac

e E

ffic

ien

cy

Hornet COO cuSTINGER AIM216

Initialization overhead

• Time to initialize data structure in comparison to CSR

• In most cases 2X-3X slower

– One time penalty

• Much faster than cuSTINGER

Oded Green, GTC-18 19

1

10

100

1,000

Slo

wd

ow

n v

ers

us

CSR

Hornet cuSTINGER

Insertion Rates

• Supports over 150M updates per second

• Hornet

– 4𝑋 − 10𝑋 faster than cuSTINGER

– Does not have 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝑑𝑖𝑝 like cuSTINGER

• Scalable growth in update rate

Oded Green, GTC-18 20

cuSTINGER Hornet

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1,000,000,000

Up

dat

e R

ate

(ed

ges

per

se

con

d)

in-2004 soc-LiveJournal1 cage15 kron_g500-logn21

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1,000,000,000

Up

dat

e R

ate

(ed

ges

per

se

con

d)

in-2004 soc-LiveJournal1 cage15 kron_g500-logn21

103

104

105

106

107

108

109

103

104

105

106

107

108

109

Take away

• Anything you can do with CSR you can also do with

Hornet (other way is not true)

• Supports high update rates

• Scalable in both data size and in performance

• Simple and high-level programming model

– See you at 1:00pm

• Also, look for James Fox’s talk on a cool algorithm

for finding the maximal K-Truss in a graph– Uses dynamic triangle counting and the Hornet’s deletion…

Oded Green, GTC-18 21

Hornet Team (Current & Alumni)

Oded Green, GTC-18 22

Thank you

Oded Green, GTC-18 23

•Email: [email protected]

• Hornet:

– https://github.com/hornet-gt/hornet

• HornetsNest:

– https://github.com/hornet-gt/hornetsnest

Backup slides

24Oded Green, GTC-18

Memory Utilization - Overall

• 70% average utilization of CSR

• Better utilization in comparison to: COO,

cuSTINGER, AIMS

Oded Green, GTC-18 25

0%

20%

40%

60%

80%

100%

Spac

e E

ffic

ien

cy

Hornet Hornet Hornet COO cuSTINGER AIM216 222218

Part 2: HornetsNest

• Algorithm framework for Hornet data

structure

– We support CSR as well

• All algorithms are implemented using a small

set of operations

– We show that these operators are efficient for

static graph algorithms and can be used for

dynamic graph algorithms

• Uses features from C++11 and C++14

Oded Green, GTC-18 26

Sparse Matrix Vector Multiplication

• In comparison to DCSR [King et al; 2016; ISC]

– DCSR requires customized SpMV

• Hornet uses identical algorithm code as CSR.

27Oded Green, GTC-18

1

10

100

Spee

du

p v

ers

us

DC

SR

CSR Hornet