Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

http://bigdata.com http://mapgraph.io SYSTAP, LLC Graphs Graph Databases Graph Analytics on GPUs SYSTAP™, LLC © 2006-2014 All Rights Reserved 1 9/19/2014

description

I will discuss current research on the MapGraph platform. MapGraph is a new and disruptive technology for ultra-fast processing of large graphs on commodity many-core hardware. On a single GPU you can analyze the bitcoin transaction graph in 0.35 seconds. With MapGraph on 64 NVIDIA K20 GPUs, you can traverse a scale-free graph of 4.3 billion directed edges in 0.13 seconds for a throughput of 32 billion traversed edges per second (32 GTEPS). I will explain why GPUs are an interesting option for data-intensive applications, how we map graphs onto many-core processors, and what the future looks like for the MapGraph platform. MapGraph provides a familiar vertex-centric abstraction, but its GPU acceleration is hundreds of times faster than main-memory CPU-only technologies and up to 100,000 times faster than graph technologies based on MapReduce or key-value stores such as HBase, Titan, and Accumulo. Learn more at http://MapGraph.io.

Transcript of Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Page 1: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

http://bigdata.com http://mapgraph.io

SYSTAP, LLC

Graphs
Graph Databases
Graph Analytics on GPUs

SYSTAP™, LLC © 2006-2014 All Rights Reserved

9/19/2014

Page 2: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL


Graphs

• This talk is about recent advances in large-scale graph processing on GPUs.
  – The motivation is extreme performance.
  – Everything we do (as a company) is focused on graphs.
    • Graph database
    • Graph processing
• Common characteristics:
  – Irregular data shape, irregular access patterns, and irregular parallelism.
• A lot of data can be mapped onto graphs
  – Sparse matrices and graphs are very close data structures
  – Graphs, as we deal with them, have attributes on vertices and edges.
• A lot of algorithms can be mapped onto graphs
  – Including many machine learning algorithms.
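The closeness of sparse matrices and graphs can be illustrated with a compressed sparse row (CSR) layout, where each row of the matrix is the out-edge list of one vertex and the stored value is an edge attribute. This is a minimal sketch only; the function names `edges_to_csr` and `out_neighbors` and the sample edges are hypothetical and are not taken from MapGraph or bigdata.

```python
def edges_to_csr(num_vertices, edges):
    """Build CSR arrays from (src, dst, weight) triples.

    row_offsets[v]..row_offsets[v + 1] delimits the out-edges of vertex v,
    exactly as it delimits the non-zeros of row v in a sparse matrix.
    """
    counts = [0] * num_vertices
    for src, _dst, _weight in edges:
        counts[src] += 1

    # Prefix sum of the per-vertex out-degrees gives the row offsets.
    row_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_offsets[v + 1] = row_offsets[v] + counts[v]

    col_indices = [0] * len(edges)
    values = [0.0] * len(edges)
    cursor = row_offsets[:-1].copy()  # next free slot in each row
    for src, dst, weight in edges:
        slot = cursor[src]
        col_indices[slot] = dst   # column index == destination vertex
        values[slot] = weight     # matrix value == edge attribute
        cursor[src] += 1
    return row_offsets, col_indices, values


def out_neighbors(row_offsets, col_indices, v):
    """Return the destination vertices of v's out-edges."""
    return col_indices[row_offsets[v]:row_offsets[v + 1]]


# Hypothetical 3-vertex graph: 0->1, 0->2, 2->1.
edges = [(0, 1, 1.0), (0, 2, 0.5), (2, 1, 2.0)]
row_offsets, col_indices, values = edges_to_csr(3, edges)
```

A traversal over this layout reads one contiguous slice per vertex, which is why data layout matters so much when memory bandwidth is the bottleneck.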

http://www.bigdata.com/blog

Page 3: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

SYSTAP, LLC

Graph Database
• High performance, scalable
  – 50B edges/node
  – High level query language
  – Efficient graph traversal
  – High 9s solution
• Open Source
  – Subscriptions

GPU Analytics
• Extreme performance
  – 5-100x faster than GraphLab
  – 10,000x faster than graph databases
• DARPA funding
• Disruptive technology
  – Early adopters
  – Huge ROIs
• Open Source

Small Business, Founded 2006, 100% Employee Owned

Page 4: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Related “Graph” Technologies

[Diagram of three related technologies. Box labels as extracted:]
• Bigdata: Graph Query (RDF/SPARQL). Labels: Embedded, HA, Single Server, Scale-Out
• MapGraph: Graph Traversal & Mining. Labels: Single GPU, 2D Cluster
• Redpoint: “Graph Database”. Labels: SPARQL, Scale-Out
• Other labels: Pair up bigdata and MapGraph; STTR

Redpoint repositions existing technology, adding interoperability for Blueprints and Gremlin.

MapGraph compares favorably with high end hardware solutions from YARC, Oracle, and SAP, but is open source and uses commodity hardware.

Page 5: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Embedded, Single Server, HA, Scale-out

• RDF/SPARQL
• Property graphs
  – Blueprints, Gremlin, Rexster
• REST API (NSS)
• Extension points
  – Stored queries for custom application logic on the server.
  – Custom services & indices
  – Custom functions
  – Vertex-centric programs

[Diagram: Embedded Server (JVM, Journal) and Standalone Server (WAR, Journal)]

Page 6: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

High Availability

• Shared nothing architecture
  – Same data on each node
  – Coordinate only at commit
  – Transparent load balancing
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single node disaster recovery
• Online Backup
  – Online snapshots (full backups)
  – HA Logs (incremental backups)
• Point in time recovery (offline)

[Diagram: a quorum (k=3, size=3) of three HAService nodes, with leader and follower roles]

Page 7: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Embedded, Single Server, HA, Scale-out

[Architecture diagram: Distributed Index Management and Query; RDF Data and SPARQL Query. Components shown:]
• Application Clients (Unified API)
• Client Services
• Data Services
• Management Functions
• Registrar, Zookeeper, Shard Locator, Transaction Mgr, Load Balancer
• Formats: SPARQL XML, SPARQL JSON, RDF/XML, N-Triples, N-Quads, Turtle, TriG, RDF/JSON

Page 8: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

And now on GPUs

Page 9: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Similar models, different problems

• Graph query and graph analytics (traversal/mining)
  – Related data models
  – Very different computational requirements
• Many technologies are a bad match or limited solution
  – Key-value stores (Bigtable, Accumulo, Cassandra, HBase)
  – Map-reduce
• Anti-pattern
  – Dump all data into “big bucket”

Page 10: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Similar models, different problems

Storage and computation patterns must be correctly matched for high performance.

Page 11: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Optimize for the right problem

• Graph analytics
  – Parallelism: work must be distributed and balanced.
  – Memory bandwidth: memory, not disk, is the bottleneck
  – 2D partitioning: O(log(N)) communications pattern (versus O(N*N))
    • 1D design loses locality when updating link weights for reverse indices.
• Storage and computation patterns must be correctly matched for high performance.

[Figure panels: BFS, PR]

Page 12: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

GPUs – A Game Changer for Graph Analytics

• Graphs are a hard problem
  – Non-locality
  – Data-dependent parallelism
  – Memory, PCIe bus, and network are bottlenecks
  – Recent performance gains driven by innovations in bottom-up search, data layout, and partitioning.
• GPUs deliver effective parallelism
  – 10x CPU FLOPS
  – 10x CPU/RAM bandwidth
• Significant speedups over CPU
  – 3 GTEPS on one GPU
  – 32 GTEPS on 64 GPU cluster

[Figure: Breadth-First Search on Graphs, 10x Speedup on GPUs. x-axis: Average Traversal Depth (log scale); y-axis: Million Traversed Edges per Second (0 to 3,500); series: NVIDIA Tesla C2050, Multicore per socket, Sequential.]

Page 13: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL


GPU Hardware Trends

• K40 GPU (today)
  – 12G RAM/GPU
  – 288 GB/s bandwidth
  – PCIe Gen 3
• Pascal GPU (Q1 2016)
  – 24G RAM/GPU
  – 1 TB/s bandwidth
  – Unified memory across CPU, GPUs

Page 14: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Full Bandwidth Access to CPU RAM

Page 15: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Architecture shapes performance

• The data was a scale-free graph with 2.7M vertices and 5.6M edges
  – MapGraph used a larger version of the graph (24M vertices, 25M edges)
• The query was a 5-degree subgraph (depth-limited BFS)
• Two main takeaways
  – Horizontal scaling for Titan is very expensive – wrong abstraction.
  – GPUs are ridiculously fast.

platform   load (s)   query (ms)   comments
Titan      497.00     935          4 node cluster using Cassandra
Neo4j      608.00     668          single node community edition
bigdata    396.00     281          single node (open source)
MapGraph   0.08       27           NVIDIA K20 GPU
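The benchmark query, a depth-limited BFS, can be sketched as follows. This is a plain CPU illustration of the query shape only, not the code that any of the benchmarked platforms ran; the function name `depth_limited_bfs` and the sample graph are hypothetical.

```python
from collections import deque


def depth_limited_bfs(adj, source, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of source.

    adj maps a vertex to an iterable of its out-neighbors.
    """
    depth = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue  # do not expand past the depth limit
        for nbr in adj.get(v, ()):
            if nbr not in depth:  # first visit gives the shortest depth
                depth[nbr] = depth[v] + 1
                queue.append(nbr)
    return depth


# Hypothetical chain 0 -> 1 -> 2 -> 3 -> 4, queried to depth 2.
chain = {0: [1], 1: [2], 2: [3], 3: [4]}
subgraph = depth_limited_bfs(chain, 0, 2)
```

For the "5-degree subgraph" in the table, `max_depth` would be 5.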

Page 16: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

MapGraph

Graph Processing on GPUs

http://MapGraph.io

Page 17: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Think Like a Vertex

• Simple APIs

    pageRank(Message m) {
      total = m.value();                 // sum of incoming rank contributions
      vertex.val = .15 + .85 * total;    // damping factor of .85
      for (nbr : out_neighbors) {
        SendMsg(nbr, vertex.val / num_out_nbrs);
      }
    }

• Lots of algorithms
  – BFS, SSSP, PageRank, Connected Components, Louvain Modularity, Jaccard Distance, k-means clustering, Betweenness Centrality, Personalized PageRank, Loopy Belief Propagation, graph search (crisp and approximate), etc.
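A runnable counterpart to the vertex program on this slide might look like the sketch below. It simulates synchronous message passing on the CPU and uses the standard PageRank update, 0.15 + 0.85 × (sum of incoming messages). The function name `pagerank`, the iteration count, and the sample graph are assumptions for illustration; this is not the MapGraph API.

```python
def pagerank(out_neighbors, num_iterations=20, damping=0.85):
    """Vertex-centric PageRank with synchronous supersteps.

    out_neighbors maps every vertex to a list of its out-neighbors; every
    neighbor must also appear as a key.
    """
    vertices = list(out_neighbors)
    value = {v: 1.0 for v in vertices}
    for _ in range(num_iterations):
        # "SendMsg": each vertex splits its value evenly among out-neighbors.
        inbox = {v: 0.0 for v in vertices}
        for v in vertices:
            nbrs = out_neighbors[v]
            for nbr in nbrs:
                inbox[nbr] += value[v] / len(nbrs)
        # Vertex update: combine the summed messages with the damping factor.
        for v in vertices:
            value[v] = (1.0 - damping) + damping * inbox[v]
    return value


# Hypothetical 3-cycle: by symmetry every vertex keeps a rank of 1.0.
ranks = pagerank({0: [1], 1: [2], 2: [0]})
```

Vertices with no out-edges simply send no messages in this sketch; a production implementation would need a policy for redistributing their rank.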

Page 18: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

GAS – a Graph-Parallel Abstraction

• Graph-Parallel Vertex-Centric API à la GraphLab
• “Think like a vertex”
  – Gather: collect information about my neighborhood
  – Apply: update my value
  – Scatter: signal adjacent vertices
• Can write all sorts of graph algorithms this way
  – BFS, PageRank, Connected Components, Triangle Counting, Max Flow, etc.
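The three GAS phases can be made concrete with BFS. The sketch below runs Gather, Apply, and Scatter as synchronous rounds over an active set on the CPU; it illustrates the abstraction only, and the function name `gas_bfs` and the sample graph are hypothetical rather than part of GraphLab or MapGraph.

```python
INF = float("inf")


def gas_bfs(out_nbrs, source):
    """BFS depths computed with Gather / Apply / Scatter rounds.

    out_nbrs maps every vertex to a list of out-neighbors; every neighbor
    must also appear as a key.
    """
    # Gather reads in-edges, so build the reverse adjacency once.
    in_nbrs = {v: [] for v in out_nbrs}
    for u, nbrs in out_nbrs.items():
        for v in nbrs:
            in_nbrs[v].append(u)

    depth = {v: INF for v in out_nbrs}
    depth[source] = 0
    active = set(out_nbrs[source])
    while active:
        # Gather: each active vertex collects the best depth its in-neighbors offer.
        gathered = {
            v: min((depth[u] + 1 for u in in_nbrs[v]), default=INF)
            for v in active
        }
        # Apply: update the vertex value if the gathered depth is an improvement.
        changed = [v for v in active if gathered[v] < depth[v]]
        for v in changed:
            depth[v] = gathered[v]
        # Scatter: vertices that changed signal (activate) their out-neighbors.
        active = {w for v in changed for w in out_nbrs[v]}
    return depth


# Hypothetical graph; vertex 4 is unreachable from vertex 0.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [0], 4: []}
depths = gas_bfs(graph, 0)
```

Only the three phase bodies are algorithm-specific; the loop around them is the part a framework would supply and parallelize.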

Page 19: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

MapGraph

• High-level graph processing framework
• High programmability
  – GPU architecture
  – Optimization techniques
  – CUDA
• High performance
  – Comparable to low-level approach

Page 21: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Single GPU MapGraph (BFS)

Dataset    #vertices   #edges       Max Degree   Milliseconds
Webbase    1,000,005   3,105,536    23           1.2
Delaunay   2,097,152   6,291,408    4,700        24.5
Bitcoin    6,297,539   28,143,065   4,075,472    345.3
Wiki       3,566,907   45,030,389   7,061        51.0
Kron       1,048,576   89,239,674   131,505      47.7

[Bar chart: MTEPS by dataset (y-axis 0 to 2,000). Datasets on the x-axis: Webbase, Delaunay, Bitcoin, Wiki, Kron. Bar labels as extracted: 154.0, 513.6, 74.8, 821.3, 1870.9.]

Page 22: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

BFS Results: MapGraph vs GraphLab

[Bar chart: MapGraph Speedup vs GraphLab (BFS). x-axis: Webbase, Delaunay, Bitcoin, Wiki, Kron; y-axis: Speedup (log scale, 0.10 to 1,000.00); series: GL-2, GL-4, GL-8, GL-12, MPG.]

Page 23: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

PageRank: MapGraph vs GraphLab

[Bar chart: MapGraph Speedup vs GraphLab (PageRank). x-axis: Webbase, Delaunay, Bitcoin, Wiki, Kron; y-axis: Speedup (log scale, 0.10 to 100.00); series: GL-2, GL-4, GL-8, GL-12, MPG.]

Page 24: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Graph Mining on GPU Clusters

• 2D partitioning (aka vertex cuts)
• Minimizes the communication volume.
• Batch parallel Gather in row, Scatter in column.
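One common way to realize a 2D partitioning is to lay the partitions out as a square grid and place each edge by the block of its source (grid row) and the block of its destination (grid column). The sketch below assumes that layout; the row/column orientation, the contiguous block assignment, and the names `block_of` and `partition_2d` are illustrative assumptions, not MapGraph's actual scheme.

```python
def block_of(vertex, num_vertices, grid_dim):
    """Map a vertex id to one of grid_dim contiguous blocks."""
    block_size = -(-num_vertices // grid_dim)  # ceiling division
    return vertex // block_size


def partition_2d(edges, num_vertices, grid_dim):
    """Assign each (src, dst) edge to a cell of a grid_dim x grid_dim grid.

    All out-edges of a vertex fall in a single grid row and all in-edges in
    a single grid column, so a vertex touches at most 2 * grid_dim - 1
    partitions instead of all grid_dim ** 2.
    """
    parts = {(r, c): [] for r in range(grid_dim) for c in range(grid_dim)}
    for src, dst in edges:
        row = block_of(src, num_vertices, grid_dim)
        col = block_of(dst, num_vertices, grid_dim)
        parts[(row, col)].append((src, dst))
    return parts


# Hypothetical 8-vertex graph on a 2 x 2 grid (blocks 0..3 and 4..7).
sample_edges = [(0, 1), (0, 5), (3, 6), (7, 2), (4, 4)]
parts = partition_2d(sample_edges, num_vertices=8, grid_dim=2)
```

Because an edge's endpoints determine its cell, the partitions that must exchange data for a given vertex are limited to one row and one column of the grid.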

Page 25: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Accelerated Graph Analytics

Page 26: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Scale 25 Traversal

• Work spans multiple orders of magnitude.

Page 27: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Strong Scaling

• Speedup on a constant problem size with more GPUs
• Problem scale 25
  – 2^25 vertices (33,554,432)
  – 2^30 directed edges (1,073,741,824)

Strong scaling
GPUs   GTEPS   Time (s)
16     14.3    0.075
25     16.4    0.066
36     18.1    0.059
64     22.7    0.047
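The strong-scaling numbers can be read as speedup and parallel efficiency relative to the smallest configuration. The sketch below takes the GPU counts and times from the table and uses the 16-GPU run as the baseline, which is a choice made here for illustration; the function name `scaling_report` is hypothetical.

```python
# (GPUs, time in seconds) from the strong-scaling table.
strong_scaling = [(16, 0.075), (25, 0.066), (36, 0.059), (64, 0.047)]


def scaling_report(runs):
    """Return (gpus, speedup, efficiency) tuples relative to the first run."""
    base_gpus, base_time = runs[0]
    report = []
    for gpus, seconds in runs:
        speedup = base_time / seconds
        # Efficiency: achieved speedup divided by the increase in GPU count.
        efficiency = speedup / (gpus / base_gpus)
        report.append((gpus, speedup, efficiency))
    return report


report = scaling_report(strong_scaling)
for gpus, speedup, efficiency in report:
    print(f"{gpus:3d} GPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 16 to 64 GPUs gives roughly a 1.6x speedup for 4x the hardware, consistent with the talk's point that large clusters become communications bound.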

Page 28: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Weak Scaling

• Scaling the problem size with more GPUs

Weak scaling
GPUs   Scale   Vertices      Edges           Time (s)   GTEPS
1      21      2,097,152     67,108,864      0.0254     3
4      23      8,388,608     268,435,456     0.0429     6
16     25      33,554,432    1,073,741,824   0.0715     15
64     27      134,217,728   4,294,967,296   0.1478     29
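The GTEPS column follows from the edge counts and times: traversed edges per second is edges divided by seconds, and GTEPS divides that by 10^9. The short check below uses only numbers from the table; the helper name `gteps` is hypothetical.

```python
# (GPUs, directed edges, time in seconds) from the weak-scaling table.
weak_scaling = [
    (1, 67_108_864, 0.0254),
    (4, 268_435_456, 0.0429),
    (16, 1_073_741_824, 0.0715),
    (64, 4_294_967_296, 0.1478),
]


def gteps(edges, seconds):
    """Billions of traversed edges per second."""
    return edges / seconds / 1e9


rates = [gteps(edges, seconds) for _gpus, edges, seconds in weak_scaling]
for (gpus, _edges, _seconds), rate in zip(weak_scaling, rates):
    print(f"{gpus:3d} GPUs: {rate:.2f} GTEPS")
```

Rounded to whole numbers, these rates match the GTEPS column of the table.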

Page 29: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Highlights

• For algorithms on large graphs
  – Memory is the bottleneck
    • CPUs quickly saturate the memory bus.
    • CPU cache thrashing limits scaling for graph traversal.
    • Continued performance gains for CPUs focus on reducing the # of visited edges to reduce bandwidth.
    • Hybrid CPU/GPU architectures offload either small degree vertices (reduce cache thrashing) or high degree vertices (if the algorithm is FLOPS bound on the CPU, e.g., BC)
  – Many core is the future.
  – GPUs are primarily known for their FLOPS, but they have high memory bandwidth and can deliver effective parallelism on parallel graph problems (with sophisticated kernels).
• Scaling to very large graphs on large compute clusters
  – Communications bound.
    • Communication must be constant for perfect scaling
  – Hybrid partitioning seeks to reduce # of messages, size of messages, and optimize for asynchronous communications and degree-aware layouts for bottom-up search to reduce memory bandwidth.
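The idea behind bottom-up search, which the talk credits with reducing the number of visited edges, can be sketched as follows: instead of the frontier pushing to its out-neighbors, each unvisited vertex scans its in-neighbors and stops at the first one that is in the frontier. This is a generic CPU illustration under that assumption, not the kernel used by MapGraph; the function names and sample graph are hypothetical.

```python
def bottom_up_step(in_nbrs, depth, frontier, level):
    """One bottom-up BFS level: unvisited vertices look for a parent in the frontier."""
    next_frontier = set()
    for v in in_nbrs:
        if v in depth:
            continue  # already visited
        for u in in_nbrs[v]:
            if u in frontier:
                depth[v] = level
                next_frontier.add(v)
                break  # first parent found: the remaining in-edges are never read
    return next_frontier


def bfs_bottom_up(in_nbrs, source):
    """BFS depths using only bottom-up steps.

    in_nbrs maps every vertex to a list of its in-neighbors.
    """
    depth = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        frontier = bottom_up_step(in_nbrs, depth, frontier, level)
    return depth


# Hypothetical graph given by in-neighbor lists; vertex 4 is unreachable.
in_nbrs = {0: [3], 1: [0], 2: [0, 1], 3: [1, 2], 4: []}
bu_depths = bfs_bottom_up(in_nbrs, 0)
```

The early `break` is where the savings come from when the frontier is large; practical implementations typically switch between top-down and bottom-up steps depending on frontier size.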

Page 30: Bryan Thompson, Chief Scientist and Founder, SYSTAP, LLC at MLconf ATL

Bryan Thompson
SYSTAP, LLC

[email protected]

http://bigdata.com http://mapgraph.io
