Cost Efficient Large-Scale Graph Analytics · 2015. 3. 17. · of graph algorithms and flexibility...
-
Technica Corporation Confidential and Proprietary. Copyright © 2015 Technica Corporation. All Rights Reserved.
Cost Efficient Large-Scale Graph Analytics
Dr. Joseph Schneible
-
Applications of Graph Analysis
■ Social Networks
■ WWW
■ Medicine
■ Natural Language
■ Cybersecurity
■ Homeland Security
■ Local Government
-
OUTLINE:
Performance
System Design
Graph Analysis
-
How to bring meaning to this?
-
Example Algorithms
PageRank: Find Influential Nodes Within a Network
Community Detection: Find Dense Sub-graphs
Belief Propagation: Perform Inference on a Graph
-
PageRank
■ PageRank trends linearly with degree
■ Anomalous nodes are above this trend line
■ Used to find mastermind of 9/11 attacks
■ Can be applied to biological networks, etc.
[Figure: PageRank versus Degree]
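As a minimal sketch of the idea on this slide, the Python below runs PageRank by power iteration on a small made-up graph and prints each vertex's rank next to its in-degree. The function name `pagerank`, the damping factor of 0.85, and the toy edge list are assumptions for illustration, not details from the talk.

```python
def pagerank(edges, num_vertices, iterations=5, damping=0.85):
    """Power-iteration PageRank over a directed edge list of (src, dst) pairs."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        new_rank = [base] * num_vertices
        # Each edge passes an equal share of its source's rank to its destination.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Hypothetical toy graph: vertex 0 has the most in-edges, vertex 3 has none.
toy_edges = [(1, 0), (2, 0), (3, 0), (0, 1), (2, 1), (3, 2)]
ranks = pagerank(toy_edges, num_vertices=4)

in_degree = [0] * 4
for _, dst in toy_edges:
    in_degree[dst] += 1

for v in range(4):
    print(f"vertex {v}: in-degree {in_degree[v]}, rank {ranks[v]:.3f}")
```

On a real graph, the vertices whose rank sits well above the rank-versus-degree trend would be the anomalous nodes the slide refers to.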
-
OUTLINE:
Performance
System Design
Graph Analysis
-
Goals
Affordability
Time Efficiency
Customizability
Meaning
-
System Approach
■ All of these are affected by design choices:
GPU Utilization
Memory Efficiency
Commodity Hardware
I/O Efficiency
Graph Construction
-
Challenges
■ Parallelization:
Edges or Vertices?
■ Memory Limitations:
GB of RAM and vRAM
TB Graphs
■ Irregular Graph Structure:
Many Nodes with Few Connections
Few Nodes with Many Connections
-
Parallelization Strategies
■ Edge-wise Distribution: One Operation per Edge
Memory for Temporary Data Structures
Even Load Balance
■ Vertex-wise Distribution:
Multiple Operations per Vertex
Uneven Load Balance
[Figure: the same graph (vertices A-E, edges 1-8) shown twice, once with Threads mapped to edges and once with Threads mapped to vertices]
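The trade-off between the two strategies can be sketched in plain Python by counting how many operations each simulated thread performs. The toy graph below reuses the figure's labels (vertices A-E, eight edges), but its connectivity is made up. The helper names are hypothetical too, and the loops stand in for GPU threads.

```python
from collections import defaultdict


def vertex_wise_load(edges, vertices):
    """One thread per vertex: a thread handles every in-edge of its vertex."""
    load = {v: 0 for v in vertices}
    for _, dst in edges:
        load[dst] += 1
    return load  # operations per thread, keyed by vertex


def edge_wise_load(edges):
    """One thread per edge: every thread performs exactly one operation."""
    return [1] * len(edges)


def edge_wise_sum(edges, values):
    """Edge-wise accumulation: fill a per-edge buffer, then reduce it per vertex."""
    per_edge = [values[src] for src, _ in edges]  # temporary storage, one slot per edge
    totals = defaultdict(float)
    for (_, dst), contribution in zip(edges, per_edge):
        totals[dst] += contribution
    return dict(totals)


vertices = ["A", "B", "C", "D", "E"]
# Hypothetical connectivity: A is a high-degree vertex, E has no in-edges.
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"),
         ("A", "B"), ("C", "B"), ("D", "C"), ("E", "D")]

print("vertex-wise load:", vertex_wise_load(edges, vertices))
print("edge-wise load:  ", edge_wise_load(edges))
print("edge-wise sums:  ", edge_wise_sum(edges, {v: 1.0 for v in vertices}))
```

The per-edge buffer in `edge_wise_sum` corresponds to the "Memory for Temporary Data Structures" cost listed for the edge-wise strategy, while the uneven counts from `vertex_wise_load` illustrate the load imbalance of the vertex-wise one.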
-
Load Balancing Graph Analysis on the GPU
■ High degree vertices will dominate computation time
■ Created multiple kernels
■ Threshold between high and low degree
[Figure: execution timelines (axis: Time) for vertices v1-v10 and edges e1-e9, with panels titled "Load Balancing" and "Multiple Kernels"]
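One way to picture the threshold idea is the Python sketch below, which splits vertices by degree and sends each group to a different routine. The slides do not say how the actual kernels differ, so both "kernels" here are hypothetical stand-ins: a serial sum for short neighbor lists, and a strided sum across several simulated lanes for long ones. The threshold value and all names are assumptions.

```python
def split_by_degree(neighbors, threshold):
    """Partition vertex ids into low-degree and high-degree groups."""
    low = [v for v, nbrs in enumerate(neighbors) if len(nbrs) <= threshold]
    high = [v for v, nbrs in enumerate(neighbors) if len(nbrs) > threshold]
    return low, high


def low_degree_kernel(v, neighbors, values):
    """Stand-in kernel: one thread walks a short neighbor list serially."""
    return sum(values[n] for n in neighbors[v])


def high_degree_kernel(v, neighbors, values, lanes=4):
    """Stand-in kernel: several lanes each sum a strided slice, then reduce."""
    nbrs = neighbors[v]
    partial = [sum(values[n] for n in nbrs[lane::lanes]) for lane in range(lanes)]
    return sum(partial)


# Hypothetical graph: vertices 0 and 7 have many neighbors, the rest have few.
neighbors = [[1, 2, 3, 4, 5, 6, 7], [0], [0, 1], [0], [0, 2], [0], [0], [0, 1, 2, 3, 4, 5]]
values = [1, 2, 3, 4, 5, 6, 7, 8]

low, high = split_by_degree(neighbors, threshold=3)
result = [0] * len(neighbors)
for v in low:
    result[v] = low_degree_kernel(v, neighbors, values)
for v in high:
    result[v] = high_degree_kernel(v, neighbors, values)

print("low-degree vertices: ", low)
print("high-degree vertices:", high)
print("neighbor sums:       ", result)
```

Both routines compute the same quantity. Only the way the work is divided changes, which is why a single threshold can route each vertex without affecting the result.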
-
Out-of-Core Graph Processing
■ Divide graph into intervals (sets) of vertices
■ Gather associated in-edges into a shard
■ Order edges in shards such that out-edges are located together in windows
[Figure: graph divided into Interval 1 through Interval 4]
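The three steps above can be sketched in Python as follows. Equal-sized intervals, the function names, and the toy edge list are assumptions for illustration. A real system would likely balance intervals by edge count and keep shards on disk rather than in lists.

```python
from bisect import bisect_left


def make_intervals(num_vertices, num_intervals):
    """Split vertex ids 0..num_vertices-1 into contiguous [lo, hi) ranges."""
    size = -(-num_vertices // num_intervals)  # ceiling division
    return [(lo, min(lo + size, num_vertices)) for lo in range(0, num_vertices, size)]


def build_shards(edges, intervals):
    """Shard i holds the in-edges of interval i, sorted by source vertex."""
    return [sorted((src, dst) for src, dst in edges if lo <= dst < hi)
            for lo, hi in intervals]


def window(shard, interval):
    """Contiguous slice of a shard whose source vertices fall inside `interval`."""
    lo, hi = interval
    return shard[bisect_left(shard, (lo,)):bisect_left(shard, (hi,))]


# Hypothetical directed graph on 8 vertices, as (src, dst) pairs.
toy_edges = [(0, 1), (0, 5), (1, 2), (2, 0), (3, 7),
             (4, 3), (5, 6), (6, 1), (7, 4), (7, 0)]
intervals = make_intervals(num_vertices=8, num_intervals=4)
shards = build_shards(toy_edges, intervals)

for i, shard in enumerate(shards, start=1):
    print(f"shard {i} (in-edges of interval {intervals[i - 1]}): {shard}")
print("window of shard 1 for interval 4:", window(shards[0], intervals[3]))
```

Because each shard is sorted by source, the out-edges of any interval occupy one contiguous window per shard, so processing an interval needs its own shard plus one sequential read from each of the others.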
-
Compression
■ Power-law distribution is common in natural graphs
■ Compression scheme exploits distribution
[Figure: Source Vertices versus Destination Vertices]
-
Task Analysis
■ Use:
Graph Meta-data
Performance Models
Micro-benchmarks
■ To:
Divide work between CPU and GPU
Divide work between kernels
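A very simple performance model of this kind might look like the Python sketch below, which splits edges between CPU and GPU in proportion to throughput measured by a micro-benchmark. The linear throughput model, the function names, and the rates are all assumptions. The slides do not describe the actual model.

```python
def split_work(num_edges, cpu_edges_per_sec, gpu_edges_per_sec):
    """Divide edges so both devices are predicted to finish at about the same time."""
    total_rate = cpu_edges_per_sec + gpu_edges_per_sec
    gpu_edges = round(num_edges * gpu_edges_per_sec / total_rate)
    return num_edges - gpu_edges, gpu_edges


def predicted_time(cpu_edges, gpu_edges, cpu_edges_per_sec, gpu_edges_per_sec):
    """Predicted wall time when both devices run concurrently."""
    return max(cpu_edges / cpu_edges_per_sec, gpu_edges / gpu_edges_per_sec)


# Hypothetical micro-benchmark results, in edges processed per second.
cpu_rate, gpu_rate = 100, 400
cpu_edges, gpu_edges = split_work(1000, cpu_rate, gpu_rate)

print("CPU edges:", cpu_edges, "GPU edges:", gpu_edges)
print("predicted time, split:   ", predicted_time(cpu_edges, gpu_edges, cpu_rate, gpu_rate))
print("predicted time, GPU only:", predicted_time(0, 1000, cpu_rate, gpu_rate))
```

The same proportional rule could be applied between kernels instead of devices, with graph meta-data such as the degree distribution supplying the work estimates.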
-
OUTLINE:
Performance
System Design
Graph Analysis
-
PageRank Performance
PageRank: 5 Iterations
LiveJournal Graph:
Vertices: 4.6 Million
Edges: 77.4 Million
FUNL Desktop System:
GPU: GeForce GTX TITAN, 2688 CUDA Cores, 928MHz, 6GB vRAM
CPU: Core i7, Quad Core, 3.40GHz
RAM: 16GB (4x4GB), 1333MHz
Storage: HDD, 180MB/s
Spark Cluster:
System: AWS EC2 m1.large
Number of Nodes: 10
Network: Moderate Performance
Results:
FUNL Desktop System: 9.5 Seconds
Spark Cluster: 110.4 Seconds
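For reference, the ratio between the two reported times works out as follows. The variable names are illustrative, and the timings are the ones on the slide.

```python
funl_seconds = 9.5     # FUNL Desktop System, 5 PageRank iterations
spark_seconds = 110.4  # Spark Cluster, 5 PageRank iterations

speedup = spark_seconds / funl_seconds
print(f"speedup: {speedup:.1f}x")
```

That is a ratio of roughly 11.6 between the 10-node cluster and the single desktop system on this benchmark.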
-
Belief Propagation Performance
FUNL/Quad Core CPU:
GPU: GeForce GTX TITAN, 2688 CUDA Cores, 928MHz, 6GB vRAM
CPU: Core i7, Quad Core, 3.40GHz
RAM: 16GB (4x4GB), 1333MHz
Storage: HDD, 180MB/s
16 Core Server:
CPU: Xeon E5-2690, 16 Cores, 2.9GHz
RAM: 64GB, 1600MHz
Storage: HDDx6, RAID0, 690MB/s
- https://github.com/GraphChi
-
Additional Slides
-
Advantages
■ Unique Features:
Large-Scale Graph Analysis on the GPU
Task Analysis for Efficient Parallel Processing (on CPUs and GPUs)
UI and Interactive Visualization to Bring Meaning to Big Data
■ Benefits:
Big Data Graph Analysis on a Budget
Customizability
Ease of Use (You don't have to be a data scientist)
Reduction in Infrastructure and Energy Needs
-
Big Data Appliance for Graph Analytics
■ Data-center-friendly appliance with a suite of graph algorithms and flexibility to add custom solutions.
■ Purpose-built to solve big data graph problems with commodity hardware.
■ Graph analytics solution that supports pattern discovery and inferencing on large scale data sets.
■ Gain insight by discovering unknown relationships in big data.
■ Achieve a competitive advantage without a large budget.
■ Ease adoption with a small footprint solution and customizability.
-
Point of Contact
Joe Schneible
Enterprise Software Solutions Engineering Group Manager
Email: [email protected]
LinkedIn: linkedin.com/in/jschneible
Technica Corporation
22970 Indian Creek Dr., Suite 500
Dulles, VA 20166
703.662.2000
technicacorp.com