The Energy Case for Graph Processing on Hybrid Platforms
description
Transcript of The Energy Case for Graph Processing on Hybrid Platforms
![Page 1: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/1.jpg)
The Energy Case for Graph Processing on Hybrid
Platforms
Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto
and Matei Ripeanu
NetSysLabThe University of British Columbia
http://netsyslab.ece.ubc.ca
![Page 2: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/2.jpg)
2
Graphs are Everywhere
1.4B pages, 6.6B links
1B users150B friendships
![Page 3: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/3.jpg)
3
Challenges and Opportunities
Data-dependent memory access
patterns
Caches + summary data structures
Large memory footprint as large as 1TB
CPUs
Poor locality
Low compute-to-memory access
ratio
Varying degrees of parallelism(both intra- and inter- stage)
![Page 4: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/4.jpg)
4
Challenges and Opportunities
Data-dependent memory access
patterns
Caches + summary data structures
Large memory footprint as large as 1TB
CPUs
Poor locality
Assemble a hybrid platform
Massive hardware multithreading
up to 12GB!
GPUs
Low compute-to-memory access
ratio
Caches + summary data structures
Varying degrees of parallelism(both intra- and inter- stage)
![Page 5: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/5.jpg)
55
Performance Modeling Predicts speedup Intuitive
Totem A graph processing engine for hybrid
systems Applies algorithm-agnostic optimizations
Partitioning Strategies Workload to processor matchmaking
Past Work
A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing, Gharaibeh et al., PACT 2012
On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest, Gharaibeh et al., IPDPS 2013
Main outcome: hybrid platforms enable significant performance gains
![Page 6: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/6.jpg)
66
Performance Modeling Predicts speedup Intuitive
Totem A graph processing engine for hybrid
systems Applies algorithm-agnostic optimizations
Partitioning Strategies Workload to processor matchmaking
Past Work
Main outcome: hybrid platforms enable significant performance gains
A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing, Gharaibeh et al., PACT 2012
On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest, Gharaibeh et al., IPDPS 2013
Focused on time to solution
as a success metric
![Page 7: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/7.jpg)
7
Motivating Question
Is it energy efficient to use GPU-accelerated platforms for large-scale graph processing?
![Page 8: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/8.jpg)
Evaluation Platform
CharacteristicSandyBridge(Xeon 2650)
Kepler (K20)
Hardware Threads / Proc. 16 2496
Frequency / Core (MHz) 2000 705
LLC / Proc. (MB) 20 2
Main Memory / Proc. (GB) 256 5
TDP / Proc. (Watts) 95 225
8
S = CPU Socket G = GPU
GPU has double TDP
Power is measured at the wall AC outlet
![Page 9: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/9.jpg)
Evaluation Platform
CharacteristicSandyBridge(Xeon 2650)
Kepler (K20)
Hardware Threads / Proc. 16 2496
Frequency / Core (MHz) 2000 705
LLC / Proc. (MB) 20 2
Main Memory / Proc. (GB) 256 5
TDP / Proc. (Watts) 95 225
9
S = CPU Socket G = GPU
High idle power is due to large DRAM space
The CPU is relatively power efficient at peak utilization!
At peak utilization, RAM consumes as much as
dual CPUs!
GPUs are power hungry at peak
utilization
Power is measured at the wall AC outlet
![Page 10: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/10.jpg)
10
Challenges and Opportunities
GPU draws significant amount of powerThe workload is Irregular and memory-bound
GPU has low idle power (25W)
Offloading to GPU enables faster “race-to-idle”
![Page 11: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/11.jpg)
Evaluation Study WorkloadsReal and syntheticLarge: can not fit on GPU
memory
1111
Workload |V| |E|Twitter 41M 1.5B
UK-Web 105M 3.7BRMAT27 128M 2.0BRMAT28 256M 4.0BRMAT29 512M 8.0BRMAT30 1,024M 16.0B
BenchmarksBreadth-First Search (BFS)PageRank
MetricsRaw performance (TEPS)Raw power (Watts)Power normalized by processing rate
(Watts/TEPS)
![Page 12: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/12.jpg)
GPU-offloading is useful for large graphs
Raw Performance
12S = CPU Socket G = GPU
1S1G > 2SPerformance scales with more processors
![Page 13: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/13.jpg)
Power Consumption
13
BFS
1S1G ≤ 2S!
![Page 14: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/14.jpg)
Power Consumption
14
BFS
load imbalance more variability
![Page 15: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/15.jpg)
Power Consumption
15
BFS PageRank
PageRank draws more power
![Page 16: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/16.jpg)
Normalizing by Processing Rate
16In most cases, 1S1G > 2S
![Page 17: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/17.jpg)
Normalizing by Processing Rate
17
Energy efficiency scales with more processors
![Page 18: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/18.jpg)
Normalizing by Processing Rate
18Similar results on PageRank
![Page 19: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/19.jpg)
Conclusions A hybrid configuration is more
energy and power efficient than a symmetric one
A “race-to-idle” strategy leads to better energy efficiency
RAM is a major power consuming component
19
![Page 20: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/20.jpg)
20
Questions
code@: netsyslab.ece.ubc.ca
![Page 21: The Energy Case for Graph Processing on Hybrid Platforms](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816662550346895dd9efff/html5/thumbnails/21.jpg)
Energy-Delay Product (EDP)
21Higher relative advantage