GraphP: Reducing Communication for PIM-based
Graph Processing with Efficient Data Partition
Mingxing Zhang, Youwei Zhuo (equal contribution),
Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen,
Christos Kozyrakis, Xuehai Qian
Tsinghua University
University of Southern California
Stanford University
ALCHEM alchem.usc.edu
GraphP: A PIM-based Graph Processing Framework
Outline
• Motivation
  • Graph applications
  • Processing-In-Memory
  • The drawbacks of the current solution
• GraphP
• Evaluation
Graph Applications
• Social network analytics
• Recommendation system
• Bioinformatics
• …
Challenges
• High bandwidth requirement
  • Small amount of computation per vertex
  • Data movement overhead
[Figure: data moving between comp and mem through the L1, L2, and L3 caches]
PIM: Processing-In-Memory
• Idea: Computation logic inside memory
• Advantage: High memory bandwidth
• Example: Hybrid Memory Cubes (HMC)
[Figure: an HMC with a compute layer (comp) and memory layers (mem). Bandwidth: 320 GB/s intra-cube, 4 x 120 GB/s inter-cube.]
HMC: Hybrid Memory Cubes
[Figure: bar chart of bandwidth (GB/s): intra-cube 320, inter-cube 120, inter-group 120]
Bottleneck: Inter-cube communication
Outline
• Motivation
  • Graph applications
  • Processing-In-Memory
  • The drawbacks of the current solution
• GraphP
• Evaluation
Ahn, J., Hong, S., Yoo, S., Mutlu, O., & Choi, K. A scalable processing-in-memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
Current Solution: Tesseract
• First PIM-based graph processing architecture
• Programming model
• Vertex program
• Partition
• Based on vertex program
PageRank in Vertex Program
for (v: vertices) {
    update = 0.85 * v.rank / v.out_degree;
    for (w: edges.destination) {
        put(w.id, function{ w.next_rank += update; });
    }
}
barrier();
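The vertex program above can be sketched in Python. This is a minimal simulation, not Tesseract's implementation: it assumes that put() queues an update which has taken effect by the time barrier() returns. The six-vertex graph and the name pagerank_iteration are made up for illustration.

```python
from collections import defaultdict

def pagerank_iteration(out_edges, rank):
    """One iteration in the style of the vertex program above.

    out_edges: dict vertex -> list of destination vertices
    rank:      dict vertex -> current rank
    """
    pending = []                          # stands in for queued put() calls
    for v, dests in out_edges.items():
        if not dests:
            continue                      # no out-edges, nothing to send
        update = 0.85 * rank[v] / len(dests)
        for w in dests:
            pending.append((w, update))   # put(w.id, w.next_rank += update)
    # barrier(): every queued update is applied before the next iteration
    next_rank = defaultdict(float)
    for w, update in pending:
        next_rank[w] += update
    return dict(next_rank)

# Hypothetical example graph (not the one drawn on the slides).
out_edges = {0: [1, 3], 1: [2], 2: [4, 5], 3: [4], 4: [5], 5: [0]}
rank = {v: 1.0 for v in out_edges}
next_rank = pagerank_iteration(out_edges, rank)
```

Each edge produces exactly one put(), which is why the partition decides how many of these calls have to leave the cube.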
Graph Partition
[Figure: example graph with vertices 0 to 5 partitioned across two cubes, hmc0 and hmc1. Legend: vertex, intra-cube edge, inter-cube edge, communication (comm). Each put(w.id, function{ w.next_rank += update; }) along an inter-cube edge causes communication.]
communication = # of cross-cube edges
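The rule above can be checked on a small case. The sketch below assumes a hypothetical edge list and a hypothetical placement of vertices 0 to 2 on hmc0 and 3 to 5 on hmc1; neither is taken from the slide figure.

```python
def cross_cube_edges(edges, cube_of):
    """Return the edges whose endpoints live on different cubes."""
    return [(u, v) for u, v in edges if cube_of[u] != cube_of[v]]

# Hypothetical graph and placement, for illustration only.
edges = [(0, 1), (0, 3), (1, 2), (2, 4), (2, 5), (3, 4), (4, 5), (5, 0)]
cube_of = {0: "hmc0", 1: "hmc0", 2: "hmc0", 3: "hmc1", 4: "hmc1", 5: "hmc1"}

crossing = cross_cube_edges(edges, cube_of)
messages_per_iteration = len(crossing)   # one put() per cross-cube edge
```

Under this placement four of the eight edges cross cubes, so each iteration sends four inter-cube messages.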
Drawback of Tesseract
• Excessive data communication
• Why?
[Figure: diagram relating Programming Model, Graph Partition, and Data Communication for Tesseract]
Outline
• Motivation
• GraphP
• Evaluation
GraphP
• Consider graph partition first.
• Graph Partition
• Source-Cut
• Programming model
• Two-phase vertex program
• Reduces inter-cube communication
Source-Cut Partition
[Figure: the example graph with vertices 0 to 5 under source-cut partition across hmc0 and hmc1. Legend: vertex, intra-cube edge, inter-cube edge, replica (shown for vertex 2).]
Two-Phase Vertex Program
for (r: replicas) {
    //apply updates from previous iterations
    r.next_rank = 0.85 * r.next_rank / r.out_degree;
}
for (v: vertices) {
    for (u: edges.sources) {
        update += u.rank;
    }
    for (r: replicas) {
        put(r.id, function { r.next_rank = update});
    }
}
barrier();
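A rough Python sketch of the two-phase idea follows. It is an interpretation under stated assumptions, not GraphP's code: it assumes source-cut stores every edge on the cube that owns its destination and creates a replica for each remote source, and it folds the "apply updates" scaling into a helper instead of keeping a separate next_rank field. The graph, the placement, and the names source_cut, two_phase_iteration, and scaled are hypothetical.

```python
def source_cut(edges, cube_of):
    """Store each edge with its destination; remote sources become replicas."""
    in_sources = {v: [] for v in cube_of}
    replicas = {c: set() for c in set(cube_of.values())}
    for u, v in edges:
        in_sources[v].append(u)
        if cube_of[u] != cube_of[v]:
            replicas[cube_of[v]].add(u)   # cube of v keeps a copy of u
    return in_sources, replicas

def two_phase_iteration(edges, cube_of, rank):
    in_sources, replicas = source_cut(edges, cube_of)
    out_degree = {v: 0 for v in cube_of}
    for u, _ in edges:
        out_degree[u] += 1

    def scaled(u):
        # Contribution a source passes along each of its out-edges.
        return 0.85 * rank[u] / out_degree[u] if out_degree[u] else 0.0

    # Each cube holds values for its own vertices and for its replicas.
    local = {c: {r: scaled(r) for r in reps} for c, reps in replicas.items()}
    for v, c in cube_of.items():
        local[c][v] = scaled(v)

    # Phase 1: every read of a source is served from the local cube.
    update = {v: sum(local[cube_of[v]][u] for u in in_sources[v])
              for v in cube_of}
    # Phase 2 would send update[r] to each replica of r: one message each.
    return update, replicas

edges = [(0, 1), (0, 3), (1, 2), (2, 4), (2, 5), (3, 4), (4, 5), (5, 0)]
cube_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
rank = {v: 1.0 for v in cube_of}
update, replicas = two_phase_iteration(edges, cube_of, rank)
replica_messages = sum(len(reps) for reps in replicas.values())
```

In this example vertex 2 has two edges into cube 1 but only one replica there, so only one message is needed to refresh it.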
Benefits
• Strictly less data communication
• Enables architecture optimizations
Less Communication
[Figure: side-by-side example for Tesseract and GraphP with vertex 2 and vertices 4 and 5; the GraphP side shows a second copy of vertex 2]
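One way to state the comparison is as a count of messages per iteration. The notation below is introduced here for illustration and is not from the slides: c(v) is the cube that owns vertex v, and E_× is the set of cross-cube edges. It assumes Tesseract sends one message per cross-cube edge and GraphP sends one message per replica.

```latex
\[
E_\times = \{(u, v) \in E : c(u) \neq c(v)\}
\]
\[
M_{\text{Tesseract}} = |E_\times|,
\qquad
M_{\text{GraphP}} = \bigl|\{(u, c(v)) : (u, v) \in E_\times\}\bigr| \le |E_\times|
\]
```

The inequality is strict whenever some vertex has two or more edges into the same remote cube, as vertex 2 does with vertices 4 and 5 in the example.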
Broadcast Optimization
    for (r: replicas) {
        put(r.id, function { r.next_rank = update});   //broadcast
    }
}
barrier();
[Figure: vertex 4 broadcasting to its replicas]
Naïve Broadcast
• 15 point-to-point messages
[Figure: one src cube and the dst cubes it sends to]
Hierarchical communication
• 3 inter-group messages
[Figure: one src cube and the dst cubes it sends to, arranged in groups]
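The two message counts can be reproduced with simple arithmetic. The sketch assumes 16 cubes arranged as 4 groups of 4, and assumes that in the hierarchical scheme one cube per remote group receives the message and forwards it inside its group; the function name and the forwarding detail are illustrative, not taken from the slides.

```python
def broadcast_messages(num_groups=4, cubes_per_group=4):
    """Count messages needed to reach every other cube from one source."""
    total_cubes = num_groups * cubes_per_group
    naive = {
        # one direct message to every cube in every other group
        "inter_group": (num_groups - 1) * cubes_per_group,
        # one direct message to every other cube in the source's group
        "intra_group": cubes_per_group - 1,
    }
    hierarchical = {
        # one message per remote group
        "inter_group": num_groups - 1,
        # each group then spreads the value to its remaining cubes
        "intra_group": num_groups * (cubes_per_group - 1),
    }
    assert sum(naive.values()) == total_cubes - 1
    assert sum(hierarchical.values()) == total_cubes - 1
    return naive, hierarchical

naive, hierarchical = broadcast_messages()
```

Both schemes deliver 15 copies in total; the hierarchical one moves all but 3 of them onto intra-group links.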
Other Optimizations
• Computation/communication overlap
• Leveraging low-power state of SerDes
Please see the paper for more details
Outline
• Motivation
• GraphP
• Evaluation
Evaluation Methodology
• Simulation Infrastructure
  • zSim with HMC support
  • ORION for NoC energy modeling
• Configurations
  • Same as Tesseract
  • 16 HMCs
  • Interconnection: Dragonfly and Mesh2D
  • 512 CPUs
    • Single-issue in-order cores
    • Frequency: 1 GHz
Workloads
• 4 graph algorithms
  • Breadth First Search
  • Single Source Shortest Path
  • Weakly Connected Component
  • PageRank
• 5 real-world graphs
  • Wiki-Vote (WV)
  • ego-Twitter (TT)
  • Soc-Slashdot0902 (SD)
  • Amazon0302 (AZ)
  • ljournal-2008 (LJ)
Performance
[Figure: bar chart of speedup (y-axis 0 to 20) for DDR3, SOTA, GraphP-SC, and GraphP-SC-BRD. Annotations: "memory bandwidth" (Tesseract), "data partition" (1.7x), and <1.1x.]
Communication Amount
[Figure: stacked bar chart of communication amount, normalized to Tesseract, split into intra-group and inter-group components. Bar labels: Tesseract 48.2% and 51.8%; GraphP-SC 7.0% and 7.1%; GraphP-SC-BRD 1.7% and 0.4%.]
Energy consumption
[Figure: bar chart of energy consumption, normalized to Tesseract: Tesseract 100.0%, GraphP-SC 24.9%, GraphP-SC-BRD 15.9%]
Other results
• Bandwidth utilization
• Scalability
• Replication overhead
Please see the paper for more details
Conclusions
• We propose GraphP
• A new PIM-based graph processing framework
• Key contributions
• Data partition as first-order design consideration
  • Source-cut partition
  • Two-phase vertex program
  • Enable additional architecture optimizations
• GraphP drastically reduces inter-cube communication and improves energy efficiency.
Workload Size & Capacity
• 128 GB (16 * 8GB)
• ~16 billion edges
• ~400 million edges (SNAP)
• ~7 billion edges (WebGraph)
https://snap.stanford.edu/data/
http://law.di.unimi.it/datasets.php