GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition

Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian

Tsinghua University, University of Southern California, Stanford University

ALCHEM (alchem.usc.edu)

GraphP: A PIM-based Graph Processing Framework

Outline

• Motivation

  • Graph applications
  • Processing-In-Memory
  • The drawbacks of the current solution

• GraphP

• Evaluation

Graph Applications

• Social network analytics

• Recommendation system

• Bioinformatics

• …

Challenges

• High bandwidth requirement

  • Small amount of computation per vertex
  • Data movement overhead

[Figure: compute (comp), L1, L2, and L3 caches, and memory (mem)]

PIM: Processing-In-Memory

• Idea: Computation logic inside memory

• Advantage: High memory bandwidth

• Example: Hybrid Memory Cubes (HMC)

[Figure: HMC with compute logic (comp) and stacked memory layers (mem); 320 GB/s intra-cube, 4 x 120 GB/s inter-cube]

HMC: Hybrid Memory Cubes

[Bar chart: bandwidth (GB/s). Intra-cube: 320; inter-cube: 120; inter-group: 120]

Bottleneck: Inter-cube communication

Outline

• Motivation

  • Graph applications
  • Processing-In-Memory
  • The drawbacks of the current solution

• GraphP

• Evaluation

Current Solution: Tesseract

• First PIM-based graph processing architecture

• Programming model

• Vertex program

• Partition

• Based on vertex program

Ahn, J., Hong, S., Yoo, S., Mutlu, O., & Choi, K. A scalable processing-in-memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

PageRank in Vertex Program

for (v: vertices) {
  update = 0.85 * v.rank / v.out_degree;
  for (w: edges.destination) {
    put(w.id, function{ w.next_rank += update; });
  }
}
barrier();
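The loop above can be sketched in Python as a single-process simulation. The edge list and the vertex-to-cube map are hypothetical, and the `put` helper is only a stand-in for the remote function call on the slide, not Tesseract's actual API: it applies the update directly and counts the calls whose target vertex lives on another cube.

```python
# Hypothetical example graph: directed edges (source, destination).
EDGES = [(0, 1), (0, 3), (1, 2), (2, 4), (2, 5), (3, 4), (4, 5), (5, 0)]
NUM_VERTICES = 6
CUBE_OF = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # assumed vertex-to-cube map


def pagerank_iteration(rank):
    """Run one iteration of the slide's loop; return (next_rank, remote_puts)."""
    out_edges = {v: [] for v in range(NUM_VERTICES)}
    for src, dst in EDGES:
        out_edges[src].append(dst)

    next_rank = [0.0] * NUM_VERTICES
    remote_puts = 0

    def put(src, dst, update):
        # Stand-in for put(w.id, function{ w.next_rank += update; }).
        nonlocal remote_puts
        if CUBE_OF[src] != CUBE_OF[dst]:
            remote_puts += 1  # this call would cross cubes
        next_rank[dst] += update

    for v in range(NUM_VERTICES):
        if not out_edges[v]:
            continue  # no out-edges, nothing to send
        update = 0.85 * rank[v] / len(out_edges[v])
        for w in out_edges[v]:
            put(v, w, update)
    # barrier(): this sequential loop has already finished every put here.
    return next_rank, remote_puts


ranks, remote = pagerank_iteration([1.0 / NUM_VERTICES] * NUM_VERTICES)
print(remote)  # number of puts that crossed cubes
```

In this sketch every edge whose endpoints sit on different cubes triggers one remote `put` per iteration.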

Graph Partition

[Figure: a six-vertex graph (vertices 0 to 5) partitioned across two cubes, with vertices 0, 1, 2 on hmc0 and vertices 3, 4, 5 on hmc1. Legend: vertex, intra-edge, inter-edge, communication (comm)]

put(w.id, function{ w.next_rank += update; });

communication = # of cross-cube edges
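As a small illustration of this count, the following Python sketch tallies cross-cube edges for a given vertex placement. The edge list is hypothetical (the slide's figure is not reproduced in this transcript), and `cross_cube_edges` is an illustrative helper name.

```python
def cross_cube_edges(edges, cube_of):
    """Count edges whose endpoints are stored on different cubes."""
    return sum(1 for src, dst in edges if cube_of[src] != cube_of[dst])


# Hypothetical edge list; vertices 0-2 are placed on hmc0 and 3-5 on hmc1.
edges = [(0, 1), (0, 3), (1, 2), (2, 4), (2, 5), (3, 4), (4, 5), (5, 0)]
cube_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(cross_cube_edges(edges, cube_of))
```

Under the vertex program model, each of these edges costs one message per iteration.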

Drawback of Tesseract

• Excessive data communication

• Why?

[Diagram: Tesseract: Programming Model, Graph Partition, Data Communication; marked with "?"]

Outline

• Motivation

• GraphP

• Evaluation

GraphP

• Consider graph partition first.

• Graph Partition

• Source-Cut

• Programming model

• Two-phase vertex program

• Reduces inter-cube communication

Source-Cut Partition

[Figure: the six-vertex graph (vertices 0 to 5) under source-cut partition across hmc0 and hmc1. Legend: vertex, intra-edge, inter-edge, replica (shown for vertex 2)]
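A minimal Python sketch of the idea follows. It assumes the reading of source-cut in which each edge is stored on the cube of its destination vertex, and a source that lives on another cube gets one local replica there. The edge list, the placement, and the name `source_cut` are hypothetical.

```python
from collections import defaultdict


def source_cut(edges, cube_of):
    """Place each edge with its destination; replicate remote sources.

    Returns (edges_on_cube, replicas_on_cube), both keyed by cube id.
    """
    edges_on_cube = defaultdict(list)
    replicas_on_cube = defaultdict(set)
    for src, dst in edges:
        cube = cube_of[dst]
        edges_on_cube[cube].append((src, dst))
        if cube_of[src] != cube:
            replicas_on_cube[cube].add(src)  # one replica per (source, cube)
    return edges_on_cube, replicas_on_cube


# Hypothetical edge list and placement (vertices 0-2 on hmc0, 3-5 on hmc1).
edges = [(0, 3), (1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (5, 0)]
cube_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

_, replicas = source_cut(edges, cube_of)
num_replicas = sum(len(r) for r in replicas.values())
num_cross_edges = sum(1 for s, d in edges if cube_of[s] != cube_of[d])
print(sorted(replicas[1]), num_replicas, num_cross_edges)
```

In this sketch the replica count can never exceed the cross-cube edge count, because every replica is created by at least one cross-cube edge.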

Two-Phase Vertex Program

for (r: replicas) {
  // apply updates from previous iterations
  r.next_rank = 0.85 * r.next_rank / r.out_degree;
}

for (v: vertices) {
  for (u: edges.sources) {
    update += u.rank;
  }
  for (r: replicas) {
    put(r.id, function { r.next_rank = update; });
  }
}
barrier();

[Figure: the replica of vertex 2 alongside vertices 3, 4, and 5]
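One possible reading of the two phases, sketched in Python as a single-process simulation. The graph, the placement, and all names (`two_phase_iteration`, `replica_value`, and so on) are hypothetical. The sketch assumes that a replica stores the rank last pushed by its master, that phase one turns this value into a per-edge contribution, and that phase two reads only data on the local cube before pushing one update per replica.

```python
from collections import defaultdict

DAMPING = 0.85
N = 6
# Hypothetical graph and placement (vertices 0-2 on cube 0, 3-5 on cube 1).
EDGES = [(0, 3), (1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (5, 0)]
CUBE_OF = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

OUT_DEGREE = [0] * N
IN_EDGES = defaultdict(list)      # destination -> list of sources
REPLICA_CUBES = defaultdict(set)  # source -> cubes that hold a replica of it
for s, d in EDGES:
    OUT_DEGREE[s] += 1
    IN_EDGES[d].append(s)
    if CUBE_OF[s] != CUBE_OF[d]:
        REPLICA_CUBES[s].add(CUBE_OF[d])


def two_phase_iteration(rank, replica_value):
    """replica_value[(u, cube)] is the rank last pushed to u's replica on cube."""
    # Phase 1: every replica applies the update it received earlier.
    contrib = {(u, c): DAMPING * val / OUT_DEGREE[u]
               for (u, c), val in replica_value.items()}

    # Phase 2: every vertex reads only data stored on its own cube ...
    new_rank = [0.0] * N
    for v in range(N):
        cube = CUBE_OF[v]
        for u in IN_EDGES.get(v, []):
            if CUBE_OF[u] == cube:
                new_rank[v] += DAMPING * rank[u] / OUT_DEGREE[u]  # local master
            else:
                new_rank[v] += contrib[(u, cube)]  # local replica

    # ... then pushes its new value to each of its replicas (one put() each).
    new_replica_value = {}
    messages = 0
    for v in range(N):
        for c in REPLICA_CUBES.get(v, ()):
            new_replica_value[(v, c)] = new_rank[v]
            messages += 1
    return new_rank, new_replica_value, messages


def vertex_program_iteration(rank):
    """Reference: the push-style vertex program, one update per edge."""
    nxt = [0.0] * N
    for s, d in EDGES:
        nxt[d] += DAMPING * rank[s] / OUT_DEGREE[s]
    return nxt


rank0 = [1.0 / N] * N
replicas0 = {(u, c): rank0[u] for u, cubes in REPLICA_CUBES.items() for c in cubes}
new_rank, _, messages = two_phase_iteration(rank0, replicas0)
print(messages)  # cross-cube messages in this iteration
```

On this example the two formulations produce the same ranks, while the two-phase version sends one message per replica rather than one per cross-cube edge.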

Benefits

• Strictly less data communication

• Enables architecture optimizations

Less Communication

[Figure: Tesseract vs. GraphP for vertex 2 and vertices 4 and 5; the GraphP side adds a replica of vertex 2]

Broadcast Optimization

  for (r: replicas) {
    put(r.id, function { r.next_rank = update});
  }
}
barrier();

[Figure: broadcast of vertex 4's update to its replicas]

Naïve Broadcast

• 15 point-to-point messages

[Figure: the source cube (src) sends a separate message to each destination cube (dst)]

Hierarchical communication

• 3 inter-group messages

[Figure: the source cube (src) and destination cubes (dst) arranged in groups]
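The two message counts can be reproduced with a short Python sketch. It assumes 16 cubes arranged in 4 groups of 4 (the group size is an assumption, chosen to be consistent with the 3 inter-group messages above) and a forwarding scheme in which one cube per remote group relays the update to its local peers. Both function names are illustrative.

```python
def naive_broadcast(num_cubes, group_size, src=0):
    """Source sends one point-to-point message to every other cube.

    Returns (total_messages, inter_group_messages).
    """
    inter = sum(1 for dst in range(num_cubes)
                if dst != src and dst // group_size != src // group_size)
    return num_cubes - 1, inter


def hierarchical_broadcast(num_cubes, group_size):
    """Source sends one message per remote group; a cube there forwards it locally.

    Returns (total_messages, inter_group_messages).
    """
    num_groups = num_cubes // group_size
    inter = num_groups - 1
    # Local delivery inside the source's group plus forwarding in each remote group.
    intra = num_groups * (group_size - 1)
    return inter + intra, inter


print(naive_broadcast(16, 4))
print(hierarchical_broadcast(16, 4))
```

Under these assumptions the total number of messages is unchanged, but the messages that cross group boundaries drop from 12 to 3.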

Other Optimizations

• Computation/communication overlap

• Leveraging low-power state of SerDes

Please see the paper for more details

Outline

• Motivation

• GraphP

• Evaluation

Evaluation Methodology

• Simulation Infrastructure
  • zSim with HMC support
  • ORION for NoC energy modeling

• Configurations
  • Same as Tesseract
  • 16 HMCs
  • Interconnection: Dragonfly and Mesh2D
  • 512 CPUs
    • Single-issue in-order cores
    • Frequency: 1 GHz

Workloads

• 4 graph algorithms
  • Breadth First Search
  • Single Source Shortest Path
  • Weakly Connected Component
  • PageRank

• 5 real-world graphs
  • Wiki-Vote (WV)
  • ego-Twitter (TT)
  • Soc-Slashdot0902 (SD)
  • Amazon0302 (AZ)
  • ljournal-2008 (LJ)

Performance

[Bar chart: speedup (axis 0 to 20) for DDR3, SOTA, GraphP-SC, and GraphP-SC-BRD. Annotations: memory bandwidth, Tesseract, data partition, 1.7x, <1.1x]

Communication Amount

[Stacked bar chart: communication amount normalized to Tesseract, split into intra-group and inter-group. Tesseract: 48.2% and 51.8%; GraphP-SC: 7.0% and 7.1%; GraphP-SC-BRD: 1.7% and 0.4%]

Energy consumption

[Bar chart: energy consumption normalized to Tesseract. Tesseract: 100.0%; GraphP-SC: 24.9%; GraphP-SC-BRD: 15.9%]

Other results

• Bandwidth utilization

• Scalability

• Replication overhead

Please see the paper for more details

Conclusions

• We propose GraphP

• A new PIM-based graph processing framework

• Key contributions

• Data partition as first-order design consideration

  • Source-cut partition
  • Two-phase vertex program
  • Enable additional architecture optimizations

• GraphP drastically reduces inter-cube communication and improves energy efficiency.

Workload Size & Capacity

• 128 GB (16 * 8GB)

• ~16 billion edges

• ~400 million edges (SNAP)

• ~7 billion edges (WebGraph)

https://snap.stanford.edu/data/

http://law.di.unimi.it/datasets.php
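The capacity figures above can be checked with a few lines of Python. The 8 bytes per edge is an assumption (for example, two 4-byte vertex ids), not something stated on the slide, and decimal gigabytes are assumed; under those assumptions 128 GB holds about 16 billion edges.

```python
NUM_CUBES = 16
BYTES_PER_CUBE = 8 * 10**9  # 8 GB per cube, decimal gigabytes assumed
BYTES_PER_EDGE = 8          # assumption, e.g. two 4-byte vertex ids

capacity_bytes = NUM_CUBES * BYTES_PER_CUBE
max_edges = capacity_bytes // BYTES_PER_EDGE

print(capacity_bytes // 10**9)  # total capacity in GB
print(max_edges / 10**9)        # edge capacity in billions
```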

Two-phase vertex program

• Expressiveness equivalent to that of vertex programs