Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson...
![Page 1: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/1.jpg)
Distributed Graph-Parallel Computation on Natural Graphs
Joseph Gonzalez
Joint work with: Yucheng Low, Danny Bickson, Haijie Gu, Carlos Guestrin
![Page 2: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/2.jpg)
Graphs are ubiquitous...
![Page 3: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/3.jpg)
Social Media, Advertising, Science, Web
• Graphs encode relationships between: People, Facts, Products, Interests, Ideas
• Big: billions of vertices and edges and rich metadata
![Page 4: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/4.jpg)
Graphs are Essential to Data-Mining and Machine Learning
• Identify influential people and information
• Find communities
• Target ads and products
• Model complex data dependencies
![Page 5: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/5.jpg)
Natural Graphs
Graphs derived from natural phenomena
![Page 6: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/6.jpg)
Problem:
Existing distributed graph computation systems perform poorly on Natural Graphs.
![Page 7: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/7.jpg)
PageRank on Twitter Follower Graph
Natural Graph with 40M Users, 1.4 Billion Links
[Figure: bar chart of runtime per iteration (axis 0 to 200) for Hadoop, GraphLab, Twister, Piccolo, and PowerGraph]
Order of magnitude by exploiting properties of Natural Graphs
Hadoop results from [Kang et al. '11]
Twister (in-memory MapReduce) [Ekanayake et al. '10]
![Page 8: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/8.jpg)
Properties of Natural Graphs
Power-Law Degree Distribution
![Page 9: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/9.jpg)
Power-Law Degree Distribution
[Figure: log-log plot of number of vertices (count) versus degree for the AltaVista Web Graph, 1.4B Vertices, 6.6B Edges]
• High-Degree Vertices: the top 1% of vertices are adjacent to 50% of the edges!
• More than 10^8 vertices have one neighbor.
![Page 10: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/10.jpg)
Power-Law Degree Distribution
"Star Like" Motif
[Figure: President Obama at the center of a star, connected to his Followers]
![Page 11: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/11.jpg)
Power-Law Graphs are Difficult to Partition
• Power-Law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Traditional graph-partitioning algorithms perform poorly on Power-Law Graphs. [Abou-Rjeili et al. 06]
[Figure: a graph split between CPU 1 and CPU 2]
![Page 12: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/12.jpg)
Properties of Natural Graphs
Power-Law Degree Distribution
• High-degree Vertices
• Low Quality Partition
![Page 13: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/13.jpg)
• Split High-Degree vertices
• New Abstraction → Equivalence on Split Vertices
[Figure: "Program For This" shows a single high-degree vertex; "Run on This" shows the same vertex split across Machine 1 and Machine 2]
![Page 14: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/14.jpg)
How do we program graph computation?
"Think like a Vertex." - Malewicz et al. [SIGMOD'10]
![Page 15: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/15.jpg)
The Graph-Parallel Abstraction
• A user-defined Vertex-Program runs on each vertex
• Graph constrains interaction along edges
  – Using messages (e.g., Pregel [PODC'09, SIGMOD'10])
  – Through shared state (e.g., GraphLab [UAI'10, VLDB'12])
• Parallelism: run multiple vertex programs simultaneously
![Page 16: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/16.jpg)
Example
What's the popularity of this user?
• Depends on the popularity of her followers
• Depends on the popularity of their followers
![Page 17: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/17.jpg)
PageRank Algorithm
$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$
Here R[i] is the rank of user i and the sum is the weighted sum of the neighbors' ranks.
• Update ranks in parallel
• Iterate until convergence
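The update rule above can be sketched as a plain fixed-point iteration. This is a minimal illustration, not code from the talk: the three-vertex graph, the function name `pagerank`, and the weighting `w_ji = 0.85 / out_degree(j)` are all assumptions chosen so that the iteration converges.

```python
def pagerank(in_neighbors, weights, num_iters=200, tol=1e-9):
    """Iterate R[i] = 0.15 + sum_j w[j, i] * R[j] until the ranks stop changing."""
    ranks = {v: 1.0 for v in in_neighbors}
    for _ in range(num_iters):
        # Every new rank is computed from the previous ranks ("update in parallel").
        new_ranks = {
            i: 0.15 + sum(weights[(j, i)] * ranks[j] for j in in_neighbors[i])
            for i in in_neighbors
        }
        change = max(abs(new_ranks[v] - ranks[v]) for v in ranks)
        ranks = new_ranks
        if change < tol:
            break
    return ranks


# Hypothetical graph with edges a->b, a->c, b->c, c->a.
out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
in_neighbors = {v: [] for v in out_edges}
weights = {}
for j, targets in out_edges.items():
    for i in targets:
        in_neighbors[i].append(j)
        weights[(j, i)] = 0.85 / len(targets)  # assumed weighting, not given on the slide

ranks = pagerank(in_neighbors, weights)
```

With this assumed weighting the ranks sum to the number of vertices at the fixed point, which gives a quick sanity check on the result.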
![Page 18: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/18.jpg)
The Pregel Abstraction
Vertex-Programs interact by sending messages.

    Pregel_PageRank(i, messages):
        // Receive all the messages
        total = 0
        foreach (msg in messages):
            total = total + msg
        // Update the rank of this vertex
        R[i] = 0.15 + total
        // Send new messages to neighbors
        foreach (j in out_neighbors[i]):
            Send msg(R[i] * wij) to vertex j

Malewicz et al. [PODC'09, SIGMOD'10]
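A runnable sketch of the message-passing style follows. It is a single-process simulation, assuming a hypothetical three-vertex graph and weights of `0.85 / out_degree`; the names `pregel_pagerank`, `inbox`, and `outbox` are illustrative and are not the Pregel API.

```python
from collections import defaultdict


def pregel_pagerank(out_neighbors, weights, num_supersteps=200):
    """Simulate synchronous supersteps: read the inbox, update, send messages."""
    ranks = {v: 1.0 for v in out_neighbors}
    # Seed the first inbox as if every vertex had already sent its initial rank.
    inbox = defaultdict(list)
    for i, nbrs in out_neighbors.items():
        for j in nbrs:
            inbox[j].append(ranks[i] * weights[(i, j)])
    messages_sent = 0
    for _ in range(num_supersteps):
        outbox = defaultdict(list)
        for i in out_neighbors:
            total = sum(inbox[i])              # receive all the messages
            ranks[i] = 0.15 + total            # update the rank of this vertex
            for j in out_neighbors[i]:         # send new messages to neighbors
                outbox[j].append(ranks[i] * weights[(i, j)])
                messages_sent += 1
        inbox = outbox                         # barrier between supersteps
    return ranks, messages_sent


# Hypothetical graph; weights are an assumption, not given on the slide.
out_neighbors = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
weights = {(i, j): 0.85 / len(nbrs) for i, nbrs in out_neighbors.items() for j in nbrs}
ranks, messages_sent = pregel_pagerank(out_neighbors, weights)
```

The counter makes the cost model visible: every superstep sends one message per edge, so a vertex with very high out-degree sends one message per neighbor.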
![Page 19: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/19.jpg)
The GraphLab Abstraction
Vertex-Programs directly read the neighbors' state

    GraphLab_PageRank(i)
        // Compute sum over neighbors
        total = 0
        foreach (j in in_neighbors(i)):
            total = total + R[j] * wji
        // Update the PageRank
        R[i] = 0.15 + total
        // Trigger neighbors to run again
        if R[i] not converged then
            foreach (j in out_neighbors(i)):
                signal vertex-program on j

Low et al. [UAI'10, VLDB'12]
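The shared-state style can be sketched with a simple work queue standing in for the scheduler. This is an assumption-laden, single-threaded illustration: the queue, the `scheduled` set, the convergence tolerance, and the example graph are all hypothetical and do not reflect GraphLab's actual scheduler or locking.

```python
from collections import deque


def graphlab_pagerank(in_neighbors, out_neighbors, weights, tol=1e-6):
    """Run vertex programs from a queue; a changed vertex signals its out-neighbors."""
    ranks = {v: 1.0 for v in in_neighbors}
    queue = deque(in_neighbors)
    scheduled = set(in_neighbors)
    updates = 0
    while queue:
        i = queue.popleft()
        scheduled.discard(i)
        # Read neighbor state directly instead of receiving messages.
        total = sum(ranks[j] * weights[(j, i)] for j in in_neighbors[i])
        new_rank = 0.15 + total
        changed = abs(new_rank - ranks[i]) > tol
        ranks[i] = new_rank
        updates += 1
        if changed:
            # Trigger neighbors to run again.
            for j in out_neighbors[i]:
                if j not in scheduled:
                    scheduled.add(j)
                    queue.append(j)
    return ranks, updates


# Hypothetical graph; weights are an assumption, not given on the slide.
out_neighbors = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
in_neighbors = {v: [u for u in out_neighbors if v in out_neighbors[u]] for v in out_neighbors}
weights = {(j, i): 0.85 / len(nbrs) for j, nbrs in out_neighbors.items() for i in nbrs}
ranks, updates = graphlab_pagerank(in_neighbors, out_neighbors, weights)
```

Only vertices whose inputs changed are re-run, which is the dynamic behavior the "signal" step on the slide provides.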
![Page 20: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/20.jpg)
Challenges of High-Degree Vertices
• Sequentially process edges
• Sends many messages (Pregel)
• Touches a large fraction of graph (GraphLab)
• Edge meta-data too large for single machine
• Asynchronous Execution requires heavy locking (GraphLab)
• Synchronous Execution prone to stragglers (Pregel)
![Page 21: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/21.jpg)
Communication Overhead for High-Degree Vertices
Fan-In vs. Fan-Out
![Page 22: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/22.jpg)
Pregel Message Combiners on Fan-In
• User defined commutative associative (+) message operation:
[Figure: messages from A, B, and C on Machine 1 are combined (+) into a single Sum before being sent to D on Machine 2]
![Page 23: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/23.jpg)
Pregel Struggles with Fan-Out
• Broadcast sends many copies of the same message to the same machine!
[Figure: D on Machine 2 sends separate messages to A, B, and C on Machine 1]
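The asymmetry between fan-in and fan-out can be made concrete by counting network messages. The sketch below is illustrative only: the helper names and the placement of A, B, C on machine 1 and D on machine 2 are assumptions that mirror the figure labels.

```python
def fan_in_messages(senders, machine_of, target, combine=True):
    """Network messages needed for many senders to reach one target vertex."""
    remote = [s for s in senders if machine_of[s] != machine_of[target]]
    if combine:
        # A combiner pre-sums messages, so each remote machine sends one message.
        return len({machine_of[s] for s in remote})
    return len(remote)


def fan_out_messages(receivers, machine_of, source):
    """Without a broadcast primitive, one copy is sent per remote receiver."""
    return sum(1 for r in receivers if machine_of[r] != machine_of[source])


# Hypothetical placement mirroring the slide figures.
machine_of = {"A": 1, "B": 1, "C": 1, "D": 2}
combined = fan_in_messages(["A", "B", "C"], machine_of, "D", combine=True)
uncombined = fan_in_messages(["A", "B", "C"], machine_of, "D", combine=False)
broadcast = fan_out_messages(["A", "B", "C"], machine_of, "D")
```

In this toy case the combiner reduces fan-in traffic from three messages to one, while fan-out still sends three copies of the same value to machine 1.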
![Page 24: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/24.jpg)
Fan-In and Fan-Out Performance
• PageRank on synthetic Power-Law Graphs
  – Piccolo was used to simulate Pregel with combiners
[Figure: total comm. (GB, axis 0 to 10) versus power-law constant α (1.8 to 2.2), annotated "More high-degree vertices"]
![Page 25: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/25.jpg)
GraphLab Ghosting
• Changes to master are synced to ghosts
[Figure: vertices A, B, C, and D placed on Machine 1 and Machine 2, with ghost copies of the vertices owned by the other machine]
![Page 26: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/26.jpg)
GraphLab Ghosting
• Changes to neighbors of high degree vertices create substantial network traffic
[Figure: vertices A, B, C, and D placed on Machine 1 and Machine 2, with ghost copies of the vertices owned by the other machine]
![Page 27: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/27.jpg)
Fan-In and Fan-Out Performance
• PageRank on synthetic Power-Law Graphs
• GraphLab is undirected
[Figure: total comm. (GB, axis 0 to 10) versus power-law constant α (1.8 to 2.2), annotated "More high-degree vertices"]
![Page 28: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/28.jpg)
Graph Partitioning
• Graph parallel abstractions rely on partitioning:
  – Minimize communication
  – Balance computation and storage
[Figure: a graph partitioned across Machine 1 and Machine 2]
Data transmitted across network: O(# cut edges)
![Page 29: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/29.jpg)
Random Partitioning
• Both GraphLab and Pregel resort to random (hashed) partitioning on natural graphs
Figure 4: (a) An edge-cut and (b) vertex-cut of a graph into three parts. Shaded vertices are ghosts and mirrors respectively.
5 Distributed Graph Placement
The PowerGraph abstraction relies on the distributed data-graph to store the computation state and encode the interaction between vertex programs. The placement of the data-graph structure and data plays a central role in minimizing communication and ensuring work balance.
A common approach to placing a graph on a cluster of p machines is to construct a balanced p-way edge-cut (e.g., Fig. 4a) in which vertices are evenly assigned to machines and the number of edges spanning machines is minimized. Unfortunately, the tools [21, 31] for constructing balanced edge-cuts perform poorly [1, 26, 23] or even fail on power-law graphs. When the graph is difficult to partition, both GraphLab and Pregel resort to hashed (random) vertex placement. While fast and easy to implement, hashed vertex placement cuts most of the edges:
Theorem 5.1. If vertices are randomly assigned to p machines then the expected fraction of edges cut is:
$$\mathbb{E}\left[\frac{|\text{Edges Cut}|}{|E|}\right] = 1 - \frac{1}{p} \tag{5.1}$$
For example, if just two machines are used, half of the edges will be cut, requiring order |E|/2 communication.
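Theorem 5.1 is easy to check numerically. The sketch below compares the closed form with a small Monte Carlo simulation; the function names, the random edge sampler, and the graph sizes are assumptions made for illustration.

```python
import random


def expected_cut_fraction(p):
    """Closed form from Eq. 5.1: expected fraction of edges cut on p machines."""
    return 1.0 - 1.0 / p


def simulate_cut_fraction(num_vertices, num_edges, p, seed=0):
    """Hash vertices to machines at random and count edges whose endpoints differ."""
    rng = random.Random(seed)
    machine = [rng.randrange(p) for _ in range(num_vertices)]
    cut = 0
    for _ in range(num_edges):
        u, v = rng.sample(range(num_vertices), 2)  # a random edge, no self-loops
        if machine[u] != machine[v]:
            cut += 1
    return cut / num_edges


for p in (2, 10, 100):
    print(p, expected_cut_fraction(p), simulate_cut_fraction(1000, 20000, p))
```

The closed form gives 50% of edges cut for 2 machines, 90% for 10 machines, and 99% for 100 machines; the simulated fractions should land close to those values.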
5.1 Balanced p-way Vertex-Cut
The PowerGraph abstraction enables a single vertex program to span multiple machines. Hence, we can ensure work balance by evenly assigning edges to machines. Communication is minimized by limiting the number of machines a single vertex spans. A balanced p-way vertex-cut formalizes this objective by assigning each edge e ∈ E to a machine A(e) ∈ {1, …, p}. Each vertex then spans the set of machines A(v) ⊆ {1, …, p} that contain its adjacent edges. We define the balanced vertex-cut objective:
$$\min_{A}\ \frac{1}{|V|}\sum_{v \in V} |A(v)| \tag{5.2}$$
$$\text{s.t.}\quad \max_{m}\ \left|\{e \in E \mid A(e) = m\}\right| < \lambda \frac{|E|}{p} \tag{5.3}$$
where the imbalance factor λ ≥ 1 is a small constant. We use the term replicas of a vertex v to denote the |A(v)| copies of the vertex v: each machine in A(v) has a replica of v. The objective term (Eq. 5.2) therefore minimizes the average number of replicas in the graph and as a consequence the total storage and communication requirements of the PowerGraph engine.
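To make the objective concrete, the sketch below evaluates Eq. 5.2 and the constraint of Eq. 5.3 for a given edge assignment. The helper names, the zero-based machine indices, the default λ = 1.1, and the star-graph example are assumptions; |V| is taken to be the number of vertices that have at least one edge.

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average |A(v)| over vertices with at least one edge (objective of Eq. 5.2)."""
    spans = defaultdict(set)
    for (u, v), m in zip(edges, assignment):
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)


def is_balanced(assignment, p, imbalance=1.1):
    """Constraint of Eq. 5.3: the most loaded machine holds fewer than lambda*|E|/p edges."""
    loads = [0] * p
    for m in assignment:
        loads[m] += 1
    return max(loads) < imbalance * len(assignment) / p


# Hypothetical star graph: center "c" with four leaves, placed on 2 machines.
edges = [("c", 1), ("c", 2), ("c", 3), ("c", 4)]
even = [0, 0, 1, 1]      # center spans both machines, every leaf spans one
skewed = [0, 0, 0, 1]    # same replication factor, but machine 0 is overloaded

print(replication_factor(edges, even), is_balanced(even, p=2))
print(replication_factor(edges, skewed), is_balanced(skewed, p=2))
```

Both assignments replicate only the center vertex, so the objective is the same; only the even assignment satisfies the balance constraint.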
Vertex-cuts address many of the major issues associated with edge-cuts in power-law graphs. Percolation theory [3] suggests that power-law graphs have good vertex-cuts. Intuitively, by cutting a small fraction of the very high degree vertices we can quickly shatter a graph. Furthermore, because the balance constraint (Eq. 5.3) ensures that edges are uniformly distributed over machines, we naturally achieve improved work balance even in the presence of very high-degree vertices.
The simplest method to construct a vertex cut is to randomly assign edges to machines. Random (hashed) edge placement is fully data-parallel, achieves nearly perfect balance on large graphs, and can be applied in the streaming setting. In the following we relate the expected normalized replication factor (Eq. 5.2) to the number of machines and the power-law constant α.
Theorem 5.2 (Randomized Vertex Cuts). Let D[v] denote the degree of vertex v. A uniform random edge placement on p machines has an expected replication factor
$$\mathbb{E}\left[\frac{1}{|V|}\sum_{v \in V} |A(v)|\right] = \frac{p}{|V|}\sum_{v \in V}\left(1 - \left(1 - \frac{1}{p}\right)^{D[v]}\right). \tag{5.4}$$
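Eq. 5.4 can be checked against a direct simulation. The sketch below is an illustration under stated assumptions: the degree list is made up, and the simulation places each vertex's adjacent edges independently, which is enough for the expectation because each edge lands on a uniformly random machine.

```python
import random


def expected_replication(degrees, p):
    """Eq. 5.4: (p / |V|) * sum over v of (1 - (1 - 1/p) ** D[v])."""
    return p / len(degrees) * sum(1 - (1 - 1 / p) ** d for d in degrees)


def simulated_replication(degrees, p, trials=200, seed=0):
    """Average number of distinct machines hit by each vertex's adjacent edges."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for d in degrees:
            total += len({rng.randrange(p) for _ in range(d)})
    return total / (trials * len(degrees))


# Hypothetical skewed degree sequence: many low-degree vertices, one hub.
degrees = [1, 1, 1, 1, 2, 3, 50]
predicted = expected_replication(degrees, p=8)
observed = simulated_replication(degrees, p=8)
```

A degree-1 vertex always has exactly one replica, while the hub is expected to span nearly all 8 machines, which is why a few high-degree vertices dominate the replication cost.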
For a graph with power-law constant α we obtain:
$$\mathbb{E}\left[\frac{1}{|V|}\sum_{v \in V} |A(v)|\right] = p - p\,\mathrm{Li}_{\alpha}\!\left(\frac{p-1}{p}\right) \Big/ \zeta(\alpha) \tag{5.5}$$
where Li_α(x) is the transcendental polylog function and ζ(α) is the Riemann Zeta function (plotted in Fig. 5a).
Higher α values imply a lower replication factor, confirming our earlier intuition. In contrast to a random 2-way edge-cut which requires order |E|/2 communication, a random 2-way vertex-cut on an α = 2 power-law graph requires only order 0.3 |V| communication, a substantial savings on natural graphs where E can be an order of magnitude larger than V (see Tab. 1a).
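The 0.3 |V| figure can be reproduced by evaluating Eq. 5.5 at α = 2 and p = 2. The sketch below uses truncated series for the polylog and zeta functions; the truncation lengths and the tail correction in `zeta` are choices made for this illustration, and the polylog series converges slowly when p is large.

```python
def polylog(s, x, terms=200):
    """Truncated series Li_s(x) = sum over k >= 1 of x**k / k**s, for |x| < 1."""
    return sum(x ** k / k ** s for k in range(1, terms + 1))


def zeta(s, terms=1000):
    """Riemann zeta for s > 1: partial sum plus an integral estimate of the tail."""
    head = sum(k ** -s for k in range(1, terms + 1))
    tail = terms ** (1 - s) / (s - 1) - 0.5 * terms ** -s
    return head + tail


def power_law_replication(alpha, p):
    """Eq. 5.5: p - p * Li_alpha((p - 1) / p) / zeta(alpha)."""
    return p - p * polylog(alpha, (p - 1) / p) / zeta(alpha)


replicas = power_law_replication(alpha=2, p=2)
extra_copies_per_vertex = replicas - 1.0
```

The result is roughly 1.29 replicas per vertex, so on average about 0.3 additional copies per vertex must be kept in sync, which matches the order 0.3 |V| communication quoted above.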
5.2 Greedy Vertex-Cuts
We can improve upon the randomly constructed vertex-cut by de-randomizing the edge-placement process. The resulting algorithm is a sequential greedy heuristic which places the next edge on the machine that minimizes the conditional expected replication factor. To construct the de-randomization we consider the task of placing the i+1 edge after having placed the previous i edges. Using the conditional expectation we define the objective:
$$\arg\min_{k}\ \mathbb{E}\left[\sum_{v \in V} |A(v)| \;\middle|\; A_i,\ A(e_{i+1}) = k\right] \tag{5.6}$$
10 Machines → 90% of edges cut
100 Machines → 99% of edges cut!
![Page 30: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/30.jpg)
In Summary
GraphLab and Pregel are not well suited for natural graphs
• Challenges of high-degree vertices
• Low quality partitioning
![Page 31: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/31.jpg)
• GAS Decomposition: distribute vertex-programs
  – Move computation to data
  – Parallelize high-degree vertices
• Vertex Partitioning:
  – Effectively distribute large power-law graphs
![Page 32: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/32.jpg)
A Common Pattern for Vertex-Programs

    GraphLab_PageRank(i)
        // Gather Information About Neighborhood
        // Compute sum over neighbors
        total = 0
        foreach (j in in_neighbors(i)):
            total = total + R[j] * wji
        // Update Vertex
        // Update the PageRank
        R[i] = 0.15 + total
        // Signal Neighbors & Modify Edge Data
        // Trigger neighbors to run again
        if R[i] not converged then
            foreach (j in out_neighbors(i)):
                signal vertex-program on j
![Page 33: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/33.jpg)
GAS Decomposition
• Gather (Reduce): Accumulate information about neighborhood
  – User Defined: Gather() → Σ
  – Parallel Sum: Σ1 + Σ2 → Σ3
• Apply: Apply the accumulated value to center vertex
  – User Defined: Apply(Y, Σ) → Y'
• Scatter: Update adjacent edges and vertices.
  – Update Edge Data & Activate Neighbors
  – User Defined: Scatter()
![Page 34: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/34.jpg)
PageRank in PowerGraph
$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$

    PowerGraph_PageRank(i)
        Gather(j → i): return wji * R[j]
        sum(a, b): return a + b
        Apply(i, Σ): R[i] = 0.15 + Σ
        Scatter(i → j): if R[i] changed then trigger j to be recomputed
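The Gather, sum, Apply, Scatter program above can be exercised with a tiny single-process engine. Everything outside the four user functions is an assumption made for illustration: the synchronous driver `run_gas`, the tolerance used to decide that R[i] "changed", the example graph, and the weights. It is not the PowerGraph C++ API.

```python
from functools import reduce


def gather(ranks, weights, j, i):
    """Gather(j -> i): contribution of in-neighbor j to vertex i."""
    return weights[(j, i)] * ranks[j]


def combine(a, b):
    """The commutative, associative sum(a, b) from the slide."""
    return a + b


def apply(ranks, i, acc):
    """Apply(i, acc): store the new rank."""
    ranks[i] = 0.15 + acc


def scatter(old_rank, new_rank, tol):
    """Scatter(i -> j): report whether out-neighbors should be recomputed."""
    return abs(new_rank - old_rank) > tol


def run_gas(in_neighbors, out_neighbors, weights, tol=1e-9, max_rounds=1000):
    """Hypothetical synchronous driver: gather for all active vertices, then apply and scatter."""
    ranks = {v: 1.0 for v in in_neighbors}
    active = set(in_neighbors)
    for _ in range(max_rounds):
        if not active:
            break
        acc = {
            i: reduce(combine, (gather(ranks, weights, j, i) for j in in_neighbors[i]), 0.0)
            for i in active
        }
        next_active = set()
        for i in active:
            old = ranks[i]
            apply(ranks, i, acc[i])
            if scatter(old, ranks[i], tol):
                next_active.update(out_neighbors[i])
        active = next_active
    return ranks


# Hypothetical graph; weights are an assumption, not given on the slide.
out_neighbors = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
in_neighbors = {v: [u for u in out_neighbors if v in out_neighbors[u]] for v in out_neighbors}
weights = {(j, i): 0.85 / len(nbrs) for j, nbrs in out_neighbors.items() for i in nbrs}
ranks = run_gas(in_neighbors, out_neighbors, weights)
```

Because `combine` is commutative and associative, the reduction over in-neighbors could be split into partial sums on different machines and merged afterwards, which is the property the decomposition relies on.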
![Page 35: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/35.jpg)
Distributed Execution of a PowerGraph Vertex-Program
[Figure: vertex Y replicated on Machine 1, Machine 2, Machine 3, and Machine 4 as one Master and three Mirrors]
• Gather: Σ1 + Σ2 + Σ3 + Σ4 → Σ
• Apply: Y, Σ → Y'
• Scatter: Y' on the Master and each Mirror
![Page 36: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/36.jpg)
Minimizing Communication in PowerGraph
Communication is linear in the number of machines each vertex spans
A vertex-cut minimizes machines each vertex spans
Percolation theory suggests that power law graphs have good vertex cuts. [Albert et al. 2000]
![Page 37: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/37.jpg)
New Approach to Partitioning
• Rather than cut edges: (CPU 1 | CPU 2) Must synchronize many edges
• we cut vertices: (CPU 1 | CPU 2) Must synchronize a single vertex
New Theorem: For any edge-cut we can directly construct a vertex-cut which requires strictly less communication and storage.
![Page 38: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/38.jpg)
Constructing Vertex-Cuts
• Evenly assign edges to machines
  – Minimize machines spanned by each vertex
• Assign each edge as it is loaded
  – Touch each edge only once
• Propose three distributed approaches:
  – Random Edge Placement
  – Coordinated Greedy Edge Placement
  – Oblivious Greedy Edge Placement
![Page 39: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/39.jpg)
Random Edge-Placement
• Randomly assign edges to machines
[Figure: a Balanced Vertex-Cut over Machine 1, Machine 2, and Machine 3: vertex Y spans 3 machines, vertex Z spans 2 machines, and one vertex is not cut]
![Page 40: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/40.jpg)
Analysis: Random Edge-Placement
• Expected number of machines spanned by a vertex:
[Figure: expected number of machines spanned (axis 2 to 20) versus number of machines (8 to 48), Predicted and Random curves; Twitter Follower Graph, 41 Million Vertices, 1.4 Billion Edges]
Accurately Estimate Memory and Comm. Overhead
![Page 41: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/41.jpg)
Random Vertex-Cuts vs. Edge-Cuts
• Expected improvement from vertex-cuts:
[Figure: reduction in comm. and storage (log axis, 1 to 100) versus number of machines (0 to 150)]
Order of Magnitude Improvement
![Page 42: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/42.jpg)
Greedy Vertex-Cuts
• Place edges on machines which already have the vertices in that edge.
[Figure: edges among vertices A, B, C, D, and E placed on Machine 1 and Machine 2]
![Page 43: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/43.jpg)
Greedy Vertex-Cuts
• De-randomization → greedily minimizes the expected number of machines spanned
• Coordinated Edge Placement
  – Requires coordination to place each edge
  – Slower: higher quality cuts
• Oblivious Edge Placement
  – Approx. greedy objective without coordination
  – Faster: lower quality cuts
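The greedy idea can be sketched as a sequential, single-process heuristic. This is a simplification and not the exact placement rules from the paper: it prefers a machine that already holds both endpoints, then one that holds either endpoint, then any machine, always among machines with spare capacity. The capacity rule, the tie-breaking, the default λ = 1.1, and the star-graph example are all assumptions.

```python
from collections import defaultdict


def greedy_place(edges, p, imbalance=1.1):
    """Place each edge where its endpoints already live, subject to a load cap."""
    cap = imbalance * len(edges) / p
    spans = defaultdict(set)   # vertex -> machines holding a replica
    loads = [0] * p
    assignment = []
    for u, v in edges:
        # Machines that can take one more edge; fall back to all if none can.
        open_machines = {k for k in range(p) if loads[k] + 1 < cap} or set(range(p))
        tiers = (spans[u] & spans[v], spans[u] | spans[v], set(range(p)))
        for tier in tiers:
            candidates = tier & open_machines
            if candidates:
                break
        m = min(candidates, key=lambda k: (loads[k], k))  # least loaded, lowest index
        assignment.append(m)
        loads[m] += 1
        spans[u].add(m)
        spans[v].add(m)
    return assignment, spans, loads


# Hypothetical star graph: hub 0 with eight leaves, placed on 2 machines.
edges = [(0, leaf) for leaf in range(1, 9)]
assignment, spans, loads = greedy_place(edges, p=2)
```

On this example the hub ends up on both machines with four edges each, and every leaf has a single replica. Without the load cap, the same preference rules would pile every edge of a connected graph onto one machine.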
![Page 44: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/44.jpg)
Partitioning Performance
Twitter Graph: 41M vertices, 1.4B edges
[Figure, Cost: avg # of machines spanned (axis 2 to 18) versus number of machines (8 to 64); lower is better]
[Figure, Construction Time: partitioning time in seconds (axis 0 to 1000) versus number of machines (8 to 64); lower is better]
Oblivious balances cost and partitioning time.
![Page 45: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/45.jpg)
Greedy Vertex-Cuts Improve Performance
[Figure: runtime relative to Random (axis 0 to 1) for PageRank, Collaborative Filtering, and Shortest Path under Random, Oblivious, and Coordinated placement]
Greedy partitioning improves computation performance.
![Page 46: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/46.jpg)
Other Features (See Paper)
• Supports three execution modes:
  – Synchronous: Bulk-Synchronous GAS Phases
  – Asynchronous: Interleave GAS Phases
  – Asynchronous + Serializable: Neighboring vertices do not run simultaneously
• Delta Caching
  – Accelerate gather phase by caching partial sums for each vertex
![Page 47: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/47.jpg)
System Evaluation
![Page 48: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/48.jpg)
System Design
[Figure: the PowerGraph (GraphLab2) System layered on MPI/TCP-IP, PThreads, and HDFS, running on EC2 HPC Nodes]
• Implemented as C++ API
• Uses HDFS for Graph Input and Output
• Fault-tolerance is achieved by check-pointing
  – Snapshot time < 5 seconds for twitter network
![Page 49: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/49.jpg)
Implemented Many Algorithms
• Collaborative Filtering
  – Alternating Least Squares
  – Stochastic Gradient Descent
  – SVD
  – Non-negative MF
• Statistical Inference
  – Loopy Belief Propagation
  – Max-Product Linear Programs
  – Gibbs Sampling
• Graph Analytics
  – PageRank
  – Triangle Counting
  – Shortest Path
  – Graph Coloring
  – K-core Decomposition
• Computer Vision
  – Image stitching
• Language Modeling
  – LDA
![Page 50: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/50.jpg)
Comparison with GraphLab & Pregel
• PageRank on Synthetic Power-Law Graphs:
[Figure, Communication: total network (GB, axis 0 to 10) versus power-law constant α for Pregel (Piccolo) and GraphLab, annotated "High-degree vertices"]
[Figure, Runtime: seconds (axis 0 to 30) versus power-law constant α for Pregel (Piccolo) and GraphLab, annotated "High-degree vertices"]
PowerGraph is robust to high-degree vertices.
![Page 51: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/51.jpg)
PageRank on the Twitter Follower Graph
Natural Graph with 40M Users, 1.4 Billion Links
[Figure, Communication: total network (GB) for GraphLab, Pregel (Piccolo), and PowerGraph] Reduces Communication
[Figure, Runtime: seconds for GraphLab, Pregel (Piccolo), and PowerGraph] Runs Faster
32 Nodes x 8 Cores (EC2 HPC cc1.4x)
![Page 52: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/52.jpg)
PowerGraph is Scalable
Yahoo Altavista Web Graph (2002): One of the largest publicly available web graphs
1.4 Billion Webpages, 6.6 Billion Links
1024 Cores (2048 HT), 64 HPC Nodes
7 Seconds per Iter.
1B links processed per second
30 lines of user code
![Page 53: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/53.jpg)
Topic Modeling
• English language Wikipedia
  – 2.6M Documents, 8.3M Words, 500M Tokens
  – Computationally intensive algorithm
[Figure: bar chart of Million Tokens Per Second (axis 0 to 160) for Smola et al. and PowerGraph]
Smola et al.: 100 Yahoo! Machines, specifically engineered for this task
PowerGraph: 64 cc2.8xlarge EC2 Nodes, 200 lines of code & 4 human hours
![Page 54: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/54.jpg)
Triangle Counting on The Twitter Graph
Identify individuals with strong communities.
Counted: 34.8 Billion Triangles
Hadoop [WWW'11]: 1536 Machines, 423 Minutes
PowerGraph: 64 Machines, 1.5 Minutes
282x Faster
Why? Wrong Abstraction → Broadcast O(degree^2) messages per Vertex
S. Suri and S. Vassilvitskii, "Counting triangles and the curse of the last reducer," WWW'11
![Page 55: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/55.jpg)
Summary
• Problem: Computation on Natural Graphs is challenging
  – High-degree vertices
  – Low-quality edge-cuts
• Solution: PowerGraph System
  – GAS Decomposition: split vertex programs
  – Vertex-partitioning: distribute natural graphs
• PowerGraph theoretically and experimentally outperforms existing graph-parallel systems.
![Page 56: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/56.jpg)
Machine Learning and Data-Mining Toolkits
Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering
PowerGraph (GraphLab2) System
![Page 57: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/57.jpg)
Future Work
• Time evolving graphs
  – Support structural changes during computation
• Out-of-core storage (GraphChi)
  – Support graphs that don't fit in memory
• Improved Fault-Tolerance
  – Leverage vertex replication to reduce snapshots
  – Asynchronous recovery
![Page 58: Joseph Gonzalez - Peoplejegonzal/assets/slides/...Joseph Gonzalez Yucheng Low Danny Bickson Distributed Graph-Parallel Computation on Natural Graphs Haijie Gu Joint work with: Carlos](https://reader033.fdocuments.in/reader033/viewer/2022052802/5f1e96d3e34727409c6062ff/html5/thumbnails/58.jpg)
PowerGraph is GraphLab Version 2.1
Apache 2 License
http://graphlab.org
Documentation… Code… Tutorials… (more on the way)