12/6/16
Big Data: Graph Processing
COS 418: Distributed Systems, Lecture 21
Kyle Jamieson
[Content adapted from J. Gonzalez]
[Figure: a graph linking a patient presenting abdominal pain to a diagnosis — the patient ate food, which contains an ingredient, purchased from a store, also sold to another patient diagnosed with an E. coli infection.]
Big Data is Everywhere
• Machine learning is a reality
• How will we design and implement “Big Learning” systems?
• 72 hours a minute of YouTube video, 28 million Wikipedia pages, 900 million Facebook users, 6 billion Flickr photos
We could use…
Threads, Locks, & Messages: “low-level parallel primitives”
Shift Towards Use of Parallelism in ML
• GPUs, multicore, clusters, clouds, supercomputers
• Programmers repeatedly solve the same parallel design challenges: race conditions, distributed state, communication…
• Resulting code is very specialized: difficult to maintain, extend, debug…
• Idea: avoid these problems by using high-level abstractions
…a better answer: MapReduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions.
[Figure: MapReduce map phase — CPUs 1–4 each compute image features independently. Embarrassingly parallel independent computation; no communication needed.]
[Figure: the map phase computes one feature per image on CPUs 1–4 in parallel; the reduce phase then aggregates outdoor-picture and indoor-picture statistics on separate CPUs.]
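The map and reduce phases illustrated above can be sketched in a few lines of plain Python. This is a minimal single-machine sketch of the image-statistics example; the function names, toy "feature," and input data are illustrative, not from the lecture.

```python
from collections import defaultdict

# Map phase: embarrassingly parallel -- each image is processed
# independently, with no communication between workers.
def map_image(image):
    label, pixels = image
    feature = sum(pixels) / len(pixels)   # toy stand-in for an image feature
    return (label, feature)               # emit a key-value pair

# Reduce phase: values grouped by key are aggregated into one statistic.
def reduce_features(label, features):
    return (label, sum(features))

images = [("outdoor", [12, 14]), ("indoor", [24, 25]),
          ("outdoor", [42, 43]), ("indoor", [84, 85])]

# Shuffle: group mapper output by key before reducing.
groups = defaultdict(list)
for key, value in map(map_image, images):
    groups[key].append(value)

stats = dict(reduce_features(k, v) for k, v in groups.items())
```

A real MapReduce system would run the mappers on separate machines and move the grouped values over the network, but the dataflow is exactly this map → shuffle → reduce pipeline.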
Map-Reduce for Data-Parallel ML
• Excellent for large data-parallel tasks!
• Data-parallel (MapReduce): feature extraction, algorithm tuning, basic data processing
• Graph-parallel: belief propagation, label propagation, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, lasso
• Is there more to machine learning? Exploiting dependencies
Graphs are Everywhere
• Collaborative filtering: Netflix users–movies graph
• Text analysis: Wiki docs–words graph
• Probabilistic analysis: social network graph
Concrete Example: Label Propagation
• Social arithmetic (what do I like?):
  – 50% what I list on my profile (50% cameras, 50% biking)
  – 40% what Sue Ann likes (80% cameras, 20% biking)
  – 10% what Carlos likes (30% cameras, 70% biking)
  – I like: 60% cameras, 40% biking
• Recurrence algorithm: Likes[i] = Σ_{j ∈ Friends[i]} W_ij × Likes[j]
  – iterate until convergence
• Parallelism: compute all Likes[i] in parallel
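One step of this recurrence, using the slide's numbers, can be sketched as follows. The dictionary names and the idea of modeling "what I list on my profile" as a self-edge weight are illustrative assumptions.

```python
# Label-propagation recurrence from the slide:
#   Likes[i] = sum over j in Friends[i] of W[i][j] * Likes[j]
# The 50% "what I list on my profile" term is modeled as a self-edge.
likes = {
    "me":      {"cameras": 0.5, "biking": 0.5},
    "sue_ann": {"cameras": 0.8, "biking": 0.2},
    "carlos":  {"cameras": 0.3, "biking": 0.7},
}
weights = {"me": {"me": 0.5, "sue_ann": 0.4, "carlos": 0.1}}

def update(i):
    """One step of the recurrence for vertex i."""
    return {
        topic: sum(w * likes[j][topic] for j, w in weights[i].items())
        for topic in likes[i]
    }

likes["me"] = update("me")   # 60% cameras, 40% biking
```

Running this update for every vertex, repeatedly until the values stop changing, is exactly the "iterate until convergence" loop on the slide.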
Properties of Graph-Parallel Algorithms
• Dependency graph
• Factored computation: what I like depends on what my friends like
• Iterative computation
Map-Reduce for Data-Parallel ML
• Excellent for large data-parallel tasks — but what about the graph-parallel side? MapReduce?
• Data-parallel (MapReduce): feature extraction, algorithm tuning, basic data processing
• Graph-parallel (MapReduce?): belief propagation, label propagation, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, lasso
Problem: Data Dependencies
• MapReduce doesn’t efficiently express data dependencies
  – User must code substantial data transformations
  – Costly data replication
[Figure: independent data rows spread across processors, one of them slow.]
Iterative Algorithms
• MapReduce doesn’t efficiently express iterative algorithms
[Figure: data partitioned across CPUs 1–3; every iteration reprocesses all the data, with a barrier between iterations.]
Map Abuse: Iterative MapReduce
• Only a subset of data needs computation
[Figure: the same iterate-with-barriers pattern, even though only some of the data changes in each iteration.]
Map Abuse: Iterative MapReduce
• System is not optimized for iteration
[Figure: each iteration pays a startup penalty and a disk penalty on every CPU, separated by barriers.]
ML Tasks Beyond Data-Parallelism
• Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics
• Graph-parallel:
  – Graphical models: Gibbs sampling, belief propagation, variational optimization
  – Semi-supervised learning: label propagation, CoEM
  – Graph analysis: PageRank, triangle counting
  – Collaborative filtering: tensor factorization
23
• Limited CPUPower• LimitedMemory• Limited Scalability
DistributedCloud
Challenges:- Distribute state- Keep data consistent- Provide fault tolerance
24
Scaleupcomputationalresources!
The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Consistency model
Data Graph
Data is associated with both vertices and edges.
• Graph: social network
• Vertex data: user profile, current interests estimates
• Edge data: relationship (friend, classmate, relative)
Distributed Data Graph
Partition the graph across multiple machines:
• “Ghost” vertices maintain the adjacency structure and replicate remote data
• Cut efficiently using HPC graph-partitioning tools (ParMetis / Scotch / …)
The GraphLab Framework (continued): update functions — user computation
Update Function
A user-defined program, applied to a vertex; it transforms the data in the scope of that vertex and selectively triggers computation at neighbors:

    Pagerank(scope) {
      // Update the current vertex data
      vertex.PageRank = α
      ForEach inPage:
        vertex.PageRank += (1 − α) × inPage.PageRank
      // Reschedule neighbors if needed
      if vertex.PageRank changed then
        reschedule_all_neighbors;
    }

The update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation.
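A minimal single-threaded sketch of this execution model: an update function recomputes one vertex and reschedules its neighbors when its value changes. The graph, tolerance, and queue details are illustrative, and the standard out-degree normalization (which the slide's simplified rule omits) is added so the asynchronous iteration converges.

```python
from collections import deque

ALPHA, TOL = 0.15, 1e-6

# Hypothetical 3-page link graph: in_nbrs[v] lists pages linking to v.
in_nbrs  = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
out_nbrs = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
rank = {v: 1.0 for v in in_nbrs}

def pagerank_update(v):
    """Recompute one vertex from its in-neighbors; report if it changed."""
    new = ALPHA + sum((1 - ALPHA) * rank[u] / len(out_nbrs[u])
                      for u in in_nbrs[v])
    changed = abs(new - rank[v]) > TOL
    rank[v] = new
    return changed

# Scheduler loop: apply update functions until no vertex is rescheduled.
queue, queued = deque(rank), set(rank)
while queue:
    v = queue.popleft()
    queued.discard(v)
    if pagerank_update(v):           # value changed: reschedule neighbors
        for u in out_nbrs[v]:
            if u not in queued:
                queue.append(u)
                queued.add(u)
```

GraphLab runs many such updates concurrently across machines; the queue here plays the role of the per-machine scheduler described on the next slide.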
Distributed Scheduling
Each machine maintains a schedule over the vertices it owns.
[Figure: two machines, each holding a queue of its own vertices (a–k).]
Distributed consensus is used to identify completion.
Ensuring Race-Free Code
• How much can computation overlap?
The GraphLab Framework (continued): consistency model
PageRank Revisited

    Pagerank(scope) {
      tmp = α
      ForEach inPage:
        tmp += (1 − α) × inPage.PageRank
      vertex.PageRank = tmp
      …
    }

PageRank data races confound convergence.
Racing PageRank: Bug
If the update accumulates directly into vertex.PageRank, a concurrently running neighbor can read a partially computed value mid-update.

Racing PageRank: Bug Fix
Accumulate into a temporary (tmp) and assign vertex.PageRank = tmp once, so readers never observe a partial sum.
Throughput ≠ Performance
No consistency ⇒ higher throughput (#updates/sec), but potentially slower convergence of the ML algorithm.
Serializability
For every parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a parallel schedule on CPU 1 and CPU 2 is equivalent to some sequential schedule on a single CPU.]
Serializability Example: Edge Consistency
• Update functions one vertex apart can be run in parallel — the overlapping regions are only read.
• Stronger/weaker consistency levels are available; user-tunable consistency levels trade off parallelism and consistency.
Distributed Consistency
• Solution 1: Chromatic Engine — edge consistency via graph coloring
• Solution 2: Distributed Locking
Chromatic Distributed Engine
Over time, each machine executes tasks on all of its vertices of color 0, then performs ghost synchronization (completion + barrier); then all vertices of color 1, followed by another ghost synchronization and barrier; and so on.
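The chromatic idea can be sketched in a few lines: color the graph greedily, then process vertices color by color. Because same-colored vertices never share an edge, their updates satisfy edge consistency even if run concurrently. The tiny graph and greedy coloring here are illustrative; a real engine would use a precomputed coloring and a real barrier.

```python
# Undirected toy graph as a symmetric adjacency map (illustrative).
nbrs = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}

# Greedy coloring: give each vertex the smallest color unused by neighbors.
color = {}
for v in sorted(nbrs):
    used = {color[u] for u in nbrs[v] if u in color}
    color[v] = next(c for c in range(len(nbrs)) if c not in used)

# Chromatic execution: run all vertices of color 0, barrier, color 1, ...
order = []
for c in sorted(set(color.values())):
    batch = [v for v in sorted(color) if color[v] == c]
    order.extend(batch)   # in a real engine: run this batch in parallel,
                          # then ghost synchronization + barrier

# Sanity check: no two adjacent vertices share a color.
conflict = any(color[v] == color[u] for v in nbrs for u in nbrs[v])
```

The barriers between colors are exactly why the next slide notes the chromatic engine is inefficient when only a few vertices are active.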
Matrix Factorization
• Netflix collaborative filtering
• Alternating least squares (ALS) matrix factorization
• Model: 0.5 million nodes, 99 million edges — a bipartite users–movies graph with D-dimensional latent factors
Netflix Collaborative Filtering
[Figure, left: speedup vs. #machines (4–64) for d = 5 (1.0M cycles), d = 20 (2.1M cycles), d = 50 (7.7M cycles), and d = 100 (30M cycles); larger d tracks the ideal speedup more closely. Figure, right: runtime (s) vs. #machines at d = 20, log scale — GraphLab runs orders of magnitude faster than Hadoop and close to MPI.]
DistributedConsistency
• Solution1:ChromaticEngine– EdgeConsistencyviaGraphColoring– Requiresagraphcoloringtobeavailable– Frequentbarriersà inefficient whenonlysomeverticesactive
• Solution2:DistributedLocking
Distributed Locking
Edge consistency can be guaranteed through locking (one reader–writer lock per vertex).
Consistency through locking: acquire a write-lock on the center vertex and read-locks on the adjacent vertices.
Performance problem: acquiring a lock from a neighboring machine incurs a latency penalty.
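Acquiring a whole scope of locks safely can be sketched as below. Plain mutexes stand in for the slide's reader–writer locks, and taking locks in one global (sorted) order is a standard deadlock-avoidance technique that the slide does not spell out; graph and names are illustrative.

```python
import threading

# One lock per vertex (a stand-in for per-vertex RW locks).
locks = {v: threading.Lock() for v in ("a", "b", "c")}
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

def locked_update(v, update_fn):
    """Acquire the scope of v (center + adjacent vertices), then update.

    Locks are taken in a single global (sorted) order, so two threads
    with overlapping scopes cannot deadlock each other.
    """
    scope = sorted([v] + nbrs[v])
    for u in scope:
        locks[u].acquire()
    try:
        return update_fn(v)
    finally:
        for u in reversed(scope):
            locks[u].release()

result = locked_update("b", lambda v: "updated " + v)
```

In the distributed setting each `acquire` on a remote vertex costs a network round trip, which is the latency problem pipelining addresses next.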
Simple Locking
[Timeline: lock scope 1 → process request 1 → scope 1 acquired → update_function 1 → release scope 1 → process release 1.]
Pipelining Hides Latency
GraphLab idea: hide latency using pipelining.
[Timeline: lock requests for scopes 1, 2, and 3 are issued back-to-back; while request 1 is outstanding, requests 2 and 3 are processed, so scopes 2 and 3 are already acquired by the time update_function 1 releases scope 1.]
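The benefit of the two timelines can be captured with a toy latency model. All numbers here are illustrative assumptions, not measurements from the lecture: each remote lock acquisition costs one round trip (RTT), and each update costs UPDATE once its scope is held.

```python
# Toy model of the two timelines above (illustrative numbers).
RTT, UPDATE, N = 10.0, 1.0, 100   # round trip, update cost, #scopes

# Simple locking: request a scope, wait for the grant, update, repeat --
# the round trip is paid on every single update.
simple_total = N * (RTT + UPDATE)

# Pipelined locking: keep many lock requests in flight, so after the
# first grant arrives an acquired scope is (ideally) always ready.
pipelined_total = RTT + N * UPDATE
```

Under this model, pipelining pays the round-trip latency roughly once instead of N times, which matches the 472 s → 10 s improvement reported on the next slide in spirit.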
Distributed Consistency
• Solution 1: Chromatic Engine — edge consistency via graph coloring; requires a coloring to be available; frequent barriers → inefficient when only some vertices are active
• Solution 2: Distributed Locking — Residual BP on a 190K-vertex / 560K-edge graph, 4 machines: no pipelining, 472 s; with pipelining, 10 s
How to handle machine failure?
• What happens when machines fail? How do we provide fault tolerance?
• Strawman scheme: synchronous snapshot checkpointing
  1. Stop the world
  2. Write each machine’s state to disk
Snapshot Performance
[Figure: vertices updated vs. time elapsed, with one slow machine. The synchronous snapshot stalls all machines for the whole snapshot time while the slow machine writes its state; the no-snapshot line keeps climbing.]
How can we do better, leveraging GraphLab’s consistency mechanisms?
Chandy-Lamport checkpointing
• Step 1. Atomically, one initiator: (a) turns red, (b) records its own state, (c) sends a marker to its neighbors.
• Step 2. On receiving a marker, a non-red node atomically: (a) turns red, (b) records its own state, (c) sends markers along all outgoing channels.
• Assumes first-in, first-out channels between nodes.
• Implemented within GraphLab as an update function.
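The two marker-passing steps above can be sketched as follows. The node names, states, and topology are illustrative, FIFO channels are modeled by a single delivery queue, and the recording of in-flight channel contents (part of the full Chandy-Lamport algorithm) is omitted for brevity.

```python
from collections import deque

channels = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # outgoing channels
state = {"a": 10, "b": 20, "c": 30}                   # local node states

snapshot = {}        # node -> recorded state ("red" nodes)
in_flight = deque()  # markers currently in transit

def on_marker(node):
    if node in snapshot:            # already red: ignore further markers
        return
    snapshot[node] = state[node]    # turn red + record own state
    for peer in channels[node]:     # send markers on all outgoing channels
        in_flight.append(peer)

on_marker("a")                      # the initiator starts the snapshot
while in_flight:
    on_marker(in_flight.popleft())  # deliver markers in FIFO order
```

Because each node records its state independently when the marker arrives, no global stop-the-world pause is needed, which is what makes the asynchronous snapshot on the next slide cheap.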
Async. Snapshot Performance
[Figure: vertices updated vs. time elapsed with one slow machine; the asynchronous snapshot line tracks the no-snapshot line, unlike the synchronous snapshot.]
No system performance penalty incurred from the slow machine!
Summary
• Two different methods of achieving consistency: graph coloring, and distributed locking with pipelining
• Efficient implementations
• Asynchronous fault tolerance with fine-grained Chandy-Lamport snapshots
• Goals: performance, usability, efficiency, scalability
Friday precept: Roofnet performance; more graph processing
Monday topic: Streaming Data Processing