Big Data: Graph Processing · COS 418: Distributed Systems, Lecture 21


12/6/16

Big Data: Graph Processing

COS 418: Distributed Systems, Lecture 21

Kyle Jamieson

[Content adapted from J. Gonzalez]

Patient presents abdominal pain. Diagnosis?

[Figure: a chain of linked records. Patient ate … which contains … purchased from … Also sold to … Diagnoses with E. coli infection.]

Big Data is Everywhere

• 72 Hours a Minute: YouTube
• 28 Million Wikipedia Pages
• 900 Million Facebook Users
• 6 Billion Flickr Photos

• Machine learning is a reality
• How will we design and implement “Big Learning” systems?

We could use ….

Threads, Locks, & Messages: “Low-level parallel primitives”


Shift Towards Use Of Parallelism in ML

GPUs · Multicore · Clusters · Clouds · Supercomputers

• Programmers repeatedly solve the same parallel design challenges:
  – Race conditions, distributed state, communication…
• Resulting code is very specialized:
  – Difficult to maintain, extend, debug…

Idea: Avoid these problems by using high-level abstractions

... a better answer: MapReduce / Hadoop

Build learning algorithms on top of high-level parallel abstractions

MapReduce – Map Phase

[Figure: CPU 1 to CPU 4 each compute Image Features (values such as 12.9, 42.3, 21.3, 25.8) from their own images.]

• Embarrassingly parallel independent computation
• No communication needed


MapReduce – Map Phase

[Figure: CPU 1 to CPU 4 continue over the remaining images, each emitting Image Features independently.]

• Embarrassingly parallel independent computation

MapReduce – Reduce Phase

[Figure: CPU 1 and CPU 2 aggregate the Image Features, labeled indoor (I) or outdoor (O) as I O O I I I O O I O I I, into Outdoor Picture Statistics and Indoor Picture Statistics.]
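The two phases above can be sketched in a few lines of Python. This is a minimal single-process illustration, not Hadoop's API: the names `map_phase` and `reduce_phase`, the toy pixel data, and the mean-intensity "feature" are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def map_phase(images, extract):
    """Apply `extract` to each image independently (no communication needed)."""
    return [(label, extract(pixels)) for label, pixels in images]

def reduce_phase(pairs):
    """Group the features by label and aggregate each group into one statistic."""
    groups = defaultdict(list)
    for label, feature in pairs:
        groups[label].append(feature)
    return {label: mean(values) for label, values in groups.items()}

# Hypothetical toy data: (label, pixel intensities); "I" = indoor, "O" = outdoor.
images = [("I", [10, 20]), ("O", [40, 60]), ("O", [30, 50]), ("I", [20, 30])]
features = map_phase(images, extract=lambda px: sum(px) / len(px))
stats = reduce_phase(features)
```

Each call to `extract` touches only one image, which is why the map phase parallelizes trivially; only the reduce phase needs data from several workers.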

Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

Data-Parallel (MapReduce):
  – Feature Extraction
  – Algorithm Tuning
  – Basic Data Processing

Graph-Parallel (Is there more to Machine Learning? Exploiting Dependencies):
  – Belief Propagation
  – Label Propagation
  – Kernel Methods
  – Deep Belief Networks
  – Neural Networks
  – Tensor Factorization
  – PageRank
  – Lasso


Graphs are Everywhere

• Collaborative Filtering (Netflix): Users and Movies
• Text Analysis (Wiki): Docs and Words
• Social Network
• Probabilistic Analysis

Concrete Example: Label Propagation

Label Propagation Algorithm

• Social Arithmetic:
  – 50% What I list on my profile: 50% Cameras, 50% Biking
  – 40% Sue Ann Likes: 80% Cameras, 20% Biking
  – 10% Carlos Like: 30% Cameras, 70% Biking
  – I Like: 60% Cameras, 40% Biking
• Recurrence Algorithm:

$$\text{Likes}[i] = \sum_{j \in \text{Friends}[i]} W_{ij} \times \text{Likes}[j]$$

  – iterate until convergence
• Parallelism:
  – Compute all Likes[i] in parallel
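The recurrence can be checked against the numbers on the slide with a short Python sketch. The function names `weighted_likes` and `iterate`, the dictionary layout, and the choice to hold vertices without incoming weights fixed are assumptions made for illustration.

```python
def weighted_likes(weights_i, likes):
    """Likes[i] = sum over j of W_ij * Likes[j], computed per topic."""
    topics = {t for dist in likes.values() for t in dist}
    return {t: sum(w * likes[j].get(t, 0.0) for j, w in weights_i.items())
            for t in topics}

def iterate(weights, likes, tol=1e-9, max_rounds=1000):
    """Recompute every weighted vertex each round until values stop changing."""
    for _ in range(max_rounds):
        # Vertices with no entry in `weights` keep their value (assumed fixed).
        new = {i: weighted_likes(weights[i], likes) if i in weights else likes[i]
               for i in likes}
        delta = max(abs(new[i][t] - likes[i][t]) for i in likes for t in likes[i])
        likes = new
        if delta < tol:
            break
    return likes

likes = {
    "profile": {"cameras": 0.5, "biking": 0.5},
    "sue_ann": {"cameras": 0.8, "biking": 0.2},
    "carlos": {"cameras": 0.3, "biking": 0.7},
    "me": {"cameras": 0.0, "biking": 0.0},
}
weights = {"me": {"profile": 0.5, "sue_ann": 0.4, "carlos": 0.1}}
result = iterate(weights, likes)
```

With these inputs `result["me"]` works out to 0.5·0.5 + 0.4·0.8 + 0.1·0.3 = 0.60 for cameras and 0.40 for biking, matching the slide. Within a round each `Likes[i]` depends only on the previous round's values, so all of them could be computed in parallel.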

Properties of Graph Parallel Algorithms

• Dependency Graph
• Iterative Computation (What I Like / What My Friends Like)
• Factored Computation


Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

Data-Parallel (MapReduce):
  – Feature Extraction
  – Algorithm Tuning
  – Basic Data Processing

Graph-Parallel (MapReduce?):
  – Belief Propagation
  – Label Propagation
  – Kernel Methods
  – Deep Belief Networks
  – Neural Networks
  – Tensor Factorization
  – PageRank
  – Lasso

Problem: Data Dependencies

• MapReduce doesn’t efficiently express data dependencies
  – User must code substantial data transformations
  – Costly data replication

[Figure: Independent Data Rows]

Iterative Algorithms

• MR doesn’t efficiently express iterative algorithms:

[Figure: Iterations. In each iteration CPU 1, CPU 2, and CPU 3 process their Data blocks and then wait at a Barrier; one CPU is marked Slow Processor.]

MapAbuse: Iterative MapReduce

• Only a subset of data needs computation:

[Figure: Iterations. All Data blocks pass through CPU 1, CPU 2, and CPU 3 in every iteration, with a Barrier after each.]


MapAbuse: Iterative MapReduce

• System is not optimized for iteration:

[Figure: Iterations. Each pass over CPU 1, CPU 2, and CPU 3 incurs a Startup Penalty and a Disk Penalty.]

ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce):
  – Cross Validation
  – Feature Extraction
  – Computing Sufficient Statistics

Graph-Parallel:
  – Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
  – Semi-Supervised Learning: Label Propagation, CoEM
  – Graph Analysis: PageRank, Triangle Counting
  – Collaborative Filtering: Tensor Factorization

• Limited CPU Power
• Limited Memory
• Limited Scalability

Scale up computational resources!

Distributed Cloud

Challenges:
- Distribute state
- Keep data consistent
- Provide fault tolerance


The GraphLab Framework

• Graph Based Data Representation
• Update Functions (User Computation)
• Consistency Model

Data Graph

Data is associated with both vertices and edges

Graph:
• Social Network

Vertex Data:
• User profile
• Current interests estimates

Edge Data:
• Relationship (friend, classmate, relative)

Distributed Data Graph

Partition the graph across multiple machines:
• “Ghost” vertices maintain adjacency structure and replicate remote data.


• Cut efficiently using HPC graph partitioning tools (ParMetis / Scotch / …)

[Figure: a partitioned graph with “ghost” vertices along the cut.]
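To make the ghost-vertex idea concrete, here is a minimal Python sketch that derives each machine's ghost set from an edge list and an ownership map. The function name `ghost_vertices` and the hand-written partition are illustrative assumptions; a real deployment would obtain the partition from a tool such as ParMetis rather than hard-coding it.

```python
def ghost_vertices(edges, owner):
    """Map each machine to the remote vertices it must replicate as ghosts."""
    ghosts = {machine: set() for machine in set(owner.values())}
    for u, v in edges:
        if owner[u] != owner[v]:
            # A cut edge: each side keeps a local replica of the other endpoint.
            ghosts[owner[u]].add(v)
            ghosts[owner[v]].add(u)
    return ghosts

# Hypothetical 4-cycle split across two machines (0 and 1).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
owner = {"a": 0, "b": 0, "c": 1, "d": 1}
ghosts = ghost_vertices(edges, owner)
```

Only cut edges create ghosts, which is why a partition with a small edge cut keeps the replicated data, and the traffic needed to keep it fresh, small.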


Update Function

A user-defined program, applied to a vertex; transforms data in scope of vertex

Pagerank(scope) {
  // Update the current vertex data
  vertex.PageRank = α
  ForEach inPage:
    vertex.PageRank += (1−α) × inPage.PageRank

  // Reschedule Neighbors if needed
  if vertex.PageRank changes then reschedule_all_neighbors;
}

Selectively triggers computation at neighbors

Update function applied (asynchronously) in parallel until convergence

Many schedulers available to prioritize computation
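The update-function pattern can be sketched as a single-threaded Python loop. This is not GraphLab's API: the function name `pagerank_async`, the FIFO scheduler, and the tolerance `tol` are assumptions. The sketch also divides each in-neighbor's rank by its out-degree, as standard PageRank does, which the slide's pseudocode omits.

```python
from collections import deque

def pagerank_async(in_nbrs, out_nbrs, alpha=0.15, tol=1e-6):
    """Run a PageRank update function until no vertex remains scheduled."""
    rank = {v: 1.0 for v in in_nbrs}
    queue = deque(in_nbrs)        # FIFO scheduler: every vertex starts scheduled
    scheduled = set(in_nbrs)
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        # Update the current vertex data from its in-neighbors.
        new_rank = alpha + (1 - alpha) * sum(
            rank[u] / len(out_nbrs[u]) for u in in_nbrs[v]
        )
        changed = abs(new_rank - rank[v]) > tol
        rank[v] = new_rank
        # Reschedule the vertices that read this rank, but only if it changed.
        if changed:
            for w in out_nbrs[v]:
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank

# Hypothetical 3-page graph: a -> b, a -> c, b -> c, c -> a.
in_nbrs = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}
out_nbrs = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_async(in_nbrs, out_nbrs)
```

Vertices whose inputs have stopped changing are never rescheduled, so work concentrates on the part of the graph that has not yet converged. Swapping the `deque` for a priority queue would give a prioritized scheduler.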

Distributed Scheduling

[Figure: a graph with vertices a to k split across machines, with each machine's queue of scheduled vertices.]

Each machine maintains a schedule over the vertices it owns

Distributed Consensus used to identify completion


Ensuring Race-Free Code

• How much can computation overlap?


PageRank Revisited

Pagerank(scope) {
  vertex.PageRank = α
  ForEach inPage:
    vertex.PageRank += (1−α) × inPage.PageRank
  …
}

PageRank data races confound convergence

Racing PageRank: Bug

Pagerank(scope) {
  vertex.PageRank = α
  ForEach inPage:
    vertex.PageRank += (1−α) × inPage.PageRank
  …
}

The vertex's PageRank is overwritten while the sum is still being accumulated, so a neighbor that reads it concurrently can observe a partial value.

Racing PageRank: Bug Fix

Pagerank(scope) {
  tmp = α
  ForEach inPage:
    tmp += (1−α) × inPage.PageRank
  vertex.PageRank = tmp
  …
}
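The difference between the buggy and fixed update can be made visible without real threads. In the Python sketch below, a hypothetical `observe` callback stands in for a concurrent reader that samples the vertex's rank in the middle of the update; the function names and toy values are assumptions made for illustration.

```python
ALPHA = 0.15

def update_buggy(v, rank, in_nbrs, observe):
    """Accumulate directly into rank[v]; partial sums are visible to readers."""
    rank[v] = ALPHA
    for u in in_nbrs[v]:
        rank[v] += (1 - ALPHA) * rank[u]
        observe(rank[v])          # what a concurrent reader could see here

def update_fixed(v, rank, in_nbrs, observe):
    """Accumulate into a local tmp and publish the result with one write."""
    tmp = ALPHA
    for u in in_nbrs[v]:
        tmp += (1 - ALPHA) * rank[u]
        observe(rank[v])          # readers still see the previous value
    rank[v] = tmp

in_nbrs = {"c": ["a", "b"]}
seen_buggy, seen_fixed = [], []
rank_buggy = {"a": 1.0, "b": 1.0, "c": 0.5}
rank_fixed = dict(rank_buggy)
update_buggy("c", rank_buggy, in_nbrs, seen_buggy.append)
update_fixed("c", rank_fixed, in_nbrs, seen_fixed.append)
```

Both versions end with the same rank for `c`, but the buggy one briefly exposes a partial sum that is neither the old value nor the new one. The fixed version still needs the consistency model to make the final write safe; it only removes the intermediate states.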

Throughput != Performance

• No Consistency
• Higher Throughput (#updates/sec)
• Potentially Slower Convergence of ML

Serializability

For every parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: over time, a Parallel execution on CPU 1 and CPU 2 and an equivalent Sequential execution on a Single CPU.]


Serializability Example

Edge Consistency

[Figure: the scopes of two update functions, with Read and Write regions marked.]

• Overlapping regions are only read.
• Update functions one vertex apart can be run in parallel.
• Stronger / Weaker consistency levels available
• User-tunable consistency levels trade off parallelism & consistency

Distributed Consistency

• Solution 1: Chromatic Engine
  – Edge Consistency via Graph Coloring
• Solution 2: Distributed Locking

Chromatic Distributed Engine

[Figure: timeline (Time axis) across machines]
1. Execute tasks on all vertices of color 0
2. Ghost Synchronization Completion + Barrier
3. Execute tasks on all vertices of color 1
4. Ghost Synchronization Completion + Barrier
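The color-by-color schedule can be sketched in Python as follows. The greedy coloring, the function names, and the neighbor-averaging update are assumptions for illustration. The sketch runs each color's batch in a plain loop, where a distributed engine would run the batch in parallel and then synchronize ghosts at a barrier.

```python
def greedy_coloring(nbrs):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in sorted(nbrs):
        used = {color[u] for u in nbrs[v] if u in color}
        color[v] = next(c for c in range(len(nbrs) + 1) if c not in used)
    return color

def chromatic_sweep(nbrs, data, update, color):
    """Run one sweep: all vertices of color 0, then color 1, and so on."""
    for c in sorted(set(color.values())):
        batch = [v for v in nbrs if color[v] == c]
        # No two vertices in `batch` are adjacent, so these updates never
        # write data that another update in the same batch reads.
        for v in batch:
            data[v] = update(v, data, nbrs)
        # A distributed engine would synchronize ghosts and hit a barrier here.
    return data

# Hypothetical path graph a - b - c - d.
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = greedy_coloring(nbrs)
average = lambda v, data, nbrs: sum(data[u] for u in nbrs[v]) / len(nbrs[v])
data = chromatic_sweep(nbrs, {"a": 0, "b": 0, "c": 0, "d": 8}, average, color)
```

Because same-colored vertices share no edge, any execution order within a batch gives the same result, which is what makes the parallel run equivalent to a sequential one.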

Matrix Factorization

• Netflix Collaborative Filtering
  – Alternating Least Squares Matrix Factorization
  – Model: 0.5 million nodes, 99 million edges

[Figure: the Netflix Users × Movies matrix factored into Users and Movies factors of dimension D.]


Netflix Collaborative Filtering

[Figure: Speedup (vs. 4 machines) against #machines (4 to 64), compared with Ideal, for d=100 (30M cycles), d=50 (7.7M cycles), d=20 (2.1M cycles), and d=5 (1.0M cycles).]

[Figure: Runtime (s, log scale) against #machines (4 to 64) for Hadoop, MPI, and GraphLab (D=20).]

Distributed Consistency

• Solution 1: Chromatic Engine
  – Edge Consistency via Graph Coloring
  – Requires a graph coloring to be available
  – Frequent barriers → inefficient when only some vertices active
• Solution 2: Distributed Locking

Distributed Locking

Edge Consistency can be guaranteed through locking.

[Figure: each vertex carries an RW Lock.]

Consistency Through Locking

Acquire write-lock on center vertex, read-lock on adjacent.

Performance problem: Acquiring a lock from a neighboring machine incurs a latency penalty


Simple locking

[Figure: timeline (Time axis). lock scope 1 → Process request 1 → scope 1 acquired → update_function 1 → release scope 1 → Process release 1]

Pipelining hides latency

GraphLab Idea: Hide latency using pipelining

[Figure: timeline (Time axis). lock scope 1, lock scope 2, and lock scope 3 are issued back to back; Process request 1, 2, and 3 run remotely; scope 1 acquired, update_function 1, release scope 1, Process release 1; scope 2 acquired, scope 3 acquired; update_function 2, release scope 2.]

Distributed Consistency

• Solution 1: Chromatic Engine
  – Edge Consistency via Graph Coloring
  – Requires a graph coloring to be available
  – Frequent barriers → inefficient when only some vertices active
• Solution 2: Distributed Locking
  – Residual BP on 190K-vertex / 560K-edge graph, 4 machines
  – No pipelining: 472 sec; with pipelining: 10 sec

How to handle machine failure?

• What happens when machines fail? How do we provide fault tolerance?
• Strawman scheme: Synchronous snapshot checkpointing
  1. Stop the world
  2. Write each machine’s state to disk


Snapshot Performance

[Figure: vertices updated (0 to 2.5 × 10^8) against time elapsed (0 to 150 s) for no snapshot, sync. snapshot, and async. snapshot; annotations mark the Snapshot time and the effect of One slow machine.]

How can we do better, leveraging GraphLab’s consistency mechanisms?

Chandy-Lamport checkpointing

Step 1. Atomically, one initiator: (a) turns red, (b) records its own state, (c) sends marker to neighbors

Step 2. On receiving marker, a non-red node atomically: (a) turns red, (b) records its own state, (c) sends markers along all outgoing channels

First-in, first-out channels between nodes

Implemented within GraphLab as an Update Function
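The two steps can be simulated in a few lines of Python. This is a simplified sketch under stated assumptions: it follows only the marker rules listed above (it does not record in-flight channel messages, as the full Chandy-Lamport algorithm does), it models all channels with one global FIFO queue, and the name `snapshot` is hypothetical rather than GraphLab's.

```python
from collections import deque

def snapshot(out_nbrs, state, initiator):
    """Propagate markers from `initiator`; return each red node's recorded state."""
    red = set()
    recorded = {}
    channel = deque()             # one global FIFO of (sender, receiver) markers

    def turn_red(v):
        # The three actions happen together, mirroring the "atomically" rule.
        red.add(v)
        recorded[v] = state[v]
        for w in out_nbrs[v]:
            channel.append((v, w))

    turn_red(initiator)           # Step 1: the initiator starts the snapshot
    while channel:
        _, receiver = channel.popleft()
        if receiver not in red:   # Step 2 applies to non-red nodes only
            turn_red(receiver)
    return recorded

# Hypothetical graph: a -> b -> c -> a, plus d, which no marker can reach.
out_nbrs = {"a": ["b"], "b": ["c"], "c": ["a"], "d": []}
state = {"a": 1, "b": 2, "c": 3, "d": 4}
recorded = snapshot(out_nbrs, state, "a")
```

Each node records its state when its first marker arrives, so no node waits for a global stop. That per-vertex behavior is what lets the snapshot be expressed as an ordinary update function.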

Async. Snapshot Performance

[Figure: vertices updated (0 to 2.5 × 10^8) against time elapsed (0 to 150 s) for no snapshot, sync. snapshot, and async. snapshot, with One slow machine.]

No system performance penalty incurred from the slow machine!

Summary

• Two different methods of achieving consistency
  – Graph Coloring
  – Distributed Locking with pipelining
• Efficient implementations
• Asynchronous FT w/ fine-grained Chandy-Lamport

Performance · Usability · Efficiency · Scalability


Friday Precept: Roofnet performance, More Graph Processing

Monday topic: Streaming Data Processing