Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos...
![Page 1: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/1.jpg)
Carnegie Mellon
A New Parallel Framework for Machine Learning
Joseph Gonzalez
Joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola
![Page 2: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/2.jpg)
[Figure: relationship graph over entities A, B, C, and D with edge labels "Originates From" and "Lives"]
Is the driver hostile?
![Page 3: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/3.jpg)
Patient presents abdominal pain. Diagnosis?
[Figure: chain of relations: "Patient ate", "which contains", "purchased from", "Also sold to", "Diagnoses with E. coli infection"]
![Page 4: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/4.jpg)
[Figure: Shopper 1 and Shopper 2, with interests Cameras and Cooking]
![Page 5: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/5.jpg)
The Hollywood Fiction…
Mr. Finch develops software which:
• Runs in a “consolidated” data center with access to all government data
• Processes multi-modal data
  – Video surveillance
  – Federal and local databases
  – Social networks
  – …
• Uses advanced machine learning
  – Identifies connected patterns
  – Predicts catastrophic events
![Page 6: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/6.jpg)
…how far is this from reality?
![Page 7: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/7.jpg)
Big Data is a reality
48 Hours a Minute: YouTube
24 Million Wikipedia Pages
750 Million Facebook Users
6 Billion Flickr Photos
![Page 8: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/8.jpg)
Machine learning is a reality
[Figure: Raw Data → Machine Learning → Understanding, illustrated with a linear regression fit through scattered points]
![Page 9: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/9.jpg)
We have mastered:
Big Data + Large-Scale Compute Clusters + Simple Machine Learning
• Limited to simplistic models: fail to fully utilize the data
• Substantial system-building effort: systems evolve slowly and are costly
![Page 10: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/10.jpg)
Advanced Machine Learning
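[Figure: Raw Data → Machine Learning → Understanding. Example models: Deep Belief / Neural Networks and Markov Random Fields, illustrated with a relation graph over Mubarak, Obama, Netanyahu, and Abbas (edge labels: Needs, Supports, Cooperate, Distrusts) and shopper interests (Cameras, Cooking)]
Data dependencies substantially complicate parallelization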
![Page 11: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/11.jpg)
Challenges of Learning at Scale
Wide array of different parallel architectures: GPUs, Multicore, Clusters, Mini Clouds, Clouds
New challenges for designing machine learning algorithms:
• Race conditions and deadlocks
• Managing distributed model state
• Data locality and efficient inter-process coordination
New challenges for implementing machine learning algorithms:
• Parallel debugging and profiling
• Fault tolerance
![Page 12: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/12.jpg)
The goal of the GraphLab project …
Big Data + Large-Scale Compute Clusters + Advanced Machine Learning
• Rich structured machine learning techniques: capable of fully modeling the data dependencies
• Goal: rapid system development
  – Quickly adapt to new data, priors, and objectives
  – Scale with new hardware and system advances
![Page 13: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/13.jpg)
Outline
• Importance of Large-Scale Machine Learning
  – Need to model data dependencies
• Existing Large-Scale Machine Learning Abstractions
  – Need for an efficient graph-structured abstraction
• GraphLab Abstraction
  – Addresses data dependencies
  – Enables the expression of efficient algorithms
• Experimental Results
  – GraphLab dramatically outperforms existing abstractions
• Open Research Challenges
![Page 14: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/14.jpg)
How will we design and implement parallel learning systems?
![Page 15: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/15.jpg)
We could use …
Threads, Locks, & Messages
“low-level parallel primitives”
![Page 16: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/16.jpg)
Threads, Locks, and Messages
ML experts (graduate students) repeatedly solve the same parallel design challenges:
• Implement and debug a complex parallel system
• Tune for a specific parallel platform
• Six months later the conference paper contains: “We implemented ______ in parallel.”
The resulting code:
• is difficult to maintain
• is difficult to extend
• couples the learning model to the parallel implementation
![Page 17: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/17.jpg)
... a better answer:
Map-Reduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions
![Page 18: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/18.jpg)
MapReduce – Map Phase
Embarrassingly parallel independent computation; no communication needed
[Figure: CPU 1 to CPU 4 each compute one value independently: 12.9, 42.3, 21.3, 25.8]
![Page 19: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/19.jpg)
MapReduce – Map Phase
[Figure: CPU 1 to CPU 4 process a second batch of images. Image features so far: 12.9, 42.3, 21.3, 25.8, 24.1, 84.3, 18.4, 84.4]
![Page 20: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/20.jpg)
MapReduce – Map Phase
Embarrassingly parallel independent computation
[Figure: CPU 1 to CPU 4 process a third batch of images. Image features so far: 12.9, 42.3, 21.3, 25.8, 24.1, 84.3, 18.4, 84.4, 17.5, 67.5, 14.9, 34.3]
![Page 21: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/21.jpg)
MapReduce – Reduce Phase
[Figure: CPU 1 and CPU 2 aggregate the image features, each labeled A (attractive face) or U (ugly face), into Attractive Face Statistics and Ugly Face Statistics]
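The two phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `extract_feature`, the image records, and the choice of a per-label mean as the "statistic" are hypothetical, and the slide's numbers are not reused because their pairing with labels cannot be recovered from the transcript.

```python
from collections import defaultdict

def extract_feature(image):
    # Hypothetical stand-in for an image-feature computation.
    return float(sum(image["pixels"])) / len(image["pixels"])

def map_phase(images):
    # Every record is processed independently, so no communication is needed.
    return [(img["label"], extract_feature(img)) for img in images]

def reduce_phase(pairs):
    # Aggregate mapped values per key (here: the mean feature per label).
    sums, counts = defaultdict(float), defaultdict(int)
    for label, value in pairs:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

images = [
    {"label": "A", "pixels": [10, 20, 30]},
    {"label": "U", "pixels": [40, 60]},
    {"label": "A", "pixels": [30, 50]},
]
stats = reduce_phase(map_phase(images))
```

In a real Map-Reduce system the list comprehension in `map_phase` would be spread across machines, and the reduce step would run once per key.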
![Page 22: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/22.jpg)
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Algorithm Tuning, Feature Extraction, Basic Data Processing
Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Is there more to Machine Learning?
![Page 23: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/23.jpg)
Concrete Example
Label Propagation
![Page 24: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/24.jpg)
Label Propagation Algorithm
Social Arithmetic:
  50% What I list on my profile (50% Cameras, 50% Biking)
+ 40% Sue Ann likes (80% Cameras, 20% Biking)
+ 10% Carlos likes (30% Cameras, 70% Biking)
= I like: 60% Cameras, 40% Biking
Recurrence Algorithm:
  Likes[i] = Σ_j Wij × Likes[j], iterate until convergence
Parallelism:
  Compute all Likes[i] in parallel
[Figure: my profile (weight 50%) connected to Sue Ann (weight 40%) and Carlos (weight 10%)]
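The social arithmetic on this slide is a weighted sum, and the recurrence repeats that sum for every person until the values stop changing. A minimal Python sketch follows, assuming the 50/50 interest split belongs to my profile, 80/20 to Sue Ann, and 30/70 to Carlos (the only assignment that reproduces the 60/40 result). The function names and the dictionary layout are illustrative assumptions, not part of GraphLab.

```python
def weighted_likes(contributions):
    # contributions: list of (weight, {interest: fraction}) pairs
    result = {}
    for weight, likes in contributions:
        for interest, fraction in likes.items():
            result[interest] = result.get(interest, 0.0) + weight * fraction
    return result

def propagate(likes, weights, tol=1e-6, max_iters=100):
    # likes: {person: {interest: fraction}}
    # weights: {i: {j: W_ij}}, with one row for every person in `likes`
    for _ in range(max_iters):
        # All Likes[i] are computed from the previous iteration's values,
        # so this step could run in parallel over i.
        new = {i: weighted_likes([(w, likes[j]) for j, w in row.items()])
               for i, row in weights.items()}
        change = max(abs(new[i][k] - likes[i].get(k, 0.0))
                     for i in new for k in new[i])
        likes = new
        if change < tol:
            break
    return likes

me = weighted_likes([
    (0.5, {"Cameras": 0.5, "Biking": 0.5}),  # what I list on my profile
    (0.4, {"Cameras": 0.8, "Biking": 0.2}),  # Sue Ann
    (0.1, {"Cameras": 0.3, "Biking": 0.7}),  # Carlos
])
```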
![Page 25: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/25.jpg)
Properties of Graph-Parallel Algorithms
• Dependency graph
• Factored computation
• Iterative computation: what I like depends on what my friends like
![Page 26: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/26.jpg)
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Algorithm Tuning, Feature Extraction, Basic Data Processing
Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 27: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/27.jpg)
Why not use Map-Reduce for Graph-Parallel Algorithms?
![Page 28: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/28.jpg)
Data Dependencies
Map-Reduce does not efficiently express data dependencies:
• User must code substantial data transformations
• Costly data replication
[Figure: independent data rows]
![Page 29: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/29.jpg)
Iterative Algorithms
Map-Reduce does not efficiently express iterative algorithms:
[Figure: over several iterations, CPU 1 to CPU 3 each process blocks of data and wait at a barrier after every iteration; a single slow processor holds up all the others at the barrier]
![Page 30: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/30.jpg)
MapAbuse: Iterative MapReduce
Only a subset of data needs computation:
[Figure: every iteration reprocesses all data blocks on CPU 1 to CPU 3, with a barrier after each iteration]
![Page 31: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/31.jpg)
MapAbuse: Iterative MapReduce
System is not optimized for iteration:
[Figure: every iteration on CPU 1 to CPU 3 pays a startup penalty and a disk penalty]
![Page 32: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/32.jpg)
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel (Map Reduce? Bulk Synchronous?): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 33: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/33.jpg)
Bulk Synchronous Parallel (BSP)
Implementations: Pregel, Giraph, …
[Figure: alternating Compute and Communicate phases separated by a Barrier]
![Page 34: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/34.jpg)
Problem: Bulk synchronous computation can be highly inefficient.
![Page 35: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/35.jpg)
Problem with Bulk Synchronous
Example algorithm: if a neighbor is Red, then turn Red
• Bulk Synchronous Computation: evaluate the condition on all vertices in every phase
  – 4 phases, each with 9 computations → 36 computations
• Asynchronous Computation (wave-front): evaluate the condition only when a neighbor changes
  – 4 phases, each with 2 computations → 8 computations
[Figure: graph state at Time 0, Time 1, Time 2, Time 3, Time 4]
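The 36-versus-8 count can be reproduced with a small simulation. The slide's actual graph is not recoverable from the transcript, so the sketch below assumes a hypothetical 9-vertex graph (a red root with two chains of four vertices) chosen so that each wave-front phase touches two vertices. All function names are illustrative.

```python
from collections import deque

# Hypothetical graph: red root 0 with two chains, 0-1-2-3-4 and 0-5-6-7-8.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 5), (5, 6), (6, 7), (7, 8)]
N = 9

def adjacency():
    adj = {v: [] for v in range(N)}
    for a, b in EDGES:
        adj[a].append(b)
        adj[b].append(a)
    return adj

def bulk_synchronous(adj, root=0):
    red, phases, computations = {root}, 0, 0
    while len(red) < N:
        newly_red = set()
        for v in range(N):            # every vertex is evaluated in every phase
            computations += 1
            if v not in red and any(u in red for u in adj[v]):
                newly_red.add(v)
        red |= newly_red              # changes become visible at the barrier
        phases += 1
    return phases, computations

def asynchronous(adj, root=0):
    red, computations = {root}, 0
    queue = deque(adj[root])          # only neighbors of a changed vertex
    while queue:
        v = queue.popleft()
        computations += 1
        if v not in red and any(u in red for u in adj[v]):
            red.add(v)
            queue.extend(u for u in adj[v] if u not in red)
    return computations

sync_phases, sync_work = bulk_synchronous(adjacency())
async_work = asynchronous(adjacency())
```

Both versions turn every vertex red; only the amount of work differs.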
![Page 36: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/36.jpg)
Real-World Example: Loopy Belief Propagation
![Page 37: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/37.jpg)
Loopy Belief Propagation (Loopy BP)
• Iteratively estimate the “beliefs” about vertices
  – Read in messages
  – Update marginal estimate (belief)
  – Send updated out messages
• Repeat for all variables until convergence
![Page 38: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/38.jpg)
Bulk Synchronous Loopy BP
• Often considered embarrassingly parallel
  – Associate processor with each vertex
  – Receive all messages
  – Update all beliefs
  – Send all messages
• Proposed by:
  – Brunton et al. CRV’06
  – Mendiburu et al. GECC’07
  – Kang et al. LDMTA’10
  – …
![Page 39: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/39.jpg)
Sequential Computational Structure
![Page 40: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/40.jpg)
Hidden Sequential Structure
![Page 41: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/41.jpg)
Hidden Sequential Structure
• Running time = (time for a single parallel iteration) × (number of iterations)
[Figure: chain model with evidence at both ends]
![Page 42: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/42.jpg)
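Optimal Sequential Algorithm
Running time:
• Bulk Synchronous: 2n²/p, for p ≤ 2n
• Forward-Backward (optimal sequential): 2n, with p = 1
• Optimal Parallel: n, with p = 2
[Figure: the gap between the bulk synchronous running time and the optimal algorithms]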
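Plugging numbers into the running times listed on this slide makes the gap concrete: with the 2n²/p formula, bulk synchronous needs p = n processors just to match single-processor forward-backward. The helper names below are illustrative, and the value n = 1000 is an arbitrary example.

```python
def bulk_synchronous_time(n, p):
    # Slide formula 2n^2/p, stated for p <= 2n.
    if not 1 <= p <= 2 * n:
        raise ValueError("formula is stated for 1 <= p <= 2n")
    return 2 * n * n / p

def forward_backward_time(n):
    # Optimal sequential algorithm on a chain of length n (p = 1).
    return 2 * n

n = 1000
one_cpu = bulk_synchronous_time(n, 1)        # 2,000,000 steps
break_even = bulk_synchronous_time(n, n)     # equals forward-backward
all_cpus = bulk_synchronous_time(n, 2 * n)   # equals the optimal parallel time n
```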
![Page 43: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/43.jpg)
The Splash Operation
• Generalize the optimal chain algorithm to arbitrary cyclic graphs:
  1) Grow a BFS spanning tree with fixed size
  2) Forward pass computing all messages at each vertex
  3) Backward pass computing all messages at each vertex
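The three steps can be sketched as a visiting order over the graph. This is a minimal illustration rather than the authors' implementation: it assumes the forward pass runs from the leaves of the BFS tree toward the root and the backward pass from the root outward, and it leaves the message computation to a caller-supplied `update_vertex` callback (a hypothetical name).

```python
from collections import deque

def grow_bfs_tree(adj, root, max_size):
    # 1) Grow a BFS spanning tree containing at most max_size vertices.
    order, seen, queue = [root], {root}, deque([root])
    while queue and len(order) < max_size:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen and len(order) < max_size:
                seen.add(u)
                order.append(u)
                queue.append(u)
    return order

def splash(adj, root, max_size, update_vertex):
    order = grow_bfs_tree(adj, root, max_size)
    for v in reversed(order):   # 2) forward pass: leaves toward the root
        update_vertex(v)
    for v in order:             # 3) backward pass: root toward the leaves
        update_vertex(v)
    return order

# Usage on a 4-cycle, recording the order in which vertices are updated.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
visits = []
tree = splash(cycle, root=0, max_size=3, update_vertex=visits.append)
```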
![Page 44: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/44.jpg)
Data-Parallel Algorithms can be Inefficient
[Figure: runtime in seconds (0 to 9000) versus number of CPUs (1 to 8), comparing Optimized in Memory Bulk Synchronous with Asynchronous Splash BP]
![Page 45: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/45.jpg)
Summary of Work Efficiency
The Bulk Synchronous model is not work efficient!
• Computes “messages” before they are ready
• Increasing the number of processors increases the overall work
• Costs CPU time and energy!
How do we recover work efficiency?
• Respect the sequential structure of the computation
• Compute “messages” as needed: asynchronously
![Page 46: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/46.jpg)
The Need for a New Abstraction
Map-Reduce is not well suited for Graph-Parallelism
Data-Parallel (Map Reduce): Cross Validation, Feature Extraction, Computing Sufficient Statistics
Graph-Parallel (Bulk Synchronous): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 47: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/47.jpg)
Outline
• Importance of Large-Scale Machine Learning
  – Need to model data dependencies
• Existing Large-Scale Machine Learning Abstractions
  – Need for an efficient graph-structured abstraction
• GraphLab Abstraction
  – Addresses data dependencies
  – Enables the expression of efficient algorithms
• Experimental Results
  – GraphLab dramatically outperforms existing abstractions
• Open Research Challenges
![Page 48: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/48.jpg)
What is GraphLab?
![Page 49: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/49.jpg)
The GraphLab Abstraction
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
![Page 50: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/50.jpg)
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
Graph:
• Social network
Vertex data:
• User profile text
• Current interest estimates
Edge data:
• Similarity weights
![Page 51: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/51.jpg)
Implementing the Data Graph
Multicore Setting
• In memory; relatively straightforward:
  vertex_data(vid) → data
  edge_data(vid, vid) → data
  neighbors(vid) → vid_list
• Challenge: fast lookup, low overhead
• Solution:
  – Dense data structures
  – Fixed Vdata & Edata types
  – Immutable graph structure
Cluster Setting
• In memory
• Partition graph: ParMETIS or random cuts
• Cached ghosting
[Figure: vertices A, B, C, D partitioned across Node 1 and Node 2, with ghost copies of remote vertices cached on each node]
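A toy, single-machine version of the three accessors named on this slide might look like the following. It is a Python sketch for illustration only: the real data graph is a C++ object, and the class name, constructor arguments, and storage layout here are assumptions.

```python
class DataGraph:
    """Toy in-memory data graph whose structure is fixed after construction."""

    def __init__(self, vertex_values, edges):
        # vertex_values: list indexed by vertex id (dense storage)
        # edges: {(src, dst): edge_data}
        self._vdata = list(vertex_values)
        self._edata = dict(edges)
        adj = [[] for _ in self._vdata]
        for src, dst in self._edata:
            adj[src].append(dst)
            adj[dst].append(src)
        # Tuples keep the graph structure immutable; only the data may change.
        self._adj = tuple(tuple(sorted(ns)) for ns in adj)

    def vertex_data(self, vid):
        return self._vdata[vid]

    def set_vertex_data(self, vid, value):
        self._vdata[vid] = value

    def edge_data(self, src, dst):
        key = (src, dst) if (src, dst) in self._edata else (dst, src)
        return self._edata[key]

    def neighbors(self, vid):
        return self._adj[vid]

# Social-network example: vertex data holds interest estimates,
# edge data holds similarity weights.
g = DataGraph(
    [{"Cameras": 0.5}, {"Cameras": 0.8}, {"Cameras": 0.3}],
    {(0, 1): 0.4, (0, 2): 0.1},
)
```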
![Page 52: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/52.jpg)
The GraphLab Abstraction
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
![Page 53: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/53.jpg)
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], Wij, Likes[j]) ← scope;

  // Update the vertex data
  Likes[i] ← Σ_j Wij × Likes[j];

  // Reschedule neighbors if needed
  if Likes[i] changes then
    reschedule_neighbors_of(i);
}
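As a runnable counterpart to the label_prop pseudocode, here is a Python sketch of an update function. It assumes the vertex update is the weighted sum of the neighbors' values, and it returns the vertices to reschedule instead of calling a scheduler directly. The argument layout (`likes`, `weights`, `neighbors`) and the tolerance are hypothetical choices, not the GraphLab scope API.

```python
def label_prop(i, likes, weights, neighbors, tol=1e-3):
    """Recompute Likes[i] from its neighbors; return the vertices to reschedule."""
    old = likes[i]
    new = {}
    for j in neighbors[i]:
        for interest, fraction in likes[j].items():
            new[interest] = new.get(interest, 0.0) + weights[(i, j)] * fraction
    likes[i] = new
    # Reschedule the neighbors only if this vertex's value actually moved.
    changed = any(abs(new.get(k, 0.0) - old.get(k, 0.0)) > tol
                  for k in set(new) | set(old))
    return list(neighbors[i]) if changed else []

likes = {
    0: {"Cameras": 0.5, "Biking": 0.5},
    1: {"Cameras": 0.8, "Biking": 0.2},
    2: {"Cameras": 0.3, "Biking": 0.7},
}
weights = {(0, 1): 0.8, (0, 2): 0.2}
neighbors = {0: [1, 2]}

first = label_prop(0, likes, weights, neighbors)    # value changes
second = label_prop(0, likes, weights, neighbors)   # already converged
```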
![Page 54: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/54.jpg)
The GraphLab Abstraction
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
![Page 55: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/55.jpg)
The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: CPU 1 and CPU 2 pull vertices (a through k) from the scheduler, apply update functions, and add vertices back to the scheduler]
The process repeats until the scheduler is empty.
![Page 56: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/56.jpg)
Choosing a Schedule
The choice of schedule affects the correctness and parallel performance of the algorithm.
GraphLab provides several different schedulers:
• Round Robin: vertices are updated in a fixed order
• FIFO: vertices are updated in the order they are added
• Priority: vertices are updated in priority order
Obtain different algorithms by simply changing a flag!
  --scheduler=roundrobin
  --scheduler=fifo
  --scheduler=priority
[Callout: Optimal Splash BP Algorithm]
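The idea that swapping the scheduler changes the algorithm can be illustrated with two interchangeable queue policies. The classes below are a Python sketch with hypothetical names and a hypothetical `add`/`pop` interface. They model only the FIFO and priority policies, and the lookup in `make_scheduler` merely mimics the spirit of the `--scheduler=` flag.

```python
import heapq
from collections import deque

class FifoScheduler:
    def __init__(self):
        self._queue = deque()
    def add(self, vertex, priority=0.0):
        self._queue.append(vertex)          # priority is ignored
    def pop(self):
        return self._queue.popleft()
    def __len__(self):
        return len(self._queue)

class PriorityScheduler:
    def __init__(self):
        self._heap, self._count = [], 0
    def add(self, vertex, priority=0.0):
        # Negate so the highest priority pops first; the counter breaks ties.
        heapq.heappush(self._heap, (-priority, self._count, vertex))
        self._count += 1
    def pop(self):
        return heapq.heappop(self._heap)[2]
    def __len__(self):
        return len(self._heap)

def make_scheduler(name):
    return {"fifo": FifoScheduler, "priority": PriorityScheduler}[name]()

def drain(scheduler):
    order = []
    while len(scheduler):                   # repeat until the scheduler is empty
        order.append(scheduler.pop())
    return order

def schedule(name):
    s = make_scheduler(name)
    for vertex, priority in [("a", 1.0), ("b", 3.0), ("c", 2.0)]:
        s.add(vertex, priority)
    return drain(s)
```

The same three vertices come out in insertion order under `schedule("fifo")` and in descending priority under `schedule("priority")`.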
![Page 57: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/57.jpg)
The GraphLab Abstraction
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
![Page 58: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/58.jpg)
Ensuring Race-Free Code
How much can computation overlap?
![Page 59: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/59.jpg)
Importance of Consistency
Many algorithms require strict consistency or perform significantly better under strict consistency.
[Figure: Alternating Least Squares example]
![Page 60: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/60.jpg)
Importance of Consistency
Machine learning algorithms require “model debugging”
[Figure: development cycle: Build → Test → Debug → Tweak Model]
![Page 61: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/61.jpg)
GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a parallel execution on CPU 1 and CPU 2 over time, next to an equivalent sequential execution on a single CPU]
![Page 62: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/62.jpg)
Common Problem: Write-Write Race
Processors running adjacent update functions simultaneously modify shared data:
[Figure: CPU 1 and CPU 2 each write to the same shared data; the final value reflects only one of the writes]
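A write-write race can be shown deterministically by replaying one bad interleaving by hand. The sketch below is illustrative only: the shared dictionary, the increments of 1 and 10, and the helper names are made up, and no real threads are involved.

```python
shared = {"value": 0}

def read():
    return shared["value"]

def write(v):
    shared["value"] = v

# Racy interleaving: both "CPUs" read the shared value before either writes.
cpu1_seen = read()
cpu2_seen = read()
write(cpu1_seen + 1)     # CPU 1 writes
write(cpu2_seen + 10)    # CPU 2 writes and silently discards CPU 1's update
racy_final = shared["value"]

# Sequential execution of the same two updates, one after the other.
shared["value"] = 0
write(read() + 1)
write(read() + 10)
sequential_final = shared["value"]
```

No sequential ordering of the two updates yields the racy result, which is the kind of outcome a sequential-consistency guarantee rules out.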
![Page 63: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/63.jpg)
Consistency Rules
Guaranteed sequential consistency for all update functions
![Page 64: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/64.jpg)
Full Consistency
![Page 65: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/65.jpg)
Obtaining More Parallelism
![Page 66: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/66.jpg)
Edge Consistency
[Figure: CPU 1 and CPU 2 run update functions on nearby vertices; the overlapping access is a safe read]
![Page 67: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/67.jpg)
Consistency Through R/W Locks
Read/Write locks:
• Full Consistency: Write, Write, Write
• Edge Consistency: Read, Write, Read
Canonical Lock Ordering
[Figure: lock sequences acquired in canonical order, for example Read then Write]
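One way to read the lock patterns on this slide is as a per-update lock plan. The Python sketch below rests on several assumptions: full consistency takes write locks on the vertex and all of its neighbors, edge consistency takes a write lock on the vertex and read locks on its neighbors, and "canonical ordering" means acquiring locks in ascending vertex id so that two updates never wait on each other in a cycle. The function name and the plan format are hypothetical.

```python
def lock_plan(center, neighbors, model):
    """Return the (vertex_id, mode) pairs to acquire, in canonical order."""
    if model == "full":
        modes = {v: "write" for v in neighbors}
    elif model == "edge":
        modes = {v: "read" for v in neighbors}
    else:
        raise ValueError("unknown consistency model: " + model)
    modes[center] = "write"
    # Every update function sorts by vertex id, so all of them
    # request shared locks in the same global order.
    return sorted(modes.items())

edge_plan = lock_plan(5, [3, 8], "edge")
full_plan = lock_plan(5, [3, 8], "full")
```

For a vertex 5 with neighbors 3 and 8, the edge plan is Read, Write, Read and the full plan is Write, Write, Write, matching the sequences shown on the slide.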
![Page 68: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/68.jpg)
The GraphLab Abstraction
• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
![Page 69: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/69.jpg)
The Code
• API implemented in C++: Pthreads, GCC Atomics, TCP/IP, MPI, in-house RPC
• Multicore API
• Matlab/Java/Python support
• Available under Apache 2.0 License
• Cloud API
  – Built and tested on EC2
  – No fault tolerance
http://graphlab.org
![Page 70: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/70.jpg)
Anatomy of a GraphLab Program:
1) Define C++ Update Function
2) Build data graph using the C++ graph object
3) Set engine parameters:
   1) Scheduler type
   2) Consistency model
4) Add initial vertices to the scheduler
5) Run the engine on the graph [Blocking C++ call]
6) Final answer is stored in the graph
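The six steps above can be mirrored in a toy, single-threaded Python sketch. Nothing here is the GraphLab C++ API: `build_graph`, `pagerank_update`, and `run_engine` are hypothetical names, the scheduler is a plain FIFO queue, and consistency is trivial because only one update runs at a time. PageRank is used only because it appears in the deck's list of algorithms.

```python
from collections import deque


# Step 1: an update function. It recomputes one vertex from its
# in-neighbors and returns the vertices that should be rescheduled.
def pagerank_update(v, graph, damping=0.85, tol=1e-4):
    old = graph["rank"][v]
    incoming = sum(graph["rank"][u] / len(graph["out"][u])
                   for u in graph["in"][v])
    graph["rank"][v] = (1 - damping) + damping * incoming
    if abs(graph["rank"][v] - old) > tol:
        return graph["out"][v]      # value changed: wake up dependents
    return []                       # converged locally: schedule nothing


# Step 2: build the data graph (vertex data plus adjacency lists).
def build_graph(edges):
    vertices = sorted({v for edge in edges for v in edge})
    graph = {"out": {v: [] for v in vertices},
             "in": {v: [] for v in vertices},
             "rank": {v: 1.0 for v in vertices}}
    for src, dst in edges:
        graph["out"][src].append(dst)
        graph["in"][dst].append(src)
    return graph


# Steps 3-5: a FIFO scheduler driven by a blocking engine loop.
def run_engine(graph, update_fn, initial, max_updates=100_000):
    queue, queued = deque(initial), set(initial)
    updates = 0
    while queue and updates < max_updates:
        v = queue.popleft()
        queued.discard(v)
        for w in update_fn(v, graph):
            if w not in queued:     # avoid duplicate scheduler entries
                queue.append(w)
                queued.add(w)
        updates += 1
    return updates


if __name__ == "__main__":
    g = build_graph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
    n = run_engine(g, pagerank_update, list(g["rank"]))   # Step 4 + 5
    print(n, g["rank"])             # Step 6: the answer lives in the graph
```

The point of the sketch is the control flow: the engine keeps running updates until the scheduler is empty, and the update function itself decides which neighbors need more work.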
![Page 71: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/71.jpg)
Bayesian Tensor Factorization
Gibbs Sampling
Dynamic Block Gibbs Sampling
Matrix Factorization
Lasso
SVM
Belief Propagation
PageRank
CoEM
K-Means
SVD
LDA
…Many others…
![Page 72: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/72.jpg)
Startups Using GraphLab
Companies experimenting with GraphLab
Academic projects exploring GraphLab
1600++ Unique Downloads Tracked (possibly many more from direct repository checkouts)
![Page 73: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/73.jpg)
GraphLab Matrix Factorization Toolkit
Used in ACM KDD Cup 2011, track 1: 5th place out of more than 1000 participants.
2 orders of magnitude faster than Mahout
Testimonials:
“The Graphlab implementation is significantly faster than the Hadoop implementation … [GraphLab] is extremely efficient for networks with millions of nodes and billions of edges …” -- Akshay Bhat, Cornell
“The guys at GraphLab are crazy helpful and supportive … 78% of our value comes from motivation and brilliance of these guys.” -- Timmy Wilson, smarttypes.org
“I have been very impressed by Graphlab and your support/work on it.” -- Clive Cox, rumblelabs.com
![Page 74: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/74.jpg)
Outline
Importance of Large-Scale Machine Learning
Need to model data-dependencies
Existing Large-Scale Machine Learning Abstractions
Need for an efficient graph structured abstraction
GraphLab Abstraction:
Addresses data-dependencies
Enables the expression of efficient algorithms
Experimental Results
GraphLab dramatically outperforms existing abstractions
Open Research Challenges
![Page 75: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/75.jpg)
Shared Memory Experiments
Shared Memory Setting
16 Core Workstation
![Page 76: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/76.jpg)
Loopy Belief Propagation
3D retinal image denoising
Data Graph: Vertices: 1 Million, Edges: 3 Million
Update Function: Loopy BP Update Equation
Scheduler: Approximate Priority
Consistency Model: Edge Consistency
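The slide names the loopy BP update equation without writing it out. For reference, the standard sum-product message update on a pairwise Markov random field is given below. The notation is an assumption, not taken from the slides: $\phi_i$ is the node potential, $\psi_{ij}$ the edge potential, and $N(i)$ the neighbors of vertex $i$.

$$
m_{i \to j}(x_j) \;\propto\; \sum_{x_i} \psi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i)
$$

$$
b_i(x_i) \;\propto\; \phi_i(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)
$$

An update on vertex $i$ reads the incoming messages on its adjacent edges and writes its own belief and outgoing messages, which touches only the vertex and its edges. That access pattern is consistent with the edge consistency setting listed for this experiment.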
![Page 77: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/77.jpg)
Loopy Belief Propagation
[Figure: speedup vs. number of CPUs (both axes 0 to 16, higher is better); curves: Optimal, SplashBP; annotation: 15.5x speedup]
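As a quick check on what the annotation implies, parallel efficiency is speedup divided by the number of CPUs. The sketch below assumes the 15.5x figure was measured using all 16 cores of the workstation, which the plot suggests but the slide text does not state.

```python
def parallel_efficiency(speedup, num_cpus):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / num_cpus


if __name__ == "__main__":
    # Assumption: 15.5x was measured on all 16 cores.
    print(parallel_efficiency(15.5, 16))
```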
![Page 78: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/78.jpg)
CoEM (Rosie Jones, 2005)
Named Entity Recognition Task
the dog
Australia
Catalina Island
<X> ran quickly
travelled to <X>
<X> is pleasant
Hadoop 95 Cores 7.5 hrs
Is “Dog” an animal?
Is “Catalina” a place?
Vertices: 2 Million
Edges: 200 Million
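The slides do not spell out the CoEM update, so the following is a toy Python sketch of one common formulation: noun phrases and contexts form a bipartite graph, seed vertices keep fixed labels, and every other vertex repeatedly takes the co-occurrence-weighted average of its neighbors' scores. The counts and seed labels below are invented for illustration, and the function names are hypothetical. Only the phrase and context strings come from the slide.

```python
def build_bipartite(cooccurrences):
    """Symmetric adjacency map: vertex -> {neighbor: co-occurrence count}."""
    edges = {}
    for phrase, context, count in cooccurrences:
        edges.setdefault(phrase, {})[context] = count
        edges.setdefault(context, {})[phrase] = count
    return edges


def coem_sweep(scores, edges, seeds):
    """One synchronous sweep over all non-seed vertices."""
    new_scores = dict(scores)
    for vertex, neighbors in edges.items():
        if vertex in seeds:
            continue                      # seed labels stay fixed
        total = sum(neighbors.values())
        new_scores[vertex] = sum(
            count * scores[n] for n, count in neighbors.items()) / total
    return new_scores


def run_coem(cooccurrences, seeds, sweeps=50):
    edges = build_bipartite(cooccurrences)
    scores = {v: seeds.get(v, 0.5) for v in edges}   # 0.5 = "unknown"
    for _ in range(sweeps):
        scores = coem_sweep(scores, edges, seeds)
    return scores


# Invented toy data: (noun phrase, context, co-occurrence count).
TOY_COUNTS = [
    ("the dog", "<X> ran quickly", 3),
    ("the dog", "<X> is pleasant", 1),
    ("Australia", "travelled to <X>", 4),
    ("Australia", "<X> is pleasant", 1),
    ("Catalina Island", "travelled to <X>", 2),
    ("Catalina Island", "<X> is pleasant", 2),
]
# Invented seeds for the question "is it a place?".
SEEDS = {"Australia": 1.0, "the dog": 0.0}

if __name__ == "__main__":
    for vertex, score in sorted(run_coem(TOY_COUNTS, SEEDS).items()):
        print(f"{vertex}: {score:.3f}")
```

In this toy graph "Catalina Island" ends up with a place score well above 0.5 because it shares contexts with the seed "Australia". Each vertex only reads its neighbors' values, which is why the computation fits a graph-parallel update function.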
![Page 79: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/79.jpg)
CoEM (Rosie Jones, 2005)
[Figure: speedup vs. number of CPUs (both axes 0 to 16, higher is better); curves: Optimal, GraphLab CoEM]
GraphLab 16 Cores 30 min
15x Faster!
6x fewer CPUs!
Hadoop 95 Cores 7.5 hrs
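The two headline ratios follow directly from the numbers on the slide. The short calculation below reproduces them and also derives the total core-hours, a figure that is not on the slide but follows from the same four numbers.

```python
# Figures taken from the slide; 30 min is written as 0.5 hours.
HADOOP = {"cores": 95, "hours": 7.5}
GRAPHLAB = {"cores": 16, "hours": 0.5}


def compare(baseline, candidate):
    """Ratios of baseline to candidate for time, cores, and core-hours."""
    base_core_hours = baseline["cores"] * baseline["hours"]
    cand_core_hours = candidate["cores"] * candidate["hours"]
    return {
        "time_ratio": baseline["hours"] / candidate["hours"],
        "core_ratio": baseline["cores"] / candidate["cores"],
        "core_hour_ratio": base_core_hours / cand_core_hours,
    }


if __name__ == "__main__":
    for name, value in compare(HADOOP, GRAPHLAB).items():
        print(f"{name}: {value}")
```

The wall-clock ratio is exactly 15 and the core ratio is about 5.9, which the slide rounds to 6x. Multiplying cores by hours gives 712.5 core-hours for Hadoop against 8 for GraphLab.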
![Page 80: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/80.jpg)
Experiments
Amazon EC2
High-Performance Nodes
![Page 81: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/81.jpg)
Video Cosegmentation
Segments mean the same
Model: 10.5 million nodes, 31 million edges
Gaussian EM clustering + BP on 3D grid
![Page 82: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/82.jpg)
Video Coseg. Speedups
![Page 83: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/83.jpg)
Prefetching Data & Locks
![Page 84: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/84.jpg)
Matrix Factorization
Netflix Collaborative Filtering
Alternating Least Squares Matrix Factorization
Model: 0.5 million nodes, 99 million edges
[Figure: Netflix ratings matrix (Users × Movies) factored into user and movie factors of dimension d]
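For reference, the usual alternating least squares formulation can be written as follows. The notation is an assumption rather than something from the slides: $r_{um}$ is the rating of movie $m$ by user $u$, $w_u, h_m \in \mathbb{R}^d$ are the user and movie factors, $N(u)$ is the set of movies rated by $u$, and $\lambda \ge 0$ is an optional regularization weight (set $\lambda = 0$ for the unregularized problem).

$$
\min_{W,H} \sum_{(u,m)\,\text{observed}} \left(r_{um} - w_u^\top h_m\right)^2 + \lambda \Big( \sum_u \lVert w_u \rVert^2 + \sum_m \lVert h_m \rVert^2 \Big)
$$

With the movie factors held fixed, each user factor has a closed-form least squares solution that depends only on that user's neighbors in the bipartite graph:

$$
w_u \leftarrow \Big( \sum_{m \in N(u)} h_m h_m^\top + \lambda I_d \Big)^{-1} \sum_{m \in N(u)} r_{um}\, h_m
$$

The movie update is symmetric, with the roles of $w$ and $h$ exchanged. Each update solves a $d \times d$ linear system, so the work per vertex grows with $d$.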
![Page 85: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/85.jpg)
Netflix Speedup
Increasing size of the matrix factorization
![Page 86: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/86.jpg)
Distributed GraphLab
![Page 87: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/87.jpg)
The Cost of Hadoop
![Page 88: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/88.jpg)
Outline
Importance of Large-Scale Machine Learning
Need to model data-dependencies
Existing Large-Scale Machine Learning Abstractions
Need for an efficient graph structured abstraction
GraphLab Abstraction:
Addresses data-dependencies
Enables the expression of efficient algorithms
Experimental Results
GraphLab dramatically outperforms existing abstractions
Open Research Challenges
![Page 89: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/89.jpg)
Storage of Large Data-Graphs
Fault tolerance to machine/network failure
Can I remove (re-task) a node or network resources without restarting dependent computation?
Relaxed transactional consistency
Can I eliminate locking and approximately recover when data corruption occurs?
Support rapid vertex and edge addition
How can I allow graphs to continuously grow while computation proceeds?
Graph partitioning for “natural graphs”
How can I balance the computation while minimizing communication on a power-law graph?
![Page 90: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/90.jpg)
Event driven graph computation
Trigger computation on data and structural modifications
Exploit small neighborhood effects
![Page 91: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/91.jpg)
Summary
Importance of Large-Scale Machine Learning
Need to model data-dependencies
Existing Large-Scale Machine Learning Abstractions
Need for an efficient graph structured abstraction
GraphLab Abstraction:
Addresses data-dependencies
Enables the expression of efficient algorithms
Experimental Results
GraphLab dramatically outperforms existing abstractions
Open Research Challenges
![Page 92: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New.](https://reader030.fdocuments.in/reader030/viewer/2022020219/56649c805503460f949371ff/html5/thumbnails/92.jpg)
Check out GraphLab
http://graphlab.org
Documentation… Code… Tutorials…
Questions & Comments