Carnegie Mellon: Machine Learning in the Cloud. Yucheng Low, Aapo Kyrola, Danny Bickson, Joey Gonzalez, Carlos Guestrin, Joe Hellerstein, David O'Hallaron
[Slide 1]
Carnegie Mellon
Machine Learning in the Cloud
Yucheng Low, Aapo Kyrola, Danny Bickson, Joey Gonzalez, Carlos Guestrin, Joe Hellerstein, David O'Hallaron
[Slide 2]
Machine Learning in the Real World
- 24 hours a minute: YouTube
- 13 million Wikipedia pages
- 500 million Facebook users
- 3.6 billion Flickr photos
[Slide 3]
Parallelism is Difficult
Wide array of different parallel architectures: GPUs, multicore, clusters, clouds, supercomputers.
Different challenges for each architecture.
High-level abstractions make things easier.
[Slides 4-6]
MapReduce: Map Phase
Embarrassingly parallel, independent computation. No communication needed.
[Figure, built up over three slides: four CPUs each compute values independently (12.9, 42.3, 21.3, 25.8; then 24.1, 84.3, 18.4, 84.4; then 17.5, 67.5, 14.9, 34.3).]
[Slide 7]
MapReduce: Reduce Phase
Fold/Aggregation
[Figure: two CPUs fold the mapped values down to aggregates (2226.26 and 1726.31).]
[Slide 8]
MapReduce and ML
Excellent for large data-parallel tasks!
Data-parallel (Map-Reduce): feature extraction, cross validation, computing sufficient statistics.
Complex parallel structure: is there more to machine learning?
[Slide 9]
Iterative Algorithms?
We can implement iterative algorithms in MapReduce:
[Figure: data partitioned across CPUs; each iteration ends in a barrier, so progress is gated by the slowest processor.]
[Slide 10]
Iterative MapReduce
System is not optimized for iteration:
[Figure: every iteration pays a startup penalty and a disk penalty on top of the barrier.]
[Slide 11]
Iterative MapReduce
Only a subset of data needs computation (multi-phase iteration):
[Figure: the same barriered iterations run over all partitions, even though only some of the data still needs work.]
[Slide 12]
MapReduce and ML (recap)
Excellent for large data-parallel tasks!
Data-parallel (Map-Reduce): feature extraction, cross validation, computing sufficient statistics.
Complex parallel structure: is there more to machine learning?
[Slide 13]
Structured Problems
Interdependent computation: not Map-Reducible.
Example problem: will I be successful in research? Success depends on the success of others.
May not be able to safely update neighboring nodes [e.g., Gibbs sampling].
[Slide 14]
Space of Problems
Sparse computation dependencies: the problem can be decomposed into local "computation kernels".
Asynchronous iterative computation: repeated iterations over local kernel computations.
[Slide 15]
Parallel Computing and ML
Not all algorithms are efficiently data-parallel.
Data-parallel (Map-Reduce): feature extraction, cross validation, computing sufficient statistics.
Structured iterative parallel (GraphLab): belief propagation, SVM, kernel methods, deep belief networks, neural networks, tensor factorization, learning graphical models, Lasso, sampling.
[Slide 16]
GraphLab Goals
Designed for ML needs:
- Express data dependencies
- Iterative
Simplifies the design of parallel programs:
- Abstracts away hardware issues
- Addresses multiple hardware architectures: multicore, distributed, GPU, and others
[Slides 17-18]
GraphLab Goals
[Figure: a chart of model complexity (simple to complex) versus data size (small to large). Data-parallel tools cover simple models on large data ("Now"); the goal of GraphLab is complex models on large data.]
[Slide 19]
Carnegie Mellon
GraphLab: A Domain-Specific Abstraction for Machine Learning
[Slide 20]
Everything on a Graph
A graph with data associated with every vertex and edge.
[Slide 21]
Update Functions
Update functions are operations applied at a vertex that transform the data in the scope of that vertex.
[Slide 22]
Update Functions
An update function can schedule the computation of any other update function.
Scheduled computation is guaranteed to execute eventually.
- FIFO scheduling
- Prioritized scheduling
- Randomized, etc.
[Slide 23]
Example: PageRank
Graph = WWW
Update function: multiply adjacent PageRank values by the edge weights to get the current vertex's PageRank.
"Prioritized" PageRank computation? Skip converged vertices.
[Slide 24]
Example: K-Means Clustering
Bipartite graph (fully connected?) between data and clusters.
Update functions:
- Cluster update: compute the average of the data connected on a "marked" edge.
- Data update: pick the closest cluster and mark that edge; unmark the remaining edges.
[Slide 25]
Example: MRF Sampling
Graph = MRF
Update function:
- Read samples on adjacent vertices
- Read edge potentials
- Compute a new sample for the current vertex
[Slide 26]
Not Message Passing!
The graph is a data structure. Update functions perform parallel modifications to that data structure.
[Slides 27-28]
Safety
What if adjacent update functions execute simultaneously?
[Slide 29]
Importance of Consistency
Permit races? "Best-effort" computation? Is ML resilient to soft optimization?
True for some algorithms. Not true for many: they may work empirically on some datasets and fail on others.
[Slide 30]
Importance of Consistency
Many algorithms require strict consistency, or perform significantly better under it. Example: alternating least squares.
[Slide 31]
Importance of Consistency
Fast ML algorithm development cycle: build, test, debug, tweak model.
For this loop to work, the framework must behave predictably and consistently and avoid problems caused by non-determinism. Otherwise: is the execution wrong, or is the model wrong?
[Slide 32]
Sequential Consistency
GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions that produces the same result.
[Figure: update functions interleaved across CPU 1 and CPU 2 in parallel, next to the equivalent sequential execution on a single CPU over time.]
[Slide 33]
Sequential Consistency
GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions that produces the same result.
This formalizes the intuitive concept of a "correct program":
- Computation does not read outdated data from the past.
- Computation does not read results of computation that occurs in the future.
This is the primary property of GraphLab.
[Slide 34]
Global Information
What if we need global information?
- Sum of all the vertices?
- Algorithm parameters?
- Sufficient statistics?
[Slide 35]
Shared Variables
Global aggregation through the Sync operation:
- A global parallel reduction over the graph data (e.g., Sync: sum of vertex values; Sync: log-likelihood).
- Synced variables are recomputed at defined intervals.
- Sync computation is sequentially consistent, permitting correct interleaving of Syncs and Updates.
[Slide 36]
Sequential Consistency
GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions and Syncs that produces the same result.
[Figure: the parallel schedule on CPU 1 and CPU 2 next to the equivalent sequential schedule over time.]
[Slide 37]
Carnegie Mellon
GraphLab in the Cloud
[Slide 38]
Moving towards the cloud...
Purchasing and maintaining computers is very expensive, and most computing resources are seldom used, except around deadlines.
In the cloud you buy time instead: access hundreds or thousands of processors and pay only for the resources you need.
[Slide 39]
Distributed GraphLab Implementation
- Mixed multi-threaded / distributed implementation (each machine runs only one instance).
- Requires all data to be in memory. Moves computation to data.
- MPI for management + TCP/IP for communication; asynchronous C++ RPC layer.
- Ran on 64 EC2 HPC nodes = 512 processors.
[Slide 40]
[Figure: the per-machine software stack, replicated on every machine: execution threads on the execution engine, over the distributed graph, distributed locks, and shared data, backed by a cache-coherent distributed K-V store, with an RPC controller on each machine over the underlying network.]
[Slide 41]
Carnegie Mellon
GraphLab RPC
[Slide 42]
Write distributed programs easily:
- Asynchronous communication
- Multithreaded support
- Fast and scalable
- Easy to use (every machine runs the same binary)
[Slide 43]
Carnegie Mellon
I ♥ C++
[Slide 44]
Features
Easy RPC capabilities. One-way calls:

    rpc.remote_call([target_machine ID], printf, "%s %d %d %d\n", "hello world", 1, 2, 3);

Requests (calls with a return value):

    std::vector<int>& sort_vector(std::vector<int>& v) {
      std::sort(v.begin(), v.end());
      return v;
    }

    vec = rpc.remote_request([target_machine ID], sort_vector, vec);
[Slide 45]
Features
Object instance context, and MPI-like primitives with MPI-like safety:

    dc.barrier()
    dc.gather(...)
    dc.send_to([target machine], [arbitrary object])
    dc.recv_from([source machine], [arbitrary object ref])

[Figure: a K-V object and an RPC controller on each machine.]
[Slide 46]
Request Latency
[Figure: request latency in µs (axis 50-350) versus value length in bytes (16, 128, 1024, 102400), comparing GraphLab RPC and MemCached. Ping RTT = 90 µs.]
[Slide 47]
One-Way Call Rate
[Figure: throughput in Mbps (axis 100-1000) versus value length in bytes (16, 128, 1024, 102400), comparing GraphLab RPC and ICE; 1 Gbps physical peak.]
[Slide 48]
Serialization Performance
[Figure: time in seconds (axis 0-0.8), split into issue and receive, for 100,000 one-way calls of a vector of 10 x {"hello", 3.14, 100}, comparing ICE, buffered RPC, and unbuffered RPC.]
[Slide 49]
Distributed Computing Challenges
Q1: How do we efficiently distribute the state, given a potentially varying number of machines?
Q2: How do we ensure sequential consistency?
Keeping in mind: limited bandwidth, high latency, and performance.
[Slide 50]
Carnegie Mellon
Distributed Graph
[Slides 51-55]
Two-stage Partitioning
1. Initial over-partitioning of the graph.
2. Generate the atom graph.
3. Repartition as needed.
[Slide 56]
Ghosting
Ghost vertices are copies of neighboring vertices that live on remote machines.
- Ghost vertices/edges act as a cache for remote data.
- Coherency is maintained using versioning, which decreases bandwidth utilization.
[Slide 57]
Carnegie Mellon
Distributed Engine
[Slide 58]
Distributed Engine
Sequential consistency can be guaranteed through distributed locking, a direct analogue to the shared-memory implementation.
To improve performance, the user provides some "expert knowledge" about the properties of the update function.
[Slide 59]
Full Consistency
User says: the update function modifies all data in scope.
Acquire a write-lock on all vertices in scope.
Limited opportunities for parallelism.
[Slide 60]
Edge Consistency
User says: the update function only reads from adjacent vertices.
Acquire a write-lock on the center vertex and read-locks on adjacent vertices.
More opportunities for parallelism.
[Slide 61]
Vertex Consistency
User says: the update function touches neither edges nor adjacent vertices.
Acquire a write-lock on the current vertex only.
Maximum opportunities for parallelism.
[Slide 62]
Performance Enhancements
Latency hiding: "pipelining" far more update function calls than there are CPUs (about a 1K-deep pipeline) hides the latency of lock acquisition and cache synchronization.
Lock strength reduction: a trick by which the number of locks can be decreased while still providing the same guarantees.
[Slide 63]
Video Cosegmentation
Segments "mean the same".
Gaussian EM clustering + BP on a 3D grid.
Model: 10.5 million nodes, 31 million edges.
[Slide 64]
Speedups
[Figure: speedup results.]
[Slides 65-66]
Video Segmentation
[Figures: segmentation results.]
[Slide 67]
Chromatic Distributed Engine
Locking overhead is too high in high-degree models. Can we satisfy sequential consistency in a simpler way?
Observation: scheduling using vertex colorings can be used to automatically satisfy consistency.
[Slide 68]
Example: Edge Consistency
With a (distance-1) vertex coloring, update functions can be executed on all vertices of the same color in parallel.
[Slide 69]
Example: Full Consistency
With a (distance-2) vertex coloring, update functions can be executed on all vertices of the same color in parallel.
[Slide 70]
Example: Vertex Consistency
With a (distance-0) vertex coloring, update functions can be executed on all vertices of the same color in parallel.
[Slide 71]
Chromatic Distributed Engine
[Timeline: execute tasks on all vertices of color 0; data synchronization completion + barrier; execute tasks on all vertices of color 1; data synchronization completion + barrier; and so on through the colors.]
![Page 72: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/72.jpg)
Experiments: Netflix Collaborative Filtering
Alternating Least Squares Matrix Factorization
Model: 0.5 million nodes, 99 million edges
[Figure: bipartite graph connecting Netflix users to movies, with latent factor dimension d.]
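For reference, one sweep of alternating least squares can be sketched on a small dense rating matrix as follows (a toy NumPy version; in GraphLab each row/column solve corresponds to a vertex update on the bipartite user–movie graph, and the real rating matrix is sparse):

```python
import numpy as np

def als_step(R, U, V, lam=0.1):
    """One alternating-least-squares sweep for R ≈ U @ V.T with d latent
    factors and ridge penalty lam. Returns the reconstruction error."""
    d = U.shape[1]
    I = lam * np.eye(d)
    # Fix V and solve the regularized least-squares problem for U, ...
    U[:] = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
    # ... then fix U and solve for V.
    V[:] = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return np.linalg.norm(R - U @ V.T)

rng = np.random.default_rng(0)
R = rng.random((6, 5))                       # toy 6-user, 5-movie ratings
U, V = rng.random((6, 2)), rng.random((5, 2))  # d = 2 latent factors
errs = [als_step(R, U, V) for _ in range(20)]
assert errs[-1] <= errs[0]  # reconstruction error should not increase
```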
![Page 73: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/73.jpg)
Netflix Speedup: increasing the size of the matrix factorization
![Page 74: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/74.jpg)
Netflix
![Page 75: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/75.jpg)
Netflix
![Page 76: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/76.jpg)
Experiments: Named Entity Recognition
(part of Tom Mitchell's NELL project): CoEM Algorithm
Web Crawl
Model: 2 million nodes, 200 million edges
The graph is rather dense: a small number of vertices connect to almost all other vertices.
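A single CoEM vertex update can be sketched as a co-occurrence-weighted average over neighbors on the bipartite noun-phrase/context graph (illustrative names and a single category score; the real implementation maintains a score per category):

```python
def coem_update(v, neighbors, weight, score):
    """CoEM update sketch: a noun phrase's category score becomes the
    co-occurrence-weighted average of its contexts' scores (and vice
    versa on the bipartite graph). `weight[(v, u)]` is a co-occurrence
    count between vertex v and its neighbor u."""
    total = sum(weight[(v, u)] for u in neighbors[v])
    score[v] = sum(weight[(v, u)] * score[u] for u in neighbors[v]) / total
    return score[v]

# Toy example: a noun phrase observed in two contexts.
neighbors = {"np": ["ctx1", "ctx2"]}
weight = {("np", "ctx1"): 1.0, ("np", "ctx2"): 3.0}
score = {"ctx1": 0.0, "ctx2": 1.0}
coem_update("np", neighbors, weight, score)  # score["np"] becomes 0.75
```

Because high-degree vertices touch nearly the whole graph, each such update pulls data from many machines, which is why the experiments below become bandwidth bound.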
![Page 77: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/77.jpg)
Named Entity Recognition (CoEM)
![Page 78: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/78.jpg)
Named Entity Recognition (CoEM)
Bandwidth Bound
![Page 79: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/79.jpg)
Named Entity Recognition (CoEM)
![Page 80: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/80.jpg)
Future Work: Distributed GraphLab
Fault tolerance (spot instances are cheaper)
Graph using off-memory store (disk/SSD)
GraphLab as a database
Self-optimized partitioning
Fast data graph construction primitives
GPU GraphLab? Supercomputer GraphLab?
![Page 81: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/81.jpg)
Carnegie Mellon
Is GraphLab the Answer to Life, the Universe, and Everything?
Probably Not.
![Page 82: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/82.jpg)
Carnegie Mellon
graphlab.ml.cmu.edu: Parallel/Distributed Implementation
LGPL (highly probable switch to MPL in a few weeks)
GraphLab
bickson.blogspot.com: Very fast matrix factorization implementations, other examples, installation, comparisons, etc.
Danny Bickson Marketing Agency
![Page 83: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/83.jpg)
Carnegie Mellon
Questions?
Bayesian Tensor Factorization
Gibbs Sampling
Dynamic Block Gibbs Sampling
Matrix Factorization
Lasso
SVM
Belief Propagation
PageRank
CoEM
Many Others…
SVD
![Page 84: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/84.jpg)
Video Cosegmentation
Naïve Idea: Treat patches independently
Use Gaussian EM clustering (on image features):
E step: Predict membership of each patch given cluster centers
M step: Compute cluster centers given memberships of each patch
Does not take relationships among patches into account!
![Page 85: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/85.jpg)
Video Cosegmentation
Better Idea: Connect the patches using an MRF. Set edge potentials so that adjacent (spatially and temporally) patches prefer to be in the same cluster.
Gaussian EM clustering with a twist:
E step: Make unary potentials for each patch using the cluster centers; predict membership of each patch using BP
M step: Compute cluster centers given memberships of each patch
D. Batra, et al. iCoseg: Interactive co-segmentation with intelligent scribble guidance. CVPR 2010.
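The plain EM loop being modified here can be sketched as follows (a toy spherical-Gaussian version with illustrative names and a deterministic init); in the cosegmentation variant, the independent E step below would be replaced by belief propagation over the patch MRF:

```python
import numpy as np

def em_cluster(X, k, iters=20):
    """Spherical-Gaussian EM sketch on patch features. The E step computes
    independent soft memberships; the cosegmentation twist would instead
    run BP over unary potentials built from the same distances."""
    mu = X[[0, -1]].copy() if k == 2 else X[:k].copy()  # simple deterministic init
    for _ in range(iters):
        # E step: soft membership of each patch under each cluster center.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-0.5 * (d2 - d2.min(1, keepdims=True)))  # shifted for stability
        resp /= resp.sum(1, keepdims=True)
        # M step: recompute cluster centers from the memberships.
        mu = (resp.T @ X) / resp.sum(0)[:, None]
    return mu, resp

# Two well-separated synthetic "patch feature" clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
mu, resp = em_cluster(X, 2)
```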
![Page 86: Carnegie Mellon Machine Learning in the Cloud Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez Carlos Guestrin Joe Hellerstein David O’Hallaron.](https://reader035.fdocuments.in/reader035/viewer/2022062422/56649f055503460f94c19585/html5/thumbnails/86.jpg)
Distributed Memory Programming APIs
• MPI
• Global Arrays
• GASnet
• ARMCI
• etc.
…do not make it easy…
MPI: synchronous computation; insufficient primitives for multi-threaded use; also, not exactly easy to use…
Global Arrays: only helps if all your data is an n-D array.
GASnet / ARMCI: direct remote pointer access, with severe limitations depending on system architecture.