Distributed Machine Learning and Graph Processing with Sparse Matrices
Shivaram Venkataraman*, Erik Bodzsar#
Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+
*UC Berkeley, #U Chicago, +HP Labs
Big Data, Complex Algorithms
PageRank (dominant eigenvector)
Recommendations (matrix factorization)
Anomaly detection (top-K eigenvalues)
User importance (vertex centrality)
Machine learning + graph algorithms
Large-Scale Processing Frameworks
Data-parallel frameworks – MapReduce/Dryad (2004)
– Process each record in parallel
– Use case: computing sufficient statistics, analytics queries
Graph-centric frameworks – Pregel/GraphLab (2010)
– Process each vertex in parallel
– Use case: graphical models
Array-based frameworks – MadLINQ (2012)
– Process blocks of an array in parallel
– Use case: linear algebra operations
PageRank using Matrices
Power method for the dominant eigenvector
M = web graph matrix, p = PageRank vector
Simplified algorithm: repeat { p = M*p }
[Figure: the product M·p]
Linear algebra operations on sparse matrices
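As a concrete illustration, here is the power method in plain single-machine R; a minimal sketch on a made-up random matrix, not Presto code. Presto distributes exactly this loop over partitions of M and p.

# Power iteration on a small dense stand-in for the web-graph matrix M
set.seed(1)
M <- matrix(runif(9), 3, 3)    # random positive matrix; the real M is huge and sparse
p <- rep(1, 3)                 # initial rank vector
for (iter in 1:100) {
  p <- M %*% p                 # the slide's p = M*p step
  p <- p / sqrt(sum(p^2))      # renormalize so p stays bounded
}
p                              # converges to the dominant eigenvector of M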
Presto
Large-scale machine learning and graph processing on sparse matrices
Extend R – make it scalable and distributed
Challenge 1 – Sparse Matrices
[Figure: normalized block density (log scale, 1 to 10000) vs. block ID for LiveJournal, Netflix, and ClueWeb-1B]
Some blocks hold 1000x more data → computation imbalance
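To make the imbalance concrete, this sketch (hypothetical; uses the Matrix package and made-up sizes) measures per-row-block nonzero counts, the quantity plotted above:

# Per-row-block density skew in a sparse matrix
library(Matrix)
A <- rsparsematrix(10000, 10000, density = 0.001)   # uniform random stand-in
s <- 1000                                           # rows per block, as in blocks=c(s,N)
starts <- seq(1, nrow(A), by = s)
nnz_per_block <- sapply(starts, function(r) nnzero(A[r:(r + s - 1), ]))
max(nnz_per_block) / median(nnz_per_block)          # skew ratio (~1 here; up to 1000x on real graphs)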
Challenge 2 – Data Sharing
Sharing data through pipes/network is time-inefficient (sending copies) and space-inefficient (extra copies).
[Figure: each R process on Server 1 and Server 2 holds its own local copy of the data, and further copies cross the network between servers]
Two challenges: computation imbalance from sparse matrices, and communication overhead from data sharing.
Outline
• Motivation
• Programming model
• Design
• Applications and Results
darray – create a partitioned, distributed array
foreach f(x) – apply a function f to array partitions in parallel
PageRank Using Presto
M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..) {
  foreach(i, 1:len,
    calculate(m=splits(M,i),
              x=splits(P), p=splits(P,i)) {
      p <- m %*% x
    }
  )
}
Create Distributed Array
[Figure: M stored as N/s row blocks; block i of M times p produces split Pi of P (P1 … PN/s)]
PageRank Using Presto (same code as above)
Execute function in a cluster: foreach runs calculate() once per partition index i
Pass array partitions: splits(M,i) and splits(P) hand the partitions to the function
Presto Architecture
[Figure: a master coordinates multiple workers; each worker runs several R instances that share partitions through DRAM (shared memory)]
Repartitioning Matrices
Profile execution, then repartition.
Repartition when max(t) / median(t) > δ, where t is the vector of per-partition execution times.
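A minimal sketch of this check in R (the function name and profiling source are assumptions, not Presto's API):

# Decide whether to repartition from profiled per-partition times
should_repartition <- function(times, delta) {
  # times: seconds spent on each partition in the last iteration
  max(times) / median(times) > delta
}
should_repartition(c(1.1, 0.9, 1.0, 9.7), delta = 5)   # TRUE: one straggler block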
Maintaining Size Invariants
invariant(mat, vec, type=ROW)
Declares that mat and vec are partitioned together along rows, so when one is repartitioned the other is repartitioned to match.
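For the PageRank arrays above, a usage sketch (hypothetical call, mirroring the slide's signature):

# Keep each row block of the web-graph matrix M aligned with the
# corresponding block of the PageRank vector P across repartitions
invariant(M, P, type=ROW)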
Sharing Distributed Arrays
Versioned distributed arrays
Goal: zero-copy sharing across cores
Immutable partitions → safe sharing
Data Sharing Challenges
1. Garbage collection
2. Header conflicts
[Figure: an R object = R object header + data part, shared between two R instances]
Overriding R’s Allocator
Allocate process-local headers; map the data part in shared memory.
[Figure: a local R object header in each process points to a shared R object data part, aligned to page boundaries]
Outline
• Motivation
• Programming model
• Design
• Applications and Results
Demo
5-node cluster, 8 cores per node
PageRank on 1.5B-edge Twitter data
Applications Implemented in Presto

Application | Algorithm | Presto LOC
PageRank | Eigenvector calculation | 41
Triangle counting | Top-K eigenvalues | 121
Netflix recommendation | Matrix factorization | 130
Centrality measure | Graph algorithm | 132
k-path connectivity | Graph algorithm | 30
k-means | Clustering | 71
Sequence alignment | Smith-Waterman | 64

Fewer than 140 lines of code each
Evaluation Overview
Setup: 25-machine cluster; each machine has 24 cores, 96GB RAM, and a 10Gbps network
Data-sharing benefits – 1.5B-edge Twitter graph
Repartitioning analysis – 6B-edge web graph
Faster than Spark and Hadoop using in-memory data
Collaborative filtering using the Netflix dataset
Data sharing benefits
[Figure: per-iteration compute and transfer times at 10, 20, and 40 cores, with and without sharing. Compute time scales down with more cores in both cases (≈4.4s → 1.2–1.6s); with sharing, transfer time stays flat at ≈0.7s, while without sharing it grows from 1.22s to 4.16s as the number of copies increases.]
Repartitioning Progress
[Figure: split size (GB, 0–30) across iterations 1–15 as repartitioning proceeds]
Repartitioning benefits
[Figure: per-worker transfer and compute time (0–160s), No Repartition vs. Repartition]
Related Work
Large scale data processing frameworks
– MapReduce, Dryad, Spark, GraphLab
Matrix Computations – Ricardo, MadLINQ
HPC systems – ARPACK, Combinatorial BLAS
Multi-core R packages – doMC, snow, Rmpi
Presto
Co-partitioning matrices
Locality-based scheduling
Caching partitions
Conclusion
Presto: a large-scale array-based framework that extends R
Sparse matrices raise challenges: computation imbalance and data-sharing overhead
Addressed by repartitioning and sharing versioned arrays
Backup Slides
Netflix Collaborative Filtering
[Figure: runtime breakdown per phase (Load, t(R)×R, R×t(R)×R) vs. number of cores]

Number of cores | Time (seconds)
8 | 755.112
16 | 380.985
24 | 256.1
32 | 234.236
40 | 202.725
48 | 155.299
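The two products in the figure's legend dominate each iteration; in plain R they are as follows (a tiny illustrative stand-in, not the distributed Presto version):

# The legend's matrix products on an ordinary in-memory matrix R
R <- matrix(runif(20), 5, 4)   # small stand-in for the ratings factor matrix
G <- crossprod(R)              # t(R) %*% R
H <- R %*% G                   # R %*% t(R) %*% R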
Repartitioning benefits
[Figure: time to convergence (s, 2000–8000) and cumulative partitioning time (s, 0–400) vs. number of repartitions (0–20)]