Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices


Transcript of Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

Page 1: Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

Distributed Machine Learning and Graph Processing with Sparse Matrices

Speaker: LIN Qian, http://www.comp.nus.edu.sg/~linqian/

Page 2

Big Data, Complex Algorithms

PageRank (dominant eigenvector)

Recommendations (matrix factorization)

Anomaly detection (top-K eigenvalues)

User importance (vertex centrality)

Machine learning + Graph algorithms

Page 3

Large-Scale Processing Frameworks

Data-parallel frameworks – MapReduce/Dryad (2004): process each record in parallel. Use case: computing sufficient statistics, analytics queries.

Graph-centric frameworks – Pregel/GraphLab (2010): process each vertex in parallel. Use case: graphical models.

Array-based frameworks – MadLINQ (2012): process blocks of an array in parallel. Use case: linear algebra operations.

Page 4

PageRank using Matrices

Power method: repeated multiplication by M converges to the dominant eigenvector.

M = web graph matrix; p = PageRank vector

Simplified algorithm: repeat { p = M*p }

Linear algebra operations on sparse matrices
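The simplified algorithm above is just repeated sparse matrix–vector multiplication. A minimal plain-Python sketch of the power method (an illustration only, not Presto code; the three-page web graph is made up):

```python
# Power method for PageRank: p = M*p repeated until convergence.
# M is column-stochastic, stored sparsely as {col: [(row, value), ...]}.

def power_iteration(M, n, iters=100, tol=1e-10):
    p = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for col, entries in M.items():
            for row, val in entries:
                nxt[row] += val * p[col]   # accumulate only nonzero entries
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt                     # converged to dominant eigenvector
        p = nxt
    return p

# Tiny 3-page web graph: page 0 links to 1 and 2; 1 links to 2; 2 links to 0.
M = {0: [(1, 0.5), (2, 0.5)], 1: [(2, 1.0)], 2: [(0, 1.0)]}
ranks = power_iteration(M, 3)              # fixed point is (0.4, 0.2, 0.4)
```

Because only nonzero entries are stored and touched, the per-iteration cost is proportional to the number of links, which is why sparse-matrix support matters at web scale.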

Page 5

Statistical software

Moderately-sized datasets; single server, entirely in memory

Page 6

Work-around for massive dataset

Vertical scalability

Sampling

Page 7

MapReduce

Limited to aggregation processing

Page 8

Data analytics

Deep vs. Scalable

Statistical software (R, MATLAB, SPSS, SAS) vs. MapReduce

Page 9

Ways to improve

1. Statistical software += large-scale data management
2. MapReduce += statistical functionality
3. Combine both existing technologies

Page 10

Parallel MATLAB, pR

Page 11

HAMA, SciHadoop

Page 12

MadLINQ [EuroSys’12]

Linear algebra platform on Dryad.

Not efficient for sparse matrix computation.

Page 13

Ricardo [SIGMOD’10]

R + Hadoop: R issues aggregation-processing queries to Hadoop and receives aggregated data back.

But it ends up inheriting the inefficiencies of the MapReduce interface.

Page 14

Array-based, single-threaded

Limited support for scaling

Page 15

Challenge 1: Sparse Matrices

Page 16

Challenge 1 – Sparse Matrices

[Figure: normalized block density (log scale, 1–10000) per block ID for LiveJournal, Netflix, and ClueWeb-1B.]

1000x more data → computation imbalance

Page 17

Challenge 2 – Data Sharing

Sharing data through pipes/network is time-inefficient (sending copies) and space-inefficient (extra copies).

[Figure: processes on Server 1 hold local copies of the data; processes on Server 2 hold network copies.]

Sparse matrices → communication overhead

Page 18

Extend R – make it scalable and distributed.

Large-scale machine learning and graph processing on sparse matrices.

Page 19

Presto architecture

Page 20

Presto architecture

[Figure: a Master node coordinates multiple Workers; each Worker hosts several R instances sharing the node's DRAM.]

Page 21

Distributed array (darray): partitioned, shared, dynamic

Page 22

foreach

Parallel execution of the loop body

f (x)

Barrier

Call Update to publish changes
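The foreach semantics can be loosely mimicked in Python: run the loop body in parallel, wait at an implicit barrier, then treat the collected results as the published update (the `foreach` name here is illustrative, not Presto's actual R API):

```python
# Parallel loop body + barrier + publish, mimicking Presto's foreach/update.
from concurrent.futures import ThreadPoolExecutor

def foreach(indices, body):
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(body, indices))  # barrier: map waits for all
    return results  # changes are "published" only after every task finishes

squares = foreach(range(5), lambda i: i * i)     # -> [0, 1, 4, 9, 16]
```

The key property mirrored here is that no result is visible until all iterations complete, which is what lets Presto keep partition updates deterministic.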

Page 23

PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(...) {
  foreach(i, 1:len,
    calculate(m=splits(M,i), x=splits(P), p=splits(P,i)) {
      p <- m %*% x
    })
}

Create distributed arrays: P is split into partitions P1, P2, …, PN/s; M into matching row blocks.

Page 24

PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(...) {
  foreach(i, 1:len,
    calculate(m=splits(M,i), x=splits(P), p=splits(P,i)) {
      p <- m %*% x
    })
}

Execute the function in the cluster, passing array partitions (row blocks of M and splits P1, P2, …, PN/s).
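The splits semantics can be imitated in plain Python: partition M into row blocks of s rows, compute each block's slice of M*p independently, and concatenate (an illustration of the partitioning idea only, not Presto's execution model):

```python
# Block-row partitioned mat-vec: each task computes its own rows of M*p.
def matvec_block(rows, p):
    return [sum(v * p[j] for j, v in enumerate(row)) for row in rows]

def partitioned_matvec(M, p, s):
    out = []
    for i in range(0, len(M), s):        # one "split" of s rows at a time
        out.extend(matvec_block(M[i:i + s], p))   # independent per block
    return out

M = [[0, 1], [1, 0]]
p = [3.0, 4.0]
result = partitioned_matvec(M, p, 1)     # -> [4.0, 3.0]
```

Each block needs only its own rows of M plus the full vector p, which is exactly why the Presto code passes splits(M,i) but the whole splits(P) as x.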

Page 25

Dynamic repartitioning

To address load imbalance, while preserving correctness.

Page 26

Repartitioning Matrices

Profile execution, then repartition: split a partition when profiling shows it dominates execution time.
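One way to sketch the policy (illustrative only; the slide does not give Presto's actual repartitioning criterion, so the threshold rule below is an assumption): time each partition and split the one that dominates:

```python
# Dynamic repartitioning sketch: split the partition whose measured
# cost dominates, to even out per-worker load.
def repartition(partitions, costs, threshold=2.0):
    avg = sum(costs) / len(costs)
    worst = costs.index(max(costs))
    if costs[worst] <= threshold * avg:
        return partitions                    # balanced enough, keep as-is
    rows = partitions[worst]
    mid = len(rows) // 2
    return (partitions[:worst]               # split the hot partition in two
            + [rows[:mid], rows[mid:]]
            + partitions[worst + 1:])

parts = [[1, 2, 3, 4, 5, 6], [7], [8]]
costs = [60.0, 5.0, 5.0]                     # first partition dominates
parts = repartition(parts, costs)            # -> [[1,2,3],[4,5,6],[7],[8]]
```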

Page 27

Invariants

compatibility in array sizes

Page 28

Maintaining Size Invariants

invariant(mat, vec, type=ROW)

Page 29

Data sharing for multi-core

Zero-copy sharing across cores
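A loose stdlib analogue of zero-copy sharing (not Presto's R allocator mechanism): two handles attach to the same named shared segment, so readers see the writer's data without any copy:

```python
# Zero-copy sharing within one machine: two handles to one shared buffer.
from multiprocessing import shared_memory

# Writer creates a shared segment and fills it in place.
shm = shared_memory.SharedMemory(create=True, size=8)
shm.buf[:5] = b"hello"

# A reader (normally another process) attaches by name -- no data copy.
reader = shared_memory.SharedMemory(name=shm.name)
data = bytes(reader.buf[:5])

reader.close()
shm.close()
shm.unlink()
```

The design point mirrored here is that only handles are duplicated, not the data, which is what removes the copy costs from the Page 17 diagram.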

Page 30

Data sharing challenges

1. Garbage collection2. Header conflict

[Figure: two R instances access a shared R object (header + data part); one instance garbage-collecting or writing the header conflicts with the other's reads and writes.]

Page 31

Overriding R’s allocator

Allocate process-local headers; map the data part in shared memory.

[Figure: each R instance keeps a local R object header; the shared data part is mapped on page boundaries.]

Page 32

Immutable partitions → safe sharing

Only share read-only data

Page 33

Versioning arrays

To ensure correctness when arrays are shared across machines
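A hedged sketch of the idea (illustrative; Presto's versioning scheme is more involved than this): tag each partition with a version number, bump it on every published update, and reject readers holding a stale version:

```python
# Versioned partition: writers bump the version; readers validate theirs.
class VersionedPartition:
    def __init__(self, data):
        self.data = data
        self.version = 0

    def update(self, data):
        self.data = data
        self.version += 1        # every published write gets a new version

    def read(self, expected_version):
        if expected_version != self.version:
            raise ValueError("stale read")   # caller must refetch
        return self.data

part = VersionedPartition([1, 2, 3])
v = part.version                 # a reader remembers version 0
part.update([4, 5, 6])           # a writer publishes version 1
```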

Page 34

Fault tolerance

Master: primary-backup replication. Worker: heartbeat-based failure detection.

Page 35

Presto applications

Presto roughly doubles the LOC compared with programming purely in R.

Page 36

Evaluation

Faster than Spark and Hadoop using in-memory data

Page 37
Page 38

Multi-core support benefits

Page 39

Data sharing benefits

[Figure: compute and transfer time (s) vs. number of cores, with and without data sharing.]

Cores                              10           20           40
Sharing (compute / transfer)       4.45 / 0.71  2.49 / 0.70  1.63 / 0.72
No sharing (compute / transfer)    4.38 / 1.22  2.21 / 2.12  1.22 / 4.16

With sharing, transfer time stays flat (~0.7 s) as cores increase; without sharing, it grows with the core count.

Page 40

Repartitioning benefits

[Figure: per-worker transfer and compute time across ~160 workers, without repartitioning (top) and with repartitioning (bottom).]

Page 41

Repartitioning benefits

[Figure: time to convergence (s, left axis, 2000–8000) and cumulative partitioning time (s, right axis, 0–400) vs. number of repartitions (0–20).]

Page 42

Limitations

1. In-memory computation
2. One writer per partition
3. Array-based programming

Page 43

Conclusion

• Presto: large-scale array-based framework that extends R
• Challenges with sparse matrices
• Repartitioning, sharing versioned arrays

Page 44

IMDb rating: 8.5. Release date: 27 June 2008. Director: Doug Sweetland. Studio: Pixar. Runtime: 5 min.

Brief: A stage magician’s rabbit gets into a magical onstage brawl against his neglectful guardian with two magic hats.