Page 1:

Map-Reduce for Machine Learning on Multicore
C. Chu, S.K. Kim, Y. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun (NIPS 2006)

Shimin Chen
Big Data Reading Group

Page 2:

Motivations
- Industry-wide shift to multicore
- No good framework for parallelizing ML algorithms

Goal: develop a general and exact technique for parallel programming of a large class of ML algorithms on multicore processors

Page 3:

Idea
- Statistical Query Model
- Summation Form
- Map-Reduce

Page 4:

Outline
- Introduction
- Statistical Query Model and Summation Form
- Architecture (inspired by Map-Reduce)
- Adopted ML Algorithms
- Experiments
- Conclusion

Page 5:

Valiant Model [Valiant'84]
- x is the input; y is a function of x that we want to learn
- In the Valiant model, the learning algorithm uses randomly drawn examples <x, y> to learn the target function

Page 6:

Statistical Query Model [Kearns'98]
- A restriction on the Valiant model: the learning algorithm uses aggregates over the examples, not the individual examples
- More precisely, the learning algorithm interacts with a statistical query oracle: the algorithm asks about a function f(x, y), and the oracle returns an estimate of the expectation of f(x, y) over the data distribution (for a 0/1-valued f, the probability that f(x, y) is true)
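As a concrete illustration (mine, not from the paper or Kearns' formalism), such an oracle can be approximated by averaging the query function over a sample:

    def sq_oracle(f, examples):
        """Approximate SQ oracle: estimate E[f(x, y)] from a sample.

        For a 0/1-valued f, this is the fraction of examples where f is true.
        """
        return sum(f(x, y) for x, y in examples) / len(examples)

    # Example query: how often does the label equal 1?
    print(sq_oracle(lambda x, y: y == 1, [((0,), 1), ((1,), 0), ((2,), 1)]))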

Page 7:

Summation Form

Aggregate over the data:
- Divide the data set into pieces
- Compute the aggregates on each core
- Combine all the results at the end
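To make the pattern concrete, here is a minimal sketch (mine, not the paper's code) of the summation form on a multicore machine using Python's multiprocessing: each worker computes a partial aggregate over its piece of the data, and the partials are combined at the end.

    from multiprocessing import Pool

    def partial_sum(chunk):
        # "Map": each core aggregates over its own piece of the data.
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        num_workers = 4
        # Divide the data set into pieces, one per core.
        size = (len(data) + num_workers - 1) // num_workers
        pieces = [data[i:i + size] for i in range(0, len(data), size)]
        with Pool(num_workers) as pool:
            partials = pool.map(partial_sum, pieces)
        # "Reduce": combine all the partial results at the end.
        print(sum(partials))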

Page 8:

Example: Linear Regression using Least Squares

Model: y = θᵀx
Goal: minimize Σᵢ (θᵀxᵢ − yᵢ)²

Solution: given m examples (x₁, y₁), (x₂, y₂), …, (xₘ, yₘ), write a matrix X with x₁, …, xₘ as rows and a column vector Y = (y₁, y₂, …, yₘ)ᵀ. Then the solution is θ* = (XᵀX)⁻¹XᵀY.

Parallel computation:
- XᵀX = Σᵢ xᵢxᵢᵀ and XᵀY = Σᵢ xᵢyᵢ are sums over the examples
- Cut the data into pieces of m/num_processors examples each, compute the partial sums on each core, and combine them at the end
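A sketch of this in code (my illustration, with made-up data): each worker computes the partial sums XₖᵀXₖ and Xₖᵀyₖ over its piece; summing the partials gives XᵀX and XᵀY, and the normal equations are solved once at the end.

    import numpy as np
    from multiprocessing import Pool

    def partials(piece):
        # Partial sums of the normal equations over one piece of the data.
        X_k, y_k = piece
        return X_k.T @ X_k, X_k.T @ y_k

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(10_000, 5))              # m examples as rows
        y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0])  # synthetic targets
        num_workers = 4
        pieces = list(zip(np.array_split(X, num_workers),
                          np.array_split(y, num_workers)))
        with Pool(num_workers) as pool:
            parts = pool.map(partials, pieces)
        A = sum(p[0] for p in parts)      # = X^T X
        b = sum(p[1] for p in parts)      # = X^T Y
        print(np.linalg.solve(A, b))      # theta* = (X^T X)^{-1} X^T Y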

Page 9:

Outline
- Introduction
- Statistical Query Model and Summation Form
- Architecture (inspired by Map-Reduce)
- Adopted ML Algorithms
- Experiments
- Conclusion

Page 10:

Lighter Weight Map-Reduce for Multicore
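The slide shows the architecture diagram, which is not reproduced in this transcript. Roughly, an engine splits the data across the cores, runs the algorithm's map function on each piece, and hands the intermediate results to a single reduce step. A minimal sketch of such an engine (my reconstruction, not the authors' code) could look like this:

    from multiprocessing import Pool

    class MapReduceEngine:
        """Lightweight map-reduce over an in-memory data set on one machine."""

        def __init__(self, num_workers=4):
            self.num_workers = num_workers

        def run(self, mapper, reducer, data):
            # Split the data into one piece per core.
            size = (len(data) + self.num_workers - 1) // self.num_workers
            pieces = [data[i:i + size] for i in range(0, len(data), size)]
            with Pool(self.num_workers) as pool:
                intermediate = pool.map(mapper, pieces)
            # A single reducer combines the per-core aggregates.
            return reducer(intermediate)

    if __name__ == "__main__":
        engine = MapReduceEngine()
        # Example job: a global sum expressed as map + reduce.
        print(engine.run(sum, sum, list(range(100))))    # -> 4950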

Page 11:

Outline
- Introduction
- Statistical Query Model and Summation Form
- Architecture (inspired by Map-Reduce)
- Adopted ML Algorithms
- Experiments
- Conclusion

Page 12:

Locally Weighted Linear Regression (LWLR)

Solve Aθ = b, where A = Σᵢ wᵢ xᵢxᵢᵀ and b = Σᵢ wᵢ xᵢyᵢ (when all wᵢ = 1, this reduces to least squares)

- Mappers: one set computes partial sums of A, the other set computes partial sums of b
- Two reducers, one aggregating A and one aggregating b
- Finally solve Aθ = b for θ
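In code, the per-piece computation and the combine step might look like this (a sketch with NumPy; the slide's split into two mapper sets is merged into one function for brevity):

    import numpy as np

    def lwlr_map(X_k, y_k, w_k):
        # Partial A = sum_i w_i x_i x_i^T and b = sum_i w_i x_i y_i
        # over one piece of the data.
        A_k = (w_k[:, None] * X_k).T @ X_k
        b_k = (w_k * y_k) @ X_k
        return A_k, b_k

    def lwlr_reduce(parts):
        # Sum the partial A's and b's, then solve A theta = b.
        A = sum(p[0] for p in parts)
        b = sum(p[1] for p in parts)
        return np.linalg.solve(A, b)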

Page 13:

Naïve Bayes (NB)
- Goal: estimate P(xj = k | y = 1) and P(xj = k | y = 0)
- Computation: count the occurrences of (xj = k, y = 1) and (xj = k, y = 0), count the occurrences of (y = 1) and (y = 0), then divide to get the estimates
- Mappers: count over a subgroup of the training samples
- Reducer: aggregate the intermediate counts and calculate the final result
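A sketch of the counting mapper and reducer (my illustration, for discrete features):

    from collections import Counter

    def nb_map(examples):
        # Count (feature index, value, label) triples and label occurrences
        # over one subgroup of the training samples.
        counts = Counter()
        for x, y in examples:            # x is a tuple of discrete values
            counts[("label", y)] += 1
            for j, k in enumerate(x):
                counts[(j, k, y)] += 1
        return counts

    def nb_reduce(partial_counts):
        # Aggregate the intermediate counts from all mappers.
        total = Counter()
        for c in partial_counts:
            total.update(c)
        return total

    # P(xj = k | y) is then total[(j, k, y)] / total[("label", y)].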

Page 14:

Gaussian Discriminative Analysis (GDA)
- Goal: classify x into the classes of y, assuming the examples of each class are drawn from a Gaussian with a class-specific mean but a shared covariance
- Computation:
  - Mappers: compute the partial sums (per-class counts, sums, and outer products) for a subset of the training samples
  - Reducer: aggregate the intermediate results into the class priors, means, and covariance
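The sufficient statistics are all sums over the examples, so a mapper can emit per-class counts, sums, and outer-product sums for its subset (a sketch, assuming binary labels):

    import numpy as np

    def gda_map(X_k, y_k):
        # Per-class partial statistics for priors, means, and covariance.
        stats = {}
        for c in (0, 1):                 # assumes binary labels
            X_c = X_k[y_k == c]
            stats[c] = (len(X_c), X_c.sum(axis=0), X_c.T @ X_c)
        return stats

    def gda_reduce(parts):
        # Aggregate counts, sums, and second moments across all mappers.
        out = {}
        for c in (0, 1):
            n = sum(p[c][0] for p in parts)
            s = sum(p[c][1] for p in parts)
            m2 = sum(p[c][2] for p in parts)
            out[c] = (n, s / n, m2)      # count, class mean, raw 2nd moment
        return out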

Page 15:

K-means
- Compute the Euclidean distance between the sample vectors and the centroids
- Recalculate the centroids
- Divide the computation into subgroups handled by map-reduce
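A sketch of one iteration's map and reduce steps (my illustration):

    import numpy as np

    def kmeans_map(X_k, centroids):
        # Assign each sample to its nearest centroid (Euclidean distance),
        # then emit per-cluster partial sums and counts for this piece.
        dists = np.linalg.norm(X_k[:, None, :] - centroids[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        k, dim = centroids.shape
        sums = np.zeros((k, dim))
        counts = np.zeros(k, dtype=int)
        for c in range(k):
            members = X_k[nearest == c]
            sums[c] = members.sum(axis=0)
            counts[c] = len(members)
        return sums, counts

    def kmeans_reduce(parts):
        # Recalculate the centroids from the global sums and counts.
        sums = sum(p[0] for p in parts)
        counts = sum(p[1] for p in parts)
        return sums / np.maximum(counts, 1)[:, None]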

Page 16:

Expectation Maximization (EM)
- The E-step computes some probabilities or counts per training example
- The M-step combines these values to update the parameters
- Both steps can be parallelized using map-reduce
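As an illustration (mine; a one-dimensional Gaussian mixture), each mapper runs the E-step on its chunk and emits partial sufficient statistics, and the reducer performs the M-step:

    import numpy as np

    def gauss_pdf(x, mu, var):
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    def em_map(x_k, weights, mu, var):
        # E-step on one chunk: responsibilities, then partial statistics.
        r = np.stack([w * gauss_pdf(x_k, m, v)
                      for w, m, v in zip(weights, mu, var)])
        r /= r.sum(axis=0)               # shape: (num_components, chunk_size)
        return r.sum(axis=1), r @ x_k, r @ (x_k ** 2)

    def em_reduce(parts, m):
        # M-step: combine the partial statistics into updated parameters.
        n = sum(p[0] for p in parts)     # per-component effective counts
        s1 = sum(p[1] for p in parts)    # per-component sum of r * x
        s2 = sum(p[2] for p in parts)    # per-component sum of r * x^2
        weights, mu = n / m, s1 / n
        var = s2 / n - mu ** 2
        return weights, mu, var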

Page 17:

Neural Network (NN)
- Back-propagation on a 3-layer network: an input layer, a middle (hidden) layer, and 2 output nodes
- Goal: compute the weights in the NN by back-propagation
- Mapper: propagates its set of training data through the network and back-propagates the errors to calculate partial gradients for the weights
- Reducer: sums the partial gradients and does a batch gradient descent step to update the weights
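A sketch of the mapper and reducer for such a network with sigmoid units and squared error (my illustration, biases omitted for brevity; the paper does not give code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nn_map(X_k, Y_k, W1, W2):
        # Forward-propagate this piece of the training data...
        H = sigmoid(X_k @ W1)            # hidden activations
        O = sigmoid(H @ W2)              # outputs (2 output nodes)
        # ...then back-propagate the errors to get partial gradients.
        dO = (O - Y_k) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        return X_k.T @ dH, H.T @ dO      # partial gradients for W1, W2

    def nn_reduce(parts, W1, W2, lr=0.1):
        # Sum the partial gradients, then take one batch gradient step.
        g1 = sum(p[0] for p in parts)
        g2 = sum(p[1] for p in parts)
        return W1 - lr * g1, W2 - lr * g2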

Page 18:

Principal Components Analysis (PCA)
- Compute the principal eigenvectors of the covariance matrix (1/m)·Σᵢ xᵢxᵢᵀ − μμᵀ, where μ is the mean of the data
- Both terms are sums over the examples, so clearly we can compute this summation form using map-reduce
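A sketch of the corresponding mapper and reducer (mine):

    import numpy as np

    def pca_map(X_k):
        # Partial sums needed for the mean and the covariance matrix.
        return len(X_k), X_k.sum(axis=0), X_k.T @ X_k

    def pca_reduce(parts):
        m = sum(p[0] for p in parts)
        s = sum(p[1] for p in parts)
        m2 = sum(p[2] for p in parts)
        mu = s / m
        cov = m2 / m - np.outer(mu, mu)  # (1/m) sum x x^T  -  mu mu^T
        # Principal components = top eigenvectors of the covariance matrix.
        vals, vecs = np.linalg.eigh(cov)
        return vecs[:, ::-1], vals[::-1]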

Page 19:

Other Algorithms
- Logistic Regression
- Independent Component Analysis
- Support Vector Machine

Page 20:

Time Complexity

Page 21:

Outline
- Introduction
- Statistical Query Model and Summation Form
- Architecture (inspired by Map-Reduce)
- Adopted ML Algorithms
- Experiments
- Conclusion

Page 22:

Setup
- Compare the map-reduce version against the sequential version
- 10 data sets
- Machines:
  - Dual-processor 700MHz Pentium III, 1GB RAM
  - 16-way Sun Enterprise 6000
  - (these are SMP machines, not multicore)

Page 23:

Dual-Processor Speedups

Page 24:

2-16 Processor Speedups

More data in the paper.

Page 25:

Multicore Simulator Results
- The paper has a paragraph on this
- Basically, it says the results are better than on the multiprocessor machines
- This could be because of lower communication cost

Page 26:

Conclusion
- Parallelize ML algorithms by casting them in summation form
- Use map-reduce on a single machine