Descent with Distributed Stochastic Gradient Large-Scale...
Transcript of Descent with Distributed Stochastic Gradient Large-Scale...
Large-Scale Matrix Factorizationwith Distributed Stochastic Gradient
DescentPresented by Joe, Jiefeng and Xiaolu
Matrix FactorizationStochastic Gradient Descent
Distributed SGD with MapReduceExperiments
Summary
Matrix Factorization
Original Image Actual Image
Reconstructed Image
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Collaborative Filtering -- Recommendation Systems
Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, ...
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Collaborative Filtering -- Recommendation Systems
Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, …
Predict additional items a user may like Assumptions: Similar feedback Similar taste
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Collaborative Filtering -- Recommendation Systems
Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, …
Predict additional items a user may like Assumptions: Similar feedback Similar taste
Example:
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Collaborative Filtering -- Recommendation Systems
Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, …
Predict additional items a user may like Assumptions: Similar feedback Similar taste
Example:
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
traditional Matrix Factorization
Given an m*n matrix V and a rank r, find an m*r matrix W and an r*n matrix H such that V = W H.
The goal here is to obtain a low-rank approximation V ~ W H.
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Generalized Matrix Factorization
A general machine learning problem Recommender systems, text indexing, face recognition, ...
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Generalized Matrix Factorization
A general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Generalized Matrix Factorization
A general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)Parameter space W: row factors (e.g., m * r latent customer factors) H: column factors (e.g., r * n latent movie factors)
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Generalized Matrix Factorization
A general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)Parameter space W: row factors (e.g., m * r latent customer factors) H: column factors (e.g., r * n latent movie factors)Model Lij(Wi*, H*j): loss at element (i, j) Includes prediction error, regularization, auxiliary information, … Constraints (e.g., non-negativity)
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Generalized Matrix FactorizationA general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)Parameter space W: row factors (e.g., m * r latent customer factors) H: column factors (e.g., r * n latent movie factors)Model Lij(Wi*, H*j): loss at element (i, j) Includes prediction error, regularization, auxiliary information, … Constraints (e.g., non-negativity)Find best model
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Successful Applications Movie recommendation (Netflix) >20M users, >20k movies, 4B ratings (projected) 60GB data, 15GB model (projected) Collaborative filtering
Website recommendation (Microsoft, WWW10) 51M users, 15M URLs, 1.2B clicks 17.8GB data, 161GB metadata, 49GB model Gaussian non-negative matrix factorization
News personalization (Google, WWW07) Millions of users, millions of stories, ? clicks Probabilistic latent semantic indexing
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Successful Applications Movie recommendation (Netflix) >20M users, >20k movies, 4B ratings (projected) 60GB data, 15GB model (projected) Collaborative filtering Website recommendation (Microsoft, WWW10) 51M users, 15M URLs, 1.2B clicks 17.8GB data, 161GB metadata, 49GB model Gaussian non-negative matrix factorization News personalization (Google, WWW07) Millions of users, millions of stories, ? clicks Probabilistic latent semantic indexing
How to handle such massive scale? Big data Large models Expensive, iterative computations
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Matrix FactorizationStochastic Gradient Descent
Distributed SGD with MapReduceExperiments
Summary
Stochastic Gradient Descent
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent for Matrix Factorization
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent for Matrix Factorization
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent for Matrix Factorization
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent for Matrix Factorization
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Stochastic Gradient Descent for Matrix Factorization
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
A Small Problem for Distributed Computation
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
A Small Problem for Distributed Computation
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
A Small Problem for Distributed Computation, however
Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012
Matrix FactorizationStochastic Gradient Descent
Distributed SGD with MapReduceExperiments
Summary
Matrix FactorizationStochastic Gradient Descent
Distributed SGD with MapReduceExperiments
Summary
Example: Netflix data
Example: Synth data
Scalability
Speed up
Matrix FactorizationStochastic Gradient Descent
Distributed SGD with MapReduceExperiments
Summary
References:Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent. Rainer Gemulla. Peter J. Haas. Erik
Nijkamp. Yannis SismanisLarge-Scale Matrix Factorization, Rainer Gemulla