Descent with Distributed Stochastic Gradient Large-Scale...

Large-Scale Matrix Factorizationwith Distributed Stochastic Gradient

DescentPresented by Joe, Jiefeng and Xiaolu

http://users.wpi.edu/~xxiong/m.htm

Matrix FactorizationStochastic Gradient Descent

Distributed SGD with MapReduceExperiments

Summary

Matrix Factorization

Original Image Actual Image

Reconstructed Image

Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012

Collaborative Filtering -- Recommendation Systems

Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, ...



Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, …

Predict additional items a user may like Assumptions: Similar feedback Similar taste



Problem Set of users Set of items (movies, books, jokes, products, stories, … Feedback (ratings, purchase, click-through, tags, …

Predict additional items a user may like Assumptions: Similar feedback Similar taste

Example:


traditional Matrix Factorization

Given an m*n matrix V and a rank r, find an m*r matrix W and an r*n matrix H such that V = W H.

The goal here is to obtain a low-rank approximation V ~ W H.


Generalized Matrix Factorization

A general machine learning problem Recommender systems, text indexing, face recognition, ...



A general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)



A general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)Parameter space W: row factors (e.g., m * r latent customer factors) H: column factors (e.g., r * n latent movie factors)



A general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)Parameter space W: row factors (e.g., m * r latent customer factors) H: column factors (e.g., r * n latent movie factors)Model Lij(Wi*, H*j): loss at element (i, j) Includes prediction error, regularization, auxiliary information, … Constraints (e.g., non-negativity)


Generalized Matrix FactorizationA general machine learning problem Recommender systems, text indexing, face recognition, …Training data V: m * n input matrix (e.g., rating matrix) Z: training set of indexes in V (e.g., subset of known ratings)Parameter space W: row factors (e.g., m * r latent customer factors) H: column factors (e.g., r * n latent movie factors)Model Lij(Wi*, H*j): loss at element (i, j) Includes prediction error, regularization, auxiliary information, … Constraints (e.g., non-negativity)Find best model


Successful Applications Movie recommendation (Netflix) >20M users, >20k movies, 4B ratings (projected) 60GB data, 15GB model (projected) Collaborative filtering

Website recommendation (Microsoft, WWW10) 51M users, 15M URLs, 1.2B clicks 17.8GB data, 161GB metadata, 49GB model Gaussian non-negative matrix factorization

News personalization (Google, WWW07) Millions of users, millions of stories, ? clicks Probabilistic latent semantic indexing


Successful Applications Movie recommendation (Netflix) >20M users, >20k movies, 4B ratings (projected) 60GB data, 15GB model (projected) Collaborative filtering Website recommendation (Microsoft, WWW10) 51M users, 15M URLs, 1.2B clicks 17.8GB data, 161GB metadata, 49GB model Gaussian non-negative matrix factorization News personalization (Google, WWW07) Millions of users, millions of stories, ? clicks Probabilistic latent semantic indexing

How to handle such massive scale? Big data Large models Expensive, iterative computations




Summary

Stochastic Gradient Descent


Stochastic Gradient Descent for Matrix Factorization


A Small Problem for Distributed Computation


A Small Problem for Distributed Computation, however




Summary

Example: Netflix data

Example: Synth data

Scalability

Speed up



Summary

References:Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent. Rainer Gemulla. Peter J. Haas. Erik

Nijkamp. Yannis SismanisLarge-Scale Matrix Factorization, Rainer Gemulla

Descent with Distributed Stochastic Gradient Large-Scale...

Documents

Transcript of Descent with Distributed Stochastic Gradient Large-Scale...