Clustering and Factorization using Apache SystemML by Prithviraj Sen

Transcript of Clustering and Factorization using Apache SystemML by Prithviraj Sen

Page 1:

Matrix Factorization Algorithms in Apache SystemML

Prithviraj Sen

Page 2:

Applications of Matrix Factorization

• Netflix Prize
• Given ratings data, predict what movies users will watch

1 3 4 ? ? ?
? 3 5 ? ? 5
? ? 4 5 ? 5
? ? 3 ? ? ?
? ? 3 ? ? ?
2 ? ? 2 ? 2
? ? ? ? 5 ?
? 2 1 ? ? 1
? 3 ? ? 3 ?
1 ? ? ? ? ?

17,700 movies

480,000 users
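The "?" cells are what make this a factorization problem rather than a plain decomposition: only the observed ratings can enter the objective. A minimal sketch of one way to hold such data, assuming a dictionary keyed by (row, column); the names `GRID` and `parse_ratings` are illustrative and do not come from the slides.

```python
# The toy ratings grid from the slide; "?" marks an unobserved rating.
GRID = """\
1 3 4 ? ? ?
? 3 5 ? ? 5
? ? 4 5 ? 5
? ? 3 ? ? ?
? ? 3 ? ? ?
2 ? ? 2 ? 2
? ? ? ? 5 ?
? 2 1 ? ? 1
? 3 ? ? 3 ?
1 ? ? ? ? ?
"""


def parse_ratings(text):
    """Return {(row, col): rating} containing the observed cells only."""
    observed = {}
    for i, line in enumerate(text.strip().splitlines()):
        for j, cell in enumerate(line.split()):
            if cell != "?":
                observed[(i, j)] = float(cell)
    return observed


ratings = parse_ratings(GRID)
n_rows = 1 + max(i for i, _ in ratings)
n_cols = 1 + max(j for _, j in ratings)
density = len(ratings) / (n_rows * n_cols)  # fraction of cells that are observed
print(len(ratings), n_rows, n_cols, round(density, 2))
```

At the scale quoted on the slide, storing only observed entries is what keeps the data manageable, since most cells are unknown.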

Page 3:

Applications: Parts-Based Decomposition

Page 4:

Least Squares Matrix Factorization

• Approximate V using L and R: $\min \sum_{(i,j)} (v_{ij} - l_i' r_j)^2$

• Leads to the very well-known alternating least squares (ALS) algorithm
• Only requires solving least squares
• Embarrassingly parallel

[Figure: V ≈ L × R, where f is the inner dimension shared by L and R]
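The alternating scheme can be sketched in plain Python: hold R fixed and solve a small least-squares problem for every row of L, then swap roles. This is a minimal illustration under stated assumptions, not the DML script from the talk. The function names, the dictionary layout of the observed entries, and the small ridge term `lam` (added only so each f × f system stays solvable) are all choices made here.

```python
import random


def solve(A, b):
    """Solve the small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        tail = sum(M[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (M[c][n] - tail) / M[c][c]
    return x


def update_factors(fixed, entries_by_index, f, lam):
    """Solve one independent least-squares problem per index, other factor held fixed."""
    updated = []
    for entries in entries_by_index:  # entries: list of (other_index, value)
        A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
        rhs = [0.0] * f
        for other, v in entries:
            x = fixed[other]
            for a in range(f):
                rhs[a] += v * x[a]
                for b in range(f):
                    A[a][b] += x[a] * x[b]
        updated.append(solve(A, rhs))
    return updated


def als(observed, m, n, f=2, iters=30, lam=1e-6, seed=0):
    """observed: {(i, j): v_ij}. Returns L (m vectors l_i) and R (n vectors r_j)."""
    rng = random.Random(seed)
    R = [[rng.random() for _ in range(f)] for _ in range(n)]
    by_row = [[] for _ in range(m)]
    by_col = [[] for _ in range(n)]
    for (i, j), v in observed.items():
        by_row[i].append((j, v))
        by_col[j].append((i, v))
    for _ in range(iters):
        L = update_factors(R, by_row, f, lam)  # rows of L are independent
        R = update_factors(L, by_col, f, lam)  # columns of R are independent
    return L, R


def squared_error(observed, L, R):
    """Sum over observed (i, j) of (v_ij - l_i' r_j)^2."""
    return sum((v - sum(a * b for a, b in zip(L[i], R[j]))) ** 2
               for (i, j), v in observed.items())


# Toy check: a rank-1 matrix with one entry hidden, then predicted from the factors.
u, w = [1.0, 2.0, 3.0], [1.0, 2.0]
observed = {(i, j): u[i] * w[j] for i in range(3) for j in range(2)}
hidden = observed.pop((2, 1))  # true value is 3.0 * 2.0
L, R = als(observed, m=3, n=2, f=1)
prediction = sum(a * b for a, b in zip(L[2], R[1]))
print(round(prediction, 3), hidden)
```

Each pass through `update_factors` touches every row (or column) independently of the others, which is the property the "embarrassingly parallel" bullet refers to.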

Page 5:

ALS in DML

[Code figure; surviving annotations: two parfor loops, each directly solving least squares]
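"Direct solving" presumably means each row or column problem is solved in closed form rather than iteratively. As a worked equation, assuming $\Omega_i$ denotes the set of columns $j$ for which $v_{ij}$ is observed and $\Omega^j$ the set of rows $i$ observed in column $j$ (both symbols are introduced here, not on the slide), and with no regularization term, setting the gradient of the squared error to zero gives:

```latex
\begin{align*}
l_i &= \Big(\sum_{j \in \Omega_i} r_j r_j'\Big)^{-1} \sum_{j \in \Omega_i} v_{ij}\, r_j
  && \text{(R held fixed)} \\
r_j &= \Big(\sum_{i \in \Omega^j} l_i l_i'\Big)^{-1} \sum_{i \in \Omega^j} v_{ij}\, l_i
  && \text{(L held fixed)}
\end{align*}
```

Each $l_i$ depends only on R and on row $i$ of V, so all rows can be solved at once; the same holds for the columns. That independence is what a parallel loop over rows or columns can exploit.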

Page 6:

Poisson Matrix Factorization (NMF)

• Suitable if you are looking for non-negative factors: $v = e^{-lr}\,(lr)^{n} / n!$

• Leads to the well-known generalized KL-divergence: $\max \sum_{ij} (n_{ij} \log l_i' r_j - l_i' r_j)$

• Well-known update equations exist*

* "Generalized Nonnegative Matrix Approximations with Bregman Divergences" by Dhillon and Sra in NIPS 2005.
* "Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce" by Liu et al. in WWW 2010.
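One widely used form of such update equations is the multiplicative rule for the generalized KL objective, in which each factor is rescaled elementwise and therefore stays non-negative. The sketch below is an illustration of that standard rule, not a transcription of the cited papers or of the talk's script. It assumes a dense count matrix `N` (entries n_ij), L of shape m × f and R of shape f × n; the helper names are made up for this example.

```python
import math
import random


def matmul(A, B):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def objective(N, L, R):
    """Sum over (i, j) of n_ij * log(l_i' r_j) - l_i' r_j, the quantity to maximize."""
    P = matmul(L, R)
    return sum((n * math.log(p) if n > 0 else 0.0) - p
               for row_n, row_p in zip(N, P) for n, p in zip(row_n, row_p))


def ratio(N, L, R):
    """Elementwise N / (L R)."""
    P = matmul(L, R)
    return [[n / p for n, p in zip(row_n, row_p)] for row_n, row_p in zip(N, P)]


def pnmf_step(N, L, R):
    """One round of multiplicative updates for L, then for R."""
    f = len(R)
    # L <- L * ((N / (L R)) R') / (row sums of R)
    num = matmul(ratio(N, L, R), transpose(R))
    r_sums = [sum(row) for row in R]
    L = [[L[i][k] * num[i][k] / r_sums[k] for k in range(f)] for i in range(len(L))]
    # R <- R * (L' (N / (L R))) / (column sums of L)
    num = matmul(transpose(L), ratio(N, L, R))
    l_sums = [sum(col) for col in zip(*L)]
    R = [[R[k][j] * num[k][j] / l_sums[k] for j in range(len(R[0]))] for k in range(f)]
    return L, R


# Toy count data (made up for this example) and a strictly positive random start.
N = [[5, 3, 0, 1], [4, 2, 0, 1], [1, 1, 0, 5], [1, 0, 4, 4], [0, 1, 5, 4]]
rng = random.Random(0)
L = [[0.1 + rng.random() for _ in range(2)] for _ in range(5)]
R = [[0.1 + rng.random() for _ in range(4)] for _ in range(2)]

history = [objective(N, L, R)]
for _ in range(30):
    L, R = pnmf_step(N, L, R)
    history.append(objective(N, L, R))
print(round(history[0], 3), round(history[-1], 3))
```

The updates use only matrix products and elementwise operations, and the objective should never decrease from one round to the next.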

Page 7:

PNMF in DML

• Very efficient updates using only linear algebra
• Uses Apache SystemML's wdivmm operator

[Code figure; surviving annotation: wdivmm operator]
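The slide does not spell out what the operator computes. A rough sketch of the kind of fused "weighted divide, then matrix multiply" computation it suggests, assuming the pattern (N / (L R)) R' with N sparse: the product L R is evaluated only at the non-zero cells of N and never materialized in full. The function name `weighted_div_mm` and the dictionary layout are hypothetical and say nothing about how SystemML implements `wdivmm` internally.

```python
def weighted_div_mm(N_sparse, L, R):
    """Compute (N / (L R)) R' while visiting only the non-zero cells of N.

    N_sparse: {(i, j): n_ij} for non-zero entries only (hypothetical layout)
    L: m x f list of lists, R: f x n list of lists
    Returns an m x f list of lists.
    """
    f = len(R)
    out = [[0.0] * f for _ in L]
    for (i, j), n in N_sparse.items():
        dot = sum(L[i][k] * R[k][j] for k in range(f))  # one cell of L R
        q = n / dot
        for k in range(f):
            out[i][k] += q * R[k][j]
    return out


def dense_reference(N, L, R):
    """Same result computed the naive way: form all of L R, divide, multiply by R'."""
    m, n, f = len(N), len(N[0]), len(R)
    P = [[sum(L[i][k] * R[k][j] for k in range(f)) for j in range(n)] for i in range(m)]
    Q = [[N[i][j] / P[i][j] for j in range(n)] for i in range(m)]
    return [[sum(Q[i][j] * R[k][j] for j in range(n)) for k in range(f)] for i in range(m)]


# Small made-up example: mostly zeros in N, strictly positive factors.
N = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 1.0, 0.0], [4.0, 0.0, 0.0]]
L = [[0.5, 1.0], [1.5, 0.2], [0.3, 0.7], [1.0, 1.0]]
R = [[1.0, 0.4, 0.6], [0.2, 0.9, 1.1]]

N_sparse = {(i, j): v for i, row in enumerate(N) for j, v in enumerate(row) if v != 0.0}
fused = weighted_div_mm(N_sparse, L, R)
naive = dense_reference(N, L, R)
print(len(N_sparse), "non-zero cells visited instead of", len(N) * len(N[0]))
```

The fused version does work proportional to the number of non-zero entries times f, which matters when N is as sparse as a ratings matrix.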