Clustering and Factorization using Apache SystemML by Prithviraj Sen
Matrix Factorization Algorithms in Apache SystemML
Prithviraj Sen
Applications of Matrix Factorization
• Netflix Prize
• Given ratings data, predict what movies users will watch
1 3 4 ? ? ?
? 3 5 ? ? 5
? ? 4 5 ? 5
? ? 3 ? ? ?
? ? 3 ? ? ?
2 ? ? 2 ? 2
? ? ? ? 5 ?
? 2 1 ? ? 1
? 3 ? ? 3 ?
1 ? ? ? ? ?
17,700 movies
480,000 users
Applications: Parts-Based Decomposition
Least Squares Matrix Factorization
• Approximate V using L R: $\min \sum_{(i,j)} (v_{ij} - l_i' r_j)^2$
• Leads to the very well-known alternating least squares algorithm
• Only requires solving least squares
• Embarrassingly parallel
[Figure: V ≈ L × R, where f is the inner dimension shared by L and R]
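The alternating least squares scheme named on this slide can be sketched in a few lines. This is a minimal NumPy illustration, not the DML from the talk. It assumes the sum runs over observed entries only (marked by a hypothetical boolean mask `observed`) and adds a small ridge term `reg` to keep each system solvable. The names `als`, `objective`, `observed`, and `reg` are all illustrative and do not come from the slides.

```python
import numpy as np

def objective(V, observed, L, R, reg=1e-3):
    """Squared error on observed entries plus the (assumed) ridge penalty."""
    resid = (V - L @ R)[observed]
    return float(resid @ resid + reg * (np.sum(L * L) + np.sum(R * R)))

def als(V, observed, f, n_iters=25, reg=1e-3, seed=0):
    """Alternating least squares for V ~= L @ R with L (m x f) and R (f x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    L = rng.standard_normal((m, f))
    R = rng.standard_normal((f, n))
    ridge = reg * np.eye(f)
    for _ in range(n_iters):
        # Fix R. Each row l_i is its own small least squares problem,
        # independent of every other row (hence "embarrassingly parallel").
        for i in range(m):
            cols = observed[i]            # observed columns in row i
            Rj = R[:, cols]               # f x (number observed)
            L[i] = np.linalg.solve(Rj @ Rj.T + ridge, Rj @ V[i, cols])
        # Fix L. Each column r_j is likewise independent.
        for j in range(n):
            rows = observed[:, j]         # observed rows in column j
            Li = L[rows]                  # (number observed) x f
            R[:, j] = np.linalg.solve(Li.T @ Li + ridge, Li.T @ V[rows, j])
    return L, R

# Small synthetic example: a rank-2 matrix with roughly 80% of entries observed.
rng = np.random.default_rng(1)
V = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
observed = rng.random((8, 6)) < 0.8
L, R = als(V, observed, f=2)
print("objective after ALS:", objective(V, observed, L, R))
```

Each inner loop iteration reads only the fixed factor and writes one row or column, so the two loops are natural candidates for a parallel loop construct; the sequential Python loops here are only for clarity.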
ALS in DML
[DML code listing not captured in this transcript. Slide callouts: "parfor" (twice) and "Direct solving least squares" (twice).]
Poisson Matrix Factorization (NMF)
• Suitable if you are looking for non-negative factors: $v = e^{-l'r} (l'r)^{n} / n!$
• Leads to the well-known Generalized KL-Divergence: $\max \sum_{(i,j)} (n_{ij} \log l_i' r_j - l_i' r_j)$
• Well-known update equations exist*
* "Generalized Nonnegative Matrix Approximations with Bregman Divergences" by Dhillon and Sra in NIPS 2005.
* "Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce" by Liu et al. in WWW 2010.
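The step from the Poisson model to the objective on this slide is short enough to write out. The sketch below assumes the counts $n_{ij}$ are independent Poisson draws with rate $l_i' r_j$ (the independence assumption is mine; the slide does not state it). The symbol $D_{\mathrm{GKL}}$ is introduced here only for illustration.

```latex
\begin{align*}
\log \prod_{(i,j)} \frac{e^{-l_i' r_j}\,(l_i' r_j)^{n_{ij}}}{n_{ij}!}
  &= \sum_{(i,j)} \bigl( n_{ij} \log l_i' r_j - l_i' r_j - \log n_{ij}! \bigr) \\
D_{\mathrm{GKL}}(N \,\|\, LR)
  &= \sum_{(i,j)} \Bigl( n_{ij} \log \frac{n_{ij}}{l_i' r_j} - n_{ij} + l_i' r_j \Bigr) \\
  &= \text{const} - \sum_{(i,j)} \bigl( n_{ij} \log l_i' r_j - l_i' r_j \bigr)
\end{align*}
```

The $\log n_{ij}!$ term does not depend on L or R, so maximizing the log-likelihood is the same as maximizing $\sum_{(i,j)} (n_{ij} \log l_i' r_j - l_i' r_j)$, which in turn is the same as minimizing the generalized KL divergence between N and L R.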
PNMF in DML
• Very efficient updates using only linear algebra
• Uses Apache SystemML's wdivmm operator
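To make "updates using only linear algebra" concrete, here is a minimal NumPy sketch of the standard multiplicative updates for the generalized KL objective (the form usually attributed to Lee and Seung). Whether the DML listing on the slide uses exactly this form is not visible in the transcript, so treat this as an assumption. The names `pnmf`, `log_likelihood`, and `eps` are illustrative. As far as I understand it, wdivmm is a fused operator for expressions that divide the data by the product L R and then multiply by a factor, which avoids materializing the dense product; this sketch computes the dense product directly and does not model that optimization.

```python
import numpy as np

def log_likelihood(N, L, R, eps=1e-12):
    """Objective from the slide: sum of n_ij * log(l_i' r_j) - l_i' r_j."""
    P = L @ R
    return float(np.sum(N * np.log(P + eps) - P))

def pnmf(N, f, n_iters=100, eps=1e-12, seed=0):
    """Multiplicative updates for N ~= L @ R with non-negative L and R."""
    rng = np.random.default_rng(seed)
    m, n = N.shape
    L = rng.random((m, f)) + 0.1   # strictly positive start
    R = rng.random((f, n)) + 0.1
    for _ in range(n_iters):
        # L <- L * ((N / (L R)) R^T) / (row sums of R)
        L *= ((N / (L @ R + eps)) @ R.T) / (R.sum(axis=1) + eps)
        # R <- R * (L^T (N / (L R))) / (column sums of L)
        R *= (L.T @ (N / (L @ R + eps))) / (L.sum(axis=0)[:, None] + eps)
    return L, R

# Small synthetic example: Poisson counts drawn from a rank-2 rate matrix.
rng = np.random.default_rng(2)
rates = rng.random((10, 2)) @ rng.random((2, 7)) * 5.0
N = rng.poisson(rates).astype(float)
L, R = pnmf(N, f=2)
print("log-likelihood after PNMF:", log_likelihood(N, L, R))
```

Because every update multiplies a non-negative factor by a non-negative ratio, L and R stay non-negative without any explicit projection step.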