Next directions in Mahout's recommenders
Next Directions in Mahout’s Recommenders
Sebastian Schelter, Apache Software Foundation
Bay Area Mahout Meetup
About me
PhD student at the Database Systems and Information Management Group of Technische Universität Berlin
Member of the Apache Software Foundation, committer on Mahout and Giraph
currently interning at IBM Research Almaden
Next Directions?
Mahout in Action is the prime source of information for using Mahout in practice.
As it is more than two years old (and only covers Mahout 0.5), it is missing a lot of recent developments.
This talk describes what has been added to the recommenders of Mahout since then and gives suggestions on directions for future versions of Mahout.
Collaborative Filtering 101
Collaborative Filtering
Problem: given a user’s interactions with items, guess which other items would be highly preferred
Collaborative Filtering: infer recommendations from patterns found in the historical user-item interactions
data can be explicit feedback (ratings) or implicit feedback (clicks, pageviews), represented in the interaction matrix A
        item1  ···  item3  ···
user1     3    ···    4    ···
user2     −    ···    4    ···
user3     5    ···    1    ···
  ···    ···   ···   ···   ···
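In code, such an interaction matrix is typically stored sparsely. A minimal sketch (plain Python, not Mahout code) mirroring the example above:

```python
# Plain-Python sketch (not Mahout code) of the interaction matrix above,
# stored sparsely: missing entries ("-") are simply absent.
interactions = {"user1": {"item1": 3.0, "item3": 4.0},
                "user2": {"item3": 4.0},
                "user3": {"item1": 5.0, "item3": 1.0}}

def rating(user, item):
    """Return the observed feedback value, or None if there is none."""
    return interactions.get(user, {}).get(item)
```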
Neighborhood Methods
User-based:
- for each user, compute a "jury" of users with similar taste
- pick the recommendations from the "jury's" items

Item-based:
- for each item, compute a set of items with similar interaction patterns
- pick the recommendations from those similar items
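The item-based idea can be sketched as follows (hypothetical Python, with cosine similarity as a stand-in for the similarity measures Mahout offers; not the actual Mahout implementation):

```python
import math

# Hypothetical sketch of item-based recommendation (not the Mahout
# implementation): cosine similarity between item interaction columns,
# then score unseen items by their similarity to the user's items.
def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_ratings, item_columns, top_n=2):
    # user_ratings: item -> rating; item_columns: item -> {user: rating}
    scores = {}
    for item, column in item_columns.items():
        if item in user_ratings:
            continue  # only recommend items the user has not seen
        sims = [(cosine(column, item_columns[seen]), r)
                for seen, r in user_ratings.items()]
        denom = sum(s for s, _ in sims)
        if denom > 0:
            # similarity-weighted average of the user's own ratings
            scores[item] = sum(s * r for s, r in sims) / denom
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

cols = {"A": {"u": 5.0, "v": 4.0, "w": 1.0},
        "B": {"v": 5.0, "w": 1.0},
        "C": {"v": 1.0, "w": 5.0}}
recommend({"A": 5.0, "C": 1.0}, cols)  # → ["B"]
```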
Neighborhood Methods
item-based variant most popular:
- simple and intuitively understandable
- additionally gives non-personalized, per-item recommendations (people who like X might also like Y)
- recommendations for new users without model retraining
- comprehensible explanations (we recommend Y because you liked X)
Latent factor models
Idea: interactions are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action or complexity of characters in movies)
these factors are in general not obvious and need to be inferred from the interaction data
both users and items can be described in terms of these factors
Matrix factorization
Computing a latent factor model: approximately factor A into the product of two rank-k feature matrices U and M such that A ≈ UM.
U models the latent features of the users, M models the latent features of the items
the dot product uᵢᵀ mⱼ in the latent feature space predicts the strength of interaction between user i and item j
A (u × i) ≈ U (u × k) × M (k × i)
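A minimal sketch of prediction in such a model (plain Python; the factor values are made up for illustration):

```python
# Sketch of prediction with a latent factor model (values made up):
# U holds one k-dimensional row per user, M one k-dimensional column per
# item; the predicted interaction strength is their dot product.
U = [[1.0, 0.5],
     [0.2, 1.5]]           # 2 users, k = 2 latent factors
M = [[4.0, 1.0, 0.0],
     [0.0, 2.0, 3.0]]      # k = 2 factors, 3 items

def predict(i, j):
    """Dot product of user i's factor row and item j's factor column."""
    return sum(U[i][f] * M[f][j] for f in range(len(M)))

predict(0, 1)  # → 1.0*1.0 + 0.5*2.0 = 2.0
```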
Single machine recommenders
Taste
- based on Sean Owen’s Taste framework (started in 2005)
- mature and stable codebase
- Recommender implementations encapsulate recommender algorithms
- DataModel implementations handle interaction data in memory, files, databases, key-value stores

but focus was mostly on neighborhood methods:
- lack of implementations for latent factor models
- little support for scientific usecases (e.g. recommender contests)
Collaboration
MyMediaLite, a scientific library of recommender system algorithms
http://www.mymedialite.net/

Mahout now features a couple of popular latent factor models, mostly ported by Zeno Gantner.
Lots of different Factorizers for our SVDRecommender
RatingSGDFactorizer, biased matrix factorization
Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09

SVDPlusPlusFactorizer, SVD++
Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08

ALSWRFactorizer, matrix factorization using Alternating Least Squares
Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08
Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08

ParallelSGDFactorizer, parallel version of biased matrix factorization (contributed by Peng Cheng)
Takács et al.: Scalable Collaborative Filtering Approaches for Large Recommender Systems, JMLR ’09
Niu et al.: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, NIPS ’11
Next directions
- better tooling for cross-validation and hold-out tests (e.g. time-based splits of interactions)
- memory-efficient DataModel implementations tailored to specific usecases (e.g. matrix factorization with SGD)
- better support for computing recommendations for "anonymous" users
- online recommenders
Usage
- researchers at TU Berlin and CWI Amsterdam regularly use Mahout for their recommender research published at international conferences
- "Bayerischer Rundfunk", one of Germany’s largest public TV broadcasters, uses Mahout to help users discover TV content in its online media library
- Berlin-based company plista runs a live contest for the best news recommender algorithm and provides Mahout-based "skeleton code" to participants
- The Dutch Institute of Sound and Vision runs a web platform that uses Mahout for recommending content from its archive of Dutch audio-visual heritage collections of the 20th century
Parallel processing
Distribution
difficult environment:
- data is partitioned and stored in a distributed filesystem
- algorithms must be expressed in MapReduce

our distributed implementations focus on two popular methods:
- item-based collaborative filtering
- matrix factorization with Alternating Least Squares
Scalable neighborhood methods
Cooccurrences
start with a simplified view: imagine the interaction matrix A was binary
→ we look at cooccurrences only
item similarity computation becomes matrix multiplication
S = AᵀA

scale-out of the item-based approach reduces to finding an efficient way to compute this item similarity matrix
Parallelizing S = AᵀA
standard approach of computing item cooccurrences requires random access to both users and items
foreach item f do
  foreach user i who interacted with f do
    foreach item j that i also interacted with do
      S_fj = S_fj + 1
→ not efficiently parallelizable on partitioned data
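The counting loop above, as a runnable sketch over a toy dataset (plain Python, made-up data):

```python
from collections import defaultdict

# The counting loop above as runnable Python over a toy binary dataset
# (user -> set of items interacted with); the data is made up.
interactions = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"b", "c"}}

users_by_item = defaultdict(set)
for user, items in interactions.items():
    for item in items:
        users_by_item[item].add(user)

S = defaultdict(int)
for f, users in users_by_item.items():   # foreach item f
    for i in users:                      # foreach user i who interacted with f
        for j in interactions[i]:        # foreach item j that i also interacted with
            S[(f, j)] += 1               # random access to users AND items
```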
row outer product formulation of matrix multiplication is efficiently parallelizable on a row-partitioned A
S = AᵀA = Σᵢ aᵢ aᵢᵀ   (summing over the rows aᵢ of A)

mappers compute the outer products of rows of A, emit the results row-wise, reducers sum these up to form S
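The equivalence of the two formulations can be checked on a small example (plain Python sketch):

```python
# Plain-Python sketch checking that summing row outer products a_i a_i^T
# reproduces the cooccurrence matrix S = A^T A of a binary matrix A.
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
n = len(A[0])

def outer(row):
    return [[x * y for y in row] for x in row]

def add(S, T):
    return [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(S, T)]

S = [[0] * n for _ in range(n)]
for row in A:                 # a mapper would emit outer(row) row-wise,
    S = add(S, outer(row))    # reducers would perform the summation

# ordinary matrix product A^T A for comparison
S_direct = [[sum(A[u][i] * A[u][j] for u in range(len(A)))
             for j in range(n)] for i in range(n)]
assert S == S_direct
```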
Parallel similarity computation
many more details in the implementation:
- support for various similarity measures
- various optimizations (e.g. for symmetric similarity measures)
- downsampling of skewed interaction data
in-depth description available in:
Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce, ACM RecSys 2012
Implementation in Mahout
o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob
computes the top-k pairwise similarities for each row of a matrix using some similarity measure

o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
computes the top-k similar items per item using RowSimilarityJob

o.a.m.cf.taste.hadoop.item.RecommenderJob
computes recommendations and similar items using RowSimilarityJob
Scalable Neighborhood Methods: Experiments
Setup
- 6 machines running Java 7 and Hadoop 1.0.4
- two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
Results
Yahoo Songs dataset (700M datapoints, 1.8M users, 136K items), similarity computation takes less than 100 minutes
Scalable matrix factorization
Alternating Least Squares
ALS rotates between fixing U and M. When U is fixed, the system recomputes M by solving a least-squares problem per item, and vice versa.
easy to parallelize, as all users (and vice versa, items) can be recomputed independently
additionally, ALS can be applied to usecases with implicit data (pageviews, clicks)
A (u × i) ≈ U (u × k) × M (k × i)
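A minimal ALS sketch (plain Python with made-up data, k = 1 so each least-squares problem has a closed form; real implementations solve a k × k system per user/item):

```python
# Minimal ALS sketch with k = 1 latent factor and regularization lambda,
# so each least-squares problem has a closed-form solution. The data and
# parameter values are made up for illustration.
ratings = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 1.0}
n_users, n_items, lam = 2, 2, 0.01
U = [1.0] * n_users
M = [1.0] * n_items

for _ in range(20):
    # fix M, recompute every user independently (this is what parallelizes)
    for i in range(n_users):
        num = sum(M[j] * r for (u, j), r in ratings.items() if u == i)
        den = sum(M[j] ** 2 for (u, j), _ in ratings.items() if u == i) + lam
        U[i] = num / den
    # fix U, recompute every item independently
    for j in range(n_items):
        num = sum(U[u] * r for (u, jj), r in ratings.items() if jj == j)
        den = sum(U[u] ** 2 for (u, jj), _ in ratings.items() if jj == j) + lam
        M[j] = num / den

# U[i] * M[j] now approximates the observed ratings (rank-1 approximation)
```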
Scalable Matrix Factorization: Implementation
Recompute the user feature matrix U using a broadcast-join:
1. run a map-only job using multithreaded mappers
2. load the item-feature matrix M into memory from HDFS to share it among the individual mappers
3. mappers read the interaction histories of the users
4. multithreaded: solve a least squares problem per user to recompute its feature vector
Diagram: the user histories A are row-partitioned across machines 1-3; the item features M are broadcast to every machine; each machine runs a map-side hash-join plus re-computation and locally forwards the recomputed user features U.
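The recompute step can be sketched with worker threads standing in for the multithreaded mappers (plain Python; data and values are made up, k = 1 for simplicity):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the recompute step with worker threads standing in for the
# multithreaded mappers; the item-feature "matrix" M is shared read-only
# (the broadcast), and each user is recomputed independently. k = 1 so the
# per-user least-squares problem has a closed form; all data is made up.
M = {0: 2.0, 1: 1.0}                  # item -> feature value
histories = {0: {0: 4.0, 1: 2.0},     # user -> {item: rating}
             1: {0: 2.0, 1: 1.0}}

def recompute_user(history):
    num = sum(M[j] * r for j, r in history.items())
    den = sum(M[j] ** 2 for j in history)
    return num / den

with ThreadPoolExecutor(max_workers=4) as pool:
    U = dict(zip(histories, pool.map(recompute_user, histories.values())))
# U == {0: 2.0, 1: 1.0}: both users' histories are exact multiples of M
```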
Implementation in Mahout
o.a.m.cf.taste.hadoop.als.ParallelALSFactorizationJob
different solvers for explicit and implicit data
Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08
Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08

o.a.m.cf.taste.hadoop.als.RecommenderJob
computes recommendations from a factorization
in-depth description available in:
Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov, Volker Markl: Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins, to appear at ACM RecSys 2013
Scalable Matrix Factorization: Experiments
Cluster: 26 machines, two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives each
Hadoop configuration: reuse JVMs, use JBlas as solver, run multithreaded mappers
Datasets: Netflix (0.5M users, 100M datapoints), Yahoo Songs (1.8M users, 700M datapoints), Bigflix (25M users, 5B datapoints)
Plots: average duration per job (seconds) for the (U) and (M) recomputation steps, against the number of features r (10, 20, 50, 100) on Yahoo Songs and Netflix, and against the number of machines (5 to 25) on Bigflix.
Next directions
- better tooling for cross-validation and hold-out tests (e.g. to find parameters for ALS)
- integration of more efficient solver libraries like JBlas
- should be easier to modify and adjust the MapReduce code
A selection of users
- Mendeley, a data platform for researchers (2.5M users, 50M research articles): Mendeley Suggest for discovering relevant research publications
- ResearchGate, the world’s largest social network for researchers (3M users)
- a German online retailer with several million customers across Europe
- German online market places for real estate and pre-owned cars with millions of users
Deployment
”Small data, low load”
- use GenericItemBasedRecommender or GenericUserBasedRecommender, feed it with interaction data stored in a file, database or key-value store
- have it load the interaction data in memory and compute recommendations on request
- collect new interactions into your files or database and periodically refresh the recommender

In order to improve performance, try to:
- have your recommender look at fewer interactions by using SamplingCandidateItemsStrategy
- cache computed similarities with a CachingItemSimilarity
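The effect of caching similarities can be sketched like this (plain Python; memoization standing in for the caching idea, not Mahout's CachingItemSimilarity itself):

```python
from functools import lru_cache

# Sketch of the caching idea (memoization standing in for Mahout's
# CachingItemSimilarity): repeated similarity requests are answered from
# the cache instead of being recomputed.
calls = 0

@lru_cache(maxsize=None)
def similarity(item_a, item_b):
    global calls
    calls += 1                # count how often we actually compute
    return 1.0 if item_a == item_b else 0.5   # stand-in for a real measure

similarity("x", "y")
similarity("x", "y")          # served from the cache, no recomputation
# calls == 1
```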
”Medium data, high load”
Assumption: interaction data still fits into main memory
- use a recommender that is able to leverage a precomputed model, e.g. GenericItemBasedRecommender or SVDRecommender
- load the interaction data and the model in memory and compute recommendations on request
- collect new interactions into your files or database and periodically recompute the model and refresh the recommender

use BatchItemSimilarities or ParallelSGDFactorizer for precomputing the model using multiple threads on a single machine
”Lots of data, high load”
Assumption: interaction data does not fit into main memory
- use a recommender that is able to leverage a precomputed model, e.g. GenericItemBasedRecommender or SVDRecommender
- keep the interaction data in a (potentially partitioned) database or in a key-value store
- load the model into memory; the recommender will only use one (cacheable) query per recommendation request to retrieve the user’s interaction history
- collect new interactions into your files or database and periodically recompute the model offline

use ItemSimilarityJob or ParallelALSFactorizationJob to precompute the model with Hadoop
”Precompute everything”
- use RecommenderJob to precompute recommendations for all users with Hadoop
- directly serve those recommendations
successfully employed by Mendeley for their research paper recommender "Suggest"
allowed them to run their recommender infrastructure serving 2 million users for less than $100 per month in AWS
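The pattern can be sketched as follows (plain Python; the scores are hypothetical stand-ins for the output of a batch job):

```python
# Sketch of the "precompute everything" pattern: batch-compute top-N
# recommendations for every user (in Mahout this is RecommenderJob on
# Hadoop), store them, and serve plain lookups. Scores are made up.
scores = {"alice": {"item1": 4.5, "item2": 2.0, "item3": 3.5},
          "bob":   {"item1": 1.0, "item2": 4.0, "item3": 3.0}}
TOP_N = 2
recommendations = {user: sorted(items, key=items.get, reverse=True)[:TOP_N]
                   for user, items in scores.items()}

def serve(user):
    # serving is a key lookup; no model is evaluated at request time
    return recommendations.get(user, [])

serve("alice")  # → ["item1", "item3"]
```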
Next directions
"Search engine based recommender infrastructure" (work in progress, driven by Pat Ferrel)
- use RowSimilarityJob to find anomalously co-occurring items using Hadoop
- index those item pairs with a distributed search engine such as Apache Solr
- query based on a user’s interaction history and the search engine will answer with recommendations
- gives us an easy-to-use, scalable serving layer for free (Apache Solr)
- allows complex recommendation queries containing filters, geo-location, etc.
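One common way to score "anomalous" cooccurrence, and to my knowledge the one Mahout favors (Dunning's log-likelihood ratio test, as in its LogLikelihoodSimilarity), can be sketched from the 2×2 contingency counts of two items (plain Python):

```python
import math

# Sketch of scoring "anomalous" cooccurrence with the log-likelihood
# ratio test (Dunning), computed from the 2x2 contingency counts of two
# items' occurrences across users.
def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def llr(k11, k12, k21, k22):
    # k11: users with both items; k12/k21: only one of them; k22: neither
    n = k11 + k12 + k21 + k22
    row = entropy([k11 + k12, k21 + k22])
    col = entropy([k11 + k21, k12 + k22])
    mat = entropy([k11, k12, k21, k22])
    return 2.0 * n * (row + col - mat)   # 2N times the mutual information

# independent items score ~0; strong cooccurrence scores high
```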
The shape of things to come
MapReduce is not well suited for certain ML usecases, e.g. when the algorithms to apply are iterative and the dataset fits into the aggregate main memory of the cluster
Mahout always stated that it is not tied to Hadoop, but there were no production-quality alternatives in the past
With the advent of YARN and the maturing of alternative systems, this situation is changing and we should embrace this change
Personally, I would love to see an experimental port of our distributed recommenders to another Apache-supported system such as Spark or Giraph
Thanks for listening!
Follow me on Twitter at http://twitter.com/sscdotopen
Join Mahout’s mailing lists at http://s.apache.org/mahout-lists

picture on slide 3 by Tim Abott, http://www.flickr.com/photos/theabbott/
picture on slide 21 by Crimson Diabolics, http://crimsondiabolics.deviantart.com/