Post on 03-Jan-2016
Contextual Recommendation in Multi-User Devices
Raz Nissim, Michal Aharon, Eshcar Hillel, Amit Kagian, Ronny Lempel, Hayim Makabee
Challenge: Recommendations in Shared Accounts and Devices
“I am a 34 yo man who enjoys action and sci-fi movies. This is what my children have done to my netflix account”
04/20/23 3
Our Focus: Recommendations for Smart TVs
Main problems:
- Inferring who has consumed each item in the past
- Inferring who is currently requesting the recommendations
- "Who" can be a subset of users
Smart TVs can track what is being watched on them
This Work: Contextual Personalized Recommendations
WatchItNext problem: it is 8:30pm and "House of Cards" is on. What should we recommend to be watched next on this device?
Implicit assumption: there's a good chance that whoever is in front of the set now will remain there
Technically, think of an HMM where the hidden state corresponds to who is watching the set, and states don't change too often
WatchItNext Inputs and Output
Input: the available programs, a.k.a. "line-up"
Output: ranked recommendations
Recommendation Settings: Exploratory and Habitual
One typically doesn’t buy the same book twice, nor do people typically read the same news story twice
But people listen to the songs they like over and over again, and watch movies they like multiple times as well
In the TV setting, people regularly watch series and sports events
Habitual setting: all line-up items are eligible for recommendation to a device
Exploratory setting: only items that were not previously watched on the device are eligible for recommendation
[Diagram: interplay of Contextual, Personalized, and Popular recommendations]
Contextual Recommendations in a Different Context
How can contextualized and personalized recommendations be served together?
Collaborative Filtering
A fundamental principle in recommender systems:
- Taps similarities in patterns of consumption/enjoyment of items by users
- Recommends to a user what users with detected similar tastes have consumed/enjoyed
Consider a consumption matrix R of users and items
- r_{u,i} = 1 whenever person u consumed item i
- In other cases, r_{u,i} might be person u's rating on item i
- The matrix R is typically very sparse... and often very large
Collaborative Filtering – Mathematical Abstraction
R is a |U| x |I| matrix (rows: users, columns: items)
- Real-life task: top-k recommendation – predict which yet-to-be-consumed items the user would most enjoy
- Related task on ratings data: matrix completion – predict users' ratings for items they have yet to rate, i.e. "complete" missing values
Latent factor models (LFM): map both users and items to some f-dimensional space R^f, i.e. produce f-dimensional vectors v_u and w_i for each user and item
Define rating estimates as inner products: q_{u,i} = <v_u, w_i>
Main problem: finding a mapping of users and items to the latent factor space that produces "good" estimates
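As a minimal sketch of the inner-product scoring rule (all vectors and names below are invented for illustration, not from the paper's data):

```python
import numpy as np

# Toy latent factor model with f = 3 dimensions.
# V holds user vectors v_u, W holds item vectors w_i (values are made up).
V = {"u1": np.array([0.9, 0.1, 0.0]),
     "u2": np.array([0.0, 0.2, 0.8])}
W = {"kids_show": np.array([1.0, 0.0, 0.1]),
     "news":      np.array([0.0, 0.1, 0.9])}

def estimate(u, i):
    """Rating estimate q_{u,i} = <v_u, w_i>."""
    return float(np.dot(V[u], W[i]))

# u1 leans toward the first latent factor, u2 toward the third,
# so each user's preferred item scores higher.
print(estimate("u1", "kids_show") > estimate("u1", "news"))  # True
print(estimate("u2", "news") > estimate("u2", "kids_show"))  # True
```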
Collaborative Filtering – Matrix Factorization
R ≈ V·W, where R is |U| x |I|, V is |U| x f, and W is f x |I|
Closely related to dimensionality reduction techniques of the ratings matrix R (e.g. Singular Value Decomposition)
LFMs Rise to Fame: Netflix Prize
Used extensively by Challenge winners "BellKor's Pragmatic Chaos" (2006-2009)
Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan 2003]
Originally devised as a generative model of documents in a corpus, where documents are represented as bags-of-words
- k is a parameter representing the number of "topics" in the corpus
- V is a stochastic matrix: V[d,t] = P(topic_t | document_d), t=1,…,k
- U is a stochastic matrix: U[t,w] = P(word_w | topic_t), t=1,…,k
- L is a vector holding the documents' lengths (#words per document)
[Diagram: the |D| x |W| document-word count matrix is approximated as L · V · U, where V is |D| x k and U is k x |W|]
Latent Dirichlet Allocation (cont.)
In our case: given a parameter k and the collection of devices (=documents) and their viewing history (=bags of shows), output:
- k "profiles", where each profile is a distribution over items
- An association of each device to a distribution over the profiles
Profiles, hopefully, will represent viewing preferences such as: "Kids shows", "Cooking reality and home improvement", "News and Late Night", "History and Science", "Redneck reality: fishing & hunting shows, MMA"
A-priori probability of an item being watched on a device:
Score(item | device) = Σ_{profile=1..k} P(item | profile) × P(profile | device)
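A minimal sketch of this scoring rule, assuming k = 2 profiles with invented probabilities (profile names, device IDs, and all numbers below are hypothetical):

```python
# P(item | profile) for two hypothetical LDA profiles.
p_item_given_profile = {
    "kids":  {"cartoon": 0.7, "news_show": 0.0, "cooking": 0.3},
    "adult": {"cartoon": 0.1, "news_show": 0.6, "cooking": 0.3},
}
# P(profile | device) for one hypothetical device.
p_profile_given_device = {"dev42": {"kids": 0.75, "adult": 0.25}}

def score(item, device):
    """Score(item | device) = sum over profiles of P(item|profile) * P(profile|device)."""
    return sum(p_item_given_profile[prof][item] * p
               for prof, p in p_profile_given_device[device].items())

print(score("cartoon", "dev42"))    # 0.7*0.75 + 0.1*0.25 = 0.55
print(score("news_show", "dev42"))  # 0.0*0.75 + 0.6*0.25 = 0.15
```

A device dominated by the "kids" profile assigns a much higher a-priori probability to the cartoon than to the news show, as intended.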
Contextualizing Recommendations: Three Main Approaches
1. Contextual pre-filtering: use context to restrict the data to be modeled
2. Contextual post-filtering: use context to filter or weight the recommendations produced by conventional models
3. Contextual modeling: context information is incorporated in the model itself
   - Typically requires denser data due to many more parameters
   - Computationally intensive
   - E.g. Tensor Factorization, Karatzoglou et al., 2010
Main Contribution: "3-Way" Technique
- Learn a standard matrix factorization model (LFM/LDA)
- When recommending to a device d currently watching context item c, score each target item t as follows:
  S(t follows c | d) = Σ_{j=1..k} v_d(j) · w_c(j) · w_t(j)
- With LFM, this requires an additive shift to all vectors to get rid of negative values
- Results in "Sequential LFM/LDA" – a personalized contextual recommender
- Score is high for targets that agree with both context and device
- Again, no need to model context or change the learning algorithm; learn as usual, just apply the change when scoring
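The 3-way score above can be sketched in a few lines; the k = 4 non-negative vectors and item names below are invented for illustration (in practice they would come from a trained 80-dimensional LDA/shifted-LFM model):

```python
import numpy as np

# Hypothetical non-negative factor vectors (e.g. LDA topic weights).
v_dev = np.array([0.6, 0.1, 0.1, 0.2])   # device d
w_ctx = np.array([0.8, 0.0, 0.1, 0.1])   # context item c (currently watched)
w_items = {                               # candidate targets t from the line-up
    "similar_to_ctx": np.array([0.7, 0.1, 0.1, 0.1]),
    "off_topic":      np.array([0.0, 0.9, 0.0, 0.1]),
}

def three_way_score(v_d, w_c, w_t):
    """S(t follows c | d) = sum_j v_d(j) * w_c(j) * w_t(j)."""
    return float(np.sum(v_d * w_c * w_t))

# Rank the line-up by the 3-way score; items agreeing with both the
# device profile and the context item come first.
ranked = sorted(w_items,
                key=lambda t: three_way_score(v_dev, w_ctx, w_items[t]),
                reverse=True)
print(ranked)  # "similar_to_ctx" ranks first
```

Note how scoring is a pure post-training computation: the learned vectors are simply combined element-wise at recommendation time.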
Data: Historical Viewing Logs
- Triplets of the form (device ID, program ID, timestamp)
- Don't know who watched the device at that time
- Actually, don't know whether anyone watched
[Figure: timeline of viewing events – "Is anyone watching?"]
Data by the Numbers
- Training data: three months' worth of viewership data
- Test data: derived from one month of viewership data

Devices      Unique items*   Triplets
339,647      17,232          More than 19M
* Items are {movie, sports event, series} – not at the individual episode level

Setting       Test instances   Average line-up size
Habitual      ~3.8M            390
Exploratory   ~1.7M            349
Metric: Avg. Rank Percentile (ARP)
Rank Percentile properties:
- Ranges in (0,1]
- Higher is better
- Random scores ~0.5 in large lineups
Note: with large line-ups, ARP is practically equivalent to average AUC
[Figure: example line-up with the next-watched item at rank percentiles 1.0, 0.75, 0.50, 0.25]
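A minimal sketch of one Rank Percentile computation, assuming the definition RP = 1 - (rank-1)/|line-up|, which is consistent with the stated properties (range (0,1], higher is better, ~0.5 for random scores); the item names and scores are invented. ARP is then the average of RP over all test instances.

```python
def rank_percentile(scores, watched_item):
    """Rank percentile of the item actually watched next, within the line-up.

    1.0 means the model ranked it first; values approach 0 as it falls
    toward the bottom of a large line-up; random scoring yields ~0.5.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = ranked.index(watched_item) + 1   # 1-based rank
    return 1.0 - (rank - 1) / len(ranked)

# Line-up of 4 items with hypothetical model scores.
scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
print(rank_percentile(scores, "b"))  # rank 2 of 4 -> 0.75
print(rank_percentile(scores, "a"))  # rank 1 of 4 -> 1.0
```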
Baselines

Name                    Personalized?   Contextual?
General popularity      No              No
Sequential popularity   No              Yes
Temporal popularity     No              Yes
Device popularity*      Yes             No
LFM                     Yes             No
LDA                     Yes             No
* Only applicable to habitual recommendations
Contextual Personalized Recommenders
- SequentialLDA [LFM]: 3-way element-wise multiplication of the device, context item, and target item vectors
- TemporalLDA [LFM]: regular LDA/LFM score, multiplied by Temporal Popularity
- TempSeqLDA [LFM]: 3-way score multiplied by Temporal Popularity
All LDA/LFM models are 80-dimensional
Results (1): Sequential Context Matters
Degradation when using a random item as context indicates that the correct context item reflects the current viewing session, and implicitly the current watchers of the device
Results (2): Sequential Context Matters
Device Entropy: the entropy of p(topic | device) as computed by LDA on the training data; high values correspond to diverse distributions
Conclusions
- Multi-user or shared devices pose challenging recommendation problems
- TV recommendations are characterized by two use cases – habitual and exploratory
- Sequential context helps – it "narrows" the topical variety of the program to be watched next on the device
- Intuitively, context serves to implicitly disambiguate the current user or users of the device
- The 3-Way technique is an effective way of incorporating sequential context that has no impact on learning
- Future: explore applications of Hidden Topic Markov Models [Gruber, Rosen-Zvi, Weiss 2007]