CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord...
Transcript of CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord...
![Page 1: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/1.jpg)
CS 572: Information Retrieval
Recommender Systems 2: Implementation and Applications
Acknowledgements
Many slides in this lecture are adapted from Xavier Amatriain (Netflix), Yehuda
Koren (Yahoo), and Dietmar Jannach (TU Dortmund), Jimmy Lin, Foster Provost
![Page 2: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/2.jpg)
Recap of Recommendation Approaches
Pros Cons
Collaborative No knowledge-engineering effort, serendipity of results, learns market segments
Requires some form of rating feedback, cold start for new users and new items
Content-based No community required, comparison between items possible
Content descriptions necessary, cold start for new users, no surprises
Knowledge-based Deterministic recommendations, assured quality, no cold-start, can resemble sales dialogue
Knowledge engineering effort to bootstrap, basically static, does not react to short-term trends
4/6/2016 CS 572: Information Retrieval. Spring 2016 2
![Page 3: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/3.jpg)
Recap: What works (in “real world”)
4/6/2016 CS 572: Information Retrieval. Spring 2016 3
![Page 4: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/4.jpg)
Today: Applications
• Products: Netflix challenge
• Information: News recommendation
• Social: Who to follow
• Human issues (how recommendations are used)
4/6/2016 CS 572: Information Retrieval. Spring 2016 4
![Page 5: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/5.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 5
Netflix Challenge
![Page 6: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/6.jpg)
scoremovieuser
1211
52131
43452
41232
37682
5763
4454
15685
23425
22345
5766
4566
scoremovieuser
?621
?961
?72
?32
?473
?153
?414
?284
?935
?745
?696
?836
Training data Test data
Movie rating data
4/6/2016 CS 572: Information Retrieval. Spring 2016 6
![Page 7: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/7.jpg)
Netflix Prize
• Training data
– 100 million ratings
– 480,000 users
– 17,770 movies
– 6 years of data: 2000-2005
• Test data
– Last few ratings of each user (2.8 million)
– Evaluation criterion: root mean squared error (RMSE)
– Netflix Cinematch RMSE: 0.9514
• Competition
– 2700+ teams
– $1 million grand prize for 10% improvement on Cinematch result
– $50,000 2007 progress prize for 8.43% improvement
4/6/2016 CS 572: Information Retrieval. Spring 2016 7
![Page 8: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/8.jpg)
Overall rating distribution
• Third of ratings are 4s
• Average rating is 3.68From TimelyDevelopment.com
4/6/2016 CS 572: Information Retrieval. Spring 2016 8
![Page 9: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/9.jpg)
#ratings per movie
• Avg #ratings/movie: 56274/6/2016 CS 572: Information Retrieval. Spring 2016 9
![Page 10: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/10.jpg)
#ratings per user
• Avg #ratings/user: 2084/6/2016 CS 572: Information Retrieval. Spring 2016 10
![Page 11: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/11.jpg)
Average movie rating by movie count
More ratings to better moviesFrom TimelyDevelopment.com
4/6/2016 CS 572: Information Retrieval. Spring 2016 11
![Page 12: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/12.jpg)
Most loved movies
CountAvg ratingTitle
137812 4.593 The Shawshank Redemption
133597 4.545 Lord of the Rings: The Return of the King
180883 4.306 The Green Mile
150676 4.460 Lord of the Rings: The Two Towers
139050 4.415 Finding Nemo
117456 4.504 Raiders of the Lost Ark
180736 4.299 Forrest Gump
147932 4.433 Lord of the Rings: The Fellowship of the ring
149199 4.325 The Sixth Sense
144027 4.333 Indiana Jones and the Last Crusade
4/6/2016CS 572: Information Retrieval. Spring 201612
![Page 13: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/13.jpg)
Important RMSEs
Grand Prize: 0.8563; 10% improvement
BellKor: 0.8693; 8.63% improvement
Cinematch: 0.9514; baseline
Movie average: 1.0533
User average: 1.0651
Global average: 1.1296
Inherent noise: ????
Personalization
erroneous
accurate
4/6/2016 CS 572: Information Retrieval. Spring 2016 13
![Page 14: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/14.jpg)
Challenges
• Size of data
– Scalability
– Keeping data in memory
• Missing data
– 99 percent missing
– Very imbalanced
• Avoiding overfitting
• Test and training data differ significantly
movie #16322
4/6/2016 CS 572: Information Retrieval. Spring 2016 14
![Page 15: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/15.jpg)
15
The Rules
• Improve RMSE metric by 10%
– RMSE= square root of the averaged squared difference between each prediction and the actual rating
– amplifies the contributions of egregious errors, both false positives ("trust busters") and false negatives ("missed opportunities").
• 100 million “training” ratings provided
– User, Rating, Movie name, date
4/6/2016 CS 572: Information Retrieval. Spring 2016
![Page 16: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/16.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 16
![Page 17: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/17.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 17
![Page 18: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/18.jpg)
movie #7614 movie #3250
movie #15868
Lessons from the Netflix Prize
Yehuda Koren
The BellKor team
(with Robert Bell & Chris Volinsky)
4/6/2016 CS 572: Information Retrieval. Spring 2016 18
![Page 19: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/19.jpg)
• Use an ensemble of complementing predictors
• Two, half tuned models worth more than a single, fully tuned model
The BellKor recommender system
4/6/2016 CS 572: Information Retrieval. Spring 2016 19
![Page 20: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/20.jpg)
Effect of ensemble size
0.87
0.8725
0.875
0.8775
0.88
0.8825
0.885
0.8875
0.89
0.8925
1 8 15 22 29 36 43 50
#Predictors
Err
or
- R
MS
E
4/6/2016 CS 572: Information Retrieval. Spring 2016 20
![Page 21: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/21.jpg)
• Use an ensemble of complementing predictors
• Two, half tuned models worth more than a single, fully tuned model
• But:Many seemingly different models expose similar characteristics of the data, and won’t mix well
• Concentrate efforts along three axes...
The BellKor recommender system
4/6/2016 CS 572: Information Retrieval. Spring 2016 21
![Page 22: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/22.jpg)
The first axis:
• Multi-scale modeling of the data
• Combine top level, regional modeling of the data, with a refined, local view:
– k-NN: Extracting local patterns
– Factorization: Addressing regional effects
The three dimensions of the BellKor system
Global effects
Factorization
k-NN
4/6/2016 CS 572: Information Retrieval. Spring 2016 22
![Page 23: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/23.jpg)
Multi-scale modeling – 1st tier
• Mean rating: 3.7 stars
• The Sixth Sense is 0.5 stars above avg
• Joe rates 0.2 stars below avg
Baseline estimation:Joe will rate The Sixth Sense 4 stars
Global effects:
4/6/2016 CS 572: Information Retrieval. Spring 2016 23
![Page 24: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/24.jpg)
Multi-scale modeling – 2nd tier
• Both The Sixth Sense and Joe are placed high on the “Supernatural Thrillers” scale
Adjusted estimate:Joe will rate The Sixth Sense 4.5 stars
Factors model:
4/6/2016 CS 572: Information Retrieval. Spring 2016 24
![Page 25: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/25.jpg)
Multi-scale modeling – 3rd tier
• Joe didn’t like related movie Signs
• Final estimate:Joe will rate The Sixth Sense 4.2 stars
Neighborhood model:
4/6/2016 CS 572: Information Retrieval. Spring 2016 25
![Page 26: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/26.jpg)
The second axis:
• Quality of modeling
• Make the best out of a model
• Strive for:
– Fundamental derivation
– Simplicity
– Avoid overfitting
– Robustness against #iterations, parameter setting, etc.
• Optimizing is good, but don’t overdo it!
The three dimensions of the BellKor system
global
local
quality4/6/2016 CS 572: Information Retrieval. Spring 2016 26
![Page 27: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/27.jpg)
• The third dimension will be discussed later...
• Next:Moving the multi-scale view along the quality axis
The three dimensions of the BellKor system
global
local
quality
???
4/6/2016 CS 572: Information Retrieval. Spring 2016 27
![Page 28: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/28.jpg)
• Earliest and most popular collaborative filtering method
• Derive unknown ratings from those of “similar” items (movie-movievariant)
• A parallel user-user flavor: rely on ratings of like-minded users (not in this talk)
Local modeling through k-NN
4/6/2016 CS 572: Information Retrieval. Spring 2016 28
![Page 29: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/29.jpg)
k-NN
121110987654321
455311
3124452
534321423
245424
5224345
423316
users
movie
s
- unknown rating - rating between 1 to 5
4/6/2016 CS 572: Information Retrieval. Spring 2016 29
![Page 30: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/30.jpg)
k-NN
121110987654321
455 ?311
3124452
534321423
245424
5224345
423316
users
movie
s
- estimate rating of movie 1 by user 5
4/6/2016 CS 572: Information Retrieval. Spring 2016 30
![Page 31: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/31.jpg)
k-NN
121110987654321
455 ?311
3124452
534321423
245424
5224345
423316
users
Neighbor selection:
Identify movies similar to 1, rated by user 5
movie
s
4/6/2016 CS 572: Information Retrieval. Spring 2016 31
![Page 32: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/32.jpg)
k-NN
121110987654321
455 ?311
3124452
534321423
245424
5224345
423316
users
Compute similarity weights:
s13=0.2, s16=0.3
movie
s
4/6/2016 CS 572: Information Retrieval. Spring 2016 32
![Page 33: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/33.jpg)
k-NN
121110987654321
4552.6311
3124452
534321423
245424
5224345
423316
users
Predict by taking weighted average:
(0.2*2+0.3*3)/(0.2+0.3)=2.6
movie
s
4/6/2016 CS 572: Information Retrieval. Spring 2016 33
![Page 34: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/34.jpg)
Properties of k-NN
• Intuitive
• No substantial preprocessing is required
• Easy to explain reasoning behind a recommendation
• Accurate?
4/6/2016 CS 572: Information Retrieval. Spring 2016 34
![Page 35: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/35.jpg)
Grand Prize: 0.8563
BellKor: 0.8693
Cinematch: 0.9514
Movie average: 1.0533
User average: 1.0651
Global average: 1.1296
Inherent noise: ????
0.96
0.91
k-NN on the RMSE scale
k-NN
erroneous
accurate
4/6/2016 CS 572: Information Retrieval. Spring 2016 35
![Page 36: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/36.jpg)
k-NN - Common practice
1. Define a similarity measure between items: sij
2. Select neighbors -- N(i;u): items most similar to i, that were rated by u
3. Estimate unknown rating, rui, as the weighted average:
( ; )
( ; )
ij uj ujj N i u
ui ui
ijj N i u
s r br b
s
baseline estimate for rui
4/6/2016 CS 572: Information Retrieval. Spring 2016 36
![Page 37: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/37.jpg)
k-NN - Common practice1. Define a similarity measure between items: sij
2. Select neighbors -- N(i;u): items similar to i, rated by u
3. Estimate unknown rating, rui, as the weighted average:
Problems:
1. Similarity measures are arbitrary; no fundamental justification
2. Pairwise similarities isolate each neighbor; neglect
interdependencies among neighbors
3. Taking a weighted average is restricting; e.g., when
neighborhood information is limited
( ; )
( ; )
ij uj ujj N i u
ui ui
ijj N i u
s r br b
s
4/6/2016 CS 572: Information Retrieval. Spring 2016 37
![Page 38: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/38.jpg)
(We allow )
Interpolation weights
• Use a weighted sum rather than a weighted average:
( ; )
1ij
j N i u
w
( ; )ui ui ij uj ujj N i u
r b w r b
• Model relationships between item i and its neighbors
• Can be learnt through a least squares problem from all
other users that rated i:
2
( ; )Minw vi vi ij vj vjv u j N i u
r b w r b
4/6/2016 CS 572: Information Retrieval. Spring 2016 38
![Page 39: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/39.jpg)
Interpolation weights
2
( ; )Minw vi vi ij vj vjv u j N i u
r b w r b
• Interpolation weights derived based on their role;
no use of an arbitrary similarity measure
• Explicitly account for interrelationships among the
neighbors
Challenges:
• Deal with missing values
• Avoid overfitting
• Efficient implementation
Mostly
unknown
Estimate inner-products
among movie ratings
4/6/2016 CS 572: Information Retrieval. Spring 2016 39
![Page 40: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/40.jpg)
Geared
towards
females
Geared
towards
males
serious
escapist
The PrincessDiaries
The Lion King
Braveheart
Lethal Weapon
Independence Day
AmadeusThe Color Purple
Dumb and Dumber
Ocean’s 11
Sense and Sensibility
Gus
Dave
Latent factor models
4/6/2016 CS 572: Information Retrieval. Spring 2016 40
![Page 41: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/41.jpg)
Latent factor models
45531
312445
53432142
24542
522434
42331
item
s
.2-.4.1
.5.6-.5
.5.3-.2
.32.11.1
-22.1-.7
.3.7-1
-.92.41.4.3-.4.8-.5-2.5.3-.21.1
1.3-.11.2-.72.91.4-1.31.4.5.7-.8
.1-.6.7.8.4-.3.92.41.7.6-.42.1
~
~
item
s
users
users
A rank-3 SVD approximation
4/6/2016 CS 572: Information Retrieval. Spring 2016 41
![Page 42: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/42.jpg)
Estimate unknown ratings as inner-products of factors:
45531
312445
53432142
24542
522434
42331
item
s
.2-.4.1
.5.6-.5
.5.3-.2
.32.11.1
-22.1-.7
.3.7-1
-.92.41.4.3-.4.8-.5-2.5.3-.21.1
1.3-.11.2-.72.91.4-1.31.4.5.7-.8
.1-.6.7.8.4-.3.92.41.7.6-.42.1
~
~
item
s
users
A rank-3 SVD approximation
users
?
4/6/2016 CS 572: Information Retrieval. Spring 2016 42
![Page 43: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/43.jpg)
Estimate unknown ratings as inner-products of factors:
45531
312445
53432142
24542
522434
42331
item
s
.2-.4.1
.5.6-.5
.5.3-.2
.32.11.1
-22.1-.7
.3.7-1
-.92.41.4.3-.4.8-.5-2.5.3-.21.1
1.3-.11.2-.72.91.4-1.31.4.5.7-.8
.1-.6.7.8.4-.3.92.41.7.6-.42.1
~
~
item
s
users
A rank-3 SVD approximation
users
?
4/6/2016 CS 572: Information Retrieval. Spring 2016 43
![Page 44: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/44.jpg)
Estimate unknown ratings as inner-products of factors:
45531
312445
53432142
24542
522434
42331
item
s
.2-.4.1
.5.6-.5
.5.3-.2
.32.11.1
-22.1-.7
.3.7-1
-.92.41.4.3-.4.8-.5-2.5.3-.21.1
1.3-.11.2-.72.91.4-1.31.4.5.7-.8
.1-.6.7.8.4-.3.92.41.7.6-.42.1
~
~
item
s
users
2.4
A rank-3 SVD approximation
users
4/6/2016 CS 572: Information Retrieval. Spring 2016 44
![Page 45: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/45.jpg)
Latent factor models45531
312445
53432142
24542
522434
42331
.2-.4.1
.5.6-.5
.5.3-.2
.32.11.1
-22.1-.7
.3.7-1
-.92.41.4.3-.4.8-.5-2.5.3-.21.1
1.3-.11.2-.72.91.4-1.31.4.5.7-.8
.1-.6.7.8.4-.3.92.41.7.6-.42.1
~
Properties:• SVD isn’t defined when entries are unknown use specialized
methods
• Very powerful model can easily overfit, sensitive to regularization
• Probably most popular model among contestants
– 12/11/2006: Simon Funk describes an SVD based method
– 12/29/2006: Free implementation at timelydevelopment.com
4/6/2016 CS 572: Information Retrieval. Spring 2016 45
![Page 46: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/46.jpg)
Grand Prize: 0.8563
BellKor: 0.8693
Cinematch: 0.9514
Movie average: 1.0533
User average: 1.0651
Global average: 1.1296
Inherent noise: ????
0.93
0.89
Factorization on the RMSE scale
factorization
4/6/2016 CS 572: Information Retrieval. Spring 2016 46
![Page 47: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/47.jpg)
BellKore Approach
• User factors:Model a user u as a vector pu ~ Nk(, )
• Movie factors:Model a movie i as a vector qi ~ Nk(γ, Λ)
• Ratings:Measure “agreement” between u and i: rui ~ N(pu
Tqi, ε2)
• Maximize model’s likelihood:– Alternate between recomputing user-factors, movie-
factors and model parameters
– Special cases: • Alternating Ridge regression
• Nonnegative matrix factorization
4/6/2016 CS 572: Information Retrieval. Spring 2016 47
![Page 48: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/48.jpg)
Combining multi-scale views
global effects
regional effects
local effects
Residual fitting
factorization
k-NN
Weighted average
A unified model
k-NN
factorization
4/6/2016 CS 572: Information Retrieval. Spring 2016 48
![Page 49: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/49.jpg)
Localized factorization model
• Standard factorization: User u is a linear function parameterized by pu
rui = puTqi
• Allow user factors – pu – to depend on the item being predicted
rui = pu(i)Tqi
• Vector pu(i) models behavior of u on items like i
4/6/2016 CS 572: Information Retrieval. Spring 2016 49
![Page 50: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/50.jpg)
RMSE vs. #Factors
0.905
0.91
0.915
0.92
0.925
0.93
0.935
0 20 40 60 80
#Factors
RM
SE
Factorization
Localized factorization
Results on Netflix Probe set
More
accurate
4/6/2016 CS 572: Information Retrieval. Spring 2016 50
![Page 51: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/51.jpg)
The third dimension of the BellKor system
• A powerful source of information:Characterize users by which movies they rated, rather than how they rated
• A binary representation of the data
45531
312445
53432142
24542
522434
42331
users
mo
vie
s
010100100101
111001001100
011101011011
010010010110
110000111100
010010010101
usersm
ovie
s
4/6/2016 CS 572: Information Retrieval. Spring 2016 51
![Page 52: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/52.jpg)
The third dimension of the BellKor system
movie #17270
Great news to recommender systems:
• Works even better on real life datasets (?)
• Improve accuracy by exploiting implicit feedback
• Implicit behavior is abundant and easy to collect:
– Rental history
– Search patterns
– Browsing history
– ...
• Implicit feedback allows predicting personalized ratings for users that never rated!
4/6/2016 CS 572: Information Retrieval. Spring 2016 52
![Page 53: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/53.jpg)
![Page 54: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/54.jpg)
![Page 55: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/55.jpg)
Recommended Stories
Signed in, search history enabled
![Page 56: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/56.jpg)
![Page 57: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/57.jpg)
Challenges
• Scale
– Number of unique visitors in last 2 months: several millions
– Number of stories within last 2 months: several millions
• Item Churn
– News stories change every 10-15 mins
– Recency of stories is important
– Cannot build models ever so often
• Noisy ratings
– Clicks treated as noisy positive vote
![Page 58: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/58.jpg)
Approach
• Content-based vs. Collaborative filtering
• Collaborative filtering based on co-visitation
– Content agnostic: Can be applied to other application domains, languages, countries
– Google’s key strength: Huge user base and their data.
• Could have used content based filtering
• Focus on algorithms that are scalable
![Page 59: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/59.jpg)
Algorithm Overview
• Obtain a list of candidate stories
• For each story:
– Obtain 3 scores (y1, y2, y3)
– Final score = ∑ wi*yi
• User clustering algorithms (Minhash (y1), PLSI (y2))
– Score ~ number of times this story was clicked by other users in your cluster
• Story-story co-visitation (y3)
– Score ~ number of times this story was co-clicked with other stories in your recent click history
![Page 60: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/60.jpg)
Algorithm Overview …
• User clustering done offline as batch process
– Can be run every day or 2-3 times a week
• Cluster-story counts maintained in real time
• New users
– May not be clustered
– Rely on co-visitation to generate recommendations
![Page 61: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/61.jpg)
Personalization
Servers
StoryTable
(cluster +covisit
counts)
UserTable
(user clusters,
click hist)
User clustering
(Offline)
(Mapreduce)
Rank Request
Rank Reply
Click Notify
Read Stats
Update Stats
Read user profileUpdate profile
Architecture
Bigtables
News Frontend
Webserver
![Page 62: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/62.jpg)
User clustering - Minhash
||/||2121 uuuu SSSS
• Input: User and his clicked stories
• User similarity =
• Output: User clusters.
– Similar users belong to same cluster
},...,,{ 21
u
m
uu
u sssS
![Page 63: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/63.jpg)
Minhash …
Randomly permute the universe of clicked stories
min defined by permutation
Pseudo-random permutation
Compute hash for each story and treat hash-value as
permutation index
Treat MinHash value as ClusterId
Probabilistic clustering
},...,,{},...,,{ ''
2
'
121 mm ssssss
)min()( u
jsuMH
||
||)}()({
21
21
21
uu
uu
SS
SSuMHuMHP
![Page 64: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/64.jpg)
MinHash as Mapreduce
• Map phase:– Input: key = userid, value = storyid
– Compute hash for each story (parallelizable across all data)
– Output: key = userid value = storyhash
• Reduce phase:– Input: key = userid value = <list of storyhash values>
– Compute min over the story hash-values (parallelizable across users)
![Page 65: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/65.jpg)
PLSI Framework
u1
u2
u3
u4
Users
s1
s2
s3
Stories
z1
z2
Latent Classes
P(s|u) = ∑z P(s|z)*P(z|u)
![Page 66: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/66.jpg)
Clustering - PLSI Algorithm
• Learning (done offline)
– ML estimation • Learn model parameters that maximize the
likelihood of the sample data
• Ouput = P[zj|u]’s P[s|zj]’s
– P[zj|u]’s lead to a soft clustering of users
• Runtime: we only use P[zj|u]’s and
ignore P[s|zj]’s
u
z
s
P[z|u]
P[s|z]
![Page 67: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/67.jpg)
PLSI (EM estimation) as Mapreduce
• E step: (Map phase)
q*(z; u,s,Ø) = p(s|z)p(z|u)/(∑z p(s|z)p(z|u))
• M step: (Reduce phase)
p(s|z) = ∑u q*(z; u,s,Ø)/(∑s ∑u q*(z; u,s,Ø))
p(z|u) = ∑s q*(z; u,s,Ø)/(∑z ∑s q*(z; u,s,Ø))
• Cannot load the entire model from prev iteration in a single machine during map phase
• Trick: Partition users and stories. Each partition loads the stats pertinent to it
![Page 68: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/68.jpg)
PLSI as Mapreduce
![Page 69: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/69.jpg)
Covisitation
• For each story si store the covisitation counts with other stories c(si, sj )
• Candidate story: sk
• User history: s1,…, sn
• score (si, sj ) = c(si, sj )/∑m c(si, sm )
• total_score(sk) = ∑n score(sk, sn )
![Page 70: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/70.jpg)
Experimental results
![Page 71: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/71.jpg)
Live traffic clickthrough evaluation
![Page 72: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/72.jpg)
Summary
• Most effective approaches are often hybrid between collaborative and content-based filtering– c.f. Netflix, Google News recommendations
• Lots of data available, but algorithm scalability critical – (use Map/Reduce, Hadoop)
• Suggested rules-of-thumb (take w/ grain of salt):– If have moderate amounts of rating data for each user, use
collaborative filtering with kNN (reasonable baseline)– When little data available, use content (model-) based
recommendations
4/6/2016 CS 572: Information Retrieval. Spring 2016 72
![Page 73: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/73.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 73
![Page 74: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/74.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 74
![Page 75: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/75.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 75
![Page 76: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/76.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 76
![Page 77: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/77.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 77
![Page 78: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/78.jpg)
4/6/2016 CS 572: Information Retrieval. Spring 2016 78
![Page 79: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/79.jpg)
EXPLAINING RECOMMENDATIONS
4/6/2016 CS 572: Information Retrieval. Spring 2016 79
![Page 80: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/80.jpg)
Explanation is Critical
4/6/2016 CS 572: Information Retrieval. Spring 2016 80
![Page 81: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/81.jpg)
Explanations in recommender systems
Additional information to explain the system’s output following some objectives
![Page 82: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/82.jpg)
Objectives of explanations
• Transparency
• Validity
• Trustworthiness
• Persuasiveness
• Effectiveness
Efficiency
Satisfaction
Relevance
Comprehensibility
Education
![Page 83: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/83.jpg)
Explanations in general
• How? and Why? explanations in expert systems• Form of abductive reasoning
– Given: 𝐾𝐵⊨𝑅𝑆𝑖 (item i is recommended by method RS)– Find 𝐾𝐵′ ⊆ 𝐾𝐵 s.t. 𝐾𝐵′⊨𝑅𝑆𝑖
• Principle of succinctness– Find smallest subset of 𝐾𝐵′ ⊆ 𝐾𝐵 s.t. 𝐾𝐵′⊨𝑅𝑆𝑖
i.e. for all 𝐾𝐵′′ ⊂ 𝐾𝐵′ holds 𝐾𝐵′′⊭ 𝑅𝑆𝑖
• But additional filtering– Some parts relevant for
deduction, might be obviousfor humans
[Friedrich & Zanker, AI Magazine, 2011]
![Page 84: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/84.jpg)
Taxonomy for generating explanations in RS
Major design dimensions of current explanation components:
• Category of reasoning model for generating explanations – White box
– Black box
• RS paradigm for generating explanations– Determines the exploitable semantic relations
• Information categories
![Page 85: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/85.jpg)
RS paradigms and their ontologies
• Classes of objects – Users
– Items
– Properties
• N-ary relations between them
• Collaborative filtering– Neighborhood based CF (a)
– Matrix factorization (b)
• Introduces additional factors as proxies for determining similarities
![Page 86: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/86.jpg)
RS paradigms and their ontologies
• Content-based
– Properties characterizing items
– TF*IDF model
• Knowledge based
– Properties of items
– Properties of user model
– Additional mediating domain concepts
![Page 87: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/87.jpg)
• Similarity between items
• Similarity between users
• Tags– Tag relevance (for
item)– Tag preference (of
user)
![Page 88: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/88.jpg)
Results from testing the explanation feature
• Knowledgeable explanations significantly increase the users’
perceived utility
• Perceived utility strongly correlates with usage intention etc.
Explanation
Trust
PerceivedUtility
PositiveUsage exp.
Recommend others
Intention to repeated usage
** sign. < 1%, * sign. < 5%
+*
+**
+**
+**
+**
+
![Page 89: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/89.jpg)
Probabilistic View (Foster Provost)
4/6/2016 CS 572: Information Retrieval. Spring 2016 89
![Page 90: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/90.jpg)
Example
4/6/2016 CS 572: Information Retrieval. Spring 2016 90
![Page 91: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/91.jpg)
Consumers Increasingly Concerned About Inferences Made About Them
4/6/2016 CS 572: Information Retrieval. Spring 2016 91
![Page 92: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/92.jpg)
Example Result
4/6/2016 CS 572: Information Retrieval. Spring 2016 92
![Page 93: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/93.jpg)
Critical Eveidence(“Counter-factural”)
• Models can be viewed as evidence-combining systems
• For a specific decision, can ask:
• What is a minimal set of evidence such that if it were not present, the decision would not have been made?
4/6/2016 CS 572: Information Retrieval. Spring 2016 93
![Page 94: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/94.jpg)
Removing Evidence: Cloacking
4/6/2016 CS 572: Information Retrieval. Spring 2016 94
![Page 95: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/95.jpg)
How Many Likes Have to Remove?
4/6/2016 CS 572: Information Retrieval. Spring 2016 95
![Page 96: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/96.jpg)
Explanations in recommender systems: Summary
• There are many types of explanations and various goals that an explanation can achieve
• Which type of explanation can be generated depends greatly on the recommender approach applied
• Explanations may be used to shape the wishes and desires of customers but are a double-edged sword– On the one hand, explanations can help the customer to make
wise buying decisions;
– On the other hand, explanations can be abused to push a customer in a direction which is advantageous solely for the seller
• As a result a deep understanding of explanations and their effects on customers is of great interest.
![Page 97: CS 572: Information Retrievaleugene/cs572/lectures... · The Shawshank Redemption 4.593 137812 Lord of the Rings: The Return of the King 4.545 133597 The Green Mile 4.306 180883 Lord](https://reader033.fdocuments.in/reader033/viewer/2022042803/5f4cde3b22f39663cb5ee34a/html5/thumbnails/97.jpg)
Resources
• Recommender Systems Survey: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423975
• Koren et al. on winning the NetFlix challenge:
– http://videolectures.net/kdd09_koren_cftd/
– http://www.searchenginecaffe.com/2009/09/winning-
1m-netflix-prize-methods.html
• http://www.recommenderbook.net/
4/6/2016 CS 572: Information Retrieval. Spring 2016 97