Post on 26-Jan-2015
X. Amatriain et al., Rate It Again
Rate It Again: Increasing Recommendation Accuracy by User re-Rating
Xavier Amatriain (with J.M. Pujol, N. Tintarev, N. Oliver)
Telefonica Research
RecSys '09
The Recommender Problem
● Two ways to address it
1. Improve the Algorithm
2. Improve the Input Data
Time for Data Cleaning!
User Feedback is Noisy
● See our UMAP '09 Publication: “I like it... I like it not” (Amatriain et al. '09)
Natural Noise Limits our User Model
DID YOU HEAR WHAT I LIKE??!!
...and Our Prediction Accuracy
Experimental setup
● 118 participants rated movies in 3 trials: T1 (rand) <> 24 h <> T2 (pop.) <> 15 days <> T3 (rand)
● 100 Movies from Netflix dataset, stratified random sampling on popularity
● Ratings on a 1 to 5 star scale with special “not seen” symbol.
Users are Inconsistent
● What is the probability of an inconsistency given an original rating?
Mild ratings are noisier
Negative ratings are noisier
Prediction Accuracy
Trials    #Ti    #Tj    #∩     #∪     RMSE∩   RMSE∪
T1, T2    2185   1961   1838   2308   0.573   0.707
T1, T3    2185   1909   1774   2320   0.637   0.765
T2, T3    1969   1909   1730   2140   0.557   0.694
● Pairwise RMSE between trials considering intersection and union of both sets
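As a minimal sketch of the intersection case, the pairwise RMSE can be computed over only those (user, movie) pairs rated in both trials. The dictionary layout, the function name `rmse_intersection`, and the toy ratings are assumptions for illustration. "Not seen" answers are assumed to be simply absent, and the union variant is not shown because the slides do not say how unmatched ratings are scored.

```python
import math


def rmse_intersection(trial_a, trial_b):
    """RMSE between two trials over the (user, movie) pairs rated in both."""
    shared = trial_a.keys() & trial_b.keys()
    if not shared:
        raise ValueError("the two trials share no ratings")
    squared_error = sum((trial_a[k] - trial_b[k]) ** 2 for k in shared)
    return math.sqrt(squared_error / len(shared))


# Hypothetical toy data: {(user, movie): stars}; "not seen" pairs are omitted.
t1 = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5}
t2 = {("u1", "m1"): 3, ("u1", "m2"): 2, ("u2", "m3"): 1}

print(rmse_intersection(t1, t2))
```

Only the two pairs present in both toy trials contribute to the error here; the pairs rated in just one trial are ignored.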
Max error in trials that are most distant in time
Significantly less error when the 2nd trial is involved
Algorithm Robustness to Natural Noise (NN)
Alg./Trial       T1       T2       T3       Tworst/Tbest
User Average     1.2011   1.1469   1.1945   4.7%
Item Average     1.0555   1.0361   1.0776   4%
User-based kNN   0.9990   0.9640   1.0171   5.5%
Item-based kNN   1.0429   1.0031   1.0417   4%
SVD              1.0244   0.9861   1.0285   4.3%
● RMSE for different recommendation algorithms when predicting each of the trials
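The last column appears to be the percentage by which the worst (highest) per-trial RMSE exceeds the best (lowest) one. That reading is an assumption, but the short check below reproduces the reported percentages from the three RMSE columns. The helper name `worst_over_best` is illustrative.

```python
# RMSE per trial (T1, T2, T3), copied from the table above.
rmse = {
    "User Average": (1.2011, 1.1469, 1.1945),
    "Item Average": (1.0555, 1.0361, 1.0776),
    "User-based kNN": (0.9990, 0.9640, 1.0171),
    "Item-based kNN": (1.0429, 1.0031, 1.0417),
    "SVD": (1.0244, 0.9861, 1.0285),
}


def worst_over_best(values):
    """Percentage by which the highest RMSE exceeds the lowest RMSE."""
    return (max(values) / min(values) - 1.0) * 100.0


for name, values in rmse.items():
    print(f"{name}: {worst_over_best(values):.1f}%")
```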
Trial 2 is consistently the least noisy
Algorithm Robustness to Natural Noise (NN) (2)
Training-Testing Dataset   T1-T2    T1-T3    T2-T3
User Average               1.1585   1.2095   1.2036
Movie Average              1.0305   1.0648   1.0637
User-based kNN             0.9693   1.0143   1.0184
Item-based kNN             1.0009   1.0406   1.0590
SVD                        0.9741   1.0491   1.0118
● RMSE for different recommendation algorithms when predicting ratings in one trial (testing) from ratings on another (training)
Noise is minimized when we predict Trial 2
Let's recap
● Users are inconsistent
● Inconsistencies can depend on many things, including how the items are presented
● Inconsistencies produce natural noise
● Natural noise reduces our prediction accuracy independently of the algorithm
Hypothesis
● If we could somehow reduce natural noise due to user inconsistencies, we could greatly improve recommendation accuracy.
● We can reduce natural noise by taking advantage of user inconsistencies when rerating items.
Algorithm
● Given a rating dataset where (some) items have been rerated,
● Two fairness conditions:
1. Algorithm should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise)
2. Algorithm should not make up new ratings but decide which of the existing ones are valid.
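The two fairness conditions above can be illustrated with a minimal sketch of a one-source filter. This is not the authors' milding function, which is not reproduced in these slides. The agreement threshold `max_gap`, the function name `denoise_one_source`, and the toy ratings are all assumptions made for illustration.

```python
def denoise_one_source(original, rerating, max_gap=1):
    """Keep original ratings unless a rerating clearly contradicts them.

    Condition 1: a rating is dropped only when a rerating exists and differs
    by more than `max_gap` stars (a hypothetical threshold).
    Condition 2: every value returned is an existing original rating; nothing
    is averaged or invented.
    """
    kept = {}
    for key, stars in original.items():
        second = rerating.get(key)
        if second is None or abs(stars - second) <= max_gap:
            kept[key] = stars  # not rerated, or consistent with the rerating
        # else: contradicted by the rerating, treated as noise and dropped
    return kept


# Hypothetical toy data: {(user, movie): stars}
original = {("u1", "m1"): 5, ("u1", "m2"): 1, ("u1", "m3"): 3}
rerating = {("u1", "m1"): 4, ("u1", "m2"): 4}

print(denoise_one_source(original, rerating))
```

In the toy data, the first rating is confirmed by a close rerating, the second is contradicted and removed, and the third is kept because it was never rerated.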
Algorithm
● One-source rerating case:
● Given the following milding function:
Results
● One-source rerating (Denoised ⊚ Denoising)
Datasets         T1⊚T2    ΔT1     T1⊚T3    ΔT1     T2⊚T3    ΔT2
User-based kNN   0.8861   11.3%   0.8960   10.3%   0.8984   6.8%
SVD              0.9121   11.0%   0.9274   9.5%    0.9159   7.1%
● Two-source rerating (Denoising T1 with the other 2)
Datasets         T1⊚(T2, T3)   ΔT1
User-based kNN   0.8647        13.4%
SVD              0.8800        14.1%
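The Δ columns are consistent with a relative RMSE improvement over the same algorithm's RMSE on the undenoised trial, as reported in the earlier per-trial robustness table. That interpretation is an assumption, and the function name `relative_improvement` is illustrative. The sketch below recomputes the reported percentages from those baselines.

```python
def relative_improvement(baseline_rmse, denoised_rmse):
    """Relative RMSE reduction, in percent, with respect to the baseline."""
    return (baseline_rmse - denoised_rmse) / baseline_rmse * 100.0


# Baseline RMSE on the undenoised trials (from the per-trial robustness table).
baseline = {
    ("User-based kNN", "T1"): 0.9990,
    ("User-based kNN", "T2"): 0.9640,
    ("SVD", "T1"): 1.0244,
    ("SVD", "T2"): 0.9861,
}

# (algorithm, denoised trial, denoised RMSE, reported delta in percent)
results = [
    ("User-based kNN", "T1", 0.8861, 11.3),  # T1 ⊚ T2
    ("User-based kNN", "T1", 0.8960, 10.3),  # T1 ⊚ T3
    ("User-based kNN", "T2", 0.8984, 6.8),   # T2 ⊚ T3
    ("SVD", "T1", 0.9121, 11.0),             # T1 ⊚ T2
    ("SVD", "T1", 0.9274, 9.5),              # T1 ⊚ T3
    ("SVD", "T2", 0.9159, 7.1),              # T2 ⊚ T3
    ("User-based kNN", "T1", 0.8647, 13.4),  # T1 ⊚ (T2, T3)
    ("SVD", "T1", 0.8800, 14.1),             # T1 ⊚ (T2, T3)
]

for algorithm, trial, denoised, reported in results:
    delta = relative_improvement(baseline[(algorithm, trial)], denoised)
    print(f"{algorithm} on {trial}: {delta:.1f}% (reported {reported}%)")
```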
Best results (above 10%!) when denoising a noisy trial with a less noisy one
Smaller (yet important) improvement when denoising the less noisy set
Improvements up to 14% with 2 reratings!
But...
● We can't expect all users to rerate all items once or twice to improve accuracy!
● Need to devise methods to selectively choose which ratings to denoise:
– Random selection
– Data-dependent (select ratings based on values)
– User-dependent (select ratings based on how “noisy” user is)
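The three selection strategies listed above could be sketched as simple filters over the rating set. All names here are hypothetical. Treating 1 and 5 stars as "extreme" is an assumption, and the set of noisy users is taken as given because the slides do not state how user noisiness is measured.

```python
import random


def select_random(ratings, fraction, seed=0):
    """Pick a random fraction of the ratings as rerating candidates."""
    keys = sorted(ratings)
    count = round(fraction * len(keys))
    return set(random.Random(seed).sample(keys, count))


def select_extreme(ratings, extremes=(1, 5)):
    """Data-dependent: pick ratings whose value is extreme (assumed 1 or 5)."""
    return {key for key, stars in ratings.items() if stars in extremes}


def select_noisy_users(ratings, noisy_users):
    """User-dependent: pick ratings given by users already flagged as noisy."""
    return {key for key in ratings if key[0] in noisy_users}


# Hypothetical toy data: {(user, movie): stars}
ratings = {("u1", "m1"): 5, ("u1", "m2"): 3, ("u2", "m1"): 1, ("u2", "m2"): 4}

print(select_random(ratings, 0.5))
print(select_extreme(ratings))
print(select_noisy_users(ratings, {"u2"}))
```

The filters can be combined by set intersection, for example extreme ratings from noisy users only.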
Random rerating
● Improvement in RMSE when doing one-source (left) and two-source (right) rerating as a function of the percentage of randomly selected denoised ratings (T1⊚T3)
Denoise Extreme Ratings
● Improvement in RMSE when doing one-source (left) and two-source (right) rerating as a function of the percentage of denoised ratings: selecting only extreme ratings
Denoise outliers
● Improvement in RMSE when doing one-source (left) and two-source (right) rerating as a function of the percentage of denoised ratings and users: selecting only noisy users and extreme ratings
Value of Rating
● Is it better to add new ratings or to rerate existing items? RMSE improvement as a function of new ratings added in each case.
An extreme rerating improves RMSE 10 times more than adding a new rating!
Conclusions
● Improving data can be more beneficial than improving the algorithm
● Natural noise limits the accuracy of Recommender Systems
● We can reduce natural noise by asking users to rerate items
● There are strategies to minimize the impact of the rerating process
● The value of a rerate may be higher than that of a new rating
Rate It Again: Increasing Recommendation Accuracy by User re-Rating
Thanks!