Rate it Again: Increasing Recommendation Accuracy by User Re-Rating

Xavier Amatriain (with J.M. Pujol, N. Tintarev, N. Oliver)

Telefonica Research, RecSys '09


The Recommender Problem

● Two ways to address it

1. Improve the Algorithm

2. Improve the Input Data

Time for Data Cleaning!


User Feedback is Noisy

● See our UMAP '09 Publication: “I like it... I like it not” (Amatriain et al. '09)


Natural Noise Limits our User Model

DID YOU HEAR WHAT I LIKE??!!

...and Our Prediction Accuracy


Experimental Setup

● 118 participants rated movies in 3 trials:

T1 (random) <> 24 h <> T2 (popularity) <> 15 days <> T3 (random)

● 100 movies from the Netflix dataset, chosen by stratified random sampling on popularity

● Ratings on a 1 to 5 star scale, with a special “not seen” symbol
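The slide does not say how the popularity strata were built. The sketch below assumes one simple scheme: rank movies by a popularity score (for example, their number of ratings), cut the ranking into equal-size strata, and draw the same number of movies from each stratum. The names `stratified_sample` and `n_strata` and the fixed seed are hypothetical.

```python
import random


def stratified_sample(popularity, n, n_strata=10, seed=0):
    """Draw n items spread evenly over popularity strata.

    Hypothetical helper: `popularity` maps item -> popularity score
    (e.g. its number of ratings). Strata are equal-size slices of the
    popularity ranking, and each one contributes n // n_strata items.
    """
    if n % n_strata != 0:
        raise ValueError("n must be a multiple of n_strata")
    per_stratum = n // n_strata
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    size = len(ranked) / n_strata
    rng = random.Random(seed)
    sample = []
    for s in range(n_strata):
        stratum = ranked[round(s * size):round((s + 1) * size)]
        if len(stratum) < per_stratum:
            raise ValueError("stratum too small for the requested sample")
        sample.extend(rng.sample(stratum, per_stratum))
    return sample


# Usage: 100 movies drawn from a toy catalogue of 1,000 titles.
catalogue = {f"movie{i}": i for i in range(1000)}  # popularity = index
chosen = stratified_sample(catalogue, 100)
```

Equal allocation guarantees that both rarely rated and blockbuster titles are represented. A simple random draw would instead be dominated by the long tail.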


Users are Inconsistent

● What is the probability of making an inconsistency, given an original rating?

● What is the percentage of inconsistencies, given an original rating?

Mild ratings are noisier

Negative ratings are noisier
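Both questions can be answered from two trials keyed by (user, item). The sketch below reads the first question as P(re-rating ≠ original | original = r). It reads the second, under one plausible interpretation, as the share of all inconsistencies whose original rating was r. The function name and the dict-based data layout are assumptions.

```python
from collections import Counter


def inconsistency_by_rating(first, second):
    """Return (P(change | original r), share of all changes starting at r).

    `first` and `second` map (user, item) -> star rating for two trials.
    Only pairs rated in both trials are compared.
    """
    totals, changed = Counter(), Counter()
    for key, original in first.items():
        if key not in second:
            continue  # not re-rated, so there is no (in)consistency to measure
        totals[original] += 1
        if second[key] != original:
            changed[original] += 1
    probability = {r: changed[r] / totals[r] for r in sorted(totals)}
    n_changed = sum(changed.values())
    share = {r: changed[r] / n_changed for r in sorted(totals)} if n_changed else {}
    return probability, share


t1 = {("u1", "m1"): 1, ("u1", "m2"): 1, ("u2", "m1"): 3, ("u2", "m2"): 3, ("u3", "m1"): 5}
t2 = {("u1", "m1"): 1, ("u1", "m2"): 2, ("u2", "m1"): 4, ("u2", "m2"): 2, ("u3", "m1"): 5}
prob, share = inconsistency_by_rating(t1, t2)
```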


Prediction Accuracy

Trials   #Ti    #Tj    #(Ti∩Tj)  #(Ti∪Tj)  RMSE (∩)  RMSE (∪)
T1, T2   2185   1961   1838      2308      0.573     0.707
T1, T3   2185   1909   1774      2320      0.637     0.765
T2, T3   1969   1909   1730      2140      0.557     0.694

● Pairwise RMSE between trials, considering the intersection and the union of both sets

Maximum error between the trials that are most distant in time (T1, T3)

Significantly less error when the 2nd trial is involved
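Below is a minimal sketch of the intersection RMSE in the table. It assumes each trial is a dict from (user, item) to stars. The slides do not say how ratings present in only one trial enter the union RMSE, so the sketch reports the union size but computes RMSE on the intersection only. `pairwise_rmse` is a hypothetical name.

```python
import math


def pairwise_rmse(trial_a, trial_b):
    """RMSE between two trials over the (user, item) pairs rated in both.

    Returns (rmse_on_intersection, intersection_size, union_size).
    """
    common = trial_a.keys() & trial_b.keys()
    if not common:
        raise ValueError("the two trials share no (user, item) ratings")
    squared = sum((trial_a[key] - trial_b[key]) ** 2 for key in common)
    union_size = len(trial_a.keys() | trial_b.keys())
    return math.sqrt(squared / len(common)), len(common), union_size


t1 = {("u1", "m1"): 4, ("u1", "m2"): 3, ("u2", "m1"): 5}
t2 = {("u1", "m1"): 5, ("u1", "m2"): 3, ("u2", "m3"): 2}
rmse, n_common, n_union = pairwise_rmse(t1, t2)
```

The union size follows |Ti ∪ Tj| = |Ti| + |Tj| − |Ti ∩ Tj|, which matches the T1, T2 row: 2185 + 1961 − 1838 = 2308.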


Algorithm Robustness to Natural Noise (NN)

Alg./Trial       T1      T2      T3      Tworst/Tbest
User Average     1.2011  1.1469  1.1945  4.7%
Item Average     1.0555  1.0361  1.0776  4%
User-based kNN   0.9990  0.9640  1.0171  5.5%
Item-based kNN   1.0429  1.0031  1.0417  4%
SVD              1.0244  0.9861  1.0285  4.3%

● RMSE for different recommendation algorithms when predicting each of the trials; the last column is the relative RMSE increase of the worst trial over the best

Trial 2 is consistently the least noisy
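The last column can be recomputed from the three RMSE values in each row as worst/best − 1. The sketch below does this with the table's numbers and also reports which trial gives the lowest RMSE. The function names are illustrative.

```python
# RMSE when predicting trials (T1, T2, T3), copied from the table above.
RMSE = {
    "User Average": (1.2011, 1.1469, 1.1945),
    "Item Average": (1.0555, 1.0361, 1.0776),
    "User-based kNN": (0.9990, 0.9640, 1.0171),
    "Item-based kNN": (1.0429, 1.0031, 1.0417),
    "SVD": (1.0244, 0.9861, 1.0285),
}


def worst_over_best(rmse_by_trial):
    """Relative RMSE increase (percent) of the worst trial over the best."""
    return 100.0 * (max(rmse_by_trial) / min(rmse_by_trial) - 1.0)


def best_trial(rmse_by_trial):
    """Name of the trial with the lowest RMSE, e.g. 'T2'."""
    return f"T{rmse_by_trial.index(min(rmse_by_trial)) + 1}"


for algorithm, values in RMSE.items():
    print(f"{algorithm:15s} {worst_over_best(values):4.1f}%  best={best_trial(values)}")
```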


Algorithm Robustness to NN (2)

Training-Testing Dataset   T1-T2   T1-T3   T2-T3
User Average               1.1585  1.2095  1.2036
Movie Average              1.0305  1.0648  1.0637
User-based kNN             0.9693  1.0143  1.0184
Item-based kNN             1.0009  1.0406  1.0590
SVD                        0.9741  1.0491  1.0118

● RMSE for different recommendation algorithms when predicting ratings in one trial (testing) from ratings in another (training)

Noise is minimized when we predict Trial 2
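The simplest row of this table, the movie (item) average, can be sketched as follows: fit per-item means on the training trial, then score them on the testing trial. The global-mean fallback for items never seen in training and the function names are assumptions, not details from the slides.

```python
import math


def fit_item_average(train):
    """Per-item mean rating from `train` ((user, item) -> stars)."""
    sums, counts = {}, {}
    for (_, item), rating in train.items():
        sums[item] = sums.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1
    global_mean = sum(train.values()) / len(train)

    def predict(item):
        # Assumed fallback: items never rated in training get the global mean.
        return sums[item] / counts[item] if item in counts else global_mean

    return predict


def cross_trial_rmse(train, test):
    """Train on one trial, then report RMSE on every rating of another trial."""
    predict = fit_item_average(train)
    squared = [(predict(item) - rating) ** 2 for (_, item), rating in test.items()]
    return math.sqrt(sum(squared) / len(squared))


t1 = {("u1", "m1"): 4, ("u2", "m1"): 2, ("u1", "m2"): 5}
t2 = {("u3", "m1"): 4, ("u3", "m2"): 5, ("u3", "m3"): 3}
rmse = cross_trial_rmse(t1, t2)
```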


Let's recap

● Users are inconsistent

● Inconsistencies can depend on many things, including how the items are presented

● Inconsistencies produce natural noise

● Natural noise reduces our prediction accuracy, independently of the algorithm


Hypothesis

● If we can somehow reduce the natural noise due to user inconsistencies, we could greatly improve recommendation accuracy.

● We can reduce natural noise by taking advantage of user inconsistencies when re-rating items.


Algorithm

● Given a rating dataset where (some) items have been re-rated,

● Two fairness conditions:

1. The algorithm should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise)

2. The algorithm should not make up new ratings, but decide which of the existing ones are valid.


Algorithm

● One-source re-rating case:

● Given the following milding function:
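The milding function itself appeared as a formula on the original slide and is not reproduced here. The sketch below only illustrates the two fairness conditions for one-source re-rating, under loudly stated assumptions. A hypothetical `mild` rule keeps whichever existing rating is closer to the scale midpoint (3). A hypothetical `max_gap` threshold marks a rating pair as pure noise, to be dropped, when the two ratings differ by more than that amount. Neither rule is taken from the slides.

```python
MIDPOINT = 3  # center of the 1-5 star scale


def mild(r1, r2):
    """Hypothetical milding rule (NOT the slide's formula): keep whichever
    existing rating is closer to the midpoint; ties keep r1."""
    return r1 if abs(r1 - MIDPOINT) <= abs(r2 - MIDPOINT) else r2


def denoise(target, source, max_gap=2):
    """One-source re-rating sketch: clean `target` using re-ratings in `source`.

    Both map (user, item) -> stars. `max_gap` is a hypothetical threshold.
    """
    cleaned = {}
    for key, rating in target.items():
        if key not in source:
            cleaned[key] = rating  # never re-rated: no evidence of noise, keep it
            continue
        rerating = source[key]
        if abs(rating - rerating) > max_gap:
            continue  # assumed rule: a large disagreement is treated as pure noise
        # Fairness condition 2: choose between existing values, never invent one.
        cleaned[key] = mild(rating, rerating)
    return cleaned


t1 = {("u", "a"): 5, ("u", "b"): 2, ("u", "c"): 1, ("u", "d"): 4}
t2 = {("u", "a"): 4, ("u", "b"): 2, ("u", "c"): 5}
t1_denoised = denoise(t1, t2)
```

The sketch keeps ratings that were never re-rated and drops only large disagreements. This is one way to honor fairness condition 1 (remove as few ratings as possible).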


Results

● One-source re-rating (Denoised ⊚ Denoising)

                 T1⊚T2   ΔT1     T1⊚T3   ΔT1     T2⊚T3   ΔT2
User-based kNN   0.8861  11.3%   0.8960  10.3%   0.8984  6.8%
SVD              0.9121  11.0%   0.9274  9.5%    0.9159  7.1%

● Two-source re-rating (Denoising T1 with the other 2)

Datasets         T1⊚(T2, T3)   ΔT1
User-based kNN   0.8647        13.4%
SVD              0.8800        14.1%

● ΔTi: relative RMSE improvement over the same algorithm on the original (non-denoised) Ti

Best results (above 10%!) when denoising a noisy trial with a less noisy one

Smaller (yet still important) improvement when denoising the less noisy set

Improvements up to 14% with 2 re-ratings!
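As a check on the Δ columns, the sketch below recomputes them from two values per row: the RMSE of each algorithm on the original trial (from the robustness table) and its RMSE after denoising. Measuring ΔTi against the original Ti is our reading of the slide; it reproduces every percentage shown. `improvement` is a hypothetical helper.

```python
def improvement(before, after):
    """Relative RMSE reduction, in percent, from `before` to `after`."""
    return 100.0 * (1.0 - after / before)


# (RMSE on original trial, RMSE after denoising, Δ reported on the slide)
checks = {
    "User-based kNN, T1⊚T2": (0.9990, 0.8861, 11.3),
    "User-based kNN, T1⊚T3": (0.9990, 0.8960, 10.3),
    "User-based kNN, T2⊚T3": (0.9640, 0.8984, 6.8),
    "SVD, T1⊚T2": (1.0244, 0.9121, 11.0),
    "SVD, T1⊚T3": (1.0244, 0.9274, 9.5),
    "SVD, T2⊚T3": (0.9861, 0.9159, 7.1),
    "User-based kNN, T1⊚(T2, T3)": (0.9990, 0.8647, 13.4),
    "SVD, T1⊚(T2, T3)": (1.0244, 0.8800, 14.1),
}
for name, (before, after, reported) in checks.items():
    print(f"{name:30s} {improvement(before, after):5.1f}%  (slide: {reported}%)")
```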


But...

● We can't expect all users to re-rate all items once or twice to improve accuracy!

● Need to devise methods to selectively choose which ratings to denoise:

– Random selection

– Data-dependent (select ratings based on their values)

– User-dependent (select ratings based on how “noisy” the user is)
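The three selection strategies can be sketched as filters that return the (user, item) keys to ask users to re-rate. Several details are assumptions for illustration: treating 1 and 5 as the “extreme” values, the per-user noise score (for instance, a user's inconsistency rate in an earlier re-rating round), and all function names.

```python
import random

EXTREMES = (1, 5)  # assumed "extreme" values on the 1-5 star scale


def select_random(ratings, fraction, seed=0):
    """Random selection: a fixed fraction of all (user, item) keys."""
    keys = sorted(ratings)
    k = round(fraction * len(keys))
    return set(random.Random(seed).sample(keys, k))


def select_extreme(ratings):
    """Data-dependent selection: only ratings at either end of the scale."""
    return {key for key, stars in ratings.items() if stars in EXTREMES}


def select_noisy_users(ratings, user_noise, threshold):
    """User-dependent selection: extreme ratings from users whose
    (hypothetical) noise score is at or above `threshold`."""
    return {
        (user, item)
        for (user, item), stars in ratings.items()
        if stars in EXTREMES and user_noise.get(user, 0.0) >= threshold
    }


ratings = {("u1", "m1"): 5, ("u1", "m2"): 3, ("u2", "m1"): 1, ("u2", "m3"): 4}
noise = {"u1": 0.4, "u2": 0.1}  # e.g. fraction of inconsistent re-ratings
```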


Random Re-rating

● Improvement in RMSE when doing one-source (left) and two-source (right) re-rating as a function of the percentage of randomly selected denoised ratings (T1⊚T3)


Denoise Extreme Ratings

● Improvement in RMSE when doing one-source (left) and two-source (right) re-rating as a function of the percentage of denoised ratings, selecting only extreme ratings


Denoise Outliers

● Improvement in RMSE when doing one-source (left) and two-source (right) re-rating as a function of the percentage of denoised ratings and users, selecting only noisy users and extreme ratings


Value of Rating

● Is it worth adding new ratings or re-rating existing items? RMSE improvement as a function of the number of new ratings added in each case.

An extreme re-rating improves RMSE 10 times more than adding a new rating!


Conclusions

● Improving data can be more beneficial than improving the algorithm

● Natural noise limits the accuracy of Recommender Systems

● We can reduce natural noise by asking users to re-rate items

● There are strategies to minimize the impact of the re-rating process

● The value of a re-rating may be higher than that of a new rating


Rate it Again: Increasing Recommendation Accuracy by User Re-Rating

Thanks!
