Supervisor: Associate Prof. Jiuyong Li (John). Student: Kang Sun. Date: 28th May 2010.

Filling Values for Unrated Items in Collaborative Filtering

Outline
• Introduction
• Motivations
• Related work
• Experiments
• Conclusion

Introduction
• Traditionally, friends and neighbours were the main source of recommendations.
• Typical recommendations from friends:
a) the best café in the local area
b) the best book on a particular topic

Motivation
• Find a more reliable and accurate solution.
• A large database is supposed to help users get more accurate results; however, once recommendation moves online, similar users become hard to find.

Research question
• How can a framework be built to improve prediction accuracy on recommendation data sets?

Dilemma
• Data stored in large online recommendation databases normally contains many unrated items.
• Unrated items can affect the recommendation results.

Related work
Sparse Matrix Prediction Filling in Collaborative Filtering [Liu et al. 2009b]
• Develops an approach to overcome the sparsity problem in user-based and item-based collaborative filtering
• Similarity computation based on the Boolean matrix

Related work
Effective Missing Data Prediction for Collaborative Filtering [Ma, King & Lyu 2007]
• Combines user information and item information to give better performance

Related work
A Hybrid User and Item-based Collaborative Filtering with Smoothing on Sparse Data [Rong & Yansheng 2006]
• A framework to alleviate the sparsity problem
• In their experiments, smoothing increased the quality of recommendations

Research data sets
• The whole Jester data set has 617,000 ratings of 100 jokes by 24,900 users. Ratings range from −10 to +10. The whole rating matrix is about 25% filled.
• This research uses part 1 of the three parts: data from 24,983 users who have rated 36 or more jokes, a matrix with dimensions 24983 × 101.
• The data set is in .CSV format.

Data pre-processing
• The Jester data sets use 99 to represent an unrated value.
• The first step is to change all unrated values to 0.
• The second step, the most important part of this research, is to predict the necessary unrated values for later prediction generation.
• All data processing was done in R.
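The first pre-processing step can be illustrated with a minimal sketch. The slides state that the actual work was done in R; the Python below is only an illustration, and the names `UNRATED` and `zero_unrated` are hypothetical. The two sample rows are taken from the data set preview.

```python
UNRATED = 99  # marker the Jester files use for "not rated", as stated above


def zero_unrated(rows):
    """Return a copy of the rating rows with every 99 replaced by 0."""
    return [[0 if value == UNRATED else value for value in row] for row in rows]


# Two rows copied from the data set preview (10 columns each).
sample = [
    [99, 99, 99, 99, 9.03, 9.27, 9.03, 9.27, 99, 99],
    [-2.91, 4.08, 99, 99, -5.73, 99, 2.48, -5.29, 99, 1.46],
]

cleaned = zero_unrated(sample)
print(cleaned[0])
print(cleaned[1])
```

Note that 0 also lies inside the −10 to +10 rating range, so after this step an unrated cell can no longer be told apart from a genuine rating of exactly 0.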

Data set preview (10 columns per row; 99 marks an unrated joke)

-7.82 8.79 -9.66 -8.16 -7.52 -8.5 -9.85 4.17 -8.98 -4.76
4.08 -0.29 6.36 4.37 -2.38 -9.66 -0.73 -5.34 8.88 9.22
99 99 99 99 9.03 9.27 9.03 9.27 99 99
99 8.35 99 99 1.8 8.16 -2.82 6.21 99 1.84
8.5 4.61 -4.17 -5.39 1.36 1.6 7.04 4.61 -0.44 5.73
-6.17 -3.54 0.44 -8.5 -7.09 -4.32 -8.69 -0.87 -6.65 -1.8
99 99 99 99 8.59 -9.85 7.72 8.79 99 99
6.84 3.16 9.17 -6.21 -8.16 -1.7 9.27 1.41 -5.19 -4.42
-3.79 -3.54 -9.42 -6.89 -8.74 -0.29 -5.29 -8.93 -7.86 -1.6
3.01 5.15 5.15 3.01 6.41 5.15 8.93 2.52 3.01 8.16
-2.91 4.08 99 99 -5.73 99 2.48 -5.29 99 1.46
1.31 1.8 2.57 -2.38 0.73 0.73 -0.97 5 -7.23 -1.36
99 99 99 99 5.87 99 5.58 0.53 99 7.14
9.22 9.27 9.22 8.3 7.43 0.44 3.5 8.16 5.97 8.98
8.79 -5.78 6.02 3.69 7.77 -5.83 8.69 8.59 -5.92 7.52
-3.5 1.55 2.33 -4.13 4.22 -2.28 -2.96 -0.49 2.91 1.99
99 -9.27 99 99 -7.38 99 8.74 -6.31 99 2.33
3.16 7.62 3.79 8.25 4.22 7.62 2.43 0.97 0.53 0.83
4.22 3.64 99 99 2.52 99 4.13 -5.19 99 7.91
99 7.62 99 99 -8.64 2.43 8.93 -6.6 99 -9.47
2.57 -0.73 99 99 2.57 99 -4.22 2.67 99 -1.31
7.28 5.39 99 99 -4.22 99 8.93 3.5 99 6.12

Data processing approach

        Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2    0      0      1      3      4      3      0      0
User 3    0      0      0      4      5      4      0      0
User 4    0      0      2      3      3      2      0      0

The Manhattan distance measure is applied.

Data processing
• The distance between user 2 and user 3 is four.
• The distance between user 2 and user 4 is three.
• User 4 is therefore closer to user 2 than user 3 is.
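The two distances quoted for this table can be checked with a short sketch. Python is used here only for illustration (the slides state the work was done in R), and the function name `manhattan` and the `ratings` dictionary are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute differences per joke."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Ratings for Joke1..Joke8, copied from the table above.
ratings = {
    "User 2": [0, 0, 1, 3, 4, 3, 0, 0],
    "User 3": [0, 0, 0, 4, 5, 4, 0, 0],
    "User 4": [0, 0, 2, 3, 3, 2, 0, 0],
}

d23 = manhattan(ratings["User 2"], ratings["User 3"])
d24 = manhattan(ratings["User 2"], ratings["User 4"])
print(d23)  # 4
print(d24)  # 3
```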

Data processing approach (cont'd.)

        Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2    0      0      1      3      4      3      0      0
User 3    0      0      1      4      5      4      0      0
User 4    0      0      2      3      3      2      0      0

Data processing (User 3's Joke3 value is now 1 instead of 0)
• The distance between user 2 and user 3 is three.
• The distance between user 2 and user 4 is three.
• Users 3 and 4 are now at the same distance from user 2.
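The slides do not spell out how a prediction for Joke3 is formed from the neighbours. One reading that reproduces the values 2 and 1.5 used in the accuracy slide is to average the Joke3 ratings of all users at the minimum Manhattan distance from user 2. The sketch below works under that assumption; it is written in Python for illustration only, and `nearest_neighbours`, `predict`, `before`, and `after` are hypothetical names.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two rating rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbours(target, others):
    """Names of all users at the minimum distance from target (ties kept)."""
    distances = {name: manhattan(target, row) for name, row in others.items()}
    best = min(distances.values())
    return [name for name, d in distances.items() if d == best]


def predict(target, others, item):
    """ASSUMPTION: predict by averaging the nearest neighbours' ratings."""
    names = nearest_neighbours(target, others)
    return sum(others[name][item] for name in names) / len(names)


JOKE3 = 2  # zero-based column index of Joke3
user2 = [0, 0, 1, 3, 4, 3, 0, 0]

# Neighbour rows before and after User 3's Joke3 value is filled in.
before = {"User 3": [0, 0, 0, 4, 5, 4, 0, 0], "User 4": [0, 0, 2, 3, 3, 2, 0, 0]}
after = {"User 3": [0, 0, 1, 4, 5, 4, 0, 0], "User 4": [0, 0, 2, 3, 3, 2, 0, 0]}

print(predict(user2, before, JOKE3))  # 2.0 (only User 4 is nearest)
print(predict(user2, after, JOKE3))   # 1.5 (User 3 and User 4 tie)
```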

Measurement of accuracy
• Relative squared error is used to compute the accuracy.
• Traditional CF accuracy for joke 3: Accuracy = 1 − (1 − 2)²/1 = 0
• Current approach accuracy for joke 3: Accuracy = 1 − (1 − 1.5)²/1 = 0.75 = 75%
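The arithmetic on this slide can be reproduced with a small sketch. The slide does not say what the denominator 1 stands for, so it is passed in explicitly rather than guessed; the function name `accuracy` and its parameters are hypothetical, and Python is used for illustration only.

```python
def accuracy(actual, predicted, denominator):
    """1 minus the squared error divided by the given denominator.

    The denominator is taken as-is from the slide (1 in both examples);
    its meaning is not stated in the source.
    """
    return 1 - (actual - predicted) ** 2 / denominator


traditional = accuracy(actual=1, predicted=2, denominator=1)
current = accuracy(actual=1, predicted=1.5, denominator=1)
print(traditional)  # 0.0
print(current)      # 0.75
```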

User similarity comparison
[Chart: ratings (vertical axis, 0 to 6) for Joke1 to Joke8 (horizontal axis); series: User 2, User 3, User 4]

Conclusion
• Heavy computational load
• Methods for both unrated values and missing values

References

Liu, Z, Wang, H, Qu, W, Liu, W & Fan, R 2009b, Sparse Matrix Prediction Filling in Collaborative Filtering, IEEE Computer Society, pp. 304-307.

Ma, H, King, I & Lyu, MR 2007, Effective missing data prediction for collaborative filtering, ACM, Amsterdam, The Netherlands, pp. 39-46.

Rong, H & Yansheng, L 2006, 'A Hybrid User and Item-Based Collaborative Filtering with Smoothing on Sparse Data', paper presented at the 16th International Conference on Artificial Reality and Telexistence Workshops (ICAT '06), Nov. 2006.
