Supervisor: Associate Prof. Jiuyong Li (John) Student: Kang Sun Date: 28th May 2010.

Filling value to unrated items in Collaborative Filtering

Supervisor: Associate Prof. Jiuyong Li (John)

Student: Kang Sun

Date: 28th May 2010

Outline

• Introduction
• Motivations
• Related work
• Experiments
• Conclusion

Introduction

• Friends and neighbours were the main source of recommendations
• Recommendations from friends:
  a) the best café in the local area
  b) the best book on a particular topic

Motivation

• Find a more reliable and accurate solution
• A large database is supposed to help users get more accurate results; however, once recommendation moves online, similar users become hard to find

Research question

How can a framework be built to improve prediction accuracy on recommendation data sets?

Dilemma

• Normally, the data stored in a large online recommendation database contains many unrated items
• Unrated items can affect the recommendation results

Related work

Sparse Matrix Prediction Filling in Collaborative Filtering [Liu et al. 2009b]

• Develops an approach to overcome the sparsity problem in user-based and item-based collaborative filtering
• Similarity computation is based on the Boolean matrix

Related work

Effective Missing Data Prediction for Collaborative Filtering [Ma, King & Lyu 2007]

• Combines user information and item information to give better performance

Related work

A Hybrid User and Item-based Collaborative Filtering with Smoothing on Sparse Data [Rong & Yansheng 2006]

• A framework to alleviate the sparsity problem
• In their experiments, smoothing increased the quality of recommendations

Research data sets

• The whole Jester data set has 617,000 ratings of 100 jokes by 24,900 users. Ratings range from −10 to +10. The whole rating matrix is about 25% filled.
• This research uses part 1 of the three parts: data from 24,983 users who have rated 36 or more jokes, a matrix with dimensions 24,983 × 101.
• The data set is in .CSV format.

Data pre-processing

• The Jester data sets use 99 to represent an unrated value
• The first step is to change all the unrated values to 0
• The second step, the most important part of this research, is to predict the unrated values needed for later prediction generation
• All data processing was done in R
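The first pre-processing step can be sketched as follows. The original work used R; this is only an illustrative Python sketch, and the names `UNRATED` and `zero_unrated` are hypothetical, not taken from the slides. The two sample rows are copied from the data set preview.

```python
UNRATED = 99  # sentinel the slides say Jester uses for an unrated joke


def zero_unrated(rows, sentinel=UNRATED):
    """Return a copy of the rating rows with the sentinel replaced by 0."""
    return [[0.0 if value == sentinel else value for value in row] for row in rows]


# Two rows taken from the data set preview in these slides.
sample = [
    [99, 99, 99, 99, 9.03, 9.27, 9.03, 9.27, 99, 99],
    [99, 8.35, 99, 99, 1.8, 8.16, -2.82, 6.21, 99, 1.84],
]
cleaned = zero_unrated(sample)
print(cleaned[0])
```

Since 0 is also a legitimate rating on the −10 to +10 scale, zero-filling makes an unrated joke look like a neutral one, which is one reason the second step (predicting the unrated values) matters.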

Data set preview

-7.82 8.79 -9.66 -8.16 -7.52 -8.5 -9.85 4.17 -8.98 -4.76
4.08 -0.29 6.36 4.37 -2.38 -9.66 -0.73 -5.34 8.88 9.22
99 99 99 99 9.03 9.27 9.03 9.27 99 99
99 8.35 99 99 1.8 8.16 -2.82 6.21 99 1.84
8.5 4.61 -4.17 -5.39 1.36 1.6 7.04 4.61 -0.44 5.73
-6.17 -3.54 0.44 -8.5 -7.09 -4.32 -8.69 -0.87 -6.65 -1.8
99 99 99 99 8.59 -9.85 7.72 8.79 99 99
6.84 3.16 9.17 -6.21 -8.16 -1.7 9.27 1.41 -5.19 -4.42
-3.79 -3.54 -9.42 -6.89 -8.74 -0.29 -5.29 -8.93 -7.86 -1.6
3.01 5.15 5.15 3.01 6.41 5.15 8.93 2.52 3.01 8.16
-2.91 4.08 99 99 -5.73 99 2.48 -5.29 99 1.46
1.31 1.8 2.57 -2.38 0.73 0.73 -0.97 5 -7.23 -1.36
99 99 99 99 5.87 99 5.58 0.53 99 7.14
9.22 9.27 9.22 8.3 7.43 0.44 3.5 8.16 5.97 8.98
8.79 -5.78 6.02 3.69 7.77 -5.83 8.69 8.59 -5.92 7.52
-3.5 1.55 2.33 -4.13 4.22 -2.28 -2.96 -0.49 2.91 1.99
99 -9.27 99 99 -7.38 99 8.74 -6.31 99 2.33
3.16 7.62 3.79 8.25 4.22 7.62 2.43 0.97 0.53 0.83
4.22 3.64 99 99 2.52 99 4.13 -5.19 99 7.91
99 7.62 99 99 -8.64 2.43 8.93 -6.6 99 -9.47
2.57 -0.73 99 99 2.57 99 -4.22 2.67 99 -1.31
7.28 5.39 99 99 -4.22 99 8.93 3.5 99 6.12

Data processing approach

User    Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2  0      0      1      3      4      3      0      0
User 3  0      0      0      4      5      4      0      0
User 4  0      0      2      3      3      2      0      0

The Manhattan distance measure is applied.

Data processing

• The distance between user 2 and user 3 is four
• The distance between user 2 and user 4 is three
• User 4 therefore seems closer to user 2 than user 3 does
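The two distances can be checked with a short sketch of the Manhattan distance (the sum of absolute rating differences). Python is used here for illustration only, since the original work used R; the function name `manhattan` is an assumption, and the rating vectors are copied from the Joke1 to Joke8 table, where 0 marks an unrated joke.

```python
def manhattan(a, b):
    """Sum of absolute differences between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("rating vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Ratings for Joke1..Joke8 from the slide table (0 = unrated).
user2 = [0, 0, 1, 3, 4, 3, 0, 0]
user3 = [0, 0, 0, 4, 5, 4, 0, 0]
user4 = [0, 0, 2, 3, 3, 2, 0, 0]

d23 = manhattan(user2, user3)
d24 = manhattan(user2, user4)
print(d23, d24)  # 4 3
```

User 3's unrated Joke3, stored as 0, contributes one unit to the distance from user 2, which is what makes user 3 look further away than user 4.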

Data processing approach (cont'd)

User    Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2  0      0      1      3      4      3      0      0
User 3  0      0      1      4      5      4      0      0
User 4  0      0      2      3      3      2      0      0

Data processing

• The distance between user 2 and user 3 is three
• The distance between user 2 and user 4 is three
• Users 3 and 4 are both at the same distance from user 2
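The slides do not spell out how a prediction is formed from the neighbours. The sketch below is one plausible reading, stated here as an assumption: take every neighbour tied at the smallest Manhattan distance and average their ratings for the target joke. It was chosen because it reproduces the predicted values 2 and 1.5 that appear in the accuracy slide. The helper `predict_from_nearest` is hypothetical, and distances are computed over all eight jokes, as on the slides.

```python
def manhattan(a, b):
    """Sum of absolute differences between two rating vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_from_nearest(target, neighbours, item):
    """Average the item rating over all neighbours tied at the smallest distance.

    Assumption: this tie-averaging rule is not stated in the slides.
    """
    distances = {name: manhattan(target, ratings) for name, ratings in neighbours.items()}
    best = min(distances.values())
    nearest = [name for name, d in distances.items() if d == best]
    prediction = sum(neighbours[name][item] for name in nearest) / len(nearest)
    return nearest, prediction


user2 = [0, 0, 1, 3, 4, 3, 0, 0]
# Before filling: user 3 has not rated Joke3 (stored as 0).
before = {"user3": [0, 0, 0, 4, 5, 4, 0, 0], "user4": [0, 0, 2, 3, 3, 2, 0, 0]}
# After filling: user 3's Joke3 has been filled with 1.
after = {"user3": [0, 0, 1, 4, 5, 4, 0, 0], "user4": [0, 0, 2, 3, 3, 2, 0, 0]}

JOKE3 = 2  # zero-based index of Joke3
result_before = predict_from_nearest(user2, before, JOKE3)
result_after = predict_from_nearest(user2, after, JOKE3)
print(result_before)  # (['user4'], 2.0)
print(result_after)   # (['user3', 'user4'], 1.5)
```

Under this reading, filling user 3's missing rating brings user 3 into the neighbourhood, and the prediction for user 2's Joke3 moves from 2 to 1.5.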

Measurement of accuracy

Relative squared error is used to compute the accuracy.

Traditional CF accuracy for joke 3:
Accuracy = 1 - (1 - 2)² / 1 = 0%

Current approach accuracy:
Accuracy = 1 - (1 - 1.5)² / 1 = 75%
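The two figures can be reproduced with a small sketch. The slide does not say what the denominator 1 stands for, so it is exposed here as a hypothetical `scale` parameter defaulting to 1; the function name `accuracy` is likewise an assumption. The actual rating is 1, and the predictions are 2 (traditional CF) and 1.5 (current approach).

```python
def accuracy(actual, predicted, scale=1.0):
    """1 minus the squared error divided by a normalising scale.

    Assumption: `scale` stands in for the unexplained denominator on the slide.
    """
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return 1 - (actual - predicted) ** 2 / scale


traditional = accuracy(actual=1, predicted=2)
current = accuracy(actual=1, predicted=1.5)
print(traditional, current)  # 0.0 0.75
```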

User similarity comparison

[Figure: ratings (vertical axis, 0 to 6) of User 2, User 3 and User 4 for Joke1 to Joke8]

Conclusion

• Heavy computational load
• Methods for both unrated values and missing values

References

Liu, Z, Wang, H, Qu, W, Liu, W & Fan, R 2009b, Sparse Matrix Prediction Filling in Collaborative Filtering, IEEE Computer Society, pp. 304-307.

Ma, H, King, I & Lyu, MR 2007, Effective missing data prediction for collaborative filtering, ACM, Amsterdam, The Netherlands, pp. 39-46.

Rong, H & Yansheng, L 2006, 'A Hybrid User and Item-Based Collaborative Filtering with Smoothing on Sparse Data', paper presented at the Artificial Reality and Telexistence--Workshops, 2006. ICAT '06. 16th International Conference on, Nov. 2006.