Post on 17-Dec-2015
Filling Values for Unrated Items in Collaborative Filtering
Supervisor: Associate Prof. Jiuyong Li (John)
Student: Kang Sun
Date: 28th May 2010

Outline
• Introduction
• Motivations
• Related work
• Experiments
• Conclusion

Introduction
• Friends and neighbours were the main source of recommendations.
• Recommendations from friends:
a) the best café in the local area
b) the best book on a particular topic

Motivation
• Find a more reliable and accurate solution.
• A large database is supposed to help users get more accurate results; however, once recommendation moves online, similar users become hard to find.

Research question
How can a framework be built to improve prediction accuracy on recommendation data sets?

Dilemma
• Normally, the data stored in a large online recommendation database contains many unrated items.
• Unrated items can affect the recommendation result.

Related work
Sparse Matrix Prediction Filling in Collaborative Filtering [Liu et al. 2009b]
• Develops an approach to overcome the sparsity problem in user-based and item-based collaborative filtering
• Similarity computation based on the Boolean matrix

Related work
Effective Missing Data Prediction for Collaborative Filtering [Ma, King & Lyu 2007]
• Combines user information and item information to give better performance

Related work
A Hybrid User and Item-based Collaborative Filtering with Smoothing on Sparse Data [Rong & Yansheng 2006]
• A framework to alleviate the sparsity problem
• In their experiments, smoothing increased the quality of recommendations

Research data sets
• The whole Jester data set has 617,000 ratings of 100 jokes by 24,900 users. Ratings range from −10 to +10, and the whole rating matrix is about 25% filled.
• This research uses part 1 of the three parts: data from 24,983 users who have each rated 36 or more jokes, a matrix with dimensions 24,983 × 101.
• The data set is in .CSV format.
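
As a quick check of the "about 25%" figure, the fill rate can be recomputed from the counts quoted above. This is only a sketch in Python (the original work used R), and the variable names are illustrative.

```python
# Counts quoted on the slide for the whole Jester data set.
n_ratings = 617_000
n_jokes = 100
n_users = 24_900

# Fill rate = observed ratings / all possible (user, joke) cells.
density = n_ratings / (n_users * n_jokes)
print(f"fill rate = {density:.1%}")
```

The result is a little under one quarter, which agrees with the slide.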

Data pre-processing
• The Jester data sets use 99 to represent an unrated value.
• The first step is to change all the unrated values to 0.
• The second step is the most important part of this research: predicting the necessary unrated values for future prediction generation.
• All the data processing was done in R.
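
The first pre-processing step could look roughly like the sketch below. It is written in Python rather than the R used in the original work, and the function names `parse_ratings` and `zero_unrated` are hypothetical. The returned mask is an addition that is not on the slides: 0 also falls inside the −10 to +10 rating range, so it may help to remember which cells were really unrated.

```python
import csv
import io

UNRATED = 99.0  # sentinel the Jester files use for "not rated"

def parse_ratings(csv_text):
    """Parse CSV text into a list of rows of floats (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[float(v) for v in record] for record in reader if record]

def zero_unrated(rows):
    """Replace the 99 sentinel with 0.0 and report which cells held a rating."""
    cleaned = [[0.0 if v == UNRATED else v for v in row] for row in rows]
    rated_mask = [[v != UNRATED for v in row] for row in rows]
    return cleaned, rated_mask

# Two short rows built from values shown in the data set preview.
sample = "99,8.35,99,99,1.8\n-7.82,8.79,-9.66,-8.16,-7.52\n"
cleaned, rated_mask = zero_unrated(parse_ratings(sample))
print(cleaned[0])
print(rated_mask[0])
```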

Data set preview (99 marks an unrated value)
-7.82  8.79 -9.66 -8.16 -7.52 -8.5  -9.85  4.17 -8.98 -4.76
 4.08 -0.29  6.36  4.37 -2.38 -9.66 -0.73 -5.34  8.88  9.22
99    99    99    99     9.03  9.27  9.03  9.27 99    99
99     8.35 99    99     1.8   8.16 -2.82  6.21 99     1.84
 8.5   4.61 -4.17 -5.39  1.36  1.6   7.04  4.61 -0.44  5.73
-6.17 -3.54  0.44 -8.5  -7.09 -4.32 -8.69 -0.87 -6.65 -1.8
99    99    99    99     8.59 -9.85  7.72  8.79 99    99
 6.84  3.16  9.17 -6.21 -8.16 -1.7   9.27  1.41 -5.19 -4.42
-3.79 -3.54 -9.42 -6.89 -8.74 -0.29 -5.29 -8.93 -7.86 -1.6
 3.01  5.15  5.15  3.01  6.41  5.15  8.93  2.52  3.01  8.16
-2.91  4.08 99    99    -5.73 99     2.48 -5.29 99     1.46
 1.31  1.8   2.57 -2.38  0.73  0.73 -0.97  5    -7.23 -1.36
99    99    99    99     5.87 99     5.58  0.53 99     7.14
 9.22  9.27  9.22  8.3   7.43  0.44  3.5   8.16  5.97  8.98
 8.79 -5.78  6.02  3.69  7.77 -5.83  8.69  8.59 -5.92  7.52
-3.5   1.55  2.33 -4.13  4.22 -2.28 -2.96 -0.49  2.91  1.99
99    -9.27 99    99    -7.38 99     8.74 -6.31 99     2.33
 3.16  7.62  3.79  8.25  4.22  7.62  2.43  0.97  0.53  0.83
 4.22  3.64 99    99     2.52 99     4.13 -5.19 99     7.91
99     7.62 99    99    -8.64  2.43  8.93 -6.6  99    -9.47
 2.57 -0.73 99    99     2.57 99    -4.22  2.67 99    -1.31
 7.28  5.39 99    99    -4.22 99     8.93  3.5  99     6.12

Data processing approach

        Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2    0      0      1      3      4      3      0      0
User 3    0      0      0      4      5      4      0      0
User 4    0      0      2      3      3      2      0      0

The Manhattan distance measure is applied.

Data processing
• Distance between user 2 and user 3 is four.
• Distance between user 2 and user 4 is three.
• User 4 is therefore closer to user 2 than user 3 is.
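
The two distances can be reproduced with a few lines of Python (a sketch only; the original work used R, and the function name `manhattan` is illustrative). The rating vectors are copied from the table on this slide.

```python
def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute per-joke differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Ratings for Joke1..Joke8, with 0 standing for "unrated".
ratings = {
    "User 2": [0, 0, 1, 3, 4, 3, 0, 0],
    "User 3": [0, 0, 0, 4, 5, 4, 0, 0],
    "User 4": [0, 0, 2, 3, 3, 2, 0, 0],
}

d23 = manhattan(ratings["User 2"], ratings["User 3"])
d24 = manhattan(ratings["User 2"], ratings["User 4"])
print(d23, d24)  # 4 3
```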

Data processing approach (cont'd.)

        Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2    0      0      1      3      4      3      0      0
User 3    0      0      1      4      5      4      0      0
User 4    0      0      2      3      3      2      0      0

Data processing
• Distance between user 2 and user 3 is three.
• Distance between user 2 and user 4 is three.
• Users 3 and 4 are both at the same distance from user 2.
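
The sketch below shows how filling user 3's Joke3 value changes the neighbourhood of user 2. Averaging the Joke3 ratings of all users tied for nearest is an assumption: the slides do not state the prediction rule, but this reading reproduces the values 2 and 1.5 used in the accuracy figures that follow. The helper names are hypothetical, and Python is used here although the original work was done in R.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two rating vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbours(target, others):
    """Names of all users tied for the smallest distance to the target."""
    distances = {name: manhattan(target, row) for name, row in others.items()}
    best = min(distances.values())
    return sorted(name for name, d in distances.items() if d == best)

def predict(target, others, item):
    """Assumed rule: mean rating of the nearest neighbours for one item."""
    names = nearest_neighbours(target, others)
    return sum(others[name][item] for name in names) / len(names)

user2 = [0, 0, 1, 3, 4, 3, 0, 0]
JOKE3 = 2  # zero-based column index of Joke3

# First table: user 3 has not rated Joke3 (stored as 0).
unfilled = {"User 3": [0, 0, 0, 4, 5, 4, 0, 0], "User 4": [0, 0, 2, 3, 3, 2, 0, 0]}
# Second table: user 3's Joke3 has been filled with 1.
filled = {"User 3": [0, 0, 1, 4, 5, 4, 0, 0], "User 4": [0, 0, 2, 3, 3, 2, 0, 0]}

traditional_prediction = predict(user2, unfilled, JOKE3)  # only User 4 is nearest
filled_prediction = predict(user2, filled, JOKE3)         # User 3 and User 4 tie
print(traditional_prediction, filled_prediction)  # 2.0 1.5
```

As on the slides, the distance here includes the Joke3 column itself; a held-out evaluation would normally exclude the item being predicted.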

Measurement of accuracy
Relative squared error is used to compute the accuracy.
• Traditional CF accuracy for joke 3: Accuracy = 1 - (1 - 2)² / 1 = 0
• Current approach accuracy: Accuracy = 1 - (1 - 1.5)² / 1 = 0.75 = 75%
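
The two accuracy figures can be checked with the short Python sketch below (the original work used R). The function name `accuracy` is illustrative, and the denominator of 1 is taken directly from the slide. In the general relative squared error it would be the sum of squared deviations of the actual values from their mean, which is an assumption about the intended definition.

```python
def accuracy(actual, predicted, denominator=1.0):
    """1 minus the squared error scaled by the given denominator."""
    return 1.0 - (actual - predicted) ** 2 / denominator

# User 2's actual rating of Joke3 is 1.
traditional = accuracy(actual=1, predicted=2)    # traditional CF predicts 2
current = accuracy(actual=1, predicted=1.5)      # current approach predicts 1.5
print(traditional, current)  # 0.0 0.75
```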

User similarity comparison
[Figure: ratings (axis 0 to 6) of User 2, User 3 and User 4 across Joke 1 to Joke 8.]

Conclusion
• Heavy computational load
• Methods for both unrated values and missing values

References
Liu, Z, Wang, H, Qu, W, Liu, W & Fan, R 2009b, Sparse Matrix Prediction Filling in Collaborative Filtering, IEEE Computer Society, pp. 304-307.
Ma, H, King, I & Lyu, MR 2007, Effective missing data prediction for collaborative filtering, ACM, Amsterdam, The Netherlands, pp. 39-46.
Rong, H & Yansheng, L 2006, 'A Hybrid User and Item-Based Collaborative Filtering with Smoothing on Sparse Data', paper presented at the Artificial Reality and Telexistence--Workshops, 2006. ICAT '06. 16th International Conference on, Nov. 2006.