Supervisor: Associate Prof. Jiuyong Li(John) Student: Kang Sun Date: 28 th May 2010.
Filling Values for Unrated Items in Collaborative Filtering
Supervisor: Associate Prof. Jiuyong Li (John)
Student: Kang Sun
Date: 28th May 2010
Outline
• Introduction
• Motivations
• Related work
• Experiments
• Conclusion
Introduction
• Friends and neighbours were traditionally the main source of recommendations
• Examples of recommendations from friends:
a) the best café in the local area
b) the best book on a particular topic
Motivation
• Find a more reliable and accurate solution
• A large database is supposed to help users get more accurate results; however, once recommendation moves online, similar users become hard to find
Research question
• How can a framework be built to improve prediction accuracy on recommendation data sets?
Dilemma
• The data stored in a large online recommendation database normally contains many unrated items
• Unrated items can affect the result of a recommendation
Related work
Sparse Matrix Prediction Filling in Collaborative Filtering [Liu et al. 2009b]
• Develops an approach to overcome the sparsity problem in user-based and item-based collaborative filtering
• Similarity computation is based on the Boolean matrix
Related work
Effective Missing Data Prediction for Collaborative Filtering [Ma, King & Lyu 2007]
• Combines user information and item information to give better performance
Related work
A Hybrid User and Item-based Collaborative Filtering with Smoothing on Sparse Data [Rong & Yansheng 2006]
• A framework to alleviate the sparsity problem
• Their experiments show that smoothing increased the quality of recommendation
Research data sets
• The whole Jester data set has 617,000 ratings of 100 jokes by 24,900 users. Ratings range from −10 to +10. The whole rating matrix is filled to about 25%.
• This research uses part 1 of the three parts: data from 24,983 users who have rated 36 or more jokes, a matrix with dimensions 24,983 × 101
• The data set is in .CSV format
Data pre-processing
• The Jester data sets use 99 to represent an unrated value
• The first step is to change all the unrated values to 0
• The second step, the most important part of this research, is to predict the necessary unrated values for future prediction generation
• All the data processing was done in R
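The first pre-processing step can be sketched as follows. This is an illustration in Python rather than the R used in the original work; the function names `load_ratings` and `zero_unrated` and the three-column sample are hypothetical, chosen only to show the 99-to-0 substitution.

```python
import csv
import io

UNRATED = 99.0  # marker the slides say Jester uses for an unrated joke


def load_ratings(csv_text):
    """Parse CSV text into a list of rows of floats (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[float(cell) for cell in row] for row in reader if row]


def zero_unrated(rows, marker=UNRATED):
    """Return a copy of rows with every unrated marker replaced by 0."""
    return [[0.0 if value == marker else value for value in row] for row in rows]


# Tiny made-up excerpt in the same shape as the data set preview.
sample_csv = "-7.82,8.79,-9.66\n99,99,9.03\n"
ratings = zero_unrated(load_ratings(sample_csv))
print(ratings)  # [[-7.82, 8.79, -9.66], [0.0, 0.0, 9.03]]
```

One consequence of this choice is that a genuine rating of exactly 0 can no longer be told apart from an unrated item, so a separate record of which cells were originally 99 may be worth keeping.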
Data set preview
-7.82  8.79 -9.66 -8.16 -7.52 -8.5  -9.85  4.17 -8.98 -4.76
 4.08 -0.29  6.36  4.37 -2.38 -9.66 -0.73 -5.34  8.88  9.22
99    99    99    99     9.03  9.27  9.03  9.27 99    99
99     8.35 99    99     1.8   8.16 -2.82  6.21 99     1.84
 8.5   4.61 -4.17 -5.39  1.36  1.6   7.04  4.61 -0.44  5.73
-6.17 -3.54  0.44 -8.5  -7.09 -4.32 -8.69 -0.87 -6.65 -1.8
99    99    99    99     8.59 -9.85  7.72  8.79 99    99
 6.84  3.16  9.17 -6.21 -8.16 -1.7   9.27  1.41 -5.19 -4.42
-3.79 -3.54 -9.42 -6.89 -8.74 -0.29 -5.29 -8.93 -7.86 -1.6
 3.01  5.15  5.15  3.01  6.41  5.15  8.93  2.52  3.01  8.16
-2.91  4.08 99    99    -5.73 99     2.48 -5.29 99     1.46
 1.31  1.8   2.57 -2.38  0.73  0.73 -0.97  5    -7.23 -1.36
99    99    99    99     5.87 99     5.58  0.53 99     7.14
 9.22  9.27  9.22  8.3   7.43  0.44  3.5   8.16  5.97  8.98
 8.79 -5.78  6.02  3.69  7.77 -5.83  8.69  8.59 -5.92  7.52
-3.5   1.55  2.33 -4.13  4.22 -2.28 -2.96 -0.49  2.91  1.99
99    -9.27 99    99    -7.38 99     8.74 -6.31 99     2.33
 3.16  7.62  3.79  8.25  4.22  7.62  2.43  0.97  0.53  0.83
 4.22  3.64 99    99     2.52 99     4.13 -5.19 99     7.91
99     7.62 99    99    -8.64  2.43  8.93 -6.6  99    -9.47
 2.57 -0.73 99    99     2.57 99    -4.22  2.67 99    -1.31
 7.28  5.39 99    99    -4.22 99     8.93  3.5  99     6.12
Data processing approach
         Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2   0      0      1      3      4      3      0      0
User 3   0      0      0      4      5      4      0      0
User 4   0      0      2      3      3      2      0      0
• The Manhattan distance measure is applied
Data processing
• The distance between user 2 and user 3 is four
• The distance between user 2 and user 4 is three
• User 4 therefore appears closer to user 2 than user 3 does
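The two distances quoted on this slide can be checked with a short Python sketch (the original work used R). The rating rows are copied from the table on the preceding slide; the function name `manhattan` is only illustrative.

```python
def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute differences per joke."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Rows from the "Data processing approach" table, Joke1..Joke8.
user2 = [0, 0, 1, 3, 4, 3, 0, 0]
user3 = [0, 0, 0, 4, 5, 4, 0, 0]  # Joke3 unrated, stored as 0
user4 = [0, 0, 2, 3, 3, 2, 0, 0]

print(manhattan(user2, user3))  # 4
print(manhattan(user2, user4))  # 3
```

Because the unrated Joke3 of user 3 is stored as 0, it contributes a full unit to the distance from user 2, which is what pushes user 3 further away than user 4.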
Data processing approach (cont'd)
         Joke1  Joke2  Joke3  Joke4  Joke5  Joke6  Joke7  Joke8
User 2   0      0      1      3      4      3      0      0
User 3   0      0      1      4      5      4      0      0
User 4   0      0      2      3      3      2      0      0
Data processing
• The distance between user 2 and user 3 is three
• The distance between user 2 and user 4 is three
• Users 3 and 4 are now at the same distance from user 2
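One possible reading of the two tables, offered here as an assumption rather than as the author's stated procedure, is that a rating is predicted by averaging the ratings of whichever neighbours are at the minimum Manhattan distance. Under that assumption, a Python sketch (the original work used R; `predict_from_nearest` is a hypothetical name) gives 2 for Joke3 before the fill and 1.5 after it, the same two predicted values used in the accuracy calculation.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two rating rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_from_nearest(target, neighbours, item):
    """Average the `item` rating over all neighbours tied at minimum distance.

    Assumption for illustration only: ties are resolved by averaging.
    """
    distances = {name: manhattan(target, row) for name, row in neighbours.items()}
    closest = min(distances.values())
    nearest = [name for name, d in distances.items() if d == closest]
    return sum(neighbours[name][item] for name in nearest) / len(nearest)


JOKE3 = 2  # zero-based column index of Joke3
user2 = [0, 0, 1, 3, 4, 3, 0, 0]

# First table: user 3 has not rated Joke3 (stored as 0).
before_fill = {"User 3": [0, 0, 0, 4, 5, 4, 0, 0],
               "User 4": [0, 0, 2, 3, 3, 2, 0, 0]}
# Second table: user 3's Joke3 has been filled with 1.
after_fill = {"User 3": [0, 0, 1, 4, 5, 4, 0, 0],
              "User 4": [0, 0, 2, 3, 3, 2, 0, 0]}

print(predict_from_nearest(user2, before_fill, JOKE3))  # 2.0
print(predict_from_nearest(user2, after_fill, JOKE3))   # 1.5
```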
Measurement of accuracy
• Relative squared error is used to compute the accuracy
• Traditional CF accuracy for joke 3: Accuracy = 1 - (1 - 2)²/1 = 0%
• Current approach accuracy: Accuracy = 1 - (1 - 1.5)²/1 = 75%
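The arithmetic on this slide can be reproduced with a small Python sketch (the original work used R). The slide does not say what the denominator of 1 stands for, so it is exposed here as a hypothetical `scale` parameter defaulting to 1; the function name `accuracy` is likewise illustrative.

```python
def accuracy(actual, predicted, scale=1.0):
    """1 minus the squared error divided by `scale`.

    `scale` mirrors the "/1" on the slide; its meaning is an assumption.
    """
    return 1.0 - (actual - predicted) ** 2 / scale


actual_joke3 = 1  # user 2's real rating of Joke3

traditional = accuracy(actual_joke3, 2)    # prediction from traditional CF
current = accuracy(actual_joke3, 1.5)      # prediction after filling

print(traditional)  # 0.0
print(current)      # 0.75
```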
User similarity comparison
[Chart: ratings of User 2, User 3 and User 4 for Joke1 to Joke8; vertical axis from 0 to 6]
Conclusion
• Heavy computational load
• Methods for both unrated values and missing values
References
Liu, Z, Wang, H, Qu, W, Liu, W & Fan, R 2009b, Sparse Matrix Prediction Filling in Collaborative Filtering, IEEE Computer Society, pp. 304-307.
Ma, H, King, I & Lyu, MR 2007, Effective Missing Data Prediction for Collaborative Filtering, ACM, Amsterdam, The Netherlands, pp. 39-46.
Rong, H & Yansheng, L 2006, 'A Hybrid User and Item-Based Collaborative Filtering with Smoothing on Sparse Data', paper presented at the 16th International Conference on Artificial Reality and Telexistence--Workshops (ICAT '06), Nov. 2006.