Applied Algorithm Lab Wooram Heo
Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005
Outline
• Recommender Systems
• Problem statement
• Survey of Recommender Systems
– Content-Based Methods
– Collaborative Methods
– Hybrid Methods
Recommender Systems
• Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
• Many on-line stores provide recommendations (e.g. Amazon, CDNow).
• Recommenders have been shown to substantially increase sales at on-line stores.
Recommender Systems
• Examples
Problem statement
• The recommendation problem is to estimate ratings for the items that a user has not yet seen
• Estimation is usually based on the ratings the user has given to other items and on some other information
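Problem statement
• $C$: the set of all users
• $S$: the set of all possible items that can be recommended
• $u$: $C \times S \rightarrow R$, where $R$ is a totally ordered set (e.g. nonnegative integers or real numbers within a certain range)
• For each user $c \in C$, we want to choose the item $s'_c \in S$ that maximizes the user's utility:
$$\forall c \in C, \quad s'_c = \arg\max_{s \in S} u(c, s)$$
• Utility is usually known only for the items users have already rated, so it needs to be extrapolated to the whole space $C \times S$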
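The selection step can be sketched in a few lines of Python. This is only an illustration: the function name `recommend`, the dictionary layout, and the sample utilities are assumptions made for the example, and the estimated utilities are taken as given rather than computed.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


def recommend(
    user: Hashable,
    items: Iterable[Hashable],
    utility: Dict[Tuple[Hashable, Hashable], float],
) -> Optional[Hashable]:
    """Return the item s that maximizes utility[(user, s)].

    Items with no estimated utility for this user are skipped;
    None is returned when nothing can be scored.
    """
    candidates = [s for s in items if (user, s) in utility]
    if not candidates:
        return None
    return max(candidates, key=lambda s: utility[(user, s)])


# Hypothetical estimated utilities u(c, s) for unseen items.
estimated = {
    ("alice", "book"): 3.5,
    ("alice", "movie"): 4.8,
    ("bob", "book"): 4.1,
}

print(recommend("alice", ["book", "movie", "cd"], estimated))
```

In practice the hard part is filling in `estimated`, which is what the content-based and collaborative methods below address.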
Recommender System Categories
• Content-based recommendations
– The user will be recommended items similar to the ones the user preferred in the past
• Collaborative recommendations
– The user will be recommended items that people with similar tastes and preferences liked in the past
• Hybrid approaches
– These methods combine collaborative and content-based methods
Content-Based Methods
• Recommend items similar to those the user preferred in the past
• User profiling is the key
• E.g. in a movie recommender application,
– Specific actors
– Directors
– Genres
– etc.
Content-Based Methods
• The content-based approach has its roots in information retrieval
– Documents, web sites (URLs), and news messages
• Designed mostly to recommend text-based items
– Content is usually described with keywords
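Content-Based Methods
• TF-IDF weight for keyword $k_i$ in document $d_j$ is defined as
$$TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}, \qquad IDF_i = \log\frac{N}{n_i}, \qquad w_{i,j} = TF_{i,j} \times IDF_i$$
where $f_{i,j}$ is the number of times $k_i$ appears in $d_j$, $N$ is the total number of documents, and $n_i$ is the number of documents containing $k_i$
• Content of document $d_j$ is defined as
$$Content(d_j) = (w_{1,j}, \ldots, w_{k,j})$$
• Cosine similarity measure
$$u(c, s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\|\vec{w}_c\|_2 \, \|\vec{w}_s\|_2}$$
where $\vec{w}_c$ and $\vec{w}_s$ are the keyword-weight vectors of user $c$'s profile and of item $s$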
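A minimal Python sketch of keyword weighting and cosine similarity follows. It assumes term frequency is normalized by the most frequent keyword in the document and that IDF is log(N / n_i). The function names and the tiny keyword lists are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {keyword: TF-IDF weight} dict per document."""
    n_docs = len(docs)
    # n_i: number of documents in which keyword i appears at least once.
    doc_freq = Counter(k for doc in docs for k in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        vectors.append({
            k: (f / max_count) * math.log(n_docs / doc_freq[k])
            for k, f in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse weight vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword descriptions of three items.
docs = [
    ["space", "alien", "ship"],
    ["space", "alien", "war"],
    ["romance", "paris"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A user profile can be represented in the same way, for example by averaging the vectors of the items the user liked, and then compared against candidate items with `cosine`.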
Disadvantages
• Not all content is well represented by keywords
– Multimedia data
• Items represented by the same set of features are indistinguishable
• Overspecialization problem
• New user problem
– No history available
Collaborative Methods
• Use other users' recommendations (ratings) to judge an item's utility
• Key is to find users/user groups whose interests match those of the current user
• More users, more ratings: better results
• Can account for items dissimilar to the ones seen in the past too
Collaborative Methods
[Figure: collaborative filtering pipeline. The active user's ratings (A 9, B 3, C unrated, Z 5) are matched by correlation against the user database, and recommendations (here item C) are extracted from the ratings of the best-matching users.]
Collaborative Methods
• Memory-based algorithms
– Value of the unknown rating $r_{c,s}$ for user $c$ and item $s$ is usually computed as an aggregate of the ratings of some other users for the same item $s$
$$r_{c,s} = \operatorname{aggr}_{c' \in \hat{C}} r_{c',s}$$
– Where $\hat{C}$ denotes the set of users that are the most similar to user $c$ and who have rated item $s$
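One common choice of aggregate is a similarity-weighted average over the most similar users. The sketch below assumes that choice and takes the similarities as precomputed inputs; `predict_rating`, `top_n`, and the sample data are hypothetical names and values for illustration only.

```python
from typing import Dict, Optional


def predict_rating(
    item: str,
    neighbors: Dict[str, float],
    ratings: Dict[str, Dict[str, float]],
    top_n: int = 2,
) -> Optional[float]:
    """Predict the active user's rating of `item`.

    neighbors: {other_user: similarity to the active user}
    ratings:   {user: {item: rating}}
    Uses the top_n most similar users who rated the item and returns
    sum(sim * rating) / sum(|sim|), or None if no prediction is possible.
    """
    rated = [
        (sim, ratings[u][item])
        for u, sim in neighbors.items()
        if item in ratings.get(u, {})
    ]
    rated.sort(key=lambda pair: pair[0], reverse=True)
    top = rated[:top_n]
    norm = sum(abs(sim) for sim, _ in top)
    if norm == 0:
        return None
    return sum(sim * r for sim, r in top) / norm


# Hypothetical similarities and ratings.
ratings = {"u1": {"C": 9}, "u2": {"C": 8}, "u3": {"C": 2}}
neighbors = {"u1": 0.9, "u2": 0.6, "u3": 0.1}
print(predict_rating("C", neighbors, ratings))
```

Other aggregates, such as a plain average or a weighted sum of deviations from each user's mean rating, fit the same structure and differ only in the last step.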
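Collaborative Methods
• Similarity between two users $x$ and $y$, computed over $S_{xy}$, the set of items rated by both
– Pearson correlation coefficient
$$sim(x, y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)^2 \, \sum_{s \in S_{xy}} (r_{y,s} - \bar{r}_y)^2}}$$
– Cosine similarity
$$sim(x, y) = \cos(\vec{x}, \vec{y}) = \frac{\sum_{s \in S_{xy}} r_{x,s} \, r_{y,s}}{\sqrt{\sum_{s \in S_{xy}} r_{x,s}^2} \, \sqrt{\sum_{s \in S_{xy}} r_{y,s}^2}}$$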
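Both user-to-user similarity measures can be sketched in Python over the items two users have both rated. The sketch assumes each user's mean is taken over the co-rated items only (some formulations use the mean over all of the user's ratings), and the function names and sample ratings are illustrative.

```python
import math
from typing import Dict


def pearson(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Pearson correlation of two users' ratings over co-rated items."""
    common = sorted(set(x) & set(y))
    if not common:
        return 0.0
    mean_x = sum(x[s] for s in common) / len(common)
    mean_y = sum(y[s] for s in common) / len(common)
    num = sum((x[s] - mean_x) * (y[s] - mean_y) for s in common)
    den = math.sqrt(
        sum((x[s] - mean_x) ** 2 for s in common)
        * sum((y[s] - mean_y) ** 2 for s in common)
    )
    # A user with constant ratings has zero variance; treat as uncorrelated.
    return num / den if den else 0.0


def cosine_users(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Cosine similarity of two users' rating vectors over co-rated items."""
    common = set(x) & set(y)
    num = sum(x[s] * y[s] for s in common)
    den = math.sqrt(sum(x[s] ** 2 for s in common)) * math.sqrt(
        sum(y[s] ** 2 for s in common)
    )
    return num / den if den else 0.0


# Hypothetical ratings on a 1-10 scale.
active = {"A": 9, "B": 3, "Z": 5}
similar = {"A": 10, "B": 4, "C": 8, "Z": 6}
opposite = {"A": 1, "B": 9, "C": 2, "Z": 5}
print(pearson(active, similar), pearson(active, opposite))
```

Pearson subtracts each user's mean, so it is less sensitive than cosine similarity to users who rate everything high or everything low.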
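Collaborative Methods
• Model-based algorithm
– The unknown rating is estimated as its expected value, assuming rating values are integers between 0 and $n$
$$r_{c,s} = E(r_{c,s}) = \sum_{i=0}^{n} i \times \Pr(r_{c,s} = i \mid r_{c,s'}, s' \in S_c)$$
where $S_c$ is the set of items rated by user $c$
– Cluster models and Bayesian networks are used to estimate this probability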
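Once a model supplies a probability for each possible rating value, the prediction is the expected rating. The sketch below covers only that last step; the probabilities are hard-coded as a stand-in for the output of a cluster model or Bayesian network, and `expected_rating` is a hypothetical helper name.

```python
from typing import Dict


def expected_rating(rating_probs: Dict[int, float]) -> float:
    """Expected rating given {rating value: probability}.

    Probabilities are renormalized so small rounding errors in the
    model output do not bias the result.
    """
    total = sum(rating_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(value * p for value, p in rating_probs.items()) / total


# Hypothetical model output: Pr(rating = i | user's other ratings).
probs = {1: 0.1, 2: 0.2, 5: 0.7}
print(expected_rating(probs))
```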
Collaborative Methods
• Model-based approaches use various machine learning techniques
– K-means clustering
– Gibbs sampling
– Bayesian model
– Probabilistic relational model
– Linear regression
– Maximum entropy model
– Markov decision process
– Probabilistic latent semantic analysis
– Latent Dirichlet allocation
– etc.
Disadvantages
• Finding similar users/user groups isn't very easy
• New user problem: No preferences available
• New item problem: No ratings available
• Sparsity problem
END