Cold Start Problem in Movie Recommendation JIANG CAIGAO, WANG WEIYAN Group 20.
Cold Start Problem in Movie Recommendation
JIANG CAIGAO, WANG WEIYAN (Group 20)
Outline
1. Introduction
2. Problem Statement
3. Transfer Learning based
4. Semantics Extracting based
5. Experiments
Introduction: Problem
• Collaborative filtering: a classical recommendation approach
1) Look for users who share the same rating patterns as the user who needs a recommendation.
2) Use the ratings from those like-minded users to predict the active user's ratings for unrated items.
• Cold start problem: there is not enough information to recommend
• New users take a long time to rate enough movies to predict their preferences
• Anonymous browsers have no direct preference information at all
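The two collaborative-filtering steps above can be sketched on a toy rating matrix. Everything here is illustrative (the matrix, the cosine weighting, the k=2 cutoff), not the system described later:

```python
import numpy as np

# Toy rating matrix: rows = users, cols = movies, 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    mask = R[:, item] > 0                  # users who rated this item
    mask[user] = False                     # exclude the active user
    sims = []
    for other in np.where(mask)[0]:
        common = (R[user] > 0) & (R[other] > 0)
        if not common.any():
            continue
        a, b = R[user, common], R[other, common]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        sims.append((sim, R[other, item]))
    sims.sort(reverse=True)                # most similar users first
    top = sims[:k]
    if not top:
        return 0.0                         # cold start: no usable neighbors
    w = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / w  # similarity-weighted average

print(predict(R, 0, 2))
```

Note the `return 0.0` branch: when a user has no overlapping ratings with anyone, the predictor has nothing to work with. That is exactly the cold-start failure mode motivating this work.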
Introduction: Solutions
• With clicking data: recommend based on transfer learning
• Extract knowledge from one or more source tasks and apply it to the target task
• With semantic data: recommend based on semantics extraction
• Extract a movie's semantic information from its tags
• Respond to user actions, e.g. visiting a movie's page or issuing a fuzzy query
[Diagram: traditional learning trains a separate learning system per task; transfer learning extracts knowledge from the source tasks and feeds it into the target task's learning system.]
Introduction
Collective Matrix Factorization (CMF)
CMF jointly factorizes multiple relation matrices that may have different value types; the factors share parameters when an entity appears in multiple relations.
Let R be the rating matrix, where the element R_ij denotes user i's rating for movie j.
Let G be a binary matrix encoding the genres each movie belongs to, where G_ij indicates whether movie j belongs to genre i.
The factors are U, Z and V, and the movie factor V is shared in both reconstructions:
R ≈ U V^T,  G ≈ Z V^T
The average loss is a weighted combination of the two reconstruction losses:
L = α · loss(R, U V^T) + (1 - α) · loss(G, Z V^T)
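A minimal sketch of the CMF idea, assuming squared-error reconstruction and a weighting parameter alpha (both illustrative choices): two matrices are factorized with a shared movie factor V, and one alternating least-squares step on U lowers the joint loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_genres, k = 6, 8, 3, 2

R = rng.integers(1, 6, size=(n_users, n_movies)).astype(float)   # ratings in {1..5}
G = rng.integers(0, 2, size=(n_genres, n_movies)).astype(float)  # genre indicators

U = rng.normal(size=(n_users, k))    # user factor
Z = rng.normal(size=(n_genres, k))   # genre factor
V = rng.normal(size=(n_movies, k))   # movie factor, shared by both reconstructions

def cmf_loss(R, G, U, Z, V, alpha=0.5):
    """Weighted average of the two squared reconstruction losses; V is shared."""
    return (alpha * np.mean((R - U @ V.T) ** 2)
            + (1 - alpha) * np.mean((G - Z @ V.T) ** 2))

before = cmf_loss(R, G, U, Z, V)
# One alternating least-squares step on U (V fixed) minimizes the R term,
# while the G term is untouched, so the joint loss cannot increase.
U = R @ V @ np.linalg.inv(V.T @ V + 1e-6 * np.eye(k))
after = cmf_loss(R, G, U, Z, V)
print(before > after)
```

In a full CMF solver the same alternating updates cycle over U, Z and V; updating the shared V trades off both losses at once, which is how genre information flows into the rating reconstruction.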
Problem Statement
Task: focus on three transfer tasks with three types of auxiliary data; the four different domains are represented as matrices.
Notations:
• There are two auxiliary matrices with heterogeneous binary feedback, denoted B1 and B2
• X is another auxiliary domain representing a different but related CF task that shares neither users nor items with the target
• R is the target matrix with sparse rating values; it represents the domain whose ratings we want to predict
Problem Statement
Problem Formulation:
• Given a target rating matrix R and three auxiliary matrices B1, B2 and X
• Our goal is to utilize the auxiliary matrices to boost missing-rating prediction performance for R
Methods
The Model:
• The distribution of the homogeneous ratings is assumed to be Gaussian:
R_ij ~ N(U_i^T V_j, σ^2)
• The distribution of each heterogeneous binary value is modeled by a Bernoulli distribution with a sigmoid link, e.g.:
B1_ij ~ Bernoulli(sigmoid(U_i^T V_j))
Methods
The Model:
• The whole generative process is as follows:
Domain R:
a) For each user i, generate the user factor U_i
b) For each item j, generate the item factor V_j
c) For each cell (i,j) in R, generate the rating R_ij from the Gaussian above
d) For each cell (i,j) in B1, generate the binary value B1_ij from the Bernoulli above
e) For each cell (i,j) in B2, generate the binary value B2_ij from the Bernoulli above
f) For each cell (l,s) in X, generate the rating X_ls from its own factors
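The generative process can be sketched numerically. The Gaussian priors on the factors, the noise levels and the sigmoid link are assumptions consistent with the Gaussian/Bernoulli choices above, and only a single binary matrix is sampled for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 4, 5, 3
sigma_u = sigma_v = 1.0   # assumed prior std of the factors
sigma_r = 0.5             # assumed rating noise std

# a), b): latent factors drawn from Gaussian priors
U = rng.normal(0.0, sigma_u, size=(n_users, k))
V = rng.normal(0.0, sigma_v, size=(n_items, k))

# c): numeric ratings are Gaussian around the factor inner product
R = rng.normal(U @ V.T, sigma_r)

# d)-f): binary feedback is Bernoulli with a sigmoid link
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

B = rng.binomial(1, sigmoid(U @ V.T))
print(R.shape, B.shape)
```

Sampling from the model like this is also a quick sanity check during implementation: the synthetic R and B should be recoverable by the learning procedure below.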
Methods
The objective function:
• The log-likelihood of the probabilistic model sums the Gaussian log-densities of the observed ratings in R and X with the Bernoulli log-densities of the observed binary feedback in B1 and B2.
Methods
Learning Process:
• Use Jensen's inequality to derive a lower bound on the log-likelihood:
log p(data) >= E_q[log p(data, latent factors)] + H(q)
where H is the entropy of the variational distribution q.
Methods
Learning Process:
Parameters:
1. Model parameters: the latent factor matrices and the noise variances
2. Variational parameters: the parameters of the approximate posterior q
Methods
Learning Process:
Variational expectation-maximization (VEM):
1. VE-step: fix the model parameters and optimize the bound w.r.t. the variational parameters, making the bound as tight as possible.
2. VM-step: fix the variational parameters and optimize the bound w.r.t. the model parameters, raising the bound.
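The tighten-then-raise alternation can be illustrated on a toy model. The sketch below runs classic EM on a two-component 1-D Gaussian mixture with unit variances (not the paper's model): the E-step computes responsibilities (the variational parameters), the M-step re-estimates means and weights (the model parameters), and the log-likelihood never decreases.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data from two 1-D unit-variance Gaussians.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])   # component means (model parameters)
pi = np.array([0.5, 0.5])    # mixture weights (model parameters)

def log_likelihood(x, mu, pi):
    p = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
    return np.log(p.sum(axis=1)).sum()

ll = [log_likelihood(x, mu, pi)]
for _ in range(20):
    # E-step: posterior responsibilities tighten the bound.
    p = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    r = p / p.sum(axis=1, keepdims=True)
    # M-step: closed-form updates raise the bound.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    pi = r.mean(axis=0)
    ll.append(log_likelihood(x, mu, pi))

print(all(b >= a - 1e-9 for a, b in zip(ll, ll[1:])))
```

VEM follows the same template, except that the E-step itself is an optimization over a restricted variational family rather than an exact posterior computation.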
Experiment
Data Sets: four datasets are used in our experiments, namely Netflix, MovieLens, Book-Crossing and EachMovie.
Netflix: contains about 10^8 rating values in the range {1,2,3,4,5}, given by about 7x10^4 users on around 1.7x10^4 movies.
MovieLens: contains about 10^7 rating values, given by 7x10^4 users on around 10^4 movies.
EachMovie: contains approximately 2.8x10^6 ratings given by 7.2x10^4 users on 1,628 movies.
Experiment
Experiment Setting:
1. Learning Netflix with MovieLens:
We use the Netflix and MovieLens datasets for this part of the experiment. The goal is to predict the missing values in Netflix (the target domain); MovieLens is only used to construct X.
a) Randomly extract a 4000x4000 rating matrix from the Netflix data, and take a 2000x2000 sub-matrix as the target matrix R
b) Take two 2000x2000 sub-matrices as the auxiliary matrices B1 and B2, so that B1 shares the same users with R (but not its items) and B2 shares the common items
c) Preprocess B1 and B2 by relabeling ratings in the range {1,2,3} as 0 and ratings in the range {4,5} as 1
d) Randomly select a 2000x2000 matrix from the MovieLens data as X, without making any assumption on the correspondence of users/items between {R, B1, B2} and X
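Step c) amounts to a one-line relabeling; a sketch on a toy auxiliary matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
aux = rng.integers(1, 6, size=(6, 6))   # toy auxiliary rating matrix in {1..5}

# Relabel {1,2,3} -> 0 and {4,5} -> 1, as in step c)
binary = (aux >= 4).astype(int)
print(binary.shape)
```

This turns the auxiliary numeric ratings into the heterogeneous like/dislike feedback the Bernoulli part of the model expects.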
Experiment
Experiment Setting:
2. Learning EachMovie with MovieLens:
All the preprocessing steps are the same as above, but the matrix dimensions are 1000x800.
Experiment
Results: [result figures omitted from this transcript]
Semantics Extracting Solution
Goal: handle the situation where we know nothing about the user.
Motivation: most movies are tagged with short phrases and words by users.
Extract the semantics from the tags to describe a movie's content, enabling recommendations in response to users' browsing and fuzzy queries.
Semantics Extracting: Related Work
Tags, as brief and informative data, have been used for recommendation and prediction:
(1) as binary variables only [1][2];
(2) otherwise, users manually provide a relevance value between tag and item [3].
In both cases tags are regarded as features instead of language words, and their semantics are ignored.
[1] Guan, Z. et al. Document recommendation in social tagging services. WWW '10. ACM.
[2] Tso-Sutter, K. et al. Tag-aware recommender systems by fusion of collaborative filtering algorithms. ACM Symposium on Applied Computing (SAC '08).
[3] Vig, J. et al. The tag genome: encoding community knowledge to support novel interaction. ACM Transactions on Interactive Intelligent Systems (2012).
Semantics Extracting: Word Embedding
NLP perspective: treat the tags as a brief and informative description of the movie, and extract the semantics by generating word embeddings [4].
[Architecture diagram: tag -> lookup table containing the vector -> neural network with hierarchical softmax (Huffman tree) -> vector]
Positive sampling: shorten the distance between similar words.
Negative sampling: enlarge the distance between dissimilar words.
[4] Mikolov, T. et al. Distributed representations of words and phrases and their compositionality.
Semantics Extracting: Word Embedding
Modified for tag semantics extraction:
Generate 100-dimensional vectors representing the tags, in which similar and related tags' vectors have high cosine similarity (a large inner product on unit vectors).
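On unit-normalized vectors, cosine similarity reduces to the inner product. A sketch with hypothetical tag vectors (the names and values are illustrative, not trained embeddings):

```python
import numpy as np

rng = np.random.default_rng(3)
base = rng.normal(size=100)
# Related tags are modeled as small perturbations of a shared direction.
vec_funny = base + 0.1 * rng.normal(size=100)
vec_comedy = base + 0.1 * rng.normal(size=100)
vec_war = rng.normal(size=100)          # an unrelated tag

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related tags should score far higher than unrelated ones.
print(cosine(vec_funny, vec_comedy) > cosine(vec_funny, vec_war))
```

The training objective in the table below is designed to produce exactly this geometry: co-occurring tags are pulled together, negative samples are pushed apart.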
Original word2vec vs. modified tag2vec:

• Context
  word2vec: fixed context window of size 5~10
  tag2vec: all tags of the movie, regardless of length
  Reason: all tags are related, regardless of their order and position

• Negative sampling
  word2vec: 5~10 random negative samples
  tag2vec: >=1000 random negative samples
  Reason: to effectively enlarge the distances

• Dataset
  word2vec: a large corpus, e.g. Wikipedia
  tag2vec: tags of each movie + movie name + category
  Reason: to extract the special semantics in tags, e.g. names and special phrases
Semantics Extracting: Movie & Query Embedding for Recommending
Movie embedding: a vector calculated from the movie's tag vectors.
Query embedding: a vector calculated from the query keywords' tag vectors.
Movie vector or query vector => similar movies:
• Ball tree: index all movies' 100-dimensional vectors.
• KNN: take the currently visited movie's vector, or the user's fuzzy-query vector, as input to find similar movies in the ball tree.
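The retrieval step can be sketched with a brute-force nearest-neighbor search standing in for the ball tree; the vectors here are random placeholders, not trained tag embeddings:

```python
import numpy as np

rng = np.random.default_rng(5)
movie_vecs = rng.normal(size=(50, 100))                          # 50 movies, 100-d
movie_vecs /= np.linalg.norm(movie_vecs, axis=1, keepdims=True)  # unit length

def top_k_similar(query, vecs, k=5):
    """Brute-force nearest movies by cosine similarity (ball-tree stand-in)."""
    q = query / np.linalg.norm(query)
    sims = vecs @ q               # inner product == cosine on unit vectors
    return np.argsort(-sims)[:k]  # indices of the k most similar movies

# A query near movie 0 (e.g. averaged keyword vectors) should retrieve movie 0 first.
query = movie_vecs[0] + 0.01 * rng.normal(size=100)
neighbors = top_k_similar(query, movie_vecs, k=5)
print(neighbors)
```

In production the linear scan is replaced by a ball-tree index (e.g. `sklearn.neighbors.BallTree`), which answers the same top-k query without touching every movie vector.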
Semantics Extracting: Experiment
Data set: full MovieLens (last updated 8/2015)
Tag vectors: [nearest-neighbor examples for the tags "funny", "inspiring" and "moving" shown as a figure]
Item             Num
Movies           ~30,000
Users            230,000
Tags             510,000
Vocabulary size  11,363
Training words   100,949
Semantics Extracting: Experiment
• Recommend movies similar to the one being visited, e.g. Matrix, The Lord of the Rings, I, Robot, Pride and Prejudice (2005) [result lists shown as figures]
Semantics Extracting: Experiment
• Recommend in response to a single-word fuzzy query, e.g. bond, China, kid, funny [result lists shown as figures]
Semantics Extracting: Experiment
• Recommend in response to a multi-word fuzzy query, e.g. "french funny kid action", "magic book war documentary" [result lists shown as figures]
THANKS