Collaborative Filtering with Entity Similarity …ink-ron.usc.edu/xiangren/ijcai13_HINA.pdfenhance...
Collaborative Filtering with Entity Similarity Regularization in Heterogeneous Information Networks
Xiao Yu1, Xiang Ren1*, Quanquan Gu1, Yizhou Sun2, Jiawei Han1
1Univ. of Illinois at Urbana-Champaign 2Northeastern Univ. *[email protected]
Roadmap
• Why Study CF in HIN?
• Background and Preliminaries
• Proposed Method
• Experiments
• Conclusion and Future Work
Recommender Systems are Everywhere!
Recommendation Paradigm
Inputs: user profiles, user-item ratings, item features, external knowledge
Inputs -> recommender system -> recommendation

User-item ratings (? = unknown rating):
      I1  I2  …   Im
U1    ?   ?   ?   5
U2    ?   3   ?   4
…     ?   ?   ?   ?
Un    2   1   ?   ?
Recommender System with Network
• Utilizing network relationship information can enhance the recommendation quality
• However, most previous studies use only a single type of relationship between users or items (e.g., social network [Ma, WSDM11], trust relationship [Ester, KDD10], service membership [Yuan, RecSys11])
The Heterogeneous Information Network View of Recommender System
Why Information Network Can Help?
• Various types of information and relationships complement each other.
• The number of ratings follows a power-law distribution
• Cold Start – How to handle new users or new items?
Figure: number of users vs. # of ratings
• A very small number of users and items have a lot of ratings
• Most users and items do not have enough ratings
Roadmap
• Why Study CF in HIN?
• Background and Preliminaries
• Proposed Method
• Experiments
• Conclusion and Future Work
What Are Information Networks?
• A network where each node represents an entity (e.g., a user in a social network) and each link (e.g., friendship) represents a relationship between entities.
– Nodes/links may have attributes, labels, and weights.
– Links may carry rich semantic information.
Heterogeneous Information Networks
• DBLP Bibliographic Network (entity types: Venue, Paper, Author)
• The IMDb Movie Network (entity types: Actor, Movie, Director, Studio)
• The Facebook Network
1. Multiple entity types and link types
2. New problems are emerging in heterogeneous networks!
Heterogeneous Information Networks Are Ubiquitous
• Social Media
• Protein Networks
• E-commerce
• Healthcare (Medical Database, Medical Images, Medical Records, Treatment Plan, Pharmacy Service)
• Knowledge Graph
IMDb Network Schema
background
Entity Similarity
In heterogeneous information networks, find entities that are similar to a given query entity.
• In DBLP, who is similar to “C. Faloutsos”?
• In IMDb, which TV shows / movies are similar to “Avatar”?
• In Yelp, which restaurants are similar to “Blackdog”?
Meta-Path [Sun, VLDB 2011]
• Meta-level description of a path between two entities
• A path on the network schema
• Denotes an existing or concatenated relation between two entity types

Network Snippet: authors A1, A2, A3, A4; papers P1, P2, P3; venue VLDB; term "Social Network"

A1 is similar to A2, A3 and A4, but why?

Path instance -> Meta-path
A1-P1-A2 -> Author-Paper-Author
A1-P1-VLDB-P3-A3 -> Author-Paper-Venue-Paper-Author
A1-P1-"Social Network"-P2-A4 -> Author-Paper-Term-Paper-Author
……
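The mapping from concrete paths to meta-paths can be sketched in a few lines of Python. The node names come from the network snippet on this slide. The `NODE_TYPE` table and the helper `meta_path_of` are hypothetical names introduced only for illustration.

```python
# Node types for the network snippet (authors, papers, one venue, one term).
NODE_TYPE = {
    "A1": "Author", "A2": "Author", "A3": "Author", "A4": "Author",
    "P1": "Paper", "P2": "Paper", "P3": "Paper",
    "VLDB": "Venue",
    "Social Network": "Term",
}

def meta_path_of(path_instance):
    """Map a concrete path (list of node names) to its meta-path (tuple of types)."""
    return tuple(NODE_TYPE[node] for node in path_instance)

instances = [
    ["A1", "P1", "A2"],
    ["A1", "P1", "VLDB", "P3", "A3"],
    ["A1", "P1", "Social Network", "P2", "A4"],
]
for inst in instances:
    print("-".join(inst), "->", "-".join(meta_path_of(inst)))
```

Each path instance is reduced to the sequence of entity types it visits, which is why different instances can explain similarity in different ways.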
Similarity Measurement
• PathSim [Sun, VLDB 2011]
– Normalized path count between x and y following meta-path 𝒫
– Normalized by the visibility of x and the visibility of y
– Favors entities with strong connectivity and similar visibility under the given meta-path
• Path Constrained Random Walk [Lao, Machine Learning, 2010]
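A minimal sketch of PathSim for a symmetric meta-path such as Author-Paper-Author, assuming the relation is given as a NumPy adjacency matrix. The path count between x and y is doubled and divided by the sum of the two visibilities, where the visibility of an entity is its path count to itself. The function name `pathsim` and the toy data are illustrative assumptions.

```python
import numpy as np

def pathsim(adjacency):
    """PathSim for a symmetric meta-path built from one relation and its inverse.

    adjacency[i, k] = number of links from entity i (e.g. an author) to
    middle object k (e.g. a paper). M = A A^T counts path instances, so
    M[x, y] is the path count and M[x, x] is the visibility of x.
    """
    A = np.asarray(adjacency, dtype=float)
    M = A @ A.T
    visibility = np.diag(M)
    denom = visibility[:, None] + visibility[None, :]
    # Pairs with no paths at all get similarity 0 instead of 0/0.
    return np.divide(2.0 * M, denom, out=np.zeros_like(M), where=denom > 0)

# Toy author-paper matrix (hypothetical data): 3 authors, 3 papers.
author_paper = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])
S = pathsim(author_paper)
print(np.round(S, 3))
```

Authors 0 and 1 share one paper but differ in visibility (2 versus 1), so their score is 2·1 / (2 + 1) ≈ 0.667 instead of 1.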
Different Meta-Paths Carry Different Semantics
• Who is most similar to C. Faloutsos?
– Meta-Path: Author-Paper-Author -> Christos’s students or close collaborators
– Meta-Path: Author-Paper-Venue-Paper-Author -> authors who work on similar topics and have similar reputation
Problem Definition
• Given: rating data and an information network
• For a specific user, find items of interest based on his / her previous rating history.

Rating Data:
      e1  e2  …   em
u1    0   0   0   1
u2    0   2   0   5
…     0   0   0   0
un    3   4   0   0
Roadmap
• Why Study CF in HIN?
• Background and Preliminaries
• Proposed Method
• Experiments
• Conclusion and Future Work
Notations
• We have n users and m items.
• By computing similarity scores of all item pairs along a certain meta-path, we can get a similarity matrix.
• With L different meta-paths, we can calculate L similarity matrices, one per meta-path.
Traditional Matrix Factorization
• Approximate R with the product of U and V
• Non-Negative Matrix Factorization
• Weighted Non-Negative Matrix Factorization
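As a rough illustration of the weighted variant, here is a minimal sketch of weighted non-negative matrix factorization with the standard multiplicative updates. It assumes a binary mask W that marks observed ratings and treats zero entries as unobserved. The function names, the rank, and the toy matrix are assumptions, and the update rules used in the talk may differ.

```python
import numpy as np

def weighted_nmf(R, W, rank=2, n_iter=200, eps=1e-9, seed=0):
    """Approximate R ~= U @ V.T on entries where W == 1, with U, V >= 0.

    Minimizes || W * (R - U V^T) ||_F^2 using multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.random((n, rank)) + 0.1
    V = rng.random((m, rank)) + 0.1
    WR = W * R
    for _ in range(n_iter):
        # Each update rescales the factor elementwise, so it stays non-negative.
        U *= (WR @ V) / ((W * (U @ V.T)) @ V + eps)
        V *= (WR.T @ U) / ((W * (U @ V.T)).T @ U + eps)
    return U, V

def masked_error(R, W, U, V):
    """Squared reconstruction error on observed entries only."""
    return float(np.sum((W * (R - U @ V.T)) ** 2))

# Toy rating matrix (hypothetical); zeros are treated as unobserved.
R = np.array([[0, 0, 0, 1],
              [0, 2, 0, 5],
              [0, 0, 0, 0],
              [3, 4, 0, 0]], dtype=float)
W = (R > 0).astype(float)
U, V = weighted_nmf(R, W)
print(masked_error(R, W, U, V))
```

The mask keeps unobserved entries out of the loss, which is the difference between the weighted and the plain non-negative formulation.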
Objective Function
Terms of the objective:
• Approximate R with the product of U and V
• Regularization on U and V
• Regularization on θ
• Similar items measured from HIN should have similar low-rank representations
Simplify Optimization Process
Revised Objective Function
Similar items measured from HIN should have similar low-rank representations
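A regularizer of this kind is commonly written as a similarity-weighted sum of squared distances between item factors, which collapses to a trace involving a graph Laplacian. The sketch below checks that standard identity numerically. It assumes symmetric similarity matrices combined with weights θ, and every name in it is hypothetical. It illustrates the general technique and is not necessarily the exact objective used in the talk.

```python
import numpy as np

def pairwise_penalty(V, S):
    """0.5 * sum_ij S[i, j] * ||V[i] - V[j]||^2, computed directly."""
    diff = V[:, None, :] - V[None, :, :]      # shape (m, m, k)
    sq_dist = np.sum(diff ** 2, axis=2)       # shape (m, m)
    return 0.5 * float(np.sum(S * sq_dist))

def laplacian_penalty(V, S):
    """Same quantity as trace(V^T (D - S) V), valid for symmetric S."""
    L = np.diag(S.sum(axis=1)) - S
    return float(np.trace(V.T @ L @ V))

def combined_similarity(theta, S_list):
    """Weighted sum of the per-meta-path similarity matrices."""
    return sum(t * S for t, S in zip(theta, S_list))

rng = np.random.default_rng(0)
m, k = 5, 3                      # 5 items, 3 latent factors (toy sizes)
V = rng.random((m, k))
S_list = []
for _ in range(2):               # two meta-paths, random symmetric similarities
    B = rng.random((m, m))
    S_list.append((B + B.T) / 2)
theta = np.array([0.7, 0.3])
S = combined_similarity(theta, S_list)
print(pairwise_penalty(V, S), laplacian_penalty(V, S))
```

The trace form avoids the explicit double loop over item pairs, which matters once the number of items is large.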
Parameter Estimation
Step 1
Step 2
Step 3
Iteratively update U, V and θ until convergence
Roadmap
• Why Study CF in HIN?
• Background and Preliminaries
• Proposed Method
• Experiments
• Conclusion and Future Work
Dataset
• We combine IMDb + MovieLens100K
• We randomly sample training datasets of different sizes (0.4, 0.6, and 0.8)
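One way to produce such splits is sketched below, assuming the sizes are fractions of the observed ratings used for training and that zero entries mean "no rating". Both are assumptions, as are the function name `split_ratings` and the toy matrix.

```python
import numpy as np

def split_ratings(R, train_fraction, seed=0):
    """Randomly assign observed ratings (nonzero entries) to train or test masks."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(R)
    order = rng.permutation(len(rows))
    n_train = int(round(train_fraction * len(rows)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    train_mask = np.zeros(R.shape, dtype=bool)
    test_mask = np.zeros(R.shape, dtype=bool)
    train_mask[rows[train_idx], cols[train_idx]] = True
    test_mask[rows[test_idx], cols[test_idx]] = True
    return train_mask, test_mask

# Toy rating matrix with 10 observed ratings (hypothetical data).
R = np.array([[5, 0, 3, 0, 1],
              [0, 4, 0, 2, 0],
              [1, 0, 0, 5, 4],
              [0, 2, 3, 0, 0]], dtype=float)
for fraction in (0.4, 0.6, 0.8):
    train_mask, test_mask = split_ratings(R, fraction)
    print(fraction, int(train_mask.sum()), int(test_mask.sum()))
```

The two masks are disjoint and together cover every observed rating, so each rating is used either for fitting or for evaluation.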
Comparison Methods
We use Hete-MF to represent the proposed method.
Evaluation Metrics
• We use Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) to evaluate performance.
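Both metrics compare predicted ratings with held-out true ratings. A minimal sketch follows, with hypothetical function names and made-up ratings.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the average squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

true_ratings = [4, 3, 5, 1]          # made-up held-out ratings
predicted = [3.5, 3, 4, 2]           # made-up predictions
print(mae(true_ratings, predicted))   # 0.625
print(rmse(true_ratings, predicted))  # 0.75
```

RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same predictions.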
Performance Comparison
Performance Analysis
Convergence Rate
Roadmap
• Why Study CF in HIN?
• Background and Preliminaries
• Proposed Method
• Experiments
• Conclusion and Future Work
Conclusions
• We study CF in HIN.
• We combine rating data with meta-path-based similarity matrices.
• We compared the proposed approaches with several widely employed or state-of-the-art recommendation techniques.
• We analyzed the performance of these methods under different scenarios.
Future Work
• Adding user and/or item rating priors to the proposed method to alleviate the cold-start problem
• Personalized recommendation models
• On-line version of the method to incorporate newly generated ratings
Thank You!!