Dual Similarity Learning for Heterogeneous One-Class Collaborative Filtering
Xiancong Chen^{a,b,c}, Weike Pan^{a,b,c,∗}, Zhong Ming^{a,b,c,∗}
[email protected], {panweike,mingz}@szu.edu.cn
a National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Chen, Pan and Ming (SZU) DSLM IEEE BigComp 2020 1 / 22
Introduction
Problem Definition
Problem: In this paper, we study the heterogeneous one-class collaborative filtering (HOCCF) problem.
Input: For a user u ∈ U, we have a set of purchased items I_u^P and a set of examined items I_u^E.
Goal: Our goal is to exploit these two types of one-class feedback and recommend a ranked list of items to each user u.
Challenges
1. The sparsity of the target feedback (i.e., purchases).
2. The ambiguity and noise of the auxiliary feedback (i.e., examinations).
Overview of Our Solution
(Figure: dual similarity between the target item i and items i′, j, and between the target user u and users u′, w.)

Dual Similarity Learning Model (DSLM):
Learn the similarity s_{i′i} between a target item i and a purchased item i′, and the similarity s_{ji} between a target item i and an examined item j.
Learn the similarity s_{u′u} between a target user u and a user u′ who purchased item i, and the similarity s_{wu} between a target user u and a user w who examined item i.
Advantages of Our Solution
By introducing the auxiliary feedback (i.e., examinations), DSLM is able to alleviate the sparsity problem to some extent.
DSLM learns not only the similarity among items, but also the similarity among users, which is useful to capture the correlations between users and items.
DSLM strikes a good balance between the item-based similarity and the user-based similarity.
Notations
Notation                                Explanation
n                                       number of users
m                                       number of items
u, u′, w ∈ {1, 2, …, n}                 user IDs
i, i′, j ∈ {1, 2, …, m}                 item IDs
U = {u}, |U| = n                        the whole set of users
I = {i}, |I| = m                        the whole set of items
R^P = {(u, i)}                          the whole set of purchases
R^E = {(u, i)}                          the whole set of examinations
R^A = {(u, i)}                          the set of absent pairs
I_u^P = {i | (u, i) ∈ R^P}              the set of purchased items w.r.t. u
I_u^E = {i | (u, i) ∈ R^E}              the set of examined items w.r.t. u
U_i^P = {u | (u, i) ∈ R^P}              the set of users who have purchased item i
U_i^E = {u | (u, i) ∈ R^E}              the set of users who have examined item i
U_{u·}, P_{u′·}, E_{w·} ∈ R^{1×d}       users' latent vectors
V_{i·}, P_{i′·}, E_{j·} ∈ R^{1×d}       items' latent vectors
b_u, b_i ∈ R                            user bias and item bias
d                                       number of latent features
r̂_{ui}                                  predicted preference of user u on item i
ρ                                       sampling parameter
γ                                       learning rate
T, L, L0                                iteration numbers
λ∗, α∗, β∗                              tradeoff parameters
Background
Factored Item Similarity Model (FISM)
In FISM [Kabbur et al., 2013], we can estimate the preference of user u towards item i by aggregating the similarity between item i and all the items purchased by user u (i.e., I_u^P \ {i}),

\hat{r}_{ui} = \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} \sum_{i' \in I_u^P \setminus \{i\}} P_{i'\cdot} V_{i\cdot}^T,  (1)

where we can regard the term \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} \sum_{i' \in I_u^P \setminus \{i\}} P_{i'\cdot} as a certain virtual user profile w.r.t. the target feedback, denoting the distinct preference of user u.
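As a concrete illustration, Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' released code; the matrices `P`, `V` and the helper name `fism_score` are our own illustrative choices:

```python
import numpy as np

def fism_score(P, V, purchased, i):
    """FISM preference of a user for item i, following Eq. (1).

    P: (m, d) latent factors of items in their 'purchased neighbor' role.
    V: (m, d) latent factors of items in their 'target' role.
    purchased: set of item indices purchased by the user (I_u^P).
    """
    neighbors = [k for k in purchased if k != i]  # I_u^P \ {i}
    if not neighbors:
        return 0.0
    # the virtual user profile: normalized sum of purchased items' factors
    profile = P[neighbors].sum(axis=0) / np.sqrt(len(neighbors))
    return float(profile @ V[i])

rng = np.random.default_rng(0)
m, d = 6, 4
P = rng.normal(size=(m, d))
V = rng.normal(size=(m, d))
score = fism_score(P, V, purchased={0, 2, 5}, i=2)
```

Excluding item i itself from the aggregation avoids trivially predicting an item from its own factor.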
Transfer via Joint Similarity Learning (TJSL)
TJSL [Pan et al., 2016] introduces a new similarity term in a similar way to that of FISM, through which the knowledge from the auxiliary feedback (i.e., examinations) can be transferred. Then, the preference estimation of user u towards item i becomes as follows,

\sum_{i' \in I_u^P \setminus \{i\}} s_{i'i} + \sum_{j \in I_u^{E(\ell)}} s_{ji}, \quad I_u^{E(\ell)} \subseteq I_u^E,  (2)

where \sum_{j \in I_u^{E(\ell)}} s_{ji} = \frac{1}{\sqrt{|I_u^{E(\ell)}|}} \sum_{j \in I_u^{E(\ell)}} E_{j\cdot} V_{i\cdot}^T, and the term \frac{1}{\sqrt{|I_u^{E(\ell)}|}} \sum_{j \in I_u^{E(\ell)}} E_{j\cdot} can be regarded as a virtual user profile w.r.t. the auxiliary feedback.
Method
Dual Similarity Learning Model (DSLM)
In TJSL [Pan et al., 2016], two similarities among items are learned. Symmetrically, we define the similarity among users as follows,

\sum_{u' \in U_i^P \setminus \{u\}} s_{u'u} + \sum_{w \in U_i^{E(\ell)}} s_{wu}, \quad U_i^{E(\ell)} \subseteq U_i^E,  (3)

where \sum_{u' \in U_i^P \setminus \{u\}} s_{u'u} = \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} \sum_{u' \in U_i^P \setminus \{u\}} P_{u'\cdot} U_{u\cdot}^T and \sum_{w \in U_i^{E(\ell)}} s_{wu} = \frac{1}{\sqrt{|U_i^{E(\ell)}|}} \sum_{w \in U_i^{E(\ell)}} E_{w\cdot} U_{u\cdot}^T. Intuitively, we can also regard the terms \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} \sum_{u' \in U_i^P \setminus \{u\}} P_{u'\cdot} and \frac{1}{\sqrt{|U_i^{E(\ell)}|}} \sum_{w \in U_i^{E(\ell)}} E_{w\cdot} as the virtual item profiles w.r.t. the target feedback and the auxiliary feedback, respectively.
Prediction Rule
The predicted preference of user u on item i,

\hat{r}_{ui}^{(\ell)} = \sum_{i' \in I_u^P \setminus \{i\}} s_{i'i} + \sum_{j \in I_u^{E(\ell)}} s_{ji} + \sum_{u' \in U_i^P \setminus \{u\}} s_{u'u} + \sum_{w \in U_i^{E(\ell)}} s_{wu} + b_u + b_i, \quad I_u^{E(\ell)} \subseteq I_u^E, \ U_i^{E(\ell)} \subseteq U_i^E,  (4)

where I_u^{E(\ell)} is the set of likely-to-prefer items selected from I_u^E, and U_i^{E(\ell)} is the set of potential users, selected from U_i^E, that are likely to purchase item i.
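A minimal sketch of the prediction rule with random toy factors follows; the helper names (`norm_sum`, `dslm_score`) and all matrix names are illustrative assumptions, not identifiers from the released code:

```python
import numpy as np

def norm_sum(M, idx):
    """1/sqrt(|S|) * sum_{k in S} M[k]; the zero vector if S is empty."""
    idx = list(idx)
    if not idx:
        return np.zeros(M.shape[1])
    return M[idx].sum(axis=0) / np.sqrt(len(idx))

def dslm_score(U, V, P_item, E_item, P_user, E_user, b_u, b_i,
               Iu_P, Iu_E, Ui_P, Ui_E, u, i):
    """DSLM preference: item-side + user-side similarity sums + biases."""
    item_side = float((norm_sum(P_item, Iu_P - {i})
                       + norm_sum(E_item, Iu_E)) @ V[i])
    user_side = float((norm_sum(P_user, Ui_P - {u})
                       + norm_sum(E_user, Ui_E)) @ U[u])
    return item_side + user_side + b_u[u] + b_i[i]

rng = np.random.default_rng(1)
n, m, d = 4, 5, 3
U, V = rng.normal(size=(n, d)), rng.normal(size=(m, d))
P_item, E_item = rng.normal(size=(m, d)), rng.normal(size=(m, d))
P_user, E_user = rng.normal(size=(n, d)), rng.normal(size=(n, d))
b_u, b_i = np.zeros(n), np.zeros(m)
r = dslm_score(U, V, P_item, E_item, P_user, E_user, b_u, b_i,
               Iu_P={0, 2}, Iu_E={3}, Ui_P={1}, Ui_E={2, 3}, u=0, i=2)
```

The item-side and user-side terms share the same normalized-sum form, which is the symmetry DSLM exploits.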
Objective Function
The objective function of DSLM is as follows,

\min_{\Theta^{(\ell)},\, I_u^{E(\ell)} \subseteq I_u^E,\, U_i^{E(\ell)} \subseteq U_i^E} \sum_{(u,i) \in R^P \cup R^A} f_{ui}^{(\ell)},  (5)

where

f_{ui}^{(\ell)} = \frac{1}{2}\left(r_{ui} - \hat{r}_{ui}^{(\ell)}\right)^2 + \frac{\lambda_u}{2}\|U_{u\cdot}\|_F^2 + \frac{\lambda_p}{2}\sum_{u' \in U_i^P \setminus \{u\}}\|P_{u'\cdot}\|_F^2 + \frac{\lambda_e}{2}\sum_{w \in U_i^{E(\ell)}}\|E_{w\cdot}\|_F^2 + \frac{\beta_u}{2}b_u^2 + \frac{\alpha_v}{2}\|V_{i\cdot}\|_F^2 + \frac{\alpha_p}{2}\sum_{i' \in I_u^P \setminus \{i\}}\|P_{i'\cdot}\|_F^2 + \frac{\alpha_e}{2}\sum_{j \in I_u^{E(\ell)}}\|E_{j\cdot}\|_F^2 + \frac{\beta_v}{2}b_i^2,

and the model parameters are Θ^{(ℓ)} = {U_{u·}, P_{u′·}, E_{w·}, V_{i·}, P_{i′·}, E_{j·}, b_u, b_i}. Note that R^A is a set of negative feedback used to complement the target feedback, where r_{ui} = 1 if (u, i) ∈ R^P and r_{ui} = 0 otherwise.
Gradients (1/2)
To learn the parameters Θ^{(ℓ)}, we use the stochastic gradient descent (SGD) algorithm and have the gradients of the model parameters for a randomly sampled pair (u, i) ∈ R^P ∪ R^A,
\nabla U_{u\cdot} = -e_{ui} \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} \sum_{u' \in U_i^P \setminus \{u\}} P_{u'\cdot} - e_{ui} \frac{1}{\sqrt{|U_i^{E(\ell)}|}} \sum_{w \in U_i^{E(\ell)}} E_{w\cdot} + \lambda_u U_{u\cdot},  (6)

\nabla V_{i\cdot} = -e_{ui} \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} \sum_{i' \in I_u^P \setminus \{i\}} P_{i'\cdot} - e_{ui} \frac{1}{\sqrt{|I_u^{E(\ell)}|}} \sum_{j \in I_u^{E(\ell)}} E_{j\cdot} + \alpha_v V_{i\cdot},  (7)

\nabla P_{u'\cdot} = -e_{ui} \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} U_{u\cdot} + \lambda_p P_{u'\cdot}, \quad u' \in U_i^P \setminus \{u\},  (8)
Gradients (2/2)
\nabla E_{w\cdot} = -e_{ui} \frac{1}{\sqrt{|U_i^{E(\ell)}|}} U_{u\cdot} + \lambda_e E_{w\cdot}, \quad w \in U_i^{E(\ell)},  (9)

\nabla P_{i'\cdot} = -e_{ui} \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} V_{i\cdot} + \alpha_p P_{i'\cdot}, \quad i' \in I_u^P \setminus \{i\},  (10)

\nabla E_{j\cdot} = -e_{ui} \frac{1}{\sqrt{|I_u^{E(\ell)}|}} V_{i\cdot} + \alpha_e E_{j\cdot}, \quad j \in I_u^{E(\ell)},  (11)

\nabla b_u = -e_{ui} + \beta_u b_u, \quad \nabla b_i = -e_{ui} + \beta_v b_i,  (12)

where e_{ui} = r_{ui} - \hat{r}_{ui}^{(\ell)} is the difference between the true preference and the predicted preference.
Update Rules
We have the update rules,
θ^{(ℓ)} ← θ^{(ℓ)} − γ ∇θ^{(ℓ)},  (13)

where γ is the learning rate, and θ^{(ℓ)} ∈ Θ^{(ℓ)} is a model parameter to be learned.
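As a sanity check, the gradient form of Eq. (6) can be verified against a central finite-difference approximation on a one-pair toy loss, followed by one SGD step as in Eq. (13). The sizes, seed, and the constant standing in for the item-side terms are arbitrary assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
d, lam_u, r_true, gamma = 4, 0.01, 1.0, 0.01
Uu = rng.normal(size=d)
# user-side virtual item profile (cf. Eq. (3)): normalized sums of the
# factors of purchasers (U_i^P \ {u}) and examiners (U_i^{E(l)}) of item i
Pu = rng.normal(size=(3, d))
Ew = rng.normal(size=(2, d))
profile = Pu.sum(axis=0) / np.sqrt(3) + Ew.sum(axis=0) / np.sqrt(2)
const = 0.3  # stands in for the item-side similarities and biases

def loss(x):
    e = r_true - (const + profile @ x)
    return 0.5 * e ** 2 + 0.5 * lam_u * x @ x

e_ui = r_true - (const + profile @ Uu)
grad_analytic = -e_ui * profile + lam_u * Uu  # the form of Eq. (6)

# central finite differences along each coordinate
eps = 1e-6
grad_numeric = np.array([
    (loss(Uu + eps * np.eye(d)[k]) - loss(Uu - eps * np.eye(d)[k])) / (2 * eps)
    for k in range(d)
])

Uu_next = Uu - gamma * grad_analytic  # one SGD step, Eq. (13)
```

The analytic and numeric gradients agree, and the small SGD step decreases the one-pair loss.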
Identification of I_u^{E(ℓ)} and U_i^{E(ℓ)}
Note that we identify U_i^{E(ℓ)} and I_u^{E(ℓ)} in the following way:
For each user u ∈ U_i^E, we estimate the preference for the target item i, i.e., \hat{r}_{ui}^{(\ell)}, and take the τ|U_i^E| (τ ∈ (0, 1]) users with the highest scores as the potential users that are likely to purchase the target item i.
Similarly, for each item j ∈ I_u^E, we estimate the preference \hat{r}_{uj}^{(\ell)} and take the τ|I_u^E| items with the highest scores as the candidate items.

Finally, we save the model and data of the last L0 epochs. The estimated preference is the average value of \hat{r}_{ui}^{(\ell)}, where ℓ ranges from L − L0 + 1 to L.
Algorithm of DSLM
1: Input: R^P, R^E, T, L, L0, ρ, γ, λ∗, α∗, β∗
2: Output: U^{E(ℓ)}, I^{E(ℓ)} and Θ^{(ℓ)}, ℓ = L − L0 + 1, …, L
3: Let U^{E(1)} = U^E, I^{E(1)} = I^E, τ = 1
4: for ℓ = 1, …, L do
5:   Initialize the model Θ^{(ℓ)}
6:   for t = 1, …, T do
7:     Randomly sample R^A ⊂ R \ R^P with |R^A| = ρ|R^P|
8:     for t2 = 1, …, |R^P ∪ R^A| do
9:       Randomly pick up (u, i) ∈ R^P ∪ R^A
10:      Calculate \hat{r}_{ui}^{(\ell)} via (4)
11:      Calculate ∇θ, θ ∈ Θ^{(ℓ)} via (6)−(12)
12:      Update θ, θ ∈ Θ^{(ℓ)} via (13)
13:    end for
14:  end for
15:  if ℓ > L − L0 then
16:    Save the current model and data (Θ^{(ℓ)}, U^{E(ℓ)}, I^{E(ℓ)})
17:  end if
18:  if L > 1 and L > ℓ then
19:    τ ← τ × 0.9
20:    Select I_u^{E(ℓ+1)} with |I_u^{E(ℓ+1)}| = τ|I_u^E| for each u
21:    Select U_i^{E(ℓ+1)} with |U_i^{E(ℓ+1)}| = τ|U_i^E| for each i
22:  end if
23: end for
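The negative sampling in line 7 (|R^A| = ρ|R^P|) can be sketched as below; the function name and the rejection-sampling strategy are our own illustrative choices, not the authors' implementation:

```python
import random

def sample_absent(n_users, n_items, purchases, rho, seed=0):
    """Sample rho * |R^P| (user, item) pairs outside R^P, i.e. the set R^A."""
    rng = random.Random(seed)
    target = int(rho * len(purchases))
    absent = set()
    while len(absent) < target:
        pair = (rng.randrange(n_users), rng.randrange(n_items))
        if pair not in purchases:  # reject observed purchases
            absent.add(pair)
    return absent

R_A = sample_absent(n_users=5, n_items=5, purchases={(0, 0), (1, 1)}, rho=3)
```

Rejection sampling is adequate here because purchases are extremely sparse relative to the full user-item grid, so almost every draw is accepted.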
Experiments
Datasets and Evaluation Metrics
For direct comparison, we use the three datasets from TJSL [Pan et al., 2016], including ML100K, ML1M and Alibaba2015.
Table: Description of the datasets used in the experiments.
Dataset       |U|    |I|    |R^P|   |R^E|    |R^P_te|
ML100K        943    1682   9438    45285    2153
ML1M          6040   3952   90848   400083   45075
Alibaba2015   7475   5275   9290    62659    2322
For performance evaluation, we adopt two commonly used ranking-oriented metrics, i.e., Precision@5 and NDCG@5.
The data and code are available at http://csse.szu.edu.cn/staff/panwk/publications/DSLM/
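For a single user, the two metrics can be computed as follows. This is the standard binary-relevance formulation, not necessarily the authors' evaluation script:

```python
import numpy as np

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def ndcg_at_k(ranked, relevant, k=5):
    """DCG of the top-k list divided by the best achievable DCG."""
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(pos + 2)
               for pos in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

p5 = precision_at_k([1, 2, 3, 4, 5], {1, 3})
n5 = ndcg_at_k([1, 2, 3, 4, 5], {1, 3})
```

Unlike Precision@5, NDCG@5 rewards placing the relevant items closer to the top of the list, which is why both are reported.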
Baselines and Parameter Configurations
For comparative studies, we include the following competitive methods.
BPR (Bayesian personalized ranking) [Rendle et al., 2009]
FISM (factored item similarity model) [Kabbur et al., 2013]
TJSL (transfer via joint similarity learning) [Pan et al., 2016]
RBPR (role-based Bayesian personalized ranking) [Peng et al., 2016]
For parameter configurations, we fix the number of latent dimensions d = 20, the learning rate γ = 0.01 and the sampling parameter ρ = 3, and search the tradeoff parameters in {0.001, 0.01, 0.1} and the best iteration number T in {100, 500, 1000} via the NDCG@5 performance.
Main Results
Dataset BPR FISM TJSL RBPR DSLM
ML100K Precision@5 0.0552± 0.0006 0.0628± 0.0015 0.0697± 0.0016 0.0654± 0.0013 0.0694± 0.0014
NDCG@5 0.0874± 0.0020 0.1029± 0.0017 0.1133± 0.0047 0.1058± 0.0047 0.1140± 0.0022
ML1M Precision@5 0.0928± 0.0008 0.0971± 0.0013 0.1012± 0.0011 0.1086± 0.0009 0.1107± 0.0015
NDCG@5 0.1121± 0.0010 0.1189± 0.0008 0.1248± 0.0010 0.1327± 0.0016 0.1353± 0.0012
Alibaba2015 Precision@5 0.0050± 0.0006 0.0046± 0.0003 0.0071± 0.0004 0.0076± 0.0005 0.0087± 0.0006
NDCG@5 0.0138± 0.0017 0.0126± 0.0009 0.0200± 0.0008 0.0220± 0.0013 0.0269± 0.0017
Observations:
In all cases, TJSL, RBPR and DSLM perform significantly better than BPR and FISM, which shows the effectiveness of introducing the auxiliary feedback to assist the task of learning users' preferences; and
DSLM performs better than TJSL and RBPR in most cases, e.g., it is the best on ML1M and Alibaba2015, and is comparable with TJSL on ML100K, which clearly shows the usefulness of the dual similarity in capturing the correlations between users and items.
Conclusions and Future Work
Conclusion
We propose a novel solution, i.e., dual similarity learning model(DSLM), for a recent and important recommendation problemcalled heterogeneous one-class collaborative filtering (HOCCF).
In particular, we jointly learn the dual similarity among both usersand items so as to exploit the complementarity well. Extensiveempirical studies on three public datasets clearly show theeffectiveness of our solution.
Future Work
For future work, we are interested in further improving our DSLMby exploring pairwise ranking models [Yu et al., 2018] and deeplearning models [Shi et al., 2019].
Thank you!
Weike Pan and Zhong Ming are the corresponding authors of this work. We thank the anonymous reviewers for their expert and constructive comments and suggestions, and gratefully acknowledge the support of the National Natural Science Foundation of China (Nos. 61872249, 61836005 and 61672358).
References
Kabbur, S., Ning, X., and Karypis, G. (2013). FISM: Factored item similarity models for top-N recommender systems. KDD '13, pages 659–667.
Pan, W., Liu, M., and Ming, Z. (2016). Transfer learning for heterogeneous one-class collaborative filtering. IEEE Intelligent Systems, 31(4):43–49.
Peng, X., Chen, Y., Duan, Y., Pan, W., and Ming, Z. (2016). RBPR: Role-based Bayesian personalized ranking for heterogeneous one-class collaborative filtering. IFUP '16.
Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. UAI '09, pages 452–461.
Shi, C., Han, X., Li, S., Wang, X., Wang, S., Du, J., and Yu, P. (2019). Deep collaborative filtering with multi-aspect information in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering.
Yu, R., Zhang, Y., Ye, Y., Wu, L., Wang, C., Liu, Q., and Chen, E. (2018). Multiple pairwise ranking with implicit feedback. CIKM '18, pages 1727–1730.