Dual Similarity Learning for Heterogeneous One-Class Collaborative Filtering
Xiancong Chen^{a,b,c}, Weike Pan^{a,b,c,∗}, Zhong Ming^{a,b,c,∗}
[email protected], {panweike,mingz}@szu.edu.cn
a National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Chen, Pan and Ming (SZU) DSLM IEEE BigComp 2020 1 / 22
Introduction
Problem Definition
Problem: In this paper, we study the heterogeneous one-class collaborative filtering (HOCCF) problem.
Input: For a user u ∈ U, we have a set of purchased items I_u^P and a set of examined items I_u^E.
Goal: Our goal is to exploit these two types of one-class feedback and recommend a ranked list of items to each user u.
Challenges
1. The sparsity of the target feedback (i.e., purchases).
2. The ambiguity and noise of the auxiliary feedback (i.e., examinations).
Overview of Our Solution
(Figure: dual similarity between the target item i and items i′, j, and between the target user u and users u′, w.)

Dual Similarity Learning Model (DSLM):
Learn the similarity s_{i′i} between a target item i and a purchased item i′, and the similarity s_{ji} between a target item i and an examined item j.
Learn the similarity s_{u′u} between a target user u and a user u′ who purchased item i, and the similarity s_{wu} between a target user u and a user w who examined item i.
Advantages of Our Solution
By introducing the auxiliary feedback (i.e., examinations), DSLM is able to alleviate the sparsity problem to some extent.
DSLM learns not only the similarity among items, but also the similarity among users, which is useful to capture the correlations between users and items.
DSLM strikes a good balance between the item-based similarity and the user-based similarity.
Notations
Notation                                Explanation
n                                       number of users
m                                       number of items
u, u′, w ∈ {1, 2, …, n}                 user IDs
i, i′, j ∈ {1, 2, …, m}                 item IDs
U = {u}, |U| = n                        the whole set of users
I = {i}, |I| = m                        the whole set of items
R^P = {(u, i)}                          the whole set of purchases
R^E = {(u, i)}                          the whole set of examinations
R^A = {(u, i)}                          the set of absent pairs
I_u^P = {i | (u, i) ∈ R^P}              the set of purchased items w.r.t. u
I_u^E = {i | (u, i) ∈ R^E}              the set of examined items w.r.t. u
U_i^P = {u | (u, i) ∈ R^P}              the set of users who have purchased item i
U_i^E = {u | (u, i) ∈ R^E}              the set of users who have examined item i
U_{u·}, P_{u′·}, E_{w·} ∈ R^{1×d}       users' latent vectors
V_{i·}, P_{i′·}, E_{j·} ∈ R^{1×d}       items' latent vectors
b_u, b_i ∈ R                            user bias and item bias
d                                       number of latent features
r̂_{ui}                                  predicted preference of user u on item i
ρ                                       sampling parameter
γ                                       learning rate
T, L, L0                                iteration numbers
λ∗, α∗, β∗                              tradeoff parameters
Background
Factored Item Similarity Model (FISM)
In FISM [Kabbur et al., 2013], we can estimate the preference of user u towards item i by aggregating the similarity between item i and all the items purchased by user u (i.e., I_u^P \ {i}),

\hat{r}_{ui} = \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} \sum_{i' \in I_u^P \setminus \{i\}} P_{i'\cdot} V_{i\cdot}^T,  (1)

where we can regard the term \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} \sum_{i' \in I_u^P \setminus \{i\}} P_{i'\cdot} as a certain virtual user profile w.r.t. the target feedback, denoting the distinct preference of user u.
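As a concrete illustration, Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' released code; the matrices `P`, `V` and the helper name `fism_score` are our own illustrative choices:

```python
import numpy as np

def fism_score(P, V, purchased, i):
    """FISM preference of a user for item i, following Eq. (1).

    P: (m, d) latent factors of items in their 'purchased neighbor' role.
    V: (m, d) latent factors of items in their 'target' role.
    purchased: set of item indices purchased by the user (I_u^P).
    """
    neighbors = [k for k in purchased if k != i]  # I_u^P \ {i}
    if not neighbors:
        return 0.0
    # the virtual user profile: normalized sum of purchased items' factors
    profile = P[neighbors].sum(axis=0) / np.sqrt(len(neighbors))
    return float(profile @ V[i])

rng = np.random.default_rng(0)
m, d = 6, 4
P = rng.normal(size=(m, d))
V = rng.normal(size=(m, d))
score = fism_score(P, V, purchased={0, 2, 5}, i=2)
```

Excluding item i itself from the aggregation avoids trivially predicting an item from its own factor.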
Transfer via Joint Similarity Learning (TJSL)
TJSL [Pan et al., 2016] introduces a new similarity term in a similar way to that of FISM, through which the knowledge from the auxiliary feedback (i.e., examinations) can be transferred. Then, the preference estimation of user u towards item i becomes as follows,

\sum_{i' \in I_u^P \setminus \{i\}} s_{i'i} + \sum_{j \in I_u^{E(\ell)}} s_{ji}, \quad I_u^{E(\ell)} \subseteq I_u^E,  (2)

where \sum_{j \in I_u^{E(\ell)}} s_{ji} = \frac{1}{\sqrt{|I_u^{E(\ell)}|}} \sum_{j \in I_u^{E(\ell)}} E_{j\cdot} V_{i\cdot}^T, and the term \frac{1}{\sqrt{|I_u^{E(\ell)}|}} \sum_{j \in I_u^{E(\ell)}} E_{j\cdot} can be regarded as a virtual user profile w.r.t. the auxiliary feedback.
Method
Dual Similarity Learning Model (DSLM)
In TJSL [Pan et al., 2016], two similarities among items are learned. Symmetrically, we define the similarity among users as follows,

\sum_{u' \in U_i^P \setminus \{u\}} s_{u'u} + \sum_{w \in U_i^{E(\ell)}} s_{wu}, \quad U_i^{E(\ell)} \subseteq U_i^E,  (3)

where \sum_{u' \in U_i^P \setminus \{u\}} s_{u'u} = \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} \sum_{u' \in U_i^P \setminus \{u\}} P_{u'\cdot} U_{u\cdot}^T and \sum_{w \in U_i^{E(\ell)}} s_{wu} = \frac{1}{\sqrt{|U_i^{E(\ell)}|}} \sum_{w \in U_i^{E(\ell)}} E_{w\cdot} U_{u\cdot}^T. Intuitively, we can also regard the terms \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} \sum_{u' \in U_i^P \setminus \{u\}} P_{u'\cdot} and \frac{1}{\sqrt{|U_i^{E(\ell)}|}} \sum_{w \in U_i^{E(\ell)}} E_{w\cdot} as the virtual item profiles w.r.t. the target feedback and the auxiliary feedback, respectively.
Prediction Rule
The predicted preference of user u on item i,

\hat{r}_{ui}^{(\ell)} = \sum_{i' \in I_u^P \setminus \{i\}} s_{i'i} + \sum_{j \in I_u^{E(\ell)}} s_{ji} + \sum_{u' \in U_i^P \setminus \{u\}} s_{u'u} + \sum_{w \in U_i^{E(\ell)}} s_{wu} + b_u + b_i, \quad I_u^{E(\ell)} \subseteq I_u^E, \ U_i^{E(\ell)} \subseteq U_i^E,  (4)

where I_u^{E(\ell)} is the set of likely-to-prefer items selected from I_u^E, and U_i^{E(\ell)} is the set of potential users, selected from U_i^E, that are likely to purchase item i.
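A minimal sketch of the prediction rule with random toy factors follows; the helper names (`norm_sum`, `dslm_score`) and all matrix names are illustrative assumptions, not identifiers from the released code:

```python
import numpy as np

def norm_sum(M, idx):
    """1/sqrt(|S|) * sum_{k in S} M[k]; the zero vector if S is empty."""
    idx = list(idx)
    if not idx:
        return np.zeros(M.shape[1])
    return M[idx].sum(axis=0) / np.sqrt(len(idx))

def dslm_score(U, V, P_item, E_item, P_user, E_user, b_u, b_i,
               Iu_P, Iu_E, Ui_P, Ui_E, u, i):
    """DSLM preference: item-side + user-side similarity sums + biases."""
    item_side = float((norm_sum(P_item, Iu_P - {i})
                       + norm_sum(E_item, Iu_E)) @ V[i])
    user_side = float((norm_sum(P_user, Ui_P - {u})
                       + norm_sum(E_user, Ui_E)) @ U[u])
    return item_side + user_side + b_u[u] + b_i[i]

rng = np.random.default_rng(1)
n, m, d = 4, 5, 3
U, V = rng.normal(size=(n, d)), rng.normal(size=(m, d))
P_item, E_item = rng.normal(size=(m, d)), rng.normal(size=(m, d))
P_user, E_user = rng.normal(size=(n, d)), rng.normal(size=(n, d))
b_u, b_i = np.zeros(n), np.zeros(m)
r = dslm_score(U, V, P_item, E_item, P_user, E_user, b_u, b_i,
               Iu_P={0, 2}, Iu_E={3}, Ui_P={1}, Ui_E={2, 3}, u=0, i=2)
```

The item-side and user-side terms share the same normalized-sum form, which is the symmetry DSLM exploits.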
Objective Function
The objective function of DSLM is as follows,

\min_{\Theta^{(\ell)},\, I_u^{E(\ell)} \subseteq I_u^E,\, U_i^{E(\ell)} \subseteq U_i^E} \sum_{(u,i) \in R^P \cup R^A} f_{ui}^{(\ell)},  (5)

where

f_{ui}^{(\ell)} = \frac{1}{2}\left(r_{ui} - \hat{r}_{ui}^{(\ell)}\right)^2 + \frac{\lambda_u}{2}\|U_{u\cdot}\|_F^2 + \frac{\lambda_p}{2}\sum_{u' \in U_i^P \setminus \{u\}}\|P_{u'\cdot}\|_F^2 + \frac{\lambda_e}{2}\sum_{w \in U_i^{E(\ell)}}\|E_{w\cdot}\|_F^2 + \frac{\beta_u}{2}b_u^2 + \frac{\alpha_v}{2}\|V_{i\cdot}\|_F^2 + \frac{\alpha_p}{2}\sum_{i' \in I_u^P \setminus \{i\}}\|P_{i'\cdot}\|_F^2 + \frac{\alpha_e}{2}\sum_{j \in I_u^{E(\ell)}}\|E_{j\cdot}\|_F^2 + \frac{\beta_v}{2}b_i^2,

and the model parameters are Θ^{(ℓ)} = {U_{u·}, P_{u′·}, E_{w·}, V_{i·}, P_{i′·}, E_{j·}, b_u, b_i}. Note that R^A is a set of negative feedback used to complement the target feedback, where r_{ui} = 1 if (u, i) ∈ R^P and r_{ui} = 0 otherwise.
Gradients (1/2)
To learn the parameters Θ^{(ℓ)}, we use the stochastic gradient descent (SGD) algorithm and have the gradients of the model parameters for a randomly sampled pair (u, i) ∈ R^P ∪ R^A,
\nabla U_{u\cdot} = -e_{ui} \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} \sum_{u' \in U_i^P \setminus \{u\}} P_{u'\cdot} - e_{ui} \frac{1}{\sqrt{|U_i^{E(\ell)}|}} \sum_{w \in U_i^{E(\ell)}} E_{w\cdot} + \lambda_u U_{u\cdot},  (6)

\nabla V_{i\cdot} = -e_{ui} \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} \sum_{i' \in I_u^P \setminus \{i\}} P_{i'\cdot} - e_{ui} \frac{1}{\sqrt{|I_u^{E(\ell)}|}} \sum_{j \in I_u^{E(\ell)}} E_{j\cdot} + \alpha_v V_{i\cdot},  (7)

\nabla P_{u'\cdot} = -e_{ui} \frac{1}{\sqrt{|U_i^P \setminus \{u\}|}} U_{u\cdot} + \lambda_p P_{u'\cdot}, \quad u' \in U_i^P \setminus \{u\},  (8)
Gradients (2/2)
\nabla E_{w\cdot} = -e_{ui} \frac{1}{\sqrt{|U_i^{E(\ell)}|}} U_{u\cdot} + \lambda_e E_{w\cdot}, \quad w \in U_i^{E(\ell)},  (9)

\nabla P_{i'\cdot} = -e_{ui} \frac{1}{\sqrt{|I_u^P \setminus \{i\}|}} V_{i\cdot} + \alpha_p P_{i'\cdot}, \quad i' \in I_u^P \setminus \{i\},  (10)

\nabla E_{j\cdot} = -e_{ui} \frac{1}{\sqrt{|I_u^{E(\ell)}|}} V_{i\cdot} + \alpha_e E_{j\cdot}, \quad j \in I_u^{E(\ell)},  (11)

\nabla b_u = -e_{ui} + \beta_u b_u, \quad \nabla b_i = -e_{ui} + \beta_v b_i,  (12)

where e_{ui} = r_{ui} - \hat{r}_{ui}^{(\ell)} is the difference between the true preference and the predicted preference.
Update Rules
We have the update rules,
θ^{(ℓ)} ← θ^{(ℓ)} − γ ∇θ^{(ℓ)},  (13)

where γ is the learning rate, and θ^{(ℓ)} ∈ Θ^{(ℓ)} is a model parameter to be learned.
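As a sanity check, the gradient form of Eq. (6) can be verified against a central finite-difference approximation on a one-pair toy loss, followed by one SGD step as in Eq. (13). The sizes, seed, and the constant standing in for the item-side terms are arbitrary assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
d, lam_u, r_true, gamma = 4, 0.01, 1.0, 0.01
Uu = rng.normal(size=d)
# user-side virtual item profile (cf. Eq. (3)): normalized sums of the
# factors of purchasers (U_i^P \ {u}) and examiners (U_i^{E(l)}) of item i
Pu = rng.normal(size=(3, d))
Ew = rng.normal(size=(2, d))
profile = Pu.sum(axis=0) / np.sqrt(3) + Ew.sum(axis=0) / np.sqrt(2)
const = 0.3  # stands in for the item-side similarities and biases

def loss(x):
    e = r_true - (const + profile @ x)
    return 0.5 * e ** 2 + 0.5 * lam_u * x @ x

e_ui = r_true - (const + profile @ Uu)
grad_analytic = -e_ui * profile + lam_u * Uu  # the form of Eq. (6)

# central finite differences along each coordinate
eps = 1e-6
grad_numeric = np.array([
    (loss(Uu + eps * np.eye(d)[k]) - loss(Uu - eps * np.eye(d)[k])) / (2 * eps)
    for k in range(d)
])

Uu_next = Uu - gamma * grad_analytic  # one SGD step, Eq. (13)
```

The analytic and numeric gradients agree, and the small SGD step decreases the one-pair loss.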
Identification of I_u^{E(ℓ)} and U_i^{E(ℓ)}
Note that we identify U_i^{E(ℓ)} and I_u^{E(ℓ)} in the following way:
For each user u ∈ U_i^E, we estimate the preference for the target item i, i.e., \hat{r}_{ui}^{(\ell)}, and take the τ|U_i^E| (τ ∈ (0, 1]) users with the highest scores as the potential users that are likely to purchase the target item i.
Similarly, for each item j ∈ I_u^E, we estimate the preference \hat{r}_{uj}^{(\ell)} and take the τ|I_u^E| items with the highest scores as the candidate items.

Finally, we save the model and data of the last L0 epochs. The estimated preference is the average value of \hat{r}_{ui}^{(\ell)}, where ℓ ranges from L − L0 + 1 to L.
Algorithm of DSLM
1: Input: R^P, R^E, T, L, L0, ρ, γ, λ∗, α∗, β∗
2: Output: U^{E(ℓ)}, I^{E(ℓ)} and Θ^{(ℓ)}, ℓ = L − L0 + 1, …, L
3: Let U^{E(1)} = U^E, I^{E(1)} = I^E, τ = 1
4: for ℓ = 1, …, L do
5:   Initialize the model Θ^{(ℓ)}
6:   for t = 1, …, T do
7:     Randomly sample R^A ⊂ R \ R^P with |R^A| = ρ|R^P|
8:     for t2 = 1, …, |R^P ∪ R^A| do
9:       Randomly pick up (u, i) ∈ R^P ∪ R^A
10:      Calculate \hat{r}_{ui}^{(\ell)} via (4)
11:      Calculate ∇θ, θ ∈ Θ^{(ℓ)} via (6)−(12)
12:      Update θ, θ ∈ Θ^{(ℓ)} via (13)
13:    end for
14:  end for
15:  if ℓ > L − L0 then
16:    Save the current model and data (Θ^{(ℓ)}, U^{E(ℓ)}, I^{E(ℓ)})
17:  end if
18:  if L > 1 and L > ℓ then
19:    τ ← τ × 0.9
20:    Select I_u^{E(ℓ+1)} with |I_u^{E(ℓ+1)}| = τ|I_u^E| for each u
21:    Select U_i^{E(ℓ+1)} with |U_i^{E(ℓ+1)}| = τ|U_i^E| for each i
22:  end if
23: end for
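The negative sampling in line 7 (|R^A| = ρ|R^P|) can be sketched as below; the function name and the rejection-sampling strategy are our own illustrative choices, not the authors' implementation:

```python
import random

def sample_absent(n_users, n_items, purchases, rho, seed=0):
    """Sample rho * |R^P| (user, item) pairs outside R^P, i.e. the set R^A."""
    rng = random.Random(seed)
    target = int(rho * len(purchases))
    absent = set()
    while len(absent) < target:
        pair = (rng.randrange(n_users), rng.randrange(n_items))
        if pair not in purchases:  # reject observed purchases
            absent.add(pair)
    return absent

R_A = sample_absent(n_users=5, n_items=5, purchases={(0, 0), (1, 1)}, rho=3)
```

Rejection sampling is adequate here because purchases are extremely sparse relative to the full user-item grid, so almost every draw is accepted.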
Experiments
Datasets and Evaluation Metrics
For direct comparison, we use the three datasets from TJSL [Pan et al., 2016], including ML100K, ML1M and Alibaba2015.
Table: Description of the datasets used in the experiments.
Dataset       |U|    |I|    |R^P|   |R^E|    |R^P_te|
ML100K        943    1682   9438    45285    2153
ML1M          6040   3952   90848   400083   45075
Alibaba2015   7475   5275   9290    62659    2322
For performance evaluation, we adopt two commonly used ranking-oriented metrics, i.e., Precision@5 and NDCG@5.
The data and code are available at http://csse.szu.edu.cn/staff/panwk/publications/DSLM/
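For a single user, the two metrics can be computed as follows. This is the standard binary-relevance formulation, not necessarily the authors' evaluation script:

```python
import numpy as np

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def ndcg_at_k(ranked, relevant, k=5):
    """DCG of the top-k list divided by the best achievable DCG."""
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(pos + 2)
               for pos in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

p5 = precision_at_k([1, 2, 3, 4, 5], {1, 3})
n5 = ndcg_at_k([1, 2, 3, 4, 5], {1, 3})
```

Unlike Precision@5, NDCG@5 rewards placing the relevant items closer to the top of the list, which is why both are reported.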
Baselines and Parameter Configurations
For comparative studies, we include the following competitive methods.
BPR (Bayesian personalized ranking) [Rendle et al., 2009]
FISM (factored item similarity model) [Kabbur et al., 2013]
TJSL (transfer via joint similarity learning) [Pan et al., 2016]
RBPR (role-based Bayesian personalized ranking) [Peng et al., 2016]
For parameter configurations, we fix the number of latent dimensions d = 20, the learning rate γ = 0.01 and the sampling parameter ρ = 3, and search the tradeoff parameters in {0.001, 0.01, 0.1} and the best iteration number T in {100, 500, 1000} via the NDCG@5 performance.
Main Results
Dataset BPR FISM TJSL RBPR DSLM
ML100K Precision@5 0.0552± 0.0006 0.0628± 0.0015 0.0697± 0.0016 0.0654± 0.0013 0.0694± 0.0014
NDCG@5 0.0874± 0.0020 0.1029± 0.0017 0.1133± 0.0047 0.1058± 0.0047 0.1140± 0.0022
ML1M Precision@5 0.0928± 0.0008 0.0971± 0.0013 0.1012± 0.0011 0.1086± 0.0009 0.1107± 0.0015
NDCG@5 0.1121± 0.0010 0.1189± 0.0008 0.1248± 0.0010 0.1327± 0.0016 0.1353± 0.0012
Alibaba2015 Precision@5 0.0050± 0.0006 0.0046± 0.0003 0.0071± 0.0004 0.0076± 0.0005 0.0087± 0.0006
NDCG@5 0.0138± 0.0017 0.0126± 0.0009 0.0200± 0.0008 0.0220± 0.0013 0.0269± 0.0017
Observations:
In all cases, TJSL, RBPR and DSLM perform significantly better than BPR and FISM, which shows the effectiveness of introducing the auxiliary feedback to assist the task of learning users' preferences; and
DSLM performs better than TJSL and RBPR in most cases, e.g., it is the best on ML1M and Alibaba2015, and is comparable with TJSL on ML100K, which clearly shows the usefulness of the dual similarity in capturing the correlations between users and items.
Conclusions and Future Work
Conclusion
We propose a novel solution, i.e., dual similarity learning model(DSLM), for a recent and important recommendation problemcalled heterogeneous one-class collaborative filtering (HOCCF).
In particular, we jointly learn the dual similarity among both usersand items so as to exploit the complementarity well. Extensiveempirical studies on three public datasets clearly show theeffectiveness of our solution.
Future Work
For future work, we are interested in further improving our DSLMby exploring pairwise ranking models [Yu et al., 2018] and deeplearning models [Shi et al., 2019].
Thank you!
Weike Pan and Zhong Ming are the corresponding authors of this work. We thank the anonymous reviewers for their expert and constructive comments and suggestions, and gratefully acknowledge the support of the National Natural Science Foundation of China (Nos. 61872249, 61836005 and 61672358).
References
Kabbur, S., Ning, X., and Karypis, G. (2013). FISM: Factored item similarity models for top-N recommender systems. KDD '13, pages 659–667.
Pan, W., Liu, M., and Ming, Z. (2016). Transfer learning for heterogeneous one-class collaborative filtering. IEEE Intelligent Systems, 31(4):43–49.
Peng, X., Chen, Y., Duan, Y., Pan, W., and Ming, Z. (2016). RBPR: Role-based Bayesian personalized ranking for heterogeneous one-class collaborative filtering. IFUP '16.
Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. UAI '09, pages 452–461.
Shi, C., Han, X., Li, S., Wang, X., Wang, S., Du, J., and Yu, P. (2019). Deep collaborative filtering with multi-aspect information in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering.
Yu, R., Zhang, Y., Ye, Y., Wu, L., Wang, C., Liu, Q., and Chen, E. (2018). Multiple pairwise ranking with implicit feedback. CIKM '18, pages 1727–1730.