Privacy-Preserving Eigentaste-based Collaborative Filtering
Ibrahim Yakut and Huseyin Polat
{iyakut,polath}@anadolu.edu.tr
Department of Computer Engineering
Anadolu University, Turkey
Collaborative Filtering (CF)
21.04.23 IWSEC'07 2
Problem: Information overload
Solution: Collaborative filtering
Collaborative Filtering
A recent technique for filtering and recommendation
Applications:
◦ E-commerce
◦ Search engines
◦ Direct recommendations
Collaborative Filtering Process
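[Figure: user-item matrix with users u1, u2, …, ua (the active user), …, un as rows and items i1, i2, …, iq, …, im as columns; iq is the item for which a prediction is sought, and Paq is the prediction on item q for the active user.]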
Eigentaste
◦ Proposed by Goldberg et al. in 2001
◦ Main feature: online computation in constant time
◦ Second, flexible use of several clustering algorithms
◦ Based on Principal Component Analysis (PCA)
◦ Applied in Jester, an online joke recommender: http://eigentaste.berkeley.edu/
Eigentaste Algorithm
Step 1. Find the correlation matrix C of A.
Step 2. Find the eigenvectors (E) and eigenvalues (Λ) of C.
$$C = \frac{1}{n-1} A^{T} A$$
Correlation matrix of A, where
D: n × m user-item matrix (n users, m items)
A: n × k user-gauge matrix (k gauge items)
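Steps 1 and 2 can be sketched in plain Python under two assumptions: A already holds z-scores (so that A^T A / (n − 1) is the correlation matrix), and power iteration with deflation stands in for a proper symmetric eigensolver. The names `correlation_matrix` and `top_eigenvectors` are hypothetical.

```python
import math


def correlation_matrix(A):
    """C = A^T A / (n - 1) for an n x k matrix A of z-scored gauge ratings."""
    n, k = len(A), len(A[0])
    return [[sum(A[u][f] * A[u][g] for u in range(n)) / (n - 1)
             for g in range(k)]
            for f in range(k)]


def top_eigenvectors(C, count=2, iters=500):
    """Leading (eigenvalue, eigenvector) pairs of a symmetric PSD matrix C,
    found by power iteration followed by deflation."""
    k = len(C)
    M = [row[:] for row in C]
    pairs = []
    for _ in range(count):
        # non-uniform start vector, to make orthogonality to the target unlikely
        v = [float(i + 1) for i in range(k)]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(k) for j in range(k))
        pairs.append((lam, v))
        # deflation: remove the found component so the next pass finds the next one
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return pairs


print(correlation_matrix([[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]))  # [[1.0, 1.0], [1.0, 4.0]]
print([round(lam, 6) for lam, _ in top_eigenvectors([[2.0, 1.0], [1.0, 2.0]])])  # [3.0, 1.0]
```

For the small gauge sets Eigentaste uses, this is cheap; with a numerical library available, a dedicated symmetric eigensolver would be the natural replacement.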
Eigentaste Algorithm (cont'd)
Step 3. Take the first two eigenvectors (E_2) and project A onto them:
$$x = A E_2^{T}$$
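Step 3 is a single matrix product. A minimal sketch, assuming E2 stores the two leading eigenvectors as rows (a 2 × k list), so that each user's k gauge z-scores map to one 2-D point; `project` is a hypothetical name and the inputs below are toy values.

```python
def project(A, E2):
    """Return x = A E2^T: one 2-D point per row (user) of A."""
    return [[sum(a_l * e_l for a_l, e_l in zip(row, e)) for e in E2] for row in A]


A = [[1.0, 1.0, 0.0], [0.0, 1.0, -1.0]]
E2 = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(project(A, E2))  # [[1.0, 0.0], [0.0, -1.0]]
```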
Step 4. Cluster the projected data using RRC.
[Figure: Recursive Rectangular Clustering (RRC)]
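RRC itself is not sketched here. As a stand-in, the block below is plain Lloyd's k-means on the projected 2-D points, which is the clustering the modified scheme adopts instead of RRC; random initialization from the data points and the iteration cap are assumptions, and `kmeans` is a hypothetical name.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids

    def nearest(p):
        # index of the centroid with the smallest squared Euclidean distance
        return min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                           + (p[1] - centroids[c][1]) ** 2)

    labels = [nearest(p) for p in points]
    for _ in range(iters):
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # move the centroid to the mean of its points
                new_centroids.append([sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)])
            else:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
        centroids = new_centroids
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels
```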
Step 5. Construct a lookup table holding, for each cluster, the mean ratings of the nongauge items.
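Step 5 can be sketched as below, assuming cluster labels from Step 4, nongauge ratings stored per user with `None` marking a missing rating (Jester is only partly filled), and one table row per cluster. The layout and the name `build_lookup_table` are hypothetical.

```python
def build_lookup_table(labels, nongauge, num_clusters):
    """table[c][j] = mean rating of nongauge item j over users in cluster c
    (None if nobody in cluster c rated item j)."""
    m = len(nongauge[0])
    table = []
    for c in range(num_clusters):
        row = []
        for j in range(m):
            vals = [r[j] for r, lab in zip(nongauge, labels)
                    if lab == c and r[j] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        table.append(row)
    return table


print(build_lookup_table([0, 0, 1], [[1.0, None], [3.0, 4.0], [5.0, None]], 2))
# [[2.0, 4.0], [5.0, None]]
```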
Eigentaste: Online Phase
When an active user a enters:
◦ a rates the items in the gauge set.
◦ a's gauge ratings are projected onto the principal components (PCs).
◦ The representative cluster is found.
◦ Items are recommended based on the precomputed lookup table.
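The online phase can be sketched as follows, under two assumptions: the representative cluster is the one with the nearest centroid (as with k-means; under RRC it would be the rectangle containing the projected point), and the highest table entries are recommended. `recommend` and `top_n` are hypothetical names.

```python
def recommend(gauge_ratings, E2, centroids, table, top_n=3):
    """Return (cluster index, indices of the top_n nongauge items in that cluster's row)."""
    # project the active user's gauge ratings onto the two principal components
    x = [sum(a * e for a, e in zip(gauge_ratings, vec)) for vec in E2]
    # representative cluster: nearest centroid in the 2-D eigenplane
    c = min(range(len(centroids)),
            key=lambda i: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[i])))
    row = table[c]
    rated = [j for j in range(len(row)) if row[j] is not None]
    ranked = sorted(rated, key=lambda j: row[j], reverse=True)
    return c, ranked[:top_n]


print(recommend([9.0, 9.0], [[1.0, 0.0], [0.0, 1.0]],
                [[0.0, 0.0], [10.0, 10.0]],
                [[1.0, 2.0, 3.0], [5.0, None, 4.0]]))  # (1, [0, 2])
```

Because the table is precomputed, the work per request depends only on the gauge set size, the number of clusters and the number of items, not on the number of users.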
[Figure: rating scale from Disapprove to Approve]
Motivation
The algorithm above is successful, but due to privacy risks, collecting truthful and trustworthy data is a challenge!
Therefore, how can users give data for CF purposes without jeopardizing their privacy?
Is it possible to use perturbed data in Eigentaste-based algorithms?
Modifications to the Original
Normalization:
◦ User mean and std are used instead of item mean and std.
Clustering:
◦ k-means clustering is used instead of RRC.
Prediction:
◦ Instead of using the lookup table value directly, it is denormalized to obtain the prediction.
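$$z_{uj} = \frac{v_{uj} - \bar{v}_u}{\sigma_u}, \qquad p_{aq} = \bar{v}_a + \sigma_a z_q$$
where v_uj is user u's rating on item j, v̄_u and σ_u are u's rating mean and std, and z_q is the z-score value for item q taken from the lookup table.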
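The user-based normalization and the denormalization step can be sketched as below. Using the population standard deviation and falling back to 1.0 when all of a user's ratings are equal are assumptions, not choices stated in the slides; the function names are hypothetical.

```python
import statistics


def user_zscores(ratings):
    """z_uj = (v_uj - mean_u) / std_u over the items user u rated (None = unrated)."""
    rated = [v for v in ratings if v is not None]
    mean = statistics.mean(rated)
    std = statistics.pstdev(rated) or 1.0  # assumption: avoid dividing by zero
    return [None if v is None else (v - mean) / std for v in ratings], mean, std


def denormalize(z_q, mean_a, std_a):
    """p_aq = mean_a + std_a * z_q."""
    return mean_a + std_a * z_q


z, mean_a, std_a = user_zscores([2.0, None, 4.0, 6.0])
print(mean_a, z[1], z[2])  # 4.0 None 0.0
```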
Masking Data
[Figure: CF process. Users 1, 2, …, n−1, n each add random data (+R1, +R2, …, +Rn−1, +Rn) to their ratings before sending them to the central database.]
Randomized Perturbation Technique (RPT): Agrawal & Srikant, 2000
Masking Process
1. Users and the server agree on γ, θ, and δ.
2. Each user u computes z-scores of his ratings.
3. u selects σu uniformly at random over [0, γ] and uses it as the std of the masking data.
4. u selects ru uniformly over [0, 1]; if ru ≤ θ, uniform noise is used, otherwise Gaussian noise.
5. u selects xer over [0, δ]; xer% of the unrated cells are filled with noise.
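Masking Process (cont'd)
u creates mu random numbers, where
◦ mu = number of rated cells + number of unrated cells selected in step 5 (xer% of them)
◦ the numbers have μ = 0 and std σu, drawn from a Gaussian or from a uniform distribution over [−√3·σu, √3·σu], according to ru
u masks his private data by adding this noise data. The empty cells to be filled are selected randomly.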
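A user-side sketch of the masking process, assuming z holds the user's z-scores with `None` for unrated items, δ is a percentage (at most 100), and a selected empty cell receives the noise value alone (a z-score of 0 plus noise). `mask_user` is a hypothetical name. Note that a uniform distribution over [−√3·σ, √3·σ] has mean 0 and std σ, which is why that bound is used.

```python
import math
import random


def mask_user(z, gamma, theta, delta, rng):
    """Mask one user's z-scored ratings (None = unrated) following steps 3-5."""
    sigma = rng.uniform(0.0, gamma)                # step 3: std of the masking data
    use_uniform = rng.uniform(0.0, 1.0) <= theta   # step 4: uniform if r_u <= theta, else Gaussian
    xer = rng.uniform(0.0, delta)                  # step 5: percentage of empty cells to fill
    empty = [j for j, v in enumerate(z) if v is None]
    fill = set(rng.sample(empty, round(len(empty) * xer / 100)))  # random empty cells

    def noise():
        if use_uniform:
            bound = math.sqrt(3) * sigma           # U[-sqrt(3)σ, sqrt(3)σ]: mean 0, std σ
            return rng.uniform(-bound, bound)
        return rng.gauss(0.0, sigma)

    masked = []
    for j, v in enumerate(z):
        if v is not None:
            masked.append(v + noise())             # rated cell: add noise
        elif j in fill:
            masked.append(noise())                 # selected empty cell: noise only
        else:
            masked.append(None)                    # cell stays empty
    return masked


print(mask_user([0.5, None, -1.0, None, None, 1.2], 1.0, 0.5, 35.0, random.Random(1)))
```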
Eigentaste-based CF with Privacy
The server now holds the disguised user-item matrix D' and the disguised user-gauge matrix A'.
In some steps, the effects of perturbation must be considered and handled:
◦ Correlation matrix construction
◦ Projection
◦ The active user's entry of gauge ratings
Correlation Matrix Construction
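If f ≠ g, i.e., for the nondiagonal entries of C':
$$C'_{fg} = \frac{1}{n-1}\sum_{u=1}^{n}(z_{uf}+r_{uf})(z_{ug}+r_{ug}) = \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}z_{ug} + \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}r_{ug} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}z_{ug} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}r_{ug}$$
The last three terms have expected value 0 since μ = 0. Then
$$C'_{fg} \approx \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}z_{ug} = C_{fg}$$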
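A quick simulation illustrates the argument. Hypothetical correlated z-scores for two gauge items are masked with zero-mean Gaussian noise whose per-user std σu is drawn from U[0, γ], and the off-diagonal entry barely moves. The data generator, sample size, and seed are assumptions chosen for illustration.

```python
import random


def corr(A):
    """C = A^T A / (n - 1)."""
    n, k = len(A), len(A[0])
    return [[sum(A[u][f] * A[u][g] for u in range(n)) / (n - 1)
             for g in range(k)] for f in range(k)]


def masked_offdiagonal(n=20000, gamma=1.0, seed=42):
    """Return (C_fg, C'_fg) for two hypothetical gauge items f and g."""
    rng = random.Random(seed)
    A, A_masked = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        row = [a, 0.8 * a + 0.6 * rng.gauss(0.0, 1.0)]  # true correlation about 0.8
        sigma_u = rng.uniform(0.0, gamma)                 # per-user noise std
        A.append(row)
        A_masked.append([v + rng.gauss(0.0, sigma_u) for v in row])
    return corr(A)[0][1], corr(A_masked)[0][1]


c_fg, c_fg_masked = masked_offdiagonal()
print(round(c_fg, 3), round(c_fg_masked, 3))
```

The two printed values agree to roughly two decimal places; the gap shrinks as n grows, which matches the later observation that accuracy improves with more users.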
Correlation Matrix Construction (cont'd)
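If f = g, i.e., for the diagonal entries of C':
$$C'_{ff} = \frac{1}{n-1}\sum_{u=1}^{n}(z_{uf}+r_{uf})^{2} = \frac{1}{n-1}\sum_{u=1}^{n} z_{uf}^{2} + \frac{2}{n-1}\sum_{u=1}^{n} z_{uf}r_{uf} + \frac{1}{n-1}\sum_{u=1}^{n} r_{uf}^{2}$$
The middle term has expected value 0 since μ = 0.
Then, assuming n ≈ n − 1, the last term approximates the mean squared noise in column f:
$$\frac{1}{n-1}\sum_{u=1}^{n} r_{uf}^{2} \approx \frac{1}{n}\sum_{u=1}^{n} r_{uf}^{2}$$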
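The diagonal case can be checked the same way. With σu drawn from U[0, γ], the mean squared noise is E[σu²] = γ²/3, so C'ff should exceed Cff by roughly that amount. The sketch only measures this gap on hypothetical data; it makes no claim about how the scheme corrects for it.

```python
import random


def diagonal_gap(n=20000, gamma=1.0, seed=7):
    """Return C'_ff - C_ff for one hypothetical gauge item f."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # masking noise r_uf with per-user std sigma_u ~ U[0, gamma]
    r = [rng.gauss(0.0, rng.uniform(0.0, gamma)) for _ in range(n)]
    c_ff = sum(v * v for v in z) / (n - 1)
    c_ff_masked = sum((v + e) ** 2 for v, e in zip(z, r)) / (n - 1)
    return c_ff_masked - c_ff


gamma = 1.0
print(round(diagonal_gap(gamma=gamma), 3), round(gamma ** 2 / 3, 3))
```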
Projection
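Similarly, the expected values of the noise terms are 0, so an approximation of the projected matrix is obtained:
$$x = A E_2^{T}$$
$$x'_{ij} = \sum_{l=1}^{k}(z_{il}+r_{il})(e_{lj}+R_{lj}) = \sum_{l=1}^{k} z_{il}e_{lj} + \sum_{l=1}^{k} z_{il}R_{lj} + \sum_{l=1}^{k} r_{il}e_{lj} + \sum_{l=1}^{k} r_{il}R_{lj} \approx \sum_{l=1}^{k} z_{il}e_{lj}$$
where e_lj are the entries of E_2^T, r_il is the masking noise on z_il, and R_lj is the deviation of the eigenvector entries estimated from the perturbed C'.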
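The projection argument can be illustrated by simulating only the r_il e_lj term, i.e., assuming the eigenvectors are exact (R = 0): masked and unmasked projections differ for each user, but the average difference stays near 0. The orthonormal eigenvectors and the data are hypothetical.

```python
import random


def projection_bias(n=20000, gamma=1.0, seed=3):
    """Average of x'_ij - x_ij over n hypothetical users, for j = 1, 2."""
    rng = random.Random(seed)
    k = 4
    E2 = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]  # hypothetical orthonormal eigenvectors
    total = [0.0, 0.0]
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]       # true gauge z-scores
        s = rng.uniform(0.0, gamma)                        # per-user noise std
        r = [rng.gauss(0.0, s) for _ in range(k)]         # masking noise
        x = [sum(zl * el for zl, el in zip(z, e)) for e in E2]
        x_masked = [sum((zl + rl) * el for zl, rl, el in zip(z, r, e)) for e in E2]
        for j in range(2):
            total[j] += x_masked[j] - x[j]
    return [t / n for t in total]


print([round(b, 4) for b in projection_bias()])
```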
Remaining Parts
After the clusters are determined from the estimated data:
◦ The z-score means of the nongauge items are stored in the lookup table.
◦ When an active user enters disguised gauge ratings, the effect of randomization is removed in the same way.
◦ The representative cluster is found, the corresponding value from the table is denormalized, and the prediction is obtained.
Experiments
Data set:
◦ Jester, a web-based joke data set: 17,988 users, 100 jokes
◦ Continuous ratings in the range (−10, +10)
◦ 50% of all ratings are present
Evaluation Metrics
$$\mathrm{MAE} = \frac{1}{d}\sum_{i=1}^{d} |p_i - r_i|, \qquad \mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{max} - r_{min}}$$
p: predicted value; r: original value; d: size of the test set; rmax: maximum rating; rmin: minimum rating
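The two metrics translate directly into code. The default rating bounds follow Jester's (−10, +10) range, and the function names are hypothetical. As a consistency check against the results, an MAE of 3.740 gives NMAE = 3.740 / 20 = 0.187.

```python
def mae(predicted, actual):
    """Mean absolute error over a test set of size d."""
    d = len(actual)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / d


def nmae(predicted, actual, r_min=-10.0, r_max=10.0):
    """MAE normalized by the rating range (r_max - r_min)."""
    return mae(predicted, actual) / (r_max - r_min)


print(mae([1.0, -2.0, 3.0], [2.0, -2.0, 0.0]))  # 1.3333333333333333
```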
Eigentaste vs. Modified Eigentaste
9000 training users, 5000 test users (10 test items)
                     MAE    NMAE
Eigentaste           3.740  0.187
Modified Eigentaste  3.334  0.167
Protecting Active Users' Privacy
       M1      M2      M3
MAE    3.3508  3.4710  3.4807
NMAE   0.1676  0.1735  0.1741
M1: No disguise, but requires additional cost
M2: Considering only the gauge items' mean and std
M3: Considering the overall mean and std
Accuracy vs. Varying Numbers of Users
n      500    1000   2000   4000   8000
MAE    4.678  4.242  3.832  3.624  3.483
NMAE   0.234  0.212  0.192  0.181  0.174
5000 users and 10 randomly chosen test items are fixed.
• Accuracy improves as the number of users increases, since the aggregated random noise converges to zero.
• For n ≥ 2000, the results are satisfactory.
Accuracy with Varying δ Values
δ      0       35      70      100
MAE    3.4460  3.4567  3.4615  3.4710
NMAE   0.1723  0.1728  0.1730  0.1735
Accuracy improves slightly as δ decreases.
Conclusion
We showed how to achieve privacy-preserving CF using Eigentaste-based algorithms.
We will study
◦ whether other clustering algorithms can be employed
◦ how to improve recommendation quality by using correlation-based CF algorithms
Thank you for your interest!
Questions?