An architecture for Privacy Preserving Mining of Client Information Jaideep Vaidya Purdue University...
![Page 1: An architecture for Privacy Preserving Mining of Client Information Jaideep Vaidya Purdue University jsvaidya@cs.purdue.edu This is joint work with Murat.](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d3f5503460f94a1912d/html5/thumbnails/1.jpg)
An architecture for Privacy Preserving Mining of Client
Information
Jaideep Vaidya
Purdue University
jsvaidya@cs.purdue.edu
This is joint work with Murat Kantarcioglu
Data Mining
• Data Mining is the process of finding new and potentially useful knowledge from data.
• Typical Algorithms
  – Classification
  – Association Rules
  – Clustering
What is Privacy Preserving Data Mining?
• Term appeared in 2000:
  – Agrawal and Srikant, SIGMOD
    • Added noise to data before delivery to the data miner
    • Technique to reduce the impact of noise when learning a decision tree
  – Lindell and Pinkas, CRYPTO
    • Two parties, each with a portion of the data
    • Learn a decision tree without sharing data
• Different Concepts of Privacy!
Related Work
• Perturbation Approaches
  – Agrawal, Srikant, SIGMOD 2000
  – Agrawal, Aggarwal,
  – Evfimievski et al, SIGKDD 2002
• SMC approaches
  – Lindell, Pinkas, CRYPTO 2000
  – Kantarcioglu, Clifton, DMKD 2002
  – Vaidya, Clifton, SIGKDD 2002
• Other approaches
  – Rizvi, Haritsa, VLDB 2002
  – Du, Atallah,
Motivation
[Figure: PPDM shown as a trade-off among Security, Accuracy, and Efficiency]
Improving any one aspect typically degrades the other two
Motivating Example
• Assume that an attribute Y is perturbed by a uniform random variable r with range [-2, 2].
• If we see $Y'_i = Y_i + r = 5$, then $Y_i \in [3, 7]$.
• Assume that after reconstruction of the distribution (the basic assumption of all perturbation techniques is that we can reconstruct distributions), $\Pr\{3 \le Y \le 4\} = 0$.
• This implies $Y_i \in [4, 7]$.
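The narrowing step in this example can be replayed in a few lines. This is only a sketch: the function names are hypothetical, and it assumes the reconstructed distribution assigns zero probability to the interval [3, 4], which is how the slide's numbers read.

```python
def candidate_interval(observed, noise_low, noise_high):
    """Interval that must contain the original value when additive noise
    is drawn from [noise_low, noise_high]."""
    return (observed - noise_high, observed - noise_low)


def drop_zero_mass_prefix(interval, zero_low, zero_high):
    """Shrink the interval if the reconstructed distribution has zero mass
    on [zero_low, zero_high] and that range covers the interval's left end."""
    low, high = interval
    if zero_low <= low < zero_high:
        low = zero_high
    return (low, high)


# Observed perturbed value 5, noise uniform on [-2, 2]
interval = candidate_interval(5, -2, 2)
# Assumed: the reconstructed distribution puts no mass on [3, 4]
narrowed = drop_zero_mass_prefix(interval, 3, 4)
print(interval, narrowed)
```

The point of the example is that the reconstructed distribution, which perturbation methods rely on, is itself what lets an observer shrink the interval.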
Motivating Example (Cont.)
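• Even worse, assume that $\Pr\{6 \le Y \le 7 \mid 5.0 \le T \le 10.0\} = 0.9$ and $\Pr\{5.0 \le T_i \le 10.0\} = 0.9$
• Therefore we could infer that $\Pr\{6 \le Y_i \le 7\} \ge 0.8$, since $0.9 \times 0.9 = 0.81$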
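The inference on this slide follows from the law of total probability. The derivation below is a sketch that assumes the relation symbols lost in extraction are "=" for the two premises and "≥" for the conclusion, and that T is a second attribute correlated with Y:

```latex
\begin{align*}
\Pr\{6 \le Y_i \le 7\}
  &\ge \Pr\{6 \le Y_i \le 7 \mid 5.0 \le T_i \le 10.0\}
       \cdot \Pr\{5.0 \le T_i \le 10.0\} \\
  &= 0.9 \times 0.9 \\
  &= 0.81 \ge 0.8
\end{align*}
```

The inequality in the first line holds because the term for $T_i$ outside $[5.0, 10.0]$ is non-negative and has been dropped.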
Motivation
• Perfect Privacy is achievable without compromising on Accuracy
• Users do not want to be permanently online (to engage in some complex protocol)
• Outside parties can be used as long as there are strict bounds on what information they receive and what operations they are allowed to do
Key Insight
• Consider using non-colluding, untrusted, semi-honest third parties to carry out computation
• Non-colluding
  – Should not collude with any of the original users or any of the other parties
• Untrusted
  – Throughout the process, should never gain access to any information (in the clear), as long as the first assumption (non-colluding) holds true
• Semi-honest
  – All parties correctly follow the protocol, but are then free to use whatever information they see during the execution of the protocols in any way
The Architecture
• Use three sites with the properties defined earlier:
• Original Site (OS)
  – Site that collects a share of the information from all clients, and will learn the final result of the data mining process
• Non-Colluding Storage Site (NSS)
  – Used for storing the shared part of user information
• Processing Site (PS)
  – Used to do data mining efficiently
Finding Frequent Itemsets
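[Figure: User 1 through User n each send one share of their data to OS and the other share to NSS]
User 1 (ID1, ID2, ID3), bit strings shown: 011 and 101
  To OS: ID1 - 1, ID2 - 1, ID3 - 0
  To NSS: ID1 - 1, ID2 - 0, ID3 - 1
User n (ID4, ID5), bit strings shown: 01 and 10
  To OS: ID4 - 1, ID5 - 1
  To NSS: ID4 - 1, ID5 - 0
OS holds: ID1 - 1, ID2 - 1, ID3 - 0, ID4 - 1, ID5 - 1
NSS holds: ID1 - 1, ID2 - 0, ID3 - 1, ID4 - 1, ID5 - 0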
From this point on, the end users are not involved in the remaining protocol, and so they do not have to stay online
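The numbers on this slide are consistent with each user splitting every bit into two XOR shares: a random bit for NSS, and the true bit XOR that random bit for OS. The sketch below assumes that reading. The function names are hypothetical, and the deterministic variant exists only to replay the slide's values for User 1 (true bits 011, random bits 101).

```python
import secrets


def split_bit(bit):
    """Split one bit into two XOR shares.

    Returns (share_for_os, share_for_nss); the NSS share is uniformly
    random, so on its own it reveals nothing about the bit.
    """
    random_share = secrets.randbelow(2)
    return bit ^ random_share, random_share


def split_with_mask(bits, mask):
    """Deterministic variant: XOR the bits with a given mask (assumed to
    be the user's random string) to reproduce a worked example."""
    return [b ^ m for b, m in zip(bits, mask)], list(mask)


# User 1 on the slide: true bits 011, random mask 101
os_share, nss_share = split_with_mask([0, 1, 1], [1, 0, 1])
print(os_share, nss_share)  # [1, 1, 0] [1, 0, 1]
```

XOR-ing the two shares recovers the original bit, which is why neither site alone learns anything while the two together hold the full data.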
Interlude
• For our protocol,
  – total number of transactions, n = 5
  – number of fake transactions to add (fraction of total), epsilon = 0.2
  – epsilon*n = 5*0.2 = 1
• Original Site decides to make some of the fake transactions support the itemset, while others do not (it knows the exact count)
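Finding Frequent Itemsets
[Figure: OS and NSS add the fake transaction ID6 to their shares, reorder them identically, and send them to PS]
OS before: ID1 - 1, ID2 - 1, ID3 - 0, ID4 - 1, ID5 - 1
NSS before: ID1 - 1, ID2 - 0, ID3 - 1, ID4 - 1, ID5 - 0
Fake transaction: ID6 - 0 (OS), ID6 - 1 (NSS)
OS after: ID1 - 1, ID2 - 1, ID3 - 0, ID4 - 1, ID5 - 1, ID6 - 0
NSS after: ID1 - 1, ID2 - 0, ID3 - 1, ID4 - 1, ID5 - 0, ID6 - 1
Ordering used by both sites: 3 2 5 1 4 6
OS to PS: 011110
NSS to PS: 100111
PS combines: 111001, Result = 4
Final, after removing the fake transaction: Result = 3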
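The values in this figure can be replayed end to end. The sketch below is one reading of the figure, not a specification: it assumes the shares are XOR shares, that OS and NSS apply the same ordering before sending to PS, and that OS subtracts the fake transactions it knows support the itemset. All variable and function names are hypothetical.

```python
def permute(shares, order):
    """Return the shares (dict keyed by transaction ID) in the given ID order."""
    return [shares[i] for i in order]


# Shares held after the users have finished (from the slide)
os_shares = {1: 1, 2: 1, 3: 0, 4: 1, 5: 1}
nss_shares = {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}

# One fake transaction (epsilon * n = 0.2 * 5 = 1), split as on the slide
os_shares[6], nss_shares[6] = 0, 1
fake_support = os_shares[6] ^ nss_shares[6]  # known to OS, here 1

# Both sites reorder their shares identically before sending to PS
order = [3, 2, 5, 1, 4, 6]
from_os = permute(os_shares, order)
from_nss = permute(nss_shares, order)

# PS recombines the shares and counts, without knowing IDs or the itemset
combined = [a ^ b for a, b in zip(from_os, from_nss)]
ps_count = sum(combined)

# OS removes the fake supporting transactions to get the exact support
true_count = ps_count - fake_support
print(from_os, from_nss, combined, ps_count, true_count)
```

PS only ever sees a count that includes fake transactions, which is why it learns an upper bound on the support rather than the exact value.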
Doing Secure Data Mining
• Once the support count of an itemset has been calculated, the process for finding association rules securely is well known
• Other data mining algorithms become easily possible by modifying the process
Communication Cost:
• For each k-itemset, at least $O(n \cdot k)$ bits must be transferred for the exact result.
• Let us assume that the number of candidate k-itemsets is $C_k$.
• Let us assume that candidate itemsets contain at most m items.
• Total communication cost for the association rule mining would be $O\left(\sum_{i=1}^{m} C_i \cdot i \cdot n \cdot (1 + \epsilon)\right)$
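As a rough illustration of how this cost grows, the sketch below evaluates the sum for given candidate counts. It is an estimate up to constant factors only. The `(1 + epsilon)` factor for fake transactions is an assumed reading of the garbled formula, and the candidate counts in the example are made up for illustration.

```python
def communication_cost(candidate_counts, n, epsilon=0.0):
    """Estimate bits transferred, ignoring constant factors.

    candidate_counts[i - 1] is C_i, the number of candidate i-itemsets;
    n is the number of transactions; epsilon is the fraction of fake
    transactions added (an assumed factor, see the lead-in).
    """
    return sum(
        c * i * n * (1 + epsilon)
        for i, c in enumerate(candidate_counts, start=1)
    )


# Hypothetical counts: 10 one-itemsets, 4 two-itemsets, 1 three-itemset
print(communication_cost([10, 4, 1], n=5, epsilon=0.2))
```

The cost is linear in the number of transactions and in the number of candidates at each level, so pruning candidates early matters more than the small overhead from fake transactions.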
Security Analysis
• NSS view: The NSS only gets to see random numbers. Thus, it does not learn anything.
• OS view: OS learns the support count of the itemsets but does not learn which user supports any particular itemset. Essentially,
$\forall i, j:\ \Pr\{\text{user } i \text{ supports } X\} = \Pr\{\text{user } j \text{ supports } X\}$
Security Analysis (Cont.)
• PS learns an upper bound on the support count but it does not know for which itemset. (Ordering of the attributes is randomized)
• Because of the addition of fake items and random ordering, it has no way of correlating the itemsets to any particular user.
Security Analysis
As long as the three sites (OS, NSS and PS) do not collude with each other, they do not learn anything
Benefits of the framework
• Perfect individual privacy is achieved
• Users do not have to stay online for a complicated protocol. Once they have split their information among the storage sites, they are done
Future Work
• An extremely efficient way of generating one-itemsets securely is possible. Using this instead of the general method will lead to great savings in communication
• Sampling should be done to further lower communication cost and increase efficiency
Conclusion
• Privacy and Efficiency are both important for Secure Data Mining. Compromising on either is not practical
• A framework for privacy preserving data mining has been suggested
• Need to implement and evaluate true efficiency, after including improvements such as sampling