1 Deriving Private Information from Randomized Data Zhengli Huang Wenliang (Kevin) Du Biao Chen...
Deriving Private Information from Randomized Data
Zhengli Huang
Wenliang (Kevin) Du
Biao Chen
Syracuse University
Privacy-Preserving Data Mining
Data Mining
Data Collection
Data Disguising
Central Database
Classification, Association Rules, Clustering
Random Perturbation
Original Data X + Random Noise R = Disguised Data Y
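The perturbation step can be sketched in a few lines. This is only an illustration: the slides do not say how the noise is distributed, so the zero-mean Gaussian noise, the `perturb` helper, and the sample values are all assumptions.

```python
import random

def perturb(values, noise_std, rng):
    """Return disguised values Y = X + R.

    Assumption: each noise term R is drawn independently from a
    zero-mean Gaussian with standard deviation noise_std.
    """
    return [x + rng.gauss(0.0, noise_std) for x in values]

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
original = [52.0, 48.5, 61.2, 39.9]         # hypothetical private values X
disguised = perturb(original, noise_std=5.0, rng=rng)

# The noise actually added to each value (known only to the data owner).
noise = [y - x for x, y in zip(original, disguised)]
```

Only `disguised` would be sent to the central database; `original` and `noise` stay with the data owner.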
How Secure Is Random Perturbation?
A Simple Observation
We can’t perturb the same number several times.
If we do, the original data can be estimated:
Let t be the original data.
Disguised data: t + R1, t + R2, …, t + Rm
Let Z = [(t+R1) + … + (t+Rm)] / m
Mean: E(Z) = t (when the noise has zero mean)
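A small simulation makes the averaging attack concrete. It is a sketch under assumed conditions: the noise terms are independent, zero-mean Gaussians, and the values of `t`, `noise_std`, and `m` are made up for illustration. With independent noise of standard deviation s, the average Z has standard deviation s / sqrt(m), so it closes in on t as m grows.

```python
import random
import statistics

def estimate_from_repeats(disguised_copies):
    """Estimate the original value as the mean Z of its disguised copies."""
    return statistics.fmean(disguised_copies)

rng = random.Random(1)          # fixed seed so the sketch is repeatable
t = 37.0                        # hypothetical original value
noise_std = 10.0                # assumed noise standard deviation
m = 10_000                      # number of times the same value is perturbed

# Each copy is t + R_i with independent zero-mean noise R_i.
copies = [t + rng.gauss(0.0, noise_std) for _ in range(m)]
z = estimate_from_repeats(copies)

# Any single copy is typically off by about noise_std,
# while the average is typically off by about noise_std / sqrt(m).
single_error = abs(copies[0] - t)
average_error = abs(z - t)
```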
This looks familiar …
This is the data set (x, x, x, x, x, x, x, x)
Random Perturbation: (x+r1, x+r2, ……, x+rm)
We know this is NOT safe.
Observation: the data set is highly correlated.
Let’s Generalize!
Data set: (x1, x2, x3, ……, xm)
If the correlation among data attributes is high, can we use that to improve our estimation (from the disguised data)?
Data Reconstruction (DR)
Original Data X
Disguised Data Y
Distribution of random noise
Reconstructed Data X’
What’s their difference?
Data Reconstruction
Reconstruction Algorithms
Principal Component Analysis (PCA)
Bayes Estimate Method
PCA-Based Data Reconstruction
PCA-Based Reconstruction
Disguised Information
Reconstructed Information
Squeeze
Information Loss
How?
Observation:
Original data are correlated.
Noise is not correlated.
Principal Component Analysis
Useful for lossy compression
PCA Introduction
The main use of PCA: reduce the dimensionality while retaining as much information as possible.
1st PC: containing the greatest amount of variation.
2nd PC: containing the next largest amount of variation.
For the Original Data
They are correlated.
If we remove 50% of the dimensions, the actual information loss might be less than 10%.
For the Random Noise
It is not correlated.
Its variance is evenly distributed in every direction.
If we remove 50% of the dimensions, the actual noise loss should be 50%.
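The contrast between correlated data and uncorrelated noise can be checked numerically. The sketch below uses a made-up data set (two hidden factors driving ten attributes) and i.i.d. Gaussian noise; the sizes, the seed, and the helper names `top_components` and `variance_kept` are illustrative assumptions, not part of the original talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5000, 10, 5            # records, attributes, components kept (50%)

# Hypothetical correlated data: 2 latent factors drive all 10 attributes,
# plus a little independent jitter per attribute.
latent = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, m))
X = latent @ mixing + 0.1 * rng.normal(size=(n, m))

# Uncorrelated noise with the same variance in every attribute.
R = rng.normal(size=(n, m))

def top_components(data, k):
    """Return the k covariance eigenvectors with the largest eigenvalues."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]           # reorder to descending, keep k

def variance_kept(data, basis):
    """Fraction of total variance that survives projection onto basis."""
    centered = data - data.mean(axis=0)
    projected = centered @ basis
    return projected.var(axis=0).sum() / centered.var(axis=0).sum()

Q = top_components(X, k)             # principal directions of the data
data_kept = variance_kept(X, Q)      # close to 1: little information lost
noise_kept = variance_kept(R, Q)     # close to k / m = 0.5: half the noise lost
```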
PCA-Based Reconstruction
Disguised Data
Reconstructed Data
PCA Compression
De-Compression
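The compress-then-decompress pipeline can be sketched as follows. This is one plausible reading of the diagram, not necessarily the authors' exact procedure. It assumes the attacker knows the noise variance, that the noise is independent of the data with the same variance in every attribute (so Cov(X) can be approximated by Cov(Y) minus noise_var times the identity), and that the number of components `k` is chosen by hand. The function name `pca_reconstruct` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reconstruct(Y, noise_var, k):
    """Project disguised data onto the top-k principal directions and back.

    Assumptions: noise is independent of the data, zero-mean, with known
    variance noise_var in every attribute.
    """
    mean = Y.mean(axis=0)
    # Estimate the covariance of the original data from the disguised data.
    cov_x = np.cov(Y, rowvar=False) - noise_var * np.eye(Y.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov_x)
    Q = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k eigenvectors
    # Compression (project onto Q) followed by de-compression (map back).
    return (Y - mean) @ Q @ Q.T + mean

rng = np.random.default_rng(0)
n, m = 2000, 10
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))   # hypothetical rank-2 data
noise_std = 1.0
Y = X + rng.normal(scale=noise_std, size=(n, m))        # disguised data

X_hat = pca_reconstruct(Y, noise_std**2, k=2)

rmse_disguised = np.sqrt(np.mean((Y - X) ** 2))          # error before the attack
rmse_reconstructed = np.sqrt(np.mean((X_hat - X) ** 2))  # error after the attack
```

On strongly correlated data like this, the reconstruction error should come out well below the error of the disguised data, because most of the noise lies in the discarded directions.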
Bayes-Estimation-Based Data Reconstruction
A Different Perspective
What is the most likely X?
Disguised Data Y
Possible X, Possible X, Possible X
Random Noise
The Problem Formulation
For each possible X, there is a probability: P(X | Y).
Find an X such that P(X | Y) is maximized.
How to compute P(X | Y)?
The Power of the Bayes Rule
P(X|Y) is difficult!
P(X|Y) = P(Y|X) * P(X) / P(Y)
Computing P(X | Y)?
P(X|Y) = P(Y|X) * P(X) / P(Y)
P(Y|X): remember Y = X + R
P(Y): A constant (we don’t care)
How to get P(X)?
This is where the correlation can be used.
Assume Multivariate Gaussian Distribution
The parameters are unknown.
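The maximization can be worked out in closed form under assumptions the slides do not state explicitly. Suppose X and Y denote one record (a vector of m attributes), X follows a multivariate Gaussian with mean vector $\mu$ and invertible covariance matrix $\Sigma$, and the noise R is Gaussian with zero mean and covariance $\sigma^2 I$, independent of X. Taking logarithms of the Bayes rule and dropping terms that do not depend on X:

$$
\begin{aligned}
\log P(X \mid Y) &= \log P(Y \mid X) + \log P(X) - \log P(Y) \\
&= -\frac{1}{2\sigma^{2}} \lVert Y - X \rVert^{2} - \frac{1}{2} (X - \mu)^{\top} \Sigma^{-1} (X - \mu) + \text{const}.
\end{aligned}
$$

Setting the gradient with respect to X to zero gives $\Sigma^{-1}(X - \mu) = \sigma^{-2}(Y - X)$, so the most likely X is

$$
\hat{X} = \left( \Sigma^{-1} + \sigma^{-2} I \right)^{-1} \left( \Sigma^{-1} \mu + \sigma^{-2} Y \right)
= \mu + \Sigma \left( \Sigma + \sigma^{2} I \right)^{-1} (Y - \mu).
$$

The second form avoids inverting $\Sigma$ and shows the estimate as the mean plus a shrunken version of the observed deviation $Y - \mu$.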
Multivariate Gaussian Distribution
A Multivariate Gaussian distribution
Each variable is a Gaussian distribution with mean μi
Mean vector μ = (μ1, …, μm)
Covariance matrix Σ
Both μ and Σ can be estimated from Y
So we can get P(X)
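A short derivation shows why the parameters of X are recoverable from Y. It assumes, beyond what the slide states, that the noise R is independent of X, has zero mean, and has a known covariance $\sigma^{2} I$; here $\mu$ and $\Sigma$ denote the mean vector and covariance matrix of X.

$$
\begin{aligned}
E[Y] &= E[X] + E[R] = \mu, \\
\operatorname{Cov}(Y) &= \operatorname{Cov}(X) + \operatorname{Cov}(R) = \Sigma + \sigma^{2} I
\quad\Longrightarrow\quad
\Sigma = \operatorname{Cov}(Y) - \sigma^{2} I.
\end{aligned}
$$

In practice $E[Y]$ and $\operatorname{Cov}(Y)$ are replaced by the sample mean and sample covariance of the disguised records.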
Bayes-Estimate-based Data Reconstruction
Original X Disguised Data Y
Randomization
Estimated X
Which X maximizes P(X|Y), i.e., P(X) * P(Y|X)?
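A minimal sketch of this estimate, assuming Gaussian data and independent zero-mean Gaussian noise with known variance in every attribute. Under those assumptions the maximizer of P(X) * P(Y|X) has the closed form mu + Sigma (Sigma + noise_var I)^-1 (y - mu), where mu and Sigma are the mean and covariance of X, estimated here from Y. The function name `bayes_reconstruct` and the synthetic data are hypothetical.

```python
import numpy as np

def bayes_reconstruct(Y, noise_var):
    """Most likely X for each disguised record under the Gaussian assumptions.

    Assumptions: X is multivariate Gaussian; noise is independent,
    zero-mean Gaussian with known variance noise_var per attribute.
    """
    m = Y.shape[1]
    mu = Y.mean(axis=0)                          # E[Y] = E[X] for zero-mean noise
    cov_y = np.cov(Y, rowvar=False)              # Cov(Y) = Sigma + noise_var * I
    sigma_x = cov_y - noise_var * np.eye(m)      # estimate of Sigma
    # Row-wise x_hat = mu + Sigma Cov(Y)^-1 (y - mu); both matrices are
    # symmetric, so the transposed gain equals solve(cov_y, sigma_x).
    gain_t = np.linalg.solve(cov_y, sigma_x)
    return mu + (Y - mu) @ gain_t

rng = np.random.default_rng(0)
n, m = 2000, 10
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))   # hypothetical correlated data
noise_std = 1.0
Y = X + rng.normal(scale=noise_std, size=(n, m))        # disguised data

X_hat = bayes_reconstruct(Y, noise_std**2)

rmse_disguised = np.sqrt(np.mean((Y - X) ** 2))
rmse_bayes = np.sqrt(np.mean((X_hat - X) ** 2))
```

Unlike the PCA approach, no component count has to be chosen: directions with little data variance are shrunk toward the mean automatically.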
Evaluation
Increasing the Number of Attributes
Increasing Eigenvalues of the Non-Principal Components
How to improve Random Perturbation?
Observation from PCA
How to make it difficult to squeeze out noise?
Make the correlation of the noise similar to the original data.
Noise now concentrates on the principal components, like the original data X.
How to get the correlation of X?
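The idea of correlated noise can be sketched as follows. The sketch sidesteps the question of how the correlation of X is obtained by computing the sample covariance directly from a made-up original data set, as a party holding X could. The noise is drawn from a Gaussian whose covariance is a scaled copy of that matrix; the scale factor, sizes, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5000, 10, 2

# Hypothetical correlated original data (2 latent factors, 10 attributes).
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m)) + 0.1 * rng.normal(size=(n, m))

# Assumption: the covariance of X is available to whoever generates the noise.
cov_x = np.cov(X, rowvar=False)

scale = 0.5   # hypothetical ratio of noise variance to data variance
# Noise whose correlation structure mimics that of the data.
R = rng.multivariate_normal(np.zeros(m), scale * cov_x, size=n)
Y = X + R     # disguised data

# Share of the noise energy lying along the data's top-k principal directions.
eigvals, eigvecs = np.linalg.eigh(cov_x)     # ascending eigenvalues
Q = eigvecs[:, ::-1][:, :k]                  # top-k eigenvectors
noise_in_principal = ((R @ Q) ** 2).sum() / (R ** 2).sum()
```

With uncorrelated noise this share would be about k / m; here it stays close to 1, so discarding the non-principal components removes very little of the noise.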
Improved Randomization
Conclusion and Future Work
When does randomization fail?
Answer: when the data correlation is high.
Can it be cured?
Using correlated noise similar to the original data
Still Unknown:
Is the correlated-noise approach really better?
Can other information affect privacy?