Foundations of Privacy Lecture 3 Lecturer: Moni Naor.
Foundations of Privacy
Lecture 3
Lecturer: Moni Naor
Recap of last week's lecture
• The simulation paradigm for defining and proving security of cryptographic protocols
• The basic impossibility of disclosure prevention: we cannot hope for results that hold under all possible auxiliary information
• Differential privacy: for all adjacent databases, the output probabilities are very close
Extractors and Fuzzy Extractors
Desirable properties of a sanitization mechanism
• Composability: applying the sanitization several times yields graceful degradation; q releases, each ε-DP, are qε-DP
• Robustness to side information: no need to specify exactly what the adversary knows
Differential privacy satisfies both…
Adjacency: D+Me and D-Me
Differential Privacy
Protect individual participants:Probability of every bad event - or any event - increases
only by small multiplicative factor when I enter the DB.
May as well participate in DB…
ε-differentially private sanitizer A
For all DBs D, all Me and all events T:
e^−ε ≤ PrA[A(D+Me) ∈ T] / PrA[A(D−Me) ∈ T] ≤ e^ε ≈ 1+ε
Handles auxiliary input
Dwork, McSherry, Nissim and Smith
Differential Privacy
[Figure: output distributions Pr[response] of A on two adjacent databases; bad responses marked ✗; the ratio of the two probabilities is bounded everywhere]
A gives ε-differential privacy if for all neighboring D1 and D2, and all T ⊆ range(A):
Pr[A(D1) ∈ T] ≤ e^ε · Pr[A(D2) ∈ T]
Neutralizes all linkage attacks.
Composes unconditionally and automatically: the guarantees add up, Σi εi.
Differential Privacy: Important Properties
Handles auxiliary information.
Composes naturally:
• A1(D) is ε1-diffP
• for all z1, A2(D, z1) is ε2-diffP
Then A2(D, A1(D)) is (ε1+ε2)-diffP.
Proof: for all adjacent D, D' and all (z1, z2):
e^−ε1 ≤ P[z1] / P'[z1] ≤ e^ε1 and e^−ε2 ≤ P[z2] / P'[z2] ≤ e^ε2
so e^−(ε1+ε2) ≤ P[(z1,z2)] / P'[(z1,z2)] ≤ e^(ε1+ε2)
where
P[z1] = Pr_{z∼A1(D)}[z = z1], P'[z1] = Pr_{z∼A1(D')}[z = z1]
P[z2] = Pr_{z∼A2(D,z1)}[z = z2], P'[z2] = Pr_{z∼A2(D',z1)}[z = z2]
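The adaptive composition above can be sketched in code. A minimal Python illustration under stated assumptions: `noisy_count` and `two_stage` are hypothetical helper names, and the underlying ε-DP primitive is the Laplace mechanism (introduced later in the lecture).

```python
import numpy as np

def noisy_count(db, eps, rng):
    """eps-DP count of the 1-tags via the Laplace mechanism (sensitivity 1)."""
    return float(np.sum(db) + rng.laplace(scale=1.0 / eps))

def two_stage(db, eps1, eps2, rng):
    """Adaptive composition: the second query may depend on the first
    answer z1; by the composition property the pair is (eps1+eps2)-DP."""
    z1 = noisy_count(db, eps1, rng)
    # The second query adapts to z1: re-count only if the first answer
    # suggests a non-empty database; either branch is eps2-DP on its own.
    if z1 > 0.5:
        z2 = noisy_count(db, eps2, rng)
    else:
        z2 = float(rng.laplace(scale=1.0 / eps2))
    return z1, z2

rng = np.random.default_rng(0)
db = np.array([1, 0, 1, 1, 0])                        # true count: 3
z1, z2 = two_stage(db, eps1=0.5, eps2=0.5, rng=rng)   # total budget: 1.0
```

Note that the total privacy budget is charged per release: each extra adaptive query adds its εi to the sum.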
Example: NO Differential Privacy
U set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: choose and release a few random tags.
Bad event T: only my tag is 1, and my tag is released.
PrA[A(D+Me) ∈ T] ≥ 1/n
PrA[A(D−Me) ∈ T] = 0
The ratio PrA[A(D+Me) ∈ T] / PrA[A(D−Me) ∈ T] is unbounded: not differentially private for any ε!
Size of ε
How small can ε be?
• Cannot be negligible. Why? A hybrid argument:
D, D': totally unrelated databases; utility should be very different.
Consider the sequence D0 = D, D1, D2, …, Dn = D', where Di and Di+1 are adjacent databases.
Applying the ε-DP guarantee n times gives, for each output set T:
Prob[T|D] ≥ Prob[T|D'] · e^−εn
So if ε were negligible, the output distributions on any two databases would be essentially identical, leaving no utility.
How large can it be?
• Think of a small constant.
Answering a single counting query
U set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: output (# of 1's) + noise.
Differentially private, if the noise is chosen properly:
choose the noise from the Laplace distribution.
[Figure: Laplace density centered at 0]
Laplacian Noise
The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b) · e^−|y|/b
Standard deviation: O(b). Take b = 1/ε; then Pr[Y = y] ∝ e^−ε|y|
Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y = y] ∝ e^−ε|y|
Release: q(D) + Lap(1/ε)
For adjacent D, D': |q(D) − q(D')| ≤ 1
For every output a: e^−ε ≤ Pr_{by D}[a] / Pr_{by D'}[a] ≤ e^ε
[Figure: the two Laplace densities, shifted by 1, overlap almost everywhere]
Laplacian Noise: Õ(1/ε)-Error
Take b = 1/ε, so Pr[Y = y] ∝ e^−ε|y|
Pr_{y∼Y}[|y| > k·(1/ε)] = O(e^−k)
Expected error is 1/ε; w.h.p. the error is Õ(1/ε)
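The mechanism above fits in a few lines. A minimal Python sketch, with `laplace_mechanism` as a hypothetical helper name; the empirical check below confirms that the expected absolute error of Lap(1/ε) concentrates near 1/ε.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, eps, rng):
    """Release true_answer + Lap(sensitivity/eps): eps-differentially private."""
    return true_answer + rng.laplace(scale=sensitivity / eps)

rng = np.random.default_rng(1)
eps = 0.1
# For a counting query (sensitivity 1) the error is the Laplace noise itself:
# expected absolute error b = 1/eps, and Pr[|Y| > k/eps] = e^{-k}.
errors = np.abs(rng.laplace(scale=1.0 / eps, size=100_000))
mean_abs_error = errors.mean()          # concentrates near 1/eps = 10
```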
Randomized Response
• Randomized Response technique [Warner 1965]
– A method for polling on stigmatizing questions
– Idea: lie with a known probability
• Specific answers are deniable
• Aggregate results are still valid
• The data is never stored "in the plain"
[Figure: each user perturbs their own bit locally (1 + noise, 0 + noise, 1 + noise, …) before reporting]
“trust no-one”
Popular in DB literature
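A minimal Python sketch of randomized response, using the differentially private calibration (report the true bit with probability e^ε/(1+e^ε)) rather than Warner's exact 1965 parameters; `randomized_response` and `estimate_fraction` are hypothetical helper names.

```python
import numpy as np

def randomized_response(bit, eps, rng):
    """Report the true bit with probability p = e^eps/(1+e^eps), else flip it.
    Each individual report is eps-differentially private on its own."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return bit if rng.random() < p else 1 - bit

def estimate_fraction(reports, eps):
    """Debias the aggregate: E[report] = (1-p) + f*(2p-1) for true fraction f."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(2)
true_bits = rng.integers(0, 2, size=50_000)          # true fraction ≈ 0.5
reports = [randomized_response(int(b), eps=1.0, rng=rng) for b in true_bits]
estimate = estimate_fraction(reports, eps=1.0)       # close to 0.5
```

Individual answers remain deniable, yet the debiased aggregate recovers the population fraction.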
Randomized Response with Laplacian Noise
Initial idea: each user i, on input xi ∈ {0,1}, adds independent Laplace noise of magnitude 1/ε to xi.
Privacy: each increment is protected by Laplace noise, so the report is differentially private whether xi is 0 or 1.
Accuracy: the noise cancels out; error Õ(√T), where T is the total number of users.
Is it too high?
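The Õ(√T) error claim can be checked empirically. A short Python sketch: the aggregate's error is a sum of T independent Lap(1/ε) variables, whose standard deviation is √(2T)/ε.

```python
import numpy as np

# Each of T users adds independent Lap(1/eps) noise to their own bit, so the
# aggregated count's error is a sum of T Laplace variables: standard
# deviation sqrt(2T)/eps, i.e. the Õ(sqrt(T)) error claimed above.
rng = np.random.default_rng(3)
T, eps = 10_000, 1.0
trials = rng.laplace(scale=1.0 / eps, size=(1_000, T)).sum(axis=1)
empirical_std = trials.std()            # near sqrt(2*T)/eps ≈ 141.4
```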
Scaling Noise to Sensitivity
Global sensitivity of a query q: Un → R:
GSq = max_{adjacent D,D'} |q(D) − q(D')|
For a counting query q (with range [0,n]): GSq = 1
The previous argument generalizes: for any query q: Un → R,
release q(D) + Lap(GSq/ε)
• ε-private
• error Õ(GSq/ε)
Scaling Noise to Sensitivity: Many Dimensions
Global sensitivity of a query q: Un → Rd:
GSq = max_{adjacent D,D'} ||q(D) − q(D')||1
The previous argument generalizes: for any query q: Un → Rd,
release q(D) + (Y1, Y2, …, Yd)
– each Yi independent Lap(GSq/ε)
• ε-private
• error Õ(GSq/ε)
Example: Histograms
• Say x1, x2, ..., xn in domain U
• Partition U into d disjoint bins
• q(x1, x2, ..., xn) = (n1, n2, ..., nd) where nj = #{i : xi in j-th bin}
• GSq = 2: changing one person's data can decrease one count and increase another, each by 1
• Sufficient to add Lap(2/ε) noise to each count
Problem: the result might not look like a histogram (noisy counts can be negative or non-integral)
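A minimal Python sketch of the histogram release, with `dp_histogram` as a hypothetical helper name, assuming the GSq = 2 bound stated above.

```python
import numpy as np

def dp_histogram(data, bin_edges, eps, rng):
    """Per-bin counts + independent Lap(2/eps) noise (GS_q = 2: changing one
    record can lower one bin count by 1 and raise another by 1)."""
    counts, _ = np.histogram(data, bins=bin_edges)
    return counts + rng.laplace(scale=2.0 / eps, size=len(counts))

rng = np.random.default_rng(4)
data = rng.random(1_000)                               # n points in [0,1)
noisy_counts = dp_histogram(data, np.linspace(0, 1, 11), eps=0.5, rng=rng)
# Each bin's error is O(1/eps), independent of n; a post-processing step
# (e.g. clamping negatives to 0) can restore a histogram-shaped output.
```

Post-processing does not consume privacy budget, so the clamping step is free.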
Covariance Matrix
• Suppose each person's data is a real vector (r1, r2, ..., rn)
• The database is a matrix X
• The covariance matrix of X is (roughly) the matrix XᵀX
• Its entries measure correlation between attributes
• First step of many analyses, e.g. PCA
Distance to DP with Property
• Suppose P = set of "good" databases
– e.g., well-clustered databases
• Distance of x to P = # of points in x that must be changed to put x in P
• This distance always has GS = 1
• Example: distance to a data set with a "good clustering"
[Figure: a database x and its distance to the set P]
K Means
• An iterative clustering algorithm, always maintaining k centers
Median
Median of x1, x2, ..., xn ∈ [0,1]
• X = 0,…,0,1,…,1 with (n+1)/2 zeros and (n−1)/2 ones: median(X) = 0
• X' = 0,…,0,1,…,1 with (n−1)/2 zeros and (n+1)/2 ones: median(X') = 1
• GSmedian = 1
• Noise magnitude: 1. Too much noise!
• But for "most" neighboring databases X, X', |median(X) − median(X')| is small.
Can we add less noise on "good" instances?
Global Sensitivity vs. Local sensitivity
• Global sensitivity is a worst case over inputs
• Local sensitivity of query q at point D: LSq(D) = max_{D' adjacent to D} |q(D) − q(D')|
• Reminder: GSq = maxD LSq(D)
• Goal: add less noise when local sensitivity is lower
• Problem: can leak information by amount of noise
Local sensitivity of Median
• For sorted X = x1, x2, ..., xm−1, xm, xm+1, ..., xn with median xm:
• LSmedian(X) = max(xm − xm−1, xm+1 − xm)
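The formula above is a one-liner in Python; `local_sensitivity_median` is a hypothetical helper name, assuming odd n and values in [0,1].

```python
def local_sensitivity_median(xs):
    """LS of the median at input xs (odd length, values in [0,1]): changing
    one entry can move the median at most to an adjacent order statistic."""
    xs = sorted(xs)
    m = len(xs) // 2                    # index of the median for odd n
    return max(xs[m] - xs[m - 1], xs[m + 1] - xs[m])

# Tightly clustered data: tiny local sensitivity, even though GS_median = 1.
ls = local_sensitivity_median([0.48, 0.49, 0.50, 0.51, 0.52])   # ≈ 0.01
```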
Sensitivity of Local Sensitivity of Median
Median of x1, x2, ..., xn ∈ [0,1]
• X = 0,…,0,1,…,1 with (n+3)/2 zeros and (n−3)/2 ones: LS(X) = 0
• X' = 0,…,0,1,…,1 with (n+1)/2 zeros and (n−1)/2 ones: LS(X') = 1
X and X' are neighbors, so the noise magnitude must itself be an insensitive function!
Smooth Upper Bound
• Compute a "smoothed" version of local sensitivity
• Design a sensitivity function S(X)
• S(X) is a β-smooth upper bound on LSf(X) if:
– for all X: S(X) ≥ LSf(X)
– for all neighbors X, X': S(X) ≤ e^β · S(X')
• Theorem: if A(X) = f(X) + noise(S(X)/ε), then A is 2ε-differentially private.
Smooth sensitivity
• Smooth sensitivity: S*f(X) = max_Y { LSf(Y) · e^−β·dist(X,Y) }
• Claim: S*f(X) is the smallest β-smooth upper bound on LSf(X).
The Exponential Mechanism [McSherry, Talwar]
A general mechanism that yields:
• Differential privacy
• May yield utility/approximation
• Is defined (and evaluated) by considering all possible answers
The definition does not yield an efficient way of evaluating it.
Applications: approximate truthfulness of auctions, collusion resistance, compatibility.
Example of the Exponential Mechanism
• Data: xi = website visited by student i today
• Range: Y = {website names}
• For each name y, let q(y; X) = #{i : xi = y}
Goal: output the most frequently visited site.
Procedure: given X, output website y with probability proportional to e^εq(y;X)
• Popular sites are exponentially more likely than rare ones
• Website scores don't change too quickly between neighboring databases
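A minimal Python sketch of the website example; `exponential_mechanism` is a hypothetical helper name, the site names and visit counts are made up, and the code uses the e^(εq/2Δ) calibration commonly used for a formal ε-DP guarantee (the slide's e^(εq) differs only by a constant factor in the exponent).

```python
import numpy as np

def exponential_mechanism(scores, eps, sensitivity, rng):
    """Sample index y with probability ∝ exp(eps * q(y) / (2 * sensitivity)):
    eps-DP when each score changes by at most `sensitivity` between
    neighboring databases."""
    logits = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()              # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

rng = np.random.default_rng(5)
sites = ["a.example", "b.example", "c.example"]
visits = [100, 60, 5]      # q(y; X): per-site visit counts, sensitivity 1
winner = sites[exponential_mechanism(visits, eps=0.5, sensitivity=1.0, rng=rng)]
# The most-visited site is exponentially more likely to be released.
```

Note the mechanism only needs the scores, never the raw data, and popular answers dominate the output distribution.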
Projects
Report on a paper
• Apply a notion studied to some known domain
• Check the state of privacy in some setting
Possible topics:
• Privacy in GWAS
• Privacy in crowdsourcing
• Privacy-preserving Wordle
• Unique identification bounds
• How much worse are differential privacy guarantees in estimation
• Contextual privacy
Planned Topics
Privacy of Data Analysis
• Differential Privacy
– Definition and properties
– Statistical databases
– Dynamic data
• Privacy of learning algorithms
• Privacy of genomic data
Interaction with cryptography
• SFE
• Voting
• Entropic security
• Data structures
• Everlasting security
• Privacy-enhancing technologies
– Mix nets
Course Information
Foundations of Privacy - Spring 2010
Instructor: Moni Naor
When: Mondays, 11:00--13:00 (2 points)
Where: Ziskind 1
• Course web page: www.wisdom.weizmann.ac.il/~naor/COURSE/foundations_of_privacy.html
• Prerequisites: familiarity with algorithms, data structures, probability theory, and linear algebra, at an undergraduate level; a basic course in computability is assumed.
• Requirements:
– Participation in discussion in class
• Best: read the papers ahead of time
– Homework: there will be several homework assignments
• Homework assignments should be turned in on time (usually two weeks after they are given)!
– Class project and presentation
– Exam: none planned
Office: Ziskind 248
Phone: 3701
E-mail: moni.naor@