Foundations of Privacy Lecture 3 Lecturer: Moni Naor.
Foundations of Privacy
Lecture 3
Lecturer: Moni Naor
Recap of last week's lecture
• The simulation paradigm for defining and proving security of cryptographic protocols
• The basic impossibility of disclosure prevention: we cannot hope for results that hold under all possible auxiliary information
• Differential privacy: for all adjacent databases, the output probabilities are very close
Extractors and Fuzzy Extractors
Desirable properties of a sanitization mechanism
• Composability: applying the sanitization several times yields graceful degradation; q releases, each ε-DP, are qε-DP
• Robustness to side information: no need to specify exactly what the adversary knows
Differential privacy satisfies both…
Adjacency: D+Me and D-Me
Differential Privacy
Protect individual participants:Probability of every bad event - or any event - increases
only by small multiplicative factor when I enter the DB.
May as well participate in DB…
ε-differentially private sanitizer A
For all DBs D, all Me and all events T:
e^−ε ≤ PrA[A(D+Me) ∈ T] / PrA[A(D−Me) ∈ T] ≤ e^ε ≈ 1+ε
Handles auxiliary input
Dwork, McSherry, Nissim and Smith
Differential Privacy
[Figure: output distributions Pr[response] of A on two adjacent databases; bad responses marked ✗; the ratio of the two probabilities is bounded everywhere]
A gives ε-differential privacy if for all neighboring D1 and D2, and all T ⊆ range(A):
Pr[A(D1) ∈ T] ≤ e^ε · Pr[A(D2) ∈ T]
Neutralizes all linkage attacks.
Composes unconditionally and automatically: the guarantees add up, Σi εi.
Differential Privacy: Important Properties
Handles auxiliary information.
Composes naturally:
• A1(D) is ε1-diffP
• for all z1, A2(D, z1) is ε2-diffP
Then A2(D, A1(D)) is (ε1+ε2)-diffP.
Proof: for all adjacent D, D' and all (z1, z2):
e^−ε1 ≤ P[z1] / P'[z1] ≤ e^ε1 and e^−ε2 ≤ P[z2] / P'[z2] ≤ e^ε2
so e^−(ε1+ε2) ≤ P[(z1,z2)] / P'[(z1,z2)] ≤ e^(ε1+ε2)
where
P[z1] = Pr_{z∼A1(D)}[z = z1], P'[z1] = Pr_{z∼A1(D')}[z = z1]
P[z2] = Pr_{z∼A2(D,z1)}[z = z2], P'[z2] = Pr_{z∼A2(D',z1)}[z = z2]
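The adaptive composition above can be sketched in code. A minimal Python illustration under stated assumptions: `noisy_count` and `two_stage` are hypothetical helper names, and the underlying ε-DP primitive is the Laplace mechanism (introduced later in the lecture).

```python
import numpy as np

def noisy_count(db, eps, rng):
    """eps-DP count of the 1-tags via the Laplace mechanism (sensitivity 1)."""
    return float(np.sum(db) + rng.laplace(scale=1.0 / eps))

def two_stage(db, eps1, eps2, rng):
    """Adaptive composition: the second query may depend on the first
    answer z1; by the composition property the pair is (eps1+eps2)-DP."""
    z1 = noisy_count(db, eps1, rng)
    # The second query adapts to z1: re-count only if the first answer
    # suggests a non-empty database; either branch is eps2-DP on its own.
    if z1 > 0.5:
        z2 = noisy_count(db, eps2, rng)
    else:
        z2 = float(rng.laplace(scale=1.0 / eps2))
    return z1, z2

rng = np.random.default_rng(0)
db = np.array([1, 0, 1, 1, 0])                        # true count: 3
z1, z2 = two_stage(db, eps1=0.5, eps2=0.5, rng=rng)   # total budget: 1.0
```

Note that the total privacy budget is charged per release: each extra adaptive query adds its εi to the sum.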
Example: NO Differential Privacy
U set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: choose and release a few random tags.
Bad event T: only my tag is 1, and my tag is released.
PrA[A(D+Me) ∈ T] ≥ 1/n
PrA[A(D−Me) ∈ T] = 0
The ratio PrA[A(D+Me) ∈ T] / PrA[A(D−Me) ∈ T] is unbounded: not differentially private for any ε!
Size of ε
How small can ε be?
• Cannot be negligible. Why? A hybrid argument:
D, D': totally unrelated databases; utility should be very different.
Consider the sequence D0 = D, D1, D2, …, Dn = D', where Di and Di+1 are adjacent databases.
Applying the ε-DP guarantee n times gives, for each output set T:
Prob[T|D] ≥ Prob[T|D'] · e^−εn
So if ε were negligible, the output distributions on any two databases would be essentially identical, leaving no utility.
How large can it be?
• Think of a small constant.
Answering a single counting query
U set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: output (# of 1's) + noise.
Differentially private, if the noise is chosen properly:
choose the noise from the Laplace distribution.
[Figure: Laplace density centered at 0]
Laplacian Noise
The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b) · e^−|y|/b
Standard deviation: O(b). Take b = 1/ε; then Pr[Y = y] ∝ e^−ε|y|
Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y = y] ∝ e^−ε|y|
Release: q(D) + Lap(1/ε)
For adjacent D, D': |q(D) − q(D')| ≤ 1
For every output a: e^−ε ≤ Pr_{by D}[a] / Pr_{by D'}[a] ≤ e^ε
[Figure: the two Laplace densities, shifted by 1, overlap almost everywhere]
Laplacian Noise: Õ(1/ε)-Error
Take b = 1/ε, so Pr[Y = y] ∝ e^−ε|y|
Pr_{y∼Y}[|y| > k·(1/ε)] = O(e^−k)
Expected error is 1/ε; w.h.p. the error is Õ(1/ε)
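The mechanism above fits in a few lines. A minimal Python sketch, with `laplace_mechanism` as a hypothetical helper name; the empirical check below confirms that the expected absolute error of Lap(1/ε) concentrates near 1/ε.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, eps, rng):
    """Release true_answer + Lap(sensitivity/eps): eps-differentially private."""
    return true_answer + rng.laplace(scale=sensitivity / eps)

rng = np.random.default_rng(1)
eps = 0.1
# For a counting query (sensitivity 1) the error is the Laplace noise itself:
# expected absolute error b = 1/eps, and Pr[|Y| > k/eps] = e^{-k}.
errors = np.abs(rng.laplace(scale=1.0 / eps, size=100_000))
mean_abs_error = errors.mean()          # concentrates near 1/eps = 10
```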
Randomized Response
• Randomized Response technique [Warner 1965]
– A method for polling on stigmatizing questions
– Idea: lie with a known probability
• Specific answers are deniable
• Aggregate results are still valid
• The data is never stored "in the plain"
[Figure: each user perturbs their own bit locally (1 + noise, 0 + noise, 1 + noise, …) before reporting]
“trust no-one”
Popular in DB literature
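A minimal Python sketch of randomized response, using the differentially private calibration (report the true bit with probability e^ε/(1+e^ε)) rather than Warner's exact 1965 parameters; `randomized_response` and `estimate_fraction` are hypothetical helper names.

```python
import numpy as np

def randomized_response(bit, eps, rng):
    """Report the true bit with probability p = e^eps/(1+e^eps), else flip it.
    Each individual report is eps-differentially private on its own."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return bit if rng.random() < p else 1 - bit

def estimate_fraction(reports, eps):
    """Debias the aggregate: E[report] = (1-p) + f*(2p-1) for true fraction f."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(2)
true_bits = rng.integers(0, 2, size=50_000)          # true fraction ≈ 0.5
reports = [randomized_response(int(b), eps=1.0, rng=rng) for b in true_bits]
estimate = estimate_fraction(reports, eps=1.0)       # close to 0.5
```

Individual answers remain deniable, yet the debiased aggregate recovers the population fraction.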
Randomized Response with Laplacian Noise
Initial idea: each user i, on input xi ∈ {0,1}, adds independent Laplace noise of magnitude 1/ε to xi.
Privacy: each increment is protected by Laplace noise, so the report is differentially private whether xi is 0 or 1.
Accuracy: the noise cancels out; error Õ(√T), where T is the total number of users.
Is it too high?
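The Õ(√T) error claim can be checked empirically. A short Python sketch: the aggregate's error is a sum of T independent Lap(1/ε) variables, whose standard deviation is √(2T)/ε.

```python
import numpy as np

# Each of T users adds independent Lap(1/eps) noise to their own bit, so the
# aggregated count's error is a sum of T Laplace variables: standard
# deviation sqrt(2T)/eps, i.e. the Õ(sqrt(T)) error claimed above.
rng = np.random.default_rng(3)
T, eps = 10_000, 1.0
trials = rng.laplace(scale=1.0 / eps, size=(1_000, T)).sum(axis=1)
empirical_std = trials.std()            # near sqrt(2*T)/eps ≈ 141.4
```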
Scaling Noise to Sensitivity
Global sensitivity of a query q: Un → R:
GSq = max_{adjacent D,D'} |q(D) − q(D')|
For a counting query q (with range [0,n]): GSq = 1
The previous argument generalizes: for any query q: Un → R,
release q(D) + Lap(GSq/ε)
• ε-private
• error Õ(GSq/ε)
Scaling Noise to Sensitivity: Many Dimensions
Global sensitivity of a query q: Un → Rd:
GSq = max_{adjacent D,D'} ||q(D) − q(D')||1
The previous argument generalizes: for any query q: Un → Rd,
release q(D) + (Y1, Y2, …, Yd)
– each Yi independent Lap(GSq/ε)
• ε-private
• error Õ(GSq/ε)
Example: Histograms
• Say x1, x2, ..., xn in domain U
• Partition U into d disjoint bins
• q(x1, x2, ..., xn) = (n1, n2, ..., nd) where nj = #{i : xi in j-th bin}
• GSq = 2: changing one person's data can decrease one count and increase another, each by 1
• Sufficient to add Lap(2/ε) noise to each count
Problem: the result might not look like a histogram (noisy counts can be negative or non-integral)
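A minimal Python sketch of the histogram release, with `dp_histogram` as a hypothetical helper name, assuming the GSq = 2 bound stated above.

```python
import numpy as np

def dp_histogram(data, bin_edges, eps, rng):
    """Per-bin counts + independent Lap(2/eps) noise (GS_q = 2: changing one
    record can lower one bin count by 1 and raise another by 1)."""
    counts, _ = np.histogram(data, bins=bin_edges)
    return counts + rng.laplace(scale=2.0 / eps, size=len(counts))

rng = np.random.default_rng(4)
data = rng.random(1_000)                               # n points in [0,1)
noisy_counts = dp_histogram(data, np.linspace(0, 1, 11), eps=0.5, rng=rng)
# Each bin's error is O(1/eps), independent of n; a post-processing step
# (e.g. clamping negatives to 0) can restore a histogram-shaped output.
```

Post-processing does not consume privacy budget, so the clamping step is free.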
Covariance Matrix
• Suppose each person's data is a real vector (r1, r2, ..., rn)
• The database is a matrix X
• The covariance matrix of X is (roughly) the matrix XᵀX
• Its entries measure correlation between attributes
• First step of many analyses, e.g. PCA
Distance to DP with Property
• Suppose P = set of "good" databases
– e.g., well-clustered databases
• Distance of x to P = # of points in x that must be changed to put x in P
• This distance always has GS = 1
• Example: distance to a data set with a "good clustering"
[Figure: a database x and its distance to the set P]
K Means
• An iterative clustering algorithm, always maintaining k centers
Median
Median of x1, x2, ..., xn ∈ [0,1]
• X = 0,…,0,1,…,1 with (n+1)/2 zeros and (n−1)/2 ones: median(X) = 0
• X' = 0,…,0,1,…,1 with (n−1)/2 zeros and (n+1)/2 ones: median(X') = 1
• GSmedian = 1
• Noise magnitude: 1. Too much noise!
• But for "most" neighboring databases X, X', |median(X) − median(X')| is small.
Can we add less noise on "good" instances?
Global Sensitivity vs. Local sensitivity
• Global sensitivity is a worst case over inputs
• Local sensitivity of query q at point D: LSq(D) = max_{D' adjacent to D} |q(D) − q(D')|
• Reminder: GSq = maxD LSq(D)
• Goal: add less noise when local sensitivity is lower
• Problem: can leak information by amount of noise
Local sensitivity of Median
• For sorted X = x1, x2, ..., xm−1, xm, xm+1, ..., xn with median xm:
• LSmedian(X) = max(xm − xm−1, xm+1 − xm)
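The formula above is a one-liner in Python; `local_sensitivity_median` is a hypothetical helper name, assuming odd n and values in [0,1].

```python
def local_sensitivity_median(xs):
    """LS of the median at input xs (odd length, values in [0,1]): changing
    one entry can move the median at most to an adjacent order statistic."""
    xs = sorted(xs)
    m = len(xs) // 2                    # index of the median for odd n
    return max(xs[m] - xs[m - 1], xs[m + 1] - xs[m])

# Tightly clustered data: tiny local sensitivity, even though GS_median = 1.
ls = local_sensitivity_median([0.48, 0.49, 0.50, 0.51, 0.52])   # ≈ 0.01
```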
Sensitivity of Local Sensitivity of Median
Median of x1, x2, ..., xn ∈ [0,1]
• X = 0,…,0,1,…,1 with (n+3)/2 zeros and (n−3)/2 ones: LS(X) = 0
• X' = 0,…,0,1,…,1 with (n+1)/2 zeros and (n−1)/2 ones: LS(X') = 1
X and X' are neighbors, so the noise magnitude must itself be an insensitive function!
Smooth Upper Bound
• Compute a "smoothed" version of local sensitivity
• Design a sensitivity function S(X)
• S(X) is a β-smooth upper bound on LSf(X) if:
– for all X: S(X) ≥ LSf(X)
– for all neighbors X, X': S(X) ≤ e^β · S(X')
• Theorem: if A(X) = f(X) + noise(S(X)/ε), then A is 2ε-differentially private.
Smooth sensitivity
• Smooth sensitivity: S*f(X) = max_Y { LSf(Y) · e^−β·dist(X,Y) }
• Claim: S*f(X) is the smallest β-smooth upper bound on LSf(X).
The Exponential Mechanism [McSherry, Talwar]
A general mechanism that yields:
• Differential privacy
• May yield utility/approximation
• Is defined (and evaluated) by considering all possible answers
The definition does not yield an efficient way of evaluating it.
Applications: approximate truthfulness of auctions, collusion resistance, compatibility.
Example of the Exponential Mechanism
• Data: xi = website visited by student i today
• Range: Y = {website names}
• For each name y, let q(y; X) = #{i : xi = y}
Goal: output the most frequently visited site.
Procedure: given X, output website y with probability proportional to e^εq(y;X)
• Popular sites are exponentially more likely than rare ones
• Website scores don't change too quickly between neighboring databases
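A minimal Python sketch of the website example; `exponential_mechanism` is a hypothetical helper name, the site names and visit counts are made up, and the code uses the e^(εq/2Δ) calibration commonly used for a formal ε-DP guarantee (the slide's e^(εq) differs only by a constant factor in the exponent).

```python
import numpy as np

def exponential_mechanism(scores, eps, sensitivity, rng):
    """Sample index y with probability ∝ exp(eps * q(y) / (2 * sensitivity)):
    eps-DP when each score changes by at most `sensitivity` between
    neighboring databases."""
    logits = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()              # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(scores), p=probs))

rng = np.random.default_rng(5)
sites = ["a.example", "b.example", "c.example"]
visits = [100, 60, 5]      # q(y; X): per-site visit counts, sensitivity 1
winner = sites[exponential_mechanism(visits, eps=0.5, sensitivity=1.0, rng=rng)]
# The most-visited site is exponentially more likely to be released.
```

Note the mechanism only needs the scores, never the raw data, and popular answers dominate the output distribution.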
Projects
Report on a paper
• Apply a notion studied to some known domain
• Check the state of privacy in some setting
Possible topics:
• Privacy in GWAS
• Privacy in crowdsourcing
• Privacy-preserving Wordle
• Unique identification bounds
• How much worse are differential privacy guarantees in estimation
• Contextual privacy
Planned Topics
Privacy of Data Analysis
• Differential Privacy
– Definition and properties
– Statistical databases
– Dynamic data
• Privacy of learning algorithms
• Privacy of genomic data
Interaction with cryptography
• SFE
• Voting
• Entropic security
• Data structures
• Everlasting security
• Privacy-enhancing technologies
– Mix nets
Course Information
Foundations of Privacy - Spring 2010
Instructor: Moni Naor
When: Mondays, 11:00--13:00 (2 points)
Where: Ziskind 1
• Course web page: www.wisdom.weizmann.ac.il/~naor/COURSE/foundations_of_privacy.html
• Prerequisites: familiarity with algorithms, data structures, probability theory, and linear algebra, at an undergraduate level; a basic course in computability is assumed.
• Requirements:
– Participation in discussion in class
• Best: read the papers ahead of time
– Homework: there will be several homework assignments
• Homework assignments should be turned in on time (usually two weeks after they are given)!
– Class project and presentation
– Exam: none planned
Office: Ziskind 248
Phone: 3701
E-mail: moni.naor@