Games for Exchanging Information Gillat Kol Joint work with Moni Naor.
Foundations of Privacy Lecture 4 Lecturer: Moni Naor.
-
Upload
abraham-mccormick -
Category
Documents
-
view
223 -
download
0
Transcript of Foundations of Privacy Lecture 4 Lecturer: Moni Naor.
Recap of last week’s lecture• Differential Privacy• Sensitivity:
– Global sensitivity of query q:Un→Rd
GSq = maxD,D’ ||q(D) – q(D’)||1
– Local sensitivity of query q at point DLSq(D)= maxD’ |q(D) – q(D’)|
– Smooth sensitivity
Sf*(X)= maxY {LSf(Y)e- dist(x,y) }
• Histograms• Differential privacy of median
• Exponential Mechanism
Histograms• Inputs x1, x2, ..., xn in domain U
Domain U partitioned into d disjoint bins S1,…,Sd
q(x1, x2, ..., xn) = (n1, n2, ..., nd) where
nj = #{i : xi in j-th bin}
Can view as d queries: qi counts # spoints in set Si
For adjacent D,D’, only one answer can change - it can change by 1
Global sensitivity of answer vector is 1Sufficient to add Lap(1/ε) noise to each
query, still get ε-privacy
The Exponential Mechanism [McSherry Talwar]
A general mechanism that yields • Differential privacy• May yield utility/approximation• Is defined and evaluated by considering all possible answers
The definition does not yield an efficient way of evaluating it
Application/original motivation: Approximate truthfulness of auctions
• Collusion resistance• Compatibility
Side bar: Digital Goods Auction
• Some product with 0 cost of production• n individuals with valuation v1, v2, … vn
• Auctioneer wants to maximize profit
Example of the Exponential Mechanism• Data: xi = website visited by student i today• Range: Y = {website names}• For each name y, let q(y, X) = #{i : xi = y}
Goal: output the most frequently visited site• Procedure: Given X, Output website y with probability
prop to eq(y,X)
• Popular sites exponentially more likely than rare ones Website scores don’t change too quickly
Size of subset
Setting• For input D 2 UUnn want to find r2RR• Base measure on R - R - usually uniform• Score function q’: UUn n ££ R R R
assigns any pair (D,r) a real value– Want to maximize it (approximately)
The exponential mechanism– Assign output r2RR with probability proportional to
eq’(D,r) (r)
Normalizing factor r eq’(D,r) (r)
The exponential mechanism is private
• Let = maxD,D’,r |q(D,r)-q(D’,r)|
Claim: The exponential mechanism yields a 2¢¢ differentially private solution
• Prob [output = r on input D]
= eq’(D,r) (r)/r eq’(D,r) (r)• Prob [output = r on input D’]
= eq’(D’,r) (r)/r eq’(D’,r) (r)
adjacent
Ratio isbounded by
e e
Laplace Noise as Exponential Mechanism
• On query q:Un→R let q’(D,r) = -|q(D)-r|
• Prob noise = y e-y / 2 y e-y = /2 e-y
Laplace distribution Y=Lap(b) has density function
Pr[Y=y] =1/2b e-|y|/b
y
0 1 2 3 4 5-1-2-3-4
Any Differentially Private Mechanism is an instance of the Exponential Mechanism
• Let M be a differentially private mechanism
Take q’(D,r) to be log Prob[M(D) =r]
Remaining issue: Accuracy
Private Ranking• Each element i 2 {1, … n} has a real valued score
SD(i) based on a data set D.• Goal: Output k elements with highest scores.• Privacy• Data set D consists of n entries in domain D.
– Differential privacy: Protects privacy of entries in D.
• Condition: Insensitive Scores– for any element i, for any data sets D, D’ that differ in one
entry:|SD(i)- SD’(i)| · 1
Approximate ranking
• Let Sk be the kth highest score based on data set D.• An output list is -useful if:
Soundness: No element in the output has score less than Sk -
Completeness: Every element with score greater than Sk + is in the output.
Score · Sk -
Sk + · Score
Sk - · Score · Sk +
Two Approaches• Score perturbation
– Perturb the scores of the elements with noise – Pick the top k elements in terms of noisy scores.– Fast and simple implementation Question: what sort of noise should be added?
What sort of guarantees?
• Exponential sampling– Run the exponential mechanism k times.– more complicated and slower implementationWhat sort of guarantees?
Each input affects all scores
Homework
Exponential Mechanism: Simple Example (almost free) private lunch
Database of n individuals, lunch options {1…k},each individual likes or dislikes each option (1 or 0)
Goal: output a lunch option that many likeFor each lunch option j2 [k], ℓ(j) is # of ind. who like jExponential Mechanism:
Output j with probability eεℓ(j)
Actual probability: eεℓ(j)/(∑i eεℓ(i))
Normalizer
query 1,query 2,. . .
Synthetic DB: Output is a DB
Database
answer 1answer 3
answer 2
?
Sanitizer
Synthetic DB: output also a DB (of entries from same universe X), user reconstructs answers by evaluating query on output DB
Software and people compatibleConsistent answers
Answering More QueriesUsing exponential mechanism
Differential Privacy for every set C of counting queries• Error is Õ(n2/3 log|C|)
Remarkable
Hope for rich private analysis of small DBs!• Quantitative: #queries >> DB size,
• Qualitative: output of sanitizer -synthetic DB-output is a DB itself
Counting Queries
• Queries with low sensitivity
Counting-queriesC is a set of predicates c: U {0,1}Query: how many D participants satisfy c ?
Relaxed accuracy: answer query within α additive error w.h.pNot so bad: error anyway inherent in statistical analysis
Assume all queries given in advance
U
Database D of size n
Query c
Non-interactive
Utility and Privacy Can’t Always Be Achieved Simultaneously
Impossibility results for counting queries:DB with n participants can’t have o(√n) error, O(n) queries
[DiNi, DwMcTa07,DwYe08]In all these cases, strong privacy violation
What can we do? almost entire DB compromised
Huge DBs [Dwork Nissim]
DB of size n >> # queries |C|:
Add independent noise to answer on every query
Noise per query ~ #queriesFor accuracy, need #queries ≤ n
May be reasonable for hugehuge internet-scale DBs,Privacy “for free”
What about smaller DBs?
DB of size n < #queries |C|, impossibility results:
can’t have o(√n) error
Error must be Ω(√n)
The BLR Algorithm
For DBs F and Ddist(F,D) = maxq2C |q(F) – q(D)|
Intuition: far away DBs get smaller probability
Algorithm on input DB D:Sample from a distribution on DBs of size m: (m < n)
DB F gets picked w.p. / e-ε·dist(F,D)
Blum Ligett Roth08
The BLR Algorithm
Idea:• In general: Do not use large DB
– Sample and answer accordingly• DB of size m guaranteeing hitting each query with
sufficient accuracy
The BLR Algorithm: 2ε-Privacy
For adjacent D,D’ for every F|dist(F,D) – dist(F,D’)| ≤ 1
Probability of F by D: e-ε·dist(F,D)/∑G of size m e-ε·dist(G,D)
Probability of F by D’:numerator and denominator can change by eε-factor 2ε-privacy
Algorithm on input DB D:Sample from a distribution on DBs of size m: (m < n)
DB F gets picked w.p. / e-ε·dist(F,D)
The BLR Algorithm: Error Õ(n2/3 log|C|)
There exists Fgood of size m =Õ((n\α)2·log|C|) s.t. dist(Fgood,D) ≤ α
Pr [Fgood] ~ e-εα
For any Fbad with dist 2α, Pr [Fbad] ~ e-2εα
Union bound: ∑ bad DB Fbad Pr [Fbad] ~ |U|me-2εα
For α=Õ(n2/3log|C|), Pr [Fgood] >> ∑ Pr [Fbad]
Algorithm on input DB D:Sample from a distribution on DBs of size m: (m < n)
DB F gets picked w.p. / e-ε·dist(F,D)
The BLR Algorithm: Running Time
Generating the distribution by enumeration:Need to enumerate every size-m database,where m = Õ((n\α)2·log|C|)
Running time ≈ |U|Õ((n\α)2·log|c|)
Algorithm on input DB D:Sample from a distribution on DBs of size m: (m < n)
DB F gets picked w.p. / e-ε·dist(F,D)
Conclusion
Offline algorithm, 2ε-Differential Privacy for anyset C of counting queries
• Error α is Õ(n2/3 log|C|/ε)
• Super-poly running time: |U|Õ((n\α)2·log|C|)
Can we Efficiently Sanitize?
The good news
If the universe is small, Can sanitize EFFICIENTLY
The bad news cannot do much better, namely sanitize in time:
sub-poly(|C|) AND sub-poly(|U|)
Time poly(|C|,|U|)
The Good News: Can Sanitize When Universe is Small
Efficient Sanitizer for query set C• DB size n ¸ Õ(|C|o(1) log|U|)• error is ~ n2/3 • Runtime poly(|C|,|U|)
Output is a synthetic database
Compare to [Blum Ligget Roth]:
n ¸ Õ(log|C| log|U|), runtime super-poly(|C|,|U|)
Recursive Algorithm
C0=C C1 C2 Cb
Start with DB D and large query set CRepeatedly choose random subset Ci+1 of Ci:
shrink query set by (small) factor
Recursive Algorithm
Start with DB D and large query set CRepeatedly choose random subset Ci+1 of Ci:
shrink query set by (small) factorEnd recursion: sanitize D w.r.t. small query set Cb
Output is good for all queries in small set Ci+1
Extract utility on almost-all queries in large set Ci
Fix remaining “underprivileged” queries in large set Ci
C0=C C1 C2 Cb