Sergey Yekhanin Institute for Advanced Study


Page 1: Lower Bounds on Noise

Sergey Yekhanin

Institute for Advanced Study

Page 2: Setting

Database of information about individuals.

E.g. medical history, census data, customer info.

Need to guarantee confidentiality of individual entries.

Want to make deductions about the database and learn large-scale trends.

E.g. learn that a drug V increases the likelihood of heart disease.

Do not leak info about individual patients.

Page 3: Message

Two approaches to database privacy:

Interactive: the analyst asks questions; the curator returns approximate answers.

[Figure: curator and analyst]

Page 4: Message

Two approaches to database privacy:

Interactive: the analyst asks questions; the curator returns approximate answers.

Non-interactive: publish a “summary” of the database; the analyst can use the summary to get answers.

[Figure: curator, analyst, and a published summary]

Page 5: Message

Two approaches to database privacy:

Interactive: the analyst asks questions; the curator returns approximate answers.

Non-interactive: publish a “summary” of the database; the analyst can use the summary to get answers.

Thesis: the interactive approach is the right way to give good accuracy for a given level of privacy. Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information.

Page 6: Plan

Mathematical model of database and queries.

Attacks:

Somewhat accurate answers to all queries lead to privacy leakage (Fourier analysis) [Y] (extends [DiNi]).

Somewhat accurate answers to a fraction of queries lead to privacy leakage (linear programming / polynomial interpolation) [DMT, DY].

Study of privacy leads to a variety of mathematical challenges!

Page 7: Model

[Dinur-Nissim] Simple model (easily justifiable):

Database: an $n$-bit binary vector $x$.

Query: a vector $a$.

True answer: the dot product $a \cdot x$.

Response: $a \cdot x + e$ = True Answer + Noise.

Privacy leakage: the attacker learns a certain bit of $x$.

Blatant non-privacy: the attacker learns $n - o(n)$ bits of $x$.
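The model can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names are hypothetical, and the noise is assumed to be uniform on $[-E, E]$ for a bound $E$ (the model itself only bounds $|e|$). It also shows the simplest form of privacy leakage: if every response had noise below 1/2 (a far stronger assumption than the $o(\sqrt{n})$ regime studied here), asking the query $a = e_i$ and rounding would reveal bit $i$.

```python
import random

def true_answer(a, x):
    """True answer to query a on the n-bit database x: the dot product a . x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def respond(a, x, noise_bound, rng):
    """Curator's response: true answer plus noise e with |e| <= noise_bound.
    Uniform noise is an illustrative assumption; the model only bounds |e|."""
    return true_answer(a, x) + rng.uniform(-noise_bound, noise_bound)

def leak_bit(i, x, noise_bound, rng):
    """Simplest privacy leakage: query the i-th unit vector and round the response.
    Always correct when noise_bound < 1/2 (hypothetical, far below sqrt(n))."""
    e_i = [1 if j == i else 0 for j in range(len(x))]
    return round(respond(e_i, x, noise_bound, rng))

def bits_learned(guess, x):
    """Number of bits of x the attacker's guess gets right (n - o(n) = blatant non-privacy)."""
    return sum(g == b for g, b in zip(guess, x))

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(16)]
guess = [leak_bit(i, x, 0.4, rng) for i in range(len(x))]
print(bits_learned(guess, x), "of", len(x))  # 16 of 16
```

The interesting regime is the opposite one: the attacks below work even when each individual answer is far too noisy to be rounded on its own.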

Page 8: Fourier attack

Theorem: If a curator adds $o(\sqrt{n})$ noise to every response, then an attacker can ask $n$ questions, perform $O(n \log n)$ computation, and recover $n - o(n)$ bits of the database.

Put database records in one-to-one correspondence with elements of a group $\mathbb{Z}_2^k$ (so $n = 2^k$).

Think of a database as a function $D$ from $\mathbb{Z}_2^k$ to $\{0,1\}$.

Choose queries to ask for the Fourier coefficients of $D$.

Noisy Fourier coefficients approximately determine the Boolean function $D$! (Parseval identity)
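A minimal Python sketch of the Fourier attack, under illustrative assumptions that are not taken from the talk: records are indexed by integers $s, t \in \{0, \ldots, n-1\}$ read as vectors in $\mathbb{Z}_2^k$, query $s$ is the $\pm 1$ vector with entries $(-1)^{\langle s, t \rangle}$ (so its true answer is an unnormalized Fourier coefficient of $D$), and every response has error at most $E$. All $n$ answers together form $Hx + e$, where $H$ is the $n \times n$ Walsh-Hadamard matrix. Since $HH = nI$, the attacker computes $z = H(Hx + e)/n = x + He/n$ with one fast transform in $O(n \log n)$ time. By Parseval, $\|z - x\|_2^2 = \|e\|_2^2 / n \le E^2$, so at most $4E^2$ coordinates are off by $1/2$ or more, and rounding recovers all but $4E^2 = o(n)$ bits when $E = o(\sqrt{n})$. The function names are hypothetical.

```python
import random

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform: returns H v in O(n log n) time.
    Entry s is sum_t (-1)^{popcount(s & t)} v[t]; len(v) must be a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def noisy_fourier_answers(x, noise_bound, rng):
    """Curator side (hypothetical): answer all n character queries, each with |error| <= noise_bound."""
    return [ans + rng.uniform(-noise_bound, noise_bound) for ans in fwht(x)]

def fourier_attack(answers):
    """Attacker side: invert the transform (H H = n I) and round each coordinate to a bit."""
    n = len(answers)
    z = [val / n for val in fwht(answers)]
    return [1 if val >= 0.5 else 0 for val in z]

k = 10
n = 2 ** k          # 1024 records, so sqrt(n) = 32
E = 8               # per-answer noise bound, well below sqrt(n)
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(n)]
guess = fourier_attack(noisy_fourier_answers(x, E, rng))
wrong = sum(g != b for g, b in zip(guess, x))
print(wrong, "wrong bits; Parseval bound:", 4 * E * E)
```

The bound $4E^2$ holds for any error vector with entries bounded by $E$, not just uniform noise; the uniform choice only makes the demo concrete.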

Page 9: Linear programming attack

Theorem: If a curator adds $o(\sqrt{n})$ noise to a 0.761 fraction of responses, then an attacker can ask $O(n)$ questions, perform $O(n^3)$ computation, and recover $n - o(n)$ bits of the database.

Arbitrarily large error on an arbitrary and unknown 0.239 fraction of answers.

Page 10: Linear programming attack

Ask $O(n)$ random $+1/-1$ questions.

Obtain $y = Ax + e$, where $e$ is the error vector.

A natural approach to recover $x$ from $y$: solve $\min \|e'\|_0$ subject to $y = Ax' + e'$, $x' \in \mathbb{R}^n$ (hard!).

Instead, solve a linear program [D, CT, MT]: $\min \|e'\|_1$ subject to $y = Ax' + e'$, $x' \in \mathbb{R}^n$.

[Figure: the vectors $y$ and $Ax'$]
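The $\ell_1$ problem becomes an ordinary linear program once an auxiliary variable $t_i$ bounds each residual. The standard reformulation is shown below, where $m = O(n)$ denotes the number of queries and $t \in \mathbb{R}^m$ is an auxiliary vector introduced for this sketch:

```latex
\begin{aligned}
\min_{x' \in \mathbb{R}^n,\; t \in \mathbb{R}^m} \quad & \sum_{i=1}^{m} t_i \\
\text{subject to} \quad & -t_i \;\le\; y_i - (Ax')_i \;\le\; t_i, \qquad i = 1, \ldots, m.
\end{aligned}
```

At an optimum each $t_i$ equals $|y_i - (Ax')_i|$, so the objective is exactly $\|e'\|_1$ with $e' = y - Ax'$. A bit vector can then be obtained by rounding each coordinate of $x'$ to 0 or 1; that final step is an assumption of this sketch, as the slide does not spell it out.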

Page 11: Polynomial interpolation attack

Model: questions have coefficients of size $O(c)$.

Theorem: If a curator adds $o(c)$ noise to a 0.501 fraction of responses, then an attacker can ask $c$ questions, perform $O(c^4)$ computation, and reliably recover any particular bit of the database.

Arbitrarily large error on an arbitrary and unknown 0.499 fraction of answers.

Page 12: Polynomial interpolation attack

Assume $c$ is prime. Think of the space of queries as the linear space $\mathbb{F}_c^n$.

To obtain a reliable answer to the query $x = (1, 0, \ldots, 0)$, draw a degree-two curve through $x$.

Ask all queries that correspond to points on the curve.

Use polynomial interpolation to carefully combine the answers.

[Figure: a degree-two curve through $x$, with query points $q_1, \ldots, q_6$ on it]
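The interpolation step can be illustrated with a small Python sketch. It is a simplified, hypothetical version, not the attack from [DY]: the curve is taken to be $p(t) = x + tu + t^2 v$ with $u, v$ uniform in $\mathbb{F}_c^n$, and instead of $o(c)$ noise, the exact answers are assumed to outnumber the arbitrarily corrupted ones by more than two. With database $d \in \{0,1\}^n$, the function $f(t) = \langle p(t), d \rangle \bmod c$ is a polynomial of degree at most two in $t$ with $f(0) = d_1$, the target bit. Two distinct such polynomials agree on at most two points, so trying every triple of answered points, interpolating over $\mathbb{F}_c$, and keeping the polynomial that agrees with the most answers recovers $f$; this brute force costs $O(c^3)$ triples times $O(c)$ checks. Each single query $p(t)$ with $t \neq 0$ is a uniformly random point of $\mathbb{F}_c^n$, so on its own it does not reveal which bit is targeted. All function names are hypothetical.

```python
import random
from itertools import combinations

def interpolate_at(points, t, c):
    """Evaluate at t, over F_c (c prime), the polynomial of degree < len(points) through points."""
    total = 0
    for i, (ti, yi) in enumerate(points):
        num, den = 1, 1
        for j, (tj, _) in enumerate(points):
            if j != i:
                num = num * (t - tj) % c
                den = den * (ti - tj) % c
        total = (total + yi * num * pow(den, -1, c)) % c
    return total

def curve_queries(n, c, rng):
    """Points p(t) = x + t*u + t^2*v (mod c), t = 1..c-1, on a random curve through x = (1, 0, ..., 0)."""
    x = [1] + [0] * (n - 1)
    u = [rng.randrange(c) for _ in range(n)]
    v = [rng.randrange(c) for _ in range(n)]
    return {t: [(x[i] + t * u[i] + t * t * v[i]) % c for i in range(n)] for t in range(1, c)}

def decode_first_bit(answers, c):
    """Recover f(0) = d_1 from {t: integer answer to p(t)} when a minority of answers is corrupted."""
    pts = [(t, a % c) for t, a in answers.items()]
    best_value, best_agree = None, -1
    for triple in combinations(pts, 3):
        agree = sum(interpolate_at(triple, t, c) == y for t, y in pts)
        if agree > best_agree:
            best_value, best_agree = interpolate_at(triple, 0, c), agree
    return best_value

c, n = 23, 8        # c prime; 22 curve points, i.e. 22 questions
rng = random.Random(2)
d = [rng.randint(0, 1) for _ in range(n)]
queries = curve_queries(n, c, rng)
answers = {t: sum(qi * di for qi, di in zip(q, d)) for t, q in queries.items()}
for t in rng.sample(sorted(answers), 7):    # hypothetical curator corrupts 7 of 22 answers
    answers[t] += rng.randrange(1, c)       # shift by 1..c-1, so the answer is wrong mod c
print(decode_first_bit(answers, c) == d[0])  # True
```

Here 15 exact answers beat the at most 7 + 2 = 9 agreements any wrong degree-two polynomial can collect, so the decoded value is always $d_1$.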

Page 13: Implications

Privacy has a price: there is no safe way to avoid increasing the noise as the number of queries increases.

Applies to the non-interactive setting: any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack.

Cannot just output a noisy table.

Page 14

Non-interactive approach has inherent limitations.

Interactive approach works.

Can also publish a summary, as long as it's clear which stats are accurate and which ones are not.

Future directions:

Fewer queries.

Understand what can and what cannot be done privately.