James Honaker & Salil Vadhan School of Engineering...

CS208: Applied Privacy for Data Science Reidentification & Reconstruction Attacks James Honaker & Salil Vadhan School of Engineering & Applied Sciences Harvard University February 4, 2019


Page 1: James Honaker & Salil Vadhan School of Engineering ...people.seas.harvard.edu/~salil/cs208/spring19/reconstruction-practi… · Harvard University February 4, 2019. Cohen & Nissim

CS208: Applied Privacy for Data Science
Reidentification & Reconstruction Attacks

James Honaker & Salil Vadhan

School of Engineering & Applied Sciences
Harvard University

February 4, 2019


Cohen & Nissim
Linear Program Reconstruction in Practice

• Use queries of sums over random subsets to reconstruct individual data.

• Importantly, the members of the subset are reported in each sum.

• Received the Aircloak Bounty ($5000) for reidentifying challenge data in the Diffix commercial system.


Regression Based Reconstruction

$$y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \ldots + \beta_N x_{N,i} + \epsilon_i$$

Here:

$N$ is the number of people in the database

$i$ is the query index

$y_i$ is the $i$-th query release

$x_{h,i}$ is a $\{0, 1\}$-indicator of whether person $h$ was included in query $i$

$\beta_h$ is $h$'s sensitive data

$\epsilon_i$ is the noise added to the $i$-th query
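The release mechanism described by this model can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `noisy_subset_sums`, the Gaussian noise, and the example bits in `beta` are hypothetical and do not come from the slides.

```python
import random

def noisy_subset_sums(beta, num_queries, noise_scale, seed=0):
    """Return (X, y): X[i][h] is 1 if person h is in query i, y[i] is the noisy sum."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(num_queries):
        # Random subset of people; the membership row is reported with the sum.
        row = [rng.randint(0, 1) for _ in beta]
        # Sum of beta_h over the people included in this query.
        true_sum = sum(b * x for b, x in zip(beta, row))
        # epsilon_i: Gaussian noise is an assumption made for this sketch.
        y.append(true_sum + rng.gauss(0.0, noise_scale))
        X.append(row)
    return X, y

beta = [1, 0, 1, 0, 0, 1]  # hypothetical sensitive bits, one per person
X, y = noisy_subset_sums(beta, num_queries=4, noise_scale=1.0)
```

Each pair `(X[i], y[i])` is one observation of the regression equation above, which is what makes the sensitive bits estimable as coefficients.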


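Regression Based Reconstruction

$$y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \ldots + \beta_N x_{N,i} + \epsilon_i$$

$$7 = 1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0 + 0 \cdot 0 + \ldots + 0 \cdot 1 + 2$$

$$4 = 1 \cdot 0 + 0 \cdot 1 + 1 \cdot 1 + 0 \cdot 1 + \ldots + 0 \cdot 1 + (-1)$$

$$6 = 1 \cdot 0 + 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 + \ldots + 0 \cdot 0 + 1$$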


Regression Based Reconstruction

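Find $\beta_1, \ldots, \beta_N$ s.t.:

$$\hat{\beta} = \operatorname{argmin} \left[ \sum_i (y_i - \hat{y}_i)^2 \right]$$

where $\hat{y}_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \ldots + \beta_N x_{N,i}$

In R see: lm()

In Python see for example: linear_model.LinearRegression() from scikit-learn.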
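A minimal sketch of the least-squares attack on simulated data follows. It uses NumPy's `np.linalg.lstsq` in place of the routines named above, and the database size, number of queries, Gaussian noise scale, and the 0.5 rounding threshold are all assumptions chosen for illustration rather than values from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 20, 200                        # hypothetical: 20 people, 200 queries

beta = rng.integers(0, 2, size=N)     # true sensitive bits beta_h
X = rng.integers(0, 2, size=(k, N))   # row i holds the indicators x_{h,i} of query i
y = X @ beta + rng.normal(0.0, 0.5, size=k)  # noisy releases y_i (assumed Gaussian noise)

# Least squares without an intercept, matching the model above.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Round each estimate to the nearer of 0 and 1 to guess the sensitive bit.
guess = (beta_hat > 0.5).astype(int)
accuracy = (guess == beta).mean()
print("fraction of bits reconstructed:", accuracy)
```

With many more queries than people and modest noise, the estimates tend to cluster near 0 and 1; raising the noise scale or lowering `k` in this sketch degrades the reconstruction.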


Example

From regressionAttack.r:

[Figure: two scatter plots, each titled "Reconstruction of Latino Variable", with "estimate" on the horizontal axis and "sensitive value" (0.0 to 1.0) on the vertical axis.]

First panel (estimates from 0.0 to 1.0): fraction ones correct: 1; fraction zeros correct: 1

Second panel (estimates from −15 to 15): fraction ones correct: 0.53; fraction zeros correct: 0.38


Garfinkel et al.
Understanding Database Reconstruction Attacks on Public Data

• Demonstrates feasibility of Database Reconstruction Attacks (DRAs) on small Census blocks by using their released aggregate statistics.

• 1.5M census blocks with between 1 and 7 residents

• Each released statistic provides a constraint, and some blocks have only one possible dataset that satisfies all constraints.


Example Dataset


SAT Solvers

SAT solvers determine whether a series of logical formulae can be satisfied, and find a solution if one exists.

PicoSAT has R and Python bindings.

In R: install.packages("rpicosat")

In Python: pip install pycosat
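As a rough sketch of how the Python binding is called: `pycosat.solve` takes a list of clauses, each a list of nonzero integers where a negative number denotes a negated variable, and returns either a satisfying assignment or the string "UNSAT". The variable numbering and the brute-force fallback `brute_force_solve` (used only when pycosat is not installed) are assumptions added for this example.

```python
from itertools import product

def brute_force_solve(cnf, num_vars):
    """Hypothetical fallback: try every assignment, mimic pycosat's return format."""
    for bits in product([False, True], repeat=num_vars):
        # A clause holds if at least one of its literals is true under `bits`.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            return [v if bits[v - 1] else -v for v in range(1, num_vars + 1)]
    return "UNSAT"

try:
    import pycosat  # PicoSAT bindings

    def solve(cnf, num_vars):
        return pycosat.solve(cnf)
except ImportError:
    solve = brute_force_solve

# (A or B) and (not C or D), numbering A=1, B=2, C=3, D=4.
cnf = [[1, 2], [-3, 4]]
solution = solve(cnf, 4)       # a list such as [1, -2, -3, 4], or "UNSAT"

# (A) and (not A) has no satisfying assignment.
unsat = solve([[1], [-1]], 1)
```

The brute-force fallback is exponential in the number of variables, so it is only a stand-in for small examples like this one.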


Conjunctive Normal Form

Conjunctive Normal Form (CNF) is expressed as conjunctions of disjunctions, that is, clauses entirely composed of OR ∨, which are bound together by AND ∧. Negations of literals are allowed.

Examples:

• (A ∨ B) ∧ (¬C ∨ D)

• (A ∨ B) ∧ (C ∨ D ∨ E ∨ F)

• (A ∨ B) ∧ (C)

• (A)

Construction:

• A → B is expressed as (¬A ∨ B)

• A ↔ B is expressed as (¬A ∨ B) ∧ (A ∨ ¬B)
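The two construction rules can be checked by comparing truth tables. The helper `equivalent` below is a hypothetical name introduced for this sketch; implication and biconditional are defined independently of their CNF forms so that the comparison is meaningful.

```python
from itertools import product

def equivalent(f, g, num_vars):
    """True if f and g agree on every assignment of num_vars Boolean inputs."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=num_vars))

def implies(a, b):
    # A -> B fails only when A is true and B is false.
    return not (a and not b)

def iff(a, b):
    # A <-> B holds exactly when both sides have the same truth value.
    return a == b

def implies_cnf(a, b):
    return (not a) or b

def iff_cnf(a, b):
    return ((not a) or b) and (a or (not b))

implication_ok = equivalent(implies, implies_cnf, 2)
biconditional_ok = equivalent(iff, iff_cnf, 2)
```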


Dataset

        Actual            Labels
     Sex   Married     Sex   Married
1     1       0         A       F
2     1       1         B       G
3     0       1         C       H
4     0       1         D       I
5     0       0         E       J

Release: $\sum \text{Sex} = 2$, $\sum \text{Married} = 3$, $\sum \text{Sex} \cdot \text{Married} = 1$.
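Because the table has only ten unknown bits (A through J), the three released sums can be checked against every candidate dataset by exhaustive enumeration. The sketch below is a brute-force illustration rather than the SAT encoding; the variable names are introduced here for the example.

```python
from itertools import product

solutions = []
for bits in product([0, 1], repeat=10):
    sex, married = bits[:5], bits[5:]          # A..E and F..J
    if (sum(sex) == 2
            and sum(married) == 3
            and sum(s * m for s, m in zip(sex, married)) == 1):
        solutions.append(tuple(zip(sex, married)))   # one (Sex, Married) pair per row

# Row order carries no information, so compare candidates as sorted lists of rows.
distinct = {tuple(sorted(rows)) for rows in solutions}

print(len(solutions), "labelled assignments,", len(distinct), "distinct dataset(s)")
```

Under these three constraints, the labelled assignments differ only in the order of the rows, so the released sums pin down the set of records up to that ordering.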


Challenge

Transform into CNF:

(A ∧ F) ∨ (B ∧ G) ∨ (C ∧ H) ∨ (D ∧ I) ∨ (E ∧ J)

What about:

(A ∧ F) ⊕ (B ∧ G) ⊕ (C ∧ H) ⊕ (D ∧ I) ⊕ (E ∧ J)

(requires 24 clauses)
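For the first formula (the OR of five ANDs), one possible route to CNF is to distribute OR over AND: each clause picks one variable from every pair. The sketch below builds those clauses and checks them against the original formula by truth table. It is one valid conversion and is not necessarily the intended answer to the challenge; it does not address the ⊕ variant.

```python
from itertools import product

pairs = [("A", "F"), ("B", "G"), ("C", "H"), ("D", "I"), ("E", "J")]
names = [name for pair in pairs for name in pair]

# Distribution: choose one variable per pair, giving 2**5 clauses of 5 positive literals.
cnf = [list(choice) for choice in product(*pairs)]

def original(v):
    """(A and F) or (B and G) or ... or (E and J)."""
    return any(v[a] and v[b] for a, b in pairs)

def cnf_value(v):
    """AND over clauses, each clause an OR of its variables."""
    return all(any(v[x] for x in clause) for clause in cnf)

# Compare the two forms on all 2**10 assignments.
same = all(
    original(dict(zip(names, bits))) == cnf_value(dict(zip(names, bits)))
    for bits in product([False, True], repeat=10)
)
print(len(cnf), "clauses; equivalent to the original:", same)
```

Plain distribution grows exponentially in the number of AND terms, which is why encodings that introduce auxiliary variables are often preferred for larger formulas.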
