Post-processing outputs for better utility

Post on 06-Jan-2016

23 views 0 download

Tags:

description

Post-processing outputs for better utility. CompSci 590.03 Instructor: Ashwin Machanavajjhala. Announcement. Project proposal submission deadline is Fri, Oct 12 noon. Recap: Differential Privacy. For every pair of inputs that differ in one value. For every output …. D 1. D 2. O. - PowerPoint PPT Presentation

Transcript of Post-processing outputs for better utility

Lecture 10 : 590.03 Fall 12 1

Post-processing outputs for better utility

CompSci 590.03Instructor: Ashwin Machanavajjhala

Lecture 10 : 590.03 Fall 12 2

Announcement

• Project proposal submission deadline is Fri, Oct 12 noon.

Lecture 10 : 590.03 Fall 12 3

Recap: Differential Privacy

For every output …

OD2D1

Adversary should not be able to distinguish between any D1 and D2 based on any O

Pr[A(D1) = O] Pr[A(D2) = O] .

For every pair of inputs that differ in one value

< ε (ε>0)log

Lecture 10 : 590.03 Fall 12 4

Recap: Laplacian Distribution

-10-4.300000000000021.4 7.09999999999990

0.10.20.30.40.50.6

Laplace Distribution – Lap(λ)

Database

Researcher

Query q

True answer q(d) q(d) + η

η

h(η) α exp(-η / λ)

Privacy depends on the λ parameter

Mean: 0, Variance: 2 λ2

Lecture 10 : 590.03 Fall 12 5

Recap: Laplace MechanismThm: If sensitivity of the query is S, then the following guarantees ε-

differential privacy.

λ = S/εSensitivity: Smallest number s.t. for any d, d’ differing in one entry,

|| q(d) – q(d’) || ≤ S(q)

Histogram query: Sensitivity = 2• Variance / error on each entry = 2x4/ε2 = O(1/ε2)

Lecture 10 : 590.03 Fall 12 6

This class• What is the optimal method to answer a batch of queries?

Lecture 10 : 590.03 Fall 12 7

How to answer a batch of queries?• Database of values {x1, x2, …, xk}

• Query Set: – Value of x1 η1 = x1 + δ1– Value of x2 η2 = x2 + δ2– Value of x1 + x2 η3 = x1 + x2 + δ3

• But we know that η1 and η2 should sum up to η3!

Lecture 10 : 590.03 Fall 12 8

Two Approaches• Constrained inference

– Ensure that the returned answers are consistent with each other.

• Query Strategy– Answer a different set of strategy queries A– Answer original queries using A

– Universal Histograms– Wavelet Mechanism– Matrix Mechanism

Lecture 10 : 590.03 Fall 12 9

Two Approaches• Constrained inference

– Ensure that the returned answers are consistent with each other.

• Query Strategy– Answer a different set of strategy queries A– Answer original queries using A

– Universal Histograms– Wavelet Mechanism– Matrix Mechanism

Lecture 10 : 590.03 Fall 12 10

Constrained Inference

Lecture 10 : 590.03 Fall 12 11

Constrained Inference• Let x1 and x2 be the original values. We observe noisy values η1,

η2 and η3• We would like to reconstruct the best estimators y1 (for x1/) and

y2 (for x2) from the noisy values.

• That is, we want to find the values of y1, y2 such that:

min (y1-η1)2 + (y2 – η2)2 + (y3 – η3)2

s.t., y1 + y2 = y3

Lecture 10 : 590.03 Fall 12 12

Constrained Inference [Hay et al VLDB 10]

Lecture 10 : 590.03 Fall 12 13

Sorted Unattributed Histograms• Counts of diseases

– (without associating a particular count to the corresponding disease)

• Degree sequence: List of node degrees – (without associating a degree to a particular node)

• Constraint: The values are sorted

Lecture 10 : 590.03 Fall 12 14

Sorted Unattributed HistogramsTrue Values 20, 10, 8, 8, 8, 5, 3, 2Noisy Values 25, 9, 13, 7, 10, 6, 3, 1 (noise from Lap(1/ε))

Proof:?

Lecture 10 : 590.03 Fall 12 15

Sorted Unattributed Histograms

Lecture 10 : 590.03 Fall 12 16

Sorted Unattributed Histograms• n: number of values in the histogram• d: number of distinct values in the histogram• ni: number of times ith distinct value appears in the histogram.

Lecture 10 : 590.03 Fall 12 17

Two Approaches• Constrained inference

– Ensure that the returned answers are consistent with each other.

• Query Strategy– Answer a different set of strategy queries A– Answer original queries using A

– Universal Histograms– Wavelet Mechanism– Matrix Mechanism

Lecture 10 : 590.03 Fall 12 18

Query Strategy

IPrivate

Data

WA

Differential Privacy

A(I) A(I) W(I)~ ~

Original Query Workload

Strategy Query Workload

Noisy StrategyAnswers

Noisy WorkloadAnswers

Lecture 10 : 590.03 Fall 12 19

Range Queries• Given a set of values {x1, x2, …, xn}• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?

Strategy 1: Answer all range queries using Laplace mechanism

• O(n2/ε2) total error. • May reduce using constrained optimization …

Lecture 10 : 590.03 Fall 12 20

Range Queries• Given a set of values {x1, x2, …, xn}• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?

Strategy 1: Answer all range queries using Laplace mechanism

• Sensitivity = O(n2)• O(n4/ε2) total error across all range queries. • May reduce using constrained optimization …

Lecture 10 : 590.03 Fall 12 21

Range Queries• Given a set of values {x1, x2, …, xn}• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?

Strategy 2: Answer all xi queries using Laplace mechanism Answer range queries using noisy xi values.

• O(1/ε2) error for each xi. • Error(q(1,n)) = O(n/ε2)• Total error on all range queries : O(n3/ε2)

Lecture 10 : 590.03 Fall 12 22

Universal Histograms for Range Queries

Strategy 3: Answer sufficient statistics using Laplace mechanismAnswer range queries using noisy sufficient statistics.

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8

[Hay et al VLDB 2010]

Lecture 10 : 590.03 Fall 12 23

Universal Histograms for Range Queries• Sensitivity: log n• q(2,6) = x2+x3+x4+x5+x6 Error = 2 x 5log2n/ε2

= x2 + x34 + x56 Error = 2 x 3log2n/ε2

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8

Lecture 10 : 590.03 Fall 12 24

Universal Histograms for Range Queries• Every range query can be answered by summing at most log n

different noisy answers• Maximum error on any range query = O(log3n / ε2)• Total error on all range queries = O(n2 log3n / ε2)

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8

Lecture 10 : 590.03 Fall 12 25

Universal Histograms & Constrained Inference

• Can further reduce the error by enforcing constraintsx1234 = x12 + x34 = x1 + x2 + x3 + x4

• 2-pass algorithm to compute a consistent version of the counts

[Hay et al VLDB 2010]

Lecture 10 : 590.03 Fall 12 26

Universal Histograms & Constrained Inference

• Pass 1: (Bottom Up)

• Pass 2: (Top down)

[Hay et al VLDB 2010]

Lecture 10 : 590.03 Fall 12 27

Universal Histograms & Constrained Inference

• Resulting consistent counts – Have lower error than noisy counts (upto 10 times smaller in some cases)– Unbiased estimators – Have the least error amongst all unbiased estimators

Lecture 10 : 590.03 Fall 12 28

Next Class• Constrained inference

– Ensure that the returned answers are consistent with each other.

• Query Strategy– Answer a different set of strategy queries A– Answer original queries using A

– Universal Histograms– Wavelet Mechanism– Matrix Mechanism