LightDP: Towards Automating Differential Privacy Proofsdbz5017/pub/popl17-slides.pdfRelated Work DP...

25
LightDP: Towards Automating Differential Privacy Proofs Danfeng Zhang Daniel Kifer Penn State University

Transcript of LightDP: Towards Automating Differential Privacy Proofsdbz5017/pub/popl17-slides.pdfRelated Work DP...

  • LightDP: Towards Automating Differential Privacy Proofs

    Danfeng Zhang Daniel KiferPenn State University

  • 2

    Database w/Alice’s data

    Database w/oAlice’s data

    Alice’s data remain private if 𝜇", 𝜇$ are close

    𝜇" 𝜇$

  • (Pure) Differential Privacy

    3

    𝜇" 𝜇$If for any adjacent databases and value 𝑣, 𝜇"(𝑣)/𝜇$(𝑣) ≤ 𝑒+for some constant 𝜖, thena computation is 𝜖-private

    𝜇"(𝑣) 𝜇$(𝑣)

    PrivacyCost

  • Motivation

    4

    Rigorous methods are needed for differential privacy proofs

    DP has seen explosive growth since 2006•U.S. Census Bureau LEHD OnTheMap tool

    [Machanavajjhala et al. 2008]•Google Chrome Browser [Erlingsson et al. 2014]•Apple’s new data collection efforts [Greenberg 2016]

    But also accompanied with flawed (paper-and-pencil) proofs• e.g., ones categorized in [Chen&Machanavajjhala’15, Lyu et al.’16]

  • Related WorkDP programming platforms (e.g., PINQ, Airavat)•Use (instead of verify) basic DP mechanisms•Cannot offer tight bounds for sophisticated algorithms

    Methods based on customized logics• Steep learning curve• Heavy annotation burden

    5

    LightDP offers a better balance between expressiveness and usability

  • LightDP: Overview

    6

    SourceProgram

    Relational,DependentTypeSystem

    TargetProgramwithdistinguished variable

    Source program type checks

    Source program is 𝜖-private

    Main Theoremv+ bounded by constant 𝜖in the target program

  • Source Language: Syntax

    7

    Random variable

    Random Expression

    (e.g., Laplace dist.)

  • Source Language: SemanticsMemory: mapping from variables to values

    8

    Initial memory

    Final memory dist.

    Adjacent memory

    Final memory dist.

    RelationalReasoningviaTypeSystem

  • Relational Types

    9

    Related Memories𝑥:u 𝑥: u𝑦: v 𝑦: v+1

    ExampleΓ 𝑥 : num6Γ(𝑦): num"

    Base Type Distance

    e.g., int, real

  • Dependent Types

    10

    Can be a program variable

    Related Memories𝑥: u 𝑥: u𝑦: v 𝑦: v + u

    ExampleΓ 𝑥 : num6Γ(𝑦): num8

  • Dependent Types

    11

    Related Memories𝑥:u 𝑥: u

    𝑦: v 𝑦: 9v + 2, u ≥ 1v, u < 1

    Can be a non-prob. expression

    ExampleΓ 𝑥 : num6

    Γ(𝑦): num8>"?$:6

    𝑚"Γ𝑚$ if 𝑚" and𝑚$are related by Γ

    Notation

  • 12

    (for the non-probabilistic subset)Types form an invariant on two related program executions:

    Then after executinga well-typed program,final memories

    𝑚" 𝑚$

    𝑚"A 𝑚$A

    If initial memories Γ

    Γ

    Enforced by a type system

  • Type System

    13

    Expression:

    e.g.,

    +| −

    < | > | = | ≤ | ≥

  • Type System

    14

    Command:

    e.g.,

    Distance must be identical

    Related executions take same branch

  • Relating Two Distributions

    15

    𝜇" 𝜇$

    Program𝜂:= Lap 𝑟

    Laplace dist. w/ mean 0 and a scale factor 𝑟

    Γ 𝜂 = num6With no

    cost

    𝜇"Γ𝜇$ w.r.t.privacycost𝝐 if ∀𝑚.𝜇"(𝑚)/𝜇$(Γ(𝑚)) ≤ 𝑒+

  • 𝜂 mayhaveanarbitrarydistance,whichaffectstheaddedcost

    Observation

    Relating Two Distributions

    16

    𝜇"Γ𝜇$ w.r.t.privacycost𝜖 if ∀𝑚.𝜇"(𝑚)/𝜇$(Γ(𝑚)) ≤ 𝑒+

    𝜇" 𝜇$

    Program𝜂:= Lap 𝑟

    Laplace dist. w/ mean 0 and a scale factor 𝑟

    Γ 𝜂 = num"With cost 𝟏/𝒓 due to

    dist. property

  • 17

    𝜂has a polymorphic type

    source program target program, explicitly tracks added privacy cost

    Non-deterministic operation

    𝜇

    Intuitively, target program computesthe added cost for one sample fromdistribution

    𝜂 mayhaveanarbitrarydistance,whichaffectstheaddedcost

    Observation

  • In General

    18

    SourceprogramTargetprogramwithdistinguished variable

    TypeSystem

    source program target program

  • Target Language

    19

    Verification task in the target language:Proving is bounded by some constant 𝜖 in any execution

    (in a non-probabilistic program)

    A safety property. Can be verified using off-the-shelf tools

    (e.g., Hoare logic, model checking)

    set x to arbitrary value

  • Putting TogetherThe Sparse Vector Method [Dwork and Roth’14]

    20

    Source Program•Correctness proof is subtle

    Incorrect variants categorized in [Chen&Machanavajjhala’15, Lyu et al.’16]

    •Formally verified very recently [Barthe et al. 2016] with heavy annotation burden

  • Required Types

    21

    Distance depends on the value of 𝑖th query

    answer (𝑞[𝑖])

    Types can be inferred by the inference algorithm of LightDP

    Type Inference

  • Target Program

    22

  • Completing the Proof

    23

    Loop Invariant

    Postcondition:

    Sourceprogramtypechecks+boundedbyconstant𝜖=sourceprogramis𝜖-private

    Main Theorem

  • More in the PaperType inference algorithm

    Searching for proof with minimum cost w/ MaxSMT

    Formal proof for the main theorem

    More verified examples (with little manual efforts)

    24

  • Summary

    25

    Automated by inference engine

    A safety property (verified by

    existing tools)

    SourceProgram

    Relational,DependentTypeSystem

    TargetProgramwithdistinguished variable

    Decomposing differential privacy into subtasks substantially simplifies language-based proof