A Propagation Model for Provenance Views of Public/Private Workflows

Susan Davidson (U. of Pennsylvania), Tova Milo (Tel Aviv U.), Sudeepa Roy (U. of Washington)

ICDT 2013, 3/19/2013


Page 3: Workflows

Visual representation of a number of processes that interact to produce one or more outputs given some inputs

Modeled as a directed acyclic graph

Vertices = Modules/Processes

Edges = Dataflow

In an execution of the workflow, data values appear on the edges

[Figure: example workflow DAG from start to end with modules Split Entries, Align Sequences, Functional Data, Curate Annotations, Format (three instances), and Construct Trees; sample edge values d1 = <x1, x2, x3>, d2 = <y1, y2>, d3 = <z1>]
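The DAG view above can be sketched in a few lines of Python. This is only an illustration: the module names, their functions, and the `WORKFLOW` layout are made up for the example and are not the bioinformatics modules from the figure. It runs the modules in topological order and records the value that appears on each edge.

```python
from graphlib import TopologicalSorter

# Hypothetical toy workflow: module name -> (predecessors, function).
# Each function receives the outputs of its predecessors, in order.
WORKFLOW = {
    "split": ([], lambda: ("x1", "x2", "x3")),
    "align": (["split"], lambda d: d[:2]),
    "format": (["align"], lambda d: tuple(s.upper() for s in d)),
    "trees": (["format"], lambda d: (len(d),)),
}

def execute(workflow):
    """Run the modules in topological order and return the value on each edge."""
    graph = {name: preds for name, (preds, _) in workflow.items()}
    outputs, edge_values = {}, {}
    for name in TopologicalSorter(graph).static_order():
        preds, fn = workflow[name]
        outputs[name] = fn(*(outputs[p] for p in preds))
        for p in preds:
            edge_values[(p, name)] = outputs[p]
    return edge_values

for edge, value in execute(WORKFLOW).items():
    print(edge, value)
```

The dictionary returned by `execute` is the raw material for provenance: one entry per edge per run.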

Page 4: Data Provenance in Workflows

Provenance questions: Which processes were executed? How has this tree been generated?

Track Provenance: Record and show all data values in all executions
• Helps validate the experiment
• Ensures repeatability and debugging

      Run 1          Run 2          . . .
d1    <x1, x2, x3>   <u1, u2, u3>   . . .
d2    <y1, y2>       <v1, v2>       . . .
d3    <z1>           <w1>           . . .
. . .

But, many private/proprietary elements … Our focus: Module Privacy

[Figure: the example workflow from the previous slide, with data items d1, d2, d3 on its edges]

Page 5: Motivation: Module Privacy

Revealing all data as provenance in an execution can reveal module behavior

Goal: Partially hide provenance to protect the privacy of modules when they belong to a workflow

[Figure: data items d1 = <x1, x2, x3>, d2 = <y1, y2>, d3 = <z1> from the example execution]

Page 6: Public/Private Workflows

Private Modules (no a priori knowledge to the user)
e.g. Modules for gene sequencing, drug synthesis, etc.

Public Modules (full knowledge to the user)
e.g. Modules for reformatting, sorting, display, etc.

[Figure: example workflow whose modules are labeled Private or Public; one public module is Reformatting]

Page 7: Definition: Module Privacy

Module f takes input x, produces output y = f(x)

[Figure: Module f with inputs x1, x2, x3, x4 and outputs y1, y2, y3, so f(x1, x2, x3, x4) = <y1, y2, y3>]

Given privacy requirement L, for all inputs x to a private module f,
f(x) has ≥ L ‘equivalent’ candidate values w.r.t. visible provenance data
(similar to L-diversity [MKGV’07])
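A brute-force reading of this definition, for a single module taken on its own, can be sketched as follows. The sketch assumes that inputs stay visible and only some output positions are hidden, and that a candidate is any output the input receives in some function that looks identical on the visible attributes. The function names and the one-bit module `f` are hypothetical, and the enumeration is exponential, so it only suits tiny examples.

```python
from itertools import product

def project(rows, visible_out):
    """Keep the full input and only the visible output positions of each row."""
    return {(x, tuple(y[j] for j in visible_out)) for x, y in rows}

def candidate_outputs(module, out_arity, visible_out, domain=(0, 1)):
    """Map each input x to the set of outputs it receives in some possible world."""
    inputs = sorted(module)
    target = project(module.items(), visible_out)
    all_outputs = list(product(domain, repeat=out_arity))
    candidates = {x: set() for x in inputs}
    # A possible world here is any function on the same inputs whose
    # projection on the visible attributes matches the real module's.
    for outs in product(all_outputs, repeat=len(inputs)):
        world = dict(zip(inputs, outs))
        if project(world.items(), visible_out) == target:
            for x, y in world.items():
                candidates[x].add(y)
    return candidates

def is_standalone_private(module, out_arity, visible_out, L):
    """True if every input has at least L candidate outputs."""
    found = candidate_outputs(module, out_arity, visible_out)
    return all(len(ys) >= L for ys in found.values())

# Hypothetical one-input, one-output Boolean module with f(x) = x.
f = {(0,): (0,), (1,): (1,)}
print(is_standalone_private(f, 1, visible_out=[], L=2))   # True: output hidden
print(is_standalone_private(f, 1, visible_out=[0], L=2))  # False: output visible
```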

Page 8: ‘Equivalent Candidates’ and Provenance Views

Workflow: module executions as relations with functional dependencies (x → y for the first module, y → z for the second)

Example workflow x → y → z:

         x y     y z     x y z
Run 1    0 0     0 1     0 0 1
Run 2    1 1     1 0     1 1 0

Output a provenance view (incomplete provenance): projection on visible attributes

Possible worlds: same projection and respect the functional dependency

Possible worlds for the first module (x, y):
{(0, 0), (1, 1)}, {(0, 0), (1, 0)}, {(0, 1), (1, 1)}, {(0, 1), (1, 0)}

Standalone-private view: each input maps to L = 2 different outputs across the possible worlds

Workflow-private view: possible worlds should respect all functional dependencies

Candidate relations for the second module (y, z): {(0, 1), (1, 0)} and {(0, 0), (1, 1)}; {(0, 1), (1, 1)} is not a possible world
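The workflow-level notion of possible worlds can be checked by enumeration on the slide's two-module chain. The sketch below assumes both modules are private, that y is the hidden attribute, and that x and z stay visible; the helper names are hypothetical. A world is any relation over (x, y, z) produced by some pair of functions, so both functional dependencies hold by construction.

```python
from itertools import product

DOMAIN = (0, 1)
ATTRS = ("x", "y", "z")

def functions():
    """All total functions DOMAIN -> DOMAIN, each as a dict."""
    return [dict(zip(DOMAIN, outs)) for outs in product(DOMAIN, repeat=len(DOMAIN))]

def relation(f, g):
    """Rows (x, y, z) of the chain x -f-> y -g-> z, one per input."""
    return {(x, f[x], g[f[x]]) for x in DOMAIN}

def project(rows, visible):
    idx = [ATTRS.index(a) for a in visible]
    return {tuple(row[i] for i in idx) for row in rows}

def possible_worlds(f, g, visible):
    """Distinct workflow relations with the same visible projection as the real one."""
    target = project(relation(f, g), visible)
    worlds = {frozenset(relation(f2, g2))
              for f2, g2 in product(functions(), functions())}
    return [w for w in worlds if project(w, visible) == target]

f = {0: 0, 1: 1}   # y = x, as in the slide's (x, y) table
g = {0: 1, 1: 0}   # z = NOT y, as in the slide's (y, z) table

worlds = possible_worlds(f, g, visible=("x", "z"))   # y is hidden
candidates = {x: {row[1] for w in worlds for row in w if row[0] == x}
              for x in DOMAIN}
print(len(worlds), candidates)
```

Under these assumptions two worlds survive, and each input x keeps two candidate values for y. A world in which the second module maps both y values to 1 is rejected because its projection on (x, z) no longer matches.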

Page 9: Previous Work: Module Privacy for Workflow Provenance

[Davidson-Khanna-Milo-Panigrahi-R.: PODS’11]

“Composability Theorem”: if all modules are private (no public modules), any combination of standalone-private views gives workflow-private views for all of them

Hiding union of hidden attributes in standalone solutions

[Figure: workflow x → y → z with no public modules; relations (x, y) = {(0, 0), (1, 1)}, (y, z) = {(0, 1), (1, 0)}, (x, y, z) = {(0, 0, 1), (1, 1, 0)}]

Page 10: Why care about composability?

• Compose local standalone-private solutions arbitrarily to get a global workflow-private solution (which is hard to find directly)

• Finding local solutions is NP-hard too, but in the number of attributes of a single module, which is smaller than the number of attributes in the whole workflow

• We can do preprocessing, or exploit module designers’ knowledge

• But composability fails with public modules, which are common in workflows

Page 11: Problem with Public Modules

Composability theorem does not hold any more

Our solution in [DKMPR ’11]: “Privatize” some public modules

Does not work when module’s identity can be guessed from attribute names, connections, etc.

This work: Propagate hiding through public modules

[Figure: workflow with a Private module and a Public module; example values 0, 1, 1]
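A small enumeration shows why a public successor breaks composability and why propagating the hiding helps. The sketch assumes a two-module chain x → y → z over Boolean values, with a private module f and a public module g that cannot be redefined; x is visible and y is hidden. The function name and the choice of f and g are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1)

def candidates_with_public_successor(f, g, hide_z):
    """Candidate y values for each x when y is hidden and the public g is fixed."""
    found = {x: set() for x in DOMAIN}
    for outs in product(DOMAIN, repeat=len(DOMAIN)):
        f2 = dict(zip(DOMAIN, outs))
        # g is public, so a possible world may only redefine the private f.
        # If z is visible, the redefined f must reproduce the observed z values.
        consistent = hide_z or all(g[f2[x]] == g[f[x]] for x in DOMAIN)
        if consistent:
            for x in DOMAIN:
                found[x].add(f2[x])
    return found

f = {0: 0, 1: 1}   # private module, y = x
g = {0: 1, 1: 0}   # public module, z = NOT y

print(candidates_with_public_successor(f, g, hide_z=False))
print(candidates_with_public_successor(f, g, hide_z=True))
```

With z visible, each x is left with a single candidate for y, because the known public function can be inverted. Once the hiding is propagated to z, both candidates return.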

Page 12: This paper: A Propagation Model

• Find standalone-private solution for private modules (only outputs are hidden; hiding inputs may not work in public/private workflows)

• In a workflow, propagate hiding of attributes through public successors

• Repeatedly propagate hiding

• Can we stop at a private successor?
– Yes: For single-predecessor workflows
– No: For general workflows

Page 13: Single-Predecessor Workflows

• (Intuitively) Every public module has at most one private predecessor

• Still can have complex structure

• Special cases: Chains/Trees

• Propagate hiding in “public closure” (reachable through undirected public path from a hidden output attribute)

• Next, how much to hide
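One way to read the “public closure” bullet is as a breadth-first search that starts at the hidden output attributes and only ever passes through public modules, following edges in either direction. The sketch below follows that reading; the data layout, the function name, and the example workflow are assumptions for illustration, not the paper's formal definition.

```python
from collections import deque

def public_closure(modules, hidden_attrs):
    """Public modules reachable from the hidden attributes via public modules only.

    modules maps a name to {"public": bool, "inputs": set, "outputs": set}.
    Edges are followed in both directions (undirected), but never through
    a private module.
    """
    closure = set()
    seen = set(hidden_attrs)
    queue = deque(hidden_attrs)
    while queue:
        attr = queue.popleft()
        for name, m in modules.items():
            if not m["public"] or name in closure:
                continue
            if attr in m["inputs"] or attr in m["outputs"]:
                closure.add(name)
                for other in m["inputs"] | m["outputs"]:
                    if other not in seen:
                        seen.add(other)
                        queue.append(other)
    return closure

# Hypothetical chain: p1 (private) -> q1 -> q2 (public) -> p2 (private) -> q3 (public)
MODULES = {
    "p1": {"public": False, "inputs": {"a0"}, "outputs": {"a1"}},
    "q1": {"public": True, "inputs": {"a1"}, "outputs": {"a2"}},
    "q2": {"public": True, "inputs": {"a2"}, "outputs": {"a3"}},
    "p2": {"public": False, "inputs": {"a3"}, "outputs": {"a4"}},
    "q3": {"public": True, "inputs": {"a4"}, "outputs": {"a5"}},
}

print(sorted(public_closure(MODULES, {"a1"})))  # ['q1', 'q2']
```

In this example the search stops at the private module p2, so q3 is not part of the closure of p1's hidden output.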

Page 14: Upstream/Downstream Safety for Public Modules

• Visible attributes of public modules should not reveal any information

• Upstream/Downstream-safe (UD-safe):
– Equivalent inputs ⇒ equivalent outputs
– Equivalent outputs ⇒ (all) equivalent inputs

Hiding everything is trivially UD-safe

Example public module with inputs a1, a2 and outputs a3, a4:

a1 a2 a3 a4
0  0  1  0
0  1  1  0
1  0  0  1
1  1  0  1

[Figure: two choices of hidden attributes for this module, one UD-safe and one not UD-safe]
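The two UD-safety conditions can be tested directly on a module's table. The sketch below uses one reading of the slide: two rows are equivalent on inputs (or outputs) when they agree on the visible input (or output) columns, and the module is UD-safe when input equivalence and output equivalence coincide for every pair of rows. The function name and the column-index encoding are assumptions, and the three hidden sets tried at the end are illustrative choices, not necessarily the ones drawn in the figure.

```python
from itertools import combinations

def is_ud_safe(rows, n_inputs, visible):
    """Check upstream/downstream safety of a module given as a full table.

    rows: tuples of input values followed by output values.
    visible: set of column positions that remain visible.
    """
    n_cols = len(rows[0])
    vis_in = [i for i in range(n_inputs) if i in visible]
    vis_out = [i for i in range(n_inputs, n_cols) if i in visible]

    def proj(row, cols):
        return tuple(row[i] for i in cols)

    for r, s in combinations(rows, 2):
        same_in = proj(r, vis_in) == proj(s, vis_in)
        same_out = proj(r, vis_out) == proj(s, vis_out)
        # Downstream: same_in must imply same_out.
        # Upstream: same_out must imply same_in, for every pair of preimages.
        if same_in != same_out:
            return False
    return True

# The module table from the slide: columns a1, a2 (inputs), a3, a4 (outputs).
TABLE = [(0, 0, 1, 0), (0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 1)]

print(is_ud_safe(TABLE, 2, visible={0, 2, 3}))  # hide a2: True
print(is_ud_safe(TABLE, 2, visible={1, 3}))     # hide a1 and a3: False
print(is_ud_safe(TABLE, 2, visible=set()))      # hide everything: True
```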

Page 15: Composability Theorem for Single-Predecessor Workflows

Theorem: Each private module is workflow-private if the hidden attributes satisfy …

1. The private module is standalone-private

2. Public modules in public-closure are UD-safe

3. No unnecessary hiding

Two levels of composability
1. Inside public closure for a given private module
2. Among different private modules

The single-predecessor workflow and UD-safety conditions are both necessary

Page 16: Optimal Composition for Single-Predecessor Workflows

Theorem: Each private module is workflow-private if the hidden attributes satisfy …

1. The private module is standalone-private

2. Public modules in public-closure are UD-safe

3. No unnecessary hiding

Steps:

Find list of standalone-private solutions for private modules

Find list of UD-safe solutions for public modules

Optimally compose to find solution for a single private module
• NP-hard for general DAG
• PTIME for trees/chains

Arbitrarily compose to find solution for all private modules (easy for single-predecessor workflows)

Page 17: Proof Sketch of Composability Theorem - 1

Analysis for a single private module is sufficient: public closures are disjoint

Step 1: Assume only one composite module in public closure

If individual modules are UD-safe, the composite module is also UD-safe (by induction)

Page 18: Proof Sketch of Composability Theorem - 2

Step 2: Standalone to Workflow Privacy

• Privacy means many candidates for f(x)

• If y is a candidate of f(x) when f is standalone, y is still a candidate when f is in a workflow

• Show existence of possible worlds by redefining private modules

Need to handle new conflicts at other inputs/outputs
• Cannot redefine public modules: UD-safety helps
• More complex structure in general

[Figure: modules f, g, h with attributes x, y, z, comparing expected and observed values; one case shows a conflict, the other no conflict]

Page 19: About General Workflows

• Find standalone-private solution for private modules (only outputs are hidden; hiding inputs may not work with public modules)

• In a workflow, propagate hiding of attributes through public successors

• Repeatedly propagate hiding

• Can we stop at a private successor?
– Yes: For single-predecessor workflows
– No: For general workflows
– Solution: propagate through private successors as well

Page 20: Related Work

• Workflow privacy (mainly access control)
– Chebotko et al. ’08, Gil et al. ’07, ’10

• Secure Provenance
– Tan et al. ’06, Hasan et al. ’07, Braun et al. ’08, Ni et al. ’09, Chong ’09, Cadenhead et al. ’11, Cheney ’11

• L-Diversity and its limitations
– Machanavajjhala et al. ’06, Ganta et al. ’08, Kifer ’09, Fang et al. ’08, Cormode et al. ’11, Xiao et al. ’10, Wong et al. ’07

• Privacy-preserving data mining
– Surveys by Aggarwal-Yu ’08, Verykios et al. ’04

• Differential Privacy/Privacy in statistical databases
– Survey by Dwork ’08

Page 21: Conclusions

• Workflow-privacy of modules by data hiding in public/private workflows
• Propagating hiding through public modules
• Composability Theorem and Optimization Problems

Future Work:
• Extend to stronger notion of privacy
– Differential Privacy?
– Randomization may not work for scientific experiments
– Can our possible world model be useful?
• Applicability in practice

Page 22: Thank You

Questions