ICDT 2013 1
A Propagation Model for
Provenance Views of
Public/Private Workflows
Susan Davidson U. of Pennsylvania
Tova Milo Tel Aviv U.
Sudeepa Roy U. of Washington
3/19/2013
Workflows
• Visual representation of a number of processes that interact to produce one or more outputs given some inputs
• Modeled as a directed acyclic graph
– Vertices = Modules/Processes
– Edges = Dataflow
• In an execution of the workflow, data values appear on the edges
[Figure: example workflow from start to end with modules Split Entries, Align Sequences, Functional Data, Curate Annotations, Format (three times), and Construct Trees; edges d1, d2, d3 carry the data values <x1, x2, x3>, <y1, y2>, <z1>]
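A minimal Python sketch of this model, assuming a hypothetical three-module chain. The module names, the attribute `raw`, and the string operations are illustrative and are not the modules in the figure. Each module reads and writes named attributes; running the modules in topological order records the data value on every edge. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: module name -> (input attributes, output attributes, function).
# Each function returns one value per output attribute.
MODULES = {
    "split":  (["raw"], ["d1"], lambda raw: [raw.split(",")]),
    "align":  (["d1"], ["d2"], lambda d1: [sorted(d1)]),
    "format": (["d2"], ["d3"], lambda d2: ["|".join(d2)]),
}

def run_workflow(modules, initial):
    """Execute the modules in dependency order; return the value on every edge."""
    # Which module produces each attribute.
    producer = {a: name for name, (_, outs, _) in modules.items() for a in outs}
    # Module -> set of modules it depends on (raises CycleError if not a DAG).
    deps = {
        name: {producer[a] for a in ins if a in producer}
        for name, (ins, _, _) in modules.items()
    }
    values = dict(initial)
    for name in TopologicalSorter(deps).static_order():
        ins, outs, fn = modules[name]
        values.update(zip(outs, fn(*(values[a] for a in ins))))
    return values

provenance = run_workflow(MODULES, {"raw": "b,a,c"})
```

Here `provenance` plays the role of one run's column in a provenance table: one recorded value per edge.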
Data Provenance in Workflows
Which processes were executed?
Track provenance: record and show all data values in all executions
• Helps validate the experiment
• Ensures repeatability and debugging
But many elements are private/proprietary …
Our focus: Module Privacy
      Run 1          Run 2          . . .
d1    <x1, x2, x3>   <u1, u2, u3>   . . .
d2    <y1, y2>       <v1, v2>       . . .
d3    <z1>           <w1>           . . .
. . .
How has this tree been generated?
[Figure: the example workflow from start to end (Split Entries, Align Sequences, Functional Data, Curate Annotations, three Format modules, Construct Trees) with provenance recorded on edges d1, d2, d3]
Motivation: Module Privacy
Revealing all data as provenance in an execution can reveal module behavior
Goal: partially hide provenance to protect the privacy of modules when they belong to a workflow
d1 d2 d3
<x1, x2, x3> <y1, y2> <z1>
Public/Private Workflows
Private modules (the user has no a priori knowledge of their behavior)
e.g., modules for gene sequencing, drug synthesis, etc.
Public modules (the user has full knowledge of their behavior)
e.g., modules for reformatting, sorting, display, etc.
[Figure: example workflow whose modules are labeled Private or Public, including a Reformatting module]
Definition: Module Privacy
Module f takes input x, produces output y = f(x)
[Figure: module f with inputs x1, x2, x3, x4 and outputs y1, y2, y3]
f(x1, x2, x3, x4) = <y1, y2, y3>
Given a privacy requirement L, for every input x to a private module f,
f(x) has ≥ L ‘equivalent’ candidate values w.r.t. the visible provenance data
(similar to L-diversity [MKGV’07])
‘Equivalent Candidates’ and Provenance Views
Workflow module executions as relations with functional dependencies x → y and y → z:
       x  y  z
Run 1  0  0  1
Run 2  1  1  0
• Output a provenance view (incomplete provenance): projection on the visible attributes
• Possible worlds: relations with the same projection that respect the functional dependency
• Standalone-private view: the possible worlds map each input to L = 2 different outputs
• Workflow-private view: possible worlds must respect all functional dependencies
Module x → y, listed as (x, y) pairs:
– Actual relation: (0, 0), (1, 1)
– Other possible worlds: {(0, 0), (1, 0)}; {(0, 1), (1, 1)}; {(0, 1), (1, 0)}
Module y → z, listed as (y, z) pairs:
– Actual relation: (0, 1), (1, 0)
– Possible world: {(0, 0), (1, 1)}
– Not a possible world: {(0, 1), (1, 1)}
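The possible-worlds idea can be checked by brute force on the two-module example (x → y is the identity, y → z is negation, both taken from the slide's tables). The sketch below assumes the view hides y and shows x and z; that choice of hidden attribute, and all function names, are illustrative assumptions. It enumerates every pair of Boolean functions whose workflow relation has the same visible projection as the real one, then collects the candidate values for f(x).

```python
from itertools import product

DOMAIN = (0, 1)
# Every function {0,1} -> {0,1}, stored as a dict.
FUNCTIONS = [dict(zip(DOMAIN, outs)) for outs in product(DOMAIN, repeat=len(DOMAIN))]

f = {0: 0, 1: 1}  # module x -> y from the slide
g = {0: 1, 1: 0}  # module y -> z from the slide

def workflow_rows(f_mod, g_mod):
    """One row (x, y, z) per input; functional dependencies hold by construction."""
    return [{"x": x, "y": f_mod[x], "z": g_mod[f_mod[x]]} for x in DOMAIN]

def view(rows, visible):
    """Projection of the relation onto the visible attributes."""
    return {tuple(row[a] for a in sorted(visible)) for row in rows}

def workflow_worlds(visible):
    """All (f', g') pairs that produce the same view as the real workflow."""
    target = view(workflow_rows(f, g), visible)
    return [
        (f2, g2)
        for f2, g2 in product(FUNCTIONS, repeat=2)
        if view(workflow_rows(f2, g2), visible) == target
    ]

def candidates_for_f(x, visible):
    """Values that f(x) could take across all possible worlds."""
    return {f2[x] for f2, _ in workflow_worlds(visible)}

hidden_y = candidates_for_f(0, {"x", "z"})       # y hidden
nothing_hidden = candidates_for_f(0, {"x", "y", "z"})
```

With y hidden, f(0) has two candidates, so the view meets L = 2 for this input; with nothing hidden only the true value remains.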
Previous Work: Module Privacy for Workflow Provenance
[Davidson-Khanna-Milo-Panigrahi-Roy: PODS’11]
“Composability Theorem”: if all modules are private (no public modules), any combination of standalone-private views gives workflow-private views for all of them
• Compose by hiding the union of the attributes hidden in the standalone solutions
[Figure: the x → y → z example workflow with no public modules; workflow relation rows (x, y, z) = (0, 0, 1) and (1, 1, 0)]
Why care about composability?
• Compose local standalone-private solutions arbitrarily to get a global workflow-private solution (which is hard to find directly)
• Local solutions are NP-hard too, but in the number of attributes of a single module, which is smaller than the number of attributes in the whole workflow
• We can do preprocessing, or exploit module designers’ knowledge
• But composability fails with public modules, which are common in workflows
Problem with Public Modules
• With public modules next to private ones, the composability theorem no longer holds
• Our solution in [DKMPR ’11]: “privatize” some public modules
– Does not work when a module’s identity can be guessed from attribute names, connections, etc.
• This work: propagate hiding through public modules
[Figure: a public module connected to a private module]
This paper: A Propagation Model
• Find a standalone-private solution for each private module (only outputs are hidden; hiding inputs may not work in public/private workflows)
• In a workflow, propagate the hiding of attributes through public successors
• Repeatedly propagate hiding
• Can we stop at a private successor?
– Yes: for single-predecessor workflows
– No: for general workflows
Single-Predecessor Workflows
• (Intuitively) Every public module has at most one private predecessor
• Can still have complex structure
• Special cases: chains/trees
• Propagate hiding in the “public closure” (public modules reachable through an undirected public path from a hidden output attribute)
• Next: how much to hide
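One way to read the "public closure" bullet is as a breadth-first search that starts at the hidden output attributes and walks only through public modules, ignoring edge direction. The sketch below follows that reading. The workflow, the module and attribute names, and the function `public_closure` are hypothetical and are not taken from the paper.

```python
from collections import deque

def public_closure(modules, public, hidden_outputs):
    """Public modules reachable from the hidden attributes via public modules only.

    modules: name -> (input attributes, output attributes)
    public:  set of names of public modules
    """
    # Attribute -> public modules that read or write it (undirected adjacency).
    touching = {}
    for name, (ins, outs) in modules.items():
        if name in public:
            for attr in (*ins, *outs):
                touching.setdefault(attr, set()).add(name)

    closure = set()
    seen_attrs = set(hidden_outputs)
    queue = deque(hidden_outputs)
    while queue:
        attr = queue.popleft()
        for name in touching.get(attr, ()):
            if name in closure:
                continue
            closure.add(name)
            ins, outs = modules[name]
            for nxt in (*ins, *outs):
                if nxt not in seen_attrs:
                    seen_attrs.add(nxt)
                    queue.append(nxt)
    return closure

# Hypothetical workflow: m1 and m4 are private; m2, m3, m5 are public.
WORKFLOW = {
    "m1": (["a1"], ["a2", "a3"]),
    "m2": (["a2"], ["a4"]),
    "m3": (["a3", "a4"], ["a5"]),
    "m4": (["a5"], ["a6"]),
    "m5": (["a6"], ["a7"]),
}
PUBLIC = {"m2", "m3", "m5"}

closure_of_a2 = public_closure(WORKFLOW, PUBLIC, {"a2"})
```

In this example the search from `a2` reaches `m2` and `m3` but stops at the private module `m4`, so `m5` stays outside the closure.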
Upstream/Downstream Safety for Public Modules
• Visible attributes of public modules should not reveal any information
• Upstream/Downstream-safe (UD-safe):
– Equivalent inputs ⇒ equivalent outputs
– Equivalent outputs ⇒ (all) equivalent inputs
• Hiding everything is trivially UD-safe
Example module with inputs a1, a2 and outputs a3, a4:
Inputs    Outputs
a1  a2    a3  a4
0   0     1   0
0   1     1   0
1   0     0   1
1   1     0   1
[Figure: two choices of hidden attributes for this module, one UD-safe and one not UD-safe]
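The two implications can be tested mechanically on the module table from this slide. The sketch below assumes that "equivalent" means "equal on the visible attributes" and that the upstream condition only concerns outputs the module actually produces; both are assumptions of this sketch. The figure's own UD-safe and non-UD-safe examples are not recoverable from the transcript, so the visible sets tried here are illustrative.

```python
from itertools import product

IN_NAMES, OUT_NAMES = ("a1", "a2"), ("a3", "a4")
# Module table from the slide: (a1, a2) -> (a3, a4).
MODULE = {(0, 0): (1, 0), (0, 1): (1, 0), (1, 0): (0, 1), (1, 1): (0, 1)}

def project(values, names, visible):
    """Keep only the values of the visible attributes."""
    return tuple(v for n, v in zip(names, values) if n in visible)

def is_ud_safe(module, visible):
    """Check both directions over every pair of executions."""
    rows = [
        (project(x, IN_NAMES, visible), project(y, OUT_NAMES, visible))
        for x, y in module.items()
    ]
    for (x1, y1), (x2, y2) in product(rows, repeat=2):
        if x1 == x2 and y1 != y2:   # downstream: equivalent inputs, different outputs
            return False
        if y1 == y2 and x1 != x2:   # upstream: equivalent outputs, different inputs
            return False
    return True

hide_a2 = is_ud_safe(MODULE, {"a1", "a3", "a4"})
hide_a1 = is_ud_safe(MODULE, {"a2", "a3", "a4"})
hide_all = is_ud_safe(MODULE, set())
```

Under these assumptions, hiding only `a2` passes because the outputs depend on `a1` alone, hiding only `a1` fails the downstream check, and hiding everything passes trivially.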
Composability Theorem for Single-Predecessor Workflows
Theorem: Each private module is workflow-private if the hidden attributes satisfy …
1. The private module is standalone-private
2. Public modules in public-closure are UD-safe
3. No unnecessary hiding
Two levels of composability:
1. Inside the public closure for a given private module
2. Among different private modules
Both the single-predecessor condition and UD-safety are necessary
Optimal Composition for Single-Predecessor Workflows
• Find a list of standalone-private solutions for the private modules
• Find a list of UD-safe solutions for the public modules
• Optimally compose them to find a solution for a single private module
– NP-hard for a general DAG
– PTIME for trees/chains
• Arbitrarily compose these to find a solution for all private modules
– Easy for single-predecessor workflows
Proof Sketch of Composability Theorem - 1
Step 1: Assume only one composite module in the public closure
• If the individual modules are UD-safe, the composite module is also UD-safe (by induction)
• Analysis for a single private module is sufficient: public closures are disjoint
Proof Sketch of Composability Theorem - 2
Step 2: Standalone to Workflow Privacy
• Privacy: many candidates for f(x)
• If y is a candidate for f(x) when f is standalone, y is still a candidate when f is in a workflow
• Show the existence of possible worlds by redefining private modules
[Figure: modules f, g, h with attributes x, y, z, comparing expected and observed values: one case with a conflict, one with no conflict]
• Need to handle new conflicts at other inputs/outputs
– Cannot redefine public modules: UD-safety helps
– More complex structure in general
About General Workflows
• Find a standalone-private solution for each private module (only outputs are hidden; hiding inputs may not work with public modules)
• In a workflow, propagate the hiding of attributes through public successors
• Repeatedly propagate hiding
• Can we stop at a private successor?
– Yes: for single-predecessor workflows
– No: for general workflows
– Solution: propagate through private successors as well
Related Work
• Workflow privacy (mainly access control)
– Chebotko et al. ’08, Gil et al. ’07, ’10
• Secure provenance
– Tan et al. ’06, Hasan et al. ’07, Braun et al. ’08, Ni et al. ’09, Chong ’09, Cadenhead et al. ’11, Cheney ’11
• L-diversity and its limitations
– Machanavajjhala et al. ’06, Ganta et al. ’08, Kifer ’09, Fang et al. ’08, Cormode et al. ’11, Xiao et al. ’10, Wong et al. ’07
• Privacy-preserving data mining
– Surveys by Aggarwal-Yu ’08, Verykios et al. ’04
• Differential privacy/privacy in statistical databases
– Survey by Dwork ’08
Conclusions
• Workflow privacy of modules by data hiding in public/private workflows
• Propagating hiding through public modules
• Composability theorem and optimization problems
Future Work:
• Extend to a stronger notion of privacy
– Differential privacy?
– Randomization may not work for scientific experiments
– Can our possible-world model be useful?
• Applicability in practice
Thank You
Questions