Challenges in Privacy-Preserving Analysis of Structured Data
![Page 1: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/1.jpg)
Challenges in Privacy-Preserving Analysis of Structured Data
Kamalika Chaudhuri
University of California, San Diego
Computer Science and Engineering
![Page 2: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/2.jpg)
Sensitive Structured Data
Medical Records
Search Logs
Social Networks
![Page 3: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/3.jpg)
This Talk: Two Case Studies
1. Privacy-preserving HIV Epidemiology
2. Privacy in Time-series data
![Page 4: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/4.jpg)
HIV Epidemiology
Goal: Understand how HIV spreads among people
![Page 5: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/5.jpg)
HIV Transmission Data
[Figure: patient A with virus Seq-A, patient B with virus Seq-B]
distance(Seq-A, Seq-B) < t ⇒ HIV transmission between A and B
![Page 6: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/6.jpg)
From Sequences to Transmission Graphs
Node = Patient
Edge = Plausible transmission
Viral Sequences
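The thresholding rule from the sequence slides (an edge whenever distance(Seq-A, Seq-B) < t) can be sketched as follows. This is only an illustration: Hamming distance on pre-aligned strings stands in for whatever genetic distance the study actually uses, and the names `hamming`, `transmission_graph` and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def transmission_graph(sequences: dict, t: int) -> set:
    """Return edges {A, B} for every patient pair with distance < t."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(sequences), 2)
        if hamming(sequences[a], sequences[b]) < t
    }


# Toy data (assumption): three patients with short aligned sequences.
toy = {"A": "ACGTAC", "B": "ACGTAA", "C": "TTGTCC"}
edges = transmission_graph(toy, t=2)
```

With this toy input only patients A and B are linked, since their sequences differ in a single position.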
![Page 7: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/7.jpg)
…Growing over Time
Node = Patient
Edge = Transmission
2015
![Page 8: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/8.jpg)
…Growing over Time
Node = Patient
Edge = Transmission
2015 2016
![Page 9: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/9.jpg)
…Growing over Time
Node = Patient
Edge = Transmission
2015 2016 2017
![Page 10: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/10.jpg)
…Growing over Time
2015 2016 2017
Goal: Release properties of G with privacy across time
![Page 11: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/11.jpg)
Problem: Continual Graph Statistics Release
Given: (Growing) graph G. At time t, nodes and adjacent edges $(\partial V_t, \partial E_t)$ arrive
Goal: At time t, release $f(G_t)$, where f = graph statistic, and $G_t = (\cup_{s \le t} \partial V_s, \cup_{s \le t} \partial E_s)$
while preserving patient privacy and high accuracy
![Page 12: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/12.jpg)
What kind of Privacy?
Hide: Patient A is in the graph
Release: Large scale properties
Node = Patient
Edge = Transmission
![Page 13: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/13.jpg)
What kind of Privacy?
Node = Patient
Edge = Transmission
Hide: A particular patient has HIV
Release: Statistical properties (degree distribution, clusters, does therapy help, etc.)
Privacy notion: Node Differential Privacy
![Page 14: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/14.jpg)
Talk Outline
• The Problem: Private HIV Epidemiology
• Privacy Definition: Differential Privacy
![Page 15: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/15.jpg)
Differential Privacy [DMNS06]
[Figure: two inputs, each "Data + (person icon)", pass through a Randomized Algorithm and produce “similar” outputs]
Participation of a single person does not change the output distribution by much
![Page 16: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/16.jpg)
Differential Privacy: Attacker’s View
[Figure, two rows compared: Prior Knowledge + Algorithm Output on Data & (person icon) = Conclusion on (person icon)]
Note:
a. Algorithm could draw personal conclusions about Alice
b. Alice has the agency to participate or not
![Page 17: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/17.jpg)
Differential Privacy [DMNS06]
For all D, D’ that differ in one person’s value, if A is an ε-differentially private randomized algorithm, then:
$$\sup_t \left| \log \frac{p(A(D) = t)}{p(A(D') = t)} \right| \le \epsilon$$
[Figure: densities p[A(D) = t] and p[A(D’) = t] plotted against t]
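To make the bound concrete, here is a minimal sketch that checks it for randomized response on a single private bit. Randomized response is not part of the talk; it is swapped in here only because its output distribution is small enough to enumerate. The function names are hypothetical.

```python
import math


def randomized_response_probs(bit: int, eps: float) -> dict:
    """Output distribution of randomized response on one private bit."""
    # Probability of reporting the true bit.
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}


def max_abs_log_ratio(eps: float) -> float:
    """sup_t |log p(A(D)=t) / p(A(D')=t)| for D = bit 0 and D' = bit 1."""
    p = randomized_response_probs(0, eps)
    q = randomized_response_probs(1, eps)
    return max(abs(math.log(p[t] / q[t])) for t in (0, 1))


worst = max_abs_log_ratio(0.5)
```

For this mechanism the worst-case log ratio equals ε, so the definition holds with equality.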
![Page 18: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/18.jpg)
Differential Privacy
1. Provably strong notion of privacy
2. Good approximations for many functions
e.g., means, histograms, etc.
![Page 19: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/19.jpg)
Node Differential Privacy
Node = Patient
Edge = Transmission
![Page 20: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/20.jpg)
Node Differential Privacy
Node = Patient
Edge = Transmission
One person’s value = One node + adjacent edges
![Page 21: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/21.jpg)
Talk Outline
• The Problem: Private HIV Epidemiology
• Privacy Definition: Node Differential Privacy
• Challenges
![Page 22: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/22.jpg)
Problem: Continual Graph Statistics Release
Given: (Growing) graph G. At time t, nodes and adjacent edges $(\partial V_t, \partial E_t)$ arrive
Goal: At time t, release $f(G_t)$, where f = graph statistic, and $G_t = (\cup_{s \le t} \partial V_s, \cup_{s \le t} \partial E_s)$
with node differential privacy and high accuracy
![Page 23: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/23.jpg)
Why is Continual Release of Graphs with Node Differential Privacy hard?
1. Node DP challenging in static graphs [KNRS13, BBDS13]
2. Continual release of graph data has extra challenges
![Page 24: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/24.jpg)
Challenge 1: Node DP
Removing one node can change properties by a lot (even for static graphs)
[Figure: with the node, #edges = 6 (size of V); with the node removed, #edges = 0]
Hiding one node needs high noise → low accuracy
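A small sketch of why one node can matter so much. The star graph below is an assumption chosen to reproduce the slide's numbers (6 edges before, 0 after); the helper name `remove_node` is hypothetical.

```python
def remove_node(edges: set, v) -> set:
    """Drop node v together with all of its adjacent edges."""
    return {e for e in edges if v not in e}


hub = 0
# Star graph (assumption): one hub connected to 6 leaves.
star = {frozenset((hub, leaf)) for leaf in range(1, 7)}

before = len(star)                   # edge count with the hub present
after = len(remove_node(star, hub))  # edge count once the hub is hidden
change = before - after
```

Here hiding a single node changes the edge count by the full number of edges, so noise calibrated to that worst case would swamp the statistic.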
![Page 25: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/25.jpg)
Prior Work: Node DP in Static Graphs
Approach 1 [BCS15]:
- Assume bounded max degree
Approach 2 [KNRS13, RS15]:
- Project to low degree graph G’ and use node DP on G’
- Projection algorithm needs to be “smooth” and computationally efficient
![Page 26: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/26.jpg)
Challenge 2: Continual Release of Graphs
- Methods for tabular data [DNPR10, CSS10] do not apply
- Sequential composition gives poor utility
- Graph projection methods are not “smooth” over time
![Page 27: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/27.jpg)
Talk Outline
• The Problem: Private HIV Epidemiology
• Privacy Definition: Node Differential Privacy
• Challenges
• Approach
![Page 28: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/28.jpg)
Algorithm: Main Ideas
Strategy 1: Assume bounded max degree of G (from domain)
Strategy 2: Privately release “difference sequence” of statistic (instead of the direct statistic)
![Page 29: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/29.jpg)
Difference Sequence
Graph Sequence: G1, G2, G3
Statistic Sequence: f(G1), f(G2), f(G3)
Difference Sequence: f(G1), f(G2) - f(G1), f(G3) - f(G2)
![Page 30: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/30.jpg)
Key Observation
Key Observation: For many graph statistics, when G is degree bounded, the difference sequence has low sensitivity
Example Theorem: If max degree(G) = D, then sensitivity of the difference sequence for #high degree nodes is at most 2D + 1.
![Page 31: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/31.jpg)
From Observation to Algorithm
Algorithm:
1. Add noise to each item of difference sequence to hide effect of single node and publish
2. Reconstruct private statistic sequence from private difference sequence
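The two steps above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the stated sensitivity (2D + 1 for #high degree nodes) is an L1 bound on the whole difference sequence, uses Laplace noise as the noise distribution, and the statistic values in `stats` are made up.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def difference_sequence(stats: list) -> list:
    """f(G1), f(G2) - f(G1), f(G3) - f(G2), ..."""
    return [cur - prev for cur, prev in zip(stats, [0] + list(stats[:-1]))]


def private_release(stats, sensitivity, eps, rng):
    """Step 1: noise each difference. Step 2: rebuild the statistic sequence."""
    noisy_diffs = [d + laplace(sensitivity / eps, rng)
                   for d in difference_sequence(stats)]
    released, running = [], 0.0
    for d in noisy_diffs:
        running += d            # cumulative sum is pure post-processing
        released.append(running)
    return released


D = 3                           # assumed max-degree bound
stats = [2, 5, 9, 12]           # hypothetical #high degree nodes over time
released = private_release(stats, sensitivity=2 * D + 1, eps=1.0,
                           rng=random.Random(0))
```

Because the reconstruction only sums already-noised values, it does not touch the private data again.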
![Page 32: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/32.jpg)
How does this work?
![Page 33: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/33.jpg)
Experiments - Privacy vs. Utility
[Figure: two panels, #high degree nodes and #edges]
Our Algorithm vs. Baselines: DP Composition 1, DP Composition 2
![Page 34: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/34.jpg)
Experiments - #Releases vs. Utility
[Figure: two panels, #high degree nodes and #edges]
Our Algorithm vs. Baselines: DP Composition 1, DP Composition 2
![Page 35: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/35.jpg)
Talk Agenda
Privacy is application-dependent!
Two applications:
1. HIV Epidemiology
2. Privacy of time-series data - activity monitoring, power consumption, etc.
![Page 36: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/36.jpg)
Time Series Data
Physical Activity Monitoring
Location traces
![Page 37: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/37.jpg)
Example: Activity Monitoring
Hide: Activity at each time against adversary with prior knowledge
Data: Activity trace of a subject
Release: (Approximate) aggregate activity
![Page 38: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/38.jpg)
Why is Differential Privacy not Right for Correlated data?
![Page 39: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/39.jpg)
Example: Activity Monitoring
Data from a single subject
D = (x1, .., xT), xt = activity at time t
[Figure: Correlation Network]
1-DP: Output histogram of activities + noise with stdev T
Too much noise - no utility!
![Page 40: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/40.jpg)
Example: Activity Monitoring
D = (x1, .., xT), xt = activity at time t
[Figure: Correlation Network]
1-entry-DP: Output activity histogram + noise with stdev 1
Not enough noise - activities across time are correlated!
![Page 41: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/41.jpg)
Example: Activity Monitoring
D = (x1, .., xT), xt = activity at time t
[Figure: Correlation Network]
1-entry-group DP: Output activity histogram + noise with stdev T
Too much noise - no utility!
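A rough sketch of the gap between the two noise levels, assuming Laplace noise on histogram counts. The sensitivities used (2 when one entry changes, 2T when all T entries change) are standard L1 bounds for count histograms rather than figures from the slides, which quote the scales only up to constants; `laplace_stdev` and T = 1000 are hypothetical.

```python
import math


def laplace_stdev(l1_sensitivity: float, eps: float) -> float:
    """Standard deviation of Laplace noise with scale l1_sensitivity / eps."""
    return math.sqrt(2.0) * l1_sensitivity / eps


T = 1000  # assumed length of the activity trace
eps = 1.0

entry_dp = laplace_stdev(2, eps)      # one time step changes: two bins move by 1
group_dp = laplace_stdev(2 * T, eps)  # all T time steps change together
ratio = group_dp / entry_dp
```

The ratio grows linearly in T, which is the "stdev 1" versus "stdev T" contrast drawn on the slides.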
![Page 42: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/42.jpg)
How to define privacy for Correlated Data?
![Page 43: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/43.jpg)
Pufferfish Privacy [KM12]
Secret Set S
S: Information to be protected
e.g.: Alice’s age is 25, Bob has a disease
![Page 44: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/44.jpg)
Pufferfish Privacy [KM12]
Secret Set S
Secret Pairs Set Q
Q: Pairs of secrets we want to be indistinguishable
e.g.: (Alice’s age is 25, Alice’s age is 40)
(Bob is in dataset, Bob is not in dataset)
![Page 45: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/45.jpg)
Pufferfish Privacy [KM12]
Secret Set S
Secret Pairs Set Q
Distribution Class Θ
Θ: A set of distributions that plausibly generate the data
e.g.: (connection graph G, disease transmits w.p. [0.1, 0.5])
(Markov Chain with transition matrix in set P)
May be used to model correlation in data
![Page 46: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/46.jpg)
Pufferfish Privacy [KM12]
Secret Set S
Secret Pairs Set Q
Distribution Class Θ
An algorithm A is ε-Pufferfish private with parameters (S, Q, Θ) if for all (si, sj) in Q, for all θ ∈ Θ, X ∼ θ, and all t,
$$p_{\theta,A}(A(X) = t \mid s_i, \theta) \le e^{\epsilon} \cdot p_{\theta,A}(A(X) = t \mid s_j, \theta)$$
whenever P(si|θ), P(sj|θ) > 0
[Figure: densities p(A(X)|si, θ) and p(A(X)|sj, θ) plotted against t]
![Page 47: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/47.jpg)
Pufferfish Interpretation of DP
Theorem: Pufferfish = Differential Privacy when:
S = { s_{i,a} := Person i has value a, for all i, all a in domain X }
Q = { (s_{i,a}, s_{i,b}), for all i and (a, b) pairs in X × X }
Θ = { Distributions where each person i is independent }
![Page 48: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/48.jpg)
Pufferfish Interpretation of DP
Theorem: Pufferfish = Differential Privacy when:
S = { s_{i,a} := Person i has value a, for all i, all a in domain X }
Q = { (s_{i,a}, s_{i,b}), for all i and (a, b) pairs in X × X }
Θ = { Distributions where each person i is independent }
Theorem: No utility possible when:
Θ = { All possible distributions }
![Page 49: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/49.jpg)
How to get Pufferfish privacy?
Special case mechanisms [KM12, HMD12]
Is there a more general Pufferfish mechanism for a large class of correlated data?
Our work: Yes, the Markov Quilt Mechanism
(Also concurrent work [GK16])
![Page 50: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/50.jpg)
Correlation Measure: Bayesian Networks
Node: variable
Directed Acyclic Graph
Joint distribution of variables:
$$\Pr(X_1, X_2, \ldots, X_n) = \prod_i \Pr(X_i \mid \mathrm{parents}(X_i))$$
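As a minimal sketch of this factorization, the snippet below multiplies one conditional probability per node. The three-node chain, the uniform distribution on X1 and the value p = 0.8 are assumptions made only for the example; `joint_probability` is a hypothetical helper.

```python
from itertools import product


def joint_probability(assignment: dict, parents: dict, cpt: dict) -> float:
    """Product over nodes of Pr(X_i = x_i | parents(X_i))."""
    prob = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        prob *= cpt[var][parent_values][value]
    return prob


p = 0.8  # assumed probability of staying in the same state
parents = {"X1": (), "X2": ("X1",), "X3": ("X2",)}
stay = {(0,): {0: p, 1: 1 - p}, (1,): {0: 1 - p, 1: p}}
cpt = {"X1": {(): {0: 0.5, 1: 0.5}}, "X2": stay, "X3": stay}

prob_000 = joint_probability({"X1": 0, "X2": 0, "X3": 0}, parents, cpt)

# Sanity check: the joint distribution sums to one over all assignments.
total = sum(
    joint_probability(dict(zip(("X1", "X2", "X3"), values)), parents, cpt)
    for values in product((0, 1), repeat=3)
)
```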
![Page 51: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/51.jpg)
A Simple Example
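Model: X1 → X2 → X3 → … → Xn, Xi in {0, 1}
State Transition Probabilities:
[State diagram: states 0 and 1; each state stays put with probability p and switches with probability 1 - p]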
![Page 52: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/52.jpg)
A Simple Example
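Model: X1 → X2 → X3 → … → Xn, Xi in {0, 1}
State Transition Probabilities:
[State diagram: states 0 and 1; each state stays put with probability p and switches with probability 1 - p]
Pr(X2 = 0| X1 = 0) = p
Pr(X2 = 0| X1 = 1) = 1 - p
….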
![Page 53: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/53.jpg)
A Simple Example
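Model: X1 → X2 → X3 → … → Xn, Xi in {0, 1}
State Transition Probabilities:
[State diagram: states 0 and 1; each state stays put with probability p and switches with probability 1 - p]
Pr(X2 = 0| X1 = 0) = p
Pr(X2 = 0| X1 = 1) = 1 - p
….
$$\Pr(X_i = 0 \mid X_1 = 0) = \frac{1}{2} + \frac{1}{2}(2p - 1)^{i-1}$$
$$\Pr(X_i = 0 \mid X_1 = 1) = \frac{1}{2} - \frac{1}{2}(2p - 1)^{i-1}$$
Influence of X1 diminishes with distance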
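The closed form Pr(Xi = 0 | X1) = 1/2 ± 1/2 (2p - 1)^(i-1) can be checked numerically by stepping the two-state chain forward. This is just a sanity-check sketch; the function names and the value p = 0.8 are assumptions.

```python
def step(dist: tuple, p: float) -> tuple:
    """One transition of the symmetric chain; dist = (Pr(X=0), Pr(X=1))."""
    return (p * dist[0] + (1 - p) * dist[1],
            (1 - p) * dist[0] + p * dist[1])


def prob_xi_zero(i: int, x1: int, p: float) -> float:
    """Pr(X_i = 0 | X_1 = x1), computed by applying i - 1 transitions."""
    dist = (1.0, 0.0) if x1 == 0 else (0.0, 1.0)
    for _ in range(i - 1):
        dist = step(dist, p)
    return dist[0]


def closed_form(i: int, x1: int, p: float) -> float:
    """1/2 + 1/2 (2p-1)^(i-1) when X_1 = 0, and 1/2 - ... when X_1 = 1."""
    sign = 1.0 if x1 == 0 else -1.0
    return 0.5 + sign * 0.5 * (2 * p - 1) ** (i - 1)


p = 0.8
max_gap = max(
    abs(prob_xi_zero(i, x1, p) - closed_form(i, x1, p))
    for i in range(1, 11)
    for x1 in (0, 1)
)
```

Both conditional probabilities approach 1/2 as i grows, which is the sense in which the influence of X1 fades.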
![Page 54: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/54.jpg)
Algorithm: Main Idea
Goal: Protect X1
X1 X2 X3 Xn
![Page 55: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/55.jpg)
Algorithm: Main Idea
Goal: Protect X1
X1 X2 X3 Xn
Local nodes (high correlation) | Rest (almost independent)
![Page 56: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/56.jpg)
Algorithm: Main Idea
Goal: Protect X1
X1 X2 X3 Xn
Add noise to hide local nodes + Small correction for rest
Local nodes (high correlation) | Rest (almost independent)
![Page 57: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/57.jpg)
Measuring “Independence”
Max-influence of Xi on a set of nodes XR:
$$e(X_R \mid X_i) = \max_{a,b} \sup_{\theta \in \Theta} \max_{x_R} \log \frac{\Pr(X_R = x_R \mid X_i = a, \theta)}{\Pr(X_R = x_R \mid X_i = b, \theta)}$$
Low e(XR|Xi) means XR is almost independent of Xi
To protect Xi, correction term needed for XR is exp(e(XR|Xi))
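For the symmetric two-state chain, the max-influence of X1 on a single later node Xj can be computed by direct enumeration. In this sketch Θ is replaced by a finite list of p values (an assumption, so the supremum becomes a maximum), XR is taken to be the single node Xj, and it requires j ≥ 2 and 0 < p < 1 so that every probability is positive.

```python
import math


def prob_xj(j: int, x: int, a: int, p: float) -> float:
    """Pr(X_j = x | X_1 = a) from the closed form for the symmetric chain."""
    gap = 0.5 * (2.0 * p - 1.0) ** (j - 1)
    return 0.5 + gap if x == a else 0.5 - gap


def max_influence(j: int, thetas: list) -> float:
    """e({X_j} | X_1): maximum log ratio over a, b, x and the listed p values."""
    worst = 0.0
    for p in thetas:
        for a in (0, 1):
            for b in (0, 1):
                for x in (0, 1):
                    ratio = prob_xj(j, x, a, p) / prob_xj(j, x, b, p)
                    worst = max(worst, math.log(ratio))
    return worst


near = max_influence(2, [0.6, 0.8])  # adjacent node: strongly influenced
far = max_influence(5, [0.6, 0.8])   # distant node: close to independent
```

The value for the adjacent node is log(0.8 / 0.2) = log 4, driven by the most correlated model in the list, and it shrinks as j moves away from X1.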
![Page 58: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/58.jpg)
How to find large “almost independent” sets
Brute force search is expensive
Use structural properties of the Bayesian network
![Page 59: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/59.jpg)
Markov Blanket
Markov Blanket(Xi) = Set of nodes XS s.t. Xi is independent of X \ (Xi ∪ XS) given XS
(usually parents, children, and other parents of children)
[Figure: node Xi enclosed by its Markov Blanket XS]
![Page 60: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/60.jpg)
Define: Markov Quilt
XQ is a Markov Quilt of Xi if:
1. Deleting XQ breaks graph into XN and XR
2. Xi lies in XN
3. XR is independent of Xi given XQ
(For Markov Blanket XN = Xi)
[Figure: XQ separates XN, which contains Xi, from XR]
![Page 61: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/61.jpg)
Why do we need Markov Quilts?
Given a Markov Quilt:
XN = local nodes for Xi
XQ ∪ XR = rest
[Figure: XQ separates XN, which contains Xi, from XR]
![Page 62: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/62.jpg)
From Markov Quilts to Amount of Noise
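Let XQ = Markov Quilt for Xi
Stdev of noise to protect Xi:
$$\mathrm{Score}(X_Q) = \frac{\mathrm{card}(X_N)}{\epsilon - e(X_Q \mid X_i)}$$
card(XN): noise due to XN
e(XQ|Xi): correction for XQ ∪ XR
Search all Markov Quilts to find one that needs min noise
[Figure: XQ separates XN, which contains Xi, from XR]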
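The quilt search can be sketched for the simplest case: protecting X1 in the symmetric two-state chain with one fixed p. Several things here are assumptions for illustration: each candidate quilt is a single node Xj (so XN = {X1, …, X(j-1)}), the trivial quilt with XN equal to the whole chain and score n/ε is allowed as a fallback, and quilts with e(XQ|X1) ≥ ε are skipped because the score would not be positive.

```python
import math


def max_influence(j: int, p: float) -> float:
    """e({X_j} | X_1) for the symmetric chain; needs j >= 2 and 0 < p < 1."""
    gap = abs(2.0 * p - 1.0) ** (j - 1)
    return math.log((0.5 + 0.5 * gap) / (0.5 - 0.5 * gap))


def best_quilt(n: int, p: float, eps: float) -> tuple:
    """Return (score, j) minimising card(X_N) / (eps - e(X_Q | X_1)).

    j is None when the trivial quilt (X_N = whole chain) wins.
    """
    best = (n / eps, None)
    for j in range(2, n + 1):
        e = max_influence(j, p)
        if e >= eps:
            continue  # correction term alone would exhaust the budget
        score = (j - 1) / (eps - e)  # card(X_N) = j - 1
        if score < best[0]:
            best = (score, j)
    return best


score, j = best_quilt(n=100, p=0.8, eps=1.0)
```

Under these assumptions the search settles on a quilt a few steps away from X1: close enough to keep card(XN) small, far enough that the correction term is modest. The resulting score is far below the n/ε of the trivial quilt.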
![Page 63: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/63.jpg)
Privacy Properties
Privacy: MQM is ε-Pufferfish private
![Page 64: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/64.jpg)
Graceful Composition
MQM for Markov Chains has:
- Additive sequential composition
- Parallel composition with a correction term
X1 X2 X3 Xn
![Page 65: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/65.jpg)
Simulations - Task
Model: X1 → X2 → X3 → … → Xn, Xi in {0, 1}
State Transition Probabilities:
[State diagram: states 0 and 1 with transition labels p, 1 - p, q, 1 - q]
Model Class: Θ = [ℓ, 1 − ℓ] (implies p and q can lie anywhere in Θ)
Sequence length = 100
![Page 66: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/66.jpg)
Simulations - Results
Methods:
- Two versions of Markov Quilt Mechanism (MQMExact, MQMApprox)
- GK16
[Figure: two panels, ε = 0.2 and ε = 1, plotting L1 error against ℓ (0.1 to 0.4) for GK16, MQM Approx and MQM Exact]
![Page 67: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/67.jpg)
Real Data - Activity Measurement
Dataset on physical activity by three groups of subjects: 40 cyclists, 16 older women and 36 overweight women
4 states (active, standing still, standing moving, sedentary)
Over 9,000 observations per subject
Methods:
MQMExact and MQMApprox
GK16 does not apply
GroupDP
Θ = { Empirical data generating distribution }
![Page 68: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/68.jpg)
Real Data - Activity Measurement
[Figure: three panels (Cyclists, Older, Overweight) plotting Relative Frequency of each state (Active, Stand Still, Stand Moving, Sedentary) for Group-DP, MQM Approx and MQM Exact]
Aggregated results (over groups)
ε = 1
![Page 69: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/69.jpg)
Real Data - Power Consumption
Dataset on power consumption in a single household
Power consumption discretized to 51 levels (51 states)
Over 1 million observations
Methods:
MQMExact vs. MQMApprox
GK16 does not apply
GroupDP has too little utility
Θ = { Empirical data generating distribution }
![Page 70: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/70.jpg)
Real Data - Power Consumption
Methods: Two versions of Markov Quilt Mechanism (MQMExact, MQMApprox)
[Figure: two panels, ε = 0.2 and ε = 1]
![Page 71: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/71.jpg)
Conclusion
• Real problems have complex privacy challenges
• Rigorous privacy definitions are available
• For any privacy problem, important to think:
• What do we need to hide?
• What do we need to reveal?
![Page 72: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/72.jpg)
References
• “Differentially Private Continual Release of Graph Statistics”, S. Song, S. Mehta, S. Vinterbo, S. Little and K. Chaudhuri, arXiv, 2018.
• “Pufferfish Privacy Mechanisms for Correlated Data”, S. Song, Y. Wang and K. Chaudhuri, SIGMOD 2018.
• “Composition Properties of Inferential Privacy on Time-Series Data”, S. Song and K. Chaudhuri, Allerton 2018.
![Page 73: Challenges in Privacy-Preserving Analysis of Structured Data](https://reader036.fdocuments.in/reader036/viewer/2022081400/628efe6a1e20b306b651fd6f/html5/thumbnails/73.jpg)
Thanks!