The Role of History and Prediction in Data Privacy
Kristen LeFevre
University of Michigan
May 13, 2009
Data Privacy
• Personal information collected every day
– Healthcare, insurance information
– Supermarket transaction data
– RFID, GPS data
– E-mail
– Employment history
– Web search / clickstream
Data Privacy
• Legal, ethical, technical issues surrounding
– Data ownership
– Data collection
– Data dissemination and use
• Considerable recent interest from technical community
– High-profile mishaps and lawsuits
– Compliance with data-sharing mandates
Privacy Protection Technologies for Public Datasets
• Goal: Protect sensitive personal information while preserving data utility
• Privacy Policies and Mechanisms
• Example Policies:
– Protect individual identities
– Protect the values of sensitive attributes
– Differential privacy [Dwork 06]
• Example Mechanisms:
– Generalize (“coarsen”) the data
– Aggregate the data
– Add random noise to the data
– Add random noise to query results
Observations
• Much work has focused on static data
– One-time snapshot publishing
– Disclosure by composing multiple different snapshots of a static database [Xiao 07, Ganta 08]
– Auditing queries on a static database [Chin 81, Kenthapadi 06, …]
• What are the unique challenges when the data evolves over time?
Outline
• Sample Problem: Continuously publishing privacy-sensitive GPS traces
– Motivation & problem setup
– Framework for reasoning about privacy
– Algorithms for continuous publishing
– Experimental results
• Applications to other dynamic data (speculation)
GPS Traces (ongoing work w/ Wen Jin, Jignesh Patel)
• GPS devices attached to phones, cars
• Interest in collecting and distributing location traces in real time
– Real-time traffic reporting
– Adaptive pricing / placement of outdoor ads
• Simultaneous concern for personal privacy
• Challenge: Can we continuously collect and publish location traces without compromising individual privacy?
Problem Setting
[Figure: GPS Users (7 AM, 7:05 AM) → Central Trace Repository (Privacy Policy) → “Sanitized” Location Snapshot → Data Recipient]
Problem Setting
• Finite population of n users with unique identifiers {u1,…,un}
• Assume users’ locations are reported and published in discrete epochs t1,t2,…
• Location snapshot D(tj)
– Associates each user with a location during epoch tj
• Publish sanitized version D*(tj )
Threat Model
• Attacker wants to determine the location of a target user ui during epoch tj
• Auxiliary Information: Attacker knows location information during some other epochs (e.g., Yellow Pages)
Some Naïve Solutions
• Strawman 1: Replace users’ identifiers ({u1,…,un}) with pseudonyms ({p1,…,pn})
– Problem: Once attacker “unmasks” user pi, he can track her location forever
• Strawman 2: New pseudonyms ({p1^j,…,pn^j}) at each epoch tj
– Problem: Users can still be tracked using multi-target tracking tools [Gruteser 05, Krumm 07]
Key Problem: Motion Prediction
[Figure: cluster {Alice, Bob, Charlie} at locations 1, 2, 3 in one epoch and at locations 4, 5, 6 in the next; Alice is marked in both epochs.]
What if the speed limit is 60 mph?
Threat Model
• Attacker wants to determine the location of a target user ui during epoch tj
• Auxiliary Information: Attacker knows location information during some other epochs (e.g., Yellow Pages)
• Motion prediction: Given one or more locations for ui, attacker can predict (probabilistically) ui’s location during following and preceding epochs
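A minimal sketch of how a speed limit alone can re-link users across epochs, even when identities are hidden within a cluster. The function name `feasible_links`, the coordinates, and the five-minute epoch are illustrative assumptions, not part of the talk; a real attacker would use a richer probabilistic motion model.

```python
import math

def feasible_links(prev_locs, next_locs, max_speed, dt):
    """For each known previous location, list the next-epoch locations that
    are reachable without exceeding max_speed during an interval of dt."""
    reach = max_speed * dt  # maximum distance travelled in one epoch
    links = {}
    for name, (x0, y0) in prev_locs.items():
        links[name] = [
            label
            for label, (x1, y1) in next_locs.items()
            if math.hypot(x1 - x0, y1 - y0) <= reach
        ]
    return links

# Hypothetical positions in miles; epochs are five minutes apart (assumption).
prev_locs = {"Alice": (0.0, 0.0), "Bob": (20.0, 0.0), "Charlie": (40.0, 0.0)}
# Anonymous locations published for the cluster in the next epoch.
next_locs = {4: (3.0, 0.0), 5: (22.0, 0.0), 6: (43.0, 0.0)}

links = feasible_links(prev_locs, next_locs, max_speed=60.0, dt=5 / 60)
print(links)
```

In this made-up layout each user has exactly one reachable location, so the cluster provides no protection once the previous positions are known.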
Privacy Principle: Temporal Unlinkability
• Consider an attacker who is able to identify (locate) target user uj during m sequential epochs
• Under reasonable assumptions, he should not be able to locate uj with high confidence during any other epochs*
*Similar in spirit to “mix zones” [Beresford 03], which addressed a related problem in a less-formal way.
Sanitization Mechanism
• Needed to select a sanitization mechanism; chose one for maximum flexibility
• Assign each user ui consistent pseudonym pi
• Divide users into clusters
– Within each cluster, break association between pseudonym, location
• Release candidate for D(tj)
– D*(tj) = {(C1(tj), L1(tj)),…, (CB(tj), LB(tj))}
– ∪ i=1..B Ci(tj) = {p1,…,pn}
– Ci(tj) ∩ Ch(tj) = ∅ (i ≠ h)
– Each Li(tj) contains the locations of users in Ci(tj)
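The cluster-based release candidate can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `sanitize`, the dictionary representation of a snapshot, and the use of shuffling to break the pseudonym-location association are choices made here, not details given in the talk.

```python
import random

def sanitize(snapshot, clusters, rng=random):
    """Build a release candidate D*(tj) from a snapshot D(tj).

    snapshot: dict mapping pseudonym -> location
    clusters: list of lists of pseudonyms; must partition the pseudonyms
    Returns a list of (Ci, Li) pairs, where Li holds the locations of the
    users in Ci in shuffled order.
    """
    assigned = [p for cluster in clusters for p in cluster]
    if sorted(assigned) != sorted(snapshot):
        raise ValueError("clusters must partition the set of pseudonyms")
    release = []
    for cluster in clusters:
        locations = [snapshot[p] for p in cluster]
        rng.shuffle(locations)  # break the pseudonym-location association
        release.append((frozenset(cluster), locations))
    return release

# Hypothetical snapshot with four pseudonyms and 2-D coordinates.
snapshot = {"p1": (1, 1), "p2": (2, 5), "p3": (7, 3), "p4": (8, 8)}
release = sanitize(snapshot, [["p1", "p2"], ["p3", "p4"]], random.Random(0))
print(release)
```

Passing a seeded `random.Random` makes the shuffle reproducible for testing; a deployment would use an unpredictable source of randomness.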
Sanitization Mechanism: Example
• Pseudonyms {p1, p2, p3, p4}
[Figure: released clusters at three epochs. t0: {p1,p2}, {p3,p4} with locations 1, 2, 3, 4. t1: {p1,p2}, {p3,p4} with locations 5, 6, 7, 8. t2: {p1,p3}, {p2,p4} with locations 9, 10, 11, 12.]
Reasoning about Privacy
• How can we guarantee temporal unlinkability under the threats of auxiliary information and motion prediction?
– (Using the cluster-based sanitization mechanism)
• Novel framework with two key components
– Motion model describes location correlations between epochs
– Breach probability function describes an attacker’s ability to compromise temporal unlinkability
Motion Models
• Model motion using an h-step Markov chain
– Conditional probability for user’s location, given his location during h prior (future) epochs
– Same motion model used by attacker and publisher
• Forward motion model template
– Pr[Loc(P,Tj) = Lj | Loc(P,Tj-1) = Lj-1, …, Loc(P,Tj-h) = Lj-h]
• Backward motion model template
– Pr[Loc(P,Tj) = Lj | Loc(P,Tj+1) = Lj+1, …, Loc(P,Tj+h) = Lj+h]
• Independent and replaceable component
– For this work, used 1-step motion model based on velocity distribution (speed and direction)
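A rough sketch of what a 1-step forward motion model could look like. The talk only says the model is based on a velocity distribution (speed and direction); the Gaussian over speed used here, the omission of direction, and the names `forward_likelihood` and `forward_model` are all simplifying assumptions for illustration.

```python
import math

def forward_likelihood(prev, cur, dt, mean_speed, sd_speed):
    """Unnormalized likelihood of moving from prev to cur in time dt,
    assuming speed follows a Gaussian (direction is ignored here)."""
    speed = math.hypot(cur[0] - prev[0], cur[1] - prev[1]) / dt
    z = (speed - mean_speed) / sd_speed
    return math.exp(-0.5 * z * z)

def forward_model(prev, candidates, dt, mean_speed, sd_speed):
    """Normalize over candidate locations to approximate
    Pr[Loc(P,Tj) = Lj | Loc(P,Tj-1) = prev] for each candidate Lj."""
    weights = {
        label: forward_likelihood(prev, loc, dt, mean_speed, sd_speed)
        for label, loc in candidates.items()
    }
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no candidate location is plausible under the model")
    return {label: w / total for label, w in weights.items()}

# Hypothetical example: user known at x = (0, 0); candidates a and b.
probs = forward_model(
    prev=(0.0, 0.0),
    candidates={"a": (1.0, 0.0), "b": (3.0, 0.0)},
    dt=1.0,
    mean_speed=1.0,
    sd_speed=0.5,
)
print(probs)
```

A backward model has the same shape with the roles of the two epochs swapped. Because the motion model is an independent and replaceable component, any function with this interface could be substituted.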
Motion Models: Example
• Pseudonyms {p1, p2, p3, p4}
• Epochs t0, t1, t2
[Figure: p1, p2, p3, p4 shown at epochs t0 and t2; clusters {p1,p2} and {p3,p4} with locations a, b, c, d at epoch t1.]
Pr[Loc(p1,t1) = a | Loc(p1,t0) = x]
Pr[Loc(p1,t1) = b | Loc(p1,t0) = x]
Pr[Loc(p1,t1) = a | Loc(p1,t2) = y]
Privacy Breaches
• Forward breach probability
– Pr[Loc(P,Tj) = Lj | D(Tj-1), …, D(Tj-h), D*(Tj)]
• Backward breach probability
– Pr[Loc(P,Tj) = Lj | D(Tj+1), …, D(Tj+h), D*(Tj)]
• Privacy Breach: Release candidate D*(Tj) causes a breach iff either of the following is true for threshold C
– max over P, Lj of Pr[Loc(P,Tj) = Lj | D(Tj-1), …, D(Tj-h), D*(Tj)] > C
– max over P, Lj of Pr[Loc(P,Tj) = Lj | D(Tj+1), …, D(Tj+h), D*(Tj)] > C
Privacy Breaches: Example
[Figure: p1, p2, p3, p4 at epoch t0, with p1 at x and p2 at y; clusters {p1,p2} and {p3,p4} with locations a, b, c, d at epoch t1.]
e1 = Pr[Loc(p1,t1) = a | Loc(p1,t0) = x]
e2 = Pr[Loc(p1,t1) = b | Loc(p1,t0) = x]
e3 = Pr[Loc(p2,t1) = a | Loc(p2,t0) = y]
e4 = Pr[Loc(p2,t1) = b | Loc(p2,t0) = y]
Pr[Loc(p1,t1) = a | D(T0), D*(T1)] = (e1 * e4) / (e1 * e4 + e2 * e3)
…
Goal: Verify that all (forward and backward) breach probabilities < threshold C
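The two-user formula in this example generalizes to a brute-force computation over every way of assigning a cluster's locations to its pseudonyms. The sketch below is an illustration under assumptions: it treats users' motions as independent and all assignments as equally likely a priori, and the names `breach_probabilities` and `causes_breach` and the numeric values of e1 to e4 are hypothetical.

```python
from itertools import permutations

def breach_probabilities(motion):
    """Posterior Pr[Loc(p) = loc | previous snapshot, released cluster].

    motion[p][loc] is the motion-model probability that pseudonym p moved
    to location loc. Enumerates all assignments of the cluster's locations
    to its pseudonyms, so the cost grows factorially with cluster size.
    """
    pseudonyms = list(motion)
    locations = list(motion[pseudonyms[0]])
    posterior = {p: {loc: 0.0 for loc in locations} for p in pseudonyms}
    total = 0.0
    for perm in permutations(locations):
        weight = 1.0
        for p, loc in zip(pseudonyms, perm):
            weight *= motion[p][loc]
        total += weight
        for p, loc in zip(pseudonyms, perm):
            posterior[p][loc] += weight
    if total == 0.0:
        raise ValueError("no assignment is possible under the motion model")
    for p in pseudonyms:
        for loc in locations:
            posterior[p][loc] /= total
    return posterior

def causes_breach(posterior, threshold):
    """True if any pseudonym-location probability exceeds the threshold C."""
    return any(pr > threshold for row in posterior.values() for pr in row.values())

# Hypothetical values for e1..e4 from the example above.
e1, e2, e3, e4 = 0.6, 0.4, 0.3, 0.7
posterior = breach_probabilities({"p1": {"a": e1, "b": e2},
                                  "p2": {"a": e3, "b": e4}})
print(posterior["p1"]["a"], causes_breach(posterior, threshold=0.5))
```

For the two-user cluster, `posterior["p1"]["a"]` reduces to e1 * e4 / (e1 * e4 + e2 * e3), matching the expression in the example.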
Checking for Breaches
• Does release candidate D*(Tj) cause a breach?
• Brute force algorithm
– Exponential in release candidate cluster size
• Heuristic pruning tools
– Reduce the search space considerably in practice
Publishing Algorithms
• How to publish useful data, without causing a privacy breach?
• Cluster-based sanitization mechanism offers two main options
– Increase cluster size (or change composition)
– Reduce publication frequency
Publishing Algorithms
• General Case
– At each epoch Tj, publish the most compact release candidate D*(Tj) that does not cause a breach
– Need to delay publishing until epoch Tj+h to check for backward breaches
– NP-hard optimization problem; proposed alternative heuristics
• Special Case
– Durable clusters (same individuals at each epoch)
– Motion model satisfies symmetry property
– No need to delay publishing
Experimental Study
• Used real highway traffic data from UM Transportation Research Institute
– GPS data sampled from cars of 72 volunteers
– Sampling rate (epoch) = 0.01 seconds
– Speed range 0-170 km/hour
• Also synthetic data
– Able to control the generative motion distribution
Experimental Study
• All static “snapshot” anonymization mechanisms vulnerable to motion prediction attacks
– Applied two representative algorithms (r-Gather [Aggarwal 06] and k-Condense [Aggarwal 04])
– Each produces a set of clusters with k users each
[Figure: r-Gather]
[Figure: k-Condense]
Speculation / Future Work
• GPS example illustrates importance of reasoning about data dynamics and history, and predictable patterns of change in privacy
• Dynamic private data in other apps.
– E.g., longitudinal social science data
– Study subjects age predictably
– Most people don’t move very far
– Income changes predictably
• Hypothesis: History and prediction are important in these settings, too!