Privacy Preserving Serial Data Publishing By Role Composition
Yingyi Bu1, Ada Wai-Chee Fu1, Raymond Chi-Wing Wong2, Lei Chen2, Jiuyong Li3
1 The Chinese University of Hong Kong
2 The Hong Kong University of Science and Technology
3 University of South Australia
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
Outline
1. Sequential Releases
2. Existing Privacy Models
   m-invariance
   Privacy breaches
3. Our Proposed Privacy Model
   l-scarcity
4. Experiments
5. Conclusion
1. Sequential Releases
Time = 1

Hospital: Medical Data
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Release the data set to public

Public: Published Data
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

This table satisfies some privacy requirements (e.g., m-invariance).
Time = 2
The medical data changes through insertions, deletions and updates, and the hospital again releases the data set to the public.
This table satisfies some privacy requirements (e.g., m-invariance).
Time = 3
The medical data changes again through insertions, deletions and updates, and a further table is released to the public.
This table satisfies some privacy requirements (e.g., m-invariance).
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.
2. Existing Privacy Models
1. Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006
   Considers insertions only. Does not consider deletions and updates.
2. Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008
   Considers insertions only. Does not consider deletions and updates.
3. Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007
   Considers insertions and deletions only. Does not consider updates.
Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together
Updates cannot simply be regarded as “a deletion and then an insertion” when privacy is considered.
2. Existing Privacy Models
Sensitive Diseases
Transient diseases (curable), e.g., flu, fever
If an individual is linked to flu in a published table, s/he may or may not be linked to flu in a later published table.
Permanent diseases (incurable), e.g., HIV
If an individual is linked to HIV in a published table, s/he MUST be linked to HIV in every later published table in which s/he appears.
We are the first to study these two kinds of sensitive values.
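The distinction between transient and permanent values can be illustrated with a small consistency check. This is only a sketch: the sets `PERMANENT` and `TRANSIENT` follow the slide's examples, while the function name, the data layout, and the individual `p9` are hypothetical and introduced here for illustration.

```python
# Classification taken from the slide's examples (an assumption for any other data set).
PERMANENT = {"HIV"}           # incurable
TRANSIENT = {"Flu", "Fever"}  # curable

def permanence_violations(history):
    """history maps a PID to its disease in each release, in time order
    (None when the individual is absent from that release).
    Returns the PIDs that hold a permanent value in one release and a
    different value in a later release in which they appear."""
    bad = []
    for pid, values in history.items():
        seen_permanent = None
        for v in values:
            if v is None:
                continue  # absent from this release, nothing to compare
            if seen_permanent is not None and v != seen_permanent:
                bad.append(pid)
                break
            if v in PERMANENT:
                seen_permanent = v
    return bad

history = {
    "p2": ["HIV", "HIV", "HIV"],      # permanent value stays
    "p3": ["Fever", "Flu", "Flu"],    # transient values may change
    "p5": ["Flu", "Fever", "Fever"],
    "p9": ["HIV", None, "Flu"],       # hypothetical inconsistent record
}
print(permanence_violations(history))  # ['p9']
```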
The existing privacy models (Byun et al., Fung et al., Xiao et al.) do not consider transient/permanent values.
Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together
- Also considers transient/permanent values
Contributions:
We consider a more realistic setting of sequential releases.
• Insertions, deletions and updates
• Transient/permanent values
We cannot simply adapt these existing privacy models to this realistic setting.
Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following.
The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/m.
Problem (l-scarcity): At the current time t, we want to generate a table which satisfies the following.
The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/l.
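Both problem statements have the same shape. As a sketch in notation that is assumed here and does not appear on the slides (T*_1, ..., T*_t for the tables published up to time t, o for an individual, s for a sensitive value), the m-invariance requirement reads:

```latex
\Pr\bigl[\, o \text{ is linked to } s \;\big|\; T^*_1, T^*_2, \dots, T^*_t \,\bigr] \;\le\; \frac{1}{m}
\qquad \text{for every individual } o \text{ and every sensitive value } s
```

The l-scarcity requirement has the same form with 1/l in place of 1/m.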
Public: Voter Registration List
Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   25000
John     p6     20   29000
…        …      …
David    p|RL|  31   31000

Hospital: Medical Data
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Hospital: Medical Data + Some Useful Attributes
Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Fever
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Flu
John     p6   20   29000     Fever

Release the data set to public
Release the data set to public (Name and PID removed)
Medical Data + Some Useful Attributes
Age  Zip Code  Disease
23   16355     Flu
22   15500     HIV
21   12900     Fever
26   18310     HIV
25   25000     Flu
20   29000     Fever
Release the data set to public (after generalization)
Medical Data + Some Useful Attributes
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Generalization
3-diversity
Each individual is linked to “HIV” with probability at most 1/3 in THIS PUBLISHED TABLE.
3-diversity only focuses on ONE-TIME publishing.
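The generalization step and the 3-diversity condition above can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the grouping is taken from the slide, the function names are hypothetical, ranges use the exact minimum and maximum (the slide rounds zip codes to thousands), and l-diversity is read in its simple frequency form, where no sensitive value may account for more than 1/l of a group.

```python
from collections import Counter

# (age, zip code, disease) rows from the slide, already split into two groups.
groups = [
    [(23, 16355, "Flu"), (22, 15500, "HIV"), (21, 12900, "Fever")],
    [(26, 18310, "HIV"), (25, 25000, "Flu"), (20, 29000, "Fever")],
]

def generalize(group):
    """Replace each quasi-identifier by the (min, max) range of its group."""
    ages = [age for age, _, _ in group]
    zips = [z for _, z, _ in group]
    age_range = (min(ages), max(ages))
    zip_range = (min(zips), max(zips))
    return [(age_range, zip_range, disease) for _, _, disease in group]

def is_l_diverse(groups, l):
    """True if, in every group, the most frequent sensitive value covers
    at most 1/l of the tuples (checked with integers to avoid rounding)."""
    for group in groups:
        counts = Counter(disease for _, _, disease in group)
        if max(counts.values()) * l > len(group):
            return False
    return True

for row in generalize(groups[0]):
    print(row)
print(is_l_diverse(groups, 3))  # True
```

Each group holds three distinct diseases, so an adversary who only sees this one table links any member to “HIV” with probability 1/3.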
3-invariance focuses on MULTIPLE-TIME publishing.
It also makes use of the idea of 3-diversity.
Idea: Each individual is linked to “HIV” with probability at most 1/3 with respect to MULTIPLE PUBLISHED TABLES.
3-invariance: Time = 1
The voter registration list and the hospital's medical data are as shown above.

Release the data set to public
Published table (Time = 1)
Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever
Groups: {p1, p2, p3} and {p4, p5, p6}

Signatures (Time = 1)
PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}
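The signature table can be derived mechanically from the groups. The sketch below assumes that a signature is the set of sensitive values appearing in the individual's group, which matches the table above; the function name and data layout are hypothetical.

```python
def signatures(groups):
    """groups is a list of dicts mapping PID -> disease, one dict per
    anonymized group. Every member of a group gets the same signature:
    the set of sensitive values present in that group."""
    sig = {}
    for group in groups:
        values = frozenset(group.values())
        for pid in group:
            sig[pid] = values
    return sig

# Groups published at Time = 1, as on the slide.
time1 = [
    {"p1": "Flu", "p2": "HIV", "p3": "Fever"},
    {"p4": "HIV", "p5": "Flu", "p6": "Fever"},
]

sig1 = signatures(time1)
for pid in sorted(sig1):
    print(pid, sorted(sig1[pid]))
```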
3-invariance: Time = 2
The voter registration list is unchanged. The diseases of Mary and Bob have been updated.

Hospital: Medical Data + Some Useful Attributes (Time = 2)
Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Fever
John     p6   20   29000     Fever

Release the data set to public
Published table (Time = 2)
Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever
Groups: {p2, p3, p6} and {p1, p4, p5}

Signatures (Time = 2)
PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature as at Time = 1.
Idea of 3-invariance: Each individual is linked to the SAME signature in each published table.
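The idea of m-invariance across releases can be checked with a short sketch. It is a simplified reading, stated as an assumption: every group must contain at least m tuples with pairwise distinct sensitive values, and every individual must keep the same signature in all releases in which they appear. The function names and the `bad` release are hypothetical; `time1` and `time2` follow the slide's groups.

```python
def signatures(groups):
    """Map each PID to the set of sensitive values in its group."""
    return {pid: frozenset(g.values()) for g in groups for pid in g}

def is_m_invariant(releases, m):
    """releases is a list of releases in time order; each release is a
    list of groups, and each group is a dict PID -> disease."""
    first_seen = {}
    for groups in releases:
        for g in groups:
            values = list(g.values())
            # At least m tuples, all with different sensitive values.
            if len(values) < m or len(set(values)) != len(values):
                return False
        for pid, sig in signatures(groups).items():
            # The signature must equal the one from the first appearance.
            if first_seen.setdefault(pid, sig) != sig:
                return False
    return True

time1 = [
    {"p1": "Flu", "p2": "HIV", "p3": "Fever"},
    {"p4": "HIV", "p5": "Flu", "p6": "Fever"},
]
time2 = [
    {"p2": "HIV", "p3": "Flu", "p6": "Fever"},
    {"p1": "Flu", "p4": "HIV", "p5": "Fever"},
]
# Hypothetical release that repeats Flu inside one group.
bad = [
    {"p1": "Flu", "p2": "HIV", "p3": "Flu"},
    {"p4": "HIV", "p5": "Fever", "p6": "Fever"},
]

print(is_m_invariant([time1, time2], 3))  # True
print(is_m_invariant([time1, bad], 3))    # False
```

Regrouping between releases is allowed, as the move from {p1, p2, p3} to {p2, p3, p6} shows; only the signature has to stay fixed.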
3-invariance: Time = 3
Bob's zip code has been updated from 25000 to 15000 in the voter registration list and in the medical data.

Hospital: Medical Data + Some Useful Attributes (Time = 3)
Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   15000     Fever
John     p6   20   29000     Fever

Release the data set to public
Published table (Time = 3)
Age      Zip Code   Disease
[21,25]  [12k,16k]  HIV
[21,25]  [12k,16k]  Flu
[21,25]  [12k,16k]  Fever
[20,26]  [16k,29k]  Flu
[20,26]  [16k,29k]  HIV
[20,26]  [16k,29k]  Fever
Groups: {p2, p3, p5} and {p1, p4, p6}

Signatures (Time = 3)
PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature as at Time = 1 and Time = 2.
Time = 2
PID
Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}
Age Zip Code Disease
[21,25] [12k,16k] HIV
[21,25] [12k,16k] Flu
[21,25] [12k,16k] Fever
[20,26] [16k,29k] Flu
[20,26] [16k,29k] HIV
[20,26] [16k,29k] Fever
Time = 3
PID
Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
Time = 1
PID
Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever
Time = 2
PID
Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}
Age Zip Code Disease
[21,25] [12k,16k] HIV
[21,25] [12k,16k] Flu
[21,25] [12k,16k] Fever
[20,26] [16k,29k] Flu
[20,26] [16k,29k] HIV
[20,26] [16k,29k] Fever
Time = 3
PID
Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}
Time = 1
Age Zip Code Disease
Group p1 p2 p3
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
Group p4 p5 p6
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
Time = 2
Age Zip Code Disease
Group p2 p3 p6
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
Group p1 p4 p5
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever
Time = 3
Age Zip Code Disease
Group p2 p3 p5
[21,25] [12k,16k] HIV
[21,25] [12k,16k] Flu
[21,25] [12k,16k] Fever
Group p1 p4 p6
[20,26] [16k,29k] Flu
[20,26] [16k,29k] HIV
[20,26] [16k,29k] Fever
PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}
3-invariance
3-invariance
Knowledge 1 (label attached to the published tables)
Knowledge 2: I know all voter registration lists.
Name PID Age Zip Code
Raymond p1 23 16355
Peter p2 22 15500
Mary p3 21 12900
Alice p4 26 18310
Bob p5 25 25000
John p6 20 29000
… … …
David p|RL| 31 31000
Knowledge 3: I know that HIV is a permanent sensitive value.
Knowledge 4: There are TWO HIVs in the published table.
I can deduce that p1 and p6 cannot be linked to HIV.
Proof by contradiction. Suppose p1 is linked to HIV.
PID HIV?
p1 Yes
p2 No
p3 No
p4 No
p5 No
p6 No
Each published group contains exactly one HIV, and HIV is permanent (Knowledge 3). p1 shares a group with p2 and p3 at Time = 1, with p4 and p5 at Time = 2, and with p4 and p6 at Time = 3, so none of them can be linked to HIV.
There are TWO HIVs in the published table (Knowledge 4).
Contradiction!
p1 CANNOT be linked to HIV.
Proof by contradiction. Suppose p6 is linked to HIV.
PID HIV?
p1 No
p2 No
p3 No
p4 No
p5 No
p6 Yes
Each published group contains exactly one HIV, and HIV is permanent (Knowledge 3). p6 shares a group with p4 and p5 at Time = 1, with p2 and p3 at Time = 2, and with p1 and p4 at Time = 3, so none of them can be linked to HIV.
There are TWO HIVs in the published table (Knowledge 4).
Contradiction!
p6 CANNOT be linked to HIV.
I can deduce that p4 MUST be linked to HIV.
Privacy breaches! Why?
3-invariance
Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following.
Probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/m.
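The adversary's deduction can be reproduced by brute force. The sketch below assumes exactly what the slides give the adversary: the group memberships of the three published tables, that HIV is permanent (Knowledge 3), and that two individuals hold HIV (Knowledge 4). The names `M_INVARIANT_GROUPS` and `consistent_holder_sets` are hypothetical.

```python
from itertools import combinations

PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]
# Groups of the three 3-invariant releases; each group publishes exactly one HIV tuple.
M_INVARIANT_GROUPS = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}],  # Time = 3
]


def consistent_holder_sets(releases, people=PEOPLE, n_holders=2):
    """Enumerate the sets of HIV-holders that fit every published table."""
    result = []
    for holders in combinations(people, n_holders):  # Knowledge 4: two HIVs
        chosen = set(holders)  # Knowledge 3: the same holders at every time
        if all(len(group & chosen) == 1 for groups in releases for group in groups):
            result.append(chosen)
    return result


worlds = consistent_holder_sets(M_INVARIANT_GROUPS)
forced = set.intersection(*worlds)            # holders in every consistent world
excluded = set(PEOPLE) - set.union(*worlds)   # individuals in no consistent world
print(worlds)    # the only consistent holder sets: {p2, p4} and {p3, p4}
print(forced)    # {'p4'}
print(excluded)  # p1, p5 and p6
```

Only two holder sets survive, and p4 appears in both, which is the breach. The enumeration also rules out p5, in addition to p1 and p6 named on the slide.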
Original Medical Data (Time = 1)
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
p2 is an HIV-holder.
p1 is an HIV-decoy.
p3 is an HIV-decoy.
HIV-decoys (i.e., p1 and p3) are used to reduce the strong linkage between p2 and HIV.
p4 is an HIV-holder.
p5 is an HIV-decoy.
p6 is an HIV-decoy.
Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5
p1 and p6 are in the same cohort. Besides, they are in the same group of the published table at Time = 3.
Idea: This kind of grouping can lead to privacy breaches. We can protect privacy by avoiding this kind of grouping.
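The grouping to avoid can be detected directly from the cohorts. This is a small sketch, assuming the cohorts from the slides (Cohort 1 = p2, p4; Cohort 2 = p1, p6; Cohort 3 = p3, p5); the function name `cohort_collisions` is hypothetical.

```python
COHORTS = [["p2", "p4"], ["p1", "p6"], ["p3", "p5"]]  # holder, decoy, decoy


def cohort_collisions(groups, cohorts=COHORTS):
    """Return pairs of individuals that share both a cohort and a published group."""
    cohort_of = {pid: i for i, members in enumerate(cohorts) for pid in members}
    collisions = []
    for group in groups:
        members = sorted(group)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if cohort_of[a] == cohort_of[b]:
                    collisions.append((a, b))
    return collisions


time2 = [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}]
time3 = [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}]
print(cohort_collisions(time2))  # []
print(cohort_collisions(time3))  # [('p3', 'p5'), ('p1', 'p6')]
```

The Time = 2 table has no collision, while the Time = 3 table puts p1 with p6 (and also p3 with p5) in one group.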
Time = 3
Age Zip Code Disease
Group p1 p2 p5
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
[22,25] [15k,17k] Fever
Group p3 p4 p6
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
[20,26] [12k,29k] Fever
3-scarcity (this table replaces the 3-invariance table at Time = 3, which grouped p2 p3 p5 and p1 p4 p6)
3-scarcity
Probability that an individual is linked to a sensitive value with respect to these three tables is at most 1/3.
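The 1/3 bound can be illustrated for HIV by counting consistent worlds. The sketch below makes two assumptions that are not stated on the slide: it only looks at the HIV value, and it weights every consistent set of HIV-holders equally. The names `SCARCITY_GROUPS` and `hiv_link_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]
SCARCITY_GROUPS = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}],  # Time = 3 (3-scarcity table)
]


def hiv_link_probability(releases, people=PEOPLE, n_holders=2):
    """Fraction of consistent holder sets in which each individual holds HIV."""
    worlds = [
        set(holders)
        for holders in combinations(people, n_holders)
        # every published group must contain exactly one HIV-holder
        if all(len(group & set(holders)) == 1 for groups in releases for group in groups)
    ]
    return {p: Fraction(sum(p in w for w in worlds), len(worlds)) for p in people}


probs = hiv_link_probability(SCARCITY_GROUPS)
for pid, prob in probs.items():
    print(pid, prob)  # every individual: 1/3
```

Three holder sets remain consistent ({p1, p6}, {p2, p4} and {p3, p5}), one per row of the cohort table, so no individual is linked to HIV with probability above 1/3.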
3. Algorithm
Propose an algorithm which follows the principle: whenever we form one group, choose one member from each cohort.
3. Guarantee
Theorem: Our proposed algorithm can generate a table which satisfies the following.
Probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/l (i.e., l-scarcity).
4. Experiments
Real Data Set (CADRMP)
http://www.hc-sc.gc.ca/dhp-mps/medeff/databasdon/index_e.html
Real hospital database
Patient Information (Voter Registration List): 40,478 tuples
Medical Record: 105,420 tuples
Each patient can be linked to multiple diseases
4. Experiments
Studies
Privacy Breaches of an existing model (m-invariance)
Performance of our proposed algorithm
4.1 Privacy Breaches of an existing model
Breach Rate: the proportion of tuples with privacy breaches
m-invariance
4.2 Performance of our proposed algorithm
Measurements: Computation Cost, Relative Average Error
Variations: Parameter l (used in l-scarcity), No. of published tables
5. Conclusion
Sequential Releases: QID values can be updated; sensitive values can be updated
Sensitive Values: Permanent, Transient
Identify the insufficiency of existing models
Algorithm
Experiments
Q&A
Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
CI(p2) CI(p1) CI(p3)
CI(p4) CI(p6) CI(p5)
p6’s QID attributes: (Age, Zip Code) = (20, 29000)
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are NOT HIV-holders or HIV-decoys.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but we can find these individuals in the Voter Registration List.
e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
PID (Age, Zip Code) Role Status
p6 (20, 29000) HIV-decoy present
p7 (25, 33000) HIV-buddy present
p8 (26, 30000) HIV-buddy absent
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Case 1: HIV-decoy
Case 2: HIV-holder
e.g., p4 (HIV-holder) is absent in the current table. If the other HIV-decoys are still present, the adversary can figure out that p4 is an HIV-holder.
Idea: We switch the role of l-1 HIV-decoys from PRESENT individuals to ABSENT individuals.
Since one HIV-holder and l-1 HIV-decoys become ABSENT together, the adversary cannot figure out who is the REAL HIV-holder.
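The container example for p6 can be reproduced by taking the smallest ranges that cover all covered individuals. This is only a sketch: rounding Zip Code bounds outward to whole thousands is an assumption suggested by the "29k" notation on the slides, and the names `covering_container` and `covers` are hypothetical.

```python
import math


def covering_container(qids, zip_step=1000):
    """Smallest (Age, Zip Code) ranges covering all given individuals.

    qids maps a PID to its (age, zip_code) pair. Zip bounds are rounded
    outward to a multiple of zip_step (an assumption, not from the slides).
    """
    ages = [age for age, _ in qids.values()]
    zips = [zip_code for _, zip_code in qids.values()]
    zip_lo = math.floor(min(zips) / zip_step) * zip_step
    zip_hi = math.ceil(max(zips) / zip_step) * zip_step
    return (min(ages), max(ages)), (zip_lo, zip_hi)


def covers(container, qid):
    """True if the container's ranges include the given (age, zip_code)."""
    (age_lo, age_hi), (zip_lo, zip_hi) = container
    age, zip_code = qid
    return age_lo <= age <= age_hi and zip_lo <= zip_code <= zip_hi


members = {"p6": (20, 29000), "p7": (25, 33000), "p8": (26, 30000)}
ci_p6 = covering_container(members)
print(ci_p6)  # ((20, 26), (29000, 33000))
```

The result matches the slide's ([20,26], [29k,33k]); an individual such as David (31, 31000) falls outside it because of his age.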
3. Algorithm
Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
CI(p2) CI(p1) CI(p3)
CI(p4) CI(p6) CI(p5)
We have just discussed how to update the role of each individual (i.e., decoy/holder) according to different scenarios when there is new raw medical data.
Algorithm:
For the first raw medical table,
  Use some existing privacy algorithm (e.g., l-diversity) to generate a temporary table T’
  Find HIV-holders and HIV-decoys from T’
  Construct the cohorts according to HIV-holders/decoys
  Form containers for each HIV-holder/decoy
  Generate a published table according to the cohorts
Whenever there is new raw medical data,
  Update the role of individuals according to different scenarios
  Generate some containers (if necessary)
  Generate a published table according to the cohorts
Repeat
  pick one container from each cohort
  form one group by generalizing all these containers
Until Cohort 1 is empty
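The Repeat/Until loop above can be sketched as follows. The slides do not say which container is picked from a cohort, so this sketch simply takes them in list order, and it leaves out the generalization of QID values; `form_groups` is a hypothetical name.

```python
from collections import deque


def form_groups(cohorts):
    """Take one container from each cohort per group until Cohort 1 is empty."""
    queues = [deque(cohort) for cohort in cohorts]
    groups = []
    while queues[0]:
        if any(not queue for queue in queues):
            raise ValueError("a cohort ran out of containers before Cohort 1")
        # one container per cohort: one HIV-holder plus l-1 HIV-decoys
        groups.append([queue.popleft() for queue in queues])
    return groups


cohorts = [["CI(p2)", "CI(p4)"], ["CI(p1)", "CI(p6)"], ["CI(p3)", "CI(p5)"]]
print(form_groups(cohorts))
# [['CI(p2)', 'CI(p1)', 'CI(p3)'], ['CI(p4)', 'CI(p6)', 'CI(p5)']]
```

With the cohorts from the slides this yields the Time = 1 grouping; other pick orders give the groupings used at Time = 2 and Time = 3.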
Cohort 1 Cohort 2 Cohort 3
HIV-holder
HIV-decoy
HIV-decoy
Algorithm
Age Zip Code
Disease
[21,23]
[12k,17k]
Flu
[21,23]
[12k,17k]
HIV
[21,23]
[12k,17k]
Fever
[20,26]
[18k,29k]
HIV
[20,26]
[18k,29k]
Flu
[20,26]
[18k,29k]
Fever
… … …
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …
Time = 1
Medical Data
Published Data
p2 p1 p3
p4 p6 p5
p1 p2 p3
p4 p5 p6
We can make use of some “existing” approaches to generate this table, which satisfies 3-diversity.
We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8). We can then find generalized QID values which cover the QID values of these individuals and p2.
Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
CI(p2) CI(p1) CI(p3)
CI(p4) CI(p6) CI(p5)
Some additional individuals in CI(p1) which are present
Some additional individuals in CI(p2), CI(p3), CI(p4), CI(p5) and CI(p6) which are present
Algorithm
Time = 2
Medical Data
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …
1. Update the role of each individual (i.e., decoy/holder) according to different scenarios
2. Pin some individuals if necessary
Published Data
Age Zip Code Disease
Group p2 p3 p6 …
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
… … …
Group p1 p4 p5 …
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
… … …
Time = 3
Medical Data
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …
Published Data
Age Zip Code Disease
Group p1 p2 p5 …
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
… … …
Group p3 p4 p6 …
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
… … …
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
p6 is replaced with a container CI(p6) where the QID attributes of this container (Age, Zip Code) cover p6’s QID attributes.e.g., (Age, Zip Code) = ([20,26], [29k,33k])
CI(p6)p6’s QID attributes (Age, Zip Code) = (20, 29000)
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.
e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy, present
2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy, present
3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy, absent
e.g., p6 suffers from HIV in the current table, so p6 loses its functionality as an HIV-decoy.
p7 becomes the HIV-decoy.
From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy.
Thus, the role replacement still protects privacy.
This idea is valid when there EXISTS another individual for replacement. What if there is none?
e.g., p7 suffers from HIV in some later table, so p7 loses its functionality as an HIV-decoy.
We cannot find another HIV-buddy for replacement, so we pin p7. That is, the original HIV value of p7 is modified/suppressed to a transient value (e.g., Flu). Once pinned, p7 acts as an HIV-decoy until it disappears.
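The replace-or-pin rule described above can be sketched in Python. This is only an illustration under stated assumptions: the `Container` class, the `update_decoy` function, and the returned status strings are hypothetical names that do not come from the talk or the paper. The sketch assumes a buddy can take over only if it is present and is not an HIV-holder, and it leaves the case of an absent decoy with no available buddy unresolved.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical stand-in for a container CI(p)."""
    role: str                                     # "decoy" or "holder"
    occupant: str                                 # individual currently playing the role
    buddies: list = field(default_factory=list)   # other individuals covered by the container
    pinned: bool = False


def update_decoy(container, has_hiv, present):
    """Update a decoy container for Scenario 1 (occupant now has HIV)
    or Scenario 2 (occupant is absent from the current raw table)."""
    p = container.occupant
    lost_role = (p in has_hiv) or (p not in present)
    if container.role != "decoy" or not lost_role:
        return "unchanged"
    # Try to hand the decoy role to a buddy that is present and has no HIV.
    for b in container.buddies:
        if b in present and b not in has_hiv:
            container.buddies.remove(b)
            container.occupant = b
            return "replaced"
    # No buddy is available: pin the occupant if it is still present,
    # i.e. publish a transient value (e.g., Flu) instead of HIV.
    if p in present:
        container.pinned = True
        return "pinned"
    return "unresolved"  # not covered by this sketch


# Example from the slides: CI(p6) covers p6 (decoy), p7 and p8 (buddies); p8 is absent.
ci_p6 = Container(role="decoy", occupant="p6", buddies=["p7", "p8"])
present = {"p1", "p2", "p3", "p4", "p5", "p6", "p7"}
first = update_decoy(ci_p6, has_hiv={"p2", "p4", "p6"}, present=present)
second = update_decoy(ci_p6, has_hiv={"p2", "p4", "p6", "p7"}, present=present)
```

In this run, the first call hands the decoy role from p6 to p7, and the second call pins p7 because the only remaining buddy, p8, is absent.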
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
Time = 1
p1 p2 p3
p4 p5 p6
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever
p2 p3 p6
p1 p4 p5
Time = 2
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
HIV-holder
HIV-decoy
HIV-decoy
p4 p6 p5
Age Zip Code Disease
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
[22,25] [15k,17k] Fever
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
[20,26] [12k,29k] Fever
p1 p2 p5
p3 p4 p6
Time = 3
We have just shown a simple case of anonymization. In this case:
• Scenario 1: If an individual does not suffer from HIV, s/he will not suffer from HIV in the later published tables.
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
Time = 1
Time = 2
Time = 3
How should we anonymize when these individuals may develop a new permanent disease?
HIV
p6 was originally used as an HIV-decoy. Now its role changes from HIV-decoy to HIV-holder, so it loses its functionality to protect other HIV-holders (in Cohort 1).
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
Time = 1
p1 p2 p3
p4 p5 p6
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever
p2 p3 p6
p1 p4 p5
Time = 2
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
HIV-holder
HIV-decoy
HIV-decoy
p4 p6 p5
Age Zip Code Disease
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
[22,25] [15k,17k] Fever
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
[20,26] [12k,29k] Fever
p1 p2 p5
p3 p4 p6
Time = 3
We have just shown a simple case of anonymization. In this case:
• Scenario 2: If an individual is present in an earlier published table, s/he is also present in all later published tables.
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
Time = 1
Time = 2
Time = 3
How should we anonymize when some individuals are absent in a later published table?
p6 was originally used as an HIV-decoy. Now it disappears from this published table, so it loses its functionality to protect other HIV-holders (in Cohort 1).
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
Time = 1
p1 p2 p3
p4 p5 p6
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever
p2 p3 p6
p1 p4 p5
Time = 2
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
HIV-holder
HIV-decoy
HIV-decoy
p4 p6 p5
Age Zip Code Disease
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
[22,25] [15k,17k] Fever
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
[20,26] [12k,29k] Fever
p1 p2 p5
p3 p4 p6
Time = 3
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
Time = 1
Time = 2
Time = 3
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
There are other scenarios, e.g., some individuals who are absent in some earlier published tables are present in this table.
In this talk, we focus on Scenario 1 and Scenario 2. You can find the other scenarios in the paper.
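Detecting which individuals fall under Scenario 1 or Scenario 2 amounts to comparing two raw medical tables. The sketch below is a hedged illustration in Python: `detect_scenarios` is a hypothetical helper, each table is modeled as a dictionary from PID to disease, and the Time = 3 table is invented for the example (p6 develops HIV and p5 disappears), so it is not data from the talk.

```python
def detect_scenarios(earlier, current, disease="HIV"):
    """Compare two raw tables (PID -> disease) and report who triggers each scenario."""
    # Scenario 1: did not have the disease earlier, has it in the current table.
    new_holders = sorted(
        p for p in current
        if current[p] == disease and p in earlier and earlier[p] != disease
    )
    # Scenario 2: present earlier, absent in the current table.
    absent = sorted(p for p in earlier if p not in current)
    return new_holders, absent


# Raw table at Time = 2, as shown on the slides.
time2 = {"p1": "Flu", "p2": "HIV", "p3": "Flu", "p4": "HIV", "p5": "Fever", "p6": "Fever"}
# Hypothetical raw table at Time = 3 (an assumption made for this example).
time3 = {"p1": "Flu", "p2": "HIV", "p3": "Flu", "p4": "HIV", "p6": "HIV"}

new_holders, absent = detect_scenarios(time2, time3)
```

Here `new_holders` lists the individuals whose roles must be updated under Scenario 1, and `absent` lists those handled under Scenario 2.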
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
p6 has the QID attributes (Age, Zip Code) = (20, 29000)
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).
CI(p6)
p6’s QID attributes: (Age, Zip Code) = (20, 29000)
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.
e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy, present
2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy, present
3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy, absent
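The container example can be checked mechanically. The following Python sketch is an illustration only: `covers` is a hypothetical helper, the QID values are the ones on the slide, and the `present` set encodes the assumption that p6 and p7 appear in the medical table while p8 appears only in the Voter Registration List.

```python
def covers(container, qid):
    """Return True if the container's (Age, Zip Code) ranges include the given QID."""
    (age_lo, age_hi), (zip_lo, zip_hi) = container
    age, zip_code = qid
    return age_lo <= age <= age_hi and zip_lo <= zip_code <= zip_hi


# Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
ci_p6 = ((20, 26), (29000, 33000))
people = {"p6": (20, 29000), "p7": (25, 33000), "p8": (26, 30000)}
present = {"p6", "p7"}  # assumption: p8 is absent from the medical table

covered = [p for p, qid in people.items() if covers(ci_p6, qid)]
buddies = [p for p in covered if p != "p6"]

# The two conditions on the additional individuals covered by the container.
has_present_buddy = any(b in present for b in buddies)
has_absent_buddy = any(b not in present for b in buddies)
```

Both conditions hold for this example, which is what allows p7 to serve as a replacement for p6 later on.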
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).
CI(p6)
p6’s QID attributes: (Age, Zip Code) = (20, 29000)
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.
e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy, present
2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy, present
3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy, absent
e.g., p6 suffers from HIV in the current table, so p6 loses its functionality as an HIV-decoy.
p7 becomes the HIV-decoy.
From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy.
Thus, the role replacement still protects privacy.
Note that we update the roles of some individuals (i.e., decoy/holder/buddy) for Scenario 1 whenever new raw medical data arrives (e.g., Time = 3).
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).
CI(p6)
p6’s QID attributes: (Age, Zip Code) = (20, 29000)
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.
e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])
1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy, present in earlier tables, absent in the current table
2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy, present
3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy, absent
e.g., p6 (HIV-decoy) is absent in the current table, so p6 loses its functionality as an HIV-decoy.
p7 becomes the HIV-decoy.
From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy.
Thus, the role replacement still protects privacy.
Case 1: HIV-decoy
Case 2: HIV-holder
Note that we update the roles of some individuals (i.e., decoy/holder/buddy) for Scenario 2 whenever new raw medical data arrives (e.g., Time = 3).
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
CI(p6)
Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.
Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.
Idea: We find another individual to replace its original role (i.e., an HIV-decoy).
CI(p1)
CI(p5)
CI(p3)
CI(p4)
CI(p2)
3. Algorithm
Cohort 1
p2
Cohort 2 Cohort 3
p1 p3
p4 p6 p5
HIV-holder
HIV-decoy
HIV-decoy
CI(p6)
CI(p1)
CI(p5)
CI(p3)
CI(p4)
CI(p2)
We have just discussed how to update the role of each individual (i.e., decoy/holder) according to different scenarios whenever new raw medical data arrives.
Algorithm:
For the first raw medical table,
  Construct the cohorts with some method
  Generate a published table according to the cohorts
Whenever there is new raw medical data,
  Update the roles of individuals according to the different scenarios
  Generate some containers (if necessary)
  Generate a published table according to the cohorts
Repeat
  pick one container from each cohort
  form one group by generalizing all these containers
Until Cohort 1 is empty
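The Repeat/Until loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `generalize`, `form_groups`, and `point` are hypothetical names, containers are modeled as dictionaries of (Age, Zip Code) ranges, and the sketch simply picks the first remaining container in each cohort, whereas the slides do not say how the pick is made. Only p6's QID values (20, 29000) come from the slides; the other QID values are made up so that the resulting groups resemble the published table at Time = 1.

```python
def generalize(containers):
    """Form one group: the smallest (Age, Zip Code) ranges covering all containers."""
    return {
        "age": (min(c["age"][0] for c in containers), max(c["age"][1] for c in containers)),
        "zip": (min(c["zip"][0] for c in containers), max(c["zip"][1] for c in containers)),
        "members": [c["id"] for c in containers],
    }


def form_groups(cohorts):
    """Repeat: pick one container from each cohort and generalize them,
    until Cohort 1 (the first cohort) is empty."""
    cohorts = [list(c) for c in cohorts]  # work on copies
    groups = []
    while cohorts[0]:
        picked = [c.pop(0) for c in cohorts if c]  # assumption: take the first remaining one
        groups.append(generalize(picked))
    return groups


def point(pid, age, zip_code):
    """Container that covers exactly one individual's QID values."""
    return {"id": f"CI({pid})", "age": (age, age), "zip": (zip_code, zip_code)}


cohorts = [
    [point("p2", 21, 12000), point("p4", 26, 18000)],  # Cohort 1: HIV-holders
    [point("p1", 23, 17000), point("p6", 20, 29000)],  # Cohort 2: HIV-decoys
    [point("p3", 22, 15000), point("p5", 24, 25000)],  # Cohort 3: HIV-decoys
]
groups = form_groups(cohorts)
```

Because every group takes exactly one container from Cohort 1, each published group contains one HIV-holder together with one decoy from each of the other cohorts.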
3. Multiple Diseases
So far, we have considered that each individual is linked to one disease.
The approach can be extended to handle individuals who are linked to multiple diseases.