Privacy Preserving Serial Data Publishing By Role Composition


Page 1: Privacy Preserving Serial Data Publishing By Role Composition

Privacy Preserving Serial Data Publishing By Role Composition

Yingyi Bu 1, Ada Wai-Chee Fu 1, Raymond Chi-Wing Wong 2, Lei Chen 2, Jiuyong Li 3

1 The Chinese University of Hong Kong
2 The Hong Kong University of Science and Technology
3 University of South Australia

Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong

Page 2: Privacy Preserving Serial Data Publishing By Role Composition

Outline

1. Sequential Releases
2. Existing Privacy Models
   - m-invariance
   - Privacy breaches
3. Our Proposed Privacy Model
   - l-scarcity
4. Experiments
5. Conclusion

Page 3: Privacy Preserving Serial Data Publishing By Role Composition

1. Sequential Releases

Time = 1

Hospital (Medical Data)      Public (Published Data)
Name     PID  Disease        Name     PID  Disease
Raymond  p1   Flu            Raymond  p1   Flu
Peter    p2   HIV            Peter    p2   HIV
Mary     p3   Fever          Mary     p3   Fever
Alice    p4   HIV            Alice    p4   HIV
Bob      p5   Flu            Bob      p5   Flu
John     p6   Fever          John     p6   Fever

Release the data set to public

This table satisfies some privacy requirements (e.g., m-invariance)

Page 4: Privacy Preserving Serial Data Publishing By Role Composition

1. Sequential Releases

Time = 1

Hospital (Medical Data)      Public (Published Data)
Name     PID  Disease        Name     PID  Disease
Raymond  p1   Flu            Raymond  p1   Flu
Peter    p2   HIV            Peter    p2   HIV
Mary     p3   Fever          Mary     p3   Fever
Alice    p4   HIV            Alice    p4   HIV
Bob      p5   Flu            Bob      p5   Flu
John     p6   Fever          John     p6   Fever

Insertions, deletions and updates

Time = 2

Hospital (Medical Data)      Public (Published Data)
Name     PID  Disease        Name     PID  Disease
Raymond  p1   Flu            Raymond  p1   Flu
Peter    p2   HIV            Peter    p2   HIV
Mary     p3   Fever          Mary     p3   Fever
Alice    p4   HIV            Alice    p4   HIV
Bob      p5   Flu            Bob      p5   Flu
John     p6   Fever          John     p6   Fever

Release the data set to public

This table satisfies some privacy requirements (e.g., m-invariance)

Page 5: Privacy Preserving Serial Data Publishing By Role Composition

1. Sequential Releases

Time = 1

Hospital (Medical Data)      Public (Published Data)
Name     PID  Disease        Name     PID  Disease
Raymond  p1   Flu            Raymond  p1   Flu
Peter    p2   HIV            Peter    p2   HIV
Mary     p3   Fever          Mary     p3   Fever
Alice    p4   HIV            Alice    p4   HIV
Bob      p5   Flu            Bob      p5   Flu
John     p6   Fever          John     p6   Fever

Time = 2

Hospital (Medical Data)      Public (Published Data)
Name     PID  Disease        Name     PID  Disease
Raymond  p1   Flu            Raymond  p1   Flu
Peter    p2   HIV            Peter    p2   HIV
Mary     p3   Fever          Mary     p3   Fever
Alice    p4   HIV            Alice    p4   HIV
Bob      p5   Flu            Bob      p5   Flu
John     p6   Fever          John     p6   Fever

Insertions, deletions and updates

Time = 3

Hospital (Medical Data)      Public (Published Data)
Name     PID  Disease        Name     PID  Disease
Raymond  p1   Flu            Raymond  p1   Flu
Peter    p2   HIV            Peter    p2   HIV
Mary     p3   Fever          Mary     p3   Fever
Alice    p4   HIV            Alice    p4   HIV
Bob      p5   Flu            Bob      p5   Flu
John     p6   Fever          John     p6   Fever

This table satisfies some privacy requirements (e.g., m-invariance)

Page 6: Privacy Preserving Serial Data Publishing By Role Composition

1. Sequential Releases


Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Page 7: Privacy Preserving Serial Data Publishing By Role Composition

2. Existing Privacy Models

1. Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006
   - Considers insertions only
   - Does not consider deletions and updates
2. Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008
   - Considers insertions only
   - Does not consider deletions and updates
3. Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007
   - Considers insertions and deletions only
   - Does not consider updates

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together

Updates cannot simply be regarded as “a deletion and then an insertion” when privacy is considered.

Page 8: Privacy Preserving Serial Data Publishing By Role Composition

2. Existing Privacy Models

Sensitive Diseases

Transient diseases (curable), e.g., flu, fever
- e.g., If an individual is linked to flu in a published table, s/he may or may not be linked to flu in a later published table.

Permanent diseases (incurable), e.g., HIV
- e.g., If an individual is linked to HIV in a published table, s/he MUST be linked to HIV in every later published table in which s/he appears.

We are the first to study these two kinds of sensitive values.

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.
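The transient/permanent distinction can be sketched as a small consistency check. This is only an illustration, not the authors' algorithm: it assumes each individual carries one sensitive value per release, and the names PERMANENT and respects_permanence are hypothetical.

```python
# Assumption: the set of permanent values; the slides give HIV as the example.
PERMANENT = {"HIV"}


def respects_permanence(history, permanent=PERMANENT):
    """Check one individual's sensitive values across releases.

    history holds one value per release in time order, with None for
    releases in which the individual does not appear.
    """
    locked = None  # permanent value the individual is already linked to
    for value in history:
        if value is None:
            continue  # absent from this release, nothing to check
        if locked is not None and value != locked:
            return False  # a permanent value disappeared in a later release
        if value in permanent:
            locked = value
    return True


print(respects_permanence(["Flu", "Fever", "Flu"]))  # transient values may change
print(respects_permanence(["HIV", None, "HIV"]))     # absent in the middle release
print(respects_permanence(["HIV", "Flu"]))           # permanent value dropped
```

A transient history may change freely, while a history that reaches a permanent value must keep it in every later release in which the individual appears.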

Page 9: Privacy Preserving Serial Data Publishing By Role Composition

2. Existing Privacy Models

1. Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006
   - Considers insertions only
   - Does not consider deletions and updates
2. Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008
   - Considers insertions only
   - Does not consider deletions and updates
3. Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007
   - Considers insertions and deletions only
   - Does not consider updates

Existing models: do not consider transient/permanent values

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together
- Also considers transient/permanent values

Contributions:
We consider a more realistic setting of sequential releases.
- Insertions, deletions and updates
- Transient/permanent values
We cannot simply adapt these existing privacy models to this realistic setting.

Page 10: Privacy Preserving Serial Data Publishing By Role Composition

2. Existing Privacy Models

1. Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006
2. Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008
3. Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007

Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/m.

Our proposed privacy model (l-scarcity):
- Considers insertions, deletions and updates together

Problem (l-scarcity): At the current time t, we want to generate a table which satisfies the following. The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/l.

Page 11: Privacy Preserving Serial Data Publishing By Role Composition

Public: Voter Registration List

Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   25000
John     p6     20   29000
…        …      …    …
David    p|RL|  31   31000

Hospital: Medical Data

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Hospital: Medical Data + Some Useful Attributes

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Fever
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Flu
John     p6   20   29000     Fever

Release the data set to public

Public: Medical Data + Some Useful Attributes

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Fever
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Flu
John     p6   20   29000     Fever

Page 12: Privacy Preserving Serial Data Publishing By Role Composition


Public: Medical Data + Some Useful Attributes (Name and PID removed)

Age  Zip Code  Disease
23   16355     Flu
22   15500     HIV
21   12900     Fever
26   18310     HIV
25   25000     Flu
20   29000     Fever

Release the data set to public
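A short sketch of why removing Name and PID alone may not be enough: anyone holding the voter registration list can join it with the published table on Age and Zip Code. The slide does not spell this attack out, so treat it as an assumed illustration; the function name link_voters is hypothetical.

```python
# Voter registration list rows from the slide: (name, pid, age, zip code).
voters = [
    ("Raymond", "p1", 23, 16355),
    ("Peter", "p2", 22, 15500),
    ("Mary", "p3", 21, 12900),
    ("Alice", "p4", 26, 18310),
    ("Bob", "p5", 25, 25000),
    ("John", "p6", 20, 29000),
]

# Published rows with Name and PID removed: (age, zip code, disease).
published = [
    (23, 16355, "Flu"),
    (22, 15500, "HIV"),
    (21, 12900, "Fever"),
    (26, 18310, "HIV"),
    (25, 25000, "Flu"),
    (20, 29000, "Fever"),
]


def link_voters(voters, published):
    """Map each voter name to the diseases sharing the same (age, zip)."""
    by_qi = {}
    for age, zip_code, disease in published:
        by_qi.setdefault((age, zip_code), set()).add(disease)
    return {
        name: by_qi.get((age, zip_code), set())
        for name, _pid, age, zip_code in voters
    }


for name, diseases in link_voters(voters, published).items():
    print(name, sorted(diseases))
```

With exact Age and Zip Code values every voter in this toy data matches exactly one published row, so each disease is revealed.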

Page 13: Privacy Preserving Serial Data Publishing By Role Composition


Public: Medical Data + Some Useful Attributes (after generalization)

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Release the data set to public

Generalization

3-diversity
- Each individual is linked to “HIV” with probability at most 1/3 in THIS PUBLISHED TABLE.
- 3-diversity only focuses on ONE-TIME publishing.

3-invariance
- 3-invariance focuses on MULTIPLE-TIME publishing.
- It also makes use of the idea of 3-diversity.
- Idea: Each individual is linked to “HIV” with probability at most 1/3 with respect to MULTIPLE PUBLISHED TABLES.
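The 1/3 bound for a single published table can be checked with a few lines. This sketch assumes the simple counting view used on the slide: within a generalized group, the chance of linking a member to a value is that value's share of the group's rows. The name max_link_probability is hypothetical.

```python
from fractions import Fraction

# Generalized table from the slide: (age range, zip range, disease).
published = [
    ("[21,23]", "[12k,17k]", "Flu"),
    ("[21,23]", "[12k,17k]", "HIV"),
    ("[21,23]", "[12k,17k]", "Fever"),
    ("[20,26]", "[18k,29k]", "HIV"),
    ("[20,26]", "[18k,29k]", "Flu"),
    ("[20,26]", "[18k,29k]", "Fever"),
]


def max_link_probability(rows, value):
    """Largest share of `value` within any group of identical (age, zip) ranges."""
    groups = {}
    for age, zip_code, disease in rows:
        groups.setdefault((age, zip_code), []).append(disease)
    return max(Fraction(ds.count(value), len(ds)) for ds in groups.values())


print(max_link_probability(published, "HIV"))
```

The check looks at one table only, which is exactly the limitation the slide attributes to 3-diversity.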

Page 14: Privacy Preserving Serial Data Publishing By Role Composition


Public: Medical Data + Some Useful Attributes

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Release the data set to public

3-invariance

Time = 1

Page 15: Privacy Preserving Serial Data Publishing By Role Composition


3-invariance

Published Data, Time = 1

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Release the data set to public

Signatures, Time = 1

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Group members: p1 p2 p3 (first group); p4 p5 p6 (second group)
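The signature table above can be derived mechanically from the grouping. A minimal sketch, assuming a signature is the set of diseases appearing in an individual's group; the names groups_t1, disease_t1 and signatures are hypothetical.

```python
# Time = 1 grouping and true diseases, taken from the slide.
groups_t1 = [["p1", "p2", "p3"], ["p4", "p5", "p6"]]
disease_t1 = {
    "p1": "Flu", "p2": "HIV", "p3": "Fever",
    "p4": "HIV", "p5": "Flu", "p6": "Fever",
}


def signatures(groups, disease):
    """Return {pid: set of diseases in that pid's group}."""
    result = {}
    for members in groups:
        group_signature = frozenset(disease[pid] for pid in members)
        for pid in members:
            result[pid] = group_signature
    return result


for pid, signature in signatures(groups_t1, disease_t1).items():
    print(pid, sorted(signature))
```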


Page 22: Privacy Preserving Serial Data Publishing By Role Composition

Public: Voter Registration List

Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   25000
John     p6     20   29000
…        …      …    …
David    p|RL|  31   31000

Hospital: Medical Data

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Flu
Alice    p4   HIV
Bob      p5   Fever
John     p6   Fever

Hospital: Medical Data + Some Useful Attributes

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Fever
John     p6   20   29000     Fever

Release the data set to public

3-invariance

Time = 2

Published Data, Time = 1

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Signatures, Time = 1

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Signatures, Time = 2

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Page 23: Privacy Preserving Serial Data Publishing By Role Composition


Release the data set to public

3-invariance

Time = 2

Published Data, Time = 1

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Signatures, Time = 1

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Published Data, Time = 2 (Medical Data + Some Useful Attributes)

Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever

Group members: p2 p3 p6 (first group); p1 p4 p5 (second group)

Signatures, Time = 2

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature.

Idea of 3-invariance: Each individual is linked to the SAME signature in each published table.
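The "same signature in each published table" idea can be checked across releases as sketched below. It covers only the signature-consistency part stated on the slide; the full m-invariance definition in Xiao et al. has further conditions that are not modelled here. The names time1, time2 and same_signature_in_all are hypothetical.

```python
def signatures(groups, disease):
    """Return {pid: set of diseases in that pid's group} for one release."""
    result = {}
    for members in groups:
        group_signature = frozenset(disease[pid] for pid in members)
        for pid in members:
            result[pid] = group_signature
    return result


def same_signature_in_all(releases):
    """True if every individual keeps one signature over all releases."""
    first_seen = {}
    for groups, disease in releases:
        for pid, signature in signatures(groups, disease).items():
            # setdefault stores the first signature and returns it afterwards
            if first_seen.setdefault(pid, signature) != signature:
                return False
    return True


# (grouping, true diseases) per release, taken from the slides.
time1 = (
    [["p1", "p2", "p3"], ["p4", "p5", "p6"]],
    {"p1": "Flu", "p2": "HIV", "p3": "Fever",
     "p4": "HIV", "p5": "Flu", "p6": "Fever"},
)
time2 = (
    [["p2", "p3", "p6"], ["p1", "p4", "p5"]],
    {"p1": "Flu", "p2": "HIV", "p3": "Flu",
     "p4": "HIV", "p5": "Fever", "p6": "Fever"},
)

print(same_signature_in_all([time1, time2]))
```

Keeping the Time = 1 grouping at Time = 2 would fail this check, because p1, p2 and p3 would then carry only {Flu, HIV}.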


Page 29: Privacy Preserving Serial Data Publishing By Role Composition

Public: Voter Registration List

Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   15000
John     p6     20   29000
…        …      …    …
David    p|RL|  31   31000

Hospital: Medical Data

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Flu
Alice    p4   HIV
Bob      p5   Fever
John     p6   Fever

Hospital: Medical Data + Some Useful Attributes

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   15000     Fever
John     p6   20   29000     Fever

Release the data set to public

3-invariance

Time = 3

Published Data, Time = 1

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Signatures, Time = 1

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Published Data, Time = 2

Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever

Signatures, Time = 2

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Signatures, Time = 3

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Page 30: Privacy Preserving Serial Data Publishing By Role Composition


Release the data set to public

3-invariance

Time = 3

Published Data, Time = 1

Age      Zip Code   Disease
[21,23]  [12k,17k]  Flu
[21,23]  [12k,17k]  HIV
[21,23]  [12k,17k]  Fever
[20,26]  [18k,29k]  HIV
[20,26]  [18k,29k]  Flu
[20,26]  [18k,29k]  Fever

Signatures, Time = 1

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Published Data, Time = 2

Age      Zip Code   Disease
[20,22]  [12k,29k]  HIV
[20,22]  [12k,29k]  Flu
[20,22]  [12k,29k]  Fever
[23,26]  [16k,25k]  Flu
[23,26]  [16k,25k]  HIV
[23,26]  [16k,25k]  Fever

Signatures, Time = 2

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Published Data, Time = 3 (Medical Data + Some Useful Attributes)

Age      Zip Code   Disease
[21,25]  [12k,16k]  HIV
[21,25]  [12k,16k]  Flu
[21,25]  [12k,16k]  Fever
[20,26]  [16k,29k]  Flu
[20,26]  [16k,29k]  HIV
[20,26]  [16k,29k]  Fever

Signatures, Time = 3

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

This table satisfies 3-invariance. This is because each individual is linked to the SAME signature.

Group members: p2 p3 p5 (first group); p1 p4 p6 (second group)
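The generalized ranges shown for the two groups at Time = 3 can be reproduced from the raw Age and Zip Code values. The rounding rule below (lower zip bound rounded down and upper bound rounded up to whole thousands) is an assumption that happens to match the slide's numbers, not a rule stated in the slides; the name generalize is hypothetical.

```python
import math


def generalize(records):
    """Return (age range, zip range) strings covering all (age, zip) records."""
    ages = [age for age, _zip in records]
    zips = [zip_code for _age, zip_code in records]
    low_k = math.floor(min(zips) / 1000)   # assumed: round lower bound down
    high_k = math.ceil(max(zips) / 1000)   # assumed: round upper bound up
    return f"[{min(ages)},{max(ages)}]", f"[{low_k}k,{high_k}k]"


# (age, zip code) of the Time = 3 group members, from the voter list.
group_p2_p3_p5 = [(22, 15500), (21, 12900), (25, 15000)]
group_p1_p4_p6 = [(23, 16355), (26, 18310), (20, 29000)]

print(generalize(group_p2_p3_p5))
print(generalize(group_p1_p4_p6))
```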

Page 31: Privacy Preserving Serial Data Publishing By Role Composition

Name PID Age

Zip Code

Raymond

p1 23 16355

Peter p2 22 15500Mary p3 21 12900Alice p4 26 18310Bob p5 25 15000John p6 20 29000

… … …David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID DiseaseRaymon

dp1 Flu

Peter p2 HIVMary p3 FluAlice p4 HIVBob p5 FeverJohn p6 Fever

Medical DataName PID Age Zip

CodeDiseas

eRaymond p1 23 16355 Flu

Peter p2 22 15500 HIVMary p3 21 12900 FluAlice p4 26 18310 HIVBob p5 25 15000 FeverJohn p6 20 29000 Fever

Medical Data + Some Useful Attributes

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Release the data set to public

3-invariance

Time = 3

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

Time = 3

Page 32: Privacy Preserving Serial Data Publishing By Role Composition

Name PID Age Zip Code
Raymond p1 23 16355
Peter p2 22 15500
Mary p3 21 12900
Alice p4 26 18310
Bob p5 25 15000
John p6 20 29000
… … …
David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Medical Data

Name PID Age Zip Code Disease
Raymond p1 23 16355 Flu
Peter p2 22 15500 HIV
Mary p3 21 12900 Flu
Alice p4 26 18310 HIV
Bob p5 25 15000 Fever
John p6 20 29000 Fever

Medical Data + Some Useful Attributes

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Release the data set to public

3-invariance

Time = 3

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}

Age Zip Code Disease

[21,25] [12k,16k] HIV

[21,25] [12k,16k] Flu

[21,25] [12k,16k] Fever

[20,26] [16k,29k] Flu

[20,26] [16k,29k] HIV

[20,26] [16k,29k] Fever

Time = 3

Page 33: Privacy Preserving Serial Data Publishing By Role Composition

Name PID Age Zip Code
Raymond p1 23 16355
Peter p2 22 15500
Mary p3 21 12900
Alice p4 26 18310
Bob p5 25 15000
John p6 20 29000
… … …
David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Medical Data

Name PID Age Zip Code Disease
Raymond p1 23 16355 Flu
Peter p2 22 15500 HIV
Mary p3 21 12900 Flu
Alice p4 26 18310 HIV
Bob p5 25 15000 Fever
John p6 20 29000 Fever

Medical Data + Some Useful Attributes

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Release the data set to public

3-invariance

Time = 3

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}

Age Zip Code Disease

[21,25] [12k,16k] HIV

[21,25] [12k,16k] Flu

[21,25] [12k,16k] Fever

[20,26] [16k,29k] Flu

[20,26] [16k,29k] HIV

[20,26] [16k,29k] Fever

Time = 3

Page 34: Privacy Preserving Serial Data Publishing By Role Composition

Name PID Age Zip Code
Raymond p1 23 16355
Peter p2 22 15500
Mary p3 21 12900
Alice p4 26 18310
Bob p5 25 15000
John p6 20 29000
… … …
David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Medical Data

Name PID Age Zip Code Disease
Raymond p1 23 16355 Flu
Peter p2 22 15500 HIV
Mary p3 21 12900 Flu
Alice p4 26 18310 HIV
Bob p5 25 15000 Fever
John p6 20 29000 Fever

Medical Data + Some Useful Attributes

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Release the data set to public

3-invariance

Time = 3

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[21,25] [12k,16k] HIV

[21,25] [12k,16k] Flu

[21,25] [12k,16k] Fever

[20,26] [16k,29k] Flu

[20,26] [16k,29k] HIV

[20,26] [16k,29k] Fever

Time = 3

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Page 35: Privacy Preserving Serial Data Publishing By Role Composition

Name PID Age Zip Code
Raymond p1 23 16355
Peter p2 22 15500
Mary p3 21 12900
Alice p4 26 18310
Bob p5 25 15000
John p6 20 29000
… … …
David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Medical Data

Name PID Age Zip Code Disease
Raymond p1 23 16355 Flu
Peter p2 22 15500 HIV
Mary p3 21 12900 Flu
Alice p4 26 18310 HIV
Bob p5 25 15000 Fever
John p6 20 29000 Fever

Medical Data + Some Useful Attributes

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Release the data set to public

3-invariance

Time = 3

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[21,25] [12k,16k] HIV

[21,25] [12k,16k] Flu

[21,25] [12k,16k] Fever

[20,26] [16k,29k] Flu

[20,26] [16k,29k] HIV

[20,26] [16k,29k] Fever

Time = 3

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Page 36: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[21,25] [12k,16k] HIV

[21,25] [12k,16k] Flu

[21,25] [12k,16k] Fever

[20,26] [16k,29k] Flu

[20,26] [16k,29k] HIV

[20,26] [16k,29k] Fever

Time = 3

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p1 p2 p3

p4 p5 p6

p2 p3 p6

p1 p4 p5

p2 p3 p5

p1 p4 p6

Time = 3

PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}

3-invariance

Page 37: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}

3-invariance

Knowledge 1

Knowledge 2: I know all voter registration lists.

Name PID Age Zip Code
Raymond p1 23 16355
Peter p2 22 15500
Mary p3 21 12900
Alice p4 26 18310
Bob p5 25 25000
John p6 20 29000
… … …
David p|RL| 31 31000

Page 38: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

PID Signature
p1 {Flu, HIV, Fever}
p2 {Flu, HIV, Fever}
p3 {Flu, HIV, Fever}
p4 {Flu, HIV, Fever}
p5 {Flu, HIV, Fever}
p6 {Flu, HIV, Fever}

3-invariance

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Page 39: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p1 is linked to HIV.

PID HIV?
p1 Yes
p2 No
p3 No
p4
p5
p6

Page 40: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p1 is linked to HIV.

PID HIV?
p1 Yes
p2 No
p3 No
p4 No
p5 No
p6

Page 41: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p1 is linked to HIV.

PID HIV?
p1 Yes
p2 No
p3 No
p4 No
p5 No
p6 No

Page 42: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p1 is linked to HIV.

PID HIV?
p1 Yes
p2 No
p3 No
p4 No
p5 No
p6 No

Contradiction! The group {p4, p5, p6} at time = 1 contains an HIV tuple, yet none of its members can be linked to HIV.

p1 CANNOT be linked to HIV.

Page 43: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p6 is linked to HIV.

PID HIV?
p1
p2
p3
p4 No
p5 No
p6 Yes

Page 44: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p6 is linked to HIV.

PID HIV?
p1
p2 No
p3 No
p4 No
p5 No
p6 Yes

Page 45: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p6 is linked to HIV.

PID HIV?
p1 No
p2 No
p3 No
p4 No
p5 No
p6 Yes

Page 46: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

Proof by contradiction. Suppose p6 is linked to HIV.

PID HIV?
p1 No
p2 No
p3 No
p4 No
p5 No
p6 Yes

Contradiction! The group {p1, p2, p3} at time = 1 contains an HIV tuple, yet none of its members can be linked to HIV.

p6 CANNOT be linked to HIV.

Page 47: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

3-invariance

Knowledge 1

Knowledge 2: I know all voter registration lists.

Knowledge 3: I know that HIV is a permanent sensitive value.

Knowledge 4: There are TWO HIVs in each published table.

I can deduce that p1 and p6 cannot be linked to HIV.

I can deduce that p4 MUST be linked to HIV: the group {p1, p4, p6} at time = 3 contains an HIV tuple, and p1 and p6 have been ruled out.

Privacy breach! Why?

Problem (m-invariance): At the current time t, we want to generate a table which satisfies the following.

The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/m.
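The deduction above can be checked by brute force. The following is a minimal sketch, not part of the original slides: it assumes the adversary knows the group memberships shown at times 1 to 3, that HIV is permanent, and that every group contains exactly one HIV tuple. The names `TABLES` and `consistent_hiv_sets` are hypothetical.

```python
from itertools import combinations

# Group memberships of the three published tables (taken from the slides).
# Every group contains exactly one HIV tuple.
TABLES = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}],  # Time = 3
]
PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]


def consistent_hiv_sets(tables, people, n_hiv=2):
    """Return every set of n_hiv individuals that could be the HIV holders.

    HIV is permanent, so one fixed set of holders must place exactly
    one holder in every group of every published table.
    """
    worlds = []
    for candidate in combinations(people, n_hiv):
        holders = set(candidate)
        if all(len(holders & group) == 1 for table in tables for group in table):
            worlds.append(holders)
    return worlds


worlds = consistent_hiv_sets(TABLES, PEOPLE)
forced = set.intersection(*worlds)            # holders in every possible world
excluded = set(PEOPLE) - set.union(*worlds)   # holders in no possible world
print("possible holder sets:", [sorted(w) for w in worlds])
print("forced:", sorted(forced), "excluded:", sorted(excluded))
```

Only the holder sets {p2, p4} and {p3, p4} survive, so p4 is linked to HIV in every consistent world even though each table on its own satisfies 3-invariance.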

Page 48: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

p2 is an HIV-holder.

p1 is an HIV-decoy.

p3 is an HIV-decoy.

HIV-decoys (i.e., p1 and p3) are used to weaken the strong linkage between p2 and HIV.

Page 49: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

p2 is an HIV-holder.

p1 is an HIV-decoy.

p3 is an HIV-decoy.

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3

Page 50: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

p4 is an HIV-holder.

p5 is an HIV-decoy.

p6 is an HIV-decoy.

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Page 51: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Page 52: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Page 53: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Page 54: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

I can deduce that p4 MUST be linked to HIV.

Privacy breach! Why?

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Original Medical Data (Time = 1)

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

p1 and p6 are in the same cohort. Besides, they are in the same group of the published table at time = 3.

Idea: Placing two members of the same cohort in one group can lead to privacy breaches. We can protect privacy by avoiding this kind of grouping.
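The grouping rule can be tested mechanically. Below is a small sketch, under the assumption that the cohort assignment is the one shown on this slide (Cohort 1 = {p2, p4}, Cohort 2 = {p1, p6}, Cohort 3 = {p3, p5}); `COHORT_OF` and `cohort_conflicts` are hypothetical names used for illustration only.

```python
# Cohort assignment read off the cohort table on this slide.
COHORT_OF = {"p2": 1, "p4": 1, "p1": 2, "p6": 2, "p3": 3, "p5": 3}


def cohort_conflicts(groups, cohort_of):
    """List the groups that contain two or more members of one cohort.

    Each entry is (sorted group members, {cohort: [clashing members]}).
    """
    conflicts = []
    for group in groups:
        by_cohort = {}
        for pid in sorted(group):
            by_cohort.setdefault(cohort_of[pid], []).append(pid)
        clash = {c: members for c, members in by_cohort.items() if len(members) > 1}
        if clash:
            conflicts.append((sorted(group), clash))
    return conflicts


time1 = [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}]
time2 = [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}]
time3 = [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}]

for label, groups in [("Time = 1", time1), ("Time = 2", time2), ("Time = 3", time3)]:
    print(label, cohort_conflicts(groups, COHORT_OF))
```

Under these assumptions only the time = 3 table reports conflicts: p3 and p5 share a group, and so do p1 and p6.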

Page 55: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

p2 p3 p5

p1 p4 p6

Time = 3

Knowledge 1

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

[22,25]

[15k,17k]

Fever

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

[20,26]

[12k,29k]

Fever

p1 p2 p5

p3 p4 p6

Time = 3

3-invariance

3-scarcity

Page 56: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Knowledge 1

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

[22,25]

[15k,17k]

Fever

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

[20,26]

[12k,29k]

Fever

p1 p2 p5

p3 p4 p6

Time = 3

3-scarcity

Page 57: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Knowledge 1

Cohort 1 Cohort 2 Cohort 3
HIV-holder HIV-decoy HIV-decoy
p2 p1 p3
p4 p6 p5

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

[22,25]

[15k,17k]

Fever

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

[20,26]

[12k,29k]

Fever

p1 p2 p5

p3 p4 p6

Time = 3

3-scarcity

The probability that an individual is linked to a sensitive value with respect to these three tables is at most 1/3.
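The 1/3 bound for HIV can be verified by enumerating the possible worlds. This is an illustrative sketch only: it assumes that every set of HIV holders consistent with the three tables is equally likely, and it considers only the permanent value HIV. The function name `hiv_link_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Group memberships of the three tables on this slide; the time = 3 table
# is the 3-scarcity version with groups {p1, p2, p5} and {p3, p4, p6}.
TABLES = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}],  # Time = 3
]
PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]


def hiv_link_probability(tables, people, n_hiv=2):
    """Probability that each individual holds HIV, over all consistent worlds."""
    worlds = [
        set(candidate)
        for candidate in combinations(people, n_hiv)
        if all(len(set(candidate) & group) == 1 for table in tables for group in table)
    ]
    if not worlds:
        raise ValueError("the tables admit no consistent set of HIV holders")
    return {p: Fraction(sum(p in w for w in worlds), len(worlds)) for p in people}


for pid, prob in hiv_link_probability(TABLES, PEOPLE).items():
    print(pid, prob)
```

With these tables the consistent holder sets are {p1, p6}, {p2, p4} and {p3, p5}, so every individual is linked to HIV with probability exactly 1/3.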

Page 58: Privacy Preserving Serial Data Publishing By Role Composition

3. Algorithm

We propose an algorithm that follows this principle: whenever we form a group, choose one member from each cohort.
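The principle can be sketched in a few lines. This is not the authors' algorithm, only a minimal illustration of "one member from each cohort": it assumes cohorts of equal size, and the optional ordering by age (to keep the generalized QID ranges narrow) is an assumption introduced here. `form_groups`, `COHORTS` and `AGE` are hypothetical names; the ages come from the voter registration list.

```python
def form_groups(cohorts, key=None):
    """Form groups by taking one not-yet-used member from each cohort.

    cohorts maps a cohort id to its members. Members of each cohort are
    sorted (optionally by `key`), and the i-th members of all cohorts
    form the i-th group, so no group holds two members of one cohort.
    """
    sizes = {len(members) for members in cohorts.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes cohorts of equal size")
    columns = [sorted(members, key=key) for _, members in sorted(cohorts.items())]
    return [set(row) for row in zip(*columns)]


COHORTS = {1: ["p2", "p4"], 2: ["p1", "p6"], 3: ["p3", "p5"]}
AGE = {"p1": 23, "p2": 22, "p3": 21, "p4": 26, "p5": 25, "p6": 20}

groups = form_groups(COHORTS, key=AGE.get)
print([sorted(g) for g in groups])
```

Sorting each cohort by age pairs the younger members together and reproduces the time = 2 grouping {p2, p3, p6} and {p1, p4, p5}; any other ordering still yields groups with one member per cohort.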

Page 59: Privacy Preserving Serial Data Publishing By Role Composition

3. Guarantee

Theorem: Our proposed algorithm can generate a table which satisfies the following.

The probability that an individual is linked to a sensitive value with respect to all published tables at any time <= t is at most 1/l (i.e., l-scarcity).

Page 60: Privacy Preserving Serial Data Publishing By Role Composition

4. Experiments

Real Data Set (CADRMP)

http://www.hc-sc.gc.ca/dhp-mps/medeff/databasdon/index_e.html

Real hospital database

Patient Information (Voter Registration List): 40,478 tuples

Medical Record: 105,420 tuples

Each patient can be linked to multiple diseases

Page 61: Privacy Preserving Serial Data Publishing By Role Composition

4. Experiments

Studies

Privacy breaches of an existing model (m-invariance)

Performance of our proposed algorithm

Page 62: Privacy Preserving Serial Data Publishing By Role Composition

4.1 Privacy Breaches of an existing model

Breach Rate: the proportion of tuples with privacy breaches
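As a rough illustration of the measure, the sketch below counts a tuple as breached when its linkage probability exceeds 1/m. That reading follows the m-invariance problem statement earlier in the deck, but it is an assumption, not necessarily the exact definition used in the experiments. The probabilities are the ones obtained by enumerating the six-person running example; `breach_rate` is a hypothetical name.

```python
from fractions import Fraction


def breach_rate(link_probability, m):
    """Proportion of individuals whose linkage probability exceeds 1/m."""
    if not link_probability:
        raise ValueError("no tuples to evaluate")
    threshold = Fraction(1, m)
    breached = [pid for pid, prob in link_probability.items() if prob > threshold]
    return Fraction(len(breached), len(link_probability))


# HIV linkage probabilities for the three 3-invariant tables of the
# running example (p4 is certain, p2 and p3 share the other HIV tuple).
running_example = {
    "p1": Fraction(0), "p2": Fraction(1, 2), "p3": Fraction(1, 2),
    "p4": Fraction(1), "p5": Fraction(0), "p6": Fraction(0),
}
print(breach_rate(running_example, m=3))
```

Here p2, p3 and p4 all exceed 1/3, so half of the six tuples count as breached under this reading.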

m-invariance

Page 63: Privacy Preserving Serial Data Publishing By Role Composition

4.2 Performance of our proposed algorithm

Measurements: Computation Cost, Relative Average Error

Variations: Parameter l (used in l-scarcity), No. of published tables

Page 64: Privacy Preserving Serial Data Publishing By Role Composition

4.2 Performance of our proposed algorithm

Page 65: Privacy Preserving Serial Data Publishing By Role Composition

5. Conclusion

Sequential Releases: QID values can be updated; sensitive values can be updated

Sensitive Values: Permanent; Transient

Identify the insufficiency of existing models

Algorithm

Experiments

Page 66: Privacy Preserving Serial Data Publishing By Role Composition

Q&A

Page 67: Privacy Preserving Serial Data Publishing By Role Composition

4.2 Performance of our proposed algorithm

Page 68: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

p2 p1

p6

HIV-holder

HIV-decoy

HIV-decoy

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

CI(p6)

p6’s QID attributes: (Age, Zip Code) = (20, 29000)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We switch the role of l-1 HIV-decoys from PRESENT individuals to ABSENT individuals.

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.

1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but we can find these individuals in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])

1. p6: (Age, Zip Code) = (20, 29000), HIV-decoy, present
2. p7: (Age, Zip Code) = (25, 33000), HIV-buddy, present
3. p8: (Age, Zip Code) = (26, 30000), HIV-buddy, absent

e.g., p4 (HIV-holder) is absent in the current table. If the other HIV-decoys are still present, the adversary can figure out that p4 is an HIV-holder.

HIV-decoy

Case 1: HIV-decoy

Case 2: HIV-holder

HIV-buddy

Since one HIV-holder and l-1 HIV-decoys become ABSENT together, the adversary cannot figure out who is the REAL HIV-holder.

Page 69: Privacy Preserving Serial Data Publishing By Role Composition

3. Algorithm

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

We have just discussed how to update the role of each individual (i.e., decoy/holder) according to different scenarios when a new raw medical table arrives.

Algorithm:

For the first raw medical table,
  Use some existing privacy algorithm (e.g., l-diversity) to generate a temporary table T’
  Find HIV-holders and HIV-decoys from T’
  Construct the cohorts according to HIV-holders/decoys
  Form containers for each HIV-holder/decoy
  Generate a published table according to the cohorts

Whenever there is a new raw medical table,
  Update the role of individuals according to different scenarios
  Generate some containers (if necessary)
  Generate a published table according to the cohorts

To generate a published table according to the cohorts:
  Repeat
    pick one container from each cohort
    form one group by generalizing all these containers
  Until Cohort 1 is empty
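The Repeat/Until step can be sketched as follows. This is a hedged illustration rather than the authors' code: containers are modelled as pairs of (age range, zip range), "generalizing" is assumed to mean taking the smallest ranges that cover all picked containers, the choice of which container to pick from each cohort is arbitrary here, and the names `generalize` and `publish_groups` as well as the sample ranges are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]
Container = Tuple[Range, Range]  # (age range, zip code range)


def generalize(containers: List[Container]) -> Container:
    """Smallest (age range, zip range) covering every given container."""
    ages = [c[0] for c in containers]
    zips = [c[1] for c in containers]
    return (
        (min(lo for lo, _ in ages), max(hi for _, hi in ages)),
        (min(lo for lo, _ in zips), max(hi for _, hi in zips)),
    )


def publish_groups(cohorts: List[List[Container]]) -> List[Container]:
    """Pick one container per cohort and generalize, until Cohort 1 is empty."""
    pools = [list(c) for c in cohorts]  # copy so the input is not mutated
    groups: List[Container] = []
    while pools[0]:
        if any(not pool for pool in pools):
            # Assumption of this sketch: every cohort can supply a container.
            raise ValueError("a cohort ran out of containers before Cohort 1")
        picked = [pool.pop() for pool in pools]  # arbitrary pick: last one
        groups.append(generalize(picked))
    return groups


# Hypothetical containers, one per cohort.
SAMPLE = [
    [((20, 22), (12000, 15000))],
    [((21, 24), (14000, 17000))],
    [((23, 26), (13000, 29000))],
]
```

A real implementation would presumably choose containers so as to keep the generalized ranges narrow; the slides do not specify that choice.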

Page 70: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Time = 1

Medical Data

Published Data

p2 p1 p3

p4 p6 p5

p1 p2 p3

p4 p5 p6

We can make use of some “existing” approaches to generate this table, which satisfies 3-diversity.

Page 71: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Time = 1

Medical Data

Published Data

p2 p1 p3

p4 p6 p5

We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8). We can then find generalized QID values which cover the QID values of these individuals and p2.

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

p1 p2 p3

p4 p5 p6

… Some additional individuals in CI(p1) which are present

Page 72: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Time = 1

Medical Data

Published Data

p2 p1 p3

p4 p6 p5

We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8). We can then find generalized QID values which cover the QID values of these individuals and p2.

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

p1 p2 p3

p4 p5 p6

… Some additional individuals in CI(p2) which are present

Page 73: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Time = 1

Medical Data

Published Data

p2 p1 p3

p4 p6 p5

We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8). We can then find generalized QID values which cover the QID values of these individuals and p2.

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

p1 p2 p3

p4 p5 p6

… Some additional individuals in CI(p3) which are present

… …

Page 74: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

… … …

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Time = 1

Medical Data

Published Data

p2 p1 p3

p4 p6 p5

We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8). We can then find generalized QID values which cover the QID values of these individuals and p2.

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

p1 p2 p3

p4 p5 p6

Some additional individuals in CI(p4), CI(p5) and CI(p6) which are present

… …

… … …

Page 75: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

p2 p1 p3

p4 p6 p5

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

… … …

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

… … …… … …

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

… … …

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

… … …… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …

Time = 1

Time = 2

Medical Data

Published Data

Medical Data

p1 p2 p3

p4 p5 p6

… … …

… … …

Published Data

1. Update the role of each individual (i.e., decoy/holder) according to different scenarios

2. Pin some individuals if necessary

p2 p3 p6

… … …

p1 p4 p5

… … …

Page 76: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1 Cohort 2 Cohort 3

HIV-holder

HIV-decoy

HIV-decoy

p2 p1 p3

p4 p6 p5

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

Algorithm

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

… … …

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

… … …… … …

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

… … …

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

… … …… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …

Time = 1

Time = 2

Medical Data

Published Data

Medical Data

p1 p2 p3

p4 p5 p6

… … …

… … …

Published Data

1. Update the role of each individual (i.e., decoy/holder) according to different scenarios

2. Pin some individuals if necessary

p2 p3 p6

… … …

p1 p4 p5

… … …

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

… … …

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

… … …… … …

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …

Time = 3

Published Data

p1 p2 p5

… … …

p3 p4 p6

… … …

Page 77: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

CI(p6)

p6’s QID attributes: (Age, Zip Code) = (20, 29000)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.

1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but we can find these individuals in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])

1. p6: (Age, Zip Code) = (20, 29000), HIV-decoy, present
2. p7: (Age, Zip Code) = (25, 33000), HIV-buddy, present
3. p8: (Age, Zip Code) = (26, 30000), HIV-buddy, absent

e.g., p6 suffers from HIV in the current table. p6 loses its functionality as an HIV-decoy.

HIV-decoy

From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy.

Thus, the role replacement still protects privacy.

This idea is valid when there EXISTS another individual for replacement. What if there is none?

e.g., p7 suffers from HIV in some later table. p7 loses its functionality as an HIV-decoy.

We cannot find other HIV-buddies for replacement. Then, we pin p7. That is, the original HIV value of p7 is modified/suppressed to a transient value (e.g., Flu). Once pinned, it acts as an HIV-decoy until it disappears.
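The replace-or-pin rule for Scenario 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Member` class, the role strings, and the functions `refresh_decoy` and `published_disease` are hypothetical names, and only the case where the current decoy contracts HIV is handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    pid: str
    role: str            # "decoy", "buddy", or "holder"
    present: bool        # appears in the current medical table
    has_hiv: bool = False
    pinned: bool = False


def refresh_decoy(container: List[Member]) -> Member:
    """Return the member acting as HIV-decoy after the current update."""
    decoy = next(m for m in container if m.role == "decoy")
    if not decoy.has_hiv:
        return decoy  # the decoy still works, nothing to do
    # Role replacement: look for a present buddy that does not have HIV.
    for m in container:
        if m.role == "buddy" and m.present and not m.has_hiv:
            decoy.role, m.role = "holder", "decoy"
            return m
    # No buddy available: pin the decoy so it keeps acting as a decoy.
    decoy.pinned = True
    return decoy


def published_disease(m: Member, actual: str, transient: str = "Flu") -> str:
    """A pinned member's value is published as a transient value."""
    return transient if m.pinned else actual


# Container CI(p6) from the slides: p6 (decoy), p7 and p8 (buddies).
CI_P6 = [
    Member("p6", "decoy", present=True),
    Member("p7", "buddy", present=True),
    Member("p8", "buddy", present=False),
]
```

In the slide example, p6 contracting HIV makes p7 the new decoy; if p7 later contracts HIV as well, the only remaining buddy (p8) is absent, so p7 is pinned and published with a transient value.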

Page 78: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

HIV-holder

HIV-decoy

HIV-decoy

p4 p6 p5

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

[22,25]

[15k,17k]

Fever

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

[20,26]

[12k,29k]

Fever

p1 p2 p5

p3 p4 p6

Time = 3

We just show a simple case for anonymization. In this case:

• Scenario 1: If the individual does not suffer from HIV, s/he will not suffer from HIV in the later published tables.

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Time = 1

Time = 2

Time = 3

How should we anonymize when these individuals may develop a new permanent disease?

HIV

p6 was originally used as an HIV-decoy. Now, it changes its role from an HIV-decoy to an HIV-holder. It loses its ability to protect other HIV-holders (in Cohort 1).

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Page 79: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

HIV-holder

HIV-decoy

HIV-decoy

p4 p6 p5

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

[22,25]

[15k,17k]

Fever

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

[20,26]

[12k,29k]

Fever

p1 p2 p5

p3 p4 p6

Time = 3

We just show a simple case for anonymization. In this case:

• Scenario 2: If an individual is present in an earlier published table, s/he is also present in all later published tables.

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Time = 1

Time = 2

Time = 3

How should we anonymize when some individuals are absent in a later published table?

p6 was originally used as an HIV-decoy. Now, it disappears from this published table. It loses its ability to protect other HIV-holders (in Cohort 1).

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Page 80: Privacy Preserving Serial Data Publishing By Role Composition

Age Zip Code

Disease

[21,23]

[12k,17k]

Flu

[21,23]

[12k,17k]

HIV

[21,23]

[12k,17k]

Fever

[20,26]

[18k,29k]

HIV

[20,26]

[18k,29k]

Flu

[20,26]

[18k,29k]

Fever

Time = 1

p1 p2 p3

p4 p5 p6

Age Zip Code

Disease

[20,22]

[12k,29k]

HIV

[20,22]

[12k,29k]

Flu

[20,22]

[12k,29k]

Fever

[23,26]

[16k,25k]

Flu

[23,26]

[16k,25k]

HIV

[23,26]

[16k,25k]

Fever

p2 p3 p6

p1 p4 p5

Time = 2

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

HIV-holder

HIV-decoy

HIV-decoy

p4 p6 p5

Age Zip Code

Disease

[22,25]

[15k,17k]

HIV

[22,25]

[15k,17k]

Flu

[22,25]

[15k,17k]

Fever

[20,26]

[12k,29k]

Flu

[20,26]

[12k,29k]

HIV

[20,26]

[12k,29k]

Fever

p1 p2 p5

p3 p4 p6

Time = 3

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever

Time = 1

Time = 2

Time = 3

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

There are other scenarios, e.g., some individuals who are absent in some earlier published tables are present in this table.

In this talk, we focus on Scenario 1 and Scenario 2. You can find other scenarios in the paper.

Page 81: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

p6 has the QID attributes (Age, Zip Code) = (20, 29000)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Page 82: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

CI(p6)

p6’s QID attributes: (Age, Zip Code) = (20, 29000)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.

1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but we can find these individuals in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])

1. p6: (Age, Zip Code) = (20, 29000), HIV-decoy, present
2. p7: (Age, Zip Code) = (25, 33000), HIV-buddy, present
3. p8: (Age, Zip Code) = (26, 30000), HIV-buddy, absent
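The container example can be reproduced with a short sketch. Two assumptions are made for illustration: the container is taken to be the tightest pair of ranges covering its members (the slides only require that the ranges cover them), and the function names `container_ranges` and `has_required_buddies` are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]


def container_ranges(qids: Dict[str, Tuple[int, int]]) -> Tuple[Range, Range]:
    """Tightest (age range, zip range) covering all members' (age, zip)."""
    ages = [age for age, _ in qids.values()]
    zips = [zip_code for _, zip_code in qids.values()]
    return (min(ages), max(ages)), (min(zips), max(zips))


def has_required_buddies(present: Dict[str, bool], owner: str) -> bool:
    """True if the additional individuals include one present and one absent."""
    others = [is_present for pid, is_present in present.items() if pid != owner]
    return any(others) and not all(others)


# Values from the slide: p6 is the HIV-decoy, p7 and p8 are HIV-buddies.
QIDS = {"p6": (20, 29000), "p7": (25, 33000), "p8": (26, 30000)}
PRESENT = {"p6": True, "p7": True, "p8": False}
```

For these values the computed ranges are ([20,26], [29000,33000]), which matches the container CI(p6) given on the slide.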

Page 83: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

CI(p6)

p6’s QID attributes: (Age, Zip Code) = (20, 29000)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.

1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but we can find these individuals in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])

1. p6: (Age, Zip Code) = (20, 29000), HIV-decoy, present
2. p7: (Age, Zip Code) = (25, 33000), HIV-buddy, present
3. p8: (Age, Zip Code) = (26, 30000), HIV-buddy, absent

e.g., p6 suffers from HIV in the current table. p6 loses its functionality as an HIV-decoy.

HIV-decoy

From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy.

Thus, the role replacement still protects privacy.

Note that we update the role of some individuals (i.e., decoy/holder/buddy) for Scenario 1 whenever a new raw medical table arrives (e.g., time = 3).

Page 84: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes, e.g., (Age, Zip Code) = ([20,26], [29k,33k]).

CI(p6)

p6’s QID attributes: (Age, Zip Code) = (20, 29000)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys.

1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), but we can find these individuals in the Voter Registration List.

e.g., Container CI(p6): (Age, Zip Code) = ([20,26], [29k,33k])

1. p6: (Age, Zip Code) = (20, 29000), HIV-decoy, present
2. p7: (Age, Zip Code) = (25, 33000), HIV-buddy, present
3. p8: (Age, Zip Code) = (26, 30000), HIV-buddy, absent

e.g., p6 (HIV-decoy) is absent in the current table. p6 loses its functionality as an HIV-decoy.

HIV-decoy

From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy.

Thus, the role replacement still protects privacy.

absent

Case 1: HIV-decoy

Case 2: HIV-holder

Note that we update the role of some individuals (i.e., decoy/holder/buddy) for Scenario 2 whenever a new raw medical table arrives (e.g., time = 3).

Page 85: Privacy Preserving Serial Data Publishing By Role Composition

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

CI(p6)

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to replace its original role (i.e., an HIV-decoy).

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

Page 86: Privacy Preserving Serial Data Publishing By Role Composition

3. Algorithm

Cohort 1

p2

Cohort 2 Cohort 3

p1 p3

p4 p6 p5

HIV-holder

HIV-decoy

HIV-decoy

CI(p6)

CI(p1)

CI(p5)

CI(p3)

CI(p4)

CI(p2)

We have just discussed how to update the role of each individual (i.e., decoy/holder) according to different scenarios when a new raw medical table arrives.

Algorithm:

For the first raw medical table,
  Construct the cohorts with some methods
  Generate a published table according to the cohorts

Whenever there is a new raw medical table,
  Update the role of individuals according to different scenarios
  Generate some containers (if necessary)
  Generate a published table according to the cohorts

To generate a published table according to the cohorts:
  Repeat
    pick one container from each cohort
    form one group by generalizing all these containers
  Until Cohort 1 is empty

Page 87: Privacy Preserving Serial Data Publishing By Role Composition

3. Multiple Diseases

So far, we have considered only the case where each individual is linked to one disease.

We can extend the approach to handle individuals who are linked to multiple diseases.