Privacy Preserving Serial Data Publishing By Role Composition

Yingyi Bu¹, Ada Wai-Chee Fu¹, Raymond Chi-Wing Wong², Lei Chen², Jiuyong Li³

¹ The Chinese University of Hong Kong
² The Hong Kong University of Science and Technology
³ University of South Australia

Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong

Outline

1. Sequential Releases
2. Existing Privacy Models
   • m-invariance
   • Privacy breaches
3. Our Proposed Privacy Model: l-scarcity
4. Experiments
5. Conclusion

1. Sequential Releases

Medical Data (held by the hospital)

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Time = 1: The hospital releases the data set to the public as Published Data. This table satisfies some privacy requirements (e.g., m-invariance).

Time = 2: Between releases, the hospital's data undergo insertions, deletions and updates. The hospital releases the current data set to the public again, and the new Published Data must also satisfy the privacy requirements (e.g., m-invariance).

Time = 3: After further insertions, deletions and updates, the hospital publishes a third table, which again must satisfy the privacy requirements.


Problem: At the current time t, we want to generate a table that satisfies some privacy requirements (e.g., m-invariance) with respect to all tables published at any time ≤ t.

2. Existing Privacy Models

1. Byun et al., "Secure Anonymization for Incremental Datasets", Secure Data Management, 2006.
   Considers insertions only; does not consider deletions or updates.
2. Fung et al., "Anonymity for Continuous Data Publishing", EDBT, 2008.
   Considers insertions only; does not consider deletions or updates.
3. Xiao et al., "m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets", SIGMOD, 2007.
   Considers insertions and deletions only; does not consider updates.

Our proposed privacy model (l-scarcity) considers insertions, deletions and updates together.

Updates cannot simply be regarded as "a deletion followed by an insertion" when privacy is considered.

Sensitive diseases come in two kinds:

• Transient diseases (curable), e.g., flu, fever. If an individual is linked to flu in a published table, s/he may or may not be linked to flu in a later published table.
• Permanent diseases (incurable), e.g., HIV. If an individual is linked to HIV in a published table, s/he MUST be linked to HIV in every later published table in which s/he appears.

We are the first to study these two kinds of sensitive values.
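As a minimal sketch of the transient/permanent distinction above, the hypothetical helper below checks whether one individual's true sensitive values across consecutive releases obey these semantics. It assumes that HIV is the only permanent value in this example and that `None` marks a release in which the individual does not appear (e.g., after a deletion); the function name is ours, not the authors'.

```python
from typing import Iterable, Optional, Set

# Assumption for this example: HIV is the only permanent (incurable) value;
# Flu and Fever are transient (curable).
PERMANENT: Set[str] = {"HIV"}


def respects_permanence(history: Iterable[Optional[str]],
                        permanent: Set[str] = PERMANENT) -> bool:
    """Return True if a permanent value, once held, is kept in every later
    release in which the individual appears.

    history: the individual's sensitive value at each release, in time order;
             None means the individual is absent from that release.
    """
    locked: Optional[str] = None  # permanent value acquired so far, if any
    for value in history:
        if value is None:
            continue  # absent from this release: nothing to check
        if locked is not None and value != locked:
            return False  # a permanent value disappeared: not allowed
        if value in permanent:
            locked = value
    return True


print(respects_permanence(["Flu", "Fever", "Flu"]))  # True: transient values may change
print(respects_permanence(["HIV", None, "HIV"]))     # True
print(respects_permanence(["HIV", "Flu"]))           # False
```

An adversary who knows which values are permanent can exploit exactly this constraint across releases, which is why treating all sensitive values alike is not enough.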


None of these three existing models considers transient/permanent values, whereas l-scarcity also considers them.

Contributions: We consider a more realistic setting of sequential releases, involving
• insertions, deletions and updates, and
• transient/permanent values.
We cannot simply adapt the existing privacy models to this realistic setting.

Problem (m-invariance): At the current time t, we want to generate a table such that, with respect to all tables published at any time ≤ t, the probability that an individual is linked to a sensitive value is at most 1/m.

Problem (l-scarcity): At the current time t, we want to generate a table such that, with respect to all tables published at any time ≤ t, the probability that an individual is linked to a sensitive value is at most 1/l.

Voter Registration List (Public)

Name     PID    Age  Zip Code
Raymond  p1     23   16355
Peter    p2     22   15500
Mary     p3     21   12900
Alice    p4     26   18310
Bob      p5     25   25000
John     p6     20   29000
…        …      …    …
David    p|RL|  31   31000

Medical Data (Hospital)

Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Medical Data + Some Useful Attributes

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Fever
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Flu
John     p6   20   29000     Fever

Release the data set to the public.


Medical Data + Some Useful Attributes, with the identifiers Name and PID removed before release:

Age  Zip Code  Disease
23   16355     Flu
22   15500     HIV
21   12900     Fever
26   18310     HIV
25   25000     Flu
20   29000     Fever

Release the data set to the public.
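Removing Name and PID is not enough on its own: Age and Zip Code also appear in the public voter registration list, so an adversary can join the two tables on these attributes. The sketch below uses only the six named rows from the slides and shows that every individual then matches exactly one released tuple; the function name `link` is ours, for illustration only.

```python
from typing import Dict, List, Tuple

# Public voter registration list: (Name, Age, Zip Code)
VOTERS: List[Tuple[str, int, int]] = [
    ("Raymond", 23, 16355), ("Peter", 22, 15500), ("Mary", 21, 12900),
    ("Alice", 26, 18310), ("Bob", 25, 25000), ("John", 20, 29000),
]

# Released table with Name and PID removed: (Age, Zip Code, Disease)
RELEASED: List[Tuple[int, int, str]] = [
    (23, 16355, "Flu"), (22, 15500, "HIV"), (21, 12900, "Fever"),
    (26, 18310, "HIV"), (25, 25000, "Flu"), (20, 29000, "Fever"),
]


def link(voters, released) -> Dict[str, List[str]]:
    """For each voter, list the diseases of released tuples whose
    (Age, Zip Code) exactly match the voter's."""
    return {
        name: [disease for a, z, disease in released if (a, z) == (age, zip_code)]
        for name, age, zip_code in voters
    }


matches = link(VOTERS, RELEASED)
print(matches["Peter"])  # ['HIV']: Peter is re-identified
```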


Medical Data + Some Useful Attributes (generalized)

Age      Zip Code    Disease
[21,23]  [12k,17k]   Flu
[21,23]  [12k,17k]   HIV
[21,23]  [12k,17k]   Fever
[20,26]  [18k,29k]   HIV
[20,26]  [18k,29k]   Flu
[20,26]  [18k,29k]   Fever

Release the data set to the public.
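The generalized ranges above can be reproduced with a small helper. This is a sketch under our own assumptions: the grouping ({p1, p2, p3} and {p4, p5, p6}) is taken from the slides rather than computed, the age range is the exact minimum and maximum of the group, and the Zip Code bounds are widened outward to whole thousands, which happens to match the ranges shown.

```python
import math
from typing import List, Tuple


def generalize(group: List[Tuple[int, int]]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """group: (Age, Zip Code) pairs of the tuples placed in one group.

    Returns ((age_lo, age_hi), (zip_lo_k, zip_hi_k)), where the Zip Code
    bounds are in thousands (e.g., 12 means 12k).
    """
    ages = [age for age, _ in group]
    zips = [zip_code for _, zip_code in group]
    age_range = (min(ages), max(ages))
    # Assumption: round the lower bound down and the upper bound up to 1k.
    zip_range = (min(zips) // 1000, math.ceil(max(zips) / 1000))
    return age_range, zip_range


print(generalize([(23, 16355), (22, 15500), (21, 12900)]))  # ((21, 23), (12, 17))
print(generalize([(26, 18310), (25, 25000), (20, 29000)]))  # ((20, 26), (18, 29))
```

Under the same assumption the helper also yields the later ranges on the slides, e.g., [20,22] and [12k,29k] for the group {p2, p3, p6}.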

Generalization: Age and Zip Code are replaced by ranges, so the tuples within each group of three become indistinguishable on these attributes.

3-diversity: each individual is linked to "HIV" with probability at most 1/3 in THIS published table. 3-diversity only addresses ONE-TIME publishing.

3-invariance addresses MULTIPLE-TIME publishing and also builds on the idea of 3-diversity.

Idea: each individual is linked to "HIV" with probability at most 1/3 with respect to MULTIPLE published tables.
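Within one generalized group, an adversary cannot tell the tuples apart, so the probability that an individual in the group has a given value is that value's count divided by the group size. The sketch below applies this simple frequency reading of 3-diversity (the one used on the slide) to a single published table; it does not cover the full family of l-diversity definitions, and the function names are ours.

```python
from collections import Counter
from fractions import Fraction
from typing import List


def max_linkage_probability(groups: List[List[str]]) -> Fraction:
    """groups: the sensitive values of each generalized group in ONE table.
    Returns the highest probability with which any individual can be
    linked to any sensitive value in that table."""
    return max(
        Fraction(count, len(group))
        for group in groups
        for count in Counter(group).values()
    )


def is_l_diverse(groups: List[List[str]], l: int) -> bool:
    return max_linkage_probability(groups) <= Fraction(1, l)


time1 = [["Flu", "HIV", "Fever"], ["HIV", "Flu", "Fever"]]
print(max_linkage_probability(time1))  # 1/3
print(is_l_diverse(time1, 3))          # True
```

Exact fractions avoid floating-point comparisons against 1/l. The check looks at one table only, which is exactly why it is not sufficient once several tables are published over time.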


3-invariance

Time = 1


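The signature of an individual is the set of sensitive values in his/her group:

PID  Signature
p1   {Flu, HIV, Fever}
p2   {Flu, HIV, Fever}
p3   {Flu, HIV, Fever}
p4   {Flu, HIV, Fever}
p5   {Flu, HIV, Fever}
p6   {Flu, HIV, Fever}

Groups at time 1: {p1, p2, p3} and {p4, p5, p6}.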
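A minimal sketch that computes the signatures above from the time-1 groups; representing each group as a list of (PID, sensitive value) pairs is our own choice of layout.

```python
from typing import Dict, FrozenSet, List, Tuple

Group = List[Tuple[str, str]]  # (PID, sensitive value) pairs in one group


def signatures(groups: List[Group]) -> Dict[str, FrozenSet[str]]:
    """Map each PID to the set of sensitive values in its group."""
    sig: Dict[str, FrozenSet[str]] = {}
    for group in groups:
        values = frozenset(value for _, value in group)
        for pid, _ in group:
            sig[pid] = values
    return sig


time1 = [
    [("p1", "Flu"), ("p2", "HIV"), ("p3", "Fever")],
    [("p4", "HIV"), ("p5", "Flu"), ("p6", "Fever")],
]
for pid, values in sorted(signatures(time1).items()):
    print(pid, sorted(values))
```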


3-invariance

Time = 2

Between time 1 and time 2, two updates occur: Mary's disease changes from Fever to Flu, and Bob's from Flu to Fever (both transient diseases). The voter registration list is unchanged.

Medical Data + Some Useful Attributes (Time = 2)

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   25000     Fever
John     p6   20   29000     Fever


Published table at Time = 2

Age      Zip Code    Disease
[20,22]  [12k,29k]   HIV
[20,22]  [12k,29k]   Flu
[20,22]  [12k,29k]   Fever
[23,26]  [16k,25k]   Flu
[23,26]  [16k,25k]   HIV
[23,26]  [16k,25k]   Fever

Groups at time 2: {p2, p3, p6} and {p1, p4, p5}. Every individual p1 to p6 still has the signature {Flu, HIV, Fever}.

This table satisfies 3-invariance because each individual is linked to the SAME signature as at time 1.

Idea of 3-invariance: each individual is linked to the SAME signature in every published table.
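The conditions illustrated above can be checked mechanically. The sketch below follows our reading of m-invariance as in Xiao et al. (SIGMOD 2007): every group in every release has at least m tuples with pairwise distinct sensitive values, and every individual keeps the same signature in all releases in which s/he appears. The data layout and the function name are our own; the groups are the ones shown on the slides for times 1 and 2.

```python
from typing import Dict, FrozenSet, List, Tuple

Group = List[Tuple[str, str]]  # (PID, sensitive value) pairs in one group
Release = List[Group]          # one published table, given as its groups


def is_m_invariant(releases: List[Release], m: int) -> bool:
    first_sig: Dict[str, FrozenSet[str]] = {}  # signature first seen per PID
    for release in releases:
        for group in release:
            values = [value for _, value in group]
            # m-unique: at least m tuples, all with different sensitive values
            if len(group) < m or len(set(values)) != len(values):
                return False
            sig = frozenset(values)
            for pid, _ in group:
                # setdefault records sig the first time this PID is seen
                if first_sig.setdefault(pid, sig) != sig:
                    return False  # signature changed across releases
    return True


time1 = [[("p1", "Flu"), ("p2", "HIV"), ("p3", "Fever")],
         [("p4", "HIV"), ("p5", "Flu"), ("p6", "Fever")]]
time2 = [[("p2", "HIV"), ("p3", "Flu"), ("p6", "Fever")],
         [("p1", "Flu"), ("p4", "HIV"), ("p5", "Fever")]]
print(is_m_invariant([time1, time2], 3))  # True
```

The same check also accepts the time-3 release, whose groups {p2, p3, p5} and {p1, p4, p6} again have signature {Flu, HIV, Fever}. It verifies only per-release uniqueness and signature consistency; it does not model the transient/permanent distinction that l-scarcity adds.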


3-invariance

Time = 3

Between time 2 and time 3, Bob's Zip Code in the voter registration list changes from 25000 to 15000; the diseases are the same as at time 2.

Medical Data + Some Useful Attributes (Time = 3)

Name     PID  Age  Zip Code  Disease
Raymond  p1   23   16355     Flu
Peter    p2   22   15500     HIV
Mary     p3   21   12900     Flu
Alice    p4   26   18310     HIV
Bob      p5   25   15000     Fever
John     p6   20   29000     Fever


Published table at Time = 3

Age      Zip Code    Disease
[21,25]  [12k,16k]   HIV
[21,25]  [12k,16k]   Flu
[21,25]  [12k,16k]   Fever
[20,26]  [16k,29k]   Flu
[20,26]  [16k,29k]   HIV
[20,26]  [16k,29k]   Fever

Groups at time 3: {p2, p3, p5} and {p1, p4, p6}.

This table satisfies 3-invariance because each individual is still linked to the SAME signature, {Flu, HIV, Fever}.

Name PID Age

Zip Code

Raymond

p1 23 16355

Peter p2 22 15500

Mary p3 21 12900

Alice p4 26 18310

Bob p5 25 15000

John p6 20 29000

… … …

David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID Disease

Raymond

p1 Flu

Peter p2 HIV

Mary p3 Flu

Alice p4 HIV

Bob p5 Fever

John p6 Fever

Medical DataName PID Age Zip

CodeDiseas

e

Raymond p1 23 16355 Flu

Peter p2 22 15500 HIV

Mary p3 21 12900 Flu

Alice p4 26 18310 HIV

Bob p5 25 15000 Fever

John p6 20 29000 Fever

Medical Data + Some Useful Attributes

Age Zip Code Disease

[21,23] [12k,17k] Flu

[21,23] [12k,17k] HIV

[21,23] [12k,17k] Fever

[20,26] [18k,29k] HIV

[20,26] [18k,29k] Flu

[20,26] [18k,29k] Fever

Release the data set to public

3-invariance

Time = 3

Time = 1

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code Disease

[20,22] [12k,29k] HIV

[20,22] [12k,29k] Flu

[20,22] [12k,29k] Fever

[23,26] [16k,25k] Flu

[23,26] [16k,25k] HIV

[23,26] [16k,25k] Fever

Time = 2

PID

Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

PID Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

Age Zip Code

Disease

[21,25]

[12k,16k]

HIV

[21,25]

[12k,16k]

Flu

[21,25]

[12k,16k]

Fever

[20,26]

[16k,29k]

Flu

[20,26]

[16k,29k]

HIV

[20,26]

[16k,29k]

Fever

Time = 3

Name PID Age

Zip Code

Raymond

p1 23 16355

Peter p2 22 15500

Mary p3 21 12900

Alice p4 26 18310

Bob p5 25 15000

John p6 20 29000

… … …

David p|RL| 31 31000

Public

Hospital

Voter Registration List

Name PID Disease

Raymond

p1 Flu

Peter p2 HIV

Mary p3 Flu

Alice p4 HIV

Bob p5 Fever

John p6 Fever

Medical DataName PID Age Zip

CodeDiseas

e

Raymond p1 23 16355 Flu

Peter p2 22 15500 HIV

Mary p3 21 12900 Flu

Alice p4 26 18310 HIV

Bob p5 25 15000 Fever

John p6 20 29000 Fever


Time = 1
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
Groups: {p1, p2, p3}, {p4, p5, p6}

Time = 2
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever
Groups: {p2, p3, p6}, {p1, p4, p5}

Time = 3
Age Zip Code Disease
[21,25] [12k,16k] HIV
[21,25] [12k,16k] Flu
[21,25] [12k,16k] Fever
[20,26] [16k,29k] Flu
[20,26] [16k,29k] HIV
[20,26] [16k,29k] Fever
Groups: {p2, p3, p5}, {p1, p4, p6}

PID Signature

p1 {Flu, HIV, Fever}

p2 {Flu, HIV, Fever}

p3 {Flu, HIV, Fever}

p4 {Flu, HIV, Fever}

p5 {Flu, HIV, Fever}

p6 {Flu, HIV, Fever}

3-invariance


I know all voter registration lists

Knowledge 2

Knowledge 1

Name PID Age Zip Code

Raymond p1 23 16355

Peter p2 22 15500

Mary p3 21 12900

Alice p4 26 18310

Bob p5 25 25000

John p6 20 29000

… … …

David p|RL| 31 31000


I know that HIV is a permanent sensitive value.

Knowledge 3

I can deduce that p1 and p6 cannot be linked to HIV.

There are TWO HIVs in the published table.

Knowledge 4


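Proof by contradiction. Suppose p1 is linked to HIV. HIV is a permanent sensitive value (Knowledge 3), and each group contains exactly one HIV. Therefore none of the following can be linked to HIV: p2 and p3 (grouped with p1 at time 1), p4 and p5 (grouped with p1 at time 2), and p6 (grouped with p1 at time 3).

PID HIV?
p1 Yes
p2 No
p3 No
p4 No
p5 No
p6 No

This leaves only one HIV-holder, but there are TWO HIVs in the published table (Knowledge 4). Contradiction! Hence p1 CANNOT be linked to HIV.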


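Proof by contradiction. Suppose p6 is linked to HIV. By the same argument, none of the following can be linked to HIV: p4 and p5 (grouped with p6 at time 1), p2 and p3 (grouped with p6 at time 2), and p1 (grouped with p6 at time 3).

PID HIV?
p1 No
p2 No
p3 No
p4 No
p5 No
p6 Yes

This leaves only one HIV-holder, but there are TWO HIVs in the published table (Knowledge 4). Contradiction! Hence p6 CANNOT be linked to HIV.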


I can deduce that p4 MUST be linked to HIV.

Privacy breaches! Why?

The published tables satisfy 3-invariance.

Problem (m-invariance): At the current time t, we want to generate a table that satisfies the following:

The probability that an individual is linked to a sensitive value, with respect to all tables published at any time ≤ t, is at most 1/m.
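The three releases do satisfy 3-invariance, and this can be checked mechanically. This is a minimal sketch of the two conditions:

* Each release must be m-unique: every group has at least m tuples, and their sensitive values are pairwise distinct.
* Each individual must keep the same signature (the set of sensitive values in its group) across releases.

Counterfeit tuples, which m-invariance also permits, are ignored here. All names are hypothetical.

```python
# Each release is a list of QI groups: (member PIDs, sensitive values in the group).
RELEASES = [
    [({"p1", "p2", "p3"}, ["Flu", "HIV", "Fever"]),
     ({"p4", "p5", "p6"}, ["HIV", "Flu", "Fever"])],   # time 1
    [({"p2", "p3", "p6"}, ["HIV", "Flu", "Fever"]),
     ({"p1", "p4", "p5"}, ["Flu", "HIV", "Fever"])],   # time 2
    [({"p2", "p3", "p5"}, ["HIV", "Flu", "Fever"]),
     ({"p1", "p4", "p6"}, ["Flu", "HIV", "Fever"])],   # time 3
]


def is_m_unique(release, m):
    """Every group has at least m tuples, all with distinct sensitive values."""
    return all(len(values) >= m and len(set(values)) == len(values)
               for _, values in release)


def signatures(release):
    """Map each PID to the set of sensitive values of its group (its signature)."""
    return {pid: frozenset(values) for members, values in release for pid in members}


def is_m_invariant(releases, m):
    if not all(is_m_unique(r, m) for r in releases):
        return False
    first_signature = {}
    for release in releases:
        for pid, sig in signatures(release).items():
            # setdefault stores the first signature seen and returns the stored one
            if first_signature.setdefault(pid, sig) != sig:
                return False
    return True


print(is_m_invariant(RELEASES, 3))  # True
```

The check passes for m = 3 even though, by the deduction above, p4 is linked to HIV with certainty. The signature condition alone does not block cross-release reasoning about a permanent sensitive value.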


Name PID Disease

Raymond p1 Flu

Peter p2 HIV

Mary p3 Fever

Alice p4 HIV

Bob p5 Flu

John p6 Fever

Original Medical Data

Time = 1

p2 is an HIV-holder. p1 is an HIV-decoy.

p3 is an HIV-decoy.

HIV-decoys (i.e., p1 and p3) are used to reduce the strong linkage between p2 and HIV.


p4 is an HIV-holder.


p5 is an HIV-decoy.

p6 is an HIV-decoy.
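The holder/decoy roles translate directly into cohorts. The HIV-holder of each group goes to Cohort 1, and its l-1 HIV-decoys go to Cohorts 2..l. This is a minimal sketch that assumes each group of the time-1 table has exactly l tuples and exactly one HIV. The slides do not say which decoy goes to which decoy cohort, so the listing order decides here; that choice is an assumption. `build_cohorts` and `T_PRIME` are hypothetical names.

```python
def build_cohorts(groups, holder_value, l):
    """Put each group's holder in Cohort 1 and its decoys in Cohorts 2..l."""
    cohorts = [[] for _ in range(l)]
    for group in groups:
        holders = [pid for pid, value in group if value == holder_value]
        decoys = [pid for pid, value in group if value != holder_value]
        if len(holders) != 1 or len(decoys) != l - 1:
            raise ValueError("each group needs exactly one holder and l-1 decoys")
        cohorts[0].append(holders[0])
        # Decoys fill Cohorts 2..l in the order they are listed (an assumption).
        for i, pid in enumerate(decoys, start=1):
            cohorts[i].append(pid)
    return cohorts


# Time-1 groups with the original sensitive values (PID, Disease).
T_PRIME = [
    [("p1", "Flu"), ("p2", "HIV"), ("p3", "Fever")],
    [("p4", "HIV"), ("p6", "Fever"), ("p5", "Flu")],
]
print(build_cohorts(T_PRIME, "HIV", 3))  # [['p2', 'p4'], ['p1', 'p6'], ['p3', 'p5']]
```

The second group is listed as p4, p6, p5 so that the result matches the cohort diagram (Cohort 2 = p1, p6; Cohort 3 = p3, p5).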


Cohort 1 (HIV-holders): p2, p4

Cohort 2 (HIV-decoys): p1, p6

Cohort 3 (HIV-decoys): p3, p5

p1 and p6 are in the same cohort (Cohort 2). Besides, they are in the same group of the published table at time 3.

Idea: This kind of grouping can lead to privacy breaches.

We can protect privacy by avoiding this kind of grouping.
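A grouping of this problematic kind can be detected directly: a group is suspicious when it takes more than one member from the same cohort. This is a minimal sketch using the cohorts above (Cohort 1 = {p2, p4}, Cohort 2 = {p1, p6}, Cohort 3 = {p3, p5}). `cohort_conflicts` is a hypothetical helper.

```python
COHORTS = [{"p2", "p4"}, {"p1", "p6"}, {"p3", "p5"}]  # Cohorts 1, 2, 3


def cohort_conflicts(groups, cohorts):
    """Return (group, cohort number) pairs where a group takes >1 member of a cohort."""
    conflicts = []
    for group in groups:
        for idx, cohort in enumerate(cohorts):
            if len(group & cohort) > 1:
                conflicts.append((sorted(group), idx + 1))
    return conflicts


old_t3 = [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}]  # 3-invariance grouping at time 3
new_t3 = [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}]  # regrouped table at time 3
print(cohort_conflicts(old_t3, COHORTS))  # [(['p2', 'p3', 'p5'], 3), (['p1', 'p4', 'p6'], 2)]
print(cohort_conflicts(new_t3, COHORTS))  # []
```

On the original time-3 grouping, the check reports two conflicts. The slide's case is p1 and p6 (Cohort 2) sharing {p1, p4, p6}. It also finds p3 and p5 (Cohort 3) sharing {p2, p3, p5}. On the regrouped 3-scarcity table, every group takes exactly one member per cohort.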


Time = 3 (regrouped: 3-scarcity instead of 3-invariance)

Age Zip Code Disease
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
[22,25] [15k,17k] Fever
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
[20,26] [12k,29k] Fever
Groups: {p1, p2, p5}, {p3, p4, p6}


The probability that an individual is linked to a sensitive value, with respect to these three tables, is at most 1/3.
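The 1/3 bound for this example can be checked by enumeration. This is a minimal sketch that assumes the adversary treats as equally likely every set of two HIV-holders that meets each group of every release exactly once. That uniform model is an illustrative assumption, not necessarily the probability definition behind l-scarcity. `T1`, `T2`, `T3_INVARIANCE`, `T3_SCARCITY` and `hiv_posterior` are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations

T1 = [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}]
T2 = [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}]
T3_INVARIANCE = [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}]  # original time-3 grouping
T3_SCARCITY = [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}]    # regrouped time-3 table


def hiv_posterior(releases, num_hiv=2):
    """P(pid holds HIV), assuming a uniform prior over consistent HIV-holder sets."""
    pids = sorted(set().union(*(g for groups in releases for g in groups)))
    worlds = [set(c) for c in combinations(pids, num_hiv)
              if all(len(set(c) & g) == 1 for groups in releases for g in groups)]
    if not worlds:
        raise ValueError("no HIV assignment is consistent with the releases")
    return {pid: Fraction(sum(pid in w for w in worlds), len(worlds)) for pid in pids}


print(max(hiv_posterior([T1, T2, T3_INVARIANCE]).values()))  # 1
print(max(hiv_posterior([T1, T2, T3_SCARCITY]).values()))    # 1/3
```

With the original time-3 grouping, only {p2, p4} is consistent, so the maximum is 1. With the regrouped table, three sets remain: {p1, p6}, {p2, p4} and {p3, p5}. Under this model every individual then holds HIV with probability 1/3.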

3. Algorithm

We propose an algorithm that follows this principle: whenever we form a group, choose one member from each cohort.

3. Guarantee

Theorem: Our proposed algorithm can generate a table that satisfies the following:

The probability that an individual is linked to a sensitive value, with respect to all tables published at any time ≤ t, is at most 1/l (i.e., l-scarcity).

4. Experiments

Real data set (CADRMP), a real hospital database: http://www.hc-sc.gc.ca/dhp-mps/medeff/databasdon/index_e.html

Patient Information (Voter Registration List): 40,478 tuples

Medical Record: 105,420 tuples; each patient can be linked to multiple diseases

We study:

1. Privacy breaches of an existing model (m-invariance)

2. Performance of our proposed algorithm

4.1 Privacy Breaches of an existing model (m-invariance)

Breach Rate: the proportion of tuples with privacy breaches

4.2 Performance of our proposed algorithm

Measurements: computation cost; relative average error

Variations: parameter l (used in l-scarcity); number of published tables

5. Conclusion

Sequential releases:
- QID values can be updated
- Sensitive values can be updated
- Sensitive values can be permanent or transient

We identify the insufficiency of existing models, propose an algorithm, and evaluate it experimentally.

Q&A


Cohort 1 (HIV-holder): CI(p2), CI(p4)
Cohort 2 (HIV-decoy): CI(p1), CI(p6)
Cohort 3 (HIV-decoy): CI(p3), CI(p5)

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes.

e.g., p6’s QID attributes are (Age, Zip Code) = (20, 29000), and CI(p6) has (Age, Zip Code) = ([20,26], [29k,33k]).

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We switch the role of l-1 HIV-decoys from PRESENT individuals to ABSENT individuals.

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys. The container is chosen so that:

1. At least one of these individuals is present (in the medical table).

2. At least one of these individuals is absent (in the medical table), but can be found in the Voter Registration List.

e.g., container CI(p6), with (Age, Zip Code) = ([20,26], [29k,33k]), covers:
1. p6 (HIV-decoy, present): (Age, Zip Code) = (20, 29000)
2. p7 (HIV-buddy, present): (Age, Zip Code) = (25, 33000)
3. p8 (HIV-buddy, absent): (Age, Zip Code) = (26, 30000)

e.g., p4 (HIV-holder) is absent in the current table. If the other HIV-decoys are still present, the adversary can figure out that p4 is an HIV-holder.

Case 1: HIV-decoy

Case 2: HIV-holder

Since one HIV-holder and l-1 HIV-decoys become ABSENT together, the adversary cannot figure out who the REAL HIV-holder is.
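The container requirements can be checked mechanically. A valid container covers its individual, plus at least one additional present individual and at least one additional absent individual, where the additional individuals are neither HIV-holders nor HIV-decoys. This is a minimal sketch. The voter-list excerpt and the presence flags mirror the CI(p6) example: p7 is present and p8 is absent. `Container`, `is_valid_container` and the constants are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Generalized QID ranges (inclusive) of a container such as CI(p6)."""
    age: tuple
    zip_code: tuple

    def covers(self, age, zip_code):
        return (self.age[0] <= age <= self.age[1]
                and self.zip_code[0] <= zip_code <= self.zip_code[1])


# Hypothetical excerpt of the voter registration list: PID -> (Age, Zip Code).
VOTERS = {"p6": (20, 29000), "p7": (25, 33000), "p8": (26, 30000)}
PRESENT = {"p6", "p7"}                               # in the current medical table
ROLE_PLAYERS = {"p1", "p2", "p3", "p4", "p5", "p6"}  # HIV-holders and HIV-decoys


def is_valid_container(container, member, voters, present, role_players):
    """Covers `member` plus >= 1 present and >= 1 absent additional individual."""
    if not container.covers(*voters[member]):
        return False
    extra = [pid for pid, qid in voters.items()
             if pid not in role_players and container.covers(*qid)]
    return any(pid in present for pid in extra) and any(pid not in present for pid in extra)


ci_p6 = Container(age=(20, 26), zip_code=(29000, 33000))
print(is_valid_container(ci_p6, "p6", VOTERS, PRESENT, ROLE_PLAYERS))  # True
```

Narrowing the age range to [20,25] excludes p8 (age 26). That leaves no absent HIV-buddy, so the check fails.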

3. Algorithm

Cohort 1 (HIV-holder): p2, p4
Cohort 2 (HIV-decoy): p1, p6
Cohort 3 (HIV-decoy): p3, p5
Containers: CI(p1), CI(p2), CI(p3), CI(p4), CI(p5), CI(p6)

We have just discussed how to update the role of each individual (i.e., decoy or holder) according to different scenarios when new medical raw data arrives.

Algorithm: For the first medical raw table,

1. Use some existing privacy algorithm (e.g., one for l-diversity) to generate a temporary table T’.
2. Find the HIV-holders and HIV-decoys from T’.
3. Construct the cohorts according to the HIV-holders/decoys.
4. Form a container for each HIV-holder/decoy.
5. Generate a published table according to the cohorts.

Whenever there is new medical raw data:
1. Update the roles of individuals according to the different scenarios.
2. Generate some containers (if necessary).
3. Generate a published table according to the cohorts.

To generate a published table according to the cohorts:

Repeat
    pick one container from each cohort
    form one group by generalizing all these containers
Until Cohort 1 is empty
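The Repeat/Until loop can be sketched as follows. This minimal sketch makes three assumptions:

* All cohorts have the same size.
* Containers are picked in list order; the slides do not fix the pick order.
* Each container is simplified to one individual's exact time-3 QID values rather than a wider CI(p) range.

`Box`, `point`, `generalize` and `form_groups` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A container or group: covered PIDs plus inclusive Age / Zip Code ranges."""
    pids: tuple
    age: tuple
    zip_code: tuple


def point(pid, age, zip_code):
    # Hypothetical degenerate container holding one individual's exact QID values.
    return Box((pid,), (age, age), (zip_code, zip_code))


def generalize(boxes):
    """Smallest ranges covering every box (one published group)."""
    return Box(
        pids=tuple(p for b in boxes for p in b.pids),
        age=(min(b.age[0] for b in boxes), max(b.age[1] for b in boxes)),
        zip_code=(min(b.zip_code[0] for b in boxes), max(b.zip_code[1] for b in boxes)),
    )


def form_groups(cohorts):
    """Repeat: pick one container from each cohort and generalize them,
    until Cohort 1 is empty. Assumes all cohorts have the same size."""
    queues = [list(c) for c in cohorts]  # copies, so the caller's cohorts survive
    groups = []
    while queues[0]:
        picked = [q.pop(0) for q in queues]  # pick order: list order (an assumption)
        groups.append(generalize(picked))
    return groups


# Time-3 QID values from the voter registration list.
COHORTS = [
    [point("p2", 22, 15500), point("p4", 26, 18310)],  # Cohort 1: HIV-holders
    [point("p1", 23, 16355), point("p6", 20, 29000)],  # Cohort 2: HIV-decoys
    [point("p5", 25, 15000), point("p3", 21, 12900)],  # Cohort 3: HIV-decoys
]
for g in form_groups(COHORTS):
    print(g.pids, g.age, g.zip_code)
```

With Cohort 3 listed as p5, p3, the loop produces two groups:

* {p2, p1, p5}: Age [22,25], Zip Code [15000,16355]
* {p4, p6, p3}: Age [20,26], Zip Code [12900,29000]

Up to the slides' rounding to thousands, these match the regrouped time-3 table.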

Cohorts: Cohort 1 (HIV-holder), Cohort 2 (HIV-decoy), Cohort 3 (HIV-decoy)

Algorithm

Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever
… … …

Name PID Disease

Raymond p1 Flu

Peter p2 HIV

Mary p3 Fever

Alice p4 HIV

Bob p5 Flu

John p6 Fever

… … …

Time = 1

Medical Data

Published Data

Cohort members (Cohort 1, Cohort 2, Cohort 3): p2, p1, p3 and p4, p6, p5

Groups in the published data: {p1, p2, p3} and {p4, p5, p6}

We can make use of some “existing” approaches to generate this table, which satisfies 3-diversity.


We create the container of p2. That is, we find some present individuals (e.g., p7) and some absent individuals (e.g., p8), and then find generalized QID values that cover the QID values of these individuals and p2.

In the same way, each HIV-holder/decoy gets a container: CI(p1), CI(p2), CI(p3), CI(p4), CI(p5), CI(p6).

…: some additional individuals in CI(p1) who are present

…: some additional individuals in CI(p2) who are present

…: some additional individuals in CI(p3) who are present

…: some additional individuals in CI(p4), CI(p5) and CI(p6) who are present

Cohort 1 (HIV-holders): p2, p4 (containers CI(p2), CI(p4))
Cohort 2 (HIV-decoys): p1, p6 (containers CI(p1), CI(p6))
Cohort 3 (HIV-decoys): p3, p5 (containers CI(p3), CI(p5))

Time = 1

Medical Data
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
… … …

Published Data (groups: {p1, p2, p3, …} and {p4, p5, p6, …})
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
… … …
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
… … …
… … …

Time = 2

Medical Data
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …

1. Update the role of each individual (i.e., decoy/holder) according to the different scenarios.
2. Pin some individuals if necessary.

Published Data (groups: {p2, p3, p6, …} and {p1, p4, p5, …})
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
… … …
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
… … …
… … …

Time = 3

Medical Data
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Flu
Alice p4 HIV
Bob p5 Fever
John p6 Fever
… … …

Published Data (groups: {p1, p2, p5, …} and {p3, p4, p6, …})
Age Zip Code Disease
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
… … …
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
… … …
… … …
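One property visible in the Time = 1, 2 and 3 releases is that every published group draws exactly one individual from each cohort, so each group holds one HIV-holder and two HIV-decoys. The Python sketch below checks this on the group memberships listed in the slides; the `one_per_cohort` name and the dictionary encoding are our own, and the "…" members are ignored.

```python
COHORTS = {
    "p2": 1, "p4": 1,  # Cohort 1: HIV-holders
    "p1": 2, "p6": 2,  # Cohort 2: HIV-decoys
    "p3": 3, "p5": 3,  # Cohort 3: HIV-decoys
}

# Group memberships of the published tables (the "…" members are omitted).
RELEASES = {
    1: [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],
    2: [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],
    3: [{"p1", "p2", "p5"}, {"p3", "p4", "p6"}],
}


def one_per_cohort(group, cohorts, num_cohorts=3):
    """True if the group contains exactly one member of every cohort."""
    labels = sorted(cohorts[pid] for pid in group)
    return labels == list(range(1, num_cohorts + 1))


for time, groups in RELEASES.items():
    print(time, all(one_per_cohort(g, COHORTS) for g in groups))
# 1 True
# 2 True
# 3 True
```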

Cohort 1 (HIV-holders): p2, p4
Cohort 2 (HIV-decoys): p1, p6
Cohort 3 (HIV-decoys): p3, p5

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

Idea: We find another individual to take over the original role (i.e., HIV-decoy).

p6 is replaced with a container CI(p6) whose QID attributes (Age, Zip Code) cover p6’s QID attributes. E.g., p6’s QID attributes are (Age, Zip Code) = (20, 29000), and CI(p6) has (Age, Zip Code) = ([20,26], [29k,33k]).

Since the container has broader ranges of QID values, it covers some ADDITIONAL individuals who are neither HIV-holders nor HIV-decoys (HIV-buddies):
1. At least one of these individuals is present (in the medical table).
2. At least one of these individuals is absent (in the medical table), although they can be found in the Voter Registration List.

E.g., container CI(p6) with (Age, Zip Code) = ([20,26], [29k,33k]) covers:
1. p6 (Age, Zip Code) = (20, 29000): HIV-decoy, present
2. p7 (Age, Zip Code) = (25, 33000): HIV-buddy, present
3. p8 (Age, Zip Code) = (26, 30000): HIV-buddy, absent

E.g., p6 suffers from HIV in the current table, so p6 loses its functionality as an HIV-decoy, and p7 takes over the HIV-decoy role. From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.
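A container such as CI(p6) can be computed as the smallest QID ranges covering the individual and its buddies. The Python sketch below reproduces CI(p6) from the p6, p7 and p8 values above; the `make_container` name, the (pid, age, zip_code, present) tuple encoding and the explicit presence check are our assumptions, and the sketch does not show how buddies are selected from the Voter Registration List.

```python
def make_container(individual, buddies):
    """Smallest (age, zip-code) ranges covering `individual` and its `buddies`.

    Each person is a (pid, age, zip_code, present) tuple. Following the two
    conditions above, at least one buddy must be present in the medical table
    and at least one must be absent from it.
    """
    if not any(b[3] for b in buddies) or all(b[3] for b in buddies):
        raise ValueError("need at least one present and one absent buddy")
    people = [individual, *buddies]
    ages = [p[1] for p in people]
    zips = [p[2] for p in people]
    return (min(ages), max(ages)), (min(zips), max(zips))


p6 = ("p6", 20, 29000, True)   # HIV-decoy, present
p7 = ("p7", 25, 33000, True)   # HIV-buddy, present
p8 = ("p8", 26, 30000, False)  # HIV-buddy, absent (found in the voter list)
print(make_container(p6, [p7, p8]))  # ((20, 26), (29000, 33000))
```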

This idea is valid only when there EXISTS another individual for replacement. If there is none, what then?

E.g., p7 suffers from HIV in some later table, so p7 loses its functionality as an HIV-decoy, and we cannot find other HIV-buddies for replacement.

Then we pin p7. That is, the original HIV value of p7 is modified/suppressed to a transient value (e.g., Flu). Once p7 is pinned, it acts as an HIV-decoy until it disappears.
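The replacement rule and its pinning fallback can be sketched in Python as follows. This is a simplified sketch under our assumptions: a decoy's replacement is any buddy in its container who is present and does not have HIV, and pinning just publishes a transient value (Flu, as in the example) in place of HIV. The `Person` and `replace_decoy` names are hypothetical, and the sketch does not reproduce the paper's exact procedure.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pid: str
    present: bool      # appears in the current raw medical table
    disease: str = ""  # true disease in the current raw medical table
    pinned: bool = False

    def published_disease(self, transient="Flu"):
        """A pinned individual's HIV value is published as a transient value."""
        return transient if self.pinned and self.disease == "HIV" else self.disease


def replace_decoy(decoy, buddies):
    """Return the individual who plays the HIV-decoy role in the new release.

    Hypothetical rule: keep a decoy that is still present and HIV-free (or
    already pinned); otherwise hand the role to a present, HIV-free buddy from
    its container; if no buddy qualifies and the decoy is still present, pin it.
    """
    if decoy.present and (decoy.disease != "HIV" or decoy.pinned):
        return decoy
    for buddy in buddies:
        if buddy.present and buddy.disease != "HIV":
            return buddy  # role replacement (Scenario 1 or Scenario 2)
    if decoy.present:
        decoy.pinned = True  # pinning: no HIV-buddy is left for replacement
        return decoy
    return None  # absent decoy and no buddy: not handled by this sketch


# Scenario 1: p6 (HIV-decoy) now suffers from HIV; p7 is present, p8 is absent.
p6 = Person("p6", present=True, disease="HIV")
p7 = Person("p7", present=True, disease="Fever")  # made-up non-HIV value
p8 = Person("p8", present=False)
print(replace_decoy(p6, [p7, p8]).pid)  # p7

# Later, p7 also suffers from HIV and no other HIV-buddy qualifies: pin p7.
p7.disease = "HIV"
decoy = replace_decoy(p7, [p6, p8])
print(decoy.pid, decoy.pinned, decoy.published_disease())  # p7 True Flu
```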

Published Data

Time = 1 (groups: {p1, p2, p3} and {p4, p5, p6})
Age Zip Code Disease
[21,23] [12k,17k] Flu
[21,23] [12k,17k] HIV
[21,23] [12k,17k] Fever
[20,26] [18k,29k] HIV
[20,26] [18k,29k] Flu
[20,26] [18k,29k] Fever

Time = 2 (groups: {p2, p3, p6} and {p1, p4, p5})
Age Zip Code Disease
[20,22] [12k,29k] HIV
[20,22] [12k,29k] Flu
[20,22] [12k,29k] Fever
[23,26] [16k,25k] Flu
[23,26] [16k,25k] HIV
[23,26] [16k,25k] Fever

Time = 3 (groups: {p1, p2, p5} and {p3, p4, p6})
Age Zip Code Disease
[22,25] [15k,17k] HIV
[22,25] [15k,17k] Flu
[22,25] [15k,17k] Fever
[20,26] [12k,29k] Flu
[20,26] [12k,29k] HIV
[20,26] [12k,29k] Fever

Cohort 1 (HIV-holders): p2, p4
Cohort 2 (HIV-decoys): p1, p6
Cohort 3 (HIV-decoys): p3, p5

Medical Data (disease of each individual at Time = 1, 2, 3)
Name PID Time=1 Time=2 Time=3
Raymond p1 Flu Flu Flu
Peter p2 HIV HIV HIV
Mary p3 Fever Flu Flu
Alice p4 HIV HIV HIV
Bob p5 Flu Fever Fever
John p6 Fever Fever Fever

We have just shown a simple case for anonymization. In this case:
• Scenario 1: If an individual does not suffer from HIV, s/he will not suffer from HIV in the later published tables.

How should we anonymize when these individuals may develop a new permanent disease?

Scenario 1: Some individuals who do not suffer from HIV in some earlier published tables suffer from HIV in this table.

E.g., p6 was originally used as an HIV-decoy. Now p6 suffers from HIV, so it changes its role from an HIV-decoy to an HIV-holder and loses its functionality to protect other HIV-holders (in Cohort 1).

Idea: We find another individual to take over its original role (i.e., an HIV-decoy).

We have just shown a simple case for anonymization. In this case:
• Scenario 2: If an individual is present in an earlier published table, s/he is also present in all later published tables.

Scenario 2: Some individuals who are present in some earlier published tables are absent in this table.

How should we anonymize when some individuals are absent from a later published table? E.g., p6 was originally used as an HIV-decoy. Now it disappears from this published table, so it loses its functionality to protect other HIV-holders (in Cohort 1).

Idea: We find another individual to take over its original role (i.e., an HIV-decoy).

There are other scenarios, e.g., some individuals who are absent in some earlier published tables are present in this table. In this talk, we focus on Scenario 1 and Scenario 2; the other scenarios are covered in the paper.
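Detecting which scenario applies to each individual between two consecutive releases needs only the raw tables. Below is a minimal Python sketch, assuming each raw table is a dict from PID to disease and that HIV is the only permanent sensitive value; the `classify_changes` name and the scenario labels are ours. It compares only two consecutive tables, whereas the scenarios speak of "some earlier published tables", so a fuller version would track each individual's history. The third label corresponds to the "absent earlier, present now" scenario mentioned above.

```python
def classify_changes(previous, current, sensitive="HIV"):
    """Map each affected PID to the scenario it triggers between two releases.

    `previous` and `current` map PID -> disease in two consecutive raw tables.
    """
    scenarios = {}
    for pid, disease in previous.items():
        if pid not in current:
            scenarios[pid] = "Scenario 2: present earlier, absent now"
        elif disease != sensitive and current[pid] == sensitive:
            scenarios[pid] = "Scenario 1: newly suffers from " + sensitive
    for pid in current.keys() - previous.keys():
        scenarios[pid] = "Other: absent earlier, present now"
    return scenarios


# Raw table at Time = 2 (from the slides) and a made-up next table in which
# p6 develops HIV, p5 disappears and a new individual p9 appears.
time2 = {"p1": "Flu", "p2": "HIV", "p3": "Flu", "p4": "HIV", "p5": "Fever", "p6": "Fever"}
time3 = {"p1": "Flu", "p2": "HIV", "p3": "Flu", "p4": "HIV", "p6": "HIV", "p9": "Flu"}

for pid, scenario in sorted(classify_changes(time2, time3).items()):
    print(pid, scenario)
# p5 Scenario 2: present earlier, absent now
# p6 Scenario 1: newly suffers from HIV
# p9 Other: absent earlier, present now
```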

Note that we are updating the role of some individuals (i.e., decoy/holder/buddy) for Scenario 1 when new raw medical data arrives (e.g., at time = 3).

Scenario 2 has two cases. Case 1: the absent individual is an HIV-decoy. Case 2: the absent individual is an HIV-holder.

Case 1: e.g., p6 (an HIV-decoy) is absent in the current table, so p6 loses its functionality as an HIV-decoy. In CI(p6), p6 is now absent, p7 is present and p8 is absent. From the adversary’s point of view, the adversary cannot tell whether p6 or p7 is the original HIV-decoy. Thus, the role replacement still protects privacy.

Note that we are updating the role of some individuals (i.e., decoy/holder/buddy) for Scenario 2 when new raw medical data arrives (e.g., at time = 3).

3. Algorithm

We have just discussed how to update the role of each individual (i.e., decoy/holder) according to the different scenarios when new raw medical data arrives.

Algorithm:
• For the first raw medical table:
  1. Construct the cohorts with some method.
  2. Generate a published table according to the cohorts.
• Whenever new raw medical data arrives:
  1. Update the roles of individuals according to the different scenarios.
  2. Generate some containers (if necessary).
  3. Generate a published table according to the cohorts.

To generate a published table according to the cohorts:
Repeat
  pick one container from each cohort
  form one group by generalizing all these containers
Until Cohort 1 is empty

3. Multiple Diseases

So far, we have considered each individual to be linked to one disease. The approach can be extended to handle individuals linked to multiple diseases.