Privacy-preserving Anonymization of Set Value Data

Manolis Terrovitis
Institute for the Management of Information Systems (IMIS), RC Athena

Nikos Mamoulis
University of Hong Kong (HKU)

Panos Kalnis
King Abdullah University of Science and Technology (KAUST)

Motivation

Attacker can see up to m items
Any m items
No distinction between sensitive and non-sensitive items

0% Milk

Motivation (cont.)

Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
…
Mary: Wine, Beer, Full-fat Milk

Database

t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
…
tn: Wine, Beer, Full-fat Milk

Published

Attacker: Find all transactions that contain Beer & 0% Milk

t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
…
tn: Wine, Beer, Milk

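k^m-anonymity

Set of items: $I = \{o_1, \ldots, o_{|I|}\}$
Transaction database: $D = \{t_1, \ldots, t_n\}$, with each $t_i \subseteq I$
Query terms: $qs \subseteq I$, $|qs| \le m$
Query result: $res = \{\, t \mid t \in D \wedge qs \subseteq t \,\}$

k^m-anonymity: $|res| = 0 \;\vee\; |res| \ge k$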
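The condition can be checked directly on a small database. The sketch below is only an illustration, assuming transactions are Python sets of item names; the function name `is_km_anonymous` and the toy data (built from items on the motivation slides) are hypothetical. It counts the support of every itemset of up to m items that occurs in some transaction and requires each count to be at least k. Itemsets that occur nowhere have an empty result and satisfy the condition trivially.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(db, k, m):
    """Return True if every itemset of size <= m that occurs in db
    is contained in at least k transactions."""
    support = Counter()
    for t in db:
        items = sorted(set(t))  # fixed order so equal itemsets give equal tuples
        for size in range(1, m + 1):
            support.update(combinations(items, size))
    return all(count >= k for count in support.values())


# Hypothetical toy data: raw transactions and a generalized version.
raw = [{"Beer", "0% Milk"}, {"Beer", "2% Milk"}, {"Beer", "Full-fat Milk"}]
generalized = [{"Beer", "Milk"}, {"Beer", "Milk"}, {"Beer", "Milk"}]

print(is_km_anonymous(raw, k=2, m=1))          # each milk type occurs once
print(is_km_anonymous(generalized, k=2, m=2))  # every itemset occurs 3 times
```

Enumerating all subsets is exponential in m, so this brute-force check is only practical for small m and short transactions.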

Related Work: K-Anonymity [Swe02]

Age   ZipCode   Disease
42    25000     Flu
46    35000     AIDS
50    20000     Cancer
54    40000     Gastritis
48    50000     Dyspepsia
56    55000     Bronchitis

[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.

(a) Microdata

Quasi-identifier

Age     ZipCode       Disease
42-46   25000-35000   Flu
42-46   25000-35000   AIDS
50-54   20000-40000   Cancer
50-54   20000-40000   Gastritis
48-56   50000-55000   Dyspepsia
48-56   50000-55000   Bronchitis

(b) 2-anonymous microdata

NOT suitable for high-dimensionality
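As a small illustration of the k-anonymity property, the sketch below groups the rows of the generalized table by their quasi-identifier values (Age, ZipCode) and checks that every group has at least k rows. The function name `is_k_anonymous` and the tuple layout are assumptions made for this example.

```python
from collections import Counter

# Rows of the generalized table: (Age, ZipCode, Disease)
ROWS = [
    ("42-46", "25000-35000", "Flu"),
    ("42-46", "25000-35000", "AIDS"),
    ("50-54", "20000-40000", "Cancer"),
    ("50-54", "20000-40000", "Gastritis"),
    ("48-56", "50000-55000", "Dyspepsia"),
    ("48-56", "50000-55000", "Bronchitis"),
]


def is_k_anonymous(rows, qi_columns, k):
    """True if every combination of quasi-identifier values
    is shared by at least k rows."""
    groups = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(size >= k for size in groups.values())


print(is_k_anonymous(ROWS, qi_columns=(0, 1), k=2))
print(is_k_anonymous(ROWS, qi_columns=(0, 1), k=3))
```

The quasi-identifier here is a fixed pair of columns. In set-valued data any combination of items may play that role, which is the reason this model does not carry over directly.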

Related Work: L-diversity in Transactions

[GTK08] G. Ghinita, Y. Tao, P. Kalnis, “On the Anonymization of Sparse High-Dimensional Data”, ICDE, 2008

Requires knowledge of (non)-sensitive attributes

Our Approach: Employs Generalization

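Generalization: $\{a_1, a_2\} \rightarrow A$

Information loss: $NCP(p) = \begin{cases} 0, & p \text{ is a leaf node} \\ |u_p| / |I|, & \text{otherwise} \end{cases}$

where $u_p$ is the hierarchy node that item $p$ is generalized to and $|u_p|$ is the number of leaves under it.

k=2, m=2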
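A minimal sketch of the information-loss computation, assuming the loss of an item is 0 when it stays a leaf and otherwise the fraction of all leaf items covered by the hierarchy node it is generalized to (an NCP-style measure). The two-level hierarchy and the names `CHILDREN`, `leaves_under`, and `ncp` are hypothetical.

```python
# Hypothetical generalization hierarchy: node -> list of children.
CHILDREN = {
    "ALL": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
}


def leaves_under(node):
    """Return the leaf items covered by a hierarchy node."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]  # a leaf covers only itself
    return [leaf for kid in kids for leaf in leaves_under(kid)]


def ncp(node, root="ALL"):
    """Information loss of publishing `node` in place of an original item."""
    if node not in CHILDREN:
        return 0.0  # leaf node: no generalization, no loss
    return len(leaves_under(node)) / len(leaves_under(root))


print(ncp("a1"), ncp("A"), ncp("ALL"))
```

Under this assumption, generalizing a1 and a2 to A costs 0.5 per occurrence, and generalizing everything to the root costs 1.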

Lattice of Generalizations

Optimal Algorithm


Count Tree

[Figure: count tree example]

All generalized forms of the paths reside in the tree
We can find easily which anonymizations are needed
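One way to picture such a structure is a trie whose nodes carry support counts. The sketch below is an illustration under several assumptions, not the authors' implementation: the hierarchy has a single generalization level (`PARENT`), items on a path are kept in plain sorted order, and paths that mix an item with its own generalization are skipped. All names and the toy database are hypothetical.

```python
from itertools import combinations

# Hypothetical one-level hierarchy: item -> generalized item.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


class CountNode:
    def __init__(self):
        self.count = 0      # support of the itemset spelled by the path
        self.children = {}  # item -> CountNode


def build_count_tree(db, m):
    """Insert every itemset of size <= m, in original and generalized form."""
    root = CountNode()
    for t in db:
        # Extend the transaction with the generalizations of its items.
        extended = sorted(set(t) | {PARENT[x] for x in t if x in PARENT})
        for size in range(1, m + 1):
            for path in combinations(extended, size):
                # Skip paths holding an item together with its generalization.
                if any(PARENT.get(x) in path for x in path):
                    continue
                node = root
                for item in path:
                    node = node.children.setdefault(item, CountNode())
                node.count += 1
    return root


def support(root, itemset):
    """Look up the support of an itemset; 0 if its path is absent."""
    node = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count


DB = [{"a1", "b1", "b2"}, {"a2", "b1"}, {"a2", "b1", "b2"}, {"a1", "a2", "b2"}]
tree = build_count_tree(DB, m=2)
print(support(tree, {"a1", "b1"}), support(tree, {"A", "b1"}))
```

Because the generalized paths sit in the same tree, comparing the support of {a1, b1} with that of {A, b1} shows directly whether generalizing a1 would lift a rare itemset to the required k.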

Apriori-based Anonymization

Global Optimal vs Local Optimal Solution for each path

We examine the paths
  By size (A priori principle)
  Paths with invalid nodes are skipped

Apriori-based Anonymization

1. Initialize gen_map
2. For i := 1 to m do
   1. For all t ∈ D do
      1. Extend t according to gen_map
      2. Add all i-subsets of extended t to count-tree
   2. Check all paths in count tree and update gen_map

Small Datasets (2-15K, BMS-WebView2)

|I|=40..60, k=100, m=3

Small Datasets (BMS-WebView2)

|D|=10K, k=100, m=1..4

Apriori Anonymization for Large Datasets

|D|     |I|
515K    1657
59K     497
77K     3340

k=5, m=3

Points to Remember

Anonymization of Transactional Data
  Attacker knows m items
  Any m items can be the quasi-identifier

Global recoding method
  Optimal solution: too slow
  Apriori Anonymization: fast and low information loss

Extensions (VLDBJ 2010)
  Local recoding (sort by Gray order and partition)
  Global recoding (by partitioning the data domain)

