Data Anonymization

Sara Szoc, CrossLang Workshop

Introduction

Data Anonymization

• Concept

• Methods

• Risks

• Practical tips

What is data anonymization

What?

• Process of removing private or confidential information from raw data

• Results in anonymous data that cannot be associated with any individual or company

• Protection of identity and private activities

• Financial aspect

• Using anonymization technique(s)

• Selection and assessment based on use case

Personal Data

Personal or identifiable data:

Information that can lead to the identification of an individual (or a group of individuals)

• Direct identifiers: person/company name, surname, email address containing name, phone number, ID card/social security number, medical record number …

• Indirect identifiers: date of birth, gender, and zip code together can uniquely identify about 80% of the US population

• Pseudonymous or encrypted data: can be used to re-identify a person and thus remains personal data
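To illustrate why indirect identifiers matter, the following minimal sketch counts how many records are unique on the combination (date of birth, gender, zip code). The records and the helper name `unique_fraction` are hypothetical and only serve as an illustration; they are not taken from the presentation.

```python
from collections import Counter

# Hypothetical toy records: (date of birth, gender, zip code).
records = [
    ("1980-03-14", "F", "02139"),
    ("1980-03-14", "F", "02139"),
    ("1975-11-02", "M", "02139"),
    ("1992-07-30", "F", "10001"),
    ("1992-07-30", "M", "10001"),
]

def unique_fraction(rows):
    """Return the fraction of rows whose combination of values occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

fraction = unique_fraction(records)
print(f"{fraction:.0%} of records are unique on (date of birth, gender, zip code)")
```

A record that is unique on these three fields can be singled out even though it contains no name or ID number.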

Personal Data

“Personal data that has been rendered anonymous in such a way that the individual is not or no longer identifiable is no longer considered personal data.

For data to be truly anonymised, the anonymisation must be irreversible.”

(source: General Data Protection Regulation)

Sensitive Data

• Sensitive personal data

  • can cause harm or embarrassment to the individual

  • for limited dissemination only

  e.g. racial/ethnic origin, political/religious beliefs, genetic data, biometric data (fingerprints), health information, sexual orientation … (GDPR)

• Sensitive business information

  • poses a risk to the company in question if discovered

  e.g. trade secrets, acquisition plans, financial data, supplier and customer information

Structured versus unstructured data

• Structured data

  • stored in a structured way

  • easily searchable

  • relational databases, spreadsheets, data in formats such as JSON, XML, CSV …

• Unstructured data

  • anything else

  • difficult to search

  • text files, reports, email messages, audio files, images …

Anonymization methods

• suppression

• masking

• classification

• perturbation

• swapping

• generalization

Before anonymization

Name    Age  Location  Illness
John    40   Brussels  Flu
Ashley  56   Antwerp   Multiple Sclerosis
Luke    80   Berlin    Lung cancer
Roman   71   Munich    Multiple Sclerosis

After anonymization (names swapped, ages perturbed, locations generalized)

Name    Age  Location  Illness
Luke    39   Belgium   Flu
Ashley  57   Belgium   Multiple Sclerosis
John    81   Germany   Lung cancer
Roman   72   Germany   Multiple Sclerosis
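Several of these methods can be sketched in a few lines of Python, using the example table from this slide. This is a minimal illustration under assumptions, not a reference implementation: the function names, the city-to-country lookup, and the noise range of ±1 year are all chosen here for demonstration.

```python
import random

# Example rows from the slide: name, age, location, illness.
rows = [
    {"name": "John", "age": 40, "location": "Brussels", "illness": "Flu"},
    {"name": "Ashley", "age": 56, "location": "Antwerp", "illness": "Multiple Sclerosis"},
    {"name": "Luke", "age": 80, "location": "Berlin", "illness": "Lung cancer"},
    {"name": "Roman", "age": 71, "location": "Munich", "illness": "Multiple Sclerosis"},
]

# Assumed lookup table for generalization (city -> country).
CITY_TO_COUNTRY = {
    "Brussels": "Belgium", "Antwerp": "Belgium",
    "Berlin": "Germany", "Munich": "Germany",
}

def suppress(rows, column):
    """Suppression: drop a column entirely."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def mask(value, keep=1):
    """Masking: keep the first `keep` characters and replace the rest with 'x'."""
    return value[:keep] + "x" * (len(value) - keep)

def generalize_location(rows):
    """Generalization: replace each city with its country."""
    return [{**row, "location": CITY_TO_COUNTRY[row["location"]]} for row in rows]

def perturb_age(rows, rng, max_shift=1):
    """Perturbation: add a random integer in [-max_shift, max_shift] to each age."""
    return [{**row, "age": row["age"] + rng.randint(-max_shift, max_shift)} for row in rows]

def swap_names(rows, rng):
    """Swapping: shuffle the name column (a shuffle may leave some names in place)."""
    names = [row["name"] for row in rows]
    rng.shuffle(names)
    return [{**row, "name": name} for row, name in zip(rows, names)]

rng = random.Random(0)  # fixed seed only so the demonstration is repeatable
anonymized = swap_names(perturb_age(generalize_location(rows), rng), rng)
for row in anonymized:
    print(row)
```

Which methods to combine, and how strongly to apply them, depends on the use case: stronger perturbation or coarser generalization lowers the re-identification risk but also reduces the usefulness of the data.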

Pseudonymization

• Reversible process by using a key

• Still to be treated as personal data because it enables re-identification

Name    Pseudonymized  Anonymized
John    q0fdGL         xxxxx
Ashley  s8fhPd         xxxxx
Luke    EiuD5j         xxxxx
Roman   qOerd          xxxxx
Luke    EiuD5j         xxxxx
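The difference between the two columns can be sketched as follows. This is one possible way to pseudonymize, assuming a random token per value and a lookup table that plays the role of the key; the class `Pseudonymizer` and the function `anonymize` are hypothetical names, and the presentation does not specify how its pseudonyms were generated.

```python
import secrets

class Pseudonymizer:
    """Replace values with random tokens; the token table acts as the re-identification key."""

    def __init__(self):
        self._forward = {}  # original value -> pseudonym
        self._reverse = {}  # pseudonym -> original value (the key; to be stored separately)

    def pseudonymize(self, value):
        """Return the same token every time the same value is seen."""
        if value not in self._forward:
            token = secrets.token_urlsafe(6)
            while token in self._reverse:  # regenerate in the unlikely case of a collision
                token = secrets.token_urlsafe(6)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token):
        """Reverse the process; only possible for whoever holds the table."""
        return self._reverse[token]

def anonymize(value):
    """Irreversible replacement: every value becomes the same placeholder."""
    return "xxxxx"

p = Pseudonymizer()
for name in ["John", "Ashley", "Luke", "Roman", "Luke"]:
    print(name, p.pseudonymize(name), anonymize(name))
```

As in the table, both occurrences of "Luke" receive the same pseudonym, so records can still be linked and, with the key, re-identified. The anonymized column carries no such link.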

Measuring anonymization and risks

• k-anonymity, differential privacy

• Focus on structured data

Gender  Age    Location  Illness
male    40-50  Belgium   Flu
male    40-50  Belgium   Multiple Sclerosis
female  >50    Germany   Lung cancer
female  >50    Germany   Multiple Sclerosis

2-anonymous data
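A table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The following minimal sketch computes k for the table above, assuming that Gender, Age, and Location are the quasi-identifiers and Illness is the sensitive attribute; the function name `k_anonymity` is chosen here for illustration.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

table = [
    {"gender": "male", "age": "40-50", "location": "Belgium", "illness": "Flu"},
    {"gender": "male", "age": "40-50", "location": "Belgium", "illness": "Multiple Sclerosis"},
    {"gender": "female", "age": ">50", "location": "Germany", "illness": "Lung cancer"},
    {"gender": "female", "age": ">50", "location": "Germany", "illness": "Multiple Sclerosis"},
]

k = k_anonymity(table, ["gender", "age", "location"])
print(f"The table is {k}-anonymous")
```

Each of the two groups contains two rows, so the smallest group size, and therefore k, is 2.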

Existing tools

• Tools for structured data

  • ARX

  • Cornell Anonymization Toolkit

• Tools for unstructured data

  • MITRE Identification Scrubber Toolkit (MIST)

  • Natural language processing tools (e.g. OpenNLP or Stanford CoreNLP Named Entity Recognizers)
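The tools listed for unstructured data rely on trained models to find names and other entities in free text. As a much simpler stand-in, the sketch below masks only two pattern-based identifiers, email addresses and phone numbers, with regular expressions. The patterns, the placeholder labels, and the function name `scrub` are assumptions made for illustration and do not reflect how MIST, OpenNLP, or Stanford CoreNLP work.

```python
import re

# Deliberately simple patterns; real-world formats vary far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text):
    """Replace email addresses and phone numbers with placeholder labels."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

sample = "Contact Luke at luke@example.org or +32 2 555 01 23."
print(scrub(sample))
```

The name "Luke" is left untouched by this sketch, which is exactly the kind of identifier that requires a named entity recognizer rather than a fixed pattern.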

Practical tips (conclusions)

There is no “one-size-fits-all” solution; different factors need to be taken into consideration:

• Analyze nature of data

• Analyze recipients

• Analyze risks (de-anonymization risk management)

• Analyze data utility

• Run the anonymization process inside the organization

Data Anonymization - European Commission
