Big Data Meets Privacy: De-identification Maturity Model for Benchmarking and Improving De-identification Practices
Nathalie Holmes, Khaled El Emam
Workshop Outline
- Big Data: Opportunities and Risks in Healthcare
- De-identification Myths: Fact or Fiction
- Overview of Terms Used in Anonymization
- De-identification Maturity Model (DMM)
- Case Studies
- DMM Uses and Benefits
OPPORTUNITIES AND RISKS WITH BIG DATA
How to Successfully Leverage Data While Protecting Individual Privacy
Big Data Tidal Wave Is Creating Unforeseen Opportunities and Risks
Organizations with the Right Tools and a Skilled Team Will Come Out on Top
Big Data Opportunities and Risks
A lot of useful data contains personal information about patients, study participants, or consumers. The challenge is getting access to the data while addressing the privacy requirements:
- Do you have the authority?
- Is the disclosure mandatory or discretionary?
- Do you have patient/participant consent?
- Can you anonymize the data?
These are the only ways to get access to the data.
Healthcare Breaches
Best evidence suggests that at least 27% of healthcare practices have a breach every year.
Breach notification costs healthcare organizations $200 per affected individual (Ponemon).
These costs apply whether or not you have obtained consent or authority.
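A quick worked calculation shows how the per-individual figure scales. The $200 cost is the Ponemon figure quoted above; the breach of 5,000 affected individuals is a hypothetical number chosen only for illustration, and the sketch assumes that notification cost grows linearly with the number of people notified.

```python
# Per-individual breach notification cost quoted on the slide (Ponemon).
COST_PER_INDIVIDUAL_USD = 200


def notification_cost(affected_individuals: int,
                      cost_per_individual: int = COST_PER_INDIVIDUAL_USD) -> int:
    """Estimate breach notification cost.

    Assumes (for illustration) that cost scales linearly with the
    number of individuals who must be notified.
    """
    if affected_individuals < 0:
        raise ValueError("affected_individuals must be non-negative")
    return affected_individuals * cost_per_individual


# Hypothetical breach affecting 5,000 patients.
print(notification_cost(5_000))  # 1000000
```

Even a modest breach therefore runs to seven figures in notification costs alone, before any other remediation.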
De-identification is one piece of an enterprise privacy program that can make privacy work
“Privacy by Design” provides helpful best practices
Proactive, Preventative, Embedded and Continuous
De-identification: Fact or Fiction #1
True or False: It’s possible to re-identify most, if not all, data.
False: When robust methods are used, evidence suggests that the risk of re-identification can be very small.
De-identification: Fact or Fiction #2
True or False: Privacy regulations require a zero chance of re-identification before a data set can be used for secondary purposes.
False: HIPAA states that the risk of re-identification must be “very small”. The FTC and other regulators use a “reasonableness” standard. All of these standards take context into account.
De-identification: Fact or Fiction #3
True or False: Only covered entities should consider HIPAA as a standard for de-identification.
False: HIPAA is a good standard to use regardless of the applicable regulations.
OVERVIEW OF ANONYMIZATION
PRIVACYANALYTICS.CA
© 2012-2013, Privacy Analytics. All Rights Reserved.
Balancing Data Privacy Requires Evaluation of Privacy Protection and Data Utility
Balancing Data Privacy
Direct and Indirect (Quasi-) Identifiers
Examples of direct identifiers: name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, license plate number, email address, photograph, biometrics, SSN, SIN, implanted device number
Examples of quasi-identifiers: sex, date of birth or age, geographic locations (such as postal codes, census geography, or information about proximity to known or unique landmarks), language spoken at home, ethnic origin, total years of schooling, marital status, criminal history, total income, visible minority status, profession, event dates
Terminology
Anonymization: A process that removes the association between the identifying data and the data subject (Source: ISO/TS 25237:2008).
Masking: Reducing the risk of identifying a data subject to a very small level by applying a set of data transformation techniques, without any concern for the analytic utility of the data.
Field suppression: Removal of fields from a data set
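Removing whole fields is the simplest masking technique: direct-identifier columns are dropped before release. A minimal sketch follows, assuming records are plain Python dicts; the column names in `DIRECT_IDENTIFIERS` are hypothetical stand-ins for some of the direct identifiers listed earlier and would have to match the real schema.

```python
from collections.abc import Iterable

# Hypothetical column names for some of the direct identifiers listed earlier.
DIRECT_IDENTIFIERS = frozenset(
    {"name", "address", "phone", "email", "mrn", "health_card_number"}
)


def suppress_fields(records: Iterable[dict],
                    fields: frozenset = DIRECT_IDENTIFIERS) -> list[dict]:
    """Return copies of the records with the given fields removed.

    The input records are left unchanged.
    """
    return [{k: v for k, v in rec.items() if k not in fields} for rec in records]


patients = [
    {"name": "Jane Doe", "mrn": "A123", "sex": "F", "birth_year": 1970, "dx": "J45"},
]
print(suppress_fields(patients))
# [{'sex': 'F', 'birth_year': 1970, 'dx': 'J45'}]
```

Note that the quasi-identifiers (`sex`, `birth_year`) pass through untouched, which is why removing direct identifiers alone does not demonstrate a very small risk of re-identification.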
Pseudonymization: A particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms (Source: ISO/TS 25237:2008).
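One common way to implement pseudonymization is a keyed hash: the same identifier always maps to the same pseudonym, so records about one person stay linkable, but the mapping cannot be recomputed without the secret key. The sketch below uses HMAC-SHA256 from Python's standard library; it illustrates the technique and is not necessarily the method used by any particular product. The function name, the 16-character truncation, and the placeholder key are all assumptions for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same (identifier, key) pair always yields the same pseudonym,
    so records about one person remain linkable within a release.
    The hex digest is truncated to `length` characters (an arbitrary choice).
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Placeholder key for illustration only; a real key must be kept secret
# and stored separately from the released data.
KEY = b"replace-with-a-secret-key"

p1 = pseudonymize("MRN-0001", KEY)
p2 = pseudonymize("MRN-0001", KEY)
p3 = pseudonymize("MRN-0002", KEY)
print(p1 == p2, p1 == p3)  # True False
```

Using a keyed hash rather than a plain hash matters: identifiers such as record numbers come from a small, guessable space, so an unkeyed hash could be reversed by hashing every candidate value.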
Randomization: Replacing a value in the data with a random value drawn from a large database of possible values
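Replacing a value with one drawn at random from a pool of candidates can be sketched as follows. The `FIRST_NAMES` list is a tiny hypothetical stand-in for the "large database of possible values"; the seeded `random.Random` instance is only there to make the example reproducible.

```python
import random

# Hypothetical stand-in for a large database of possible values;
# a real replacement pool would be far larger.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey", "Riley"]


def randomize_field(records, field, pool, rng=None):
    """Return copies of `records` with `field` replaced by a random value from `pool`.

    Records without the field are copied unchanged; inputs are not modified.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for rec in records:
        new = dict(rec)
        if field in new:
            new[field] = rng.choice(pool)
        out.append(new)
    return out


rows = [{"first_name": "Nathalie", "age": 42}, {"first_name": "Khaled", "age": 51}]
masked = randomize_field(rows, "first_name", FIRST_NAMES, rng=random.Random(7))
print(masked)
```

The replacement value bears no relationship to the original, which keeps the data realistic-looking (useful for software testing) but means no meaningful analysis can be done on that field.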
Data Masking
Data masking = no analytics on the masked fields
De-identification: Reducing the risk of identifying a data subject to a very small level by applying a set of data transformation techniques such that the resulting data retains a very high analytic value.
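A common way to quantify how small the risk is involves grouping records by their quasi-identifier values into equivalence classes. The sketch below assumes an adversary who knows a target's quasi-identifier values and knows the target is in the data set; under that assumption a record in a class of size f has a 1/f chance of being correctly matched. It is a simplified illustration, not a complete risk assessment: context such as the recipient's motives, capacity, and security practices also matters. The function name and sample data are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) based on equivalence-class sizes.

    Records sharing the same quasi-identifier values form an equivalence
    class. Assuming the adversary knows the target is in the data, a record
    in a class of size f has a 1/f chance of being correctly matched.
    """
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    class_sizes = Counter(keys)
    max_risk = 1 / min(class_sizes.values())
    # Mean of 1/f over all records simplifies to (#classes / #records).
    avg_risk = len(class_sizes) / len(records)
    return max_risk, avg_risk


data = [
    {"sex": "F", "birth_year": 1970, "zip3": "123"},
    {"sex": "F", "birth_year": 1970, "zip3": "123"},
    {"sex": "M", "birth_year": 1985, "zip3": "123"},
    {"sex": "M", "birth_year": 1985, "zip3": "123"},
    {"sex": "M", "birth_year": 1985, "zip3": "123"},
    {"sex": "F", "birth_year": 1992, "zip3": "456"},
]
print(reidentification_risk(data, ["sex", "birth_year", "zip3"]))  # (1.0, 0.5)
```

Here the single unique record drives the maximum risk to 1.0; generalizing or suppressing that record is what the following techniques address.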
Generalization: Reducing the precision of a value to a more general one
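Two typical generalizations are replacing an exact age with an age band and reducing a date to its year. A minimal sketch follows; the function names and the default 10-year band width are illustrative choices, not values prescribed by the slides.

```python
from datetime import date


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band of `width` years, e.g. 47 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_date(d: date) -> int:
    """Reduce a full date to its year."""
    return d.year


print(generalize_age(47))                  # 40-49
print(generalize_date(date(1970, 5, 17)))  # 1970
```

Coarser bands make equivalence classes larger (lower risk) at the cost of analytic precision, which is the privacy/utility balance discussed earlier.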
Suppression: The removal of records or values (cells) in the data
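One common rule for deciding what to suppress is to drop records whose quasi-identifier combination is shared by fewer than k records. The sketch below shows record-level suppression only (cell suppression would blank individual values instead); the function name and the choice of k=2 are hypothetical.

```python
from collections import Counter


def suppress_small_classes(records, quasi_identifiers, k=2):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""

    def key(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    counts = Counter(key(rec) for rec in records)
    return [rec for rec in records if counts[key(rec)] >= k]


data = [
    {"sex": "F", "birth_year": 1970},
    {"sex": "F", "birth_year": 1970},
    {"sex": "M", "birth_year": 1985},
]
print(suppress_small_classes(data, ["sex", "birth_year"], k=2))
# [{'sex': 'F', 'birth_year': 1970}, {'sex': 'F', 'birth_year': 1970}]
```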
Sub-sampling: Randomly selecting a subset of records or patients from a data set
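The intuition behind sub-sampling is that an adversary can no longer be sure a given individual appears in the released data. A minimal sketch of simple random sampling follows; the function name and the 30% fraction are hypothetical. It samples records, so to sample patients who have several records each, you would first group records by patient ID and sample the IDs.

```python
import random


def subsample(records, fraction, seed=None):
    """Return a simple random sample of about `fraction` of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n = round(len(records) * fraction)
    return rng.sample(records, n)


patients = [{"id": i} for i in range(100)]
sample = subsample(patients, 0.3, seed=42)
print(len(sample))  # 30
```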
The motives and capacity of the data recipient to re-identify the data
The security and privacy practices
that the data recipient has in place to manage
the data received.
Statistical De-identification
De-identification = High analytical value
RE-IDENTIFICATION RISKS
Risks from Basic Demographics
DE-IDENTIFICATION MATURITY MODEL
De-identification Maturity Model (DMM)
- A formal framework to evaluate the maturity of de-identification services within an organization
- Gauges the level of an organization’s readiness and experience in relation to people, processes, technologies and consistent measurement practices
- Used as a measurement tool, the DMM enables the enterprise to implement a grounded strategy based on facts
- Improves compliance, facilitates access to data, and scales support services
Three Dimensions of the DMM: Practice, Implementation, and Automation
Practice Dimension
The DMM has five maturity levels for the de-identification practices that an organization has in place. Level 1 is the lowest level of maturity and level 5 is the highest:
1. Ad hoc
2. Masking
3. Heuristic
4. Risk Based
5. Governance
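To make the structure concrete, the levels can be captured in code. The sketch below is hypothetical (the class and method names are not part of the DMM): it records the Practice Dimension levels listed above plus integer scores for the other two dimensions, whose level names are not given here. The example uses the ratings reported for Case Study 1 below.

```python
from dataclasses import dataclass
from enum import IntEnum


class PracticeLevel(IntEnum):
    """Practice Dimension maturity levels, lowest (1) to highest (5)."""
    AD_HOC = 1
    MASKING = 2
    HEURISTIC = 3
    RISK_BASED = 4
    GOVERNANCE = 5


@dataclass
class DmmAssessment:
    """Hypothetical record of one organization's scores on the three DMM dimensions."""
    organization: str
    practice: PracticeLevel
    implementation: int  # level number only; names not listed on this slide
    automation: int      # level number only; names not listed on this slide

    def weakest_dimension(self) -> str:
        """Name the lowest-scoring dimension, a natural place to start improving."""
        scores = {
            "practice": int(self.practice),
            "implementation": self.implementation,
            "automation": self.automation,
        }
        return min(scores, key=scores.get)


# Ratings from Case Study 1 (Organization A, Safe Harbor).
case_a = DmmAssessment("Organization A", PracticeLevel.HEURISTIC,
                       implementation=3, automation=1)
print(case_a.weakest_dimension())  # automation
```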
Case Study 1 – Safe Harbor
Organization A is a disease registry. They connect to many databases and make many data releases to internal and external data analysts.
Practice Dimension (what you do):
- Their primary way of anonymizing data is by following the Safe Harbor de-identification standard (L3)
Implementation Dimension (how well you do it):
- There is a clear process with well-defined roles for following Safe Harbor, and it is well documented
- Because it’s documented, it’s repeatable (L3)
Safe Harbor Direct Identifiers and Quasi-identifiers
1. Names
2. ZIP codes (except the first three digits)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full-face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
Actual Knowledge: in addition to removing these identifiers, the organization must not have actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual.
Case Study 1 – Safe Harbor
Automation Dimension (is it automated):
- They use home-grown scripts to implement Safe Harbor
- The scripts have no external validation that they work or are sufficient (L1)
Challenges:
- Despite these efforts, they have missed some key items
- Analysts have been pressing for more granular data
- They interpreted the Safe Harbor rule for dates as covering only dates of birth, rather than all dates
- They have not reduced all ZIP codes to three digits, nor replaced the three-digit ZIP code with 000 for areas of 20,000 or fewer people, as Safe Harbor requires
- Some identifiers were missed (such as clinical trial participant numbers)
- They did not consider the Actual Knowledge requirement in Safe Harbor
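Two of the gaps above (dates and ZIP codes) are mechanical enough to sketch. The functions below are hypothetical helpers, not a validated Safe Harbor implementation: `ZIP3_POPULATION` stands in for a lookup that would be built from current Census data, and unknown prefixes are treated as small, which is a conservative assumption.

```python
from datetime import date


def safe_harbor_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Reduce a ZIP code to its first three digits, or to '000' when the
    three-digit area contains 20,000 or fewer people.

    Prefixes missing from the lookup are treated as small (conservative),
    so they also become '000'.
    """
    zip3 = zip_code[:3]
    if zip3_population.get(zip3, 0) <= 20_000:
        return "000"
    return zip3


def generalize_all_dates(record: dict) -> dict:
    """Reduce every date-valued field, not only date of birth, to its year."""
    return {k: (v.year if isinstance(v, date) else v) for k, v in record.items()}


# Hypothetical populations per three-digit ZIP prefix (not real Census figures).
ZIP3_POPULATION = {"123": 250_000, "456": 15_000}

print(safe_harbor_zip("12345", ZIP3_POPULATION))  # 123
print(safe_harbor_zip("45601", ZIP3_POPULATION))  # 000

visit = {"dob": date(1970, 5, 17), "admit": date(2012, 3, 2), "zip": "12345"}
print(generalize_all_dates(visit))  # {'dob': 1970, 'admit': 2012, 'zip': '12345'}
```

This covers only part of Safe Harbor: ages over 89 (and dates that reveal such ages) must also be aggregated into a single category, the remaining identifiers must be removed, and the Actual Knowledge condition still applies.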
Case Study 2 – Masking
Company B is a claims processor. They need realistic data for software testing.
Practice Dimension (what you do):
- Their primary way of anonymizing is through data masking
- This means they deal only with the direct identifiers (L2)
Implementation Dimension (how well you do it):
- There is a clear, well-documented process for doing masking and for how they implement heuristics
- Because it’s documented, it’s repeatable (L3)
Automation Dimension (is it automated):
- They use a commercial product for masking
- This product produces consistent results (L2)
Challenges:
- Despite these efforts, they have missed some key items: the quasi-identifiers
- Some dates and ZIP codes were not addressed
- There is no evidence that the risk of re-identification was “very small”
- The tool vendor’s architect provided assurance that this was OK
Case Study 3 – Governance
Company C is an EMR vendor. They need to provide reports to their clients on trends and benchmarks to help the clients improve their businesses.
Practice Dimension (what you do):
- They have a risk-based approach that anonymizes both direct identifiers (masking) and indirect identifiers (de-identification)
Implementation Dimension (how well you do it):
- There is a clear, well-documented process for anonymizing the data
- Because it’s documented, it’s repeatable
- They provide ongoing training to staff on how to do the anonymization
- They are able to quickly produce reports and metrics documenting what they did to the data before they released it
- They have automated data sharing agreements that specify the controls data users need to have in place
- They have a full audit trail to demonstrate that the risk of re-identification is “very small” per HIPAA
- They track when there is overlap between the various data sets
- Audits are conducted on data users to confirm compliance with conditions
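Tracking overlap matters because if the same individuals appear in several releases, combining those releases may reveal more about them than any single one. A minimal sketch follows, assuming each release is summarized as a set of consistent pseudonyms (as a deterministic pseudonymization step would produce); the release names and IDs are hypothetical.

```python
from itertools import combinations


def release_overlaps(releases: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Count pseudonymized individuals shared by each pair of released data sets.

    Only pairs with a non-empty overlap are reported.
    """
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sorted(releases.items()), 2):
        shared = len(ids_a & ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


releases = {
    "benchmark_q1": {"p01", "p02", "p03"},
    "benchmark_q2": {"p02", "p03", "p04"},
    "trend_report": {"p09"},
}
print(release_overlaps(releases))  # {('benchmark_q1', 'benchmark_q2'): 2}
```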
Automation Dimension (is it automated):
- They use commercial software to do masking and de-identification
- The product produces consistent results
- They are able to get defensible anonymization more quickly than by doing it manually
- The product has been scrutinized by other users and peers and is upgraded on a regular basis
- They are able to release more data sets, more quickly
Benefits of DMM
- Determines whether an organization can defensibly ensure that the risk of re-identification is “very small”
- Provides a road map to meet regulatory and legal requirements
- Automation and governance allow organizations to share more data for secondary purposes with fewer resources
- A higher level of maturity results in higher-quality data and greater consistency in de-identification
- Significantly improves the ability to estimate the resources and time required to de-identify data sets
Key Learnings
Data Anonymization Resources
Book Signing: Sept 26, 10:35 am, Booth #107
Khaled El Emam & Luk Arbuckle
Other Conference Activities
Session: Facilitating Analytics While Protecting Individual Privacy Using Data De-identification, Khaled El Emam, Thursday, September 26 @ 4:00pm, Salon F
Office hours in the Sponsor Pavilion:
- Nathalie Holmes, Thursday, September 26 @ 3:10pm, Table D
- Khaled El Emam, Thursday, September 26 @ 6:30pm, Table D
Contact
Nathalie Holmes: [email protected] ext 122
Khaled El Emam: [email protected]
@PrivacyAnalytic
2012 Start-Up Showcase Winner
Review Quiz
- What does anonymization mean?
- What is the difference between data masking and de-identification?
- Why is it important to strive for balance between privacy and data utility?
- How many levels of maturity (Practice Dimension) are there in the DMM?
- Is it possible to be at Practice Dimension level 1 (Ad hoc) and score well in the Implementation Dimension (e.g., have a repeatable, defined and measurable process)?
- What are some advantages of having standard automation (software)?
- What is the main difference between Practice Dimension level 4 (Risk Based) and level 5 (Governance)?