Statistical disclosure limitation: Balancing data confidentiality and data access.
• Enables evidence-based policy-making
• Informs the general public on local and national concerns
• Advances scientific research
• Trains students in data analysis and decision making
Access to high quality data is vital
Breach of confidentiality:
• May violate laws (e.g., CIPSEA, HIPAA)
• May undermine broadly held and highly valued ethical principles
• May lead data providers to withhold important information or refuse to participate in research
Protecting confidentiality of data is essential
• HIPAA Privacy Rule
  - “Safe harbor”
  - Statistical standard
  - Limited data sets
• 2008: Delaware Cancer Registry vs. press
  - Public’s desire to learn cancer sites
  - State requirement to protect privacy
  - New legislation
Example of tension between access and confidentiality
• De-identification: Strip unique identifiers such as names, addresses, and tax IDs from shared files.
• Reducing potential for re-identification: Seemingly innocuous information may reveal individuals' identities and information about them.
Protecting confidentiality while providing access
• “De-identification”
  Original data: name abcdefghijkl
  Name deleted: abcdefghijkl
• “Re-identification”
  Shared data: abcdefghijkl
  Other data: abcdefmnop name
Where:
  a = Day, month, year of birth
  b = Gender
  c = State of residence
  d = County
  e = Occupation
  f = Race
Example: Re-identification by matching
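The matching idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and the `link` helper are all hypothetical, and the six fields stand in for the shared variables a through f. A record counts as re-identified only when its combination of shared values matches exactly one named record in the other file.

```python
# Hypothetical field names standing in for variables a-f on the slide.
QUASI_IDENTIFIERS = ("birth_date", "gender", "state", "county", "occupation", "race")

def make(birth_date, gender, state, county, occupation, race, **extra):
    """Build a toy record; 'extra' carries either a name or a sensitive value."""
    rec = dict(zip(QUASI_IDENTIFIERS, (birth_date, gender, state, county, occupation, race)))
    rec.update(extra)
    return rec

# "Shared data": names deleted, sensitive field retained (invented values).
shared = [
    make("1950-03-14", "F", "DE", "Kent", "teacher", "white", diagnosis="A"),
    make("1962-11-02", "M", "DE", "Sussex", "farmer", "black", diagnosis="B"),
    make("1975-06-20", "F", "DE", "Kent", "nurse", "asian", diagnosis="C"),
]

# "Other data": a file that still carries names (invented values).
other = [
    make("1950-03-14", "F", "DE", "Kent", "teacher", "white", name="Person One"),
    make("1962-11-02", "M", "DE", "Sussex", "farmer", "black", name="Person Two"),
    make("1962-11-02", "M", "DE", "Sussex", "farmer", "black", name="Person Three"),
]

def link(shared, other, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for shared records with exactly one match."""
    index = {}
    for rec in other:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec["name"])
    matches = []
    for rec in shared:
        names = index.get(tuple(rec[k] for k in keys), [])
        if len(names) == 1:  # a unique match attaches a name to the record
            matches.append((names[0], rec["diagnosis"]))
    return matches

reidentified = link(shared, other)
```

In this toy data only the first shared record is re-identified: the second matches two named people, so the link is ambiguous, and the third has no counterpart in the other file.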
• Advances in statistical analysis and the collection of more detailed data enable researchers and policy makers to ask refined questions
• Enormous amounts of individual-level data are collected, processed, widely distributed … and linkable.
• Better matching technologies enable linkages
Better data – opportunities and problems
• Personal information available on the Internet, from private sources, and government surveys
• Individuals with the right skills and resources could link this personal information to publicly available data:
  – MIT student re-identifies Massachusetts governor
  – NIH scientists express caution in making genetic information available
Problems – a closer look
Statisticians:
• Develop ways to identify risk of confidentiality breaches
• Develop methods for providing safe access to confidential data
• Conduct research on providing safe access to emerging, complex data types
Statisticians can help find a satisfactory balance
General strategies for data protection:
• Modify data content: Remove or alter sensitive or identifying values, and provide unrestricted access to the modified data (e.g., public use files)
• Control data access: Use technology and training to reduce chances of breaches; limit who can access the confidential data, the conditions under which the data can be accessed, and the purposes for which the data can be used
Useful data can be shared and protected
• Eliminate variables (geography)
• Aggregate sensitive data (age, income)
• Add random variation to numerical data values
• Exchange some values between selected records
• Replace sensitive data with values simulated from statistical models estimated with the original data
Modified data: General techniques
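The first four techniques listed on this slide can be sketched as small Python functions. Everything here is an assumption made for illustration: the function names, the field names, the 10-year band width, and the noise standard deviation are hypothetical, not values from the presentation. Model-based simulation of replacement values is left out because it depends on the model chosen.

```python
import random

def drop_fields(records, fields):
    """Eliminate variables, e.g. detailed geography."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]

def coarsen_age(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def add_noise(records, field, sd, rng):
    """Add zero-mean Gaussian variation to one numerical field."""
    return [{**r, field: r[field] + rng.gauss(0, sd)} for r in records]

def swap_values(records, field, n_pairs, rng):
    """Exchange the values of one field between randomly selected record pairs."""
    out = [dict(r) for r in records]
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(out)), 2)
        out[i][field], out[j][field] = out[j][field], out[i][field]
    return out

# Invented toy records.
records = [
    {"county": "Kent", "age": 37, "income": 52000.0},
    {"county": "Sussex", "age": 61, "income": 48000.0},
    {"county": "Kent", "age": 45, "income": 75000.0},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
no_geo = drop_fields(records, {"county"})
banded = [{**r, "age": coarsen_age(r["age"])} for r in no_geo]
noisy = add_noise(banded, "income", sd=1000.0, rng=rng)
swapped = swap_values(records, "income", n_pairs=1, rng=rng)
```

Swapping leaves the overall distribution of the swapped field unchanged while breaking its link to individual records. Noise and banding instead change the values themselves, which is where the loss of analytic usefulness comes from.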
• Methods can be applied to all or some cases, and to varying degrees
• Wider application of methods improves confidentiality protection, but…
• …degrades usefulness of data
• Statisticians measure the tradeoffs between disclosure risk and analytic/policy priorities
Key features of modified data
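One simple way to see the tradeoff is to compute a risk proxy and a usefulness proxy for the same data at different levels of aggregation. The sketch below is an assumption-laden illustration, not a method taken from the slides: risk is the fraction of records that are unique on their age band and county, and information loss is the mean absolute gap between a true age and the midpoint of its band. The data and function names are hypothetical.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Share of records whose combination of key values appears exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)

def evaluate(records, width):
    """Return (risk, loss) after grouping age into bands of the given width."""
    coarse = [
        {"age_band": (r["age"] // width) * width, "county": r["county"]}
        for r in records
    ]
    risk = unique_fraction(coarse, ("age_band", "county"))
    # Loss: average distance between the true age and the band midpoint.
    loss = sum(
        abs(r["age"] - (c["age_band"] + (width - 1) / 2))
        for r, c in zip(records, coarse)
    ) / len(records)
    return risk, loss

# Invented toy records.
records = [
    {"age": 31, "county": "Kent"}, {"age": 34, "county": "Kent"},
    {"age": 38, "county": "Sussex"}, {"age": 42, "county": "Kent"},
    {"age": 47, "county": "Kent"}, {"age": 49, "county": "Sussex"},
    {"age": 55, "county": "Sussex"}, {"age": 58, "county": "Sussex"},
]

exact = evaluate(records, width=1)    # ages released as-is
banded = evaluate(records, width=10)  # ages released in 10-year bands
```

With exact ages every toy record is unique and nothing is lost. With 10-year bands only a quarter of the records remain unique, at the cost of a few years of imprecision per record.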
• Restricted data enclaves (Census, NCHS)
• Remote access systems (NCHS, NORC)
• Licensing (NCES, BLS)
• Online tabulations/analysis (Census, NCHS, NCES)
Restricted access increasingly provided - examples
• Safe projects: Authorized projects, typically with data use agreement
• Safe people: Approved analysts from authorized institutions; trained in confidentiality issues
• Safe sites: Use actively monitored by data custodians
• Safe outputs: Data products subject to statistical and confidentiality review
=> Analysts can use the detailed data without “owning” them, which permits analyses that are not possible with publicly available data
Key features of restricted access
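As one illustration of what a review of "safe outputs" might check, the sketch below suppresses small cells in a frequency table before release. It is a hedged example only: the minimum count of 5 is an assumed threshold, the function name is hypothetical, and an actual review would also consider complementary suppression and other rules that this sketch does not cover.

```python
from collections import Counter

def tabulate_with_suppression(values, min_count=5):
    """Count each category; report None for cells below the assumed threshold."""
    counts = Counter(values)
    return {k: (n if n >= min_count else None) for k, n in counts.items()}

# Invented toy data: 7 records in one county, 2 in another.
counties = ["Kent"] * 7 + ["Sussex"] * 2
table = tabulate_with_suppression(counties)
```

Here the cell with two records is withheld, while the larger cell is released unchanged.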
• Data access and data confidentiality are intimately connected
• Statisticians play a central role in improving data usefulness while protecting data confidentiality
• Statisticians in government, academia, and industry can provide guidance to policy-makers on key issues related to privacy and confidentiality
Summary
• ASA Statement on Data Access and Personal Privacy http://www.amstat.org/news/statementondataaccess.cfm
• ASA’s Privacy and Confidentiality Committee http://www.amstat.org/committees/commdetails/cfm?txtComm=CCNPRO02
• ASA’s Privacy, Data Security and Confidentiality Website http://www.amstat.org/committee/pc/index.html
• OMB/FCSM Report on Statistical Disclosure Limitation Methodology http://www.fcsm.gov/working-papers/spwp22.html
• Expanding Access to Research Data: Reconciling Risks and Opportunities http://books.nap.edu/catalog.php?record_id=11434
Further information
American Statistical Association
732 N. Washington Street
Alexandria, Virginia 22314
703.684.1221
http://www.amstat.org