Post on 10-Nov-2014
A Lifecycle Approach to Information Privacy
Prepared for:
New Directions in the Science of Differential Privacy
March 2013
Micah Altman <Micah_Altman@alumni.brown.edu>
Director of Research, MIT Libraries
Non-Resident Senior Fellow, Brookings Institution
Collaborators*
• Privacy Tools for Sharing Research Data Project: Edo Airoldi, Stephen Chong, Merce Crosas, Cynthia Dwork, Gary King, Phil Malone, Latanya Sweeney, Salil Vadhan
• Research support: Thanks to the National Science Foundation (award 1237235), the Sloan Foundation, the Massachusetts Institute of Technology, and Harvard University.
* And co-conspirators
Related Work
Reprints available from: micahaltman.com
• Comments on ANPRM: Human Subjects Protection, http://dataprivacylab.org/projects/irb/Vadhan.pdf
• Privacy tools project proposal: http://privacytools.seas.harvard.edu/full-project-description
Overarching challenges
• Law is evolving
  – specification of technical requirements
  – new legal concepts, e.g. the “right to be forgotten”
• Research is changing
  – evidence base shifting: reliant on big data, transactional data, new forms of data
  – conduct of research is distributed, collaborative, multi-institutional, multi-national
  – infrastructure is changing: cloud and distributed third-party computation and storage
• Privacy analysis is advancing
  – new computational privacy solution concepts
  – new findings from reidentification experiments
  – new methods for estimating utility/privacy tradeoffs
Shifting social science evidence base
How to deidentify without destroying utility?
• The “Netflix problem”: large, sparse datasets that overlap can be probabilistically linked [see Narayanan and Shmatikov 2008]
• The “GIS problem”: fine geo-spatial-temporal data are very difficult to mask when correlated with external data [see Zimmerman & Pavlik 2008; Zan et al. 2013; Srivatsa & Hicks 2012]
• The “Facebook problem”: masked network data can be reidentified if an attacker controls only a few nodes [see Backstrom et al. 2007]
• The “Blog problem”: pseudonymous communication can be linked through textual analysis [see Novak, Raghavan, and Tomkins 2004]
Source: [Calabrese 2008; Real Time Rome Project 2007]
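The “Netflix problem” can be made concrete with a heavily simplified, scoreboard-style matcher in the spirit of Narayanan and Shmatikov (2008). This is a sketch under stated assumptions, not their algorithm: records are hypothetical sets of (item, rating) pairs, matching is exact, rare pairs are weighted by 1/log(1 + support), and a link is declared only when the best score beats the runner-up by an assumed eccentricity threshold of 1.5 standard deviations. The names `rarity_weights`, `best_match`, and `TOY_DATA` are illustrative.

```python
import math
import statistics

# Hypothetical toy data: each record is a set of (item, rating) pairs.
TOY_DATA = {
    "r1": {("m1", 5), ("m2", 3), ("m7", 1)},
    "r2": {("m1", 5), ("m3", 4)},
    "r3": {("m2", 3), ("m4", 2)},
    "r4": {("m1", 5), ("m5", 5)},
}

def rarity_weights(dataset):
    """Weight each (item, rating) pair by 1 / log(1 + support): rare pairs count more."""
    support = {}
    for record in dataset.values():
        for pair in record:
            support[pair] = support.get(pair, 0) + 1
    return {pair: 1.0 / math.log(1 + n) for pair, n in support.items()}

def best_match(aux, dataset, eccentricity=1.5):
    """Return the id of the record that best matches auxiliary info `aux`,
    or None if the top score does not clearly stand out from the runner-up."""
    if len(dataset) < 2:
        return None
    weights = rarity_weights(dataset)
    scores = {
        rid: sum(weights[pair] for pair in aux if pair in record)
        for rid, record in dataset.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    spread = statistics.pstdev(scores.values())
    if spread == 0 or (ranked[0] - ranked[1]) / spread < eccentricity:
        return None  # ambiguous: refuse to link
    return max(scores, key=scores.get)

print(best_match({("m7", 1), ("m2", 3), ("m1", 5)}, TOY_DATA))  # r1
print(best_match({("m1", 5)}, TOY_DATA))                        # None
```

With three known ratings, one of them rare, the target record stands out; with a single common rating the matcher declines to link. The more sparse attributes an outside source shares with a release, the easier linkage becomes.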
CUSP aims for the Leading Edge
• Urban informatics: high-velocity, localized social science
• Leading-edge data: sensors, crowd-sourcing
• Leading-edge privacy needs: privacy policy, privacy-aware information management, privacy ethics
Data Input/Output Approach
Published outputs:
• Modal practice: “The correlation between X and Y was large and statistically significant”
• Summary statistics
• Contingency table
• Public use sample microdata, e.g. masked records such as:
    * Jones * * 1961 021*
    * Jones * * 1961 021*
    * Jones * * 1972 9404*
    * Jones * * 1972 9404*
    * Jones * * 1972 9404*
• Information visualization
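The masked rows above can be checked mechanically. The sketch below assumes each whitespace-separated field is one quasi-identifier column (the slide does not say what the columns mean) and computes k, the size of the smallest group of identical rows, plus the worst-case reidentification probability 1/k, assuming an attacker knows the target is in the table and knows its quasi-identifiers. The function names are hypothetical.

```python
from collections import Counter

# The masked rows from the slide; "*" marks a suppressed or generalized field.
MASKED_TABLE = """\
* Jones * * 1961 021*
* Jones * * 1961 021*
* Jones * * 1972 9404*
* Jones * * 1972 9404*
* Jones * * 1972 9404*"""

def parse_rows(text):
    """Split each non-empty line into a tuple of quasi-identifier fields."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]

def k_anonymity(rows):
    """k = size of the smallest equivalence class of identical rows."""
    return min(Counter(rows).values())

def max_reidentification_risk(rows):
    """Worst-case chance of singling out the right record inside one class."""
    return 1.0 / k_anonymity(rows)

rows = parse_rows(MASKED_TABLE)
print(k_anonymity(rows))                # 2
print(max_reidentification_risk(rows))  # 0.5
```

The table is 2-anonymous: the smallest group (the two 1961 rows) bounds the risk, even though the 1972 group has three members.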
Questions Generated from Data I/O Model
Solution concepts
• Comparison of risks across concepts
• Extension of the range of solution concepts
Processing stage
• How to apply DP to new analytic methods?
  – Bayesian methods
  – Data mining methods
  – Text analysis methods
• How to apply DP to different types of “microdata”?
  – Network data
  – Text
  – Geospatial traces
  – Relations
Disclosure concepts (deterministic vs. probabilistic):
• Individual record linkage: deterministic, k-anonymity; probabilistic, reidentification probability
• Group attributes: deterministic, k-anonymity + heterogeneity (e.g. l-diversity); probabilistic, threat analysis and SDC on skewed magnitude tables
• Individual attributes (attribute disclosure): probabilistic, differential privacy, distributional privacy, Bayesian-optimal privacy
• Specified columns/rows: private multiparty computation
Questions about transformation
  – Imputation methods
  – Computation efficiency
  – Informational utility*
See for example: Dwork & Smith 2009
* “My, what a large ε you have, grandma!”
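The footnoted joke about a large ε refers to the privacy/utility dial in differential privacy. A minimal sketch of the standard Laplace mechanism for a counting query (sensitivity 1) shows the tradeoff: the noise scale is 1/ε, so the expected absolute error is 1/ε, and a large ε buys accuracy by weakening the guarantee. The helper names and synthetic ages are assumptions for illustration; Python's stdlib `random` is not a cryptographically secure source, which a deployed system would need.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon, rng):
    """epsilon-DP count via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so noise with scale 1/epsilon suffices."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(2013)
ages = [rng.randint(18, 90) for _ in range(1000)]  # synthetic, hypothetical data
for eps in (0.01, 0.1, 1.0, 10.0):
    # Expected absolute error is 1/eps: 100, 10, 1, 0.1 respectively.
    print(eps, round(dp_count(ages, lambda a: a >= 65, eps, rng), 1))
```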
Information Life Cycle Model
Creation/Collection
Storage/Ingest
Processing
Internal Sharing
Analysis
External dissemination/publication
Re-use
Long-term access
Legal/Policy Frameworks
Four overlapping areas: contract, intellectual property, access rights, confidentiality
• Intellectual property: copyright, fair use, DMCA, database rights, moral rights, intellectual attribution, trade secret, patent, trademark
• Confidentiality: Common Rule (45 CFR 46), HIPAA, FERPA, EU Privacy Directive, privacy torts (invasion, defamation), rights of publicity, CIPSEA, state privacy laws
• Access rights: sensitive but unclassified; potentially harmful (archeological sites, endangered species, animal testing, …); classified; FOIA; state FOI laws; journal replication requirements; funder open access; export restrictions (EAR, ITAR)
• Contract: contract, license, click-wrap TOU
Questions Generated by Lifecycle Model
• Which laws apply to each stage?
  – Are legal requirements consistent across stages?
• How to align legal instruments:
  – consent forms, SLAs, DUAs
• Optimizing privacy risk/utility/cost across the research stages: when is it more efficient to…
  – apply disclosure limitation at the data collection stage?
  – use particular solution concepts at particular stages?
  – harmonize concepts/treatments across stages?
• Policy design
  – policies to internalize future/public stakeholder needs
  – policy equilibrium under different privacy solution concepts
• Information reuse
  – Bayesian priors
  – scientific verification and replication
• Infrastructure needs
  – data acquisition, storage, dissemination
  – identification, authorization, authentication
  – metadata, protocols
[Figure: the information lifecycle (creation/collection, storage/ingest, processing, internal sharing, analysis, external dissemination/publication, re-use, long-term access) framed by research methods, data management systems, legal/policy frameworks, and statistical/computational frameworks]
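One of the questions above asks when it is more efficient to apply disclosure limitation at the data collection stage. Randomized response is the classic collection-stage technique. Below is a minimal sketch of its ε-differentially-private binary form: each respondent reports the truth with probability e^ε/(1 + e^ε) and otherwise flips the answer, and the analyst debiases the aggregate. The function names and the simulated 30% prevalence are assumptions for illustration.

```python
import math
import random

def randomized_response(truth, epsilon, rng):
    """Report the true yes/no answer with probability e^eps / (1 + e^eps), else flip it.

    The ratio of report probabilities is at most e^eps, so each answer is eps-DP."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return truth if rng.random() < p_truth else (not truth)

def estimate_proportion(reports, epsilon):
    """Unbiased estimate of the true 'yes' rate.

    E[observed] = (1 - p) + pi * (2p - 1), solved for pi."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(7)
truths = [rng.random() < 0.3 for _ in range(50000)]  # simulated respondents
reports = [randomized_response(t, 1.0, rng) for t in truths]
print(round(estimate_proportion(reports, 1.0), 3))
```

The collector never holds a trustworthy individual answer, which shifts protection upstream of storage and analysis. The price is extra variance, scaled by 1/(2p − 1)², compared with adding noise once to an exact aggregate.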
Questions on Differential Privacy from Information Lifecycle Analysis: Legal
• Legal requirements: when does the law…
  – require exact answers? (DP does not give exact answers.)
  – give safe harbor if linkages are ‘only’ probabilistic? (DP provides safe harbor in this case.)
  – require action based on “actual knowledge”? (How do we include strongly informative priors in DP? When is DP not actually “worst case”?)
  – require analysis of a specific unit of observation? (DP does not give answers for individual units.)
  – require a balance of privacy and utility? (DP does not inherently balance the two; it uses minimax, maximizing utility subject to a given privacy constraint. What is the appropriate choice of privacy constraint?)
• Legal instruments: how to describe DP protections in a legally coherent way for…
  – service level agreements
  – consent/deposit terms
  – data usage agreements
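Two of the legal questions above, exact answers and the choice of privacy constraint, can be made quantitative for the simplest case. For Laplace noise with scale Δ/ε (Δ = sensitivity), P(|noise| > α) = exp(−αε/Δ), so keeping that probability at or below β requires ε ≥ Δ·ln(1/β)/α. The sketch below, with hypothetical function names, computes that threshold for a single query and ignores composition across multiple queries.

```python
import math

def epsilon_for_accuracy(alpha, beta, sensitivity=1.0):
    """Smallest epsilon for which Laplace noise exceeds alpha with probability at most beta."""
    if alpha <= 0 or not 0 < beta < 1:
        raise ValueError("need alpha > 0 and 0 < beta < 1")
    return sensitivity * math.log(1.0 / beta) / alpha

def error_bound(epsilon, beta, sensitivity=1.0):
    """Error that Laplace noise exceeds with probability exactly beta at this epsilon."""
    if epsilon <= 0 or not 0 < beta < 1:
        raise ValueError("need epsilon > 0 and 0 < beta < 1")
    return sensitivity * math.log(1.0 / beta) / epsilon

# A count accurate to within 10 with 95% probability needs epsilon >= ln(20)/10.
print(round(epsilon_for_accuracy(alpha=10, beta=0.05), 3))  # 0.3
```

Stating the guarantee this way (“within ±α with probability 1 − β”) is one possible vocabulary for agreements that cannot promise exact answers.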
Questions on Differential Privacy from Information Lifecycle Analysis: System Design
• System design: potential increased implementation cost of DP
  – Information security: hardening
  – Information security: certification & auditing
  – Model server development, provisioning, maintenance, reliability, availability
• System design: information security tradeoffs of DP. Interactive systems have a larger attack surface:
  – Availability risks: denial-of-service attacks
  – Availability/integrity risks: privacy budget exhaustion attacks
  – Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks)
  – Secrecy/privacy: breach of the authentication/authorization layer
• System design: optimizing privacy & utility across the lifecycle
  – When does limiting disclosive data collection (e.g. using randomized response or group-aggregated methods) dominate applying DP at the data analysis stage?
  – When do restricted virtual data enclaves plus public synthetic data dominate public DP queries (of the same type)?
• System design: information reuse
  – How do you incorporate informative priors in the DP privacy solution concept? (When does the “Terry Gross” problem apply?)
  – What is required to ensure scientific replication/verification of results produced by differentially private model servers?
  – How do you run DP queries on confidential data linked with externally provided microdata?
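The “privacy budget exhaustion” risk follows from sequential composition: answering queries that are ε₁-, …, εₖ-differentially private is (ε₁ + … + εₖ)-differentially private, so an interactive server must cap the total. The hypothetical accountant below enforces a single shared cap, which also shows why exhaustion is an availability attack: one user who spends the budget locks out everyone else. Class names and the 1e-12 rounding slack are assumptions of this sketch.

```python
class PrivacyBudgetExhausted(Exception):
    """Raised when a query would push cumulative epsilon past the total budget."""

class BudgetAccountant:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def charge(self, epsilon):
        """Reserve epsilon for one query, or refuse if the cap would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:  # slack for float rounding
            raise PrivacyBudgetExhausted(
                f"requested {epsilon}, only {self.remaining:.4f} left")
        self.spent += epsilon

# One shared budget: ten 0.1-DP queries from a single (possibly hostile) user use it up.
shared = BudgetAccountant(total_epsilon=1.0)
for _ in range(10):
    shared.charge(0.1)
try:
    shared.charge(0.1)  # the next user's query is refused
except PrivacyBudgetExhausted as err:
    print("refused:", err)
```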
Questions on Differential Privacy from Information Lifecycle Analysis: Policy Design
• Policy design: “market failures” for privacy goods
  – Is there a market failure? How do we know?
  – What is the nature of the market failure?
    • Conditions on market structure/market power: barriers to entry? natural monopoly/network effects? first-mover advantage, path dependency?
    • Conditions on goods: excludability, rivalry, externality
    • Conditions on exchange: transaction costs, agency problems, bounded rationality, or informational asymmetry
• Policy design: policy equilibria
  – When does enforcing a specific privacy concept yield a socially optimal solution?
  – When is DP a prisoner’s dilemma? (E.g. I contribute to a database for a small payment, since my unilateral entry does not affect the result; but the equilibrium is that the database is large, and you learn substantially more about me than if the database were small.)
Urban Instrumentation and Confidentiality
Specific data sources
• Administrative records
• Transactions
• Traffic
• Health
• Mobile phones
• Microenvironment
• Crowdsourcing
Possible nosy questions…
• Were you fined?
• What did you buy?
• Where were you?
• Are you sick?
• How rich are you?
• Do you have a meth lab?
Categories
• Infrastructure
• Environment
• People
• Community: self-identified neighborhood, school district, voting precinct, election district, police beat, crime locations, grocery prices, produce availability
Privacy implications
• Business confidentiality
• Security & safety: infrastructure chokepoints; police coverage; endangered species; animal testing labs; environmental hazards
• Personal privacy
Interdisciplinary Research Required
Disciplines: law; social science; public policy; data collection methods (research methodology); data management (information science); statistics; computer science
• Privacy-aware data-management systems
• Methods for confidential data collection and management
• Creative-Commons-like modular license plugins for privacy: data use, consent, terms of service
• Model legislation for modern privacy concepts
• Privacy requirements taxonomy and classification
• Game-theoretic/social-choice models of social privacy equilibria under different privacy policies
References
• Backstrom, Lars, Cynthia Dwork, and Jon Kleinberg. 2007. “Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography.” Proceedings of the 16th International Conference on World Wide Web. ACM.
• Dwork, Cynthia, and Adam Smith. 2009. “Differential privacy for statistics: What we know and what we want to learn.” Journal of Privacy and Confidentiality 1(2): 135–154.
• Narayanan, Arvind, and Vitaly Shmatikov. 2008. “Robust de-anonymization of large sparse datasets.” IEEE Symposium on Security and Privacy (SP 2008). IEEE.
• Novak, Jasmine, Prabhakar Raghavan, and Andrew Tomkins. 2004. “Anti-aliasing on the web.” Proceedings of the 13th International Conference on World Wide Web. ACM.
• Srivatsa, M., and M. Hicks. 2012. “Deanonymizing mobility traces: Using social network as a side-channel.” Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS ’12). ACM, New York, NY, USA, 628–637. DOI: 10.1145/2382196.2382262
• Zan, Bin, Zhanbo Sun, Marco Gruteser, and Xuegang Ban. 2013. “Linking anonymous location traces through driving characteristics.” Proceedings of the Third ACM Conference on Data and Application Security and Privacy (CODASPY ’13). ACM, New York, NY, USA, 293–300. DOI: 10.1145/2435349.2435391
• Zimmerman, D. L., and C. Pavlik. 2008. “Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data.” Geographical Analysis 40(1): 52–76.
Discussion
Personal Web: micahaltman.com
Privacy Tools for Sharing Research Data: privacytools.seas.harvard.edu/
E-mail: micah_altman@alumni.brown.edu
Twitter: @drmaltman