Transcript of Privacy Research Overview, 18739A: Foundations of Security and Privacy, Anupam Datta, Fall 2007-08

Page 1:

Privacy Research Overview

18739A: Foundations of Security and Privacy

Anupam Datta

Fall 2007-08

Page 2:

Privacy Research Space

What is Privacy? [Philosophy, Law, Public Policy] (TODAY)

Formal Model, Policy Language, Compliance-check Algorithms [Programming Languages, Logic] (next 3 lectures)

Implementation-level Compliance [Software Engineering, Formal Methods]

Data Privacy [Databases, Cryptography] (TODAY)

Page 3:

Philosophical studies on privacy

Reading
• Overview article in the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/privacy/
• Alan Westin, Privacy and Freedom, 1967
• Ruth Gavison, Privacy and the Limits of Law, 1980
• Helen Nissenbaum, Privacy as Contextual Integrity, 2004 (more on Nov 8)

Page 4:

Westin 1967

Privacy and control over information

“Privacy is the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others”

Relevant when you give personal information to a web site and agree to the privacy policy posted there

May not apply to your personal health information

Page 5:

Gavison 1980

Privacy as limited access to self

“A loss of privacy occurs as others obtain information about an individual, pay attention to him, or gain access to him. These three elements of secrecy, anonymity, and solitude are distinct and independent, but interrelated, and the complex concept of privacy is richer than any definition centered around only one of them.”

Basis for the (c,t)-isolation definition of database privacy discussed later

Page 6:

Gavison 1980

On utility

“We start from the obvious fact that both perfect privacy and total loss of privacy are undesirable. Individuals must be in some intermediate state – a balance between privacy and interaction …Privacy thus cannot be said to be a value in the sense that the more people have of it, the better.”

This balance between privacy and utility will show up in data privacy as well as in privacy policy languages; e.g., health data could be shared with medical researchers.

Page 7:

Privacy Laws in the US

HIPAA (Health Insurance Portability and Accountability Act, 1996)
• Protects personal health information

GLBA (Gramm-Leach-Bliley Act, 1999)
• Protects personal information held by financial service institutions

COPPA (Children's Online Privacy Protection Act, 1998)
• Protects information collected online from children under 13

More details in lecture on Nov 8.

Page 8:

Data Privacy

Releasing sanitized databases
• k-anonymity
• (c,t)-isolation
• Differential privacy

Privacy preserving data mining

Page 9:

Sanitization of Databases

Real Database (RDB) → Sanitized Database (SDB): add noise, delete names, etc.

• Examples: health records, census data
• Goals: protect privacy, provide useful information (utility)
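As a toy illustration of this pipeline (a sketch, not from the slides; the field names are invented), a sanitizer might drop direct identifiers and perturb numeric fields:

    import random

    def sanitize(records, identifier_fields=("name", "ssn"), noise_scale=1.0):
        """Toy sanitizer: delete direct identifiers, add noise to numeric fields."""
        sdb = []
        for record in records:
            clean = {}
            for field, value in record.items():
                if field in identifier_fields:
                    continue  # delete names, SSNs, etc.
                elif isinstance(value, (int, float)):
                    clean[field] = value + random.gauss(0, noise_scale)  # add noise
                else:
                    clean[field] = value
            sdb.append(clean)
        return sdb

    rdb = [{"name": "Alice", "zip": "15213", "weight": 135.0}]
    print(sanitize(rdb))  # [{'zip': '15213', 'weight': 135.0 + noise}]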

Page 10:

Re-identification by linking

• Linking two sets of data on shared attributes may uniquely identify some individuals.

• Example [Sweeney]: de-identified medical data was released; Sweeney purchased the Voter Registration List of Massachusetts and, by linking the two on shared attributes, re-identified the Governor's medical record.

• 87% of the US population is uniquely identifiable by 5-digit ZIP, sex, and date of birth.
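A minimal sketch of that linking step (hypothetical field names and data, not code from the lecture): join the de-identified table to the voter list on the shared quasi-identifier and keep unique matches:

    def link(medical, voters, keys=("zip", "sex", "dob")):
        """Attach voter names to medical records whose quasi-identifier
        matches exactly one voter registration record."""
        reidentified = []
        for m in medical:
            qid = tuple(m[k] for k in keys)
            matches = [v for v in voters if tuple(v[k] for k in keys) == qid]
            if len(matches) == 1:  # unique match: identity links to diagnosis
                reidentified.append((matches[0]["name"], m["diagnosis"]))
        return reidentified

    medical = [{"zip": "02138", "sex": "M", "dob": "1945-07-31", "diagnosis": "flu"}]
    voters = [{"zip": "02138", "sex": "M", "dob": "1945-07-31", "name": "J. Doe"}]
    print(link(medical, voters))  # [('J. Doe', 'flu')]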

Page 11:

K-anonymity (1)

Quasi-identifier: Set of attributes (e.g. ZIP, sex, dob) that can be linked with external data to uniquely identify individuals in the population

Make every record in the table indistinguishable from at least k-1 other records with respect to the quasi-identifiers. Linking on quasi-identifiers then yields at least k records for each possible value of the quasi-identifier.
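A minimal sketch of checking this property (not from the slides): group records by their quasi-identifier values and require every group to contain at least k records:

    from collections import Counter

    def is_k_anonymous(table, quasi_ids, k):
        """True iff every combination of quasi-identifier values occurs >= k times."""
        counts = Counter(tuple(row[a] for a in quasi_ids) for row in table)
        return all(count >= k for count in counts.values())

    # Generalized records: ZIP and date of birth partially suppressed.
    table = [
        {"zip": "152**", "sex": "F", "dob": "198*", "disease": "flu"},
        {"zip": "152**", "sex": "F", "dob": "198*", "disease": "cold"},
    ]
    print(is_k_anonymous(table, ("zip", "sex", "dob"), k=2))  # True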

Page 12:

K-anonymity and beyond

• Provides some protection: linking on ZIP, age, and nationality yields 4 records
• Limitations: lack of diversity in sensitive attributes, background knowledge, subsequent releases of the same data set
• Utility: less suppression implies better utility
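The diversity limitation is easy to see concretely: a table can be k-anonymous while every record in some quasi-identifier group shares the same sensitive value, so anyone linked to that group learns it anyway. A small sketch (invented field names):

    from collections import defaultdict

    def homogeneous_groups(table, quasi_ids, sensitive):
        """Return quasi-identifier groups whose sensitive attribute is constant."""
        groups = defaultdict(set)
        for row in table:
            groups[tuple(row[a] for a in quasi_ids)].add(row[sensitive])
        return [qid for qid, values in groups.items() if len(values) == 1]

    # 2-anonymous, yet anyone linked to this group learns the disease.
    table = [
        {"zip": "152**", "sex": "F", "disease": "flu"},
        {"zip": "152**", "sex": "F", "disease": "flu"},
    ]
    print(homogeneous_groups(table, ("zip", "sex"), "disease"))  # [('152**', 'F')]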

Page 13:

(c,t)-isolation (2)

Mathematical definition motivated by Gavison’s idea that privacy is protected to the extent that an individual blends into a crowd.

Image courtesy of WaldoWiki: http://images.wikia.com/waldo/images/a/ae/LandofWaldos.jpg

Page 14:

Definition of (c,t)-isolation

A database is represented by n points in high-dimensional space (one dimension per column).

Let y be any RDB point, and let δy = ‖q − y‖2. We say that q (c,t)-isolates y iff B(q, cδy) contains fewer than t points in the RDB, that is, |B(q, cδy) ∩ RDB| < t.

[Figure: a point q at distance δy from y; the ball of radius cδy around q contains the RDB points x1, x2, …, xt-2.]
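A direct transcription of the definition into code (a sketch; it assumes each record is a numeric vector, one coordinate per column):

    import math

    def isolates(q, y, rdb, c, t):
        """True iff q (c,t)-isolates y: the ball B(q, c*delta_y) contains
        fewer than t RDB points, where delta_y = ||q - y||_2."""
        delta_y = math.dist(q, y)
        inside = sum(1 for x in rdb if math.dist(q, x) <= c * delta_y)
        return inside < t

    rdb = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
    # Only 2 RDB points fall in the ball, so q (2,3)-isolates y.
    print(isolates(q=(0.0, 0.1), y=(0.0, 0.0), rdb=rdb, c=2.0, t=3))  # True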

Page 15:

Definition of (c,t)-isolation (contd)

Page 16:

Differential Privacy: Motivation (3)

Guaranteeing that a sanitized database reveals no private information is too hard:
• Auxiliary info: Terry is an inch taller than average
• Sanitized database: the average height is 6 feet
• The sanitized database only provided non-private data, yet combined with the auxiliary info it pins down Terry's height, so private info was learned

All surveyors really need is for people to be comfortable supplying their private data. People will be comfortable if providing their data does not change the sanitized database enough to be noticed.

Page 17:

Differential Privacy: Formalization

Want a sanitization function K that maps two databases D1 and D2 that differ by one person to about the same sanitized databases K(D1) and K(D2), making any disclosure S about as likely under K(D1) as under K(D2).

A randomized function K gives ε-differential privacy if, for all data sets D1 and D2 differing in at most one element and all subsets S of Range(K):

Pr[K(D1) ∈ S] ≤ exp(ε) × Pr[K(D2) ∈ S]
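A standard way to meet this definition for counting queries (not shown on the slide, but standard in the differential privacy literature) is the Laplace mechanism: a count changes by at most 1 when one person is added or removed, so adding Laplace noise of scale 1/ε gives ε-differential privacy. A minimal sketch:

    import math
    import random

    def dp_count(records, predicate, epsilon):
        """epsilon-differentially private count: true count plus Laplace(1/epsilon)
        noise, sampled by inverse transform from a uniform u in [-0.5, 0.5)."""
        true_count = sum(1 for r in records if predicate(r))
        u = random.random() - 0.5
        noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return true_count + noise

    heights = [{"height": 73}, {"height": 70}, {"height": 68}]
    print(dp_count(heights, lambda r: r["height"] > 69, epsilon=0.5))  # about 2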

Page 18:

Privacy Preserving Data Mining

Reference
• Y. Lindell and B. Pinkas. Privacy Preserving Data Mining. Journal of Cryptology, 15(3):177-206, 2002.

Problem
• Compute some function of two confidential databases without revealing unnecessary information
• Example: intersecting a government database of suspected terrorists with an airline passenger database

Approach
• Cryptographic techniques for secure multiparty computation

Page 19:

The Security Definition (Slide: Lindell)

REAL: the parties compute the function by running the protocol interaction, in the presence of a real adversary A.

IDEAL: the parties hand their inputs to a trusted party, which computes the function for them. Security requires that for every real adversary A there exists an ideal-world adversary S.

Computational indistinguishability: every probabilistic polynomial-time observer that receives the input/output distribution of the honest parties and the adversary outputs 1 with negligibly close probability whether that distribution was generated in IDEAL or in REAL.
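To make the IDEAL side concrete for the terrorist-list example on the previous slide, here is a sketch (an illustration, not the Lindell-Pinkas construction) of the trusted party's role: it sees both private inputs but hands each party only the agreed-upon output, the intersection. A secure protocol must emulate exactly this behavior without any trusted party:

    def trusted_party_intersection(set_a, set_b):
        """Ideal functionality: compute the intersection and release only it.
        Neither party learns anything else about the other's input."""
        result = set(set_a) & set(set_b)
        return result, result  # the same output is delivered to both parties

    suspects = {"id-17", "id-42"}    # government's private input
    passengers = {"id-42", "id-99"}  # airline's private input
    out_gov, out_air = trusted_party_intersection(suspects, passengers)
    print(out_gov)  # {'id-42'}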