1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation...

32
1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

Transcript of 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation...

Page 1: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

1

Privacy Preserving Data Mining

Haiqin Yang

Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

Page 2: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

2

Outline

Motivation Privacy Secure computation and privacy Privacy preserving SVM Related to 3D-LBS Challenges

Page 3: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

3

Motivation Huge databases exist in various applications

Medical data Consumer purchase data Census data Communication and media-related data Data gathered by government agencies

Can these data be utilized? For medical research For improving customer service For homeland security

Page 4: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

4

Motivation Data sharing is necessary for full utilization

Pooling medical data can improve the quality of medical research

Pooling of information from different government agencies can provide a wider picture What is the health status of citizens that are

supported by social welfare? Are there citizens that receive simultaneous

support from different agencies? Data gathered by the government (e.g.,

census data) should be publicly available

Page 5: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

5

Motivation

The huge amount of data available means that it is possible to learn a lot of information about individuals from public data Purchasing patterns Family history Medical data …

Page 6: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

6

Privacy Human definition:

Privacy and autonomy: information that is personal, confidential or private should not be unnecessarily distributed or publicly known

Privacy and control: Personal or private information should not be misused (whatever that means)

Difficulties in mathematically formulating The same information is classified differently by

different people Legitimate use is interpreted differently by different

people

Page 7: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

7

Secure computation and privacy Secure computation

Assume that there is a function that all parties wish to compute

Secure computation shows how to compute that function in the safest way possible

In particular, it guarantees minimal information leakage (the output only)

Privacy Does the function output itself reveal “sensitive

information”, or Should the parties agree to compute this

function?

Page 8: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

8

This talk

Secure multiparty computation Trace back to “two millionaires

problem” (A. Yao 82), or earlier Privacy preserving data mining

Privacy preserving SVM (Vaidya et al. KIS 2007)

Page 9: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

9

Secure multiparty computation

A set of parties with private inputs Objective: jointly compute a function of the

inputs so that certain security properties (like privacy and correctness) are preserved Applications: secure elections, auctions…

Properties must be ensured even if some of the parties maliciously attack the protocol

Page 10: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

10

Secure computation tasks

Examples: Authentication protocols Online payments Auctions Elections Privacy preserving data mining Essentially any task…

Page 11: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

11

The real model

x

Protocol output

y

Protocol output

Page 12: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

12

The ideal model

x

f1(x,y)

y

f2(x,y)

x

f1(x,y)

y

f2(x,y)

Page 13: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

13

IDEALREAL

Trusted party

Protocolinteraction

The security definition

For every real adversary A

there exists anadversary S

Page 14: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

14

Why this approach?

General – it captures all applications The specifics of an application are defined

by its functionality, security is defined as above

The security guarantees achieved are easily understood (because the ideal model is easily understood)

We can be confident that we did not “miss” any security requirements

Page 15: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

15

The ideal model – More details The definition we gave suffices in the

case of an honest majority When there is no honest majority

Guaranteed output delivery cannot be achieved Fairness cannot be achieved

Changes to ideal model: Corrupted parties receive output first Adversary decides if honest parties receive their

outputs as well This is called security with abort

Page 16: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

16

Defects to the ideal model When no honest majority, fairness and guaranteed

output delivery cannot be achieved This “defect” is included into the ideal model

This approach can be used to models of partial information leakage: The parties wish to compute a function f, but more

information is leaked by the protocol This can be modeled by having the trusted party

explicitly leak this information Helps for efficiency considerations

Advantage: explicit defect!

Page 17: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

17

Privacy preserving data mining

Setting Data is distributed at different sites These sites may be third parties (e.g.,

hospitals, government bodies) or individuals Aim

Compute the data mining algorithm on the data so that nothing but the output is learned

That is, carry out a secure computation

Page 18: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

18

Privacy Security

Secure computation only deals with the process of computing the function It does not ask whether or not the

function should be computed

Page 19: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

19

Privacy and secure computation

Secure computation can be used to solve any distributed data-mining problem

A two-stage process Decide that the function/algorithm should be

computed – an issue of privacy Apply secure computation techniques to

compute it securely – security But, not every privacy problem can be cast as a

distributed computation

Page 20: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

20

Privacy preserving SVM classification (Vaidya et al. KIS 2007)

Page 21: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

21

SVM introduction

Data Objective

x1

x 2

SVM Illustration

Problem: How to pass the kernel matrix among different parties?

Page 22: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

22

Linear kernel

Polynomial kernel

RBF kernel

Case 1: On vertically partitioned data

d

D1 D2N

Bank, health insurance company and auto insurance company collect different information about the same people

Page 23: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

23

Secure merge Assumption: K parties, Pi holds vi

Procedure P0 chooses a random number R from a

uniform distribution over F P0 sends (R+v0) mod |F | to P1

Pi receives Pi sends to Pi+1

Page 24: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

24

Case 2: On horizontally partitioned data

Different banks collect data for their customers

d

D1

D2N

Page 25: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

25

Case 3: On arbitrarily partitioned data

Page 26: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

26

How to do? Homomorphic encryption

where * is either addition or multiplication (in some abelian group)

Existing cryptosystems Goldwasser–Micali (Blum M, Goldwasser S 1984) Benaloh cryptosystem (Benaloh JC 1986) Naccache–Stern cryptosystem (Naccache D, Stern J

1998) Paillier cryptosystem (Paillier P 1999) Okamoto–Uchiyama cryptosystem (Okamoto T, Uchiyama

S 1998)

Page 27: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

27

Algorithm

Page 28: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

28

Other security issues

Collusion with the third party Q

Page 29: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

29

Related to 3D-LBS Issues

Some data may need be encrypted before research test

May need to work data miners on encrypted data

Problems Lack knowledge of security and

cryptography Limited understanding of privacy

Page 30: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

30

Future challenges Understand the real problem and what the real

requirement is A very non-trivial task and one that requires

interdisciplinary cooperation Computer scientists should help to formalize the

notion, lawyers, policy-makers, social scientists should be involved in understanding the concerns

Some challenges Reconciling cultural and legal differences relating to

privacy in different countries Understanding when privacy is “allowed” to be

breached

Page 31: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

31

Future challenges

Appropriate modeling for secure computation

Efficient protocols…

Page 32: 1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

32

Conclusion

Privacy-preserving data mining is truly needed Data mining is being used: by security

agencies, governmental bodies and corporations

Privacy advocates and citizen outcry often prevents positive use of data mining

Good solutions (which are still currently out of reach) may be able to resolve the conflict