Transcript of “Security and Privacy in Machine Learning”

Page 1

Security and Privacy in Machine Learning
Nicolas Papernot
Work done at the Pennsylvania State University and Google Brain

July 2018 - MSR AI summer school

@NicolasPapernot

www.papernot.fr

Page 2

2

Martín Abadi (Google Brain), Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Brian Cheung (UC Berkeley), Yan Duan (OpenAI), Gamaleldin F. Elsayed (Google Brain), Úlfar Erlingsson (Google Brain), Matt Fredrikson (CMU), Ian Goodfellow (Google Brain), Kathrin Grosse (CISPA), Sandy Huang (UC Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google Brain), Praveen Manoharan (CISPA), Patrick McDaniel (Penn State), Ilya Mironov (Google Brain), Ananth Raghunathan (Google Brain), Arunesh Sinha (U of Michigan), Shreya Shankar (Stanford), Jascha Sohl-Dickstein (Google Brain), Shuang Song (UCSD), Ananthram Swami (US ARL), Kunal Talwar (Google Brain), Florian Tramèr (Stanford), Michael Wellman (U of Michigan), Xi Wu (Google)

Page 3

Machine learning brings social disruption at scale

3

Healthcare (source: Peng and Gulshan, 2017)

Education (source: Gradescope)

Transportation (source: Google)

Energy (source: DeepMind)

Page 4

Machine learning is not magic (training time)

4

Training data

Page 5

Machine learning is not magic (inference time)

5

[Figure: the trained model maps a new, unseen input to a predicted class]

Page 6

Machine learning is deployed in adversarial settings

6

YouTube filtering: content evades detection at inference.

Tay chatbot: training data poisoning.

Page 7

Machine learning does not always generalize well (1/2)

7

[Figure: training data and test data, each containing cat and dog images]

Page 8

What if the adversary systematically poisoned the data?

8
(Understanding Black-box Predictions via Influence Functions, Koh and Liang)
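Koh and Liang use influence functions to find the most damaging training points to perturb; as a much blunter illustration of the same threat surface (not their method), the sketch below flips a fraction of training labels and measures the test-accuracy drop of a logistic regression model. The synthetic data, the model choice, and the flip fractions are all assumptions for illustration.

```python
# Minimal sketch of training-data poisoning by label flipping (illustration only;
# Koh & Liang's influence-function attack is far more targeted than this).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic two-class data (assumed for illustration).
X = rng.normal(size=(2000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)
X_train, y_train = X[:1500], y[:1500].copy()
X_test, y_test = X[1500:], y[1500:]

def accuracy_after_poisoning(flip_fraction):
    y_poisoned = y_train.copy()
    n_flip = int(flip_fraction * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]          # flip the labels of the poisoned points
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
    return clf.score(X_test, y_test)

for frac in (0.0, 0.1, 0.3):
    print(f"poisoned fraction {frac:.1f}: test accuracy {accuracy_after_poisoning(frac):.3f}")
```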

Page 9

9

What if the adversary systematically evaded at inference time?

Biggio et al., Szegedy et al., Goodfellow et al., Papernot et al., ...

(Goodfellow et al., 2014)
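The perturbation on this slide is the fast gradient sign method (FGSM) of Goodfellow et al. (2014): add epsilon times the sign of the gradient of the loss with respect to the input. A minimal sketch on a linear softmax model follows; the slide's example uses a deep ImageNet classifier, so the weights, epsilon, and input here are stand-ins.

```python
# Minimal sketch of the fast gradient sign method (FGSM, Goodfellow et al., 2014)
# on a linear softmax classifier; the model weights here are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim = 10, 784
W = rng.normal(scale=0.01, size=(dim, num_classes))  # assumed "pre-trained" weights
b = np.zeros(num_classes)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_gradient_wrt_input(x, label):
    """Gradient of the cross-entropy loss w.r.t. the input for a linear softmax model."""
    p = softmax(W.T @ x + b)
    one_hot = np.eye(num_classes)[label]
    return W @ (p - one_hot)          # dL/dx = W (p - y)

x = rng.uniform(0.0, 1.0, size=dim)   # a clean input (assumed in [0, 1])
label = 3                             # its true label
eps = 0.1

# FGSM: move each input feature by eps in the direction that increases the loss.
x_adv = np.clip(x + eps * np.sign(loss_gradient_wrt_input(x, label)), 0.0, 1.0)

print("clean prediction:      ", softmax(W.T @ x + b).argmax())
print("adversarial prediction:", softmax(W.T @ x_adv + b).argmax())
```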

Page 10

Machine learning does not always generalize well (2/2)

10

[Figure: training data and test data, each containing cat and dog images]

Membership inference attack (Shokri et al.)
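Shokri et al. mount membership inference with shadow models; a much simpler baseline that illustrates why poor generalization leaks membership is to guess "member" whenever the model's loss on a record falls below a threshold. The sketch below uses that loss-threshold variant on synthetic data; the data, model, and threshold are assumptions, not the paper's attack.

```python
# Simplified membership-inference baseline: predict "member" when the model's loss
# on a record is below a threshold. (Shokri et al. train shadow models; this
# loss-threshold variant only illustrates why overfitting leaks membership.)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50))
y = (rng.normal(size=50) @ X.T + rng.normal(scale=2.0, size=400) > 0).astype(int)
X_in, y_in = X[:200], y[:200]     # training members
X_out, y_out = X[200:], y[200:]   # non-members

model = LogisticRegression(max_iter=1000).fit(X_in, y_in)

def per_example_loss(model, X, y):
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, 1.0))

threshold = 0.5                                   # assumed; in practice calibrated
guess_in = per_example_loss(model, X_in, y_in) < threshold
guess_out = per_example_loss(model, X_out, y_out) < threshold
accuracy = 0.5 * (guess_in.mean() + (1 - guess_out.mean()))
print(f"membership-inference accuracy: {accuracy:.2f}  (0.5 = chance)")
```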

Page 11

What is the threat model?

11

The pipeline connects data owners, the model host, and consumers, across the training and inference phases.

Adversarial goal and attack / defense example:
- Data integrity: data poisoning (Koh and Liang, 2017)
- Model integrity: backdoor (Gu et al., 2017); adversarial examples (Szegedy et al., 2013)
- Data confidentiality: membership inference (Shokri et al., 2017)
- Data privacy: CryptoNets (Dowlin et al., 2016)
- Model confidentiality: model extraction (Tramer et al., 2016)
- Data confidentiality / data privacy: RAPPOR (Erlingsson, 2014); federated learning (McMahan, 2017)

Towards the Science of Security and Privacy in Machine Learning [IEEE EuroS&P 2018]
Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman

Page 12

Attacking Machine Learning Integrity with Adversarial Examples

12

Page 13

Attacker may see the model: the attacker needs to know the details of the machine learning model to mount the attack (a white-box attacker).

Attacker may not see the model: the attacker knows very little, e.g. it only gets to ask a few queries (a black-box attacker).

The threat model

13


Page 14

Jacobian-based Saliency Map Approach (JSMA)

14
The Limitations of Deep Learning in Adversarial Settings [IEEE EuroS&P 2016]
Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami
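JSMA ranks input features by a saliency map derived from the Jacobian of the model's outputs with respect to its inputs, then greedily perturbs the most salient features toward a target class. The sketch below is a single-feature greedy variant on a linear softmax model, whose Jacobian is just the weight matrix; the paper's attack operates on deep networks and perturbs feature pairs, and the model, theta, and iteration budget here are assumptions.

```python
# Minimal sketch of a Jacobian-based saliency map attack (JSMA-style) on a linear
# softmax model, where the Jacobian of the logits w.r.t. the input is simply W.T.
# (Illustration only; the paper's attack targets deep networks and feature pairs.)
import numpy as np

rng = np.random.default_rng(1)
num_classes, dim = 10, 784
W = rng.normal(scale=0.05, size=(dim, num_classes))  # assumed model weights
b = np.zeros(num_classes)

def predict(x):
    return int(np.argmax(W.T @ x + b))

def jsma_like(x, target, theta=1.0, max_changes=100):
    """Greedily increase the features whose saliency favours the target class."""
    x_adv = x.copy()
    for _ in range(max_changes):
        if predict(x_adv) == target:
            break
        jac = W.T                                     # d(logits)/d(input), shape (classes, dim)
        grad_target = jac[target]                     # effect of each feature on the target logit
        grad_others = jac.sum(axis=0) - grad_target   # effect on all other logits
        # Saliency: features that raise the target logit while lowering the others.
        saliency = np.where((grad_target > 0) & (grad_others < 0),
                            grad_target * np.abs(grad_others), 0.0)
        saliency[x_adv >= 1.0] = 0.0                  # do not exceed the feature range
        i = int(np.argmax(saliency))
        if saliency[i] == 0.0:
            break                                     # no useful feature left
        x_adv[i] = min(1.0, x_adv[i] + theta)
    return x_adv

x = rng.uniform(0.0, 1.0, size=dim)
target = (predict(x) + 1) % num_classes
x_adv = jsma_like(x, target)
print("original class:", predict(x), "-> adversarial class:", predict(x_adv),
      "(target", target, "), features changed:", int(np.sum(x_adv != x)))
```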

Page 15

15

Page 16

Adversarial examples...

16

… beyond deep learning: Logistic Regression, Support Vector Machines, Nearest Neighbors, Decision Trees
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples [arXiv preprint]
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow

… beyond computer vision: for a malware classifier, P[X=Malware] = 0.90 and P[X=Benign] = 0.10 on the original input X, but P[X*=Malware] = 0.10 and P[X*=Benign] = 0.90 on the adversarial input X*.
Adversarial Perturbations Against Deep Neural Networks for Malware Classification [ESORICS 2017]
Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, Patrick McDaniel
Adversarial Attacks on Neural Network Policies [arXiv preprint]
Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel

Useful to think about definitions and threat model

Page 17

Attacker may see the model: the attacker needs to know the details of the machine learning model to mount the attack (a white-box attacker).

Attacker may not see the model: the attacker knows very little, e.g. it only gets to ask a few queries (a black-box attacker).

The threat model

17


Page 18

Attacking remotely hosted black-box models

18

[Figure: the adversary sends inputs to the remote ML system, which returns labels such as “STOP sign” and “no truck sign”]

(1) The adversary queries the remote ML system for labels on inputs of its choice.

Practical Black-Box Attacks against Machine Learning [AsiaCCS 2017]
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami

Page 19

19

[Figure: the labels returned by the remote ML system (e.g. “STOP sign”, “no truck sign”) are used to train a local substitute model]

(2) The adversary uses this labeled data to train a local substitute for the remote system.

Attacking remotely hosted black-box models

Page 20

20

[Figure: the local substitute is used to select new synthetic inputs, which are sent as queries to the remote ML system]

(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the local substitute’s output surface sensitivity to input variations.

Attacking remotely hosted black-box models

Page 21

21

[Figure: an adversarial example crafted on the local substitute is misclassified by the remote ML system, e.g. as a “yield sign”]

(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability.

Attacking remotely hosted black-box models
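Putting steps (1) through (4) together, the sketch below queries an "oracle" for labels, trains a local substitute, grows the substitute's training set along directions the substitute is sensitive to (a simplified stand-in for the paper's Jacobian-based dataset augmentation), and finally crafts adversarial examples on the substitute that transfer to the oracle. The oracle, data, epsilon, and number of augmentation rounds are assumptions for illustration.

```python
# Sketch of the black-box substitute attack: query the remote oracle for labels,
# train a local substitute, augment the data along the substitute's sensitive
# directions (simplified stand-in for Jacobian-based augmentation), then craft
# adversarial examples on the substitute and transfer them to the oracle.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 30

# --- "Remote" oracle: the attacker can only observe its labels (assumed setup).
secret_X = rng.normal(size=(5000, dim))
secret_y = (secret_X @ rng.normal(size=dim) > 0).astype(int)
oracle = LogisticRegression(max_iter=1000).fit(secret_X, secret_y)

def query_oracle(X):
    return oracle.predict(X)          # step (1): label queries only

# --- Substitute training with dataset augmentation (steps (2) and (3)).
X_sub = rng.normal(size=(50, dim))    # small initial synthetic set
lam = 0.1
substitute = None
for _ in range(5):
    y_sub = query_oracle(X_sub)
    substitute = LogisticRegression(max_iter=1000).fit(X_sub, y_sub)
    direction = np.sign(substitute.coef_[0])          # sensitivity of the substitute
    X_sub = np.vstack([X_sub, X_sub + lam * direction])

# --- Step (4): FGSM against the substitute, evaluated on the oracle (transferability).
X_clean = rng.normal(size=(200, dim))
eps = 0.5
grad_sign = np.sign(substitute.coef_[0])              # increases P(class 1) on the substitute
labels = query_oracle(X_clean)
# Push class-1 inputs down and class-0 inputs up the substitute's decision function.
X_adv = X_clean + eps * np.where(labels[:, None] == 1, -grad_sign, grad_sign)

transfer_rate = np.mean(query_oracle(X_adv) != labels)
print(f"oracle predictions flipped by substitute-crafted examples: {transfer_rate:.2%}")
```

Only the oracle's output labels are used, so the sketch stays within the black-box threat model described on the previous slides.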

Page 22

Cross-technique transferability

22
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples [arXiv preprint]
Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow


Page 23

Properly blinded attacks on real-world remote systems

23

All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples)

ML technique of the remote platform | Number of queries | Adversarial examples misclassified (after querying)
Deep Learning | 6,400 | 84.24%
Logistic Regression | 800 | 96.19%
Unknown | 2,000 | 97.72%

Page 24

24

Page 25

Defending against adversarial examples

25

Page 26

Learning models robust to adversarial examples is hard

26
Is attacking machine learning easier than defending it? [Blog post at www.cleverhans.io]
Ian Goodfellow and Nicolas Papernot

Error spaces containing adversarial examples are large

Learning or detecting adversarial examples creates an arms race


Page 27

What makes a successful deep neural network?

27

[Figure: a deep neural network shown by layer name (inputs, 1st hidden, 2nd hidden, 3rd hidden, softmax), its neural architecture, and the representation space at each layer; a panda image is classified correctly]

Page 28

What makes a successful adversarial example?

28

[Figure: the same layers (inputs, 1st/2nd/3rd hidden, softmax) with their representation spaces and nearest neighbors; an adversarial input (panda + perturbation) is classified as “school bus”]

Page 29

Nearest neighbors indicate support from training data...

29

[Figure: per-layer representation spaces and nearest training neighbors for the clean “panda” input and the adversarial “school bus” input]

Page 30

… Deep k-Nearest Neighbors (DkNN) classifier

30
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning [arXiv preprint]
Nicolas Papernot, Patrick McDaniel

1. Searches for nearest neighbors in the training data at each layer.

2. Estimates the nonconformity of input x for each possible label y.

3. Applies conformal prediction to compute:
   a. Confidence: “How likely is the prediction given the training data?”
   b. Credibility: “How relevant is the training data to the prediction?”

[Figure: panda + perturbation = “school bus”]
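As a rough sketch of the DkNN recipe under strong simplifications (one representation layer instead of every layer, plain Euclidean neighbors, toy Gaussian data), the code below computes per-class nonconformity scores, calibrates them on held-out data, and reports conformal confidence and credibility; all data and hyperparameters are assumptions.

```python
# Rough sketch of Deep k-Nearest Neighbors (DkNN) credibility with a single
# representation layer, Euclidean neighbors, and toy data (a simplification).
import numpy as np

rng = np.random.default_rng(0)
num_classes, k = 3, 10

def features(X, W):
    """Stand-in for a hidden-layer representation (ReLU of a random projection)."""
    return np.maximum(X @ W, 0.0)

# Toy training, calibration, and test data (assumed).
W = rng.normal(size=(20, 32))
X_train = rng.normal(size=(600, 20)) + np.repeat(np.eye(3, 20) * 4, 200, axis=0)
y_train = np.repeat(np.arange(3), 200)
X_cal = rng.normal(size=(90, 20)) + np.repeat(np.eye(3, 20) * 4, 30, axis=0)
y_cal = np.repeat(np.arange(3), 30)

H_train = features(X_train, W)

def nonconformity(x, label):
    """Number of the k nearest training representations whose label disagrees."""
    h = features(x[None, :], W)[0]
    nearest = np.argsort(np.linalg.norm(H_train - h, axis=1))[:k]
    return int(np.sum(y_train[nearest] != label))

# Calibration nonconformity scores for the true labels, as in conformal prediction.
cal_scores = np.array([nonconformity(x, y) for x, y in zip(X_cal, y_cal)])

def dknn_predict(x):
    scores = np.array([nonconformity(x, c) for c in range(num_classes)])
    # Empirical p-value per class: fraction of calibration scores >= this score.
    p_values = np.array([(cal_scores >= s).mean() for s in scores])
    pred = int(np.argmax(p_values))
    credibility = p_values[pred]                       # support from the training data
    confidence = 1.0 - np.sort(p_values)[-2]           # 1 minus the second-highest p-value
    return pred, confidence, credibility

x_in = rng.normal(size=20) + np.eye(3, 20)[1] * 4      # in-distribution test point
x_out = rng.normal(size=20) * 8                        # out-of-distribution test point
print("in-distribution :", dknn_predict(x_in))
print("out-of-distrib. :", dknn_predict(x_out))
```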

Page 31

Example applications of DkNN credibility

31

[Figure: DkNN credibility separates clean inputs from out-of-distribution inputs, mislabeled inputs, and adversarial examples (Basic Iterative Method and Carlini & Wagner attacks)]

Page 32

Implications for the attacker and defender

Attacker

32

Defender

Reject low-credibility predictions: an explicit tradeoff between clean accuracy and adversarial accuracy.

Active learning: more training data through human labeling of rejected predictions

Contributes to breaking “black-box” myth

Page 33

Some (surprising) connections to fairness & interpretability

33

Adversarial Examples that Fool both Human and Computer Vision [arXiv preprint]
Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

Page 34

Machine Learning with Privacy

34

Page 35

Types of adversaries and our threat model

35

In our work, the threat model assumes:
- the adversary can make a potentially unbounded number of queries
- the adversary has access to model internals

Model inspection (white-box adversary): Zhang et al. (2017), Understanding deep learning requires rethinking generalization
Model querying (black-box adversary): Shokri et al. (2016), Membership Inference Attacks against ML Models; Fredrikson et al. (2015), Model Inversion Attacks

Page 36

A definition of privacy: differential privacy

36

[Figure: two neighbouring datasets, differing in a single record, are fed to the same randomized algorithm; the sets of answers it returns should be nearly indistinguishable, so an observer cannot tell which dataset was used]

Page 37

Private Aggregation of Teacher Ensembles (PATE)

37

[Figure: the sensitive data is split into n disjoint partitions, and a separate teacher model is trained on each partition]

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data [ICLR 2017 best paper]
Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar

Page 38

Aggregation

38

Count votes → take maximum

Page 39

Intuitive privacy analysis

39

If most teachers agree on the label, it does not depend on specific partitions, so the privacy cost is small.

If two classes have close vote counts, the disagreement may reveal private information.

Page 40

Noisy aggregation

40

Count votes → add Laplacian noise → take maximum
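A minimal sketch of this noisy aggregation step: count the teachers' votes per class, add Laplacian noise of scale 1/gamma to each count, and return the arg max. The number of teachers, the vote pattern, and gamma below are assumptions.

```python
# Minimal sketch of PATE's noisy aggregation: count the teachers' votes for each
# class, add Laplacian noise to every count, and output the noisy arg max.
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10

def noisy_aggregate(teacher_votes, gamma=0.05):
    """teacher_votes: array of per-teacher predicted labels for one query."""
    counts = np.bincount(teacher_votes, minlength=num_classes)
    noisy_counts = counts + rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_counts))

# Example: 250 teachers, most of which agree on class 7 (assumed votes).
votes = np.concatenate([np.full(230, 7), rng.integers(0, num_classes, size=20)])
print("aggregated label:", noisy_aggregate(votes))
```

When the votes are split almost evenly across classes, the same noise is far more likely to change the winner, which is the intuition behind the privacy analysis on the previous slide.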

Page 41

Teacher ensemble

41

[Figure: the n teachers, each trained on its own partition of the sensitive data, are combined into an aggregated teacher]

Page 42

Student training

42

[Figure: the teachers and the aggregated teacher, trained on partitions of the sensitive data, are not available to the adversary; the student is trained on public data labeled by querying the aggregated teacher, and only the student answers inference queries available to the adversary]

Page 43

Why train an additional “student” model?

43

The aggregated teacher violates our threat model:

1. Each prediction increases the total privacy loss. Privacy budgets create a tension between the accuracy and the number of predictions.

2. Inspection of internals may reveal private data. Privacy guarantees should hold in the face of white-box adversaries.

Page 44

Student training

44

[Figure: the same PATE pipeline, repeated: the sensitive data, partitions, teachers, and aggregated teacher are not available to the adversary, while the student trained on labeled public data is]

Page 45

Deployment

45

[Figure: at deployment, only the student is available to the adversary and answers inference queries]

Page 46

Differential privacy: a randomized algorithm M satisfies (ε, δ)-differential privacy if, for all pairs of neighbouring datasets (d, d′) and for all subsets S of outputs:

Pr[M(d) ∈ S] ≤ exp(ε) · Pr[M(d′) ∈ S] + δ

Application of the Moments Accountant technique (Abadi et al, 2016)

Strong quorum ⟹ Small privacy cost

Bound is data-dependent: computed using the empirical quorum

Differential privacy analysis

46

Page 47

Trade-off between student accuracy and privacy

47

Page 48

Synergy between utility and privacy

48

1. Check privately for consensus.
2. Run the noisy argmax only when consensus is sufficient.

Scalable Private Learning with PATE [ICLR 2018]
Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar Erlingsson
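A hedged sketch of the two-step idea behind the confident aggregator in Scalable PATE: first check, with noise, whether the top vote count clears a consensus threshold, and only run the noisy arg max for queries that pass; the rest go unanswered. The threshold and noise scales below are illustrative assumptions, not the paper's values.

```python
# Sketch of a confident/selective aggregator in the spirit of Scalable PATE:
# (1) privately check for consensus; (2) answer with a noisy arg max only if the
# check passes. Threshold and noise scales are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10

def confident_noisy_aggregate(teacher_votes, threshold=180.0,
                              sigma_check=40.0, sigma_answer=20.0):
    counts = np.bincount(teacher_votes, minlength=num_classes)
    # Step 1: private consensus check on the largest vote count.
    if counts.max() + rng.normal(scale=sigma_check) < threshold:
        return None                                   # no consensus: refuse to answer
    # Step 2: noisy arg max, run only for high-consensus queries.
    return int(np.argmax(counts + rng.normal(scale=sigma_answer, size=num_classes)))

strong = np.concatenate([np.full(230, 3), rng.integers(0, num_classes, size=20)])
weak = rng.integers(0, num_classes, size=250)         # teachers disagree
print("strong consensus ->", confident_noisy_aggregate(strong))
print("weak consensus   ->", confident_noisy_aggregate(weak))
```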

Page 49

Trade-off between student accuracy and privacy

49

Selective PATE

Page 50

Machine learning and Goodhart’s law

Economist Charles Goodhart posited in 1975 that “when a measure becomes a target, it ceases to be a good measure.”

As ML models make more and more decisions, we will have to satisfy them, and they will become targets.

50

Page 51

Thank you for listening!

51

www.cleverhans.io
@NicolasPapernot

www.papernot.fr