Adversarial Examples in Machine Learning
Nicolas Papernot
Google PhD Fellow in Security, Pennsylvania State University
Talk at USENIX Enigma - February 1, 2017
Advised by Patrick McDaniel
Presentation prepared with Ian Goodfellow and Úlfar Erlingsson
@NicolasPapernot
Successes of machine learning
- Malware / APT detection (FeatureSmith)
- Financial fraud detection
- Machine Learning as a Service
- Autonomous driving
Failures of machine learning: Dave’s talk
An adversary forces a PDF detector trained with ML to make wrong predictions.
The scenario:
- Availability of the model for intensive querying
- Access to the model label and score
- Binary classifier (two outputs: malware/benign)
Failures of machine learning: this talk
An adversary forces a computer vision model trained with ML to make wrong predictions.
The scenario:
- Remote availability of the model through an API
- Access to the model label only
- A multi-class classifier (up to 1000 outputs)
Adversarial examples represent worst-case domain shifts
Adversarial examples
[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples
Crafting adversarial examples: fast gradient sign method
[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples
During training, the classifier uses a loss function to minimize model prediction errors.

After training, the attacker uses the same loss function to maximize the model's prediction error:

1. Compute the gradient of the loss with respect to the input of the model.
2. Take the sign of the gradient and multiply it by a small constant ε (the perturbation magnitude).
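As a concrete illustration, here is a minimal sketch of those two steps, assuming a simple binary logistic-regression model so the gradient of the loss with respect to the input can be written analytically; a real attack would obtain this gradient by automatic differentiation (as in the cleverhans snippet later in this talk), and all names below are illustrative.

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps=0.3):
    """Return x + eps * sign(grad_x loss) for the model p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))         # model's predicted probability
    grad_x = (p - y) * w                                   # gradient of cross-entropy loss w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)   # keep features in the valid [0, 1] range

# Toy usage: a 4-dimensional input and arbitrary model weights.
x = np.array([0.2, 0.8, 0.5, 0.1])
w = np.array([1.0, -2.0, 0.5, 3.0])
x_adv = fgsm_perturb(x, y=1.0, w=w, b=0.0, eps=0.3)
```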
Black-box attacks and transferability
Threat model of a black-box attack
Not available to the adversary: training data, model architecture, model parameters, model scores.

Adversarial capabilities: (limited) oracle access to the model, returning labels only.

Adversarial goal: force an ML model remotely accessible through an API to misclassify.
Our approach to black-box attacks
- Alleviate lack of knowledge about model
- Alleviate lack of training data
Adversarial example transferability
Adversarial examples have a transferability property:
samples crafted to mislead a model A are likely to mislead a model B
This property comes in several variants:

● Intra-technique transferability:
  ○ Cross-model transferability
  ○ Cross-training-set transferability
● Cross-technique transferability
[SZS14] Szegedy et al. Intriguing properties of neural networks
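As a rough sketch of how such transferability rates can be measured: craft adversarial examples against a model A only, then count how many a separately trained model B also misclassifies. Here `model_b` is a hypothetical object with a scikit-learn-style `predict` method; the names are illustrative.

```python
import numpy as np

def transfer_rate(model_b, X_adv, y_true):
    """Fraction of adversarial examples (crafted against model A) that model B also misclassifies."""
    preds = np.asarray(model_b.predict(X_adv))
    return float(np.mean(preds != np.asarray(y_true)))
```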
Intra-technique transferability: cross training data
(Figure: transferability rates between pairs of models trained on different data, labeled Strong, Intermediate, or Weak.)

[PMG16b] Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Cross-technique transferability
[PMG16b] Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Our approach to black-box attacks
- Alleviate lack of knowledge about model: adversarial example transferability from a substitute model to the target model
- Alleviate lack of training data: addressed next
Attacking remotely hosted black-box models
(1) The adversary queries the remote ML system for labels on inputs of its choice (in the diagram, traffic-sign images labeled "STOP sign" or "no truck sign").
(2) The adversary uses this labeled data to train a local substitute for the remote system.
(3) The adversary selects new synthetic inputs for queries to the remote ML system, based on the local substitute's output surface sensitivity to input variations.
(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability (in the diagram, an adversarial image is now labeled "yield sign").
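Putting the four steps together, a minimal sketch of the loop might look as follows. This is not the exact procedure of [PMG16a]: `remote_label` stands in for the victim's API, `substitute` is a hypothetical local model exposing `fit` and an `input_gradient(x, label)` method returning the gradient of its loss with respect to the input, and the paper's Jacobian-based dataset augmentation is approximated here by a single signed-gradient step.

```python
import numpy as np

def black_box_attack(remote_label, substitute, X_seed, n_iters=3, lam=0.1, eps=0.3):
    """Illustrative substitute-training attack (steps 1-4 above); interfaces are hypothetical."""
    X = np.array(X_seed, dtype=float)
    for _ in range(n_iters):
        y = remote_label(X)            # (1) query the remote oracle for labels
        substitute.fit(X, y)           # (2) train the local substitute on the labeled data
        # (3) synthesize new inputs along directions the substitute is most sensitive to,
        #     and add them to the pool queried in the next round.
        grads = np.array([substitute.input_gradient(x, label) for x, label in zip(X, y)])
        X = np.concatenate([X, np.clip(X + lam * np.sign(grads), 0.0, 1.0)])
    # (4) craft adversarial examples against the substitute with FGSM; by transferability
    #     they are likely to also be misclassified by the remote system.
    y = remote_label(X)
    grads = np.array([substitute.input_gradient(x, label) for x, label in zip(X, y)])
    return np.clip(X + eps * np.sign(grads), 0.0, 1.0)
```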
Our approach to black-box attacks
- Alleviate lack of knowledge about model: adversarial example transferability from a substitute model to the target model
- Alleviate lack of training data: synthetic data generation
Results on real-world remote systems
All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples).

| Remote platform | ML technique | Number of queries | Adversarial examples misclassified (after querying) |
|---|---|---|---|
| — | Deep Learning | 6,400 | 84.24% |
| — | Logistic Regression | 800 | 96.19% |
| — | Unknown | 2,000 | 97.72% |

[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Defending and benchmarking machine learning models
Adversarial training
Intuition: inject adversarial examples during training, with correct labels

Goal: improve model generalization outside of the training manifold

(Figure by Ian Goodfellow; x-axis: training time in epochs.)
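A minimal sketch of the idea, reusing the illustrative `input_gradient` interface from the black-box sketch above and assuming a model with a `partial_fit`-style incremental training method (both hypothetical): each step crafts FGSM versions of the current batch against the current model and trains on clean and adversarial examples together, keeping the original labels.

```python
import numpy as np

def adversarial_training_step(model, X, y, eps=0.3):
    """One training step on a batch extended with FGSM examples (labels kept correct)."""
    # Craft adversarial versions of the batch against the *current* model.
    grads = np.array([model.input_gradient(x, label) for x, label in zip(X, y)])
    X_adv = np.clip(X + eps * np.sign(grads), 0.0, 1.0)
    # Fit on clean and adversarial inputs together, each with its original label.
    model.partial_fit(np.concatenate([X, X_adv]), np.concatenate([y, y]))
```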
Adversarial training: some limitations
Works well here because the adversary and the classifier use the same attack.

Harder to generalize model robustness to adaptive attacks.

The classifier needs to be aware of all attacker strategies.
The cleverhans library (and blog)
Code at: github.com/openai/cleverhans
Blog at: cleverhans.io
Benchmark models against adversarial example attacks
Increase model robustness with adversarial training
Contributions welcome!
Hands-on tutorial with the MNIST dataset
```python
import tensorflow as tf
# (x/y placeholders, FLAGS, sess, and the MNIST arrays X_test/Y_test come from the
#  surrounding tutorial script; model_mnist, fgsm, batch_eval and tf_model_eval are
#  helpers from the early cleverhans tutorial code, and import paths vary by version.)

# Define input TF placeholders
x = tf.placeholder(tf.float32, shape=(None, 1, 28, 28))
y = tf.placeholder(tf.float32, shape=(None, FLAGS.nb_classes))

# Define TF model graph
model = model_mnist()
predictions = model(x)

# Craft adversarial examples using the Fast Gradient Sign Method (FGSM)
adv_x = fgsm(x, predictions, eps=0.3)
X_test_adv, = batch_eval(sess, [x], [adv_x], [X_test])

# Evaluate the accuracy of the MNIST model on adversarial examples
accuracy = tf_model_eval(sess, x, y, predictions, X_test_adv, Y_test)
print('Test accuracy on adversarial examples: ' + str(accuracy))
```
Implications of security research to the fields of ML & AI
Adversarial examples are a tangible instance of hypothetical AI safety problems
Accurate extrapolation outside of training data (i.e., resilience to adversarial examples) is a prerequisite for model-based optimization.
Thank you for listening!
Thank you to my sponsors.

www.cleverhans.io
@NicolasPapernot