Transcript: Learning the Distributions of Adversarial Examples
Boqing Gong. Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang
Published in ICML 2019
Intriguing properties of deep neural networks (DNNs)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.
Projected gradient descent (PGD) attack
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
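For concreteness, a minimal PGD sketch in PyTorch under the usual ℓ∞ threat model; the function and parameter names (model, x, y, eps, alpha, steps) are illustrative assumptions, not the cited papers' code.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        """L_inf PGD: repeatedly ascend the loss, then project back into the eps-ball."""
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv + alpha * grad.sign()                     # gradient ascent step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                               # keep a valid image
        return x_adv.detach()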
Intriguing results (1)
~100% attack success rates on CIFAR10 & ImageNet
Intriguing results (2)
Adversarial examples generalize between different DNNs
E.g., adversarial examples crafted against AlexNet also fool InceptionV3
Intriguing results (3)
A universal adversarial perturbation
Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
In a nutshell, white-box adversarial attacks can
Fool different DNNs for almost all test examples
Most data points lie near the classification boundaries.
Fool different DNNs by the same adversarial examples
The classification boundaries of various DNNs are close.
Fool different DNNs by a single universal perturbation
We can make most examples adversarial by moving them along the same direction by the same amount.
However, white-box adversarial attacks can
Not apply to most real-world scenarios
Not work when the network architecture is unknown
Not work when the weights are unknown
Not work when querying the network is prohibitive (e.g., due to cost)
Black-box attacks
Example black-box query: a panda image → Panda: 0.88493, Indri: 0.00878, Red Panda: 0.00317
Substitute attack (Papernot et al., 2017)
Decision-based (Brendel et al., 2017)
Boundary-tracing (Cheng et al., 2019)
Zeroth-order (Chen et al., 2017)
Natural evolution strategies (Ilyas et al., 2018)
Common issues: bad local optima, non-smooth optimization, the curse of dimensionality, defense-specific gradient estimation, etc.
Existing attacks: find a single adversarial perturbation (for an input)
Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.
Ours: learn the distribution of adversarial examples (for any input)
Our work
Learns the distribution of adversarial examples (for an input)
Reduces the “attack dimension” → fewer queries into the network.
Smooths the optimization → higher attack success rates.
Characterizes the risk of the input example → new defense methods.
A sample from the distribution fools the DNN with high probability (a minimal sketch follows).
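As a hedged illustration (all names are hypothetical, and the actual method's parameterization is richer than this), sampling from a learned Gaussian over perturbations and measuring how often the samples fool a black-box classifier might look like:

    import numpy as np

    def fooling_rate(x, mu, sigma, eps, predict, true_label, n_samples=50):
        """Draw perturbations from N(mu, sigma^2 I), project into the L_inf eps-ball,
        and report the fraction of samples that change the black-box prediction."""
        fooled = 0
        for _ in range(n_samples):
            delta = mu + sigma * np.random.randn(*x.shape)  # sample from the learned distribution
            delta = np.clip(delta, -eps, eps)               # respect the perturbation budget
            x_adv = np.clip(x + delta, 0.0, 1.0)
            if predict(x_adv) != true_label:                # only labels/scores are queried
                fooled += 1
        return fooled / n_samples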
Which family of distributions?
Natural evolution strategies (NES)
Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.
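To make the estimator concrete, here is a minimal NES-style update of a Gaussian search distribution's mean that uses only loss evaluations (no gradients from the target model); the names and hyperparameters are illustrative assumptions, not the paper's implementation:

    import numpy as np

    def nes_update(mu, sigma, loss_fn, n_samples=50, lr=0.02):
        """One search-gradient step on the mean of N(mu, sigma^2 I).
        loss_fn(z) returns a scalar to be minimized, e.g., an attack loss."""
        eps = np.random.randn(n_samples, *mu.shape)                 # search directions
        losses = np.array([loss_fn(mu + sigma * e) for e in eps])
        losses = (losses - losses.mean()) / (losses.std() + 1e-8)   # fitness shaping
        grad = (losses[:, None] * eps.reshape(n_samples, -1)).mean(0)
        grad = grad.reshape(mu.shape) / sigma                       # NES gradient estimate
        return mu - lr * grad                                       # lower the expected loss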
Experiment setup
Attack 13 defended DNNs & 2 vanilla DNNs
Consider both the ℓ∞ and ℓ2 threat models
Examine all test examples of CIFAR10 & 1000 of ImageNet
Excluding those misclassified by the targeted DNN
Evaluate by attack success rates
Attack success rates, ImageNet
Attack success rates, CIFAR10
Attack success rate vs. optimization steps
Transferabilities of the adversarial examples
A universally effective defense technique?
Adversarial training / defensive learning
Adversarial training solves $\min_{\theta}\,\mathbb{E}_{(x,y)}\big[\max_{\|\delta\|\leq\epsilon} L(f_{\theta}(x+\delta),\,y)\big]$: the inner maximization is carried out by the PGD attack, and $\theta$ denotes the DNN weights.
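A hedged sketch of this loop in PyTorch, reusing the pgd_attack sketch from earlier; model, loader, and optimizer are assumed to exist and are named for illustration only:

    import torch.nn.functional as F

    def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
        """Outer minimization over the DNN weights; inner maximization via PGD."""
        model.train()
        for x, y in loader:
            x_adv = pgd_attack(model, x, y, eps=eps)   # inner max: the PGD attack
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)    # outer min: train on adversarial examples
            loss.backward()
            optimizer.step()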
In a nutshell, our attack
Is a powerful black-box attack, matching or exceeding white-box attacks
Is universal: the same algorithm defeats a variety of defenses
Characterizes the distributions of adversarial examples
Reduces the “attack dimension”
Speeds up the defensive learning (ongoing work)
Physical adversarial attack
Boqing Gong. Joint work with Yang Zhang, Hassan Foroosh, & Philip David
Published in ICLR 2019
Recall the following result
A universal adversarial perturbation
Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
Physical attack: universal perturbation → 2D mask
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning models. CVPR.
Physical attack: 2D mask → 3D camouflage
Gradient descent w.r.t. the camouflage c, in order to minimize the detection scores for the vehicle under all feasible locations
Non-differentiable
Repeat until done:
1. Camouflage a vehicle
2. Drive it around and take many pictures of it
3. Detect it with Faster R-CNN & save the detection scores
→ Dataset: {(camouflage, vehicle, background, detection score)}
Physical attack: 2D mask → 3D camouflage
Fit a DNN to predict any camouflage’s corresponding detection scores
Physical attack: 2D mask → 3D camouflage
Gradient descent w.r.t. the camouflage c, in order to minimize the detection scores for the vehicle under all feasible locations
Non-differentiable, but approximated by the DNN fit above (a sketch follows)
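A hedged PyTorch sketch of that idea (all names hypothetical): once a clone network has been fit on the collected (camouflage, vehicle, background, detection score) tuples, the camouflage is optimized by gradient descent through the clone rather than through the non-differentiable rendering-and-detection pipeline:

    import torch

    def optimize_camouflage(clone_net, camouflage, contexts, steps=200, lr=0.01):
        """Minimize the predicted detection score over all sampled (vehicle, background)
        contexts by gradient descent on the camouflage texture."""
        c = camouflage.clone().requires_grad_(True)
        opt = torch.optim.Adam([c], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            # clone_net approximates the detector's score for the camouflaged vehicle
            scores = torch.stack([clone_net(c, ctx) for ctx in contexts])
            scores.mean().backward()                   # lower score = vehicle evades detection
            opt.step()
            with torch.no_grad():
                c.clamp_(0.0, 1.0)                     # keep the texture in a valid range
        return c.detach()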
Why do we care?
Observation, re-observation, & future work
Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree)
Adversarial examples from black-box attacks are less transferable than those from white-box attacks
All future work on defenses will adopt adversarial training
Adversarial training will become faster (we are working on it)
We should certify DNNs’ expected robustness (see the certification works below)
New works to watch
Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.
Explaining adversarial examples: Ilyas et al. (2019) Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.
Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877. && Shafahi et al. (2019). Adversarial Training for Free! arXiv: 1904.12843.
Certifying DNNs’ expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR. && Cohen et al. (2019). Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918.