
Received January 4, 2018, accepted February 8, 2018, date of publication February 19, 2018, date of current version March 28, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2807385

Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

NAVEED AKHTAR AND AJMAL MIAN
Department of Computer Science and Software Engineering, The University of Western Australia, Crawley, WA 6009, Australia

Corresponding author: Naveed Akhtar ([email protected])

This work was supported in part by ARC under Grant DP 160101458.

ABSTRACT Deep learning is at the heart of the current rise of artificial intelligence. In the field of computer vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Whereas deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has recently led to a large influx of contributions in this direction. This paper presents the first comprehensive survey on adversarial attacks on deep learning in computer vision. We review the works that design adversarial attacks, analyze the existence of such attacks and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in real-world scenarios. Finally, drawing on the reviewed literature, we provide a broader outlook of this research direction.

INDEX TERMS Deep learning, adversarial perturbation, black-box attack, white-box attack, adversarial learning, perturbation detection.

I. INTRODUCTION
Deep Learning [1] is providing major breakthroughs in solving problems that have withstood many attempts of the machine learning and artificial intelligence communities in the past. As a result, it is currently being used to decipher hard scientific problems at an unprecedented scale, e.g. in the reconstruction of brain circuits [2]; analysis of mutations in DNA [3]; prediction of the structure-activity of potential drug molecules [4]; and analysis of particle accelerator data [5], [6]. Deep neural networks have also become the preferred choice to solve many challenging tasks in speech recognition [7] and natural language understanding [8].

In the field of Computer Vision, deep learning became the center of attention after Krizhevsky et al. [9] demonstrated the impressive performance of a Convolutional Neural Network (CNN) [10] based model on a very challenging large-scale visual recognition task [11] in 2012. A significant credit for the current popularity of deep learning can also be attributed to this seminal work. Since 2012, the Computer Vision community has made numerous valuable contributions to deep learning research, enabling it to provide solutions to problems ranging from medical science [21] to mobile applications [179]. The recent breakthrough in artificial intelligence in the form of the tabula-rasa learning of AlphaGo Zero [14] also owes a fair share to deep Residual Networks (ResNets) [15], which were originally proposed for the task of image recognition.

With the continuous improvements of deep neural network models [15], [144], [166]; open access to efficient deep learning software libraries [175]–[177]; and the easy availability of the hardware required to train complex models, deep learning is fast achieving the maturity to enter safety- and security-critical applications, e.g. self-driving cars [12], [180], surveillance [13], malware detection [34], [107], drones and robotics [155], [178], and voice command recognition [7]. With recent real-world developments like facial recognition ATMs [181] and Face ID security on mobile phones [182], it is apparent that deep learning solutions, especially those originating from Computer Vision problems, are about to play a major role in our day-to-day lives.



FIGURE 1. Example of attacks on deep learning models with ‘universal adversarial perturbations’ [16]: The attacks are shown for the CaffeNet [9], VGG-F network [17] and GoogLeNet [18]. All the networks recognized the original clean images correctly with high confidence. After small perturbations were added to the images, the networks predicted wrong labels with similarly high confidence. Notice that the perturbations are hardly perceptible to the human vision system, however their effects on the deep learning models are catastrophic.

Whereas deep learning performs a wide variety of Computer Vision tasks with remarkable accuracies, Szegedy et al. [22] first discovered an intriguing weakness of deep neural networks in the context of image classification. They showed that despite their high accuracies, modern deep networks are surprisingly susceptible to adversarial attacks in the form of small perturbations to images that remain (almost) imperceptible to the human vision system. Such attacks can cause a neural network classifier to completely change its prediction about the image. Even worse, the attacked models report high confidence on the wrong prediction. Moreover, the same image perturbation can fool multiple network classifiers. The profound implications of these results triggered a wide interest of researchers in adversarial attacks and their defenses for deep learning in general.

Since the findings of Szegedy et al. [22], several interesting results have surfaced regarding adversarial attacks on deep learning in Computer Vision. For instance, in addition to the image-specific adversarial perturbations [22], Moosavi-Dezfooli et al. [16] showed the existence of ‘universal perturbations’ that can fool a network classifier on any image (see Fig. 1 for an example). Similarly, Athalye et al. [65] demonstrated that it is possible to even 3-D print real-world objects that can fool deep neural network classifiers (see Section IV-C). Keeping in view the significance of deep learning research in Computer Vision and its potential applications in real life, this article presents the first comprehensive survey on adversarial attacks on deep learning in Computer Vision. The article is intended for a wider readership than the Computer Vision community, hence it assumes only basic knowledge of deep learning and image processing. Nevertheless, it also discusses technical details of important contributions for the interested readers.

We first describe the common terms related to adversarial attacks in the parlance of Computer Vision in Section II. In Section III, we review the adversarial attacks for the task of image classification and beyond. A separate section is dedicated to the approaches that deal with adversarial attacks in real-world conditions. Those approaches are reviewed in Section IV. In the literature, there are also works that mainly focus on analyzing the existence of adversarial attacks. We discuss those contributions in Section V. The approaches that make defense against the adversarial attacks their central topic are discussed in Section VI. In Section VII, we provide a broader outlook of the research direction based on the reviewed literature. Finally, we draw conclusions in Section VIII.

II. DEFINITIONS OF TERMS
In this section, we describe the common technical terms used in the literature related to adversarial attacks on deep learning in Computer Vision. The remainder of the article also follows the same definitions of these terms.

• Adversarial example/image is a modified version of a clean image that is intentionally perturbed (e.g. by adding noise) to confuse/fool a machine learning technique, such as a deep neural network.

• Adversarial perturbation is the noise that is added to the clean image to make it an adversarial example.

• Adversarial training uses adversarial images besides the clean images to train machine learning models.

• Adversary more commonly refers to the agent who creates an adversarial example. However, in some cases the example itself is also called an adversary.

• Black-box attacks feed a targeted model with adversarial examples (during testing) that are generated without knowledge of that model. In some instances, it is assumed that the adversary has limited knowledge of the model (e.g. its training procedure and/or its architecture) but definitely does not know about the model parameters. In other instances, using any information about the target model is referred to as a ‘semi-black-box’ attack. We use the former convention in this article.

• Detector is a mechanism to (only) detect if an image is an adversarial example.

• Fooling ratio/rate indicates the percentage of images on which a trained model changes its prediction label after the images are perturbed (a minimal computation is sketched after this list).

• One-shot/one-step methods generate an adversarial perturbation by performing a single-step computation, e.g. computing the gradient of the model loss once. The opposite are iterative methods that perform the same computation multiple times to get a single perturbation. The latter are often computationally expensive.

• Quasi-imperceptible perturbations impair images very slightly for human perception.

• Rectifier modifies an adversarial example to restore the prediction of the targeted model to its prediction on the clean version of the same example.

• Targeted attacks fool a model into falsely predicting a specific label for the adversarial image. They are the opposite of non-targeted attacks, in which the predicted label of the adversarial image is irrelevant, as long as it is not the correct label.

• Threat model refers to the types of potential attacks considered by an approach, e.g. black-box attack.

• Transferability refers to the ability of an adversarial example to remain effective even for models other than the one used to generate it.

• Universal perturbation is able to fool a given model on ‘any’ image with high probability. Note that universality refers to the property of a perturbation being ‘image-agnostic’, as opposed to having good transferability.

• White-box attacks assume the complete knowledge of the targeted model, including its parameter values, architecture, training method, and in some cases its training data as well.
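As a concrete illustration of the fooling ratio defined in this list, the following minimal sketch computes it for a generic classifier. The `model.predict` interface is a hypothetical placeholder (not tied to any particular library) that is assumed to return one class label per image.

```python
import numpy as np

def fooling_rate(model, clean_images, perturbed_images):
    """Percentage of images whose predicted label changes after perturbation."""
    clean_preds = np.asarray(model.predict(clean_images))      # labels on clean inputs
    adv_preds = np.asarray(model.predict(perturbed_images))    # labels on perturbed inputs
    return 100.0 * np.mean(clean_preds != adv_preds)
```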

III. ADVERSARIAL ATTACKS
In this section, we review the body of literature in Computer Vision that introduces methods for adversarial attacks on deep learning. The reviewed literature mainly deals with the art of fooling deep neural networks in ‘laboratory settings’, where approaches are developed for the typical Computer Vision tasks, e.g. recognition, and their effectiveness is demonstrated using standard datasets, e.g. MNIST [10]. The techniques that focus on attacking deep learning in real-world conditions are separately reviewed in Section IV. However, it should be noted that the approaches reviewed in this section form the basis of the real-world attacks, and almost each one of them has the potential to significantly affect deep learning in practice. Our division is based on the evaluation conditions of the attacks in the original contributions.

The review in this section is mainly organized in chronological order, with a few exceptions to maintain the flow of discussion. To provide a technical understanding of the core concepts to the reader, we also go into the technical details of the popular approaches and some representative techniques of the emerging directions in this area. Other methods are discussed briefly; we refer to the original papers for the details of those techniques. This section is divided into two parts. In part III-A, we review the methods that attack deep neural networks performing the most common task in Computer Vision, i.e. classification/recognition. Approaches that are predominantly designed to attack deep learning beyond this task are discussed in part III-B.

A. ATTACKS FOR CLASSIFICATION
1) BOX-CONSTRAINED L-BFGS
Szegedy et al. [22] first demonstrated the existence of small perturbations to images such that the perturbed images could fool deep learning models into misclassification. Let I_c ∈ R^m denote a vectorized clean image - the subscript ‘c’ emphasizes that the image is clean. To compute an additive perturbation ρ ∈ R^m that would distort the image very slightly to fool the network, Szegedy et al. proposed to solve the following problem:

min_ρ ||ρ||_2   s.t.   C(I_c + ρ) = ℓ;   I_c + ρ ∈ [0, 1]^m,     (1)

where ‘ℓ’ denotes the label of the image and C(.) is the deep neural network classifier. The authors proposed to solve (1) for its non-trivial solution where ‘ℓ’ is different from the original label of I_c. In that case, (1) becomes a hard problem, hence an approximate solution is sought using a box-constrained L-BFGS [20]. This is done by finding the minimum c > 0 for which the minimizer ρ of the following problem satisfies the condition C(I_c + ρ) = ℓ:

min_ρ  c|ρ| + L(I_c + ρ, ℓ)   s.t.   I_c + ρ ∈ [0, 1]^m,     (2)

where L(., .) computes the loss of the classifier. We note that (2) results in the exact solution for a classifier that has a convex loss function. However, for deep neural networks, this is generally not the case. The computed perturbation is simply added to the image to make it an adversarial example. As shown in Fig. 2, the above method is able to compute perturbations that, when added to clean images, fool a neural network, while the adversarial images appear similar to the clean images to the human vision system. It was observed by Szegedy et al. that the perturbations computed for one neural network were also able to fool multiple networks. These astonishing results identified a blind spot in deep learning. At the time of this discovery, the Computer Vision community was fast adapting to the impression that deep learning features define a space where perceptual distances are well approximated by Euclidean distances. Hence, these contradictory results triggered a wide interest of researchers in adversarial attacks on deep learning in Computer Vision.
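Problem (2) can be approximated with any off-the-shelf box-constrained optimizer. The sketch below is one possible realization using SciPy's L-BFGS-B on a PyTorch classifier that returns logits; it assumes the clean image is a NumPy array of shape (C, H, W) with values in [0, 1], uses cross-entropy as a surrogate for L(., .), and omits the outer search for the smallest constant c described above. It illustrates the idea of [22] rather than reproducing the original implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.optimize import minimize

def lbfgs_attack(model, clean_image, target_label, c=0.1):
    """Approximate Eq. (2): minimize c*|rho| + L(I_c + rho, target) with the
    box constraint that the perturbed image stays inside [0, 1]."""
    x0 = clean_image.astype(np.float64).ravel()                 # vectorized clean image I_c
    x0_t = torch.tensor(x0, dtype=torch.float32)

    def objective(x):
        x_t = torch.tensor(x, dtype=torch.float32, requires_grad=True)
        img = x_t.view(clean_image.shape).unsqueeze(0)          # add a batch dimension
        loss = c * (x_t - x0_t).abs().sum() \
               + F.cross_entropy(model(img), torch.tensor([target_label]))
        loss.backward()
        return loss.item(), x_t.grad.numpy().astype(np.float64)

    res = minimize(objective, x0, jac=True, method='L-BFGS-B',
                   bounds=[(0.0, 1.0)] * x0.size,               # keep I_c + rho in [0, 1]^m
                   options={'maxiter': 50})
    adv = res.x.reshape(clean_image.shape).astype(np.float32)
    return adv, adv - clean_image                               # adversarial image and rho
```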

2) FAST GRADIENT SIGN METHOD (FGSM)


FIGURE 2. Illustration of adversarial examples generated using [22] for AlexNet [9]. The perturbations are magnified 10x for better visualization (values shifted by 128 and clamped). The predicted labels of adversarial examples are also shown.

It was observed by Szegedy et al. [22] that the robustness of deep neural networks against adversarial examples could be improved by adversarial training. To enable effective adversarial training, Goodfellow et al. [23] developed a method to efficiently compute an adversarial perturbation for a given image by solving the following problem:

ρ = ε sign(∇J(θ, I_c, ℓ)),     (3)

where ∇J(., ., .) computes the gradient of the cost function around the current value of the model parameters θ w.r.t. I_c, sign(.) denotes the sign function and ε is a small scalar value that restricts the norm of the perturbation. The method for solving (3) was termed the ‘Fast Gradient Sign Method’ (FGSM) in the original work.
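Equation (3) translates almost line-for-line into code. The following minimal PyTorch sketch assumes that `model` returns logits for a batch and that `image` is a (C, H, W) tensor with pixel values in [0, 1]; it illustrates the FGSM step rather than reproducing the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, image, label, epsilon=0.03):
    """One-step FGSM: rho = epsilon * sign(grad_image J(theta, I_c, l))."""
    image = image.clone().detach().requires_grad_(True)        # I_c with gradient tracking
    loss = F.cross_entropy(model(image.unsqueeze(0)), torch.tensor([label]))
    loss.backward()                                             # gradient of the loss w.r.t. the image
    rho = epsilon * image.grad.sign()                           # Eq. (3)
    adv_image = (image + rho).clamp(0.0, 1.0).detach()          # keep pixels in a valid range
    return adv_image, rho.detach()
```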

Interestingly, the adversarial examples generated by FGSM exploit the ‘linearity’ of deep network models in the higher-dimensional space, whereas such models were commonly thought to be highly non-linear at that time. Goodfellow et al. [23] hypothesized that the designs of modern deep neural networks that (intentionally) encourage linear behavior for computational gains also make them susceptible to cheap analytical perturbations. In the related literature, this idea is often referred to as the ‘linearity hypothesis’, which is substantiated by the FGSM approach.

Kurakin et al. [80] noted that on the popular large-scale image recognition dataset ImageNet [11], the top-1 error rate on the adversarial examples generated by FGSM is around 63-69% for ε ∈ [2, 32]. The authors also proposed a ‘one-step target class’ variation of the FGSM where, instead of using the true label ℓ of the image in (3), they used the label ℓ_target of the least likely class predicted by the network for I_c. The computed perturbation is then subtracted from the original image to make it an adversarial example. For a neural network with cross-entropy loss, doing so maximizes the probability that the network predicts ℓ_target as the label of the adversarial example. It is suggested that a random class can also be used as the target class for fooling the network, however it may lead to less interesting fooling, e.g. misclassification of one breed of dog as another dog breed. The authors also demonstrated that adversarial training improves the robustness of deep neural networks against the attacks generated by FGSM and its proposed variants.

The FGSM perturbs an image to increase the loss of the classifier on the resulting image. The sign function ensures that the magnitude of the loss is maximized, while ε essentially restricts the ℓ∞-norm of the perturbation. Miyato et al. [103] proposed a closely related method to compute the perturbation as follows:

ρ = ε ∇J(θ, I_c, ℓ) / ||∇J(θ, I_c, ℓ)||_2.     (4)

In the above equation, the computed gradient is normalized by its ℓ2-norm. Kurakin et al. [80] referred to this technique as the ‘Fast Gradient L2’ method and also proposed an alternative of using the ℓ∞-norm for normalization, referring to the resulting technique as the ‘Fast Gradient L∞’ method. Broadly speaking, all of these methods are seen as ‘one-step’ or ‘one-shot’ methods in the literature related to adversarial attacks in Computer Vision.
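The normalized variants in Eq. (4) differ from FGSM only in how the raw gradient is scaled. A minimal sketch, under the same assumptions as the FGSM snippet above:

```python
import torch
import torch.nn.functional as F

def fast_gradient_l2(model, image, label, epsilon=1.0):
    """'Fast Gradient L2'-style step: scale the raw gradient by its l2-norm
    instead of taking its sign (Eq. (4))."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image.unsqueeze(0)), torch.tensor([label]))
    loss.backward()
    grad = image.grad
    rho = epsilon * grad / grad.norm(p=2)       # for the L-infinity variant: grad / grad.abs().max()
    return (image + rho).clamp(0.0, 1.0).detach(), rho.detach()
```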

3) BASIC & LEAST-LIKELY-CLASS ITERATIVE METHODSThe one-step methods perturb images by taking a single largestep in the direction that increases the loss of the classifier(i.e. one-step gradient ascent). An intuitive extension of thisidea is to iteratively take multiple small steps while adjustingthe direction after each step. The Basic Iterative Method(BIM) [35] does exactly that, and iteratively computes thefollowing:

I_ρ^{i+1} = Clip_ε{ I_ρ^i + α sign(∇J(θ, I_ρ^i, ℓ)) },     (5)

where I_ρ^i denotes the perturbed image at the i-th iteration, Clip_ε{.} clips (the values of the pixels of) the image in its argument at ε, and α determines the step size (normally, α = 1). The BIM algorithm starts with I_ρ^0 = I_c and runs for the number of iterations determined by the formula ⌊min(ε + 4, 1.25ε)⌋. Madry et al. [55] pointed out that BIM is equivalent to (the ℓ∞ version of) Projected Gradient Descent, a standard convex optimization method.
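The BIM update in Eq. (5) can be sketched in a few lines of PyTorch. The fixed iteration budget, step size and pixel range below are illustrative assumptions; the clipping keeps the accumulated perturbation inside an ε-ball around the clean image in the spirit of [35].

```python
import torch
import torch.nn.functional as F

def bim_attack(model, clean_image, label, epsilon=0.03, alpha=0.005, num_iters=10):
    """Basic Iterative Method: repeated small FGSM-style steps, clipped to an
    epsilon-ball around the clean image (Eq. (5))."""
    adv = clean_image.clone().detach()
    for _ in range(num_iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv.unsqueeze(0)), torch.tensor([label]))
        loss.backward()
        adv = adv.detach() + alpha * adv.grad.sign()            # small gradient-ascent step
        # Clip_eps{.}: keep the accumulated perturbation within [-eps, eps] of I_c ...
        adv = torch.max(torch.min(adv, clean_image + epsilon), clean_image - epsilon)
        adv = adv.clamp(0.0, 1.0)                               # ... and pixels in the valid range
    return adv
```

Note that the iteration count ⌊min(ε + 4, 1.25ε)⌋ quoted above assumes ε expressed in integer pixel units (0-255); the fixed num_iters here is only a stand-in.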

Similar to extending the FGSM to its ‘one-step target class’ variation, Kurakin et al. [35] also extended BIM to the Iterative Least-likely Class Method (ILCM). In that case, the label ℓ of the image in (5) is replaced by the target label ℓ_target of the least likely class predicted by the classifier. The adversarial examples generated by the ILCM method have been shown to seriously affect the classification accuracy of a modern deep architecture, Inception v3 [144], even for very small values of ε, e.g. < 16.
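ILCM follows from the BIM sketch above with two changes: the true label is replaced by the least-likely class predicted for the clean image, and the update descends the loss for that target. A hedged sketch of those changes, under the same assumptions as before:

```python
import torch
import torch.nn.functional as F

def ilcm_attack(model, clean_image, epsilon=0.03, alpha=0.005, num_iters=10):
    """Iterative Least-likely Class Method: targeted BIM towards the class
    the model considers least probable for the clean image."""
    with torch.no_grad():
        logits = model(clean_image.unsqueeze(0))
        target = logits.argmin(dim=1)                           # least-likely class l_target
    adv = clean_image.clone().detach()
    for _ in range(num_iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv.unsqueeze(0)), target)
        loss.backward()
        adv = adv.detach() - alpha * adv.grad.sign()            # descend: raise target probability
        adv = torch.max(torch.min(adv, clean_image + epsilon), clean_image - epsilon)
        adv = adv.clamp(0.0, 1.0)
    return adv
```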

4) JACOBIAN-BASED SALIENCY MAP ATTACK (JSMA)
In the literature, it is more common to generate adversarial examples by restricting the ℓ∞ or ℓ2-norm of the perturbations to make them imperceptible for humans. However, Papernot et al. [60] also created an adversarial attack by restricting the ℓ0-norm of the perturbations. Physically, it means that the goal is to modify only a few pixels in the image instead of perturbing the whole image to fool the classifier. The crux of their algorithm to generate the desired adversarial image can be understood as follows. The algorithm modifies pixels of the clean image one at a time and monitors the effects of the change on the resulting classification. The monitoring is performed by computing a saliency map using the gradients of the outputs of the network layers. In this map, a larger value indicates a higher likelihood of fooling the network to predict ℓ_target as the label of the modified image instead of the original label ℓ. Thus, the algorithm performs targeted fooling. Once the map has been computed, the algorithm chooses the pixel that is most effective to fool the network and alters it. This process is repeated until either the maximum number of allowable pixels are altered in the adversarial image or the fooling succeeds.
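The sketch below is a heavily simplified, single-pixel-per-iteration rendition of the saliency-map idea: it computes the Jacobian of the class scores with respect to the input, keeps only pixels that simultaneously raise the target score and lower the combined remaining scores, and increases the most salient pixel. The real JSMA of [60] modifies pixel pairs and uses additional bookkeeping that is omitted here.

```python
import torch

def jsma_attack(model, clean_image, target, max_pixels=50, theta=1.0):
    """Simplified JSMA-style targeted attack: alter one pixel per iteration
    according to a Jacobian-based saliency map."""
    adv = clean_image.clone().detach()
    num_pixels = adv.numel()
    for _ in range(max_pixels):
        def scores(x):
            return model(x.unsqueeze(0)).squeeze(0)             # class scores for one image
        jac = torch.autograd.functional.jacobian(scores, adv)   # shape: (num_classes, *image_shape)
        jac = jac.reshape(jac.shape[0], num_pixels)
        d_target = jac[target]                                  # effect of each pixel on the target score
        d_others = jac.sum(dim=0) - d_target                    # combined effect on all other scores
        # Saliency: pixels that raise the target score while lowering the others.
        saliency = torch.where((d_target > 0) & (d_others < 0),
                               d_target * d_others.abs(),
                               torch.zeros_like(d_target))
        saliency[adv.view(-1) >= 1.0] = 0                       # ignore already-saturated pixels
        pixel = saliency.argmax()
        if saliency[pixel] == 0:                                # no helpful pixel left
            break
        adv.view(-1)[pixel] = (adv.view(-1)[pixel] + theta).clamp(0.0, 1.0)
        if model(adv.unsqueeze(0)).argmax() == target:          # fooling succeeded
            break
    return adv
```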

5) ONE PIXEL ATTACK
An extreme case of the adversarial attack is when only one pixel in the image is changed to fool the classifier. Interestingly, Su et al. [68] claimed successful fooling of three different network models on 70.97% of the tested images by changing just one pixel per image. They also reported that the average confidence of the networks on the wrong labels was found to be 97.47%. We show representative examples of the adversarial images from [68] in Fig. 3. Su et al. computed the adversarial examples by using the concept of Differential Evolution [146]. For a clean image I_c, they first created a set of 400 vectors in R^5 such that each vector contained the xy-coordinates and RGB values of an arbitrary candidate pixel. Then, they randomly modified the elements of the vectors to create children such that a child competes with its parent for fitness in the next iteration, while the probabilistic predicted label of the network is used as the fitness criterion. The last surviving child is used to alter the pixel in the image.

FIGURE 3. Illustration of one pixel adversarial attacks [68]: The correct label is mentioned with each image. The corresponding predicted label is given in parentheses.

Even with such a simple evolutionary strategy, Su et al. [68] were able to show successful fooling of deep networks. Notice that differential evolution enables their approach to generate adversarial examples without having access to any information about the network parameter values or their gradients. The only input their technique requires is the probabilistic labels predicted by the targeted model.
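Because the attack only needs the model's predicted probabilities, it can be prototyped directly on top of SciPy's differential evolution solver. In the sketch below, `predict_probs` is an assumed callable returning class probabilities for an H×W×3 image in [0, 1]; SciPy's popsize is a per-parameter multiplier, so popsize=80 over a 5-dimensional candidate gives a population of 400 as in [68]. The iteration budget and fitness definition are illustrative, not the authors' exact settings.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(predict_probs, image, true_label, max_iter=30):
    """Search for a single (x, y, R, G, B) modification that minimizes the
    probability of the true class, using only the model's output probabilities."""
    h, w, _ = image.shape
    bounds = [(0, w - 1), (0, h - 1), (0, 1), (0, 1), (0, 1)]   # candidate vector in R^5

    def apply_pixel(candidate):
        x, y, r, g, b = candidate
        perturbed = image.copy()
        perturbed[int(y), int(x)] = [r, g, b]                   # overwrite one pixel
        return perturbed

    def fitness(candidate):
        probs = predict_probs(apply_pixel(candidate))
        return probs[true_label]                                # lower is better for the attacker

    result = differential_evolution(fitness, bounds, maxiter=max_iter,
                                    popsize=80, tol=1e-5, seed=0)
    return apply_pixel(result.x)
```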

6) CARLINI AND WAGNER ATTACKS (C&W)
A set of three adversarial attacks was introduced by Carlini and Wagner [36] in the wake of defensive distillation against adversarial perturbations [38]. These attacks make the perturbations quasi-imperceptible by restricting their ℓ2, ℓ∞ and ℓ0 norms, and it is shown that defensive distillation for the targeted networks almost completely fails against these attacks. Moreover, it is also shown that the adversarial examples generated using the unsecured (un-distilled) networks transfer well to the secured (distilled) networks, which makes the computed perturbations suitable for black-box attacks.

Whereas it is more common to exploit the transferability property of adversarial examples to generate black-box attacks, Chen et al. [41] also proposed ‘Zeroth Order Optimization (ZOO)’ based attacks that directly estimate the gradients of the targeted model for generating the adversarial examples. These attacks were inspired by the C&W attacks. We refer to the original papers for further details on the C&W and ZOO attacks.

7) DEEPFOOL
Moosavi-Dezfooli et al. [72] proposed to compute a minimal norm adversarial perturbation for a given image in an iterative manner. Their algorithm, DeepFool, initializes with the clean image, which is assumed to reside in a region confined by the decision boundaries of the classifier. This region decides the class label of the image. At each iteration, the algorithm perturbs the image by a small vector that is computed to take the resulting image to the boundary of the polyhedron that is obtained by linearizing the boundaries of the region within which the image resides. The perturbations added to the image in each iteration are accumulated to compute the final perturbation once the perturbed image changes its label according to the original decision boundaries of the network. The authors show that the DeepFool algorithm is able to compute perturbations that are smaller than the perturbations computed by FGSM [23] in terms of their norm, while having similar fooling ratios.
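A simplified multi-class rendition of the DeepFool loop is sketched below: at every iteration the decision boundaries are linearized around the current image, the closest (linearized) boundary is found, and the image is pushed just beyond it. Looping over all classes and the small overshoot constant are simplifying assumptions; the original implementation restricts attention to the top-scoring classes.

```python
import torch

def deepfool(model, clean_image, num_classes=10, max_iters=50, overshoot=0.02):
    """Simplified DeepFool: take the smallest linearized step that crosses the
    nearest decision boundary, repeating until the predicted label changes."""
    adv = clean_image.clone().detach()
    orig_label = model(clean_image.unsqueeze(0)).argmax().item()
    total_pert = torch.zeros_like(clean_image)

    for _ in range(max_iters):
        adv.requires_grad_(True)
        logits = model(adv.unsqueeze(0)).squeeze(0)
        if logits.argmax().item() != orig_label:                # label changed: done
            break
        grads = [torch.autograd.grad(logits[k], adv, retain_graph=True)[0]
                 for k in range(num_classes)]                   # gradient of each class score
        best_ratio, best_step = None, None
        for k in range(num_classes):
            if k == orig_label:
                continue
            w_k = (grads[k] - grads[orig_label]).flatten()      # linearized boundary normal
            f_k = (logits[k] - logits[orig_label]).item()       # signed score difference
            ratio = abs(f_k) / (w_k.norm() + 1e-8)              # distance to this boundary
            if best_ratio is None or ratio < best_ratio:
                best_ratio = ratio
                best_step = (abs(f_k) / (w_k.norm() ** 2 + 1e-8)) * w_k.view_as(adv)
        total_pert = total_pert + best_step                     # accumulate the per-step vectors
        adv = (clean_image + (1 + overshoot) * total_pert).detach()
    return adv.detach().clamp(0.0, 1.0), total_pert
```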

8) UNIVERSAL ADVERSARIAL PERTURBATIONS
Whereas methods like FGSM [23], ILCM [35], DeepFool [72] etc. compute perturbations to fool a network on a single image, the ‘universal’ adversarial perturbations computed by Moosavi-Dezfooli et al. [16] are able to fool a network on ‘any’ image with high probability. These image-agnostic perturbations also remain quasi-imperceptible for the human vision system, as can be observed in Fig. 1. To formally define these perturbations, let us assume that clean images are sampled from the distribution ℑ_c. A perturbation ρ is ‘universal’ if it satisfies the following constraint:

P_{I_c∼ℑ_c}( C(I_c) ≠ C(I_c + ρ) ) ≥ δ   s.t.   ||ρ||_p ≤ ξ,     (6)

where P(.) denotes the probability, δ ∈ (0, 1] is the fooling ratio, ||.||_p denotes the ℓ_p-norm and ξ is a pre-defined constant. The smaller the value of ξ, the harder it is to perceive the perturbation in the image. Strictly speaking, the perturbations that satisfy (6) should be referred to as (δ, ξ)-universal because of their strong dependence on the mentioned parameters. However, these perturbations are commonly referred to as the ‘universal adversarial perturbations’ in the literature.

The authors computed the universal perturbations by restricting their ℓ2-norm as well as their ℓ∞-norm, and showed that perturbations with norms upper bounded by 4% of the respective image norms already achieved significant fooling ratios of around 0.8 or more for state-of-the-art image classifiers. Their iterative approach to compute a perturbation is related to the DeepFool strategy [72] of gradually pushing a data point (i.e. an image) to the decision boundary for its class. However, in this case, ‘all’ the training data points are sequentially pushed to the respective decision boundaries and the perturbations computed over all the images are gradually accumulated by back-projecting the accumulator onto the desired ℓ_p ball of radius ξ every time. The algorithm proposed by Moosavi-Dezfooli et al. [16] computes perturbations while targeting a single network model, e.g. ResNet [15]. However, it is shown that these perturbations also generalize well across different networks (especially those having similar architectures). In that sense, the authors claim the perturbations to be, to some extent, ‘doubly universal’. Moreover, it is also shown that a high fooling ratio (e.g. δ ≥ 0.5) is achievable by learning a perturbation using only around 2,000 training images.
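The accumulation-and-projection scheme described above can be sketched as follows. Here `per_image_attack` is an assumed callable that returns a minimal perturbation for a single (already perturbed) image, for instance a wrapper around the DeepFool sketch given earlier, and the ℓ∞ projection plays the role of the ℓ_p ball of radius ξ in (6). The epochs and thresholds are illustrative, not the settings of [16].

```python
import torch

def project_linf(rho, xi):
    """Project the accumulated perturbation back onto the l_inf ball of radius xi."""
    return rho.clamp(-xi, xi)

def universal_perturbation(model, images, per_image_attack, xi=0.06,
                           target_fooling_rate=0.8, max_epochs=5):
    """Sweep over the training images; whenever the current universal perturbation
    fails to fool the model on an image, add an extra per-image perturbation
    and re-project onto the norm ball."""
    rho = torch.zeros_like(images[0])
    for _ in range(max_epochs):
        for img in images:
            clean_pred = model(img.unsqueeze(0)).argmax()
            adv_pred = model((img + rho).clamp(0, 1).unsqueeze(0)).argmax()
            if adv_pred == clean_pred:                          # rho does not fool this image yet
                extra = per_image_attack(model, (img + rho).clamp(0, 1))
                rho = project_linf(rho + extra, xi)
        # Stop once the fooling ratio delta over the training images is high enough.
        fooled = sum(model((img + rho).clamp(0, 1).unsqueeze(0)).argmax()
                     != model(img.unsqueeze(0)).argmax() for img in images)
        if float(fooled) / len(images) >= target_fooling_rate:
            break
    return rho
```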

9) UPSET AND ANGRI
Sarkar et al. [145] proposed two black-box attack algorithms, namely UPSET: Universal Perturbations for Steering to Exact Targets, and ANGRI: Antagonistic Network for Generating Rogue Images, for targeted fooling of deep neural networks. For ‘n’ classes, UPSET seeks to produce ‘n’ image-agnostic perturbations such that when the perturbation is added to an image that does not belong to a targeted class, the classifier will classify the perturbed image as being from that class. The power of UPSET comes from a residual generating network R(.) that takes the target class ‘t’ as input and produces a perturbation R(t) for fooling. The overall method solves the following optimization problem using the so-called UPSET network:

I_ρ = max(min(s R(t) + I_c, 1), −1),     (7)

where the pixel values in I_c are normalized to lie in [−1, 1] and ‘s’ is a scalar. To ensure that I_ρ is a valid image, all values outside the interval [−1, 1] are clipped. As compared to the image-agnostic perturbations of UPSET, ANGRI computes image-specific perturbations in a closely related manner, for which we refer to the original work. The perturbations resulting from ANGRI are also used for targeted fooling. Both algorithms have been reported to achieve high fooling ratios on the MNIST [10] and CIFAR-10 [150] datasets.

10) HOUDINI
Cisse et al. [130] proposed ‘Houdini’, an approach for fooling gradient-based learning machines by generating adversarial examples that can be tailored to task losses. Typical algorithms that generate adversarial examples employ gradients of differentiable loss functions of the networks to compute the perturbations. However, task losses are often not amenable to this approach. For instance, the task loss of speech recognition is based on the word error rate, which does not allow straightforward exploitation of the loss function gradient. Houdini is tailored to generate adversarial examples for such tasks. Besides the successful generation of adversarial images for classification, Houdini has also been shown to successfully attack a popular deep Automatic Speech Recognition system [149]. The authors have also demonstrated the transferability of attacks in speech recognition by fooling Google Voice in a black-box attack scenario. Moreover, successful targeted and non-targeted attacks are also demonstrated for a deep learning model for human pose estimation.

11) ADVERSARIAL TRANSFORMATION NETWORKS (ATNs)
Baluja and Fischer [42] trained feed-forward neural networks to generate adversarial examples against other targeted networks or sets of networks. The trained models were termed Adversarial Transformation Networks (ATNs). The adversarial examples generated by these networks are computed by minimizing a joint loss function comprising two parts. The first part restricts the adversarial example to have perceptual similarity with the original image, whereas the second part aims at altering the prediction of the targeted network on the resulting image.
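The two-part joint loss can be written compactly. The sketch below shows one training step for a hypothetical attacker network `atn` against a frozen `target_model`; the mean-squared-error similarity term, the weighting β and the targeted cross-entropy are assumptions in the spirit of [42], not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def atn_training_step(atn, target_model, images, target_class, optimizer, beta=0.1):
    """One optimization step of an Adversarial Transformation Network:
    part 1 keeps the generated example close to the input, part 2 pushes the
    frozen target model towards the adversarial class."""
    adv = atn(images)                                           # attacker network proposes adversarial examples
    similarity_loss = F.mse_loss(adv, images)                   # perceptual-similarity surrogate
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    fooling_loss = F.cross_entropy(target_model(adv), target)
    loss = beta * similarity_loss + fooling_loss                # joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```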

Along the same direction, Hayes and Danezis [47] also used an attacker neural network to learn adversarial examples for black-box attacks. In the presented results, the examples computed by the attacker network remain perceptually indistinguishable from the clean images but they are misclassified by the targeted networks with overwhelming probabilities - reducing the classification accuracy from 99.4% to 0.77% on the MNIST data [10], and from 91.4% to 6.8% on the CIFAR-10 dataset [150].

12) MISCELLANEOUS ATTACKS
The adversarial attacks discussed above are either the popular ones in the recent literature or they are representative of research directions that are fast becoming popular. A summary of the main attributes of these attacks is also provided in Table 1. For a comprehensive study, below we provide brief descriptions of further techniques to generate adversarial attacks on deep neural networks. We note that this research area is currently highly active.


TABLE 1. Summary of the attributes of diverse attacking methods: The ‘perturbation norm’ indicates the restricted ℓ_p-norm of the perturbations to make them imperceptible. The strength (higher for more asterisks) is based on the impression from the reviewed literature.

Whereas every attempt has been made to review as many approaches as possible, we do not claim the review to be exhaustive. Due to the high activity in this research direction, many more attacks are likely to surface in the near future.

Sabour et al. [26] showed the possibility of generating adversarial examples by targeting the internal layers of deep neural networks. The authors demonstrated that it is possible to make the internal network representations of adversarial images resemble the representations of images from different classes. Papernot et al. [109] studied the transferability of adversarial attacks for deep learning as well as other machine learning techniques and introduced further transferability attacks. Narodytska and Kasiviswanathan [54] also introduced further black-box attacks that have been found effective in fooling neural networks by changing only a few pixel values in the images. Liu et al. [31] introduced the ‘epsilon-neighborhood’ attack that has been shown to fool defensively distilled networks [108] with 100% success for white-box attacks. Oh et al. [132] took a ‘Game Theory’ perspective on adversarial attacks and derived a strategy to counter the counter-measures taken against adversarial attacks on deep neural networks. Mopuri et al. [134] developed a data-independent approach to generate universal adversarial perturbations for deep network models. Hosseini et al. [98] introduced the notion of ‘semantic adversarial examples’ - input images that represent semantically the same objects for humans, yet deep neural networks misclassify them. They used negatives of the images as semantic adversarial examples. Kanbak et al. [73] introduced the ‘ManiFool’ algorithm in the wake of the DeepFool method [72] to measure the robustness of deep neural networks against geometrically perturbed images. Dong et al. [168] proposed an iterative method to boost adversarial attacks for black-box scenarios. Recently, Carlini and Wagner [49] also demonstrated that ten different defenses against perturbations can again be defeated by new attacks constructed using new loss functions. Rozsa et al. [94] also proposed a ‘hot/cold’ method for generating multiple possible adversarial examples for a single image.

We note that the authors of many works reviewed in this article have made the source code of their implementations publicly available. This is one of the major reasons behind the current rise in this research direction.

Besides those resources, there are also libraries, e.g. Cleverhans [111], [112], that have started emerging in order to further boost this research direction. Adversarial-Playground (https://github.com/QData/AdversarialDNN-Playground) is another example of a toolbox made public by Norton and Qi [141] to understand adversarial attacks.

B. ATTACKS BEYOND CLASSIFICATION/RECOGNITION
With the exception of Houdini [130], all the mainstream adversarial attacks reviewed in Section III-A directly focused on the task of classification - typically fooling CNN-based [10] classifiers. However, due to the seriousness of adversarial threats, attacks are also being actively investigated beyond the classification/recognition task in Computer Vision. Below, we review the works that develop approaches to attack deep neural networks beyond classification.

1) ATTACKS ON AUTOENCODERS AND GENERATIVE MODELS
Tabacof et al. [127] investigated adversarial attacks for autoencoders [152], and proposed a technique to distort an input image (to make it adversarial) that misleads the autoencoder into reconstructing a completely different image. Their approach attacks the internal representation of a neural network such that the representation for the adversarial image becomes similar to that of the target image. However, it is reported in [127] that autoencoders seem to be much more robust to adversarial attacks than the typical classifier networks. Kos et al. [120] also explored methods for computing adversarial examples for deep generative models, e.g. the variational autoencoder (VAE) and the VAE-Generative Adversarial Networks (VAE-GANs). GANs, such as [151], are becoming exceedingly popular nowadays in Computer Vision applications due to their ability to learn data distributions and generate realistic images using those distributions. The authors introduced three different classes of attacks for VAE and VAE-GANs. Owing to the success of these attacks, it is concluded that deep generative models are also vulnerable to adversaries that can convince them to turn inputs into very different outputs. This work adds further support to the hypothesis that ‘‘adversarial examples are a general phenomenon for current neural network architectures’’.


2) ATTACK ON RECURRENT NEURAL NETWORKS
Papernot et al. [110] successfully generated adversarial input sequences for Recurrent Neural Networks (RNNs). RNNs are deep learning models that are particularly suitable for learning mappings between sequential inputs and outputs [153]. Papernot et al. demonstrated that the algorithms proposed to compute adversarial examples for feed-forward neural networks (e.g. FGSM [23]) can also be adapted for fooling RNNs. In particular, the authors demonstrated successful fooling of the popular Long Short-Term Memory (LSTM) RNN architecture [154]. It is concluded that cyclic neural network models like RNNs are also not immune to the adversarial perturbations that were originally uncovered in the context of acyclic neural networks, i.e. CNNs.

3) ATTACKS ON DEEP REINFORCEMENT LEARNING
Lin et al. [133] proposed two different adversarial attacks for agents trained by deep reinforcement learning [155]. In the first attack, called the ‘strategically-timed attack’, the adversary minimizes the reward of the agent by attacking it at a small subset of time steps in an episode. A method is proposed to determine when an adversarial example should be crafted and applied, which enables the attack to go undetected. In the second attack, referred to as the ‘enchanting attack’, the adversary lures the agent to a designated target state by integrating a generative model and a planning algorithm. The generative model is used for predicting the future states of the agent, whereas the planning algorithm generates the actions for luring it. The attacks are successfully tested against agents trained by state-of-the-art deep reinforcement learning algorithms [155], [156]. Details on this work and example videos of the adversarial attacks can be found at the following URL: http://yclin.me/adversarial_attack_RL/.

In another work, Huang et al. [62] demonstrated that FGSM [23] can also be used to significantly degrade the performance of trained policies in the context of deep reinforcement learning. Their threat model considers adversaries that are capable of introducing minor perturbations to the raw input of the policy. The conducted experiments demonstrate that it is fairly easy to confuse neural network policies with adversarial examples, even in black-box scenarios. Videos and further details on this work are available at http://rll.berkeley.edu/adversarial/.

4) ATTACKS ON SEMANTIC SEGMENTATION & OBJECT DETECTION
Semantic image segmentation and object detection are among the mainstream problems in Computer Vision. Inspired by Moosavi-Dezfooli et al. [16], Metzen et al. [67] showed the existence of image-agnostic quasi-imperceptible perturbations that can fool a deep neural network into significantly corrupting the predicted segmentation of the images.

FIGURE 4. Adversarial example for semantic segmentation and object detection [115]. FCN [157] and Faster-RCNN [148] are used for segmentation and detection, respectively. Left column (top-down): Clean image, normal segmentation (purple region is predicted as dog) and detection results. Right column (top-down): Perturbation 10x, fooled segmentation (light green region is predicted as train and the pink region as person) and detection results.

Moreover, they also showed that it is possible to compute noise vectors that can remove a specific class from the segmented classes while keeping most of the image segmentation unchanged (e.g. removing pedestrians from road scenes). Although it is argued that the ‘‘space of the adversarial perturbations for the semantic image segmentation is presumably smaller than image classification’’, the perturbations have been shown to generalize well for unseen validation images with high probability. Arnab et al. [51] also evaluated FGSM [23] based adversarial attacks for semantic segmentation and noted that many observations about these attacks for classification do not directly transfer to the segmentation task.

Xie et al. [115] computed adversarial examples for semantic segmentation and object detection under the observation that these tasks can be formulated as classifying multiple targets in an image - the target is a pixel or a receptive field in segmentation, and an object proposal in detection. Under this perspective, their approach, called ‘Dense Adversary Generation’, optimizes a loss function over a set of pixels/proposals to generate adversarial examples. The generated examples are tested to fool a variety of deep learning based segmentation and detection approaches. Their experimental evaluation not only demonstrates successful fooling of the targeted networks but also shows that the generated perturbations generalize well across different network models. In Fig. 4, we show a representative example of network fooling for segmentation and detection using the approach in [115].

5) ATTACKS ON FACE ATTRIBUTES
Face attributes are among the emerging soft biometrics for modern security systems. Although face attribute recognition can also be categorized as a classification problem, we separately review some interesting attacks in this direction because face recognition itself is treated as a mainstream problem in Computer Vision.

Rozsa et al. [129], [158] explored the stability of multiple deep learning approaches using the CelebA benchmark [159] by generating adversarial examples to alter the results of facial attribute recognition, see the top row in Fig. 5. By attacking the deep network classifiers with their so-called ‘Fast Flipping Attribute’ technique, they found that the robustness of deep neural networks against the adversarial attacks varies highly between facial attributes. It is claimed that adversarial attacks are very effective in changing the label of a target attribute to a correlated attribute. Mirjalili and Ross [160] proposed a technique that modifies a face image such that its gender (for a gender classifier) is modified, whereas its biometric utility for a face matching system remains intact, see the bottom row in Fig. 5. Similarly, Shen et al. [143] proposed two different techniques to generate adversarial examples for faces that can have high ‘attractiveness scores’ but low ‘subjective scores’ for face attractiveness evaluation using a deep neural network. We refer to [183] for further attacks related to the task of face recognition.

FIGURE 5. Top row: Example of changing a facial attribute ‘wearing lipstick’ to ‘not wearing lipstick’ by the Fast Flipping Attribute method [129]. Bottom row: Changing gender with a perturbation generated by [160].

IV. ATTACKS IN THE REAL WORLD
The literature reviewed in Section III assumes settings where adversaries directly feed deep neural networks with perturbed images. Moreover, the effectiveness of the attacks is also evaluated using standard image databases. Whereas those settings have proven sufficient to convince many researchers that adversarial attacks are a real concern for deep learning in practice, we also come across instances in the literature (e.g. [30], [48]) where this concern is downplayed and adversarial examples are implied to be ‘only a matter of curiosity’ with little practical concern. Therefore, this section is specifically dedicated to the literature that deals with adversarial attacks in practical real-world conditions, to help settle the debate.

A. CELL-PHONE CAMERA ATTACK
Kurakin et al. [35] first demonstrated that threats of adversarial attacks also exist in the physical world. To illustrate this, they printed adversarial images and took snapshots from a cell-phone camera. These images were fed to the TensorFlow Camera Demo app [179], which uses Google’s Inception model [144] for object classification. It was shown that a large fraction of the images were misclassified even when perceived through the camera. In Fig. 6, an example is shown from the original paper. A video is also provided at the following URL https://youtu.be/zQ_uMenoBCk that shows the threat of adversarial attacks with further images. This work studies the FGSM [23], BIM and ILCM [35] methods for attacks in the physical world.

B. ROAD SIGN ATTACK
Evtimov et al. [75] built on the attacks proposed in [36] and [88] to design robust perturbations for the physical world. They demonstrated the possibility of attacks that are robust to physical conditions, such as variations in view angles, distance and resolution. The proposed algorithm, termed RP2 for Robust Physical Perturbations, was used to generate adversarial examples for road sign recognition systems and achieved high fooling ratios in practical drive-by settings. Two attack classes were introduced in this work for physical road signs: (a) poster-printing, where the attacker prints a perturbed road sign poster and places it over the real sign (see Fig. 7); and (b) sticker perturbation, where the printing is done on paper and the paper is stuck over the real sign. For (b), two types of perturbations were studied: (b1) subtle perturbations that occupy the entire sign, and (b2) camouflage perturbations that take the form of a graffiti sticker on the sign. As such, all of these perturbations require access to a color printer and no other special hardware. The successful generation of perturbations for both (a) and (b), such that the perturbations remained robust to natural variations in the physical world, demonstrates the threat of adversarial examples in the real world. We refer to the following URL for further details and videos related to this work: https://iotsecurity.eecs.umich.edu/#roadsigns.

It should be noted that Lu et al. [30] had previously claimed that adversarial examples are not a concern for object detection in Autonomous Vehicles because of the changing physical conditions in a moving car. However, the attacking methods they employed [22], [23], [35] were somewhat primitive. The findings of Evtimov et al. [75] are orthogonal to the results in [66]. However, in a follow-up work, Lu et al. [19] showed that detectors like YOLO9000 [147] and FasterRCNN [148] are ‘currently’ not fooled by the attacks introduced by Evtimov et al. [75]. Zeng et al. [87] also argue that adversarial perturbations in the image space do not generalize well in the physical space of the real world. However, Athalye et al. [65] showed that we can actually print 3D physical objects for successful adversarial attacks in the physical world. We discuss [65] in Section IV-C.


FIGURE 6. Example of adversarial attack on mobile phone cameras: A clean image (a) was taken and used to generate different adversarial images. The images were printed and the TensorFlow Camera Demo app [179] was used to classify them. A clean image (b) is recognized correctly as a ‘washer’ when perceived through the camera, whereas adversarial images (c) and (d) are mis-classified. The images also show network confidence in the range [0,1] for each image. The value of ε is given for (3).

FIGURE 7. Example of road sign attack [75]: The success rate of fooling the LISA-CNN [75] classifier on all the shown images is 100%. The distance and angle to the camera are also shown. The classifier is trained using the LISA dataset for road signs [174].

Gu et al. [33] also explored an interesting notion of threats to outsourced training of neural networks in the context of fooling neural networks on street signs. They showed that it is possible to train a network (a BadNet) that shows state-of-the-art performance on the user’s training and validation samples, but behaves badly on attacker-chosen inputs. They demonstrated this attack in a realistic scenario by creating a street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign. Moreover, it was found that the fooling of the network persisted to a reasonable extent even when the network was later fine-tuned with additional training data.

C. GENERIC ADVERSARIAL 3D OBJECTS
Athalye et al. [65] introduced a method for constructing 3D objects that can fool neural networks across a wide variety of angles and viewpoints. Their ‘Expectation Over Transformation’ (EOT) framework is able to construct examples that are adversarial over an entire distribution of image/object transformations. Their end-to-end approach is able to print arbitrary adversarial 3D objects. In our opinion, the results of this work ascertain that adversarial attacks are a real concern for deep learning in the physical world. In Fig. 8, we show an example of a 3D-printed turtle that is modified by the EOT framework to be classified as a rifle. A video demonstrating the fooling by EOT in the physical world is available at the following URL: https://www.youtube.com/watch?v=YXy6oX1iNoA&feature=youtu.be.
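The core of EOT is to optimize the perturbation against the expected loss over a distribution of transformations rather than a single view. The sketch below approximates that expectation with a few random transformations per step; `sample_transform` is assumed to apply a differentiable random transformation (e.g. rotation, scaling, lighting change), and the optimizer, step counts and ε-ball are illustrative. The actual 3D-printing pipeline of [65] optimizes over object textures and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def eot_attack(model, clean_image, target_class, sample_transform,
               steps=200, lr=0.01, epsilon=0.1, num_samples=8):
    """Expectation Over Transformation (sketch): minimize the target-class
    cross-entropy averaged over randomly sampled transformations, so the
    perturbation stays adversarial across views."""
    rho = torch.zeros_like(clean_image, requires_grad=True)
    optimizer = torch.optim.Adam([rho], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        loss = 0.0
        for _ in range(num_samples):                            # Monte-Carlo estimate of the expectation
            transformed = sample_transform((clean_image + rho).clamp(0, 1))
            loss = loss + F.cross_entropy(model(transformed.unsqueeze(0)), target)
        loss = loss / num_samples
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            rho.clamp_(-epsilon, epsilon)                       # keep the perturbation small
    return (clean_image + rho.detach()).clamp(0, 1)
```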

D. CYBERSPACE ATTACKS
Papernot et al. [39] launched one of the first attacks against deep neural network classifiers in cyberspace in real-world settings. They trained a substitute network for the targeted black-box classifier on synthetic data, and instantiated the attack against remotely hosted neural networks by MetaMind, Amazon and Google. They were able to show that the respective targeted networks misclassified 84.24%, 96.19% and 88.94% of the adversarial examples generated by their method. Indeed, the only information available to the attacker in their threat model was the output label of the targeted network for the input image fed by the attacker. In a related work, Liu et al. [88] developed an ensemble-based attack and showed its success against Clarifai.com - a commercial company providing state-of-the-art image classification services. The authors claim that their attacks for both targeted and non-targeted fooling are able to achieve high success rates.


FIGURE 8. Different random poses of a 3D-printed turtle perturbed by EOT [65] to be classified as a rifle by an ImageNet classifier. The unperturbed version (not shown) is classified correctly with 100% probability.

Grosse et al. [61] showed the construction of effective adversarial attacks for neural networks used as malware classifiers. As compared to image recognition, the domain of malware classification introduces additional constraints in the adversarial settings, e.g. continuous input domains are replaced by discrete inputs, and the condition of visual similarity is replaced by requiring equivalent functional behavior. However, Grosse et al. [61] showed that creating effective adversarial examples is still possible for malware classification. Further examples of successful adversarial attacks against deep learning based malware classification can also be found in [64], [107], and [124].

E. ROBOTIC VISION & VISUAL QA ATTACKS
Melis et al. [63] demonstrated the vulnerability of robots to adversarial manipulations of the input images using the techniques in [22]. The authors argue that strategies to enforce deep neural networks to learn more stable representations are necessary for secure robotics. Xu et al. [40] generated adversarial attacks for the Visual Turing Test, also known as 'Visual Question Answering' (VQA). The authors show that the commonly used compositional and non-compositional VQA architectures that employ deep neural networks are vulnerable to adversarial attacks. Moreover, the adversarial examples are transferable between the models. They conclude that ''adversarial examples pose real threats to not only image classification models, but also more complicated VQA models'' [40].

V. ON THE EXISTENCE OF ADVERSARIAL EXAMPLES
In the literature related to adversarial attacks on deep learning in Computer Vision, there are varied views on the existence of adversarial examples. These views generally align well with the local empirical observations made by the researchers while attacking or defending the deep neural networks. However, they often fall short in terms of generalization. For instance, the popular linearity hypothesis of Goodfellow et al. [23] explains the FGSM and related attacks very well. However, Tanay and Griffin [74] demonstrated image classes that do not suffer from adversarial examples for a linear classifier, which is not in line with the linearity hypothesis. Not to mention, the linearity hypothesis itself deviates strongly from the previously prevailing opinion that adversarial examples stem from the highly non-linear decision boundaries induced by deep neural networks. There are also other examples in the literature where the linearity hypothesis is not directly supported [118].

Flatness of decision boundaries [69], large local curvature of the decision boundaries [70] and low flexibility of the networks [71] are some more examples of the viewpoints on the existence of adversarial examples that do not perfectly align with each other. Whereas it is apparent that adversarial examples can be formed by modifying as little as one pixel in an image, the current literature seems to lack consensus on the reasons for the existence of adversarial examples. This fact also makes analysis of adversarial examples an active research direction that is expected to explore and explain the nature of the decision boundaries induced by deep neural networks, which are currently more commonly treated as black-box models. Below, we review the works that mainly focus on analyzing the existence of adversarial perturbations for deep learning. We note that, besides the literature reviewed below, works related to adversarial attacks (Section III) and defenses (Section VI) often provide brief analysis of adversarial perturbations while conjecturing about the phenomena resulting in the existence of adversarial examples.

A. LIMITS ON ADVERSARIAL ROBUSTNESS
Fawzi et al. [117] introduced a framework for studying the instability of classifiers to adversarial perturbations. They established fundamental limits on the robustness of classifiers in terms of a 'distinguishability measure' between the classes of the dataset, where distinguishability is defined as the distance between the means of two classes for linear classifiers and the distance between the matrices of second order moments for the studied non-linear classifiers. This work shows that adversarial examples also exist for classifiers beyond deep neural networks. The presented analysis traces back the phenomenon of adversarial instability to the low flexibility of the classifiers, which is not completely orthogonal to the then prevailing belief that the high non-linearity of the networks makes them susceptible to adversarial examples.

B. SPACE OF ADVERSARIAL EXAMPLES
Tabacof and Valle [25] generated adversarial examples for shallow and deep network classifiers on the MNIST [10] and ImageNet [11] datasets and probed the pixel space of adversarial examples by using noise of varying distribution and intensity. The authors empirically demonstrated that adversarial examples appear in large regions of the pixel space, which is in line with a similar claim in [23]. However, somewhat in contrast to the linearity hypothesis, they argue that a weak, shallow and more linear classifier is as susceptible to adversarial examples as a strong deep classifier.

Tramer et al. [131] proposed a method to estimate the dimensionality of the space of adversarial examples. It is claimed that the adversarial examples span a contiguous high-dimensional space (e.g. with dimensionality ≈ 25). Due to the high dimensionality, the subspaces of different classifiers can intersect, which gives rise to the transferability of adversarial examples. Interestingly, their analysis suggests that it is possible to defend classifiers against transfer-based attacks even when they are vulnerable to direct attacks.

C. BOUNDARY TILTING PERSPECTIVE
Tanay and Griffin [74] provided a 'boundary tilting' perspective on the existence of adversarial examples for deep neural networks. They argued that, generally, the single-class data that is sampled to learn and evaluate a classifier lives in a sub-manifold of the class, and adversarial examples for that class exist when the classification boundary lies close to this sub-manifold. They formalized the notion of 'adversarial strength' of a classifier and reduced it to the 'deviation angle' between the boundaries of the considered classifier and the nearest centroid classifier. It is then shown that the adversarial strength of a classifier can be varied by decision 'boundary tilting'. The authors also argued that the adversarial stability of a classifier is associated with its regularization. In the opinion of Tanay and Griffin, the linearity hypothesis [23] about the existence of adversarial examples is ''unconvincing''.

D. PREDICTION UNCERTAINTY AND EVOLUTIONARY STALLING OF TRAINING CAUSE ADVERSARIES
Cubuk et al. [91] argue that the ''origin of adversarial examples is primarily due to an inherent uncertainty that neural networks have about their predictions''. They empirically compute a functional form of this uncertainty, which is shown to be independent of network architecture, training protocol and dataset. It is argued that this form only depends on the statistics of the network's logit differences. This eventually results in the fooling ratios caused by adversarial attacks exhibiting a universal scaling with respect to the size of the perturbation. They studied FGSM [23], ILCM and BIM [35] based attacks to corroborate their claims. It is also claimed that the accuracy of a network on clean images correlates with its adversarial robustness (see Section V-E for more arguments in this direction).

Rozsa et al. [102] hypothesized that the existence of adversarial perturbations is a result of evolutionary stalling of decision boundaries on training images. In their opinion, individual training samples stop contributing to the training loss of the model (i.e. neural network) once they are classified correctly, which can eventually leave them close to the decision boundary. Hence, it becomes possible to push those (and similar) samples into a wrong class region by adding minor perturbations. They proposed a Batch Adjusted Network Gradients (BANG) algorithm to train a network to mitigate the evolutionary stalling during training.

E. ACCURACY-ADVERSARIAL ROBUSTNESS CORRELATION
In the quest of explaining the existence of adversarial perturbations, Rozsa et al. [97] empirically analyzed the correlation between the accuracy of eight deep network classifiers and their robustness to three adversarial attacks introduced in [23] and [94]. The studied classifiers include AlexNet [9], the VGG-16 and VGG-19 networks [161], the Berkeley-trained version of GoogLeNet and Princeton-GoogLeNet [18], and ResNet-52, ResNet-101 and ResNet-152 [15]. The adversarial examples are generated with the help of the large-scale ImageNet dataset [11] using the techniques proposed in [23] and [94]. Their experiments lead to the observation that networks with higher classification accuracy generally also exhibit more robustness against adversarial examples. They also concluded that adversarial examples transfer better between similar network topologies.

F. MORE ON LINEARITY AS THE SOURCE
Krotov and Hopfield [126] examined the existence of adversarial perturbations in the context of Dense Associative Memory (DAM) models [162]. As compared to typical modern deep neural networks, DAM models employ higher order (more than quadratic) interactions between the neurons. The authors demonstrated that adversarial examples generated using DAM models with a smaller interaction power, which is similar to using a deep neural network with ReLU activation [163] for inducing linearity, are unable to fool models having higher order interactions. The authors provided empirical evidence on the existence of adversarial examples that is independent of the FGSM [23] attack, yet supports the linearity hypothesis of Goodfellow et al. [23].

G. EXISTENCE OF UNIVERSAL PERTURBATIONS
Moosavi-Dezfooli et al. [16] initially argued that universal adversarial perturbations exploit geometric correlations between the decision boundaries induced by the classifiers. Their existence partly owes to a subspace containing normals to the decision boundaries, such that the normals also surround the natural images. In [70], they built further on their theory and showed the existence of common directions (shared across datapoints) along which the decision boundary of a classifier can be highly positively curved. They argue that such directions play a key role in the existence of universal perturbations. Based on their findings, the authors also propose a new geometric method to efficiently compute universal adversarial perturbations.

It is worth noting that previously Fawzi et al. [69] also associated the theoretical bounds on the robustness of classifiers with the curvature of decision boundaries. Similarly, Tramer et al. [77] also held the curvature of decision boundaries in the vicinity of data points responsible for the vulnerability of neural networks to black-box attacks.

FIGURE 9. Broad categorization of approaches aimed at defending deep neural networks against adversarial attacks.

VI. DEFENSES AGAINST ADVERSARIAL ATTACKS
Currently, the defenses against adversarial attacks are being developed along three main directions:

1) Using modified training during learning or modified input during testing.
2) Modifying networks, e.g. by adding more layers/sub-networks, changing loss/activation functions, etc.
3) Using external models as a network add-on when classifying unseen examples.

The approaches along the first direction do not directly deal with the learning models. On the other hand, the other two categories are more concerned with the neural networks themselves. The techniques under these categories can be further divided into two types, namely (a) complete defense and (b) detection only. The 'complete defense' approaches aim at enabling the targeted network to achieve its original goal on adversarial examples, e.g. a classifier predicting labels of adversarial examples with acceptable accuracy. On the other hand, 'detection only' approaches are meant to raise a red flag on potentially adversarial examples so that they can be rejected in any further processing. The taxonomy of the described categories is also shown in Fig. 9. The remainder of this section is organized according to this taxonomy. In the used taxonomy, the difference between 'modifying' a network and employing an 'add-on' is that the former makes changes to the original deep neural network architecture/parameters during training. On the other hand, the latter keeps the original model intact and appends external model(s) to it during testing.

A. MODIFIED TRAINING/INPUT
1) BRUTE-FORCE ADVERSARIAL TRAINING
Since the discovery of adversarial examples for deep neural networks [22], there has been a general consensus in the related literature that the robustness of neural networks against these examples improves with adversarial training. Therefore, most of the contributions introducing new adversarial attacks, e.g. [22], [23], [72] (see Section III), simultaneously propose adversarial training as the first line of defense against those attacks. Although adversarial training improves the robustness of a network, it is a non-adaptive strategy that requires training to be performed using strong attacks and the architecture of the network to be sufficiently expressive. Since adversarial training necessitates increased training/data size, we refer to it as a 'brute-force' strategy.

It is also commonly observed in the literature that brute-force adversarial training results in regularizing the network (e.g. see [23], [90]) to reduce over-fitting, which in turn improves the robustness of the networks against adversarial attacks. Inspired by this observation, Miyato et al. [113] proposed a 'Virtual Adversarial Training' approach to smooth the output distributions of the neural networks. A related 'stability training' method was also proposed by Zheng et al. [116] to improve the robustness of neural networks against small distortions to input images. It is noteworthy that, whereas adversarial training is known to improve the robustness of neural networks, Moosavi-Dezfooli et al. [16] showed that effective adversarial examples can again be computed for already adversarially trained networks.
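As an illustration of the brute-force strategy, the following is a minimal adversarial training step that mixes clean and FGSM-perturbed mini-batches. The 50/50 mixing ratio and the value of ε are illustrative choices, not values prescribed by the reviewed works.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM [23]: perturb the input along the sign of the loss gradient."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, opt, x, y, eps=0.03):
    """Train on an even mix of clean and adversarially perturbed samples."""
    x_adv = fgsm(model, x, y, eps)
    opt.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```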

2) DATA COMPRESSION AS DEFENSE
Dziugaite et al. [122] noted that most of the popular image classification datasets comprise JPG images. Motivated by this observation, they studied the effects of JPG compression on the perturbations computed by FGSM [23]. It is reported that JPG compression can actually reverse the drop in classification accuracy to a large extent for the FGSM perturbations. Nevertheless, it is concluded that compression alone is far from an effective defense. JPEG compression was also studied by Guo et al. [82] for mitigating the effectiveness of adversarial images. Das et al. [37] also took a similar approach and used JPEG compression to remove the high-frequency components from images and proposed an ensemble-based technique to counter adversarial attacks generated by FGSM [23] and the DeepFool method [72]. Whereas encouraging results are reported in [37], there is no analysis provided for stronger attacks, e.g. the C&W attacks [36]. Moreover, Shin and Song [184] have demonstrated the existence of adversarial examples that can survive JPEG compression. Compression under the Discrete Cosine Transform (DCT) was also found inadequate as a defense against the universal perturbations [16] in our previous work [81]. One major limitation of compression-based defenses is that larger compressions also result in a loss of classification accuracy on clean images, whereas smaller compressions often do not adequately remove the adversarial perturbations.
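A minimal sketch of such a compression-based pre-processing step is given below; it simply re-encodes the input as JPEG before classification, with an illustrative quality factor, and does not reproduce the ensemble machinery of [37].

```python
import io
import numpy as np
from PIL import Image

def jpeg_purify(image_uint8, quality=75):
    """Re-encode an HxWx3 uint8 image as JPEG to suppress high-frequency perturbations."""
    buf = io.BytesIO()
    Image.fromarray(image_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))
```

The re-encoded image is then fed to the unchanged classifier; as noted above, the quality factor trades off removal of the perturbation against accuracy on clean images.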

In another related approach, Bhagoji et al. [167] proposed to compress input data using Principal Component Analysis for adversarial robustness. However, Xu et al. [139] noted that this compression also corrupts the spatial structure of the image, hence often adversely affecting the classification performance.

3) FOVEATION BASED DEFENSE
Luo et al. [118] demonstrated that significant robustness against the adversarial attacks using L-BFGS [22] and FGSM [23] is possible with a 'foveation' mechanism, i.e. applying the neural network to different regions of the image. It is hypothesized that CNN-based classifiers trained on large datasets, such as ImageNet [11], are generally robust to scale and translation variations of objects in images. However, this invariance does not extend to adversarial patterns in the images. This makes foveation a viable option for reducing the effectiveness of the adversarial attacks proposed in [22] and [23]. However, foveation is yet to demonstrate its effectiveness against more powerful attacks.

4) DATA RANDOMIZATION AND OTHER METHODS
Xie et al. [79] showed that random resizing of the adversarial examples reduces their effectiveness. Moreover, adding random padding to such examples also reduces the fooling rates of the networks. Wang et al. [137] transformed the input data with a separate data-transformation module to remove possible adversarial perturbations in images. In the literature, we also find evidence that data augmentation during training (e.g. Gaussian data augmentation [46]) also helps in improving the robustness of neural networks to adversarial attacks, albeit only slightly.
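A simplified sketch of the resizing-and-padding randomization described above follows. The original defense resizes to a random larger canvas at inference time, whereas this variant shrinks the image and zero-pads it back to the input size; the padding budget is illustrative.

```python
import random
import torch
import torch.nn.functional as F

def randomize_input(x, max_pad=32):
    """x: (N, C, H, W) in [0, 1]. Returns a randomly resized and randomly padded copy."""
    n, c, h, w = x.shape
    new_h = random.randint(max(1, h - max_pad), h)   # shrink height by up to max_pad pixels
    new_w = random.randint(max(1, w - max_pad), w)   # shrink width by up to max_pad pixels
    x = F.interpolate(x, size=(new_h, new_w), mode="bilinear", align_corners=False)
    pad_h, pad_w = h - new_h, w - new_w
    top, left = random.randint(0, pad_h), random.randint(0, pad_w)
    # F.pad order for 4D tensors: (left, right, top, bottom)
    return F.pad(x, (left, pad_w - left, top, pad_h - top), value=0.0)
```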

B. MODIFYING THE NETWORK
For the approaches that modify the neural networks for defense against adversarial attacks, we first discuss the 'complete defense' approaches. The 'detection only' approaches are separately reviewed in Section VI-B.8.

1) DEEP CONTRACTIVE NETWORKS
In the early attempts at making deep learning robust to adversarial attacks, Gu and Rigazio [24] introduced Deep Contractive Networks (DCNs). It was shown that Denoising Auto Encoders [152] can reduce adversarial noise, however stacking them with the original networks can make the resulting network even more vulnerable to perturbations. Based on this observation, the training procedure of DCNs used a smoothness penalty similar to Contractive Auto Encoders [171]. Whereas reasonable robustness of DCNs was demonstrated against the L-BFGS [22] based attacks, many stronger attacks have been introduced since DCNs were initially proposed. A related concept of using auto encoders for the adversarial robustness of neural networks can also be found in [140].

2) GRADIENT REGULARIZATION/MASKING
Ross and Doshi-Velez [52] studied input gradient regularization [165] as a method for adversarial robustness. Their method trains differentiable models (e.g. deep neural networks) while penalizing the degree of variation of the output with respect to changes in the input. This implies that a small adversarial perturbation becomes unlikely to change the output of the trained model drastically. It is shown that this method, when combined with brute-force adversarial training, can result in very good robustness against attacks like FGSM [23] and JSMA [60]. However, each of these methods almost doubles the training complexity of a network, which is already prohibitive in many cases.

Previously, Lyu et al. [28] also used the notion of penalizing the gradient of the loss function of network models with respect to the inputs to incorporate robustness in the networks against L-BFGS [22] and FGSM [23] based attacks. Similarly, Shaham et al. [27] attempted to improve the local stability of neural networks by minimizing the loss of a model over adversarial examples at each parameter update. They minimized the loss of their model over worst-case adversarial examples instead of the original data. In a related work, Nguyen and Sinha [44] introduced a masking based defense against the C&W attack [36] by adding noise to the logit outputs of networks.
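The following sketch illustrates the shared idea behind these gradient-penalty defenses, assuming a differentiable PyTorch classifier. The squared-l2 penalty and its weight are illustrative and do not reproduce the exact objectives of [52], [28] or [27].

```python
import torch
import torch.nn.functional as F

def gradient_regularized_step(model, opt, x, y, lam=0.1):
    """One training step that penalizes the sensitivity of the loss to the input."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the input; create_graph=True allows backprop through it.
    grad_x, = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad_x.pow(2).sum(dim=(1, 2, 3)).mean()
    total = loss + lam * penalty
    opt.zero_grad()
    total.backward()
    opt.step()
    return total.item()
```

The double backpropagation required for the penalty is what roughly doubles the training cost mentioned above.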

3) DEFENSIVE DISTILLATION
Papernot et al. [38] exploited the notion of 'distillation' [164] to make deep neural networks robust against adversarial attacks. Distillation was introduced by Hinton et al. [164] as a training procedure to transfer the knowledge of a more complex network to a smaller network. The variant of the procedure introduced by Papernot et al. [38] essentially uses the knowledge of the network to improve its own robustness. The knowledge is extracted in the form of class probability vectors of the training data and is fed back to train the original model. It is shown that doing so improves the resilience of a network to small perturbations in the images. Further empirical evidence in this regard is also provided in [108]. Moreover, in a follow-up work, Papernot and McDaniel [84] extended the defensive distillation method by addressing the numerical instabilities that were encountered in [38]. It is worth noting that the 'Carlini and Wagner' (C&W) attacks [36] introduced in Section III-A are claimed to be successful against the defensive distillation technique. We also note that defensive distillation can be seen as an example of a 'gradient masking' technique. However, we describe it separately in view of its popularity in the literature.
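A minimal sketch of the distillation loop follows: soft labels are extracted from the initially trained network at temperature T and used to retrain a model of the same architecture at the same temperature. The value of T and the training details are illustrative.

```python
import torch
import torch.nn.functional as F

def soft_labels(teacher, x, T=20.0):
    """Class-probability vectors of the initially trained network at temperature T."""
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distillation_step(student, opt, x, soft_y, T=20.0):
    """Train the distilled network against the soft labels (cross-entropy at temperature T)."""
    log_p = F.log_softmax(student(x) / T, dim=1)
    loss = -(soft_y * log_p).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```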

4) BIOLOGICALLY INSPIRED PROTECTION
Nayebi and Ganguli [123] demonstrated natural robustness of neural networks against adversarial attacks with highly non-linear activations (similar to non-linear dendritic computations). It is noteworthy that the Dense Associative Memory models of Krotov and Hopfield [126] also work on a similar principle for robustness against adversarial examples. Considering the linearity hypothesis of Goodfellow et al. [23], the works in [123] and [126] seem to further the notion that the susceptibility of modern neural networks to adversarial examples is an effect of the linearity of activations. We note that Brendel and Bethge [185] claim that the attacks fail on the biologically inspired protection [123] due to numerical limitations of the computations. Stabilizing the computations again allows successful attacks on the protected networks.

5) PARSEVAL NETWORKS
Cisse et al. [130] proposed 'Parseval' networks as a defense against adversarial attacks. These networks employ a layer-wise regularization by controlling the global Lipschitz constant of the network. Considering that a network can be seen as a composition of functions (one at each layer), robustification against small input perturbations is possible by maintaining a small Lipschitz constant for these functions. Cisse et al. proposed to do so by controlling the spectral norm of the weight matrices of the networks by parameterizing them with 'Parseval tight frames' [170], hence the name 'Parseval' networks.
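The sketch below conveys the underlying principle of capping layer-wise Lipschitz constants by constraining the spectral norm of the weight matrices; it uses PyTorch's built-in spectral normalization as a simpler stand-in and is not the Parseval tight-frame projection of [130]. The layer sizes are illustrative.

```python
import torch.nn as nn

# A classifier whose linear layers have spectral norm <= 1; since ReLU is 1-Lipschitz,
# each layer (and hence the composition) has a bounded Lipschitz constant.
model = nn.Sequential(
    nn.Flatten(),
    nn.utils.spectral_norm(nn.Linear(28 * 28, 256)),
    nn.ReLU(),
    nn.utils.spectral_norm(nn.Linear(256, 10)),
)
```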

6) DeepCloak
Gao et al. [138] proposed to insert a masking layer immediately before the layer handling the classification. The added layer is explicitly trained by forward-passing clean and adversarial pairs of images, and it encodes the differences between the output features of the previous layers for those image pairs. It is argued that the most dominant weights in the added layer correspond to the most sensitive features of the network (in terms of adversarial manipulation). Therefore, while classifying, those features are masked by forcing the dominant weights of the added layer to zero.

7) MISCELLANEOUS APPROACHES
Among other notable efforts in making neural networks robust to adversarial attacks, Zantedeschi et al. [46] proposed to use bounded ReLU [172] to reduce the effectiveness of adversarial patterns in images. Jin et al. [119] introduced a feedforward CNN that used additive noise to mitigate the effects of adversarial examples. Sun et al. [56] proposed 'HyperNetworks' that use statistical filtering as a method to make the network robust. Madry et al. [55] studied adversarial defense from the perspective of robust optimization. They showed that adversarial training with a PGD adversary can successfully defend against a range of other adversaries. Later, Carlini et al. [59] also verified this observation. Na et al. [85] employed a network that is regularized with a unified embedding for classification and low-level similarity learning. The network is penalized using the distance between clean and the corresponding adversarial embeddings. Strauss et al. [89] studied ensembles of methods to defend a network against the perturbations. Kadran and Stanley [135] modified the output layer of a neural network to induce robustness against adversarial attacks. Wang et al. [121], [128] developed adversary-resistant neural networks by leveraging non-invertible data transformations in the network. Lee et al. [106] developed manifold regularized networks that use a training objective to minimize the difference between the multi-layer embeddings of clean and adversarial images. Kolter and Wong [96] proposed to learn a ReLU-based classifier that shows robustness against small adversarial perturbations. They train a neural network that provably achieves high accuracy (>90%) against any adversary in a canonical setting (ε = 0.1 for ℓ∞-norm perturbation on MNIST). However, given that their approach is computationally infeasible to apply to larger networks, the only defenses that have been extensively evaluated are those of Madry et al. [55], giving 89% accuracy against large ε (0.3/1) on MNIST and 45% for moderate ε (8/255) on CIFAR for the ℓ∞-norm perturbations.

8) DETECTION ONLY APPROACHES
a: SafetyNet
Lu et al. [66] hypothesized that adversarial examples produce different patterns of ReLU activations in (the late stages of) networks than what is produced by clean images. Based on this hypothesis, they proposed to append a Radial Basis Function SVM classifier to the targeted models such that the SVM uses discrete codes computed by the late-stage ReLUs of the network. To detect perturbation in a test image, its code is compared against those of the training samples using the SVM. Effective detection of adversarial examples generated by [23], [35], and [72] is demonstrated by their framework, named SafetyNet.

b: DETECTOR SUBNETWORK
Metzen et al. [78] proposed to augment a targeted network with a subnetwork that is trained for the binary classification task of detecting adversarial perturbations in inputs. It is shown that appending such a network to the internal layers of a model and using adversarial training can help in detecting perturbations generated using the FGSM [23], BIM [35] and DeepFool [72] methods. However, Lu et al. [66] later showed that this approach is again vulnerable to counter-counter measures.

c: EXPLOITING CONVOLUTION FILTER STATISTICS
Li and Li [105] used statistics of the convolution filters in CNN-based neural networks to classify the input images as clean or adversarial. A cascaded classifier is designed that uses these statistics, and it is shown to detect more than 85% of the adversarial images generated by the methods in [22] and [114].

d: ADDITIONAL CLASS AUGMENTATION
Grosse et al. [57] proposed to augment the potentially targeted neural network model with an additional class in which the model is trained to classify all the adversarial examples. Hosseini et al. [32] also employed a similar strategy to detect black-box attacks.


FIGURE 10. Illustration of defense against universal perturbations [81]: The approach rectifies an image to restore the network prediction. The pattern removed by rectification is separately analyzed to detect perturbation.

C. NETWORK ADD-ONS
1) DEFENSE AGAINST UNIVERSAL PERTURBATIONS
Akhtar et al. [81] proposed a defense framework against the adversarial attacks generated using universal perturbations [16]. The framework appends extra 'pre-input' layers to the targeted network and trains them to rectify a perturbed image so that the classifier's prediction becomes the same as its prediction on the clean version of the same image. The pre-input layers are termed the Perturbation Rectifying Network (PRN), and they are trained without updating the parameters of the targeted network. A separate detector is trained by extracting features from the input-output differences of the PRN for the training images. A test image is first passed through the PRN and then its features are used to detect perturbations. If adversarial perturbations are detected, the output of the PRN is used to classify the test image. Fig. 10 illustrates the rectification performed by the PRN. The removed patterns are separately analyzed for detection.
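A minimal sketch of training such 'pre-input' rectification layers is shown below; the rectifier architecture is illustrative, and the loss simply matches the frozen classifier's predictions on clean images, which captures the spirit, though not the exact training objective, of the PRN framework [81].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative 'pre-input' rectifier; the targeted classifier remains untouched.
rectifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)

def prn_step(classifier, rectifier, opt, x_clean, x_pert):
    """One update of the rectifier; x_pert would be x_clean plus a universal perturbation."""
    classifier.eval()
    with torch.no_grad():
        target = classifier(x_clean).argmax(dim=1)       # prediction on the clean image
    loss = F.cross_entropy(classifier(rectifier(x_pert)), target)
    opt.zero_grad()          # opt contains only the rectifier's parameters
    loss.backward()
    opt.step()
    return loss.item()
```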

2) GAN-BASED DEFENSE
Lee et al. [101] used the popular framework of Generative Adversarial Networks [151] to train a network that is robust to FGSM [23] like attacks. The authors proposed to directly train the network along with a generator network that attempts to generate perturbations for that network. During its training, the classifier keeps trying to correctly classify both the clean and perturbed images. We categorize this technique as an 'add-on' approach because the authors propose to always train any network in this fashion. In another GAN-based defense, Shen et al. [58] use the generator part of the network to rectify a perturbed image.

3) DETECTION ONLY APPROACHES
a: FEATURE SQUEEZING
Xu et al. [43] proposed to use feature squeezing to detect adversarial perturbations to an image. They added two external models to the classifier network, such that these models reduce the color bit depth of each pixel in the image and perform spatial smoothing over the image. The predictions of the targeted network over the original image and the squeezed images are compared. If a large difference is found between the predictions, the image is considered to be an adversarial example. Whereas [43] demonstrated the effectiveness of this approach against more classical attacks [23], a follow-up work [139] also claims that the method works reasonably well against the more powerful C&W attacks [36]. He et al. [76] also combined feature squeezing with the ensemble method proposed in [173] to show that the strength of defenses does not always increase by combining them.
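The following sketch illustrates the detection logic: compare the classifier's softmax output on the original input with its outputs on bit-depth-reduced and median-smoothed copies, and flag the input when the disagreement is large. The squeezers, the distance measure and the threshold are illustrative rather than the exact configuration of [43].

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def median_smooth(x, k=3):
    """kxk median filter over a (N, C, H, W) tensor in [0, 1]."""
    pad = k // 2
    patches = F.unfold(F.pad(x, (pad,) * 4, mode="reflect"), kernel_size=k)
    n, _, l = patches.shape
    patches = patches.view(n, x.shape[1], k * k, l)
    return patches.median(dim=2).values.view_as(x)

def is_adversarial(model, x, threshold=1.0):
    """Flag inputs whose predictions change strongly under feature squeezing."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)
        p_bits = F.softmax(model(reduce_bit_depth(x)), dim=1)
        p_smooth = F.softmax(model(median_smooth(x)), dim=1)
    score = torch.max((p - p_bits).abs().sum(dim=1), (p - p_smooth).abs().sum(dim=1))
    return score > threshold
```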

b: MagNet
Meng and Chen [45] proposed a framework that uses one or more external detectors to classify an input image as adversarial or clean. During training, the framework aims at learning the manifold of clean images. In the testing phase, images that are found to be far from the manifold are treated as adversarial and are rejected. The images that are close to the manifold (but not exactly on it) are always reformed to lie on the manifold, and the classifier is fed with the reformed images. The notion of attracting nearby images to the manifold of clean images and dropping the far-off images also inspires the name of the framework, i.e. MagNet. It is noteworthy that Carlini and Wagner [186] very recently demonstrated that this defense technique can also be defeated with slightly larger perturbations.
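A minimal sketch of the detector/reformer pairing is given below, using a small convolutional autoencoder trained on clean data as a stand-in for the manifold model; the architecture and the threshold are illustrative, not those of [45].

```python
import torch
import torch.nn as nn

# Illustrative autoencoder; assumed to be trained beforehand on clean images
# with a standard reconstruction loss (training loop not shown).
autoencoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
)

def magnet_filter(classifier, autoencoder, x, threshold=0.05):
    """Reject inputs far from the clean-image manifold; classify the reformed ones."""
    with torch.no_grad():
        recon = autoencoder(x)
        err = (recon - x).abs().mean(dim=(1, 2, 3))   # per-image reconstruction error
        accept = err <= threshold                     # large error => likely adversarial
        logits = classifier(recon)                    # feed the reformed images to the classifier
    return accept, logits
```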

c: MISCELLANEOUS METHODS
Liang et al. [50] treated perturbations to images as noise and used scalar quantization and a spatial smoothing filter to separately detect such perturbations. In a related approach, Feinman et al. [86] proposed to detect adversarial perturbations by harnessing uncertainty estimates (of dropout neural networks) and performing density estimation in the feature space of neural networks. Eventually, separate binary classifiers are trained as adversarial example detectors using the proposed features. Gebhart and Schrater [92] viewed neural network computation as information flow in graphs and proposed a method to detect adversarial perturbations by applying persistent homology to the induced graphs.

VII. OUTLOOK OF THE RESEARCH DIRECTION
In the previous sections, we presented a comprehensive review of the recent literature on adversarial attacks on deep learning. Whereas several interesting facts were reported in those sections along with the technical details, below we make more general observations regarding this emerging research direction. The discussion presents a broader outlook to readers without in-depth technical knowledge of this area. Our arguments are based on the literature reviewed above.

d: THE THREAT IS REAL
Whereas a few works suggest that adversarial attacks on deep learning may not be a serious concern, a large body of the related literature indicates otherwise. The literature reviewed in Sections III and IV clearly demonstrates that adversarial attacks can severely degrade the performance of deep learning techniques on multiple Computer Vision tasks, and beyond. In particular, the literature reviewed in Section IV ascertains that deep learning is vulnerable to adversarial attacks in the real physical world. Therefore, we can conclusively argue that adversarial attacks pose a real threat to deep learning in practice.

e: ADVERSARIAL VULNERABILITY IS A GENERAL PHENOMENON
The reviewed literature shows successful fooling of different types of deep neural networks, e.g. MLPs, CNNs and RNNs, on a variety of Computer Vision tasks, e.g. recognition, segmentation and detection. Although most of the existing works focus on fooling deep learning on the task of classification/recognition, based on the surveyed literature we can easily observe that deep learning approaches are vulnerable to adversarial attacks in general.

f: ADVERSARIAL EXAMPLES OFTEN GENERALIZE WELL
One of the most common properties of adversarial examples reported in the literature is that they transfer well between different neural networks. This is especially true for networks that have relatively similar architectures. The generalization of adversarial examples is often exploited in black-box attacks.

g: REASONS OF ADVERSARIAL VULNERABILITY NEED MORE INVESTIGATION
There are varied view-points in the literature on the reasons behind the vulnerability of deep neural networks to subtle adversarial perturbations. Often, these view-points are not well-aligned with each other. There is an obvious need for systematic investigation in this direction.

h: LINEARITY DOES PROMOTE VULNERABILITY
Goodfellow et al. [23] first suggested that the design of modern deep neural networks, which forces them to behave linearly in high dimensional spaces, also makes them vulnerable to adversarial attacks. Although popular, this notion has also faced some opposition in the literature. Our survey pointed out multiple independent contributions that hold the linearity of neural networks accountable for their vulnerability to adversarial attacks. Based on this fact, we can argue that linearity does promote the vulnerability of deep neural networks to adversarial attacks. However, it does not seem to be the only reason behind the successful fooling of deep neural networks with cheap analytical perturbations.

i: COUNTER-COUNTER MEASURES ARE POSSIBLE
Whereas multiple defense techniques exist to counter adversarial attacks, it is often shown in the literature that a defended model can again be successfully attacked by devising counter-counter measures, e.g. see [49]. This observation necessitates that new defenses also provide an estimate of their robustness against obvious counter-counter measures.

j: HIGHLY ACTIVE RESEARCH DIRECTION
The profound implications of the vulnerability of deep neural networks to adversarial perturbations have made research in adversarial attacks and their defenses highly active in recent times. The majority of the literature reviewed in this survey surfaced in the last two years, and there is currently a continuous influx of contributions in this direction. On the one hand, techniques are being proposed to defend neural networks against the known attacks; on the other, more and more powerful attacks are being devised. Recently, a Kaggle competition was also organized for defenses against adversarial attacks (https://www.kaggle.com/c/nips-2017-defense-against-adversarial-attack/). It can be hoped that this high research activity will eventually result in making deep learning approaches robust enough to be used in safety and security critical applications in the real world.

VIII. CONCLUSION
This article presented the first comprehensive survey in the direction of adversarial attacks on deep learning in Computer Vision. Despite the high accuracies of deep neural networks on a wide variety of Computer Vision tasks, they have been found vulnerable to subtle input perturbations that lead them to completely change their outputs. With deep learning at the heart of the current advances in machine learning and artificial intelligence, this finding has resulted in numerous recent contributions that devise adversarial attacks and their defenses for deep learning. This article reviews these contributions, mainly focusing on the most influential and interesting works in the literature. From the reviewed literature, it is apparent that adversarial attacks are a real threat to deep learning in practice, especially in safety and security critical applications. The existing literature demonstrates that deep learning can currently be effectively attacked not only in cyberspace but also in the physical world. However, owing to the very high activity in this research direction, it can be hoped that deep learning will be able to show considerable robustness against adversarial attacks in the future.

ACKNOWLEDGEMENT
The authors acknowledge Nicholas Carlini (UC Berkeley) and Dimitris Tsipras (MIT) for providing useful feedback to improve paper quality.


REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, ''Deep learning,'' Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] M. Helmstaedter, K. L. Briggman, S. C. Turaga, V. Jain, H. S. Seung, and W. Denk, ''Connectomic reconstruction of the inner plexiform layer in the mouse retina,'' Nature, vol. 500, no. 7461, pp. 168–174, 2013.
[3] H. Y. Xiong et al., ''The human splicing code reveals new insights into the genetic determinants of disease,'' Science, vol. 347, no. 6218, p. 1254806, 2015.
[4] J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, and V. Svetnik, ''Deep neural nets as a method for quantitative structure–activity relationships,'' J. Chem. Inf. Model., vol. 55, no. 2, pp. 263–274, 2015.
[5] T. Ciodaro, D. Deva, J. M. de Seixas, and D. Damazio, ''Online particle detection with neural networks based on topological calorimetry information,'' J. Phys., Conf. Series, vol. 368, no. 1, p. 012030, 2012.
[6] Kaggle. (2014). Higgs Boson Machine Learning Challenge. [Online]. Available: https://www.kaggle.com/c/higgs-boson
[7] G. Hinton et al., ''Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,'' IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[8] I. Sutskever, O. Vinyals, and Q. V. Le, ''Sequence to sequence learning with neural networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[10] Y. LeCun et al., ''Backpropagation applied to handwritten zip code recognition,'' Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[11] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, ''ImageNet: A large-scale hierarchical image database,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[12] E. Ackerman. How Drive.ai is Mastering Autonomous Driving With Deep Learning. Accessed: Dec. 2017. [Online]. Available: https://spectrum.ieee.org/cars-that-think/transportation/self-driving/how-driveai-is-mastering-autonomous-driving-with-deep-learning
[13] M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic, ''Deep learning applications and challenges in big data analytics,'' J. Big Data, vol. 2, no. 1, pp. 1–21, 2015.
[14] D. Silver et al., ''Mastering the game of go without human knowledge,'' Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[15] K. He, X. Zhang, S. Ren, and J. Sun, ''Deep residual learning for image recognition,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[16] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, ''Universal adversarial perturbations,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 86–94.
[17] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. (2014). ''Return of the devil in the details: Delving deep into convolutional nets.'' [Online]. Available: https://arxiv.org/abs/1405.3531
[18] C. Szegedy et al., ''Going deeper with convolutions,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1–9.
[19] J. Lu, H. Sibai, E. Fabry, and D. Forsyth. (2017). ''Standard detectors aren't (currently) fooled by physical adversarial stop signs.'' [Online]. Available: https://arxiv.org/abs/1710.03337
[20] R. Fletcher, Practical Methods of Optimization. Hoboken, NJ, USA: Wiley, 2013.
[21] A. Esteva et al., ''Dermatologist-level classification of skin cancer with deep neural networks,'' Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[22] C. Szegedy et al. (2014). ''Intriguing properties of neural networks.'' [Online]. Available: https://arxiv.org/abs/1312.6199
[23] I. J. Goodfellow, J. Shlens, and C. Szegedy. (2015). ''Explaining and harnessing adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1412.6572
[24] S. Gu and L. Rigazio. (2015). ''Towards deep neural network architectures robust to adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1412.5068
[25] P. Tabacof and E. Valle, ''Exploring the space of adversarial images,'' in Proc. IEEE Int. Joint Conf. Neural Netw., Jul. 2016, pp. 426–433.
[26] S. Sabour, Y. Cao, F. Faghri, and D. J. Fleet. (2015). ''Adversarial manipulation of deep representations.'' [Online]. Available: https://arxiv.org/abs/1511.05122
[27] U. Shaham, Y. Yamada, and S. Negahban. (2016). ''Understanding adversarial training: Increasing local stability of neural nets through robust optimization.'' [Online]. Available: https://arxiv.org/abs/1511.05432
[28] C. Lyu, K. Huang, and H.-N. Liang, ''A unified gradient regularization family for adversarial examples,'' in Proc. IEEE Int. Conf. Data Mining, Nov. 2015, pp. 301–309.
[29] I. Evtimov et al. (2017). ''Robust physical-world attacks on deep learning models.'' [Online]. Available: https://arxiv.org/abs/1707.08945
[30] J. Lu, H. Sibai, E. Fabry, and D. Forsyth. (2017). ''No need to worry about adversarial examples in object detection in autonomous vehicles.'' [Online]. Available: https://arxiv.org/abs/1707.03501
[31] Y. Liu, W. Zhang, S. Li, and N. Yu. (2017). ''Enhanced attacks on defensively distilled deep neural networks.'' [Online]. Available: https://arxiv.org/abs/1711.05934
[32] H. Hosseini, Y. Chen, S. Kannan, B. Zhang, and R. Poovendran. (2017). ''Blocking transferability of adversarial examples in black-box learning systems.'' [Online]. Available: https://arxiv.org/abs/1703.04318
[33] T. Gu, B. Dolan-Gavitt, and S. Garg. (2017). ''BadNets: Identifying vulnerabilities in the machine learning model supply chain.'' [Online]. Available: https://arxiv.org/abs/1708.06733
[34] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman. (2016). ''Towards the science of security and privacy in machine learning.'' [Online]. Available: https://arxiv.org/abs/1611.03814
[35] A. Kurakin, I. Goodfellow, and S. Bengio. (2016). ''Adversarial examples in the physical world.'' [Online]. Available: https://arxiv.org/abs/1607.02533
[36] N. Carlini and D. Wagner. (2016). ''Towards evaluating the robustness of neural networks.'' [Online]. Available: https://arxiv.org/abs/1608.04644
[37] N. Das et al. (2017). ''Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression.'' [Online]. Available: https://arxiv.org/abs/1705.02900
[38] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, ''Distillation as a defense to adversarial perturbations against deep neural networks,'' in Proc. IEEE Symp. Secur. Privacy (SP), May 2016, pp. 582–597.
[39] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, ''Practical black-box attacks against machine learning,'' in Proc. ACM Asia Conf. Comput. Commun. Secur., 2017, pp. 506–519.
[40] X. Xu, X. Chen, C. Liu, A. Rohrbach, T. Darrell, and D. Song. (2017). ''Can you fool AI with adversarial examples on a visual Turing test?'' [Online]. Available: https://arxiv.org/abs/1709.08693
[41] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, ''ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models,'' in Proc. 10th ACM Workshop Artif. Intell. Secur. (AISEC), 2017, pp. 15–26.
[42] S. Baluja and I. Fischer. (2017). ''Adversarial transformation networks: Learning to generate adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1703.09387
[43] W. Xu, D. Evans, and Y. Qi. (2017). ''Feature squeezing: Detecting adversarial examples in deep neural networks.'' [Online]. Available: https://arxiv.org/abs/1704.01155
[44] L. Nguyen, S. Wang, and A. Sinha. (2017). ''A learning and masking approach to secure learning.'' [Online]. Available: https://arxiv.org/abs/1709.04447
[45] D. Meng and H. Chen, ''MagNet: A two-pronged defense against adversarial examples,'' in Proc. ACM Conf. Comput. Commun. Secur. (CCS), 2017, pp. 135–147.
[46] V. Zantedeschi, M.-I. Nicolae, and A. Rawat. (2017). ''Efficient defenses against adversarial attacks.'' [Online]. Available: https://arxiv.org/abs/1707.06728
[47] J. Hayes and G. Danezis. (2017). ''Machine learning as an adversarial service: Learning black-box adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1708.05207
[48] A. Graese, A. Rozsa, and T. E. Boult, ''Assessing threat of adversarial examples on deep neural networks,'' in Proc. IEEE Int. Conf. Mach. Learn. Appl., Dec. 2016, pp. 69–74.
[49] N. Carlini and D. Wagner. (2017). ''Adversarial examples are not easily detected: Bypassing ten detection methods.'' [Online]. Available: https://arxiv.org/abs/1705.07263
[50] B. Liang, H. Li, M. Su, X. Li, W. Shi, and X. Wang. (2017). ''Detecting adversarial examples in deep networks with adaptive noise reduction.'' [Online]. Available: https://arxiv.org/abs/1705.08378
[51] A. Arnab, O. Miksik, and P. H. S. Torr. (2017). ''On the robustness of semantic segmentation models to adversarial attacks.'' [Online]. Available: https://arxiv.org/abs/1711.09856
[52] A. S. Ross and F. Doshi-Velez. (2017). ''Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients.'' [Online]. Available: https://arxiv.org/abs/1711.09404


[53] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. (2017).‘‘PixelDefend: Leveraging generative models to understand and defendagainst adversarial examples.’’ [Online]. Available: https://arxiv.org/abs/1710.10766

[54] N. Narodytska and S. P. Kasiviswanathan. (2016). ‘‘Simple black-box adversarial perturbations for deep networks.’’ [Online]. Available:https://arxiv.org/abs/1612.06299

[55] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu.(2017). ‘‘Towards deep learning models resistant to adversarial attacks.’’[Online]. Available: https://arxiv.org/abs/1706.06083

[56] Z. Sun, M. Ozay, and T. Okatani. (2017). ‘‘HyperNetworks with statis-tical filtering for defending adversarial examples.’’ [Online]. Available:https://arxiv.org/abs/1711.01791

[57] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel.(2017). ‘‘On the (statistical) detection of adversarial examples.’’ [Online].Available: https://arxiv.org/abs/1702.06280

[58] S. Shen, G. Jin, K. Gao, and Y. Zhang. (2017). ‘‘APE-GAN: Adversarialperturbation elimination with GAN.’’ [Online]. Available: https://arxiv.org/abs/1707.05474

[59] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. (2017). ‘‘Ground-truth adversarial examples.’’ [Online]. Available: https://arxiv.org/abs/1709.10207

[60] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, andA. Swami, ‘‘The limitations of deep learning in adversarial settings,’’ inProc. IEEE Eur. Symp. Secur. Privacy, Mar. 2016, pp. 372–387.

[61] K. Grosse, N. Papernot, P. Manoharan, M. Backes, andP. McDaniel. (2016). ‘‘Adversarial perturbations against deepneural networks for malware classification.’’ [Online]. Available:https://arxiv.org/abs/1606.04435

[62] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel. (2017).‘‘Adversarial attacks on neural network policies.’’ [Online]. Available:https://arxiv.org/abs/1702.02284

[63] M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli.(2017). ‘‘Is deep learning safe for robot vision? Adversarial exam-ples against the iCub humanoid.’’ [Online]. Available: https://arxiv.org/abs/1708.06939

[64] I. Rosenberg, A. Shabtai, L. Rokach, and Y. Elovici. (2017). ‘‘Genericblack-box end-to-end attack against state of the art API call based mal-ware classifiers.’’ [Online]. Available: https://arxiv.org/abs/1707.05970

[65] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. (2017).‘‘Synthesizing robust adversarial examples.’’ [Online]. Available:https://arxiv.org/abs/1707.07397

[66] J. Lu, T. Issaranon, and D. Forsyth. (2017). ‘‘SafetyNet: Detect-ing and rejecting adversarial examples robustly.’’ [Online]. Available:https://arxiv.org/abs/1704.00103

[67] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer, ‘‘Universal adver-sarial perturbations against semantic image segmentation,’’ in Proc. IEEEConf. Comput. Vis. Pattern Recognit., Oct. 2017, pp. 2755–2764.

[68] J. Su, D. V. Vargas, and S. Kouichi. (2017). ‘‘One pixel attack forfooling deep neural networks.’’ [Online]. Available: https://arxiv.org/abs/1710.08864

[69] A. Fawzi, S. Moosavi-Dezfooli, and P. Frossard, ‘‘Robustness of clas-sifiers: From adversarial to random noise,’’ in Proc. Adv. Neural Inf.Process. Syst., 2016, pp. 1632–1640.

[70] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, and S. Soatto.(2017). ‘‘Analysis of universal adversarial perturbations.’’ [Online].Available: https://arxiv.org/abs/1705.09554

[71] A. Fawzi, O. Fawzi, and P. Frossard. (2015). ‘‘Analysis of clas-sifiers’ robustness to adversarial perturbations.’’ [Online]. Available:https://arxiv.org/abs/1502.02590

[72] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, ‘‘DeepFool: A simpleand accurate method to fool deep neural networks,’’ in Proc. IEEE Conf.Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2574–2582.

[73] C. Kanbak, S.-M. Moosavi-Dezfooli, and P. Frossard. (2017). ‘‘Geomet-ric robustness of deep networks: Analysis and improvement.’’ [Online].Available: https://arxiv.org/abs/1711.09115

[74] T. Tanay and L. Griffin. (2016). ‘‘A boundary tilting persepectiveon the phenomenon of adversarial examples.’’ [Online]. Available:https://arxiv.org/abs/1608.07690

[75] I. Evtimov et al. (2017). ‘‘Robust physical-world attacks on deep learningmodels.’’ [Online]. Available: https://arxiv.org/abs/1707.08945

[76] W. He, J. Wei, X. Chen, N. Carlini, and D. Song. (2017). ‘‘Adversarialexample defenses: ensembles of weak defenses are not strong.’’ [Online].Available: https://arxiv.org/abs/1706.04701

[77] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, andP. McDaniel. (2017). ‘‘Ensemble adversarial training: Attacks anddefenses.’’ [Online]. Available: https://arxiv.org/abs/1705.07204

[78] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. (2017).‘‘On detecting adversarial perturbations.’’ [Online]. Available:https://arxiv.org/abs/1702.04267

[79] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. (2017). ‘‘Miti-gating adversarial effects through randomization.’’ [Online]. Available:https://arxiv.org/abs/1711.01991

[80] A. Kurakin, I. Goodfellow, and S. Bengio. (2017). ‘‘Adversarial machinelearning at scale.’’ [Online]. Available: https://arxiv.org/abs/1611.01236

[81] N. Akhtar, J. Liu, and A. Mian. (2017). ‘‘Defense against univer-sal adversarial perturbations.’’ [Online]. Available: https://arxiv.org/abs/1711.05929

[82] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. (2017). ‘‘Counteringadversarial images using input transformations.’’ [Online]. Available:https://arxiv.org/abs/1711.00117

[83] A. Galloway, G. W. Taylor, and M. Moussa. (2017). ‘‘Attacking binarizedneural networks.’’ [Online]. Available: https://arxiv.org/abs/1711.00449

[84] N. Papernot and P. McDaniel. (2017). ‘‘Extending defensive distillation.’’[Online]. Available: https://arxiv.org/abs/1705.05264

[85] T. Na, J. H. Ko, and S. Mukhopadhyay. (2017). ‘‘Cascade adversarialmachine learning regularized with a unified embedding.’’ [Online]. Avail-able: https://arxiv.org/abs/1708.02582

[86] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner. (2017).‘‘Detecting adversarial samples from artifacts.’’ [Online]. Available:https://arxiv.org/abs/1703.00410

[87] X. Zeng et al. (2017). ‘‘Adversarial attacks beyond the image space.’’[Online]. Available: https://arxiv.org/abs/1711.07183

[88] Y. Liu, X. Chen, C. Liu, and D. Song. (2017). ‘‘Delving into transfer-able adversarial examples and black-box attacks.’’ [Online]. Available:https://arxiv.org/abs/1611.02770

[89] T. Strauss, M. Hanselmann, A. Junginger, and H. Ulmer. (2017). ‘‘Ensem-ble methods as a defense to adversarial perturbations against deep neuralnetworks.’’ [Online]. Available: https://arxiv.org/abs/1709.03423

[90] S. Sankaranarayanan, A. Jain, R. Chellappa, and S. N. Lim. (2017). ‘‘Reg-ularizing deep networks using efficient layerwise adversarial training.’’[Online]. Available: https://arxiv.org/abs/1705.07819

[91] E. D. Cubuk, B. Zoph, S. S. Schoenholz, and Q. V. Le. (2017).‘‘Intriguing properties of adversarial examples.’’ [Online]. Available:https://arxiv.org/abs/1711.02846

[92] T. Gebhart and P. Schrater. (2017). ‘‘Adversary detection inneural networks via persistent homology.’’ [Online]. Available:https://arxiv.org/abs/1711.10056

[93] J. Bradshaw, A. G. de G. Matthews, and Z. Ghahramani. (2017).‘‘Adversarial examples, uncertainty, and transfer testing robustnessin Gaussian process hybrid deep networks.’’ [Online]. Available:https://arxiv.org/abs/1707.02476

[94] A. Rozsa, E. M. Rudd, and T. E. Boult. (2016). ‘‘Adversarial diversityand hard positive generation.’’ [Online]. Available: https://arxiv.org/abs/1605.01775

[95] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, andA. Criminisi. (2017). ‘‘Measuring neural net robustness with constraints.’’[Online]. Available: https://arxiv.org/abs/1605.07262

[96] J. Z. Kolter and E. Wong. (2017). ‘‘Provable defenses against adversarialexamples via the convex outer adversarial polytope.’’ [Online]. Available:https://arxiv.org/abs/1711.00851

[97] A. Rozsa, M. Günther, and T. E. Boult, ‘‘Are accuracy and robustness correlated?'' in Proc. IEEE Int. Conf. Mach. Learn. Appl., Dec. 2016, pp. 227–232.

[98] H. Hosseini, B. Xiao, M. Jaiswal, and R. Poovendran. (2017). ‘‘On the limitation of convolutional neural networks in recognizing negative images.'' [Online]. Available: https://arxiv.org/abs/1703.06857

[99] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier. (2017). ‘‘Parseval networks: Improving robustness to adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1704.08847

[100] N. Cheney, M. Schrimpf, and G. Kreiman. (2017). ‘‘On the robustness of convolutional neural networks to internal architecture and weight perturbations.'' [Online]. Available: https://arxiv.org/abs/1703.08245

[101] H. Lee, S. Han, and J. Lee. (2017). ‘‘Generative adversarial trainer: Defense to adversarial perturbations with GAN.'' [Online]. Available: https://arxiv.org/abs/1705.03387

[102] A. Rozsa, M. Günther, and T. E. Boult. (2017). ‘‘Towards robust deep neural networks with BANG.'' [Online]. Available: https://arxiv.org/abs/1612.00138

[103] T. Miyato, S.-I. Maeda, M. Koyama, and S. Ishii. (2017). ‘‘Virtual adversarial training: A regularization method for supervised and semi-supervised learning.'' [Online]. Available: https://arxiv.org/abs/1704.03976

[104] D. Arpit et al. (2017). ‘‘A closer look at memorization in deep networks.'' [Online]. Available: https://arxiv.org/abs/1706.05394

[105] X. Li and F. Li, ‘‘Adversarial examples detection in deep networks with convolutional filter statistics,'' in Proc. Int. Conf. Comput. Vis., 2017, pp. 1–9.

[106] T. Lee, M. Choi, and S. Yoon. (2016). ‘‘Manifold regularized deep neural networks using adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1511.06381

[107] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, ‘‘Adversarial examples for malware detection,'' in Proc. Eur. Symp. Res. Comput. Secur., 2017, pp. 62–79.

[108] N. Papernot and P. McDaniel. (2016). ‘‘On the effectiveness of defensive distillation.'' [Online]. Available: https://arxiv.org/abs/1607.05113

[109] N. Papernot, P. McDaniel, and I. Goodfellow. (2016). ‘‘Transferability in machine learning: From phenomena to black-box attacks using adversarial samples.'' [Online]. Available: https://arxiv.org/abs/1605.07277

[110] N. Papernot, P. McDaniel, A. Swami, and R. Harang, ‘‘Crafting adversarial input sequences for recurrent neural networks,'' in Proc. IEEE Military Commun. Conf., 2016, pp. 49–54.

[111] N. Papernot et al. (2016). ‘‘cleverhans v2.0.0: An adversarial machine learning library.'' [Online]. Available: https://arxiv.org/abs/1610.00768

[112] I. Goodfellow et al. (2016). ‘‘cleverhans v2.0.0: An adversarial machine learning library.'' [Online]. Available: https://arxiv.org/abs/1610.00768

[113] T. Miyato, A. M. Dai, and I. Goodfellow. (2016). ‘‘Adversarial training methods for semi-supervised text classification.'' [Online]. Available: https://arxiv.org/abs/1605.07725

[114] A. Nguyen, J. Yosinski, and J. Clune, ‘‘Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 427–436.

[115] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. (2017). ‘‘Adversarial examples for semantic segmentation and object detection.'' [Online]. Available: https://arxiv.org/abs/1703.08603

[116] S. Zheng, Y. Song, T. Leung, and I. Goodfellow, ‘‘Improving the robustness of deep neural networks via stability training,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 4480–4488.

[117] A. Fawzi, O. Fawzi, and P. Frossard, ‘‘Fundamental limits on adversarial robustness,'' in Proc. ICML, Workshop Deep Learn., 2015, pp. 1–7.

[118] Y. Luo, X. Boix, G. Roig, T. Poggio, and Q. Zhao. (2015). ‘‘Foveation-based mechanisms alleviate adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1511.06292

[119] J. Jin, A. Dundar, and E. Culurciello. (2015). ‘‘Robust convolutional neural networks under adversarial noise.'' [Online]. Available: https://arxiv.org/abs/1511.06306

[120] J. Kos, I. Fischer, and D. Song. (2017). ‘‘Adversarial examples for generative models.'' [Online]. Available: https://arxiv.org/abs/1702.06832

[121] Q. Wang et al. (2016). ‘‘Adversary resistant deep neural networks with an application to malware detection.'' [Online]. Available: https://arxiv.org/abs/1610.01239

[122] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy. (2016). ‘‘A study of the effect of JPG compression on adversarial images.'' [Online]. Available: https://arxiv.org/abs/1608.00853

[123] A. Nayebi and S. Ganguli. (2017). ‘‘Biologically inspired protection of deep networks from adversarial attacks.'' [Online]. Available: https://arxiv.org/abs/1703.09202

[124] W. Hu and Y. Tan. (2017). ‘‘Generating adversarial malware examples for black-box attacks based on GAN.'' [Online]. Available: https://arxiv.org/abs/1702.05983

[125] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. (2017). ‘‘Towards proving the adversarial robustness of deep neural networks.'' [Online]. Available: https://arxiv.org/abs/1709.02802

[126] D. Krotov and J. J. Hopfield. (2017). ‘‘Dense associative memory is robust to adversarial inputs.'' [Online]. Available: https://arxiv.org/abs/1701.00939

[127] P. Tabacof, J. Tavares, and E. Valle. (2016). ‘‘Adversarial images for variational autoencoders.'' [Online]. Available: https://arxiv.org/abs/1612.00155

[128] Q. Wang et al. (2016). ‘‘Using non-invertible data transformations to build adversarial-robust neural networks.'' [Online]. Available: https://arxiv.org/abs/1610.01934

[129] A. Rozsa, M. Günther, E. M. Rudd, and T. E. Boult. (2018). ‘‘Facial attributes: Accuracy and adversarial robustness.'' [Online]. Available: https://arxiv.org/abs/1801.02480

[130] M. Cisse, Y. Adi, N. Neverova, and J. Keshet. (2017). ‘‘Houdini: Fooling deep structured prediction models.'' [Online]. Available: https://arxiv.org/abs/1707.05373

[131] F. Tramèr, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. (2017). ‘‘The space of transferable adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1704.03453

[132] S. J. Oh, M. Fritz, and B. Schiele. (2017). ‘‘Adversarial image perturbation for privacy protection—A game theory perspective.'' [Online]. Available: https://arxiv.org/abs/1703.09471

[133] Y.-C. Lin, Z.-W. Hong, Y.-H. Liao, M.-L. Shih, M.-Y. Liu, and M. Sun. (2017). ‘‘Tactics of adversarial attack on deep reinforcement learning agents.'' [Online]. Available: https://arxiv.org/abs/1703.06748

[134] K. R. Mopuri, U. Garg, and R. V. Babu. (2017). ‘‘Fast feature fool: A data independent approach to universal adversarial perturbations.'' [Online]. Available: https://arxiv.org/abs/1707.05572

[135] N. Kardan and K. O. Stanley, ‘‘Mitigating fooling with competitive overcomplete output layer neural networks,'' in Proc. Int. Joint Conf. Neural Netw., May 2017, pp. 518–525.

[136] Y. Dong, H. Su, J. Zhu, and F. Bao. (2017). ‘‘Towards interpretable deep neural networks by leveraging adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1708.05493

[137] Q. Wang et al. (2016). ‘‘Learning adversary-resistant deep neural networks.'' [Online]. Available: https://arxiv.org/abs/1612.01401

[138] J. Gao, B. Wang, Z. Lin, W. Xu, and Y. Qi. (2017). ‘‘DeepCloak: Masking deep neural network models for robustness against adversarial samples.'' [Online]. Available: https://arxiv.org/abs/1702.06763

[139] W. Xu, D. Evans, and Y. Qi. (2017). ‘‘Feature squeezing mitigates and detects Carlini/Wagner adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1705.10686

[140] W. Bai, C. Quan, and Z. Luo, ‘‘Alleviating adversarial attacks via convolutional autoencoder,'' in Proc. Int. Conf. Softw. Eng., Artif. Intell., Netw. Parallel/Distrib. Comput. (SNPD), Jun. 2017, pp. 53–58.

[141] A. P. Norton and Y. Qi, ‘‘Adversarial-Playground: A visualization suite showing how adversarial examples fool deep learning,'' in Proc. IEEE Symp. Vis. Cyber Secur., Oct. 2017, pp. 1–4.

[142] Y. Dong, F. Liao, T. Pang, X. Hu, and J. Zhu. (2017). ‘‘Discovering adversarial examples with momentum.'' [Online]. Available: https://arxiv.org/abs/1710.06081

[143] S. Shen, R. Furuta, T. Yamasaki, and K. Aizawa, ‘‘Fooling neural networks in face attractiveness evaluation: Adversarial examples with high attractiveness score but low subjective score,'' in Proc. IEEE 3rd Int. Conf. Multimedia Big Data, Apr. 2017, pp. 66–69.

[144] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ‘‘Rethinking the inception architecture for computer vision,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2818–2826.

[145] S. Sarkar, A. Bansal, U. Mahbub, and R. Chellappa. (2017). ‘‘UPSET and ANGRI: Breaking high performance image classifiers.'' [Online]. Available: https://arxiv.org/abs/1707.01159

[146] S. Das and P. N. Suganthan, ‘‘Differential evolution: A survey of the state-of-the-art,'' IEEE Trans. Evol. Comput., vol. 15, no. 1, pp. 4–31, Feb. 2011.

[147] J. Redmon and A. Farhadi. (2016). ‘‘YOLO9000: Better, faster, stronger.'' [Online]. Available: https://arxiv.org/abs/1612.08242

[148] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real-time object detection with region proposal networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.

[149] D. Amodei et al. (2015). ‘‘Deep speech 2: End-to-end speech recognition in English and Mandarin.'' [Online]. Available: https://arxiv.org/abs/1512.02595

[150] A. Krizhevsky, ‘‘Learning multiple layers of features from tiny images,'' M.S. thesis, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, 2009.

[151] D. P. Kingma and M. Welling. (2013). ‘‘Auto-encoding variational Bayes.'' [Online]. Available: https://arxiv.org/abs/1312.6114

[152] Y. Bengio, ‘‘Learning deep architectures for AI,'' Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.

[153] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, ‘‘Learning representations by back-propagating errors,'' Nature, vol. 323, pp. 533–536, Oct. 1986.

[154] S. Hochreiter and J. Schmidhuber, ‘‘Long short-term memory,'' Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.

[155] V. Mnih et al., ‘‘Human-level control through deep reinforcement learning,'' Nature, vol. 518, pp. 529–533, 2015.

[156] V. Mnih et al., ‘‘Asynchronous methods for deep reinforcement learning,'' in Proc. Int. Conf. Mach. Learn., 2016, pp. 1–10.

[157] J. Long, E. Shelhamer, and T. Darrell, ‘‘Fully convolutional networks for semantic segmentation,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3431–3440.

[158] A. Rozsa, M. Günther, E. M. Rudd, and T. E. Boult, ‘‘Are facial attributes adversarially robust?'' in Proc. Int. Conf. Pattern Recognit., Dec. 2016, pp. 3121–3127.

[159] Z. Liu, P. Luo, X. Wang, and X. Tang, ‘‘Deep learning face attributes in the wild,'' in Proc. Int. Conf. Comput. Vis., 2015, pp. 3730–3738.

[160] V. Mirjalili and A. Ross, ‘‘Soft biometric privacy: Retaining biometric utility of face images while perturbing gender,'' in Proc. Int. Joint Conf. Biometrics, 2017, pp. 1–10.

[161] K. Simonyan and A. Zisserman. (2015). ‘‘Very deep convolutional networks for large-scale image recognition.'' [Online]. Available: https://arxiv.org/abs/1409.1556

[162] D. Krotov and J. J. Hopfield, ‘‘Dense associative memory for pattern recognition,'' in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 1172–1180.

[163] R. H. R. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, ‘‘Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit,'' Nature, vol. 405, no. 6789, pp. 947–951, Jun. 2000.

[164] G. Hinton, O. Vinyals, and J. Dean. (2014). ‘‘Distilling the knowledge in a neural network.'' [Online]. Available: https://arxiv.org/abs/1503.02531

[165] H. Drucker and Y. Le Cun, ‘‘Improving generalization performance using double backpropagation,'' IEEE Trans. Neural Netw., vol. 3, no. 6, pp. 991–997, Nov. 1992.

[166] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. (2016). ‘‘Densely connected convolutional networks.'' [Online]. Available: https://arxiv.org/abs/1608.06993

[167] A. N. Bhagoji, D. Cullina, C. Sitawarin, and P. Mittal. (2017). ‘‘Enhancing robustness of machine learning systems via data transformations.'' [Online]. Available: https://arxiv.org/abs/1704.02654

[168] Y. Dong et al. (2017). ‘‘Boosting adversarial attacks with momentum.'' [Online]. Available: https://arxiv.org/abs/1710.06081

[169] I. Goodfellow et al., ‘‘Generative adversarial nets,'' in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.

[170] J. Kovačević and A. Chebira, ‘‘An introduction to frames,'' Found. Trends Signal Process., vol. 2, no. 1, pp. 1–94, 2008.

[171] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, ‘‘Contractive auto-encoders: Explicit invariance during feature extraction,'' in Proc. Int. Conf. Mach. Learn., 2011, pp. 833–840.

[172] S. S. Liew, M. Khalil-Hani, and R. Bakhteri, ‘‘Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems,'' Neurocomputing, vol. 216, pp. 718–734, Dec. 2016.

[173] M. Abbasi and C. Gagné. (2017). ‘‘Robustness to adversarial examples through an ensemble of specialists.'' [Online]. Available: https://arxiv.org/abs/1702.06856

[174] A. Mogelmose, M. M. Trivedi, and T. B. Moeslund, ‘‘Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey,'' IEEE Trans. Intell. Transp. Syst., vol. 13, no. 4, pp. 1484–1497, Dec. 2012.

[175] A. Vedaldi and K. Lenc, ‘‘MatConvNet—Convolutional neural networks for MATLAB,'' in Proc. ACM Int. Conf. Multimedia, 2015, pp. 689–692.

[176] Y. Jia et al. (2014). ‘‘Caffe: Convolutional architecture for fast feature embedding.'' [Online]. Available: https://arxiv.org/abs/1408.5093

[177] M. Abadi et al. (2015). ‘‘TensorFlow: Large-scale machine learning on heterogeneous distributed systems.'' [Online]. Available: https://arxiv.org/abs/1603.04467

[178] A. Giusti et al., ‘‘A machine learning approach to visual perception of forest trails for mobile robots,'' IEEE Robot. Autom. Lett., vol. 1, no. 2, pp. 661–667, Jul. 2016.

[179] Objects Detection Machine Learning TensorFlow Demo. Accessed: Dec. 2017. [Online]. Available: https://play.google.com/store/apps/details?id=org.tensorflow.detect&hl=en

[180] Class Central, Deep Learning for Self-Driving Cars. Accessed: Dec. 2017. [Online]. Available: https://www.class-central.com/mooc/8132/6-s094-deep-learning-for-self-driving-cars

[181] C. Middlehurst. (2015). China Unveils World's First Facial Recognition ATM. [Online]. Available: http://www.telegraph.co.uk/news/worldnews/asia/china/11643314/China-unveils-worlds-first-facial-recognition-ATM.html

[182] About Face ID Advanced Technology. Accessed: Dec. 2017. [Online]. Available: https://support.apple.com/en-au/HT208108

[183] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, ‘‘Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,'' in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 1528–1540.

[184] R. Shin and D. Song, ‘‘JPEG-resistant adversarial images,'' in Proc. Mach. Learn. Comput. Secur. Workshop, 2017, pp. 1–6.

[185] W. Brendel and M. Bethge. (Apr. 2017). ‘‘Comment on ‘Biologically inspired protection of deep networks from adversarial attacks.''' [Online]. Available: https://arxiv.org/abs/1704.01547

[186] N. Carlini and D. Wagner. (Nov. 2017). ‘‘MagNet and ‘efficient defenses against adversarial attacks' are not robust to adversarial examples.'' [Online]. Available: https://arxiv.org/abs/1711.08478

NAVEED AKHTAR received the master's degree in computer science from Hochschule Bonn-Rhein-Sieg, Germany, and the Ph.D. degree in computer vision from The University of Western Australia (UWA). His research in computer vision and pattern recognition has been published in prestigious venues of the field, including the IEEE CVPR and IEEE TPAMI. He has also served as a reviewer for these venues. His current research interests include adversarial machine learning, action recognition, and hyperspectral image analysis. He served as a Research Fellow with the Australian National University for one year and has been a Research Fellow at UWA since 2017. During his Ph.D., he was a recipient of multiple scholarships and a runner-up in the Canon Extreme Imaging Competition in 2015.

AJMAL MIAN received the Ph.D. degree (Hons.) from The University of Western Australia in 2006. He is currently with the School of Computer Science and Software Engineering, The University of Western Australia. His research interests include computer vision, machine learning, 3-D shape analysis, hyperspectral image analysis, pattern recognition, and multimodal biometrics. He received the Australasian Distinguished Doctoral Dissertation Award from the Computing Research and Education Association of Australasia. He received the prestigious Australian Post-doctoral and Australian Research Fellowships in 2008 and 2011, respectively. He received the UWA Outstanding Young Investigator Award in 2011, the West Australian Early Career Scientist of the Year Award in 2012, and the Vice-Chancellor's Mid-Career Research Award in 2014. He has received seven Australian Research Council grants and one National Health and Medical Research Council grant with a total funding of over $3 million. He is a Guest Editor of the Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing journals.
