
Robust Discriminative Localization Maps

Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer
Brandeis University

{aprakash,nemtiax,solomongarber,dilant,storer}@brandeis.edu

Abstract

Activation maps obtained from CNN filter responses have been used to visualize and improve the performance of deep learning models. However, as CNNs are susceptible to adversarial attack, so are the activation maps. While recovering the predictions of the classifier is a difficult task and often requires complex transformations, we show that recovering activation maps is trivial and does not require any changes either to the classifier or the input image.

Code: github.com/iamaaditya/robust-activation-maps

1. Activation Maps

Convolution filters learned from natural images are sensitive to the object classes in the training data [7, 5]. In other words, regions of an image which contain relevant objects tend to produce high activations when convolved with learned filters, while regions that do not contain any object class have lower activations; this is particularly true for the deeper layers of the CNN. Taking the spatial average of the feature map (global average pooling) at the last convolution layer reveals possible locations of objects within the image [7, 10]. These kinds of activation maps are popularly known as Class Activation Maps (CAM).

Consider a convolutional network with k output channels on the final convolution layer f, with spatial dimensions x and y, and let w be a vector of size k which is the result of applying a global max pool on each channel; this reduces each channel to a single value, w_k. The class activation map M_c for a class c is given by:

M_c(x, y) = \sum_k w^c_k f_k(x, y)    (1)
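As a concrete illustration, Eq. (1) can be sketched in a few lines of PyTorch. This is a minimal sketch rather than the released code; the function name, the tensor shapes, and the use of a per-class weight vector over the k channels as w^c are our own assumptions.

import torch

def class_activation_map(features, weights, class_idx):
    """Sketch of Eq. (1): M_c(x, y) = sum_k w^c_k * f_k(x, y).

    features : output of the final convolution layer, shape (k, h, w)
    weights  : per-class weights over the k channels, shape (num_classes, k)
    class_idx: the class c whose map is requested
    """
    w_c = weights[class_idx]                      # shape (k,)
    # Weighted sum over channels yields an (h, w) localization map.
    return torch.einsum('k,kxy->xy', w_c, features)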

Class activation maps are a valuable tool for approximate semantic object localization. CAMs have been used to improve image classification [10], visual question answering [11], semantic image compression [8] and image retrieval [2], and as attention networks [4] for various computer vision related tasks.

2. Impact of Adversary

It has been established that most image classification models can easily be fooled [9, 1]. Several techniques have been proposed which can generate an image that is perceptually indistinguishable from another image but is classified differently. This can be done robustly when model parameters are known, a paradigm called white-box attacks [1, 3, 6].
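For intuition, one widely used white-box attack is the fast gradient sign method of [1], which perturbs the input along the sign of the loss gradient. A minimal PyTorch sketch follows; the step size epsilon and the cross-entropy loss are illustrative assumptions, not a description of the specific attack used in our experiments.

import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=8 / 255):
    """One-step fast gradient sign attack [1] (illustrative sketch).

    image : input tensor of shape (1, 3, h, w) with values in [0, 1]
    label : tensor of shape (1,) holding the original class index
    """
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then clip to a valid image.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()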

Generally, one is interested in the activation map for the class to which the model assigns the highest probability. However, in the presence of adversarial perturbations to the input, the highest-probability class is likely to be incorrect. This implies that Class Activation Maps will also be incorrect for these adversarial images. The left column of Figure 2 depicts CAMs for clean images. The middle column shows the CAM for the same image after it has been altered by the adversary. In all cases, it is evident that the CAM under an adversary is an incorrect depiction of the object and its associated activations.

Fortunately, our experiments show that an adversary which successfully changes the most likely class tends to leave the rest of the top-k predicted classes unchanged: 38% of the time, the class predicted for an adversarial image is the second-highest class predicted by the model for the clean image.

[Figure 1: two bar charts of the fraction of images in the set versus predicted-class rank (0 means not in the top-5). Left panel: "Frequency of Adv class in Top-5 of Original image". Right panel: "Frequency of Target class picked by Untargeted Attacks".]

Figure 1. Left: Rank of adversarial class within the top-5 predictions for original images. Right: Rank of original class within the top-5 predictions for adversarial images. In both cases, 0 means the class was not in the top-5.

In 64% of the cases, the misclassified class was still within the top-5 predictions for the clean image, excluding its most likely class. As seen in Figure 1, the predicted class of the perturbed image is very frequently among the classifier's top-5 predictions for the original image. In fact, nearly 40% of the time, the adversarial class was the second most-probable class of the original image.
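The rank statistic plotted in Figure 1 can be computed with a simple helper. The function below is an illustrative sketch, assuming access to the clean image's logits and the class predicted for the adversarial image.

import torch

def rank_in_clean_topk(clean_logits, adv_class, k=5):
    """Return the 1-based rank of the adversarial class within the clean
    image's top-k predictions, or 0 if it is not in the top-k."""
    topk = torch.topk(clean_logits, k).indices.tolist()
    return topk.index(adv_class) + 1 if adv_class in topk else 0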

ImageNet has one thousand classes, many of which are fine-grained. Frequently, the second most likely class is a synonym or close relative of the main class (e.g. "Indian Elephant" and "African Elephant"). To obtain a map which is robust to fluctuations of the most likely class, we take an exponentially weighted average of the maps of the top-k classes.

M(x, y) = \sum_{i=1}^{k} M_{c_i}(x, y) / 2^i    (2)

where c_i denotes the i-th most probable class.

We will refer to this as the robust activation map, M(x, y). We normalize the map by dividing it by its maximum value so that values are in the range [0, 1]. Even if the top-1 class is incorrect, this averaging reduces the impact of mis-localization of the object in the image.
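Putting Eq. (2) and the normalization together, a minimal PyTorch sketch of the robust activation map follows. It reuses the class_activation_map helper sketched in Section 1 and assumes the classifier's logits are available; as before, the names and shapes are illustrative rather than the released implementation.

import torch

def robust_activation_map(features, weights, logits, k=5):
    """Sketch of Eq. (2): exponentially weighted average of the top-k CAMs.

    features : final convolution output, shape (channels, h, w)
    weights  : per-class channel weights, shape (num_classes, channels)
    logits   : classifier scores for the image, shape (num_classes,)
    """
    top_classes = torch.topk(logits, k).indices
    cams = torch.stack([class_activation_map(features, weights, c)
                        for c in top_classes])                        # (k, h, w)
    decay = torch.pow(0.5, torch.arange(1, k + 1, dtype=cams.dtype))  # 1 / 2^i
    m = torch.einsum('i,ixy->xy', decay, cams)
    # Normalize by the maximum so values lie in [0, 1].
    return m / m.max()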

The appropriate number of classes k to average over depends on the total number of classes. For ImageNet-1000, we used a fixed k = 5. While each possible class has its own class activation map (CAM), only a single robust activation map is generated for a particular image, combining information from the top-k classes. ImageNet covers a wide variety of object classes, and most structures found in other datasets are represented in ImageNet even if the class names do not correspond one-to-one. Therefore, robust activation maps trained once on ImageNet can also localize objects from Pascal-VOC or Traffic Signs.
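As a usage sketch, the robust map for a single image could be produced as follows. The choice of a torchvision ResNet-18, the forward hook on layer4, and the use of the fully connected layer's weights as the per-class channel weights are all illustrative assumptions rather than our exact setup.

import torch
import torchvision.models as models

# Illustrative backbone; any CNN with a final convolution layer would do.
model = models.resnet18(pretrained=True).eval()
features = {}
model.layer4.register_forward_hook(
    lambda module, inp, out: features.update(feat=out.detach()))

image = torch.rand(1, 3, 224, 224)        # stand-in for a preprocessed input
with torch.no_grad():
    logits = model(image)[0]
rmap = robust_activation_map(features['feat'][0], model.fc.weight, logits, k=5)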

References

[1] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.

[2] A. Jimenez, J. M. Alvarez, and X. Giro. Class-weighted convolutional features for visual instance search. CoRR, abs/1707.02581, 2017.

[3] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.

[4] K. Li, Z. Wu, K.-C. Peng, J. Ernst, and Y. Fu. Tell me where to look: Guided attention inference network. CoRR, abs/1802.10171, 2018.

[5] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.

[6] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083, 2017.

[7] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[8] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Semantic perceptual image compression using deep convolution networks. In 2017 Data Compression Conference (DCC), 2017.

[Figure 2: each row shows three panels for one example image: the CAM on the clean image, the CAM on the adversarial image, and the robust CAM on the adversarial image. Top classes (with confidence): clean Warplane (0.91) vs. adversarial Flatworm (0.99); clean Warplane (0.91) vs. adversarial Meat Loaf (0.99); clean Ibizan hound (0.84) vs. adversarial Pizza (0.98); clean Cabbage Butterfly (0.84) vs. adversarial Spatula (0.99); clean Labrador Retriever (0.97) vs. adversarial Tarantula (0.98).]

Figure 2. Difference between class activation maps and robust activation maps in the presence of an adversary.

[9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.

[10] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[11] B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. CoRR, abs/1512.02167, 2015.
