Learning Deep Features for Discriminative Localization
• Uses Global Average Pooling (GAP) to generate Class Activation Mapping (CAM)
• Localizes objects without Bounding box annotations (weakly supervised localization)
Convolution layer activation
[Figure: conv5 activations for an input image. Source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html]
Global Average Pooling (GAP)
[Figure: the Activation of each Unit is averaged by GAP and passed to the softmax layer]
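The pooling step can be written out as a short derivation. The notation is an assumption made here for illustration: $f_k(x, y)$ is the activation of unit $k$ at spatial location $(x, y)$, $H \times W$ is the size of the activation map, $w_k^c$ is the softmax weight connecting unit $k$ to class $c$, and the bias term is ignored.

```latex
\begin{align}
F_k &= \frac{1}{HW} \sum_{x, y} f_k(x, y)
    && \text{(GAP output of unit } k\text{)} \\
S_c &= \sum_k w_k^c \, F_k
    && \text{(score of class } c\text{)} \\
P_c &= \frac{\exp(S_c)}{\sum_{c'} \exp(S_{c'})}
    && \text{(softmax output for class } c\text{)}
\end{align}
```

Because $S_c$ is linear in the pooled values, each weight $w_k^c$ can be read as the importance of unit $k$ for class $c$.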
Class Activation Map
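A minimal sketch of how a Class Activation Map could be computed, assuming NumPy and assuming the unit activations and softmax weights have already been extracted from the network. The function names, array layouts, and toy values are illustrative assumptions, not code from the original slides.

```python
import numpy as np


def class_activation_map(features, weights, class_idx):
    """Weighted sum of the unit activation maps for one class.

    features: array of shape (K, H, W), one activation map per unit
    weights:  array of shape (C, K), softmax weights (class x unit)
    """
    # M_c(x, y) = sum_k w_k^c * f_k(x, y)
    return np.tensordot(weights[class_idx], features, axes=1)


def gap_class_scores(features, weights):
    """Class scores obtained by GAP followed by the softmax weights (no bias)."""
    pooled = features.mean(axis=(1, 2))  # GAP: one value per unit, shape (K,)
    return weights @ pooled              # shape (C,)


# Toy example (assumed values): 2 units, 2x2 maps, 2 classes.
features = np.arange(8, dtype=float).reshape(2, 2, 2)
weights = np.array([[1.0, 0.0],
                    [0.5, -1.0]])

cam = class_activation_map(features, weights, class_idx=1)
scores = gap_class_scores(features, weights)
print(cam)
print(scores)
```

The spatial mean of the map for a class equals that class's score, which is why the map shows how much each location contributes to the prediction. For display, the map is normally upsampled to the input image size, a step omitted here.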
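• FC layers before the final output are removed and replaced with GAP
• Base networks: AlexNet, VGGnet, GoogLeNet
• Conv layer whose Units feed GAP (the layers after it are removed), with the resulting mapping resolution:
• AlexNet: conv5, 13x13
• VGGnet: conv5-3, 14x14
• GoogLeNet: inception4e, 14x14
• One convolution layer is added, followed by GAP and Softmax
• size: 3x3, stride: 1, padding: 1, unit: 1024
• ILSVRC: 1000-class classification, 1.3M training images
• Each modified network is fine tuned on this data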
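As a quick check on the settings above, the following sketch applies the standard convolution output-size formula to show that a 3x3 convolution with stride 1 and padding 1 preserves the mapping resolution. The helper name and the dictionary are hypothetical, introduced only for this example.

```python
def conv_output_size(in_size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard floor formula)."""
    return (in_size + 2 * padding - kernel) // stride + 1


# Mapping resolutions taken from the list above.
mapping_resolution = {
    "AlexNet conv5": 13,
    "VGGnet conv5-3": 14,
    "GoogLeNet inception4e": 14,
}

# The added 3x3 / stride 1 / padding 1 layer keeps each resolution unchanged.
resolution_after_conv = {
    name: conv_output_size(size) for name, size in mapping_resolution.items()
}

# GAP turns the 1024 unit maps into a 1024-dimensional vector, so the
# 1000-way softmax layer needs 1024 * 1000 weights (bias excluded).
num_units = 1024
num_classes = 1000
softmax_weight_count = num_units * num_classes

print(resolution_after_conv)
print(softmax_weight_count)
```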
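Localization
• The CAM is thresholded at 20% of its maximum value, and the bounding box (BOX) covering the largest connected region is taken as the prediction
• The predicted BOX is compared with the Ground Truth box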
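The 20% thresholding step can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, regions are grown with 4-connectivity, and the map is used at its native resolution rather than being upsampled to the image size first.

```python
from collections import deque


def cam_bounding_box(cam, ratio=0.2):
    """Box (x_min, y_min, x_max, y_max) around the largest connected region
    whose value exceeds ratio * max(cam). `cam` is a 2-D sequence of numbers.
    Returns None if no location exceeds the threshold."""
    height, width = len(cam), len(cam[0])
    threshold = ratio * max(max(row) for row in cam)
    mask = [[cam[y][x] > threshold for x in range(width)] for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over one 4-connected component.
            seen[y0][x0] = True
            queue, component = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    if not best:
        return None
    ys = [y for y, _ in best]
    xs = [x for _, x in best]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy map (assumed values): one 2x2 blob and one isolated strong pixel.
toy_cam = [
    [0.0, 0.1, 0.0, 0.0, 0.9],
    [0.0, 0.6, 0.8, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]
box = cam_bounding_box(toy_cam)
print(box)  # (1, 1, 2, 2)
```

The isolated pixel at the top right passes the threshold but belongs to a smaller region, so it does not affect the box.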
GAP CNN
• Features from the GAP layer are used to train a linear SVM
• Visual Question Answering
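• Concept localization: concept detectors are trained with a Hard-negative mining algorithm
• positive set: images whose text caption contains the short phrase
• negative set: randomly selected images whose captions contain no relevant words
• Weakly supervised text detector
• positive set: 350 Google StreetView images containing text
• negative set: randomly sampled outdoor scene images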
Visual Question Answering
Class-Specific Units
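• With GAP, the Softmax weights can be ranked to find the Units that are most discriminative for each class
• This suggests the CNN learns a bag of words, where each word is a discriminative class-specific unit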
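Ranking units by their softmax weight for one class can be sketched as follows. The function name and the weight values are made up for illustration; in practice the weights would be one row of the trained softmax layer.

```python
def top_class_units(class_weights, n=3):
    """Indices of the n units with the largest softmax weight for one class."""
    ranked = sorted(
        range(len(class_weights)),
        key=lambda k: class_weights[k],
        reverse=True,
    )
    return ranked[:n]


# Assumed toy weights for a single class over 5 units.
weights_for_class = [0.1, 2.3, -0.7, 1.5, 0.0]
top_units = top_class_units(weights_for_class, n=2)
print(top_units)  # [1, 3]
```

The activation maps of the selected units can then be inspected to see which image regions each class-specific unit responds to.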
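Summary
• Global Average Pooling (GAP) is used to generate Class Activation Mapping (CAM)
• Objects are localized without Bounding box annotations (localization)
• FC layers are replaced with GAP
• Localization: a box is obtained by thresholding the CAM
• GAP CNN features can be reused for other recognition tasks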