Dual-Gradients Localization framework for Weakly ...4 Dual-Gradients Localization(DGL) framework...
Transcript of Dual-Gradients Localization framework for Weakly ...4 Dual-Gradients Localization(DGL) framework...
1
Dual-Gradients Localization framework for Weakly Supervised Object Localization
Chuangchuang Tan 1*, Tao Ruan 1*, Guanghua Gu 2, Shikui Wei 1, Yao Zhao 1
1 Beijing Jiaotong University 2 Yanshan University
2
Target
Beijing Jiaotong University
o Weakly Supervised Object Localization (WSOL)
o WSOL is understanding an image at pixel level only using image-level annotations
o use much cheaper annotations
3
WSOL
Beijing Jiaotong University
o Steps of previous works ๏ผo Force classification network to focus on more regions of feature map.
o Produce localization map on the last convolutional layer by applying CAM.
o Problem:o ignore the localization ability of other layers.
o Both localization and classification tasks are
trained online I can produce WSOL, too
4
Dual-Gradients Localization(DGL) framework
Beijing Jiaotong University
o Characteristicso Simple, DGL is a offline approach,
neednโt to train for localization.
o Effective, achieving localization on any convolutional layer.
o Main ideas๏ผo Utilize gradients of classification loss function to mine entire target object regions.
o Leverage gradients of target class to identify the correlation ratio of pixels to the target
class within any convolutional feature maps
Source image Mixed_6f Mixed_6e
5
๐๐
Overview of the DGL framework
Beijing Jiaotong University
Cross-entropy loss
GAP
softmax
FCโฆ โฆ
๐๐ฝ(p, ๐ผ๐ฆ๐)
๐๐
๐๐ฆ๐ถ๐๐
Class-aware Enhanced Map Branch
โ
๐2 ๐๐๐๐๐๐๐๐ง๐
๐2 ๐๐๐๐๐๐๐๐ง๐
Enhanced map โจ
Pixel-level Selection Branch
โฆ
Classification model
Feature Maps
๐ ๐ข๐ ๐๐๐ ๐๐๐ ๐๐ง๐
๐๐ ๐๐โ1 ๐๐
๐๐
Localization Maps
6
Classification model
Beijing Jiaotong University
Cross-entropy loss
GAP
softmax
FCโฆ โฆClassification model
๐๐ ๐๐โ1 ๐๐
o Classification model architecture๏ผo use a customized InceptionV3, i.e. SPG-plain.
o remove the layers after the second Inception block, i.e., the third Inception block, pooling and linear layer.
o add two convolutional layers
o add a GAP layer and a softmax layer
7
๐๐
Class-aware Enhanced Map Branch
Beijing Jiaotong University
๐cost(p, ๐ผ๐ฆ๐)
๐๐
Class-aware Enhanced Map Branch
โ
๐2 ๐๐๐๐๐๐๐๐ง๐
๐2 ๐๐๐๐๐๐๐๐ง๐
Enhanced map A
Feature Maps
๐๐
o feature maps predicted to class c only capture the discrimination parts of objects, when the feature maps close the boundary of classification regions
o the feature maps located at center of classification regions can highlight more object regions
8
๐๐
Class-aware Enhanced Map Branch
Beijing Jiaotong University
๐cost(p, ๐ผ๐ฆ๐)
๐๐
Class-aware Enhanced Map Branch
โ
๐2 ๐๐๐๐๐๐๐๐ง๐
๐2 ๐๐๐๐๐๐๐๐ง๐
Enhanced map A
Feature Maps
๐๐
o our key idea of Class-aware Enhanced Map is pulling the feature maps toward inside of the classification region for specific-class, along with gradients of classification loss function.
9
Pixel-level Selection Branch
Beijing Jiaotong University
๐๐ฆ๐ถ๐๐
โจ
Pixel-level Selection Branch
โฆ๐ ๐ข๐ ๐๐๐ ๐๐๐ ๐๐ง๐
Enhanced map A
o Is gradients or weights?o CAM actually achieves localization by employing a weighted sum of
feature maps and gradients of target class on the last convolutional layer, instead of weights of the final FC layer.
o Pixel-level Selection is a generalization to CAM.
10
Results on the Validation Set of LID
Beijing Jiaotong University
MS: Multi-scale inputs during test MC: Morph close the localization map during test
MS MC mIoU
โ โ 58.23
โ โ 61.46
โ โ 62.22
o Fusion the localization maps of branch1 and branch2 on Mixed_6e layer.
o Input size 324
11
o Examples of DGL on test set
Qualitative Results
Beijing Jiaotong University
12
Thanks
Beijing Jiaotong University