ILSVRC 2015 CLS-LOC: Multi-Class AttentionNet.

Posted on 28-Jan-2017.

D. Yoo1, K. Paeng1, S. Park1, S. Hwang2, H. E. Kim2, J. Lee2, M. Jang2, A. S. Paek2, K. K. Kim1, S. D. Kim1, I. S. Kweon1. 1KAIST, 2Lunit Inc.

State-of-the-art methods for object localization.

1) Box-regression with a CNN.

[Szegedy et al., NIPS'13], DeepMultiBox [Erhan et al., CVPR'14], OverFeat [Sermanet et al., ICLR'14].

(−) Direct mapping from an image to an exact bounding box is relatively difficult for a CNN.

[Figure: a CNN maps the input image directly to corner coordinates (X1, Y1) and (X2, Y2).]

2) Region proposal + classifier.

R-CNN [Girshick et al., CVPR'14], Fast R-CNN [Girshick, ICCV'15], Faster R-CNN [Ren et al., NIPS'15], DeepMultiBox [Erhan et al., CVPR'14].

(−) Prone to focus on a discriminative part (e.g. face) rather than the entire object (e.g. human body).

Idea: Ensemble of weak directions, terminated by a stop signal.

Model: Rather than a CNN regression model, we use a CNN classification model.

Define weak directions: fixed length, and quantized.
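The quantization of weak directions can be sketched as follows. The direction set for the top-left corner ({stop, right, down, diagonal}), the step size, and all names below are illustrative assumptions, not the authors' exact scheme:

```python
# Sketch: quantize a corner's remaining displacement into a weak direction.
# Assumed convention: the top-left corner may only move right, down, or
# diagonally down-right by a fixed step, or emit a stop signal.
STOP, RIGHT, DOWN, DIAG = 0, 1, 2, 3  # four classes per corner

def tl_direction_label(tl_xy, gt_xy, step=30):
    """Training label for the top-left corner, given its current and
    ground-truth positions (names and step size are illustrative)."""
    dx = gt_xy[0] - tl_xy[0]  # > 0: the corner should move right
    dy = gt_xy[1] - tl_xy[1]  # > 0: the corner should move down
    move_right = dx >= step
    move_down = dy >= step
    if move_right and move_down:
        return DIAG
    if move_right:
        return RIGHT
    if move_down:
        return DOWN
    return STOP  # within one step of the target: terminate

def apply_direction(tl_xy, label, step=30):
    """Move the corner one fixed-length step in the chosen direction."""
    x, y = tl_xy
    if label in (RIGHT, DIAG):
        x += step
    if label in (DOWN, DIAG):
        y += step
    return (x, y)
```

Because the output space is a handful of coarse classes rather than continuous coordinates, each individual prediction is an easy classification problem.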

Strength over the previous methods.

Box-regression: (−) Relatively difficult for a CNN.
R-CNN: (−) Focuses on distinctive parts.
Weak direction: (+) Relatively easy for a CNN.
Stop signal: (+) Supervision of a clear terminal point.


AttentionNet: Two layers for each corner.

[Figure: a shared CNN with one output layer for the top-left corner and one for the bottom-right corner.]

AttentionNet: iterative classification.

[Figure: the current box is cropped, resized, and fed to the CNN; each corner's predicted direction is applied, and the loop repeats until both corners predict the stop signal (detected) or the box is rejected.]
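The iterative loop can be sketched as follows; `predict` stands in for the CNN forward pass, and all names, the step size, and the toy predictor are assumptions for illustration:

```python
# Sketch of the iterative attention loop. `predict` stands in for the CNN.
STOP, REJECT = "stop", "reject"

def iterate_box(box, predict, step=30, max_iters=50):
    """box = (x1, y1, x2, y2). `predict` returns, per corner, either a
    (dx, dy) unit move, the STOP token, or REJECT."""
    for _ in range(max_iters):
        tl, br = predict(box)          # CNN forward pass on the resized crop
        if tl == REJECT or br == REJECT:
            return None                 # rejected: no object in this box
        if tl == STOP and br == STOP:
            return box                  # detected: both corners agree to stop
        x1, y1, x2, y2 = box
        if tl != STOP:
            x1 += tl[0] * step
            y1 += tl[1] * step
        if br != STOP:
            x2 += br[0] * step
            y2 += br[1] * step
        box = (x1, y1, x2, y2)
    return box

# Toy predictor that shrinks any box toward the target (60, 60, 120, 120).
def toy_predict(box):
    x1, y1, x2, y2 = box
    tl = STOP if (x1 >= 60 and y1 >= 60) else (int(x1 < 60), int(y1 < 60))
    br = STOP if (x2 <= 120 and y2 <= 120) else (-int(x2 > 120), -int(y2 > 120))
    return tl, br
```

Starting from a large initial box, repeated weak moves converge on the object; the stop signal gives the loop a clear, supervised terminal point.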

Initial box proposal:

Boxes satisfying .

[Figure: proposal boxes are labeled Rejected, Continue, or Detected.]

Multi-{scale, aspect ratio} sliding window search using a fully-convolutional network.
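The multi-{scale, aspect ratio} sliding-window proposal stage might be sketched like this; the scales, ratios, and stride below are illustrative assumptions (the slides only state that 6 multi-{scale, aspect ratio} inputs are used at test time):

```python
# Sketch: generate multi-{scale, aspect-ratio} sliding-window boxes.
# All parameter values are illustrative, not the authors' settings.
def sliding_windows(img_w, img_h, scales=(0.5, 0.75, 1.0),
                    aspect_ratios=(0.5, 1.0, 2.0), stride_frac=0.25):
    boxes = []
    for s in scales:
        for ar in aspect_ratios:            # ar = width / height
            h = int(min(img_h, img_w / ar) * s)
            w = int(h * ar)
            if w < 1 or h < 1 or w > img_w or h > img_h:
                continue
            sx = max(1, int(w * stride_frac))  # horizontal stride
            sy = max(1, int(h * stride_frac))  # vertical stride
            for y in range(0, img_h - h + 1, sy):
                for x in range(0, img_w - w + 1, sx):
                    boxes.append((x, y, x + w, y + h))
    return boxes
```

In practice a fully-convolutional network evaluates all window positions in one pass rather than cropping each box separately; the loop above only enumerates the same grid of candidates.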

Initial detection and refinement.


Extension to multiple classes: Multi-class AttentionNet.

[Figure: a shared CNN feeding class-wise direction layers (Class 1, Class 2, Class 3, …, Class N) and a classification layer.]

Final architecture.

Backbone: GoogLeNet [Szegedy et al., CVPR'15].

Directional layers, one pair per class (top-left outputs {•, ↘, →, ↓}; bottom-right outputs {•, ↑, ←, ↖}):
Conv8-C1-TL: 1×1×1,024×4. Conv8-C1-BR: 1×1×1,024×4.
Conv8-C2-TL: 1×1×1,024×4. Conv8-C2-BR: 1×1×1,024×4.
⋯
Conv8-CN-TL: 1×1×1,024×4. Conv8-CN-BR: 1×1×1,024×4.

Classification layer:
Conv8-CLS: 1×1×1,024×(N+1), over F, C1, C2, C3, ⋯, CN.
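The head's output shapes can be illustrated with plain numpy matrix products standing in for the 1×1 convolutions; the random weights and function name are hypothetical, and only the shapes (4 direction logits per corner per class, N+1 class logits) follow the slide:

```python
import numpy as np

def head_outputs(features, n_classes, rng):
    """features: (1024,) pooled backbone activation (illustrative).
    Returns per-class TL/BR direction logits (4 each) and CLS logits (N+1)."""
    out = {}
    for c in range(1, n_classes + 1):
        w_tl = rng.standard_normal((4, 1024)) * 0.01   # stands in for Conv8-Cc-TL
        w_br = rng.standard_normal((4, 1024)) * 0.01   # stands in for Conv8-Cc-BR
        out[f"C{c}-TL"] = w_tl @ features
        out[f"C{c}-BR"] = w_br @ features
    w_cls = rng.standard_normal((n_classes + 1, 1024)) * 0.01  # Conv8-CLS
    out["CLS"] = w_cls @ features
    return out
```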

Training multi-class AttentionNet.

• Pre-training.
• GoogLeNet [Szegedy et al., CVPR'15].
• ILSVRC-CLS dataset.

• Fine-tuning.
• # epochs: 5.
• # training regions: 22M (randomly generated).
• Learning rate of the classification layer: 0.01.
• Learning rate of the 2K (= 1K + 1K) directional layers: 0.01.
• Learning rate of the layers from conv1 to conv21: 0.001.
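Collected into one place, the fine-tuning settings above read as follows (the dictionary layout and key names are illustrative):

```python
# Fine-tuning settings from the slide, gathered into a config sketch.
finetune_cfg = {
    "epochs": 5,
    "num_training_regions": 22_000_000,   # randomly generated regions
    "lr": {
        "classification_layer": 0.01,
        "directional_layers": 0.01,       # 2K = 1K TL + 1K BR layers
        "conv1_to_conv21": 0.001,         # pre-trained backbone, smaller LR
    },
}
```

The backbone's smaller learning rate is the usual fine-tuning pattern: the pre-trained conv layers only need gentle adjustment, while the freshly initialized heads train at the full rate.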

The training loss is an equally weighted sum:

$$\mathit{Loss} = \tfrac{1}{3}\,\mathit{Loss}_{TL} + \tfrac{1}{3}\,\mathit{Loss}_{BR} + \tfrac{1}{3}\,\mathit{Loss}_{CLS},$$

$$\mathit{Loss}_{TL} = \frac{1}{N}\sum_{i=1}^{N}\left[\,t_{c_i}^{TL} \neq 0\,\right]\cdot \mathrm{SoftMaxLoss}\!\left(y_{c_i}^{TL},\, t_{c_i}^{TL}\right),$$

$$\mathit{Loss}_{BR} = \frac{1}{N}\sum_{i=1}^{N}\left[\,t_{c_i}^{BR} \neq 0\,\right]\cdot \mathrm{SoftMaxLoss}\!\left(y_{c_i}^{BR},\, t_{c_i}^{BR}\right),$$

$$\mathit{Loss}_{CLS} = \mathrm{SoftMaxLoss}\!\left(y^{CLS},\, t^{CLS}\right).$$
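A minimal numpy sketch of this loss, under the assumption that a target of 0 marks a class with no supervision (the indicator [t ≠ 0] then zeroes that class's term):

```python
import numpy as np

def softmax_loss(logits, target):
    """Cross-entropy of a softmax over `logits` against integer `target`."""
    z = logits - logits.max()                 # shift for numerical stability
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

def attention_loss(y_tl, t_tl, y_br, t_br, y_cls, t_cls):
    """y_tl, y_br: (N, 4) per-class direction logits; t_tl, t_br: (N,) integer
    targets where 0 means 'class not supervised'. y_cls: (N+1,) class logits."""
    n = len(t_tl)
    loss_tl = sum((t != 0) * softmax_loss(y, t) for y, t in zip(y_tl, t_tl)) / n
    loss_br = sum((t != 0) * softmax_loss(y, t) for y, t in zip(y_br, t_br)) / n
    loss_cls = softmax_loss(y_cls, t_cls)
    return (loss_tl + loss_br + loss_cls) / 3.0
```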

Test: Given the top-5 class predictions, we detect those classes with AttentionNet.

• Top-5 class prediction (7% error): ensemble of GoogLeNet, GoogLeNet-BN, and VGG-16.
• Number of multi-{scale, aspect ratio} inputs: 6.

Results on the validation set (Method — Top-5 CLS-LOC error):

OverFeat [Sermanet et al., ICLR'14]: 30.00%.
VGG [Simonyan and Zisserman, ICLR'15]: 26.90%.
GoogLeNet [Szegedy et al., CVPR'15]: 26.70% (test set).
A single "Multi-class AttentionNet", without test augmentation: 16.11%.
A single "Multi-class AttentionNet", with test augmentation (original and flip): 14.96%.

Note that we use a SINGLE "Multi-class AttentionNet".

Related publication:

Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S. Paek, In So Kweon, "AttentionNet: Aggregating Weak Directions for Accurate Object Detection," in ICCV, 2015.