3. Image Ranking: 5. Monocular Depth Estimation on KITTI · We learn the two continuous domain line...

1
1. Objective Our goal is to substitute the 1-hot label representations in ordinal regression problems. Instead of building an ensemble of K-1 binary classifiers that distinguish rank thresholds, we use the same default, fully-connected output layer of an off-the-shelf classification CNN with a K-element soft label: 5. Monocular Depth Estimation on KITTI SORD can estimate depth using segmentation networks, adapting a scale-increasing discretization method (SID) with K=120 intervals: 2. Method Our Soft Ordinal labels (SORD) encode metric penalties à la Softmax : is the list of ordinal ranks in the problem. penalizes how far a given rank is from the true value . When using Softmax and a cross-entropy loss, the computation of the gradient with respect to the output probits is simply: SORD trains the network to learn features so that the outputs for each rank match the interclass distance , reaching the minima when: SORD Features: - General-purpose: a wild variety of tasks can use SORD. - Simple to implement: just two lines of code. - No new CNN architectures: an off-the-shelf network is sufficient. - Inference can be done with either argmax or expected value. - Continuous domains: is valid when . This smoothly balances the SORD label without needing discretization or quantization. 3. Image Ranking: inputs with equally spaced labels We use an off-the-shelf VGG-16 and the metric: Image Aesthetics dataset: 15K images labeled in 5 ordinal categories Adience dataset: 26K images labeled in 8 age groups 1 ‐ Unacceptable 2 ‐ Flawed 3 ‐ Ordinary 4 ‐ Professional 5 ‐ Exceptional 1 – (0-2) 2 – (4-6) 3 – (8-13) 4 – (15-20) 5 – (25-32) 6 – (38-43) 7 – (48-53) 8 – (>60) 0.45 0.5 0.55 0.6 0.65 Accuracy Mean Average Error Lean DNN 1‐HOT Niu et al CNN‐POR SORD 4. Horizon Lines in the Wild: 100K images from SfM models We learn the two continuous domain line parameters by interpolating 100 bins from the training data, using Resnet50 and the metrics: 65 70 75 80 85 90 95 AUC Score (%) HLW SLNet SORD SORD‐EV Raúl Díaz and Amit Marathe {raul.diaz.garcia, amit.marathe}@hp.com Overall Nature Animal Urban People Accuracy % Overall Nature Animal Urban People Mean Absolute Error RED‐SVM 1‐HOT Niu et al CNN‐POR SORD Results on validation data FCN Squared FCN Squared Log DeepLabv3+ Squared DeepLabv3+ Squared Log DeepLabv3+ Squared Log (Cityscapes) 6 6.5 7 7.5 8 8.5 9 absErrorRel 1 1.2 1.4 1.6 1.8 2 2.2 2.4 sqErrorRel 90 92 94 96 98 δ < 1.25 9 10 11 12 13 SILog 8 9 10 11 12 13 absErrorRel 2 2.5 3 3.5 4 4.5 sqErrorRel 12 13 14 15 16 iRMSE 11 12 13 14 15 SILog Input Image Depth Prediction Error Map Selected References [1] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao. Deep ordinal regression network for monocular depth estimation. In CVPR, 2018. [2] J.-T. Lee, H.-U. Kim, C. Lee, and C.-S. Kim. Semantic line detection and its applications. In ICCV, 2017. [3] Y. Liu, A. Wai Kin Kong, and C. Keong Goh. A constrained deep neural network for ordinal regression. In CVPR, 2018. [4] S. Workman, M. Zhai, and N. Jacobs. Horizon lines in the wild. In BMVC, 2016. CNN F T F T F T F T ? ? ? ? Binary Classifiers 1 2 3 4 5 6 K Soft Label Results on the official KITTI benchmark APMoE DABC VGG16‐Unet SORD DORN

Transcript of 3. Image Ranking: 5. Monocular Depth Estimation on KITTI · We learn the two continuous domain line...

1. ObjectiveOur goal is to substitute the 1-hot label representations in ordinalregression problems. Instead of building an ensemble of K-1 binaryclassifiers that distinguish rank thresholds, we use the same default,fully-connected output layer of an off-the-shelf classification CNN witha K-element soft label:

5. Monocular Depth Estimation on KITTISORD can estimate depth using segmentation networks, adapting ascale-increasing discretization method (SID) with K=120 intervals:

2. MethodOur Soft Ordinal labels (SORD) encode metric penalties à la Softmax :

• is the list of ordinal ranks in the problem.

• penalizes how far a given rank is from the true value .

When using Softmax and a cross-entropy loss, the computation of thegradient with respect to the output probits is simply:

SORD trains the network to learn features so that the outputs foreach rank match the interclass distance , reaching the minima when:

SORD Features:

- General-purpose: a wild variety of tasks can use SORD.

- Simple to implement: just two lines of code.

- No new CNN architectures: an off-the-shelf network is sufficient.

- Inference can be done with either argmax or expected value.

- Continuous domains: is valid when . This smoothly balancesthe SORD label without needing discretization or quantization.

3. Image Ranking: inputs with equally spaced labelsWe use an off-the-shelf VGG-16 and the metric:

Image Aesthetics dataset: 15K images labeled in 5 ordinal categories

Adience dataset: 26K images labeled in 8 age groups

1 ‐ Unacceptable 2 ‐ Flawed 3 ‐ Ordinary 4 ‐ Professional 5 ‐ Exceptional

1 – (0-2) 2 – (4-6) 3 – (8-13) 4 – (15-20)

5 – (25-32) 6 – (38-43) 7 – (48-53) 8 – (>60)

0.45

0.5

0.55

0.6

0.65

Accuracy Mean Average Error

Lean DNN 1‐HOT Niu et al CNN‐POR SORD

4. Horizon Lines in the Wild: 100K images from SfM modelsWe learn the two continuous domain line parameters by interpolating100 bins from the training data, using Resnet50 and the metrics:

65

70

75

80

85

90

95AUC Score (%)

HLW SLNetSORD SORD‐EV

Raúl Díaz and Amit Marathe{raul.diaz.garcia, amit.marathe}@hp.com

Overall

Nature

AnimalUrban

People

Accuracy %Overall

Nature

AnimalUrban

People

Mean Absolute Error

RED‐SVM 1‐HOT Niu et al CNN‐POR SORD

Results on validation data

FCN Squared FCN Squared Log DeepLabv3+ Squared DeepLabv3+ Squared Log DeepLabv3+ Squared Log (Cityscapes)

6

6.5

7

7.5

8

8.5

9

absErrorRel1

1.21.41.61.82

2.22.4

sqErrorRel90

92

94

96

98

δ < 1.259

10

11

12

13

SILog

8

9

10

11

12

13

absErrorRel2

2.5

3

3.5

4

4.5

sqErrorRel12

13

14

15

16

iRMSE11

12

13

14

15

SILog

Input Image Depth Prediction Error Map

Selected References[1] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao. Deep ordinal

regression network for monocular depth estimation. In CVPR, 2018.

[2] J.-T. Lee, H.-U. Kim, C. Lee, and C.-S. Kim. Semantic line detection and its applications. In ICCV, 2017.

[3] Y. Liu, A. Wai Kin Kong, and C. Keong Goh. A constrained deep neural network for ordinal regression. In CVPR, 2018.

[4] S. Workman, M. Zhai, and N. Jacobs. Horizon lines in the wild. In BMVC, 2016.

CNN

F T

F T

F T

F T

𝑦 𝑟 ?

𝑦 𝑟 ?

𝑦 𝑟 ?

𝑦 𝑟 ?

Binary Classifiers

1

2

3

4

5

6

∙∙∙K

Soft Label

Results on the official KITTI benchmark

APMoE DABC VGG16‐Unet SORD DORN