3. Image Ranking: 5. Monocular Depth Estimation on KITTI · We learn the two continuous domain line...
Transcript of 3. Image Ranking: 5. Monocular Depth Estimation on KITTI · We learn the two continuous domain line...
1. ObjectiveOur goal is to substitute the 1-hot label representations in ordinalregression problems. Instead of building an ensemble of K-1 binaryclassifiers that distinguish rank thresholds, we use the same default,fully-connected output layer of an off-the-shelf classification CNN witha K-element soft label:
5. Monocular Depth Estimation on KITTISORD can estimate depth using segmentation networks, adapting ascale-increasing discretization method (SID) with K=120 intervals:
2. MethodOur Soft Ordinal labels (SORD) encode metric penalties à la Softmax :
• is the list of ordinal ranks in the problem.
• penalizes how far a given rank is from the true value .
When using Softmax and a cross-entropy loss, the computation of thegradient with respect to the output probits is simply:
SORD trains the network to learn features so that the outputs foreach rank match the interclass distance , reaching the minima when:
SORD Features:
- General-purpose: a wild variety of tasks can use SORD.
- Simple to implement: just two lines of code.
- No new CNN architectures: an off-the-shelf network is sufficient.
- Inference can be done with either argmax or expected value.
- Continuous domains: is valid when . This smoothly balancesthe SORD label without needing discretization or quantization.
3. Image Ranking: inputs with equally spaced labelsWe use an off-the-shelf VGG-16 and the metric:
Image Aesthetics dataset: 15K images labeled in 5 ordinal categories
Adience dataset: 26K images labeled in 8 age groups
1 ‐ Unacceptable 2 ‐ Flawed 3 ‐ Ordinary 4 ‐ Professional 5 ‐ Exceptional
1 – (0-2) 2 – (4-6) 3 – (8-13) 4 – (15-20)
5 – (25-32) 6 – (38-43) 7 – (48-53) 8 – (>60)
0.45
0.5
0.55
0.6
0.65
Accuracy Mean Average Error
Lean DNN 1‐HOT Niu et al CNN‐POR SORD
4. Horizon Lines in the Wild: 100K images from SfM modelsWe learn the two continuous domain line parameters by interpolating100 bins from the training data, using Resnet50 and the metrics:
65
70
75
80
85
90
95AUC Score (%)
HLW SLNetSORD SORD‐EV
Raúl Díaz and Amit Marathe{raul.diaz.garcia, amit.marathe}@hp.com
Overall
Nature
AnimalUrban
People
Accuracy %Overall
Nature
AnimalUrban
People
Mean Absolute Error
RED‐SVM 1‐HOT Niu et al CNN‐POR SORD
Results on validation data
FCN Squared FCN Squared Log DeepLabv3+ Squared DeepLabv3+ Squared Log DeepLabv3+ Squared Log (Cityscapes)
6
6.5
7
7.5
8
8.5
9
absErrorRel1
1.21.41.61.82
2.22.4
sqErrorRel90
92
94
96
98
δ < 1.259
10
11
12
13
SILog
8
9
10
11
12
13
absErrorRel2
2.5
3
3.5
4
4.5
sqErrorRel12
13
14
15
16
iRMSE11
12
13
14
15
SILog
Input Image Depth Prediction Error Map
Selected References[1] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao. Deep ordinal
regression network for monocular depth estimation. In CVPR, 2018.
[2] J.-T. Lee, H.-U. Kim, C. Lee, and C.-S. Kim. Semantic line detection and its applications. In ICCV, 2017.
[3] Y. Liu, A. Wai Kin Kong, and C. Keong Goh. A constrained deep neural network for ordinal regression. In CVPR, 2018.
[4] S. Workman, M. Zhai, and N. Jacobs. Horizon lines in the wild. In BMVC, 2016.
CNN
F T
F T
F T
F T
𝑦 𝑟 ?
𝑦 𝑟 ?
𝑦 𝑟 ?
𝑦 𝑟 ?
Binary Classifiers
1
2
3
4
5
6
∙∙∙K
Soft Label
Results on the official KITTI benchmark
APMoE DABC VGG16‐Unet SORD DORN