The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks
Maxim Berman, Amal Rannen Triki, Matthew B. Blaschko
KU Leuven – Dept. ESAT, Center for Processing Speech and Images
{maxim.berman, amal.rannen, matthew.blaschko}@esat.kuleuven.be
github.com/bermanmaxim/LovaszSoftmax
1. Goal: optimize IoU
• Semantic segmentation measure: intersection-over-union (IoU)
• Loss used to train neural networks: cross-entropy loss
⇒ Optimize IoU directly?
2. Why IoU?
[Figure: image x, ground truth y∗, prediction y; intersection, union, and misclassified set M for class "bird"]

IoU = intersection area / union area
• No bias towards large objects, closer to human perception [1]
• Popular accuracy measure (Pascal VOC, Cityscapes, . . .)
• Multiclass setting: averaged across classes (mIoU)
• Function of the discrete values of all pixels
⇒ Optimizing IoU is challenging!
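The IoU and mIoU measures above can be sketched in a few lines of numpy; the function names `iou`/`miou` and the convention of scoring a class absent from both maps as 1.0 are our illustrative choices, not part of the poster.

```python
import numpy as np

def iou(pred, target, cls):
    """Intersection-over-union of class `cls` between two label maps."""
    p, t = (pred == cls), (target == cls)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # class absent from both maps: conventionally perfect
    return np.logical_and(p, t).sum() / union

def miou(pred, target, classes):
    """Multiclass setting: IoU averaged across classes (mIoU)."""
    return float(np.mean([iou(pred, target, c) for c in classes]))

pred   = np.array([0, 1, 1, 2, 2, 0])
target = np.array([0, 1, 2, 2, 2, 0])
print(miou(pred, target, [0, 1, 2]))  # mean of per-class IoUs 1.0, 0.5, 2/3
```

Note that the score is a function of the discrete predicted labels of all pixels at once, which is why it cannot be optimized pixel-wise like cross-entropy.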
3. Approach
• Jaccard loss for class c ∈ C:

  ∆Jc(Mc) = 1 − IoUc = |Mc| / |{y∗ = c} ∪ Mc|

  with Mc the set of misclassified pixels
• ∆Jc is submodular:

  ∆Jc(M) + ∆Jc(N) ≥ ∆Jc(M ∪ N) + ∆Jc(M ∩ N)

⇒ We can compute the Lovász extension ∆̄Jc of ∆Jc [3]: a tight convex relaxation, with efficient computation of its value and gradient.
• Optimize ∆̄Jc(m), with m the vector of errors at each pixel.
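The submodularity inequality can be checked numerically on a small instance; the sets `gt`, `M`, `N` below are hypothetical toy values chosen only for illustration.

```python
def jaccard_loss(M, gt):
    """∆Jc(M) = |M| / |{y* = c} ∪ M|, for a set M of misclassified pixels
    and gt = {y* = c}, the pixels whose ground-truth label is class c."""
    if not M and not gt:
        return 0.0
    return len(M) / len(gt | M)

gt = {0, 1, 2}          # pixels of class c in the ground truth
M, N = {1, 3}, {3, 4}   # two hypothetical misclassification sets
lhs = jaccard_loss(M, gt) + jaccard_loss(N, gt)           # 0.5 + 0.4
rhs = jaccard_loss(M | N, gt) + jaccard_loss(M & N, gt)   # 0.6 + 0.25
print(lhs >= rhs)  # submodularity holds on this instance
```

Submodularity is the property that makes the Lovász extension a tight convex surrogate; without it the extension would not even be convex.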
4. Lovász extension
• The Lovász extension of a set function ∆: {0, 1}^p → R such that ∆(0) = 0 is

  ∆̄: m ∈ R^p ↦ Σ_{i=1..p} m_i g_i(m)

  with g_i(m) = ∆({π_1, . . . , π_i}) − ∆({π_1, . . . , π_{i−1}}), π being a permutation sorting the components of m in decreasing order: m_{π_1} ≥ m_{π_2} ≥ . . . ≥ m_{π_p}.
• O(p log p) algorithm to compute ∆̄Jc and its gradient.
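The O(p log p) computation reduces to one sort plus cumulative sums: the prefix values ∆Jc({π_1, …, π_i}) come from running intersection/union counts, and their discrete differences give the gradient g. This is a standalone numpy sketch modeled on the authors' released implementation; the function names are ours here.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """g_i(m) for the Jaccard loss ∆Jc.
    gt_sorted: 0/1 class-membership indicators, ordered by decreasing error."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)      # |{y*=c}| minus recovered pixels
    union = gts + np.cumsum(1 - gt_sorted)         # ground truth plus false positives
    jaccard = 1.0 - intersection / union           # ∆Jc({π_1, ..., π_i})
    jaccard[1:] = jaccard[1:] - jaccard[:-1]       # g_i = ∆(prefix_i) − ∆(prefix_{i−1})
    return jaccard

def lovasz_extension(errors, gt):
    """∆̄Jc(m) = Σ_i m_{π_i} g_i(m), with errors sorted in decreasing order."""
    order = np.argsort(errors)[::-1]
    return float(np.dot(errors[order], lovasz_grad(gt[order])))
```

A sanity check of tightness: on a binary (0/1) error vector the extension coincides with the discrete Jaccard loss, e.g. with gt = [1, 0], errors = [0, 1] it returns 0.5 = |{1}| / |{0, 1}|.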
5. Vector of errors m
• Binary case: hinge loss of each pixel i:
  m_i = (1 − F_i(x) y∗_i)_+
⇒ Lovász hinge [5]
• Multiclass case:
  – map the scores F_i to probabilities f_i with Softmax
  – m_i(c): probability of misclassification for class c
  – average the surrogates across classes
⇒ Lovász-Softmax
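Putting sections 4 and 5 together gives a minimal numpy sketch of the multiclass loss, assuming the common choice m_i(c) = |1{y∗_i = c} − f_i(c)| for the misclassification error; names and shapes here are our illustrative conventions.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of ∆Jc (see Sec. 4)."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_softmax(scores, labels, classes):
    """scores: (p, C) logits for p pixels; labels: (p,) ground-truth classes."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)       # Softmax probabilities f_i
    losses = []
    for c in classes:
        fg = (labels == c).astype(float)           # indicator 1{y*_i = c}
        errors = np.abs(fg - probs[:, c])          # m_i(c): prob. of misclassification
        order = np.argsort(errors)[::-1]           # decreasing errors
        losses.append(np.dot(errors[order], lovasz_grad(fg[order])))
    return float(np.mean(losses))                  # average across classes

scores = np.array([[10.0, -10.0], [-10.0, 10.0]])  # two pixels, two classes
print(lovasz_softmax(scores, np.array([0, 1]), [0, 1]))  # confident & correct: near 0
```

Since every step (Softmax, absolute error, sort, dot product) is differentiable almost everywhere, the same computation can be back-propagated through in a deep-learning framework.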
6. Loss surfaces
Surrogates for the foreground loss ∆J1, plotted for two pixels and two classes, with ground truths GT = [−1,−1], [−1,1], [1,−1], [1,1].

Fig. 1: Lovász hinge as a function of r_i = 1 − F_i(x) y∗_i
Fig. 2: Lovász-Softmax as a function of d_i = F_i(y∗_i) − F_i(1 − y∗_i)
7. Binary toy experiment

8. Pascal VOC binary experiment

Training loss →   Cross-entropy   Hinge   Lovász hinge
Cross-entropy          6.8          7.0        8.0
Hinge                  7.8          7.0        7.1
Lovász hinge           8.4          7.5        5.4
Image–IoU (%)         77.1         75.8       80.5
9. Pascal VOC multiclass experiment
• Network: DeepLab-v2, single-scale [2]
[Figure: (a) image, (b) Cross-entropy, (c) ground truth, (d) Lovász-Softmax]
Fig. 3: Validation mIoU evolution
• Pascal VOC test server mIoU increased from 76.4% to 79.0%
10. Cityscapes & ENet experiment
• Network: ENet [4], designed for speed (77 fps on a Titan X)
• Fine-tuning with the Lovász-Softmax loss

                Class IoU   Class iIoU   Cat. IoU   Cat. iIoU
ENet (%)           58.3        34.4        80.4        64.0
Finetuned (%)      63.1        34.1        83.6        61.1

Fig. 4: Finetuning ENet: (a) ENet output, (b) ground truth, (c) finetuned
References
[1] G. Csurka, D. Larlus, F. Perronnin, and F. Meylan. What is a good evaluation measure for semantic segmentation? In BMVC, 2013.
[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI, 2017.
[3] L. Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235–257. Springer, 1983.
[4] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147, 2016.
[5] J. Yu and M. B. Blaschko. Learning submodular losses with the Lovász hinge. In ICML, 2015.