
The Lovász-Softmax loss
A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

Maxim Berman, Amal Rannen Triki, Matthew B. Blaschko
KU Leuven – Dept. ESAT, Center for Processing Speech and Images

{maxim.berman, amal.rannen, matthew.blaschko}@esat.kuleuven.be · github.com/bermanmaxim/LovaszSoftmax


1. Goal: optimize IoU

• Semantic segmentation measure: intersection-over-union (IoU)

• Loss used to train neural networks: cross-entropy loss
⇒ Optimize IoU directly?

2. Why IoU?

Figure: image x, ground truth y∗, prediction ỹ for the class "bird", with the intersection, the union, and the misclassified set M highlighted.

IoU = intersection area / union area
• No bias towards large objects; closer to human perception [1]
• Popular accuracy measure (Pascal VOC, Cityscapes, ...)
• Multiclass setting: averaged across classes (mIoU, sketched below)
• Function of the discrete values of all pixels
⇒ Optimizing IoU is challenging!
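As a concrete illustration of the measure, here is a minimal NumPy sketch of how per-class IoU and the class-averaged mIoU can be computed from discrete label maps. The function names and the convention of ignoring classes absent from both maps are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    """Discrete IoU for each class, from integer label maps of equal shape."""
    ious = np.full(num_classes, np.nan)   # NaN marks classes absent from both maps
    for c in range(num_classes):
        pred_c = (pred == c)
        gt_c = (gt == c)
        union = np.logical_or(pred_c, gt_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, gt_c).sum() / union
    return ious

def mean_iou(pred, gt, num_classes):
    """Class-averaged IoU (mIoU), ignoring classes that never occur."""
    return float(np.nanmean(iou_per_class(pred, gt, num_classes)))
```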

3. Approach

• Jaccard loss for class c ∈ C

∆Jc = 1 − IoUc = |Mc| / |{y∗ = c} ∪ Mc|,

with Mc the set of misclassified pixels for class c (see the sketch below).

• ∆Jc is submodular:

∆Jc(M) + ∆Jc(N) ≥ ∆Jc(M ∪ N) + ∆Jc(M ∩ N)

⇒ We can compute the convex surrogate ∆̄Jc of ∆Jc, its Lovász extension [3]: a tight convex relaxation with efficient computation and gradient.

• Optimize ∆̄Jc(m), with m the vector of errors at each pixel.
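The Jaccard loss above is a set function of the mispredicted pixels. Below is a minimal NumPy sketch of ∆Jc under that view, assuming boolean pixel masks; when Mc is taken as the symmetric difference between the predicted and ground-truth masks for class c, it coincides with 1 − IoUc.

```python
import numpy as np

def jaccard_set_loss(Mc, gt_c):
    """Delta_Jc(Mc) = |Mc| / |{y* = c} union Mc|.
    Mc:   boolean mask of pixels misclassified with respect to class c.
    gt_c: boolean mask of pixels whose ground-truth label is c."""
    union = np.logical_or(gt_c, Mc).sum()
    return Mc.sum() / union if union > 0 else 0.0

# With Mc taken as the disagreement between prediction and ground truth for class c,
#   Mc = np.logical_xor(pred == c, gt == c),
# jaccard_set_loss(Mc, gt == c) equals 1 - IoU_c.
```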

4. Lovász extension

• The Lovász extension of a set function ∆: {0, 1}p → R such that ∆(0) = 0 is

∆̄: m ∈ Rp ↦ ∑i=1..p mi gi(m)

with gi(m) = ∆({π1, . . . , πi}) − ∆({π1, . . . , πi−1}), where π is a permutation ordering the components of m in decreasing order, mπ1 ≥ mπ2 ≥ . . . ≥ mπp.

• O(p log p) algorithm to compute ∆̄Jc and its gradient (see the sketch below).
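A minimal NumPy sketch of the O(p log p) computation for the Jaccard loss: sort the errors in decreasing order, evaluate ∆Jc on the nested sets {π1, ..., πi} with cumulative sums, take finite differences to obtain gi(m), and form the inner product with m. The function and variable names are illustrative assumptions; the authors' repository provides the actual PyTorch/TensorFlow implementations.

```python
import numpy as np

def lovasz_extension_jaccard(errors, gt_c):
    """Value of the Lovász extension of Delta_Jc at the error vector m.
    errors: per-pixel errors m (floats), e.g. hinge errors or misclassification probabilities.
    gt_c:   boolean mask, True where the ground-truth class is c."""
    perm = np.argsort(-errors)                 # pi: decreasing order of the errors
    m_sorted = errors[perm]
    gt_sorted = gt_c[perm].astype(float)

    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)  # class-c pixels not yet counted as errors
    union = gts + np.cumsum(1.0 - gt_sorted)   # class-c pixels plus false positives so far
    jaccard = 1.0 - intersection / union       # Delta_Jc({pi_1, ..., pi_i}) for i = 1..p

    grad = jaccard.copy()
    grad[1:] = jaccard[1:] - jaccard[:-1]      # g_i(m): discrete derivative of Delta_Jc
    return float(np.dot(m_sorted, grad))       # sum_i m_i g_i(m)
```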

5. Vector of errors m

• Binary case: hinge loss at each pixel i, mi = (1 − Fi(x) y∗i)+
⇒ Lovász hinge [5]

• Multiclass case:
– map the scores Fi to probabilities fi with a Softmax
– mi(c): probability of misclassification with respect to class c
– average the surrogates across classes
⇒ Lovász-Softmax (see the sketch below)
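A minimal NumPy sketch of the multiclass construction, reusing lovasz_extension_jaccard from the previous sketch: map scores to probabilities with a softmax, form the per-class misclassification errors mi(c), and average the per-class surrogates. This only shows the forward pass under illustrative names and shapes; in training, the same computation is expressed in an autodiff framework so the gradient flows back through the softmax.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax over (P, C) pixel scores."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def lovasz_softmax_loss(scores, gt, num_classes):
    """Forward pass of the Lovász-Softmax loss.
    scores: (P, C) raw network scores F_i for P pixels and C classes.
    gt:     (P,) integer ground-truth labels y*."""
    probs = softmax(scores)
    losses = []
    for c in range(num_classes):
        gt_c = (gt == c)
        fc = probs[:, c]
        # m_i(c): predicted probability of misclassification with respect to class c
        errors = np.where(gt_c, 1.0 - fc, fc)
        losses.append(lovasz_extension_jaccard(errors, gt_c))
    return float(np.mean(losses))              # average of the per-class surrogates
```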

6. Loss surfaces

Surrogates for the foreground loss ∆J1 for two pixels and two classes

Fig. 1: Lovász hinge as a function of ri = 1 − Fi(x) y∗i, shown for the four ground truths GT = [−1,−1], [−1,1], [1,−1], [1,1]

Fig. 2: Lovász-Softmax as a function of di = Fi(y∗i) − Fi(1 − y∗i), shown for the same four ground truths

7. Binary toy experiment

8. Pascal VOC binary experiment

Training loss →    Cross-entropy   Hinge   Lovász hinge
Cross-entropy            6.8         7.0        8.0
Hinge                    7.8         7.0        7.1
Lovász hinge             8.4         7.5        5.4
Image–IoU (%)           77.1        75.8       80.5

9. Pascal VOC multiclass exp.

• Network: DeepLab-v2, single-scale [2]

Figure: qualitative comparison: (a) image, (b) cross-entropy prediction, (c) ground truth, (d) Lovász-Softmax prediction

Fig. 3: Validation mIoU evolution

• Pascal VOC test server mIoU increased from 76.4% to 79.0%

10. Cityscapes & ENet experiment

• Network: ENet [4], designed for speed (77 fps on a Titan X)
• Fine-tuning with the Lovász-Softmax loss

                Class IoU   Class iIoU   Cat. IoU   Cat. iIoU
ENet (%)           58.3        34.4        80.4        64.0
Finetuned (%)      63.1        34.1        83.6        61.1

Fig. 4: Finetuning ENet: (a) ENet output, (b) ground truth, (c) finetuned output

References

[1] G. Csurka, D. Larlus, F. Perronnin, and F. Meylan. What is a good evaluation measure for semantic segmentation? In BMVC 2013.

[2] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI, 2017.

[3] L. Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235–257. Springer, 1983.

[4] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147, 2016.

[5] J. Yu and M. B. Blaschko. Learning submodular losses with the Lovász hinge. In ICML 2015.