![Page 1: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/1.jpg)
Semantic segmentation review
Sidnev A., Korolev I., Sidnev D., Druzhkov P., Nosov S.
05/25/2018
{alexey.sidnev, ivan.korolev, dmitry.sidnev, pavel.druzhkov, sergei.nosov}@intel.com
![Page 2: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/2.jpg)
Legal information
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2018 Intel Corporation.
![Page 3: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/3.jpg)
Agenda
1. Classification / Object detection / Instance segmentation / Semantic segmentation
2. Semantic segmentation datasets and evaluation metrics
3. Architectures
   1. FCN
   2. CRF / DeepLab v1 / DeepLab v2
   3. Parsenet
   4. U-Net
   5. SegNet
   6. ENet
   7. PSPNet
   8. ICNet
   9. DeepLab v3 / DeepLab v3+
![Page 4: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/4.jpg)
Image recognition (ImageNet ILSVRC)
1000 categories, 1.2M train images, 100K test images
[Plot: top-5 classification error. Labeled entries: AlexNet, GoogLeNet, ResNet, Trimps-Soushen, Squeeze-and-Excitation, Zeiler (Clarifai), SIFT + LBP + SVM, Karpathy]
Chimes are hard
![Page 5: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/5.jpg)
Image recognition: What do you see?
![Page 6: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/6.jpg)
Image recognition: Annotation
Pajama!!! Spatula!
![Page 7: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/7.jpg)
Object detection (MS COCO)
80 categories, 200K train images, 80K test images
[Plot: AP at IoU=.50:.05:.95. Labeled entries: Faster R-CNN; Ensemble of Faster R-CNN; FPN, GCN, Supervision]
![Page 8: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/8.jpg)
Instance segmentation (MS COCO)
80 categories, 200K train images, 80K test images
[Plot: AP at IoU=.50:.05:.95. Labeled entries: MNC, FCIS, PANet]
![Page 9: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/9.jpg)
Semantic segmentation (Cityscapes)
PSPNet: Pyramid Scene Parsing Network
Semantic ≠ Instance
![Page 10: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/10.jpg)
Datasets (semantic segmentation)
General:
● Pascal VOC 2012 - 11K images, 20 classes, 7K instances
● ADE20K / SceneParse150K - 22K images, 2 693 classes, 434K instances
● MS COCO - 200K images, 80 classes, instance segmentation
● DAVIS 2017 - video (review)
ADAS:
● Cityscapes - 25K images, 30 classes, 65K instances
● Mapillary Vistas - 20K images, 100 classes
● CamVid - 10 min video, 32 classes
● KITTI road/lane - 289 images
● CMP Facades - 606 images, 12 classes
Aerial / Satellite:
● CITY-OSM - ISPRS Vaihingen and Potsdam
● DSTL Kaggle
Human parsing:
● LIP (dataset) - 50K images, 19 classes
● MHP - 20K images
More datasets: http://riemenschneider.hayko.at/vision/dataset/
http://on-demand.gputechconf.com/gtc-il/2017/presentation/sil7145-eyal-gruss%20a-review-of-semantic-segmentation-with-deep-learning.pdf
![Page 11: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/11.jpg)
Datasets (papers)
https://github.com/nightrome/really-awesome-semantic-segmentation
VOC 2012
![Page 12: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/12.jpg)
Evaluation metrics
● Pixel accuracy (dominated by background class)
● Mean accuracy over classes
● Jaccard index = Intersection over Union (IoU) = |GT ∩ Pred| / |GT ∪ Pred|
  ○ = TP / (TP + FN + FP)
  ○ Usually: mean over classes on the whole dataset
  ○ Can be weighted by inverse instance size (Cityscapes, important in traffic use cases)
● Dice index = F1 score = 2|GT ∩ Pred| / (|GT| + |Pred|)
  ○ = 2TP / (2TP + FN + FP)
  ○ = 2IoU / (1 + IoU)
● [Adjusted] Rand Index (RI) / Rand Error (RE)
  ○ RI = (a + b) / C(n, 2), RE = 1 − RI, where n is the number of pixels
  ○ a - the # of pixel pairs that have the same label in both prediction and GT
  ○ b - the # of pixel pairs that have different labels in both prediction and GT
[Figure: overlapping Ground Truth and Prediction regions with the TP, FN and FP areas marked. Source: C. Lawrence Zitnick, P. Dollár. Edge Boxes: Locating Object Proposals from Edges. 2014]
http://on-demand.gputechconf.com/gtc-il/2017/presentation/sil7145-eyal-gruss%20a-review-of-semantic-segmentation-with-deep-learning.pdf
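The per-class metrics above can all be read off a confusion matrix. The following is a minimal NumPy sketch, not taken from the slides: the function names and the tiny 2×4 label maps are made up for illustration, and classes that never occur are skipped in the means via `nanmean` (one possible convention).

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = gt.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_metrics(gt, pred, num_classes):
    cm = confusion_matrix(gt, pred, num_classes).astype(float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # pixels of the class that were missed
    fp = cm.sum(axis=0) - tp   # pixels wrongly assigned to the class
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fn + fp)
        dice = 2 * tp / (2 * tp + fn + fp)
        class_acc = tp / (tp + fn)
    return {
        "pixel_accuracy": tp.sum() / cm.sum(),
        "mean_accuracy": np.nanmean(class_acc),
        "iou": iou,
        "dice": dice,
        "mean_iou": np.nanmean(iou),
    }

# Hypothetical toy label maps with two classes.
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 1, 1, 1]])
m = segmentation_metrics(gt, pred, num_classes=2)
print(m["iou"], m["dice"], m["pixel_accuracy"])
```

In this toy case each class has TP = 3, FN = 1, FP = 1, so IoU = 3/5 and Dice = 6/8, which agrees with Dice = 2IoU / (1 + IoU).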
![Page 13: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/13.jpg)
Semantic segmentation architectures
[Plot: accuracy (Cityscapes), mIoU % vs. time (1024×2048), frames/second, with the real-time region marked; points carry year labels from 2014 to 2017. Hardware: NVIDIA Titan X]
UNet¹ - 512×1024 frames (hardware?) (https://arxiv.org/pdf/1803.02758.pdf)
Parsenet² - no FPS results (http://ais.informatik.uni-freiburg.de/publications/papers/valada17icra.pdf)
DeepLab v3³ - no FPS results
https://arxiv.org/abs/1704.08545
![Page 14: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/14.jpg)
FCN: Fully Convolutional Networks for Semantic Segmentation
● The fully connected layers can also be viewed as convolutions with kernels that cover their entire input regions
● The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation.
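A small NumPy check of the first bullet, written as a sketch rather than as the paper's implementation: the layer sizes are arbitrary and `conv2d_valid` is a naive helper defined here only for illustration. Reshaping the fully connected weights into kernels that cover the whole input gives the same output, and the same kernels produce a spatial score map on a larger input.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, K = 3, 4, 4, 5          # input channels, height, width, output units

# A fully connected layer acting on a flattened C*H*W input.
fc_weight = rng.standard_normal((K, C * H * W))
fc_bias = rng.standard_normal(K)

# The same weights viewed as K convolution kernels of size C x H x W.
conv_weight = fc_weight.reshape(K, C, H, W)

def conv2d_valid(x, weight, bias):
    """Naive 'valid' cross-correlation; x has shape (C, H_in, W_in)."""
    k, c, kh, kw = weight.shape
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    out = np.empty((k, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(weight, patch, axes=3) + bias
    return out

x = rng.standard_normal((C, H, W))
fc_out = fc_weight @ x.ravel() + fc_bias
conv_out = conv2d_valid(x, conv_weight, fc_bias)        # shape (K, 1, 1)

# On a larger input the convolutionalized layer yields a map of scores.
x_big = rng.standard_normal((C, H + 2, W + 3))
score_map = conv2d_valid(x_big, conv_weight, fc_bias)   # shape (K, 3, 4)
```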
Fig.1. Classification CNN
Fig.2. FCN
https://arxiv.org/abs/1411.4038
![Page 15: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/15.jpg)
FCN: Architecture
Combines coarse, high layer information with fine, low layer information
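The fusion of coarse and fine predictions can be sketched as "upsample the coarse scores 2x, then add the scores from a shallower layer". This is a simplified illustration: nearest-neighbour upsampling stands in for the interpolation or learned upsampling a real FCN would use, and the shapes (21 classes, 4×4 and 8×8 maps) are assumptions chosen for the example.

```python
import numpy as np

def upsample2x_nearest(scores):
    """Repeat every score along height and width; scores has shape (K, H, W)."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse_skip(coarse_scores, fine_scores):
    """Add 2x-upsampled coarse class scores to finer class scores."""
    return upsample2x_nearest(coarse_scores) + fine_scores

coarse = np.ones((21, 4, 4))    # hypothetical class scores from the deepest layer
fine = np.zeros((21, 8, 8))     # hypothetical class scores from a shallower layer
fused = fuse_skip(coarse, fine)
print(fused.shape)
```

Applying the same step once more with an even shallower layer corresponds to going from the 16s to the 8s variant.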
Fig.1. FCN-32s
Fig.3. FCN-16s and FCN-8s
Fig.2. FCN results
https://arxiv.org/abs/1411.4038
![Page 16: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/16.jpg)
CRF: Conditional Random Field
● FCNs classify each pixel in the segmentation map independently.
● Probabilistic graphical models, such as Conditional Random Fields (CRFs), have been used extensively in prior literature to predict structures and incorporate prior knowledge.
Coarse output from pixel-wise classifier
Output after CRF inference
CRF modeling
https://arxiv.org/abs/1210.5644
![Page 17: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/17.jpg)
CRF: Conditional Random Field
● Define a discrete random variable, Xi, for each pixel i.
● Each Xi takes a value from the label set L.
● The random variables are connected to form a random field. The most probable assignment, conditioned on the image, is our semantic segmentation result.
https://arxiv.org/abs/1210.5644
![Page 18: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/18.jpg)
DeepLab v1 / DeepLab v2
![Page 19: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/19.jpg)
DeepLab v1
Contributions:
● Brings together DL methods and probabilistic graphical models.
● First to apply dilated/atrous convolutions in deep learning.
● No decoder: a CRF serves as the refinement model.
● Sets SOTA on Pascal VOC (71.6 mIoU on the test set).
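A dilated (atrous) convolution samples its input with gaps, so the receptive field grows without adding weights. Below is a minimal 1-D sketch under that definition; the function name, the kernel and the input are illustrative and not taken from the DeepLab code.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate=1):
    """'Valid' 1-D cross-correlation whose taps are `rate` samples apart."""
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective receptive field
    out_len = len(x) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        taps = x[i:i + span:rate]        # every `rate`-th sample, k values
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])
dense = dilated_conv1d(x, kernel, rate=1)    # receptive field of 3 samples
atrous = dilated_conv1d(x, kernel, rate=2)   # receptive field of 5 samples
```

With the same three weights, rate 2 compares samples four positions apart instead of two.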
Fig.1. Dilated convolutions.
Fig.3. Results.
Fig.2. Pipeline.
https://arxiv.org/abs/1412.7062
![Page 20: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/20.jpg)
DeepLab v2
Contributions:
● Replace VGG with ResNet.
● Propose atrous spatial pyramid pooling (ASPP) to robustly do multiscale segmentation.
● Provide SOTA results on Pascal VOC (79.7 mIoU on the test set with bells and whistles) and Cityscapes.
[Figure: DeepLab v1 vs. DeepLab v2 (ASPP); ASPP kernels]
https://arxiv.org/abs/1606.00915
![Page 21: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/21.jpg)
https://arxiv.org/abs/1506.04579
Parsenet: Looking wider to see better
![Page 23: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/23.jpg)
To evaluate the “empirical” receptive field (RF) of a neuron, the authors propose the following procedure:
● slide a window of small random noise across the image,
● if the activation does not change beyond a certain threshold, the window is outside the “empirical” RF.
Experimental results show that the network tends to simply “learn patches” rather than context.
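The procedure can be sketched as follows. This is an illustration only: `toy_unit` is a hypothetical stand-in for a real network activation (it reads a fixed 8×8 patch), and the window size, stride, noise level and threshold are arbitrary assumptions.

```python
import numpy as np

def empirical_receptive_field(activation_fn, image, window=4, stride=4,
                              noise_std=0.1, threshold=1e-3, seed=0):
    """Mark the pixels whose perturbation changes the activation."""
    rng = np.random.default_rng(seed)
    base = activation_fn(image)
    h, w = image.shape
    inside = np.zeros((h, w), dtype=bool)
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            noisy = image.copy()
            noisy[top:top + window, left:left + window] += rng.normal(
                0.0, noise_std, size=(window, window))
            # A change above the threshold means the window is inside the RF.
            if abs(activation_fn(noisy) - base) > threshold:
                inside[top:top + window, left:left + window] = True
    return inside

def toy_unit(img):
    """Hypothetical unit that only depends on an 8x8 patch of the image."""
    return np.square(img[8:16, 8:16]).sum()

image = np.zeros((32, 32))
rf_mask = empirical_receptive_field(toy_unit, image)
print(rf_mask.sum())   # number of pixels inside the empirical RF
```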
(a) Original image (b) Activation map (c) Theoretical Receptive Field
(d) Empirical Receptive Field
Parsenet: Theoretical vs Empirical receptive field
https://arxiv.org/abs/1506.04579
![Page 24: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/24.jpg)
1. Global average pooling from the last feature map (pooling from other layers is possible, if necessary).
2. L2 normalization: normalize each individual feature first, then learn a separate scale for each; this makes training more stable and improves performance.
3. UnPooling - replication of the feature vector until it has the corresponding size.
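The three steps can be sketched in NumPy. This is a hedged illustration, not the authors' code: the normalization is taken across channels at each position with one scale per channel (scales are fixed inputs here, learned in a real model), and the function names are made up.

```python
import numpy as np

def l2_normalize(features, scale, eps=1e-12):
    """L2-normalize the channel vector at every position, then scale per channel.

    features: (C, H, W); scale: (C,), learned in a real model, fixed here.
    """
    norm = np.sqrt((features ** 2).sum(axis=0, keepdims=True)) + eps
    return scale[:, None, None] * features / norm

def global_context_module(features, local_scale, global_scale):
    c, h, w = features.shape
    pooled = features.mean(axis=(1, 2), keepdims=True)      # 1. global average pooling
    local = l2_normalize(features, local_scale)             # 2. L2 normalization
    context = l2_normalize(pooled, global_scale)
    unpooled = np.broadcast_to(context, (c, h, w))          # 3. unpooling by replication
    return np.concatenate([local, unpooled], axis=0)        # (2C, H, W)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 3, 5))
combined = global_context_module(features, np.ones(4), np.ones(4))
print(combined.shape)
```

Without the normalization step, the two halves of the concatenated tensor can have very different magnitudes.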
Parsenet: Architecture
https://arxiv.org/abs/1506.04579
![Page 25: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/25.jpg)
Results on PASCAL-Context (mean IoU)

| model | w/o Norm | w/ Norm |
| --- | --- | --- |
| FCN-32s | 36.6 | 36.2 |
| FCN-32s + global context | 38.2 | 37.6 |
| FCN-16s + global context | 39.5 | 39.9 |
| FCN-8s + global context | 36.5 | 40.2 |
| FCN-4s + global context | 0.009 | 40.4 |
Parsenet: Combining Local and Global features
https://arxiv.org/abs/1506.04579
Features from 4 different layers have different scales: conv4, conv5, fc7, pool6
[Plot: feature scale per channel for each layer; axes: scale, channels]
![Page 26: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/26.jpg)
https://arxiv.org/abs/1505.04597
U-Net: Convolutional Networks for Biomedical Image Segmentation
![Page 27: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/27.jpg)
Drosophila first instar larva ventral nerve cord Ground truth segmentation
ISBI Challenge: Segmentation of neuronal structures in EM stacks
https://arxiv.org/abs/1505.04597
U-Net: Biomedical Image Segmentation
![Page 28: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/28.jpg)
● U-Net
● Fully convolutional network
● Encoder-decoder topology with skip connections
U-Net: Architecture
https://arxiv.org/abs/1505.04597
![Page 29: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/29.jpg)
U-Net: Training
Loss: per-pixel softmax + cross-entropy with weighting (compensate class frequency and emphasize edges)
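A minimal sketch of such a loss in NumPy, assuming the per-pixel weight map is already given (a real pipeline would derive it from class frequencies and distances to object borders). Dividing by the total weight is a choice made here for readability, not something stated on the slide.

```python
import numpy as np

def weighted_softmax_cross_entropy(logits, labels, weight_map):
    """logits: (K, H, W); labels: (H, W) integer classes; weight_map: (H, W)."""
    shifted = logits - logits.max(axis=0, keepdims=True)      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    picked = log_probs[labels, rows, cols]                    # log p of the true class
    return -(weight_map * picked).sum() / weight_map.sum()

labels = np.array([[0, 1],
                   [1, 0]])
weights = np.array([[1.0, 2.0],
                    [3.0, 4.0]])           # hypothetical per-pixel weights
uniform_logits = np.zeros((2, 2, 2))       # the network is undecided everywhere
loss_uniform = weighted_softmax_cross_entropy(uniform_logits, labels, weights)

confident_logits = np.where(np.arange(2)[:, None, None] == labels, 5.0, -5.0)
loss_confident = weighted_softmax_cross_entropy(confident_logits, labels, weights)
print(loss_uniform, loss_confident)
```

With uniform logits over two classes the loss equals log 2 regardless of the weights; confident correct logits drive it towards zero.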
https://arxiv.org/abs/1505.04597
![Page 30: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/30.jpg)
U-Net: Augmentations
● General: smooth deformations.
● Additional: shifting, rotating, gray value variations.
Example of smooth deformation
https://arxiv.org/abs/1505.04597
https://www.slideshare.net/Eduardyantov/ultrasound-segmentation-kaggle-review
![Page 31: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/31.jpg)
SegNet: A Deep Convolutional Encoder-Decoder
https://arxiv.org/abs/1511.00561
![Page 32: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/32.jpg)
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
https://arxiv.org/abs/1511.00561
![Page 34: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/34.jpg)
ENet: Real-Time Semantic Segmentation
https://arxiv.org/abs/1606.02147
![Page 35: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/35.jpg)
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Initial block
Bottleneck
[Figure labels: Encoder, Decoder]
https://arxiv.org/abs/1606.02147
![Page 36: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/36.jpg)
PSPNet: Pyramid Scene Parsing Network
https://arxiv.org/abs/1612.01105
![Page 38: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/38.jpg)
PSPNet: Architecture
1. Pooling - AVE
2. Dimension reduction after pooling
3. Auxiliary loss
4. Multi-scale testing
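Items 1 and 2 can be sketched as a pyramid pooling step: average-pool the feature map into a few grids, reduce channels with a 1×1 projection, upsample and concatenate with the input. The bin sizes (1, 2, 3, 6), the random projection matrices and the nearest-neighbour upsampling are assumptions made for this illustration.

```python
import numpy as np

def adaptive_avg_pool(features, bins):
    """Average-pool (C, H, W) features into a (C, bins, bins) grid."""
    c, h, w = features.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)      # floor, ceil
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            out[:, i, j] = features[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def upsample_nearest(features, h, w):
    """Nearest-neighbour resize of (C, h_in, w_in) features to (C, h, w)."""
    _, bh, bw = features.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return features[:, rows][:, :, cols]

def pyramid_pooling(features, projections, bin_sizes=(1, 2, 3, 6)):
    c, h, w = features.shape
    branches = [features]
    for bins, proj in zip(bin_sizes, projections):
        pooled = adaptive_avg_pool(features, bins)               # average pooling
        reduced = np.einsum("oc,chw->ohw", proj, pooled)         # 1x1 conv: fewer channels
        branches.append(upsample_nearest(reduced, h, w))
    return np.concatenate(branches, axis=0)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 12, 12))
projections = [rng.standard_normal((2, 8)) for _ in range(4)]    # hypothetical weights
out = pyramid_pooling(features, projections)
print(out.shape)
```

Reducing each branch to C/4 channels keeps the pooled context from outweighing the original features after concatenation.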
https://arxiv.org/abs/1612.01105
×0.4
![Page 39: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/39.jpg)
ICNet for Real-Time Semantic Segmentation
https://arxiv.org/abs/1704.08545
![Page 40: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/40.jpg)
ICNet: Intuitive Speedup
1. Downsampling input
2. Downsampling features
3. Model compression
https://arxiv.org/abs/1704.08545
![Page 41: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/41.jpg)
ICNet: Architecture
x 0.4
Cascade Feature Fusion (CFF) module
https://arxiv.org/abs/1704.08545
![Page 42: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/42.jpg)
ICNet: Results
PSPNet50 with 0.5 compression
More than 5x inference speedup and more than 5x lower memory consumption.
30.3 FPS at resolution 1024×2048.
https://arxiv.org/abs/1704.08545
![Page 43: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/43.jpg)
DeepLab v3 & DeepLab v3+
No CRF postprocessing this time!
[Plot labels: 81.2, 81.3, 82.1; years 2018, 2017, 2016]
https://arxiv.org/abs/1706.05587
![Page 44: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/44.jpg)
DeepLab v3: Rethinking Atrous Convolution
https://arxiv.org/abs/1706.05587
https://www.nature.com/articles/s41598-018-24304-3
65x65 feature map
![Page 45: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/45.jpg)
DeepLab v3: Architecture (ResNet-based)
https://arxiv.org/abs/1706.05587
● Multi-grid method: Block4 has three 3x3 convolutions with rates (2, 4, 8).
● Augment ASPP with global context and batch normalization.
● Upsample the logits instead of downsampling the GT.
● Use lower output resolution but larger batches in the early stages, then freeze BN and reduce the batch size.
![Page 46: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/46.jpg)
DeepLab v3+: Architecture (Xception-based)
https://arxiv.org/abs/1802.02611
![Page 47: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/47.jpg)
Supplemental materials
![Page 48: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/48.jpg)
CRF: Conditional Random Field
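Let $I = \{I_1, \dots, I_N\}$ be an array of image pixels’ color vectors.
Let $X = \{X_1, \dots, X_N\}$ be an array of pixels’ labels, each taking a value from the label set $L$.
A conditional random field $(I, X)$ is characterized by a Gibbs distribution:
$$P(X \mid I) = \frac{1}{Z(I)} \exp\Big(-\sum_{c \in C_G} \phi_c(X_c \mid I)\Big)$$
where $G$ is a graph on $X$ and $C_G$ is a set of cliques in that graph. Each clique $c$ induces a potential $\phi_c$.
Then the MAP labeling is
$$x^* = \arg\max_{x \in L^N} P(x \mid I)$$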
![Page 49: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/49.jpg)
CRF: Conditional Random Field
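In the fully connected pairwise CRF model, $G$ is the full graph on $X$ and $C_G$ is the set of all unary and pairwise cliques. The corresponding Gibbs energy of a labeling $x$ is
$$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$
The unary potential $\psi_u(x_i)$ is the output of a pixel classifier. When the classifier is applied to different pixels independently, the MAP labeling produced by this term alone is noisy and inconsistent.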
![Page 50: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/50.jpg)
CRF: Conditional Random Field
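The pairwise potential has the form
$$\psi_p(x_i, x_j) = \mu(x_i, x_j)\, k(f_i, f_j)$$
where $k$ is a Gaussian kernel and $f_i$ is a feature vector describing pixel $i$.
A contrast-sensitive two-kernel potential that depends on pixel intensities ($I$) and positions ($p$) is used:
$$k(f_i, f_j) = w^{(1)} \exp\Big(-\frac{|p_i - p_j|^2}{2\theta_\alpha^2} - \frac{|I_i - I_j|^2}{2\theta_\beta^2}\Big) + w^{(2)} \exp\Big(-\frac{|p_i - p_j|^2}{2\theta_\gamma^2}\Big)$$
The first term is the appearance kernel (nearby pixels with similar color are likely to be in the same class); the second is the smoothness kernel (it penalizes small isolated regions).
The label compatibility function $\mu(x_i, x_j)$ penalizes nearby similar pixels that are assigned different labels.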
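The two-kernel pairwise term can be sketched directly: an appearance Gaussian over position and color plus a smoothness Gaussian over position only, multiplied by a compatibility that is non-zero when the two labels differ. The parameter names and default values below are illustrative assumptions, and a simple Potts compatibility (1 if labels differ, else 0) is assumed.

```python
import numpy as np

def pairwise_kernel(p_i, p_j, I_i, I_j, w_app=1.0, w_smooth=1.0,
                    theta_alpha=60.0, theta_beta=20.0, theta_gamma=3.0):
    """Appearance kernel (position and color) plus smoothness kernel (position)."""
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_col = np.sum((np.asarray(I_i, float) - np.asarray(I_j, float)) ** 2)
    appearance = w_app * np.exp(-d_pos / (2 * theta_alpha ** 2)
                                - d_col / (2 * theta_beta ** 2))
    smoothness = w_smooth * np.exp(-d_pos / (2 * theta_gamma ** 2))
    return appearance + smoothness

def pairwise_potential(x_i, x_j, p_i, p_j, I_i, I_j, **kernel_params):
    """Potts compatibility: a penalty only when the two labels differ."""
    return float(x_i != x_j) * pairwise_kernel(p_i, p_j, I_i, I_j, **kernel_params)

# Two neighbouring pixels, once with similar and once with very different colors.
same_color = pairwise_potential(0, 1, (10, 10), (10, 11), (200, 30, 30), (198, 32, 29))
diff_color = pairwise_potential(0, 1, (10, 10), (10, 11), (200, 30, 30), (20, 220, 40))
agree = pairwise_potential(1, 1, (10, 10), (10, 11), (200, 30, 30), (198, 32, 29))
print(same_color, diff_color, agree)
```

Giving different labels to neighbouring pixels of similar color costs more than doing so across a strong color edge, and agreeing labels cost nothing.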
![Page 51: Semantic segmentation review - Delta Course · Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance](https://reader036.fdocuments.in/reader036/viewer/2022070720/5edfada4ad6a402d666b021a/html5/thumbnails/51.jpg)
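ENet: Performance Analysis
Performance comparison: NVIDIA TX1

| Model | 480x320 ms | 480x320 fps | 640x360 ms | 640x360 fps | 1280x720 ms | 1280x720 fps |
| --- | --- | --- | --- | --- | --- | --- |
| SegNet | 757 | 1.3 | 1251 | 0.8 | - | - |
| ENet | 47 | 21.1 | 69 | 14.6 | 262 | 3.8 |

Performance comparison: NVIDIA Titan X

| Model | 640x360 ms | 640x360 fps | 1280x720 ms | 1280x720 fps | 1920x1080 ms | 1920x1080 fps |
| --- | --- | --- | --- | --- | --- | --- |
| SegNet | 69 | 14.6 | 289 | 3.5 | 637 | 1.6 |
| ENet | 7 | 135.4 | 21 | 46.8 | 46 | 21.6 |

| Model | GFLOPs | Parameters | Model size (fp16) |
| --- | --- | --- | --- |
| SegNet | 286.03 | 29.46M | 56.2 MB |
| ENet | 3.83 | 0.37M | 0.7 MB |

FLOPs are estimated for an input of 3x640x360

Cityscapes test set result

| Model | Class IoU | Class iIoU | Cat. IoU | Cat. iIoU |
| --- | --- | --- | --- | --- |
| SegNet | 56.1 | 34.2 | 79.8 | 66.4 |
| ENet | 58.3 | 34.4 | 80.4 | 64.0 |

https://arxiv.org/pdf/1606.02147.pdf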