PROJECT LIVENESS REPORT 01 - PUC RIO

Transcript of PROJECT LIVENESS REPORT 01 - PUC RIO

Page 1: PROJECT LIVENESS REPORT 01 - PUC RIO

Semantic Segmentation

R. Q. FEITOSA

Page 2: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 3: PROJECT LIVENESS REPORT 01 - PUC RIO

Main Application Groups

3

Image

ClassificationDetection/

Localization

Semantic

Segmentation

Page 4: PROJECT LIVENESS REPORT 01 - PUC RIO

Semantic vs Instance SegmentationSemantic Segmentation: identifies the object category of each pixel labels are class aware.

Instance Segmentation: identifies each object instance of each pixel labels are instance-aware.

4

Page 5: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 6: PROJECT LIVENESS REPORT 01 - PUC RIO

Metrics for SS

Precision = true positives / (true positives + false positives)

Intersection over Union (IoU) = |prediction ∩ reference| / |prediction ∪ reference|

Average Precision (AP) = precision averaged over all classes

[Figure: outcome vs. reference diagram marking true positives, false positives and false negatives]
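Per-class precision and IoU can be computed directly from predicted and reference label maps; a minimal NumPy sketch (the 2×4 label maps are made-up toy data):

```python
import numpy as np

def precision(pred, ref, cls):
    """Precision for one class: TP / (TP + FP)."""
    tp = np.sum((pred == cls) & (ref == cls))
    fp = np.sum((pred == cls) & (ref != cls))
    return tp / (tp + fp)

def iou(pred, ref, cls):
    """Intersection over Union for one class."""
    inter = np.sum((pred == cls) & (ref == cls))
    union = np.sum((pred == cls) | (ref == cls))
    return inter / union

# Toy 2x4 label maps: class 1 is predicted once too often.
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1]])
ref  = np.array([[0, 0, 0, 1],
                 [0, 1, 1, 1]])
```

Here `precision(pred, ref, 1)` and `iou(pred, ref, 1)` both evaluate to 0.8.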

Page 7: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 8: PROJECT LIVENESS REPORT 01 - PUC RIO

Typical ConvNet for Image Classification

[Figure: conv layers → flatten → fully connected layers → class scores]

Page 9: PROJECT LIVENESS REPORT 01 - PUC RIO

1st idea for SS: sliding window

[Figure: crop a patch from the input image and classify its center pixel with a CNN; example patch labels: garden, garden, roof]

Redundant operations due to overlap. Inefficient!

Page 10: PROJECT LIVENESS REPORT 01 - PUC RIO

Spatial accuracy

Example:
• The CNN scores high for “car” and “road” for both patches on the right.
• In consequence, CNN patch classification over-smooths object boundaries.
• FCN, on the contrary, learns class-specific structures contained in each patch.

Page 11: PROJECT LIVENESS REPORT 01 - PUC RIO

2nd idea for SS: Fully Convolutional (a)

A stack of layers at input resolution to classify all pixels at once.

Convolution at the original resolution is expensive!

[Figure: input → conv layers at full resolution → per-pixel class scores]

Page 12: PROJECT LIVENESS REPORT 01 - PUC RIO

3rd idea for SS: Fully Convolutional (b)

A stack of layers with downsampling and upsampling inside the network.

[Figure: contract (downsample) → expand (upsample)]

Page 13: PROJECT LIVENESS REPORT 01 - PUC RIO

Upsampling: Unpooling

Nearest Neighbor (input: 2×2, output: 4×4):

  1 2        1 1 2 2
  3 4   →    1 1 2 2
             3 3 4 4
             3 3 4 4

Bed of Nails (input: 2×2, output: 4×4):

  1 2        1 0 2 0
  3 4   →    0 0 0 0
             3 0 4 0
             0 0 0 0
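Both parameter-free unpooling schemes from the figure can be sketched in a few lines of NumPy (2× upsampling assumed):

```python
import numpy as np

def nearest_neighbor_unpool(x):
    """Repeat each value into a 2x2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def bed_of_nails_unpool(x):
    """Place each value in the top-left corner of a 2x2 block of zeros."""
    out = np.zeros((x.shape[0] * 2, x.shape[1] * 2), dtype=x.dtype)
    out[::2, ::2] = x
    return out

x = np.array([[1, 2],
              [3, 4]])
```

Applied to `x`, the two functions reproduce the 4×4 outputs shown above.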

Page 14: PROJECT LIVENESS REPORT 01 - PUC RIO

Upsampling: Max Unpooling

Max Pooling (input: 4×4, output: 2×2) memorizes where the max came from. The corresponding Max Unpooling layer (input: 2×2, output: 4×4) puts each value at the location where the max came from and fills the rest with zeros.

[Figure: a 4×4 feature map max-pooled to 2×2 with remembered argmax positions; a 2×2 map is then unpooled to 4×4 by placing values at those positions]

Down- and upsampling layers are used in corresponding pairs.
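A minimal NumPy sketch of the pooling/unpooling pair (loop-based for clarity; the 4×4 example values are illustrative, not the slide's exact matrix):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling that also memorizes where each max came from."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    pooled = np.zeros((h, w), dtype=x.dtype)
    indices = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            block = x[2*i:2*i+2, 2*j:2*j+2]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i, j] = block[r, c]
            indices[i, j] = (2*i + r, 2*j + c)
    return pooled, indices

def max_unpool(y, indices, shape):
    """Place each value at the remembered max location, zeros elsewhere."""
    out = np.zeros(shape, dtype=y.dtype)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            r, c = indices[i, j]
            out[r, c] = y[i, j]
    return out

x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])
pooled, idx = max_pool_with_indices(x)   # pooled = [[5, 6], [7, 8]]
```

Unpooling a later 2×2 map with `idx` restores its values at the remembered positions.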

Page 15: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 16: PROJECT LIVENESS REPORT 01 - PUC RIO

SegNet

SegNet uses Max Pooling indices to upsample:

Picture from Sik-Ho Tsang, available at https://towardsdatascience.com/review-segnet-semantic-segmentation-e66f2e30fb96.

Badrinarayanan, V., Kendall, A. and Cipolla, R., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” PAMI, 2017.

Page 17: PROJECT LIVENESS REPORT 01 - PUC RIO

SegNet

SegNet demo: Road Scene Segmentation (1)

Demo by the authors at: https://towardsdatascience.com/review-segnet-semantic-segmentation-e66f2e30fb96

Page 18: PROJECT LIVENESS REPORT 01 - PUC RIO

SegNet

SegNet demo: Road Scene Segmentation (2)

Another demo by the authors on YouTube: https://www.youtube.com/watch?v=CxanE_W46ts

Page 19: PROJECT LIVENESS REPORT 01 - PUC RIO

SegNet

SegNet demo: Random image

Click on the image to test SegNet on a random image.

Page 20: PROJECT LIVENESS REPORT 01 - PUC RIO

Learnable Upsampling: Transpose Convolution

3×3 transpose convolution, stride 2, pad 1; input 2×2, filter 3×3, output 4×4.

The filter moves 2 pixels in the output for every pixel in the input; the stride gives the ratio between movement in the output and in the input.

[Figure: input values 0.2, 0.5, 0.3, 0.1; filter values 1 2 1 / 2 4 2 / 1 2 1]

Page 21: PROJECT LIVENESS REPORT 01 - PUC RIO

Learnable Upsampling: Transpose Convolution

3×3 transpose convolution, stride 2, pad 1; input 2×2, filter 3×3, output 4×4.

The output contains copies of the filter weighted by the input values, summing up at overlaps in the output.

[Figure: each input value paints the filter, scaled by that value, into the output]

Page 22: PROJECT LIVENESS REPORT 01 - PUC RIO

Learnable Upsampling: Transpose Convolution

3×3 transpose convolution, stride 2, pad 1; input 2×2, filter 3×3, output 4×4.

Other names: deconvolution, up-convolution, fractionally strided convolution, backward strided convolution.

[Figure: overlapping weighted filter copies being summed in the output]

Page 23: PROJECT LIVENESS REPORT 01 - PUC RIO

Learnable Upsampling: Transpose Convolution

3×3 transpose convolution, stride 2, pad 1; input 2×2, filter 3×3, output 4×4.

Other names: deconvolution, up-convolution, fractionally strided convolution, backward strided convolution.

[Figure: completed 4×4 output after summing all weighted filter copies]
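The painting-and-summing behaviour described above can be reproduced directly; a NumPy sketch using the slide's input and filter values, but ignoring the pad=1 border trim (so the full 5×5 output is returned):

```python
import numpy as np

def transpose_conv2d(x, w, stride=2):
    """Each input value paints a copy of the filter into the output,
    scaled by that value; copies sum where they overlap."""
    kh, kw = w.shape
    out = np.zeros(((x.shape[0] - 1) * stride + kh,
                    (x.shape[1] - 1) * stride + kw))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * w
    return out

x = np.array([[0.2, 0.5],
              [0.3, 0.1]])          # input values from the slides
w = np.array([[1., 2., 1.],
              [2., 4., 2.],
              [1., 2., 1.]])        # filter values from the slides
y = transpose_conv2d(x, w)          # full 5x5 output (no border trim)
```

The center element `y[2, 2]` sums contributions from all four inputs: 0.2 + 0.5 + 0.3 + 0.1 = 1.1.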

Page 24: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 25: PROJECT LIVENESS REPORT 01 - PUC RIO

FCN Example: U-Net

To improve spatial accuracy, the output of the corresponding layer in the contractive stage is appended to the inputs of the expansive stage.

25

Picture from: Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol. 9351. Springer, Cham.

contractive stage expansive stage

Page 26: PROJECT LIVENESS REPORT 01 - PUC RIO

FCN Example: U-Net

To improve spatial accuracy, the output of the corresponding layer in the contractive stage is appended to the inputs of the expansive stage.

Picture from Ronneberger, O., et al. (2015). “U-Net: Convolutional Networks for Biomedical Image Segmentation”. arXiv:1505.04597.

contractive stage expansive stage

Page 27: PROJECT LIVENESS REPORT 01 - PUC RIO

U-Net

U-Net demo: cell tracking

Full demo available at https://www.youtube.com/watch?v=81AvQQnpG4Q

Page 28: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 29: PROJECT LIVENESS REPORT 01 - PUC RIO

UNet++

Architecture

Novelties:
• convolutions on skip pathways
• deep supervision

Zhou et al., 2018 – UNet++: A Nested U-Net Architecture for Medical Image Segmentation

Page 30: PROJECT LIVENESS REPORT 01 - PUC RIO

UNet++

1st Concept: Re-designed Skip Pathways

Formally,

x^(i,j) = H( x^(i−1,j) )                                      if j = 0
x^(i,j) = H( [ [x^(i,k)]_(k=0..j−1) , U( x^(i+1,j−1) ) ] )    if j > 0

where
• H(·) is a convolution operation followed by an activation function,
• U(·) denotes an up-sampling layer, and
• [ ] denotes the concatenation layer.

Zhou et al., 2018 – UNet++: A Nested U-Net Architecture for Medical Image Segmentation

Page 31: PROJECT LIVENESS REPORT 01 - PUC RIO

UNet++

2nd Concept: Deep Supervision

• Accurate mode: average of all branches
• Fast mode: just one branch is selected

Zhou et al., 2018 – UNet++: A Nested U-Net Architecture for Medical Image Segmentation

Page 32: PROJECT LIVENESS REPORT 01 - PUC RIO

UNet++

Qualitative results

Zhou et al., 2018 – UNet++: A Nested U-Net Architecture for Medical Image Segmentation

Page 33: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 34: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet

Replaces U-Net neural units by residual blocks to ease training.

Building blocks of neural networks: (a) plain neural unit used in U-Net; (b) residual unit with identity mapping used in the proposed ResUNet.

Picture from Zhang et al. (2018) “Road Extraction by Deep Residual U-Net”

ResUNet Architecture

Page 35: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet (sample results)

Source: Zhang et al. (2018) “Road Extraction by Deep Residual U-Net”

Example results on the test set of the Massachusetts roads data set: (a) input image; (b) ground truth; (c) Saito et al. [5]; (d) U-Net [24]; (e) proposed ResUNet. Zoomed-in view to see more details.

Page 36: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 37: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Context/Scale

37

What is in the picture?

Page 38: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Context/Scale

38

What is in the picture?

Source: http://bestpaintingsforsale.com/painting/lincoln_in_dali_vision-4020.html

Page 39: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Context/Scale

Larger filters capture larger context… but involve more parameters/computation.

Page 40: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Scale

Larger filters capture larger context… but involve more parameters/computation.

3×3 filter

Receptive field = 3×3

Page 41: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Scale

Larger filters capture larger context… but involve more parameters/computation.

5×5 filter

Receptive field = 5×5

Page 42: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Scale

Larger filters capture larger context… but involve more parameters/computation.

7×7 filter

Receptive field = 7×7

Page 43: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Scale

Filters applied sequentially also capture larger context… but involve more parameters/computation.

3×3 filter

Receptive field = 3×3

3×3 filter

Page 44: PROJECT LIVENESS REPORT 01 - PUC RIO

Importance of Scale

Pooling helps increase the receptive field without implying more parameters (weights) to be learned, but loses spatial information (resolution)!

2×2

pooling

3×3 filter

Receptive field = 5×5

Page 45: PROJECT LIVENESS REPORT 01 - PUC RIO

Atrous Convolution

Equivalent to convolving the input 𝒙 with an upsampled filter produced by inserting 𝑟−1 zeros between consecutive filter values along each dimension.

𝒚[𝒊] = Σ_𝒌 𝒙[𝒊 + 𝑟𝒌] 𝒘[𝒌]
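The formula above can be sketched in one dimension with NumPy; the rate 𝑟 is the dilation, and only valid output positions are kept:

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """y[i] = sum_k x[i + rate*k] * w[k], valid positions only."""
    k = len(w)
    span = (k - 1) * rate + 1        # receptive field of the dilated filter
    n = len(x) - span + 1
    return np.array([sum(x[i + rate*j] * w[j] for j in range(k))
                     for i in range(n)])

x = np.arange(8, dtype=float)        # [0, 1, ..., 7]
w = np.array([1., 1., 1.])
y1 = atrous_conv1d(x, w, rate=1)     # ordinary case, 3-wide receptive field
y2 = atrous_conv1d(x, w, rate=2)     # same 3 weights, 5-wide receptive field
```

The same three weights cover a wider receptive field at rate 2 without any extra parameters.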

Page 46: PROJECT LIVENESS REPORT 01 - PUC RIO

Atrous Convolution

Without atrous conv.: stride and pooling reduce the feature map and lose location/spatial information.

With atrous conv.: keep the stride constant with a larger receptive field without increasing the number of parameters. It keeps high-resolution feature maps in deep blocks.

Source: Chen et al., 2017, available at https://arxiv.org/pdf/1606.00915.pdf

Page 47: PROJECT LIVENESS REPORT 01 - PUC RIO

Atrous Convolution

Comparison with and without atrous conv.

Source: Chen et al., 2017, available at https://arxiv.org/pdf/1606.00915.pdf

Page 48: PROJECT LIVENESS REPORT 01 - PUC RIO

Atrous Spatial Pyramid Pooling (ASPP)

• Also introduced in DeepLab v2.
• Parallel rather than cascaded atrous conv. captures context at multiple scales.

Source: Chen et al., 2017, available at https://arxiv.org/pdf/1606.00915.pdf

Page 49: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLab v1 and v2

Processing chain

Chen et al., 2015, DeepLabv1, available at https://arxiv.org/pdf/1412.7062.pdf
Chen et al., 2018, DeepLabv2, available at https://arxiv.org/pdf/1606.00915.pdf

Page 50: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLab v1 and v2

50

Fully Connected Conditional Random Field
– a post-processing step (10 iterations)
– the exact solution is intractable; good approximations exist
– not an end-to-end learning framework
– not used in DeepLabv3 and DeepLabv3+

Chen et al., 2015, DeepLabv1, available at https://arxiv.org/pdf/1412.7062.pdf
Chen et al., 2018, DeepLabv2, available at https://arxiv.org/pdf/1606.00915.pdf

Page 51: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLab v1 and v2

Qualitative results

Source: https://towardsdatascience.com/review-deeplabv1-deeplabv2-atrous-convolution-semantic-segmentation-b51c5fbde92d

Page 52: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLab v1 and v2

Qualitative results

Source: https://towardsdatascience.com/review-deeplabv1-deeplabv2-atrous-convolution-semantic-segmentation-b51c5fbde92d

Page 53: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 54: PROJECT LIVENESS REPORT 01 - PUC RIO

Global Context

What is her pregnancy month?

Page 55: PROJECT LIVENESS REPORT 01 - PUC RIO

Global Context

55

A family oriented girl?

Page 56: PROJECT LIVENESS REPORT 01 - PUC RIO

Global Context

56

Color of his shirt?

Page 57: PROJECT LIVENESS REPORT 01 - PUC RIO

Global Context

“Global context helps clarifying local confusion.”

Recall DeepLab v2: ASPP aims at alleviating this problem.

Page 58: PROJECT LIVENESS REPORT 01 - PUC RIO

Image Pooling or Image-Level Feature

Introduced in ParseNet: Looking Wider to See Better

Source: Liu et al., 2016, ParseNet: Looking Wider to See Better

Page 59: PROJECT LIVENESS REPORT 01 - PUC RIO

Image Pooling or Image-Level Feature

Introduced in ParseNet: Looking Wider to See Better

Source: Liu et al., 2016, ParseNet: Looking Wider to See Better

Page 60: PROJECT LIVENESS REPORT 01 - PUC RIO

Pyramid Scene Pooling

Introduced in the Pyramid Scene Parsing Network

Source: Zhao et al., 2017, Pyramid Scene Parsing Network, CVPR 2017

Page 61: PROJECT LIVENESS REPORT 01 - PUC RIO

Pyramid Scene Pooling

Qualitative results on PASCAL VOC 2012. Baseline: ResNet50-based FCN with dilated network.

Source: Zhao et al., 2017, Pyramid Scene Parsing Network, CVPR 2017

Page 62: PROJECT LIVENESS REPORT 01 - PUC RIO

Pyramid Scene Pooling

Qualitative results on Cityscapes. Baseline: ResNet50-based FCN with dilated network.

Source: Zhao et al., 2017, Pyramid Scene Parsing Network, CVPR 2017

Page 63: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 64: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLab v3

• One 1×1 convolution and three 3×3 convolutions with rates = (6, 12, 18) when output stride = 16.
• Also image pooling, or image-level feature.
• All with 256 filters and BN.
• Concatenated results pass through a 1×1 conv before the final logits.
• No CRF.

Chen et al., 2017, DeepLabv3, available at https://arxiv.org/pdf/1706.05587.pdf

Page 65: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLab v3

Demo DeepLabv3: Cityscapes

Available on YouTube at https://www.youtube.com/watch?v=ATlcEDSPWXY

Page 66: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLabv3+

Inherits from DeepLabv2:
• atrous convolution
• Atrous Spatial Pyramid Pooling (ASPP)

Main novelties:
• brings back the encoder-decoder structure
• depthwise separable convolution in both the ASPP module and the decoder module

Chen et al., 2018, DeepLabv3+, available at https://arxiv.org/pdf/1802.02611.pdf

Page 67: PROJECT LIVENESS REPORT 01 - PUC RIO

A Dilemma

Consecutive convolution + pooling layers: more filters mean more capacity but also more memory. For more capacity with the same memory, reduce the resolution, but then spatial information is lost.

Page 68: PROJECT LIVENESS REPORT 01 - PUC RIO

Back to the encoder-decoder

Combines atrous convolution and skip connections to recover object boundary information lost due to pooling and striding, at a lower computational cost.

Chen et al., 2018, DeepLabv3+, available at https://arxiv.org/pdf/1802.02611.pdf

Page 69: PROJECT LIVENESS REPORT 01 - PUC RIO

Encoder-Decoder structure

• The encoder encodes multi-scale contextual information applying atrous convolutions.
• The decoder refines the segmentation on object boundaries.

Chen et al., 2018, DeepLabv3+, available at https://arxiv.org/pdf/1802.02611.pdf

Page 70: PROJECT LIVENESS REPORT 01 - PUC RIO

Depthwise Separable Convolutions

“… a powerful operation to reduce the computation cost and number of parameters while maintaining similar (or slightly better) performance.”

Convolution is broken down into two steps:
– Depthwise convolution: a spatial convolution applied independently to each input channel.
– Pointwise convolution: a 1×1 convolution that combines the outputs of the depthwise convolution.

Chen et al., 2018, DeepLabv3+, available at https://arxiv.org/pdf/1802.02611.pdf

Page 71: PROJECT LIVENESS REPORT 01 - PUC RIO

Normal vs Separable Convolution

Normal convolution:

Number of multiplications: 75 × (8×8) = 4,800
Number of parameters: 5×5×3 = 75

Source: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728

Page 72: PROJECT LIVENESS REPORT 01 - PUC RIO

Normal vs Separable Convolution

Normal convolution with multiple filters (256 such filters):

Number of multiplications: 75 × (8×8) × 256 = 1,228,800
Number of parameters: (5×5×3) × 256 = 19,200

Source: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728

Page 73: PROJECT LIVENESS REPORT 01 - PUC RIO

Normal vs Separable Convolution

Separable convolution: first, depthwise

Number of multiplications: 75 × (8×8) = 4,800
Number of parameters: (5×5×1) × 3 = 75

Source: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728

Page 74: PROJECT LIVENESS REPORT 01 - PUC RIO

Normal vs Separable Convolution

Separable convolution: then, pointwise

Number of multiplications: 4,800 + (1×1×3) × (8×8) = 4,992
Number of parameters: 75 + (1×1×3) = 78

Source: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728

Page 75: PROJECT LIVENESS REPORT 01 - PUC RIO

Normal vs Separable Convolution

Separable convolution: then, pointwise, with multiple filters (256 such filters):

Number of multiplications: 4,800 + (1×1×3) × (8×8) × 256 = 53,952
Number of parameters: 75 + (1×1×3) × 256 = 843

That is about 4% of the multiplications of the normal convolution.

Source: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728
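The counts on the last four slides can be re-derived with a few lines of arithmetic (same setting: a 5×5 convolution over 3 input channels, an 8×8 output, 256 output channels):

```python
# Cost of a 5x5 conv over 3 channels with an 8x8 output and 256 filters,
# normal vs. depthwise-separable.
H = W = 8
Cin, Cout = 3, 256
K = 5

normal_mults  = (K*K*Cin) * (H*W) * Cout        # 1,228,800
normal_params = (K*K*Cin) * Cout                # 19,200

depthwise_mults = (K*K) * (H*W) * Cin           # 4,800
pointwise_mults = Cin * (H*W) * Cout            # 49,152
separable_mults  = depthwise_mults + pointwise_mults   # 53,952
separable_params = K*K*Cin + Cin*Cout           # 75 + 768 = 843

ratio = separable_mults / normal_mults          # ~0.044, i.e. about 4%
```

The separable version needs roughly 4% of the multiplications and parameters of the normal convolution.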

Page 76: PROJECT LIVENESS REPORT 01 - PUC RIO

Atrous Separable Convolution

Combines depthwise and pointwise convolution to create the atrous separable convolution.

The concept has been used in so-called MobileNets*, conceived for mobile and embedded applications.

Picture source: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728

*Howard et al., 2017, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, available at https://arxiv.org/pdf/1704.04861.pdf

Page 77: PROJECT LIVENESS REPORT 01 - PUC RIO

DeepLabV3+

DeepLabV3+ demo: detection of weed in a farm field

From YouTube: https://www.youtube.com/watch?v=FbY1zvMBt-s

Page 78: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 79: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet-a

1. U-Net backbone, i.e., the encoder-decoder paradigm.
2. U-Net building blocks replaced with modified residual blocks of convolutional layers.

Picture from Diakogiannis et al. (2020) “ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data”

Page 80: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet-a

3. Multiple parallel atrous convolutions with different dilation rates.
4. PSP for including context information in two locations.

Picture from Diakogiannis et al. (2020) “ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data”

Page 81: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet-a

5. Multitask learning:
a) segmentation mask
b) boundary mask
c) distance transform
d) color image in HSV color space

Picture from Diakogiannis et al. (2020) “ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data”

Page 82: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet-a

5. Multitask learning:
a) segmentation mask
b) boundary mask
c) distance transform
d) color image in HSV color space

Tasks b), c) and d):
• The reference for them is obtained using a common CV library; no additional annotation is needed.
• “Helps to keep ‘alive’ the information of the fine details of the original image to its full extent until the final output layer.”
• The overall loss function is a mixture of task-specific losses.

Source: Diakogiannis et al. (2020) “ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data”

Page 83: PROJECT LIVENESS REPORT 01 - PUC RIO

ResUNet-a

Sample Results

[Figure, one row per task: segmentation (input image / ground truth / prediction), boundary, distance map, and image in HSV (input image / reconstructed image / difference)]

Picture from Diakogiannis et al. (2020) “ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data”

Page 84: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• U-Net++ (redesigned skip connections & supervision)
• ResUNet
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3 & DeepLabv3+
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 85: PROJECT LIVENESS REPORT 01 - PUC RIO

FCN – class unbalance

When the instances’ sizes differ strongly among the classes, the training set is unbalanced.

Sample replication is not enough.

Page 86: PROJECT LIVENESS REPORT 01 - PUC RIO

Cross Entropy

𝐶𝐸 = −(1/𝑁) Σ_𝑖 log 𝑝𝑖

where
– 𝐶𝐸 is the cross-entropy,
– 𝑝𝑖 is the probability of the true class (𝑦𝑖) at site 𝑖, and
– 𝑁 is the number of samples being classified.

Problem:
– Majority classes dominate the computation of CE.
– Minority classes tend to vanish in the inference.

Page 87: PROJECT LIVENESS REPORT 01 - PUC RIO

Balanced Cross Entropy

The loss function is weighted to compensate for unbalanced training data:

𝐵𝐶𝐸 = −(1/𝑁) Σ_𝑖 𝜔_𝑦𝑖 log 𝑝𝑖

where
– 𝐵𝐶𝐸 is the balanced cross-entropy,
– 𝑝𝑖 is the probability of the true class (𝑦𝑖) at site 𝑖,
– 𝜔_𝑦𝑖 is the precomputed weight for the true class (𝑦𝑖), larger for less abundant classes, and
– in binary classification 𝜔_𝑦𝑖 tunes the tradeoff between precision and recall.

Page 88: PROJECT LIVENESS REPORT 01 - PUC RIO

Balanced Cross Entropy

The loss function is weighted to compensate for unbalanced training data:

𝐵𝐶𝐸 = −(1/𝑁) [ Σ_{𝑦𝑖=𝑝𝑜𝑠} 𝜔_𝑝𝑜𝑠 log 𝑝𝑖 + Σ_{𝑦𝑖=𝑛𝑒𝑔} 𝜔_𝑛𝑒𝑔 log 𝑝𝑖 ]

where
– 𝐵𝐶𝐸 is the balanced cross-entropy,
– 𝜔_𝑦𝑖 is the weight for the true class (𝑦𝑖), larger for less abundant classes, and
– in binary classification 𝜔_𝑦𝑖 (𝜔_𝑝𝑜𝑠/𝜔_𝑛𝑒𝑔) tunes the tradeoff between precision and recall.
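A NumPy sketch of the balanced cross-entropy, with made-up probabilities and weights; with uniform weights it reduces to plain CE:

```python
import numpy as np

def balanced_cross_entropy(p_true, y, w):
    """BCE = -(1/N) * sum_i w[y_i] * log(p_i), where p_i is the
    predicted probability of the true class y_i at sample i."""
    w = np.asarray(w, dtype=float)
    return -np.mean(w[np.asarray(y)] * np.log(p_true))

p = np.array([0.9, 0.8, 0.1])   # probability of the true class, per sample
y = np.array([0, 0, 1])         # true class index, per sample
w = [1.0, 2.0]                  # class 1 is rarer, so it gets a larger weight
```

The larger weight on class 1 makes its poorly predicted sample dominate the loss instead of vanishing.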

Page 89: PROJECT LIVENESS REPORT 01 - PUC RIO

Focal Loss

Motivation: which samples are most determinant of the decision boundary?

[Figure: two classes of samples (+ and −) scattered in feature space (u1, u2) around a decision boundary]

• The ones close to the boundary,
• the ones that are difficult to classify,
• the ones with a low probability of belonging to the true class.

• However, easy-to-classify samples still add to the loss, especially if they are from majority classes.

Page 90: PROJECT LIVENESS REPORT 01 - PUC RIO

Focal Loss

Solution: down-weight the loss assigned to well-classified, easy samples.

𝐶𝐸 = −(1/𝑁) Σ_𝑖 log 𝑝𝑖

[Figure: plot of −log 𝑝𝑖 as a function of 𝑝𝑖]

Source: Lin et al. (2018) “Focal Loss for Dense Object Detection”

Page 91: PROJECT LIVENESS REPORT 01 - PUC RIO

Focal Loss

Solution: down-weight the loss assigned to well-classified, easy samples.

𝐶𝐸 = −(1/𝑁) Σ_𝑖 log 𝑝𝑖

𝐹𝐿 = −(1/𝑁) Σ_𝑖 (1 − 𝑝𝑖)^𝛾 log 𝑝𝑖

where 𝛾 is the focusing parameter.

Source: Lin et al. (2018) “Focal Loss for Dense Object Detection”
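A NumPy sketch of the focal loss; γ = 0 recovers the plain cross-entropy, while γ > 0 shrinks the contribution of easy samples:

```python
import numpy as np

def focal_loss(p_true, gamma=2.0):
    """FL = -(1/N) * sum_i (1 - p_i)^gamma * log(p_i)."""
    p_true = np.asarray(p_true, dtype=float)
    return -np.mean((1.0 - p_true) ** gamma * np.log(p_true))

easy = [0.95, 0.97]   # well-classified samples: contribution almost vanishes
hard = [0.30, 0.20]   # poorly classified samples: contribution survives
```

With γ = 2 the easy samples contribute orders of magnitude less than the hard ones.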

Page 92: PROJECT LIVENESS REPORT 01 - PUC RIO

Balanced Focal Loss

Improved accuracies may be achieved by combining the two previous approaches:

𝐹𝐿 = −(1/𝑁) Σ_𝑖 𝜔_𝑦𝑖 (1 − 𝑝𝑖)^𝛾 log 𝑝𝑖

where 𝑝𝑖 is the probability of the true class (𝑦𝑖) at site 𝑖, 𝜔_𝑦𝑖 are precomputed weights, and 𝛾 is the focusing parameter.

Problem: one additional parameter to estimate.

Source: Lin et al. (2018) “Focal Loss for Dense Object Detection”

Page 93: PROJECT LIVENESS REPORT 01 - PUC RIO

Dice Loss

Originally conceived for binary classification:

𝐷𝐿 = 1 − (2 Σ_𝑖^𝑁 𝑝𝑖 𝑔𝑖 + 𝜖) / (Σ_𝑖^𝑁 𝑝𝑖² + Σ_𝑖^𝑁 𝑔𝑖² + 𝜖)

i.e., one minus twice the sum of correctly predicted pixels over the total of boundary pixels in both prediction and ground truth.

where
• 𝑝𝑖 and 𝑔𝑖 represent pairs of corresponding prediction and ground truth,
• in boundary detection 𝑝𝑖 and 𝑔𝑖 are either 0 or 1,
• 𝜖 is some small number to ensure stability, and
• 𝑁 is the number of pixels.
• There are other formulations!

Milletari et al. (2016) “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation”
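A NumPy sketch of the Dice loss above (the soft variant also accepts probabilities in [0, 1]):

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """DL = 1 - (2*sum(p*g) + eps) / (sum(p^2) + sum(g^2) + eps)."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    return 1.0 - (2.0 * np.sum(p * g) + eps) / (np.sum(p**2) + np.sum(g**2) + eps)

g = np.array([1, 1, 0, 0])     # toy binary ground truth
# dice_loss(g, g) is ~0 (perfect overlap); a disjoint prediction gives ~1.
```

Because it is a ratio of overlaps, the Dice loss is insensitive to the large number of background pixels.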

Page 94: PROJECT LIVENESS REPORT 01 - PUC RIO

Loss Functions for Regression

When the output is non-categorical (real valued), the 𝐿2 and 𝐿1 norms are a common choice:

𝐿2 = (1/𝑁) Σ_{𝑖=1}^𝑁 (𝑦𝑖 − ŷ𝑖)²        𝐿1 = (1/𝑁) Σ_{𝑖=1}^𝑁 |𝑦𝑖 − ŷ𝑖|

where
• 𝑦𝑖 is the value computed by the model,
• ŷ𝑖 is the reference value, and
• 𝑁 is the number of samples.

Page 95: PROJECT LIVENESS REPORT 01 - PUC RIO

More on Loss Functions

Jadon, S., 2020, A survey of loss functions for semantic segmentation

Page 96: PROJECT LIVENESS REPORT 01 - PUC RIO

Overview

• Introduction
• Metrics
• Fully Convolutional Networks
• SegNet (pool indices)
• U-Net (skip connections)
• DeepLabv1 & DeepLabv2 (Atrous conv. & ASPP)
• ParseNet & PSPNet (Image Pooling – Parse Pooling)
• DeepLabv3
• DeepLabv3+ (Encoder-Decoder + ASPP)
• ResUNet
• ResUNet-a
• Loss Functions
• Practical Remarks

Page 97: PROJECT LIVENESS REPORT 01 - PUC RIO

Patch-wise training and inference

In practice large images are processed in tiles:
• The final result is the mosaic of the patches’ results.
• There is less spatial context at the patches’ borders, so:
• use overlapping patches and
• consider only the inner part, or
• average the results.
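The tiling scheme can be sketched as follows; `model` stands in for any per-patch predictor that returns a score map of the patch's shape (the function name and tile sizes are illustrative):

```python
import numpy as np

def predict_in_tiles(image, model, tile=64, overlap=16):
    """Run `model` on overlapping tiles and average the overlapping
    predictions into a full-size score map."""
    h, w = image.shape
    scores = np.zeros((h, w))
    counts = np.zeros((h, w))
    step = tile - overlap
    for i in range(0, max(h - overlap, 1), step):
        for j in range(0, max(w - overlap, 1), step):
            i2, j2 = min(i + tile, h), min(j + tile, w)
            scores[i:i2, j:j2] += model(image[i:i2, j:j2])
            counts[i:i2, j:j2] += 1
    return scores / counts
```

Averaging over the overlap smooths the seams between neighbouring tiles.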

Page 98: PROJECT LIVENESS REPORT 01 - PUC RIO

Spatial accuracy

Example:
• The CNN scores high for “car” and “road” for both patches on the right.
• In consequence, CNN patch classification over-smooths object boundaries.
• FCN, on the contrary, learns class-specific structures contained in each patch.

Page 99: PROJECT LIVENESS REPORT 01 - PUC RIO

CNN patch-wise vs. FCN

[Figure columns: Input, DSM, Ground Truth, Random Forest, CNN patch-wise, FCN-1, FCN-2]

From: Volpi, M. and Tuia, D., 2017. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks, IEEE Transactions on Geoscience and Remote Sensing, Vol. 55(2), pp. 881-893.

Page 100: PROJECT LIVENESS REPORT 01 - PUC RIO

Semantic Segmentation

Thanks for your attention

R. Q. FEITOSA