"Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from...


Transcript of "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from...

Page 1: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Copyright © 2016 Auviz Systems 1

Semantic Segmentation for Scene Understanding: Algorithms and Implementations

Nagesh Gupta

May 3, 2016

Page 2: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Topics

• Auviz Systems

• Introduction to Semantic Segmentation

• Quick survey of techniques

• Fully Convolutional Network

• Implementation architectures & results

• FPGA & GPU implementations

• References

Page 3: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Auviz Systems

• Independent software vendor (ISV) specializing in implementing & optimizing algorithms on FPGAs

• Offers libraries for different classes of algorithms

  • AuvizCV: optimized OpenCV algorithms

  • AuvizLA: optimized BLAS

  • AuvizDNN: optimized deep neural networks

• Develops applications in computer vision, linear algebra, deep learning & machine learning

• Libraries are available as OpenCL function calls, which hide the complexity of using an FPGA from software users

• Visit our booth & see Semantic Segmentation running on a Xilinx FPGA!

Page 4: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Introduction: Image Classification

[Figure: input image → Computer Vision → label "Giraffe"]

Page 5: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Introduction: Semantic Segmentation

[Figure: input image → Computer Vision → segmented image]

Page 6: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems


Object Detection vs. Semantic Segmentation

Page 7: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems


Applications of Semantic Segmentation

Automotive: Free space detection

Monocular depth estimation

Boundary prediction

Page 8: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

A Survey of Different Methods for Semantic Segmentation

Reference paper | SIFT Flow pixel accuracy (%)
C. Liu, J. Yuen, and A. Torralba, "SIFT Flow: Dense correspondence across scenes and its applications" | 76.7
D. Eigen and R. Fergus, "Nonparametric image parsing using adaptive neighbor sets" | 77.1
H. J. Myeong, Y. Chang, and K. M. Lee, "Learning object relationships via graph-based context model" | 77.1
P. H. Pinheiro and R. Collobert, "Recurrent convolutional neural networks for scene parsing" | 77.7
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling" | 78.5
J. Tighe and S. Lazebnik, "Finding things: Image parsing with regions and per-exemplar detectors" | 78.6
J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation" | 85.2
G. Lin, C. Shen, A. van den Hengel, and I. Reid, "Exploring Context with Deep Structured models for Semantic Segmentation" | 88.1
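The pixel accuracy used to compare the methods above is the share of pixels whose predicted label matches the ground-truth label. A minimal sketch in Python with NumPy, where the function name and the toy label maps are illustrative assumptions rather than anything taken from the cited papers:

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("label maps must have the same shape")
    return float(np.mean(pred == truth))

# Toy 2x3 label maps (hypothetical data): 5 of the 6 pixels agree.
truth = np.array([[0, 0, 1],
                  [2, 2, 1]])
pred = np.array([[0, 0, 1],
                 [2, 1, 1]])

accuracy_percent = 100.0 * pixel_accuracy(pred, truth)
print(f"pixel accuracy: {accuracy_percent:.1f}%")
```

Pixel accuracy favors classes that cover many pixels, which is one reason segmentation results are often reported with intersection over union as well.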

Page 9: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Classification Networks

• As an input image passes through successive convolutions, the feature maps retain global features but lose local detail

• A CNN has several sub-sampling layers, which reduce the size of the input image
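To make the size reduction concrete, here is a small sketch of 2x2 max pooling with stride 2, one common form of sub-sampling. The pooling type and the 8x8 input are assumptions chosen for illustration; the slide does not say which sub-sampling operator is used.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = x.shape
    # Split each axis into (blocks, 2) and take the maximum inside every 2x2 block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.arange(64, dtype=np.float32).reshape(8, 8)

shapes = [feature_map.shape]
current = feature_map
for _ in range(3):  # three sub-sampling stages
    current = max_pool_2x2(current)
    shapes.append(current.shape)

print(shapes)  # [(8, 8), (4, 4), (2, 2), (1, 1)]
```

Each stage halves both dimensions, so three stages leave a single value where there were 64 pixels. This is the loss of local detail the bullets describe.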

Page 10: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

From Classification to Semantic Segmentation

• Replacing the fully connected layers in a CNN with convolutions preserves the spatial layout, so the network outputs a heat map instead of a single label

• Use the heat map to segment the original image

• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation" [3]
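The equivalence behind this step can be checked numerically. A fully connected layer that expects a C x S x S feature map is the same computation as a convolution whose filters are C x S x S. On a larger input, that convolution produces one score per window, which is the heat map. The sketch below uses NumPy with made-up sizes and random weights; the helper name is hypothetical.

```python
import numpy as np

def conv2d_valid(x, w):
    """Multi-channel 'valid' cross-correlation.
    x: (C, H, W) input, w: (K, C, kh, kw) filters -> (K, H-kh+1, W-kw+1)."""
    k, c, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.empty((k, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
C, S, K = 4, 3, 5  # channels, spatial size the FC layer expects, classes (all assumed)
fc_weights = rng.standard_normal((K, C * S * S))

# The same weights viewed as K filters of size C x S x S.
conv_weights = fc_weights.reshape(K, C, S, S)

# On an S x S input the two layers agree.
small = rng.standard_normal((C, S, S))
fc_out = fc_weights @ small.reshape(-1)
conv_out = conv2d_valid(small, conv_weights)

# On a larger input the convolution yields one score vector per window: a heat map.
large = rng.standard_normal((C, 6, 7))
heat_map = conv2d_valid(large, conv_weights)
print(conv_out.shape, heat_map.shape)
```

Each position of `heat_map` equals what the original fully connected layer would output if it were applied to the corresponding 3x3 window of the larger input.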

Page 11: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Fully Convolutional Networks (FCN)

• Multiple convolution layers followed by deconvolution layers and a classifier

• Weights for all layers are learned through training using backpropagation (gradient descent)

[Figure: input image → three stages of 3D convolution and sub-sampling → deconvolution → softmax → labels such as "Bird" and "Person"]
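The final softmax stage in the figure can be sketched as a per-pixel operation: the scores across classes at each pixel are normalized into probabilities, and the most probable class becomes that pixel's label. The class list and score values below are invented for illustration.

```python
import numpy as np

def softmax_over_classes(scores):
    """Softmax along axis 0 of a (classes, H, W) score volume."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

# Assumed class list: the two labels shown in the figure plus a background class.
class_names = ["background", "bird", "person"]

# Hypothetical scores for a 2x2 image, one 2x2 plane per class.
scores = np.array([
    [[2.0, 0.1], [0.0, 0.3]],   # background
    [[0.5, 3.0], [0.2, 0.1]],   # bird
    [[0.1, 0.2], [4.0, 0.2]],   # person
])

probs = softmax_over_classes(scores)
label_map = probs.argmax(axis=0)  # per-pixel class index
print(label_map)
```

In a real FCN the score volume comes from the deconvolution stage and has the same height and width as the input image.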

Page 12: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Skip Layers: Improve Pixel Accuracy

• High-resolution local information is lost to down-sampling as we go from left to right

• Skip layers overcome this by combining the global semantic information with shallow features from layers prior to down-sampling

• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation" [3]
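A rough sketch of the fusion step: coarse class scores from a deep layer are upsampled by 2x and added element-wise to class scores computed from a shallower, higher-resolution layer. Nearest-neighbor upsampling is swapped in here for brevity; it is an assumption of this sketch, and a trained network would typically use a learned deconvolution instead. The function names are hypothetical.

```python
import numpy as np

def upsample_nearest_2x(scores):
    """Repeat each pixel of a (classes, H, W) volume twice along H and W."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse_skip(coarse_scores, fine_scores):
    """Add 2x-upsampled coarse scores to scores from a shallower, finer layer."""
    up = upsample_nearest_2x(coarse_scores)
    if up.shape != fine_scores.shape:
        raise ValueError("fine map must be exactly 2x the coarse map")
    return up + fine_scores

# Toy data: 2 classes, a 2x2 coarse map and a 4x4 fine map.
coarse = np.arange(8, dtype=float).reshape(2, 2, 2)
fine = np.ones((2, 4, 4))

fused = fuse_skip(coarse, fine)
print(fused.shape)  # (2, 4, 4)
```

The fused map keeps the class evidence of the deep layer while picking up spatial detail that survived only in the shallow layer.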

Page 13: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Key parts of an FCN: Convolutions & De-convolutions
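For readers unfamiliar with de-convolution (also called transposed convolution), the sketch below computes it in two ways for a single channel. The first follows the definition directly: each input pixel stamps a scaled copy of the kernel onto the output. The second inserts zeros between input pixels, pads, and runs an ordinary convolution with the flipped kernel. This is one common formulation, shown under the assumption of a square kernel; it is not claimed to be the presenter's implementation.

```python
import numpy as np

def deconv2d_scatter(x, w, stride):
    """Transposed convolution by definition: every input pixel adds x[i, j] * w to the output."""
    h, wd = x.shape
    k = w.shape[0]
    out = np.zeros(((h - 1) * stride + k, (wd - 1) * stride + k))
    for i in range(h):
        for j in range(wd):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    return out

def correlate2d_valid(x, w):
    """Plain sliding-window 'valid' cross-correlation with a square kernel."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, x.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def deconv2d_via_conv(x, w, stride):
    """Same result using zero insertion, zero padding, and a flipped kernel."""
    h, wd = x.shape
    k = w.shape[0]
    z = np.zeros(((h - 1) * stride + 1, (wd - 1) * stride + 1))
    z[::stride, ::stride] = x          # insert (stride - 1) zeros between pixels
    z = np.pad(z, k - 1)               # zero-pad by k - 1 on every side
    return correlate2d_valid(z, w[::-1, ::-1])

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 3))
w = rng.standard_normal((4, 4))
a = deconv2d_scatter(x, w, stride=2)
b = deconv2d_via_conv(x, w, stride=2)
print(a.shape, np.allclose(a, b))
```

With a 3x3 input, a 4x4 kernel, and stride 2 the output is 8x8, so the operation upsamples while applying learned weights.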

Page 14: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Implementation Results: GPU

• Results from running FCNs in Caffe on a Tesla K40c GPU

• The FCN built from VGG16 produces the best mean IoU, at the cost of additional latency

Metric       | FCN-AlexNet | FCN-VGG16 | FCN-GoogLeNet
Mean IoU (%) | 39.8        | 56.0      | 42.5
Forward time | 50 ms       | 210 ms    | 59 ms
Conv layers  | 8           | 16        | 22
Max stride   | 32          | 32        | 32

IoU (Intersection over Union) = |Sseg ∩ Shum| / |Sseg ∪ Shum|

Sseg: pixels from segmentation

Shum: pixels from ground truth
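The IoU definition can be sketched per class and then averaged to obtain mean IoU. In the code below, `seg` and `hum` play the roles of Sseg and Shum for one class. The label maps are toy data, and skipping classes that appear in neither map is an assumption of this sketch, not something stated on the slide.

```python
import numpy as np

def per_class_iou(pred, truth, num_classes):
    """IoU for each class; NaN when a class is absent from both label maps."""
    ious = []
    for c in range(num_classes):
        seg = pred == c    # pixels the segmentation assigns to class c
        hum = truth == c   # pixels the ground truth assigns to class c
        union = np.logical_or(seg, hum).sum()
        if union == 0:
            ious.append(float("nan"))
        else:
            ious.append(np.logical_and(seg, hum).sum() / union)
    return np.array(ious)

# Toy 2x3 label maps with three classes (hypothetical data).
truth = np.array([[0, 0, 1],
                  [2, 2, 1]])
pred = np.array([[0, 0, 1],
                 [2, 1, 1]])

ious = per_class_iou(pred, truth, num_classes=3)
mean_iou = float(np.nanmean(ious))
print(ious, round(100.0 * mean_iou, 1))
```

For these maps the per-class values are 1, 2/3, and 1/2, so a single mislabeled pixel lowers the score of two classes at once.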

Page 15: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Implementation Architectures: FPGA

• GEMM

  • Convolutions and de-convolutions can be mapped into a GEMM kernel [6]

  • Requires significant data remapping, which costs more resources and latency

  • Re-mapping the data in the host CPU is another easy option using the OpenCL development environment

• Convolutions

  • Implement convolutions & de-convolutions using convolution kernels

  • Some data re-mapping is needed to use the convolution kernel for de-convolutions

  • Possible to achieve higher performance in the FPGA
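The GEMM mapping and its remapping cost can be illustrated with the widely used im2col rearrangement: every input window is copied into one column of a matrix, after which the whole convolution is a single matrix multiply. This is a sketch of the general technique with assumed sizes and function names, not the layout used in [6] or in any Auviz library.

```python
import numpy as np

def im2col(x, kh, kw):
    """Rearrange a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix of patches."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i + kh, j:j + kw].reshape(-1)
    return cols, out_h, out_w

def conv2d_gemm(x, weights):
    """Stride-1 'valid' convolution of (C, H, W) with (K, C, kh, kw) as one matrix multiply."""
    k, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw)
    return (weights.reshape(k, -1) @ cols).reshape(k, out_h, out_w)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 6, 6))           # 3 channels, 6x6 input (assumed sizes)
weights = rng.standard_normal((4, 3, 3, 3))  # 4 filters of 3x3

out = conv2d_gemm(x, weights)
cols, _, _ = im2col(x, 3, 3)
expansion = cols.size / x.size  # how much larger the remapped data is
print(out.shape, expansion)
```

Here the remapped matrix holds four times as many values as the input because overlapping windows are duplicated. That duplication is the extra memory and data movement the GEMM bullets refer to.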

Page 16: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

FPGA Accelerator: Resource & Performance Estimates

• OpenCL is a simpler and faster way to implement an FPGA accelerator

  • Xilinx SDAccel tools provide the OpenCL infrastructure

  • Altera (Intel) supports OpenCL

• The following infrastructure blocks are needed in addition to the accelerator

  • PCIe & DMA

  • External memory interface

• In a mid-range 28 nm FPGA such as the Xilinx Virtex-7 690T, 25-30% is taken up by infrastructure blocks

• 60-70% of the FPGA is available to implement the accelerator kernel

• Expect to get 1024-1536 MACs, running in the frequency range of 200-300 MHz

• A good design can thus achieve 400-600 GOPS
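The throughput estimate can be reproduced with simple arithmetic, assuming each MAC counts as two operations (one multiply and one add) per clock cycle. The slide does not state its counting convention, so the factor of two is an assumption of this sketch.

```python
def peak_gops(num_macs, freq_mhz, ops_per_mac=2):
    """Peak throughput in GOPS if every MAC unit is busy on every cycle."""
    return num_macs * freq_mhz * 1e6 * ops_per_mac / 1e9

results = {}
for macs in (1024, 1536):
    for mhz in (200, 300):
        results[(macs, mhz)] = peak_gops(macs, mhz)
        print(f"{macs} MACs @ {mhz} MHz -> {results[(macs, mhz)]:.1f} GOPS")
```

Under that assumption, 1024 MACs at 200 MHz give about 410 GOPS, and either 1536 MACs at 200 MHz or 1024 MACs at 300 MHz give about 614 GOPS, which lines up with the quoted 400-600 GOPS range. The corner with both the largest MAC count and the highest clock comes out higher, at about 922 GOPS.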

Page 17: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Use Model: GPU

[Figure: blocks labeled "Forward convolution" and "Fully connected"]

Page 18: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Use Model: FPGA

[Figure: blocks labeled "Forward conv" and "Fully connected"]

Page 19: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

FPGA Implementation Using OpenCL

• OpenCL is beginning to be the method of choice to implement CNNs [6] [7]

• AuvizDNN is a flexible framework built using OpenCL

Host Code

• API calls are initiated by the host

• Calling APIs with different parameters creates new networks

• Recompile on CPU to create new networks

• Use model similar to CPU/GPU

Kernel Binary

• Highly optimized for performance

• Supports a wide range of API parameters

• FPGA recompilation/timing closure not needed

• No FPGA tools expertise needed

• Available for different accelerator boards supported by FPGA vendors

Page 20: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

FPGA: Implementation Results

• Semantic segmentation with 2-21 classes on a 500x500 image

• Network similar to AlexNet

• Results for the XC7VX690 device are based on achieved performance; the rest are projected

[Chart: throughput in images/second; axis from 0 to 140]

Page 21: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

Implementation Choice: FPGA/GPU

GPU | FPGA
Mature use model and rich set of libraries available | Libraries and use model are beginning to catch up to GPU
Used extensively for training of CNNs | Serious contender for deployment in the data center & embedded applications
Traditionally higher in power | Typically lower power draw
Well integrated into most CNN R&D frameworks such as Caffe | Loosely integrated with Caffe
Entrenched in the research community; used by most publications & researchers | FPGAs are extensively used in embedded applications

Page 22: "Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from Auviz Systems

References

• [1] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.

• [2] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915-1929.

• [3] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).

• [4] Badrinarayanan, V., Handa, A., & Cipolla, R. (2015). SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labeling. arXiv preprint arXiv:1505.07293.

• [5] Liu, C., Yuen, J., & Torralba, A. (2011). SIFT Flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978-994.

• [6] Suda, N., et al. (2016). Throughput Optimized OpenCL-based FPGA Accelerator for Large-Scale CNNs. ISFPGA 2016.

• [7] Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL. Altera White Paper.