"Semantic Segmentation for Scene Understanding: Algorithms and Implementations," a Presentation from...

Copyright © 2016 Auviz Systems

Semantic Segmentation for Scene Understanding:

Algorithms and Implementations

Nagesh Gupta

May 3, 2016


• Auviz Systems

• Introduction to Semantic Segmentation

• Quick survey of techniques

• Fully Convolutional Network

• Implementation architectures & results

• FPGA & GPU implementations

• References

Topics


• ISV, specializes in implementing & optimizing algorithms on FPGAs

• Offers libraries of different classes of algorithms

• AuvizCV — optimized OpenCV algorithms

• AuvizLA — optimized BLAS

• AuvizDNN — optimized deep neural networks

• Develops applications in Computer Vision, Linear Algebra, Deep Learning & Machine Learning

• Available as OpenCL function calls, which hide the complexity of using an FPGA from software users

• Visit our booth & see Semantic Segmentation running on Xilinx FPGA!

Auviz Systems


Introduction — Image Classification

[Figure: Computer Vision block producing one label for the whole image: Giraffe]


Introduction — Semantic Segmentation

[Figure: Computer Vision block applied to an input image]


Object Detection vs. Semantic Segmentation


Applications of Semantic Segmentation

Automotive: Free space detection

Monocular depth estimation

Boundary prediction


A Survey of Different Methods for Semantic Segmentation

Reference paper | SIFT-Flow pixel accuracy (%)

C. Liu, J. Yuen, and A. Torralba, “Sift flow: Dense correspondence across scenes and its applications” | 76.7

D. Eigen and R. Fergus, “Nonparametric image parsing using adaptive neighbor sets” | 77.1

H. J. Myeong, Y. Chang, and K. M. Lee, “Learning object relationships via graph-based context model” | 77.1

P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene parsing” | 77.7

C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling” | 78.5

J. Tighe and S. Lazebnik, “Finding things: Image parsing with regions and per-exemplar detectors” | 78.6

J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation” | 85.2

Guosheng Lin, Chunhua Shen, Anton van den Hengel, Ian Reid, “Exploring Context with Deep Structured models for Semantic Segmentation” | 88.1


• As an input image passes through successive convolutions, the network retains global features but loses local detail

• A CNN has several sub-sampling layers, each of which reduces the spatial size of its input

Classification Networks
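The shrinking effect of sub-sampling can be sketched with a few lines of arithmetic. This is a minimal illustration, assuming five sub-sampling layers that each halve the height and width (rounding up) and a hypothetical 500 x 500 input; real networks differ in padding and rounding.

```python
import math

def feature_map_sizes(height, width, num_subsampling_layers, stride=2):
    """Return the spatial size after each sub-sampling layer.

    Assumption: every sub-sampling layer divides each dimension by
    `stride`, rounding up. Padding details of real networks are ignored.
    """
    sizes = [(height, width)]
    for _ in range(num_subsampling_layers):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        sizes.append((height, width))
    return sizes

# Hypothetical input size and layer count, chosen only for illustration.
sizes = feature_map_sizes(500, 500, num_subsampling_layers=5)
for layer, (h, w) in enumerate(sizes):
    print(f"after {layer} sub-sampling layers: {h} x {w}")
```

Under these assumptions the final feature map is only 16 x 16, which is why fine local detail is no longer available at the output of a classification network.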


• Replacing the fully connected layers of a CNN with convolutions preserves spatial information, so the network outputs a heat-map instead of a single label

• Use the “heat-map” to segment the original image

• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation” [3]

From Classification to Semantic Segmentation
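A small sketch of why this works: a fully connected layer applied to a fixed-size patch is the same computation as one convolution window, so sliding the same weights over a larger input yields one score per position. The single-channel, single-class setup and all values below are illustrative assumptions, not taken from the presentation. As in most CNN frameworks, "convolution" here means cross-correlation (no kernel flip).

```python
def fully_connected(patch, weights, bias):
    """Score one class from a k x k patch: weighted sum plus bias."""
    return bias + sum(
        patch[i][j] * weights[i][j]
        for i in range(len(weights))
        for j in range(len(weights[0]))
    )

def fc_as_convolution(feature_map, weights, bias):
    """Slide the same weights over a larger feature map.

    Each output position is the fully connected score of the patch under
    the window, so the result is a coarse heat-map rather than one number.
    """
    k_h, k_w = len(weights), len(weights[0])
    out_h = len(feature_map) - k_h + 1
    out_w = len(feature_map[0]) - k_w + 1
    heat_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            patch = [feature_map[r + i][c:c + k_w] for i in range(k_h)]
            row.append(fully_connected(patch, weights, bias))
        heat_map.append(row)
    return heat_map

# Illustrative weights and inputs (hypothetical values).
weights = [[1.0, 0.0], [0.0, -1.0]]
bias = 0.5
small = [[3.0, 1.0], [2.0, 1.0]]                 # classifier-sized input
large = [[3.0, 1.0, 0.0], [2.0, 1.0, 4.0], [0.0, 5.0, 1.0]]

single_score = fully_connected(small, weights, bias)
heat_map = fc_as_convolution(large, weights, bias)
print(single_score)
print(heat_map)
```

The top-left entry of the heat-map equals the single fully connected score, since that window covers exactly the classifier-sized input.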


• Multiple convolution layers followed by deconvolution layers and a classifier

• Weights for all layers are learned through training using backpropagation (gradient descent)

Fully Convolutional Networks (FCN)

[Figure: FCN block diagram. Blocks: 3D convolution (x3), Sub-sampling (x3), Deconvolution, Softmax. Output labels: Bird, Person]
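The final classifier stage can be sketched as a softmax over the class scores at each pixel, followed by picking the most probable class. This is a minimal illustration with made-up scores; the class list and the `label_pixels` helper are assumptions for the example, not part of any FCN library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_pixels(score_maps, class_names):
    """Assign a class name to every pixel.

    score_maps[c][y][x] is the score of class c at pixel (y, x).
    """
    height, width = len(score_maps[0]), len(score_maps[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            probs = softmax([score_maps[c][y][x] for c in range(len(class_names))])
            best = max(range(len(probs)), key=probs.__getitem__)
            row.append(class_names[best])
        labels.append(row)
    return labels

# Hypothetical 2 x 2 score maps for three classes.
class_names = ["background", "bird", "person"]
score_maps = [
    [[2.0, 0.1], [0.3, 0.2]],  # background
    [[0.5, 3.0], [0.1, 0.4]],  # bird
    [[0.1, 0.2], [2.5, 1.9]],  # person
]
labels = label_pixels(score_maps, class_names)
print(labels)
```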


• High-resolution local information is lost due to down-sampling as we go from left to right

• Skip layers overcome this by combining the global semantic information with shallow features from layers prior to down-sampling

• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation”

Skip Layers — Improve Pixel Accuracy
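One way to picture a skip layer: up-sample the coarse score map to the resolution of an earlier layer and add the two score maps element by element. The sketch below uses nearest-neighbour 2x up-sampling as a simplification; this is an assumption made for brevity, not the up-sampling used in the FCN paper, and all values are made up.

```python
def upsample_nearest_2x(score_map):
    """Repeat every value twice along both axes (simplified up-sampling)."""
    out = []
    for row in score_map:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fuse_skip(coarse_scores, fine_scores):
    """Add up-sampled coarse scores to finer scores from a shallower layer."""
    up = upsample_nearest_2x(coarse_scores)
    if len(up) != len(fine_scores) or len(up[0]) != len(fine_scores[0]):
        raise ValueError("fine score map must be exactly 2x the coarse one")
    return [
        [u + f for u, f in zip(up_row, fine_row)]
        for up_row, fine_row in zip(up, fine_scores)
    ]

# Hypothetical scores for one class: coarse (1 x 2) and fine (2 x 4).
coarse = [[1.0, -1.0]]
fine = [[0.5, -0.5, 0.25, 0.0], [0.0, 0.5, -0.25, 1.0]]
fused = fuse_skip(coarse, fine)
print(fused)
```

The coarse map contributes the overall decision for each region, while the fine map adjusts individual pixels within it.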


Key parts of an FCN: Convolutions & De-convolutions
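A de-convolution (transposed convolution) can be sketched as the reverse data flow of a strided convolution: every input value scatters a scaled copy of the kernel into a larger output. The single-channel form, kernel values, and stride below are illustrative assumptions; in an FCN the kernel weights are learned.

```python
def transposed_conv2d(inp, kernel, stride):
    """Single-channel transposed convolution without padding.

    Output size per axis is (input - 1) * stride + kernel size.
    Overlapping contributions are summed.
    """
    in_h, in_w = len(inp), len(inp[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(in_h):
        for c in range(in_w):
            for i in range(k_h):
                for j in range(k_w):
                    out[r * stride + i][c * stride + j] += inp[r][c] * kernel[i][j]
    return out

# Hypothetical 2 x 2 coarse map, up-sampled 2x with an all-ones kernel.
coarse = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0, 1.0], [1.0, 1.0]]
upsampled = transposed_conv2d(coarse, kernel, stride=2)
for row in upsampled:
    print(row)
```

With this particular kernel and stride the result is plain nearest-neighbour up-sampling; other kernel values give smoother interpolation.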


• Results from implementing an FCN with Caffe on a Tesla K40c GPU

• The FCN built on VGG16 gives the best mean IoU (56.0), at the cost of higher latency (210 ms forward time)

Implementation results — GPU

Metric | FCN-AlexNet | FCN-VGG16 | FCN-GoogLeNet

Mean IoU | 39.8 | 56.0 | 42.5

Forward time | 50 ms | 210 ms | 59 ms

Conv layers | 8 | 16 | 22

Max stride | 32 | 32 | 32

IoU, Intersection over Union: $\mathrm{IoU} = \frac{|S_{seg} \cap S_{hum}|}{|S_{seg} \cup S_{hum}|}$

$S_{seg}$: pixels from segmentation

$S_{hum}$: pixels from ground truth
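Mean IoU can be computed by evaluating intersection over union per class and averaging over the classes that occur. The sketch below assumes label maps stored as nested lists of integer class ids; the function names and sample labels are hypothetical.

```python
def iou_per_class(predicted, ground_truth, num_classes):
    """Return IoU for each class, or None if the class never appears."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p == g:
                # Pixel is in both sets for this class.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel is in the predicted set of p and the true set of g.
                union[p] += 1
                union[g] += 1
    return [
        intersection[c] / union[c] if union[c] else None
        for c in range(num_classes)
    ]

def mean_iou(predicted, ground_truth, num_classes):
    """Average IoU over the classes present in either label map."""
    ious = [v for v in iou_per_class(predicted, ground_truth, num_classes)
            if v is not None]
    return sum(ious) / len(ious)

# Hypothetical 2 x 3 label maps with two classes (0 and 1).
predicted = [[0, 0, 1], [0, 1, 1]]
ground_truth = [[0, 1, 1], [0, 0, 1]]
per_class = iou_per_class(predicted, ground_truth, num_classes=2)
score = mean_iou(predicted, ground_truth, num_classes=2)
print(per_class, score)
```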


• GEMM

  • Convolutions and de-convolutions can be mapped into a GEMM kernel [6]

  • Requires significant data remapping, which costs more resources and latency

  • Re-mapping the data in the host CPU is another easy option using the OpenCL development environment

• Convolutions

  • Implement convolutions & de-convolutions using convolution kernels

  • Some data re-mapping is needed to use the convolution kernel for de-convolutions

  • Possible to achieve higher performance in the FPGA

Implementation Architectures — FPGA
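The GEMM mapping is commonly done with an "im2col" style re-mapping: every convolution window is unrolled into a row of a matrix, so the whole convolution becomes one matrix multiply. The sketch below is a single-channel, stride-1 illustration in plain Python; the function names are hypothetical and this is not the AuvizDNN or Caffe implementation.

```python
def im2col(image, k):
    """Unroll every k x k window of a single-channel image into one row."""
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    rows = [
        [image[r + i][c + j] for i in range(k) for j in range(k)]
        for r in range(out_h)
        for c in range(out_w)
    ]
    return rows, out_h, out_w

def gemm(a, b):
    """Plain matrix multiply: (m x n) times (n x p)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def conv2d_via_gemm(image, kernels):
    """Apply several k x k kernels to one image with a single GEMM."""
    k = len(kernels[0])
    patches, out_h, out_w = im2col(image, k)
    # One column per kernel, flattened in the same order as the patches.
    weight_matrix = [
        [kern[i][j] for kern in kernels] for i in range(k) for j in range(k)
    ]
    product = gemm(patches, weight_matrix)
    return [
        [[product[r * out_w + c][n] for c in range(out_w)] for r in range(out_h)]
        for n in range(len(kernels))
    ]

def conv2d_direct(image, kernel):
    """Reference sliding-window convolution (cross-correlation)."""
    k = len(kernel)
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
         for c in range(len(image[0]) - k + 1)]
        for r in range(len(image) - k + 1)
    ]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
feature_maps = conv2d_via_gemm(image, kernels)
print(feature_maps)
```

Note the cost of the re-mapping: the 9 input values are expanded into 4 rows of 4 values each, which is the data duplication the GEMM bullet refers to.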


• OpenCL is a simpler and faster way to implement an FPGA accelerator

  • Xilinx SDAccel tools provide the OpenCL infrastructure

  • Altera (Intel) supports OpenCL

• The following infrastructure blocks are needed in addition to the accelerator

  • PCIe & DMA

  • External Memory Interface

• In a mid-range 28 nm FPGA such as Xilinx Virtex 7 690T, 25-30% of the device is taken up by infrastructure blocks

• 60-70% of the FPGA is available to implement the accelerator kernel

• Expect to get 1024-1536 MACs, running in the frequency range of 200-300 MHz

• A good design can thus achieve 400-600 GOPS

FPGA Accelerator: Resource & Performance Estimates
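The GOPS estimate follows from simple arithmetic: peak throughput is roughly the number of MAC units times operations per MAC times the clock frequency. The sketch below assumes each MAC counts as two operations (one multiply, one add) and that every MAC is busy on every cycle; both are assumptions, and a real design will fall short of this peak.

```python
def peak_gops(num_macs, clock_mhz, ops_per_mac=2):
    """Theoretical peak in giga-operations per second.

    Assumption: one MAC = 2 operations, all MACs active every cycle.
    """
    return num_macs * ops_per_mac * clock_mhz / 1000.0

for macs in (1024, 1536):
    for mhz in (200, 300):
        print(f"{macs} MACs at {mhz} MHz: {peak_gops(macs, mhz):.1f} GOPS")
```

Under these assumptions, 1024 MACs at 200 to 300 MHz give about 410 to 614 GOPS, in line with the 400-600 GOPS quoted above. The largest corner (1536 MACs at 300 MHz) works out to about 922 GOPS, a peak that assumes every MAC is busy on every cycle.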


Use Model — GPU

[Figure: GPU use model. Blocks: Fully connected, Forward convolution]


Use Model — FPGA

[Figure: FPGA use model. Blocks: Forward conv, Fully connected]


• OpenCL is beginning to be the method of choice to implement CNNs [6] [7]

• AuvizDNN is a flexible framework built using OpenCL

FPGA Implementation Using OpenCL

Host Code

• API calls are initiated by the host

• Calling APIs with different parameters creates new networks

• Recompile on CPU to create new networks

• Use model similar to CPU/GPU

Kernel Binary

• Highly optimized for performance

• Supports a wide range of API parameters

• FPGA recompilation/timing closure not needed

• No FPGA tools expertise needed

• Available for different accelerator boards supported by FPGA vendors


FPGA — Implementation Results

• Semantic segmentation with 2-21 classes on a 500x500 image

• Network similar to AlexNet

• Results for the XC7VX690 device are based on achieved performance; the rest are projected

[Chart: throughput in Images/Second, axis from 0 to 140]


GPU | FPGA

Mature use model and rich set of libraries available | Libraries and use model are beginning to catch up to GPU

Used extensively for training of CNNs | Serious contender for deployment in the data center & embedded applications

Traditionally higher in power | Typically lower power draw

Well integrated into most CNN R&D frameworks such as Caffe | Loosely integrated with Caffe

Entrenched in the research community; used by most publications & researchers | FPGAs are extensively used in embedded applications

Implementation Choice: FPGA/GPU


• [1] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062.

• [2] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(8), 1915-1929.

• [3] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).

• [4] Badrinarayanan, V., Handa, A., & Cipolla, R. (2015). Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labeling. arXiv preprint arXiv:1505.07293.

• [5] Liu, C., Yuen, J., & Torralba, A. (2011). Sift flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell., 33(5), 978-994.

• [6] Suda, N., et al. (2016). Throughput Optimized OpenCL-based FPGA Accelerator for Large-Scale CNNs. ISFPGA 2016.

• [7] Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL. Altera White Paper. https://www.altera.com/en_US/pdfs/literature/solution-sheets/efficient_neural_networks.pdf

Reference

