Efficient Representation Learning


Sachin Mehta

Joint work with Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi

Efficient Machine Learning for Visual and Textual Data

https://github.com/sacmehta/

https://sacmehta.github.io/

https://h2lab.cs.washington.edu/

Introduction

[Figure: ImageNet classification, top-5 accuracy (80-100%) vs. year, comparing AlexNet (8 layers, 2012), GoogLeNet (22 layers, 2014), ResNet (151 layers, 2016), and human performance]

As network depth increases (8 → 22 → 151 layers):
• Accuracy improves
• Speed reduces
• Energy consumption increases

These models cannot be used on resource-constrained devices

Resource-Constrained Devices

• Embedded devices and mobile phones
• Limited energy overhead
• Restrictive memory constraints
• Limited compute

My Research

Light-weight, low-latency, and SOTA neural networks

Different tasks
• Object detection, semantic segmentation, language modeling, …
  • ESPNets: ECCV’18, CVPR’19; PRU: EMNLP’18
• Medical imaging: JAMA Network Open’19, MICCAI’18, WACV’18

Different modalities
• Natural images, whole-slide images, 3D MRIs, sketches (Iconary@AI2), text, …

Different devices
• Desktop, embedded devices (NVIDIA TX2), mobile phones, …

Faster training
• Ours: 1 - 1.5 days vs. SOTA: 4 - 5 days

Outline

• Brief overview of convolutions
• Image Classification
  • VGG unit
  • ResNet unit
  • ESPNet unit
• Semantic Segmentation
  • Vanilla encoder-decoder
  • U-Net encoder-decoder
• Results

Convolutions

• A widely used operation in computer vision
• Traditionally, kernels are manually designed
  • Sobel filter for edge detection
  • Gaussian filter for smoothing
• Nowadays, we learn kernels from the data
  • Convolutional neural networks (CNNs)
• An n × n kernel
  • Has n² parameters
  • Performs n²WH multiplication-addition operations for a W × H output plane
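As a quick sanity check on these counts, here is a minimal PyTorch sketch; PyTorch itself, the 3×3 kernel size, and the 224 × 224 resolution are assumptions, not details from the slides:

```python
import torch
import torch.nn as nn

n, W, H = 3, 224, 224  # assumed kernel size and spatial resolution

# Single-channel n x n convolution; padding keeps the output at W x H.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=n,
                 padding=n // 2, bias=False)

params = conv.weight.numel()   # n^2 = 9 parameters
mult_adds = params * W * H     # n^2 * W * H multiply-adds per output plane

x = torch.randn(1, 1, H, W)
print(conv(x).shape, params, mult_adds)  # torch.Size([1, 1, 224, 224]) 9 451584
```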

Dilated Convolutions

• Higher receptive field
• Inserts zeros between kernel elements
• An n × n kernel with a dilation rate of r
  • Has a receptive field of ((n − 1)r + 1)²
  • Learns n² parameters
  • Performs n²WH multiplication-addition operations
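A small sketch, under the same assumptions as above (concrete n and r are my choice), confirming that dilation enlarges the receptive field while the parameter count stays at n²:

```python
import torch.nn as nn

n, r = 3, 2  # assumed kernel size and dilation rate

dilated = nn.Conv2d(1, 1, kernel_size=n, dilation=r, bias=False)

effective = (n - 1) * r + 1                # effective kernel side: 5
receptive_field = effective ** 2           # ((n - 1)r + 1)^2 = 25
params = dilated.weight.numel()            # still n^2 = 9 parameters
print(effective, receptive_field, params)  # 5 25 9
```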

Convolution in 3D

• Both the input and the kernel are 3D
• An n × n × D kernel
  • Learns n²D parameters
  • Performs n²DWH operations to produce one output plane
• Multiple independent kernels are applied to produce a high-dimensional output
• K kernels of size n × n × D
  • Learn n²DK parameters
  • Perform n²DKWH operations
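A sketch of the parameter count for K kernels of size n × n × D; the concrete values of n, D, and K below are assumptions used only for illustration:

```python
import torch.nn as nn

n, D, K = 3, 32, 64  # assumed kernel size, input depth, and number of kernels

conv = nn.Conv2d(in_channels=D, out_channels=K, kernel_size=n, bias=False)

params = conv.weight.numel()            # n^2 * D * K
print(params, params == n * n * D * K)  # 18432 True
```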

Depth-wise Convolution

• Improves the efficiency of standard convolutions
• Each convolutional filter is applied per spatial plane (channel)
• D kernels of size n × n
  • Learn n²D parameters
  • Perform n²DWH operations
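The same count for a depth-wise convolution, using PyTorch's groups argument to apply one filter per channel; n and D are assumed values:

```python
import torch.nn as nn

n, D = 3, 32  # assumed kernel size and number of channels

# groups=D applies one n x n filter per input channel (depth-wise convolution).
dw = nn.Conv2d(in_channels=D, out_channels=D, kernel_size=n, groups=D, bias=False)

params = dw.weight.numel()          # n^2 * D = 288 (a standard conv would need n^2 * D * D = 9216)
print(params, params == n * n * D)  # 288 True
```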

Image Classification

A typical classification network is a sequence of blocks:

CU → DU → CU → DU → CU → GAP → FC → class label (e.g., “Cat”)

• CU: convolutional unit
• DU: down-sampling unit
• GAP: global average pooling
• FC: fully-connected (linear) layer

For a 28 × 28 input, the spatial resolution evolves as
[28]² → [28]² → [14]² → [14]² → [7]² → [7]² → [1]²
(CU preserves the resolution, DU halves it, and GAP reduces it to 1 × 1).
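The following is a minimal sketch of this CU / DU / GAP / FC pipeline in PyTorch; the channel widths, the choice of strided convolutions for the down-sampling unit, and the 10 output classes are assumptions rather than details from the slides:

```python
import torch
import torch.nn as nn

def conv_unit(in_ch, out_ch):
    # CU: 3x3 convolution that preserves the spatial resolution
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

def down_unit(in_ch, out_ch):
    # DU: strided 3x3 convolution that halves the spatial resolution
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

model = nn.Sequential(
    conv_unit(3, 16),            # [28]^2
    down_unit(16, 32),           # [14]^2
    conv_unit(32, 32),           # [14]^2
    down_unit(32, 64),           # [7]^2
    conv_unit(64, 64),           # [7]^2
    nn.AdaptiveAvgPool2d(1),     # GAP -> [1]^2
    nn.Flatten(),
    nn.Linear(64, 10),           # FC -> class scores
)

print(model(torch.randn(2, 3, 28, 28)).shape)  # torch.Size([2, 10])
```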

Convolutional Unit (CU) - VGG

3x3 Conv Layer → 3x3 Conv Layer

• A stack of 3×3 convolutional layers

VGG: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR, 2015.

Image source (Inception): Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." CVPR, 2016.
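A minimal sketch of a VGG-style unit (two stacked 3×3 convolutions with ReLU); the channel widths are left as parameters and are my assumption:

```python
import torch
import torch.nn as nn

# VGG-style convolutional unit: two stacked 3x3 convolutions with ReLU.
def vgg_unit(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

print(vgg_unit(3, 64)(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```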

Convolutional Unit (CU) - ResNet

3x3 Conv Layer → 3x3 Conv Layer, with the input added to the output

• Element-wise addition of the input and the output
• Often referred to as a residual connection
• Improves gradient flow and accuracy
• Computationally expensive, which makes very deep networks (101-151 layers) hard to train

ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." CVPR, 2016.
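A minimal sketch of the basic residual unit, showing the element-wise addition of input and output; the batch-norm placement and channel width are assumptions:

```python
import torch
import torch.nn as nn

# Basic ResNet unit: two 3x3 convolutions plus a skip (residual) connection.
class BasicResUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))  # element-wise addition (residual connection)

print(BasicResUnit(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```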

Convolutional Unit (CU) - ResNet

1x1 Conv Layer → 3x3 Conv Layer → 1x1 Conv Layer, with the input added to the output

• Bottleneck unit: 1×1 convolutions reduce and then restore the channel dimension around the 3×3 convolution
• Exercise: validate that this unit is efficient (see the parameter-count sketch below)

ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." CVPR, 2016.
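One way to work the exercise is to compare parameter counts of the basic unit and the bottleneck unit at the same input width; the widths below (256 channels squeezed to 64) are assumptions chosen to mirror ResNet-style blocks:

```python
import torch.nn as nn

C, B = 256, 64  # assumed input width and bottleneck width

basic = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1, bias=False),
    nn.Conv2d(C, C, 3, padding=1, bias=False),
)
bottleneck = nn.Sequential(
    nn.Conv2d(C, B, 1, bias=False),             # 1x1 reduce channels
    nn.Conv2d(B, B, 3, padding=1, bias=False),  # cheap 3x3 conv
    nn.Conv2d(B, C, 1, bias=False),             # 1x1 restore channels
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(basic), count(bottleneck))  # 1179648 vs. 69632
```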

Convolutional Unit (CU) - Bottleneck with Depth-wise Convolutions

1x1 Conv Layer → 3x3 Depth-wise Conv Layer → 1x1 Conv Layer

• Bottleneck unit with depth-wise convolutions
  • MobileNetv2
  • ShuffleNetv2

MobileNetv2: Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." CVPR, 2018.
ShuffleNetv2: Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." ECCV, 2018.
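A sketch in the spirit of these units (1×1 expand, 3×3 depth-wise, 1×1 project); the expansion width, normalization details, and the omitted residual add are assumptions, and this is not the exact MobileNetv2 or ShuffleNetv2 block:

```python
import torch
import torch.nn as nn

# Bottleneck with a depth-wise 3x3 convolution between two 1x1 convolutions.
# The residual add around the block is omitted for brevity.
def dw_bottleneck(channels, hidden):
    return nn.Sequential(
        nn.Conv2d(channels, hidden, 1, bias=False),                        # 1x1 expand
        nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
        nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),  # 3x3 depth-wise
        nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
        nn.Conv2d(hidden, channels, 1, bias=False),                        # 1x1 project
        nn.BatchNorm2d(channels),
    )

print(dw_bottleneck(64, 256)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```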

Convolutional Unit - ESPNet (ESP Unit)

1x1 Conv Layer → parallel 3x3 Dilated Conv Layers → Concat

• A 1×1 convolution reduces the number of channels, a spatial pyramid of 3×3 dilated convolutions (with different dilation rates) processes the reduced feature map in parallel, and the branch outputs are concatenated

ESPNet: Mehta, Sachin, et al. "ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation." ECCV, 2018.
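A minimal sketch of an ESP-style unit; the number of branches, the channel split, and the dilation rates (1, 2, 4) are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# ESP-style unit: 1x1 reduction followed by a spatial pyramid of 3x3 dilated
# convolutions whose outputs are concatenated.
class ESPUnit(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        branch_ch = out_ch // len(rates)
        self.reduce = nn.Conv2d(in_ch, branch_ch, 1, bias=False)  # 1x1 reduction
        self.branches = nn.ModuleList([
            nn.Conv2d(branch_ch, branch_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates
        ])

    def forward(self, x):
        x = self.reduce(x)
        return torch.cat([b(x) for b in self.branches], dim=1)  # concatenate branches

print(ESPUnit(64, 96)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```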

Hierarchical Feature Fusion in ESPNet

[Figure: the ESP unit (1x1 Conv Layer → parallel 3x3 Dilated Conv Layers → Concat), shown together with an input image and the corresponding output of the ESP module]

Gridding Artifact in Dilated Convolutions

[Figure: feature maps produced by a standard convolution vs. a dilated convolution, illustrating the gridding artifact introduced by dilation]

Hierarchical Feature Fusion in ESPNet

[Figure: the ESP unit with hierarchical feature fusion, where the outputs of the dilated branches are added element-wise (Add) before the final Concat, removing the gridding artifact]

Convolutional Unit - ESPNetv2

• In ESPNetv2, the 3x3 dilated convolutions of the ESP unit are replaced by 3x3 depth-wise dilated convolutions

ESPNet: Mehta, Sachin, et al. "ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation." ECCV, 2018.
ESPNetv2: Mehta, Sachin, et al. "ESPNetv2: A light-weight, power efficient, and general purpose convolutional neural network." CVPR, 2019.
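A minimal sketch combining hierarchical feature fusion with ESPNetv2-style depth-wise dilated branches; as before, the channel widths and dilation rates are assumptions rather than the published configuration:

```python
import torch
import torch.nn as nn

# ESP unit with hierarchical feature fusion (HFF): branch outputs are summed
# hierarchically before concatenation; branches use depth-wise dilated convs.
class ESPv2Unit(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        branch_ch = out_ch // len(rates)
        self.reduce = nn.Conv2d(in_ch, branch_ch, 1, bias=False)
        self.branches = nn.ModuleList([
            nn.Conv2d(branch_ch, branch_ch, 3, padding=r, dilation=r,
                      groups=branch_ch, bias=False)   # 3x3 depth-wise dilated conv
            for r in rates
        ])

    def forward(self, x):
        x = self.reduce(x)
        outs = [b(x) for b in self.branches]
        # HFF: add branch i into branch i+1 before concatenation.
        for i in range(1, len(outs)):
            outs[i] = outs[i] + outs[i - 1]
        return torch.cat(outs, dim=1)

print(ESPv2Unit(64, 96)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```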

Semantic Segmentation

Image Classification (recap)

CU → DU → CU → DU → CU → GAP → FC → class label (“Cat”)

Encoder-Decoder

• For segmentation, the classification head (GAP + FC) is replaced by a decoder that recovers the spatial resolution
• Encoder: CU → DU → CU → DU → CU
• Decoder: UU → CU → UU → CU
• UU: up-sampling unit (CU, DU, GAP, and FC as defined above)

Encoder-Decoder with Skip-Connections (U-Net)

• Encoder: CU → DU → CU → DU → CU
• Decoder: UU → CU → UU → CU
• Skip connections pass encoder feature maps to the decoder stages at the matching resolution
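A minimal sketch of an encoder-decoder with one skip connection (U-Net style); the channel widths, bilinear up-sampling, and the number of classes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cu(in_ch, out_ch):  # convolutional unit
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.enc1 = cu(3, 16)                                   # [H]^2
        self.down = nn.Conv2d(16, 32, 3, stride=2, padding=1)   # DU -> [H/2]^2
        self.enc2 = cu(32, 32)
        self.dec = cu(32 + 16, 16)                              # after up-sampling + skip concat
        self.head = nn.Conv2d(16, num_classes, 1)               # per-pixel class scores

    def forward(self, x):
        s = self.enc1(x)                                        # encoder feature map (skip)
        y = self.enc2(self.down(s))
        y = F.interpolate(y, size=s.shape[2:], mode="bilinear", align_corners=False)  # UU
        y = self.dec(torch.cat([y, s], dim=1))                  # skip connection
        return self.head(y)

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 21, 64, 64])
```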

Papers for Semantic Segmentation

• FCN: Long et al. "Fully convolutional networks for semantic segmentation." CVPR, 2015.
• SegNet: Badrinarayanan et al. "SegNet: A deep convolutional encoder-decoder architecture for image segmentation." PAMI, 2017.
• DeepLab:
  • Chen, Liang-Chieh, et al. "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs." PAMI, 2017.
  • Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." ECCV, 2018.
• U-Net: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." MICCAI, 2015.

Results

• Semantic segmentation
• Semantic segmentation on WSIs and 3D MRIs
• Object detection

Thanks!!

https://github.com/sacmehta/

https://sacmehta.github.io/