DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning...

23
DENSELY CONNECTED CONVOLUTIONAL NETWORKS Gao Huang*, Zhuang Liu*, Laurens van der Maaten, Kilian Q. Weinberger CVPR 2017 Cornell University Tsinghua University Facebook AI Research Best paper award

Transcript of DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning...

Page 1: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

DENSELY CONNECTED CONVOLUTIONAL NETWORKS

Gao Huang*, Zhuang Liu*, Laurens van der Maaten, Kilian Q. Weinberger

CVPR 2017

Cornell University Tsinghua University

Facebook AI Research

Best paper award

Page 2: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

CONVOLUTIONAL NETWORKS

LeNet AlexNet

VGG Inception

ResNet

Page 3: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

STANDARD CONNECTIVITY

Page 4: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

RESNET CONNECTIVITY

Deep residual learning for image recognition: [He, Zhang, Ren, Sun] (CVPR 2015)

Identity mappings promote gradient propagation.

: Element-wise addition

Page 5: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

DENSE CONNECTIVITY

C C C C

: Channel-wise concatenation C

Page 6: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

DENSE AND SLIM

k channels k channels k channels k channels

k : Growth Rate

C C C C

Page 7: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

x4x2x1h1 h2 h3 h4x0

FORWARD PROPAGATION

x4x3x2x1x0

x0 x1x0x3x2x1 x0

x3x2x1x0 Batc

ReL

Con

Page 8: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

COMPOSITE LAYER IN DENSENET

x5 =h5([x0, …, x4])

kchannels

x4

Batc

h N

orm

ReL

U

Con

volu

tion

(3x3

)

x3x2x1 x0x3x2x1x0

Page 9: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

COMPOSITE LAYER IN DENSENET

Higher parameter and computational efficiency

WITH BOTTLENECK LAYER

lxkchannels

4xkchannels

Batc

h N

orm

ReL

U

Con

volu

tion

(3x3

)

Batc

h N

orm

ReL

U

Con

volu

tion

(1x1

)k

channels

x3x2x1 x0

x4

Page 10: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

DENSENETC

onvo

lutio

n

Pool

ing

Con

volu

tion

Pool

ing

Con

volu

tion

Pool

ing

Line

ar

Dense Block 1 Dense Block 2 Dense Block 3

Pooling reduces feature map sizes

Output

Feature map sizes match within each block

Page 11: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

ADVANTAGES OF DENSE CONNECTIVITY

Page 12: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

ADVANTAGE 1: STRONG GRADIENT FLOW

Error Signal

Deeply supervised Net: [Lee, Xie, Gallagher, Zhang, Tu] (2015)

Implicit “deep supervision”

Page 13: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

ADVANTAGE 2: PARAMETER & COMPUTATIONAL EFFICIENCY

ResNet connectivity:

DenseNet connectivity:

O(CxC)C C

lXkO(lxkxk)

k

Correlated features

k: Growth rate

#parameters:

k<<C

Diversified features

Input Output

Input

Output

hl

hl

Page 14: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURES

y = w4h4(x)w4

Classifier uses most complex (high level) features

classifier

Standard Connectivity:

h1(x) h2(x) h3(x) h4(x)x

Increasingly complex features

Page 15: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURESDense Connectivity:

h1(x) h2(x) h3(x) h4(x)x

C C C C

y = w0x + +w1h1(x) +w2h2(x) +w3h3(x) +w4h4(x)

classifier

w4

w0

w1

w2

w3

Classifier uses features of all complexity levels

Increasingly complex features

Page 16: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

RESULTS

Page 17: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

Test

Err

or (

%)

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

10.0

11.0

12.0

3.6

4.54.62

6.41

4.2

RESULTS ON CIFAR-10

ResNet (110 Layers, 1.7 M) ResNet (1001 Layers, 10.2 M)DenseNet (100 Layers, 0.8 M) DenseNet (250 Layers, 15.3 M)

Previous SOTA

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

10.0

11.0

12.0

5.25.9

10.5611.26

7.3

Previous SOTA

With data augmentation Without data augmentation

Page 18: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

Test

Err

or (

%)

10.0

15.0

20.0

25.0

30.0

35.0

17.6

22.322.71

27.22

20.5

RESULTS ON CIFAR-100

ResNet (110 Layers, 1.7 M) ResNet (1001 Layers, 10.2 M)DenseNet (100 Layers, 0.8 M) DenseNet (250 Layers, 15.3 M)

Previous SOTA

10.0

15.0

20.0

25.0

30.0

35.0

19.6

24.2

33.4735.58

28.2

Previous SOTA

With data augmentation Without data augmentation

Page 19: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

Top-

1 er

ror

(%)

20.0

22.0

24.0

26.0

28.0

GFLOPs

3 10 16 23 29

DenseNet ResNet

RESULTS ON IMAGENET

Top-

1 er

ror

(%)

20.0

22.0

24.0

26.0

28.0

# Parameters (M)

0 20 40 60 80

DenseNet ResNet

ResNet-152ResNet-101

ResNet-50

ResNet-34

ResNet-152ResNet-101

ResNet-50

ResNet-34

DenseNet-264(k=48)

DenseNet-264

DenseNet-201

DenseNet-169

DenseNet-121 DenseNet-121

DenseNet-169

DenseNet-201

DenseNet-264DenseNet-264(k=48)

Top-1: 20.27% Top-5: 5.17%

Page 20: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

MULTI-SCALE DENSENET

Classifier 4Classifier 2 Classifier 3Classifier 1

…Classifier 2 Classifier 3Classifier 1

cat: 0.20.2 ≱ threshold

cat: 0.40.4 ≱ threshold

cat: 0.60.6 > threshold

Multi-Scale DenseNet: [Huang, Chen, Li, Wu, van der Maaten, Weinberger] (arXiv Preprint: 1703.09844)

(Preview)

Page 21: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

MULTI-SCALE DENSENET

Classifier 4Classifier 2 Classifier 3Classifier 1

Test Input

“Easy” examples “Hard” examples

Inference Speed:~ 2.6x faster than ResNets~ 1.3x faster than DenseNets

(Preview)

Page 22: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

Memory efficient Torch implementation:https://github.com/liuzhuang13/DenseNet

Our Caffe ImplementationOur memory-efficient Caffe Implementation.Our memory-efficient PyTorch Implementation.PyTorch Implementation by Andreas Veit.PyTorch Implementation by Brandon Amos.MXNet Implementation by Nicatio.MXNet Implementation (supports ImageNet) by Xiong Lin.Tensorflow Implementation by Yixuan Li.Tensorflow Implementation by Laurent Mazare.

Tensorflow Implementation (with BC structure) by Illarion Khlestov.Lasagne Implementation by Jan Schlüter.Keras Implementation by tdeboissiere.Keras Implementation by Roberto de Moura Estevão Filho.Keras Implementation (with BC structure) by Somshubra Majumdar.Chainer Implementation by Toshinori Hanya.Chainer Implementation by Yasunori Kudo.

Other implementations:

Page 23: DENSELY CONNECTED CONVOLUTIONAL NETWORKSREFERENCES • Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016 • Chen-Yu Lee, et al. "Deeply-supervised nets"

REFERENCES

• Kaiming He, et al. "Deep residual learning for image recognition" CVPR 2016• Chen-Yu Lee, et al. "Deeply-supervised nets" AISTATS 2015• Gao Huang, et al. "Deep networks with stochastic depth" ECCV 2016• Gao Huang, et al. "Multi-Scale Dense Convolutional Networks for Efficient Prediction" arXiv

preprint arXiv:1703.09844 (2017) • Geoff Pleiss, et al. "Memory-Efficient Implementation of DenseNets”, arXiv preprint arXiv:

1707.06990 (2017)