cvpr2016 deep residual learning kaiminghekaiminghe.com/cvpr16resnet/cvpr2016_deep_residual... ·...

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

work done at Microsoft Research Asia

[Figure: ResNet-152 architecture. 7x7 conv, 64, /2, pool/2; bottleneck blocks (1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256) x3; (1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512) x8; (1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024) x36; (1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048) x3; avg pool, fc 1000]
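The "152 layers" of the network drawn above can be tallied from its stage layout: one 7x7 conv, three weighted layers per bottleneck block, and the final fc layer (pooling layers carry no weights and are not counted). A minimal sketch of that arithmetic; the stage tuples are transcribed from the figure, and the names `RESNET152_STAGES` and `count_weighted_layers` are hypothetical helpers, not code from the authors.

```python
# Bottleneck stages of ResNet-152 as drawn in the figure:
# (number of blocks, bottleneck width, output width).
RESNET152_STAGES = [
    (3, 64, 256),
    (8, 128, 512),
    (36, 256, 1024),
    (3, 512, 2048),
]

LAYERS_PER_BOTTLENECK = 3  # 1x1 conv -> 3x3 conv -> 1x1 conv


def count_weighted_layers(stages):
    """Count conv/fc layers: stem 7x7 conv + bottleneck convs + final fc."""
    stem = 1  # 7x7 conv, 64, /2
    head = 1  # fc 1000 (avg pool has no weights, so it is not counted)
    body = sum(n * LAYERS_PER_BOTTLENECK for n, _, _ in stages)
    return stem + body + head


print(count_weighted_layers(RESNET152_STAGES))  # 152
```

Swapping in other block counts, e.g. (3, 4, 23, 3), gives 101 by the same rule.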

ResNet @ ILSVRC & COCO 2015 Competitions

1st places in all five main tracks
• ImageNet Classification: “Ultra-deep” 152-layer nets
• ImageNet Detection: 16% better than 2nd
• ImageNet Localization: 27% better than 2nd
• COCO Detection: 11% better than 2nd
• COCO Segmentation: 12% better than 2nd

*improvements are relative numbers

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.

Revolution of Depth

[Figure: ImageNet Classification top-5 error (%) by year. ILSVRC'10 (28.2) and ILSVRC'11 (25.8): shallow; ILSVRC'12 AlexNet and ILSVRC'13: 8 layers; ILSVRC'14 VGG (7.3): 19 layers; ILSVRC'14 GoogleNet (6.7): 22 layers; ILSVRC'15 ResNet: 152 layers]


Revolution of Depth

[Figure: PASCAL VOC 2007 Object Detection mAP (%). HOG, DPM: shallow; AlexNet (RCNN): 8 layers; VGG (RCNN): 16 layers; ResNet (Faster RCNN)*: 101 layers]

*w/ other improvements & more data

Engines of visual recognition

Revolution of Depth

[Figure: AlexNet, 8 layers (ILSVRC 2012): 11x11 conv, 96, /4, pool/2; 5x5 conv, 256, pool/2; 3x3 conv, 384; 3x3 conv, 384; 3x3 conv, 256, pool/2; fc, 4096; fc, 4096; fc, 1000]

Revolution of Depth

[Figure: AlexNet, 8 layers (ILSVRC 2012), beside VGG, 19 layers (ILSVRC 2014): 3x3 conv, 64 (x2), pool/2; 3x3 conv, 128 (x2), pool/2; 3x3 conv, 256 (x4), pool/2; 3x3 conv, 512 (x4), pool/2; 3x3 conv, 512 (x4), pool/2; fc, 4096 (x2); fc, 1000]

[Figure: GoogleNet, 22 layers (ILSVRC 2014): a stem of Conv 7x7+2(S), MaxPool 3x3+2(S), LocalRespNorm, Conv 1x1+1(V), Conv 3x3+1(S), LocalRespNorm, MaxPool 3x3+2(S); then stacked Inception modules, each running Conv 1x1, Conv 3x3, Conv 5x5 and MaxPool 3x3 branches in parallel and joining them with DepthConcat, with MaxPool 3x3+2(S) between groups; auxiliary classifiers softmax0 and softmax1 and the final classifier softmax2]

Revolution of Depth

[Figure: ResNet, 152 layers (ILSVRC 2015), drawn beside VGG, 19 layers (ILSVRC 2014) and AlexNet, 8 layers (ILSVRC 2012). ResNet-152: 7x7 conv, 64, /2, pool/2; bottleneck blocks (1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256) x3; (1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512) x8; (1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024) x36; (1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048) x3; the first 1x1 conv of the 128-, 256- and 512-filter stages has stride /2; avg pool, fc 1000]

Is learning better networks as simple as stacking more layers?

Simply stacking layers?

[Figure: CIFAR-10 train error (%) and test error (%) vs. iterations (1e4) for 20-layer and 56-layer plain nets]

• Plain nets: stacking 3x3 conv layers…
• 56-layer net has higher training error and test error than 20-layer net

Simply stacking layers?

[Figure: error (%) vs. iterations (1e4). CIFAR-10: plain-20, plain-32, plain-44, plain-56 (20-layer to 56-layer); ImageNet-1000: plain-18, plain-34 (18-layer, 34-layer); solid: test/val, dashed: train]

• “Overly deep” plain nets have higher training error
• A general phenomenon, observed in many datasets

[Figure: a shallower model (18 layers) and a deeper counterpart (34 layers), both built as 7x7 conv, 64, /2; 3x3 conv, 64; 3x3 conv, 128, /2; 3x3 conv, 128; 3x3 conv, 256, /2; 3x3 conv, 256; 3x3 conv, 512, /2; 3x3 conv, 512; fc 1000; the deeper model's “extra” layers are marked]

• Richer solution space
• A deeper model should not have higher training error
• A solution by construction:
  • original layers: copied from a learned shallower model
  • extra layers: set as identity
  • at least the same training error
• Optimization difficulties: solvers cannot find the solution when going deeper…
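The “solution by construction” argument can be checked with toy layers: copy the layers of a shallower model, append extra layers set to the identity, and the deeper model computes exactly the same function, so its training error is exactly the same. The scalar “layers”, the weights, the data, and the squared-error loss below are illustrative assumptions, not the networks from the slides.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def relu(v: float) -> float:
    return max(0.0, v)


def make_layer(w: float, b: float) -> Layer:
    """A toy 1-D 'layer': affine map followed by ReLU."""
    return lambda v: relu(w * v + b)


def identity(v: float) -> float:
    return v


def forward(layers: List[Layer], v: float) -> float:
    for layer in layers:
        v = layer(v)
    return v


def training_error(layers: List[Layer], data: Sequence[Tuple[float, float]]) -> float:
    """Mean squared error over (input, target) pairs."""
    return sum((forward(layers, x) - y) ** 2 for x, y in data) / len(data)


# A "learned" shallower model (weights chosen arbitrarily for illustration).
shallow = [make_layer(1.5, -0.2), make_layer(0.8, 0.1)]

# Deeper counterpart: original layers copied, extra layers set as identity.
deeper = shallow + [identity, identity, identity]

data = [(0.0, 0.1), (1.0, 1.0), (2.0, 2.3), (-1.0, 0.0)]
print(training_error(shallow, data) == training_error(deeper, data))  # True
```

The catch the slide points to is that a solver training the deeper plain net from scratch does not find this solution on its own.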

Deep Residual Learning

• Plain net

[Figure: any two stacked layers: x → weight layer → weight layer → H(x)]

H(x) is any desired mapping; hope the 2 weight layers fit H(x)

• Residual net

[Figure: x passes through two weight layers computing F(x), while an identity shortcut carries x around them; the two are added: H(x) = F(x) + x]

H(x) is any desired mapping; instead of hoping the 2 weight layers fit H(x), hope the 2 weight layers fit F(x), and let H(x) = F(x) + x

• F(x) is a residual mapping w.r.t. identity
• If identity were optimal, easy to set weights as 0
• If optimal mapping is closer to identity, easier to find small fluctuations
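To make the “set weights as 0” point concrete, here is a minimal sketch of a two-layer unit on plain Python vectors, with H(x) = F(x) + x and F(x) = W2 · relu(W1 · x). Biases, batch normalization, and the ReLU that follows the addition in the paper's block are left out, so this is a simplification rather than the exact ResNet building block; the function names are hypothetical. With all-zero weights the residual unit returns x unchanged, while a plain two-layer stack with the same weights returns 0.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_fn(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """F(x): weight layer -> relu -> weight layer (biases omitted)."""
    return matvec(w2, relu(matvec(w1, x)))


def plain_block(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Plain net: the two weight layers must fit H(x) directly."""
    return residual_fn(w1, w2, x)


def residual_block(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Residual net: H(x) = F(x) + x via an identity shortcut."""
    return [f + xi for f, xi in zip(residual_fn(w1, w2, x), x)]


def zeros(n: int) -> Matrix:
    return [[0.0] * n for _ in range(n)]


x = [0.5, -1.0, 2.0]
print(plain_block(zeros(3), zeros(3), x))     # [0.0, 0.0, 0.0]
print(residual_block(zeros(3), zeros(3), x))  # [0.5, -1.0, 2.0]
```

Small nonzero weights then perturb the output around x, which is the “small fluctuations” case in the last bullet.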


Network “Design”

• Keep it simple
• Our basic design (VGG-style)
  • all 3x3 conv (almost)
  • spatial size /2 => # filters x2
• Simple design; just deep!

[Figure: plain net and ResNet side by side, each: 7x7 conv, 64, /2; pool, /2; 3x3 conv, 64; 3x3 conv, 128, /2; 3x3 conv, 128; 3x3 conv, 256, /2; 3x3 conv, 256; 3x3 conv, 512, /2; 3x3 conv, 512; avg pool; fc 1000. The ResNet adds shortcut connections to the same stack.]
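One way to read the rule “spatial size /2 => # filters x2”: a 3x3 conv layer costs H · W · C_in · C_out · 9 multiply-adds, so halving H and W while doubling C_in and C_out leaves that cost unchanged. The sketch below tabulates it for the layers inside each of the four 3x3 stages (where C_in = C_out); the 56x56 size after the stem (a 224x224 input after the /2 conv and /2 pool) is an assumption made for the arithmetic, and the helper names are hypothetical.

```python
def conv3x3_macs(h: int, w: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates of a 3x3 conv with stride 1 and 'same' padding."""
    return h * w * c_in * c_out * 3 * 3


def stage_plan(input_size: int = 56, base_filters: int = 64, num_stages: int = 4):
    """Each new stage halves the spatial size and doubles the filters."""
    plan = []
    size, filters = input_size, base_filters
    for _ in range(num_stages):
        plan.append((size, filters))
        size //= 2
        filters *= 2
    return plan


# Every stage prints the same MAC count for its (non-downsampling) 3x3 convs.
for size, filters in stage_plan():
    macs = conv3x3_macs(size, size, filters, filters)
    print(f"{size:>2}x{size:<2} {filters:>3} filters: {macs:,} MACs per 3x3 conv")
```

The stride-2 conv that opens each stage is excluded here, since its input and output shapes differ.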

CIFAR-10 experiments

[Figure: error (%) vs. iterations (1e4). CIFAR-10 plain nets: plain-20, plain-32, plain-44, plain-56; CIFAR-10 ResNets: ResNet-20, ResNet-32, ResNet-44, ResNet-56, ResNet-110 (up to 110-layer); solid: test, dashed: train]

• Deep ResNets can be trained without difficulties
• Deeper ResNets have lower training error, and also lower test error

ImageNet experiments

[Figure: error (%) vs. iterations (1e4). ImageNet plain nets: plain-18, plain-34; ImageNet ResNets: ResNet-18, ResNet-34 (18-layer, 34-layer); solid: test, dashed: train]

• Deep ResNets can be trained without difficulties
• Deeper ResNets have lower training error, and also lower test error

ImageNet experiments

[Figure: 10-crop testing, top-5 val error (%) for ResNet-34 (7.4), ResNet-50, ResNet-101 (6.1), and ResNet-152 (5.7); annotation: “this model has lower time complexity than VGG-16/19”]

• Deeper ResNets have lower error
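The numbers on this slide are top-5 error under 10-crop testing. A minimal sketch of how such a metric is commonly computed: per-class scores from the crops of one image are averaged, and the image counts as an error if its true label is not among the five highest averaged scores. Crop extraction and the model are out of scope; the toy scores below use 2 crops and 7 classes and are made-up placeholders, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_scores(crop_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over the crops of one image."""
    n = len(crop_scores)
    return [sum(col) / n for col in zip(*crop_scores)]


def in_top_k(scores: Sequence[float], label: int, k: int = 5) -> bool:
    """True if `label` is among the k highest-scoring classes."""
    top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    return label in top


def top_k_error(per_image_crops, labels, k: int = 5) -> float:
    """Percentage of images whose label is not in the top-k averaged scores."""
    misses = sum(
        not in_top_k(average_scores(crops), label, k)
        for crops, label in zip(per_image_crops, labels)
    )
    return 100.0 * misses / len(labels)


# Toy data: image 0 ranks its label (class 2) first; image 1 ranks its label (class 6) last.
crops = [
    [[0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0], [0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0]],
    [[0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0], [0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0]],
]
labels = [2, 6]
print(top_k_error(crops, labels))  # 50.0
```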

Beyond classification

A treasure from ImageNet is in learning features.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.

“Features matter.” (quote [Girshick et al. 2014], the R-CNN paper)

Task | 2nd-place winner | ResNets | Margin (relative)
ImageNet Localization (top-5 error) | 12.0 | 9.0 | 27%
ImageNet Detection (mAP@.5) | 53.6 | 62.1 | 16%
COCO Detection (mAP@.5:.95) | 33.5 | 37.3 | 11%
COCO Segmentation (mAP@.5:.95) | 25.1 | 28.2 | 12%

• Our results are all based on ResNet-101
• Our features are well transferrable

ImageNet Detection: absolute 8.5% better!
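For the three mAP rows of the table (higher is better), the relative margin can be reproduced as (ResNets - 2nd place) / 2nd place. A minimal sketch using the table's own numbers; the localization row measures error (lower is better) and is left out of this sketch, and `relative_margin` is a hypothetical helper name.

```python
# (task, 2nd-place winner, ResNets) taken from the table; higher mAP is better.
MAP_ROWS = [
    ("ImageNet Detection (mAP@.5)", 53.6, 62.1),
    ("COCO Detection (mAP@.5:.95)", 33.5, 37.3),
    ("COCO Segmentation (mAP@.5:.95)", 25.1, 28.2),
]


def relative_margin(second: float, ours: float) -> float:
    """Relative improvement in percent for a higher-is-better metric."""
    return 100.0 * (ours - second) / second


for task, second, ours in MAP_ROWS:
    print(f"{task}: +{ours - second:.1f} absolute, "
          f"{relative_margin(second, ours):.0f}% relative")
```

The detection row gives +8.5 absolute and 16% relative, matching both the table and the “absolute 8.5% better” note.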

Object Detection (brief)

• Simply “Faster R-CNN + ResNet”

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.

[Figure: Faster R-CNN pipeline: image → feature map → Region Proposal Net → proposals → RoI pooling on the feature map → classifier]
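RoI pooling, one stage of this pipeline, turns each variable-sized proposal on the shared feature map into a fixed-size grid by max-pooling over sub-windows. Below is a simplified single-channel sketch, assuming the proposal is already given in integer feature-map coordinates; real implementations also handle the image-to-feature-map scaling, many channels, and batching. The function name and the floor/ceil bin-splitting rule are illustrative choices, not the Faster R-CNN code.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
RoI = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive, feature-map coords


def roi_max_pool(feature: Grid, roi: RoI, out_h: int = 2, out_w: int = 2) -> Grid:
    """Max-pool the region `roi` of a 2-D feature map into out_h x out_w bins."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # floor for the start and ceil for the end keep every bin non-empty;
        # neighbouring bins may overlap by one cell when the size does not divide.
        ys = y0 + math.floor(i * roi_h / out_h)
        ye = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * roi_w / out_w)
            xe = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(roi_max_pool(fm, (1, 1, 3, 3)))  # [[11, 12], [15, 16]]
```

Because every proposal comes out as the same out_h x out_w grid, a single classifier head can score proposals of any size.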

Faster R-CNN baseline | mAP@.5 | mAP@.5:.95
VGG-16 | 41.5 | 21.5
ResNet-101 | 48.4 | 27.2

COCO detection results (ResNet has 28% relative gain)
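The two columns of this table differ in how strictly a detection must overlap a ground-truth box. A detection counts as correct when its intersection-over-union (IoU) with the box reaches the threshold: 0.5 for mAP@.5, while COCO's mAP@.5:.95 averages AP over the ten thresholds 0.50, 0.55, ..., 0.95. The sketch below only shows the IoU test and that threshold set; computing AP itself (ranking detections and integrating precision/recall) is omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x1 > x0, y1 > y0


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# mAP@.5 uses one threshold; mAP@.5:.95 averages over these ten.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_at(pred: Box, gt: Box, thresholds: Sequence[float]) -> List[float]:
    """The IoU thresholds at which this prediction would count as a match."""
    overlap = iou(pred, gt)
    return [t for t in thresholds if overlap >= t]


gt = (0.0, 0.0, 10.0, 10.0)
pred = (1.0, 1.0, 9.0, 9.0)  # IoU = 64 / 100 = 0.64
print(matched_at(pred, gt, COCO_THRESHOLDS))  # [0.5, 0.55, 0.6]
```

A box like this one is a hit at IoU 0.5 but misses at 7 of the 10 stricter thresholds, which is why mAP@.5:.95 numbers run much lower than mAP@.5.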

Our results on MS COCO


*the original image is from the COCO dataset

Results on real video. Model trained on MS COCO w/ 80 categories. (frame-by-frame; no temporal processing)


this video is available online: https://youtu.be/WZmSMkK9VuA

More Visual Recognition Tasks

ResNets lead on these benchmarks (incomplete list):
• ImageNet classification, detection, localization
• MS COCO detection, segmentation
• PASCAL VOC detection, segmentation
• VQA challenge 2016
• Human pose estimation [Newell et al. 2016]
• Depth estimation [Laina et al. 2016]
• Segment proposal [Pinheiro et al. 2016]
• …

[Figures: PASCAL detection leaderboard; PASCAL segmentation leaderboard (ResNet-101)]

Potential Applications

ResNets have shown outstanding or promising results on:

Visual Recognition

Image Generation (Pixel RNN, Neural Art, etc.)

Natural Language Processing (Very deep CNN)

Speech Recognition (preliminary results)

Advertising, user prediction (preliminary results)

Conclusions

• Deep Residual Networks:
  • Easy to train
  • Simply gain accuracy from depth
  • Well transferrable
• Follow-up [He et al. arXiv 2016]
  • 200 layers on ImageNet, 1000 layers on CIFAR

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Identity Mappings in Deep Residual Networks”. arXiv 2016.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.

Resources

• Models and Code
  • Our ImageNet models in Caffe: https://github.com/KaimingHe/deep-residual-networks
• Many available implementations: (list in https://github.com/KaimingHe/deep-residual-networks)
  • Facebook AI Research's Torch ResNet: https://github.com/facebook/fb.resnet.torch
  • Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  • Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
  • Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
  • Torch, MNIST, 100 layers: blog, code
  • A winning entry in Kaggle's right whale recognition challenge: blog, code
  • Neon, Place2 (mini), 40 layers: blog, code
  • …
