Deep neural networks
June 1st, 2017
Yong Jae Lee, UC Davis
Many slides from Rob Fergus, Svetlana Lazebnik, Jia-Bin Huang, Derek Hoiem
Announcements
• Post questions on Piazza for the review session (6/8 lecture)
Outline
• Deep Neural Networks
• Convolutional Neural Networks (CNNs)
Traditional Image Categorization: Training phase
Training Images → Image Features → Classifier Training → Trained Classifier
(Training Labels supervise the classifier training step)
Traditional Image Categorization: Testing phase
Test Image → Image Features → Trained Classifier → Prediction: “Outdoor”
Features have been key…
SIFT [Lowe IJCV 04], HOG [Dalal and Triggs CVPR 05]
SPM [Lazebnik et al. CVPR 06], DPM [Felzenszwalb et al. PAMI 10]
Color Descriptor [Van De Sande et al. PAMI 10]
All of these are hand-crafted.
What about learning the features?
• Learn a feature hierarchy all the way from pixels to classifier
• Each layer extracts features from the output of the previous layer
• Layers have (nearly) the same structure
• Train all layers jointly
Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
Learning Feature Hierarchy
Goal: learn useful higher-level features from images.
Feature representation: Pixels (input data) → 1st layer: “Edges” → 2nd layer: “Object parts” → 3rd layer: “Objects”
Lee et al., ICML 2009; CACM 2011
Slide: Rob Fergus
Learning Feature Hierarchy
• Better performance
• Other domains (unclear how to hand-engineer): Kinect, video, multi-spectral
• Feature computation time: dozens of features now regularly used; getting prohibitive for large datasets (10s of sec/image)
Slide: R. Fergus
“Shallow” vs. “deep” architectures
Traditional recognition (“shallow” architecture): Image/Video Pixels → Hand-designed feature extraction → Trainable classifier → Object Class
Deep learning (“deep” architecture): Image/Video Pixels → Layer 1 → … → Layer N → Simple classifier → Object Class
Biological neuron and Perceptrons
A biological neuron vs. an artificial neuron (Perceptron): a linear classifier
Simple, Complex, and Hyper-complex cells
David H. Hubel and Torsten Wiesel
David Hubel's Eye, Brain, and Vision
Suggested a hierarchy of feature detectors in the visual cortex, with higher level features responding to patterns of activation in lower level cells, and propagating activation upwards to still higher level cells.
Hubel/Wiesel Architecture and Multi-layer Neural Network
Hubel and Wiesel’s architecture; a multi-layer neural network is a non-linear classifier
Neuron: Linear Perceptron
• Inputs are feature values
• Each feature has a weight
• The sum is the activation
• If the activation is:
  – Positive, output +1
  – Negative, output −1 (see the sketch below)
Slide credit: Pieter Abbeel and Dan Klein
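A minimal NumPy sketch of this decision rule; the weights and feature values below are hypothetical:

```python
import numpy as np

def perceptron_output(w, f):
    """Linear perceptron: the activation is the weighted sum of the feature
    values; output +1 if it is non-negative, -1 otherwise."""
    activation = np.dot(w, f)          # sum_i w_i * f_i
    return 1 if activation >= 0 else -1

# Hypothetical weights and feature values.
w = np.array([0.5, -1.2, 0.3])
f = np.array([1.0, 0.2, 2.0])
print(perceptron_output(w, f))         # activation = 0.86 -> +1
```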
Two-layer perceptron network
Slide credit: Pieter Abbeel and Dan Klein
Learning w
• Training examples
• Objective: a misclassification loss
• Procedure: gradient descent / hill climbing
Slide credit: Pieter Abbeel and Dan Klein
Hill climbing
• Simple, general idea:
  – Start wherever
  – Repeat: move to the best neighboring state
  – If no neighbors are better than the current state, quit
  – Neighbors = small perturbations of w (sketched below)
• What’s bad? Is the result optimal?
Slide credit: Pieter Abbeel and Dan Klein
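A minimal sketch of hill climbing on a generic loss over w; the perturbation scale, neighbor count, and iteration cap are illustrative assumptions, not values from the slides:

```python
import numpy as np

def hill_climb(loss, w, step=0.01, n_neighbors=20, max_iters=1000, seed=0):
    """Start wherever; repeatedly move to the best neighboring state
    (a small random perturbation of w); quit at a local optimum."""
    rng = np.random.default_rng(seed)
    best = loss(w)
    for _ in range(max_iters):
        neighbors = [w + step * rng.standard_normal(w.shape)
                     for _ in range(n_neighbors)]
        losses = [loss(v) for v in neighbors]
        i = int(np.argmin(losses))
        if losses[i] >= best:   # no neighbor is better than current: quit
            break               # note: this may be a bad local optimum
        w, best = neighbors[i], losses[i]
    return w
```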
Two-layer neural network
Slide credit: Pieter Abbeel and Dan Klein
Neural network properties
• Theorem (universal function approximators): a two-layer network with a sufficient number of neurons can approximate any continuous function to any desired accuracy (a toy forward pass is sketched below)
• Practical considerations:
  – Can be seen as learning the features
  – A large number of neurons brings a danger of overfitting
  – The hill-climbing procedure can get stuck in bad local optima
Slide credit: Pieter Abbeel and Dan Klein. [Cybenko, Approximation by Superpositions of a Sigmoidal Function, 1989]
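A toy forward pass for such a two-layer network (sigmoid hidden units feeding a linear output); the layer sizes and weight shapes are arbitrary assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_layer_forward(x, W1, b1, W2, b2):
    """Hidden layer of sigmoid units, then a linear read-out. With enough
    hidden units this family can approximate any continuous function."""
    h = sigmoid(W1 @ x + b1)    # hidden-layer activations
    return W2 @ h + b2          # output score(s)
```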
Multi-layer Neural Network
• A non-linear classifier
• Training: find network weights w to minimize the error between true training labels and estimated labels
• Minimization can be done by gradient descent, provided f is differentiable
• This training method is called back-propagation (a toy training step is sketched below)
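Continuing the sketch above, one back-propagation step on a squared-error loss, with gradients obtained by the chain rule (a toy illustration under assumed shapes, not the lecture's exact formulation):

```python
def backprop_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent update for the two-layer network above,
    minimizing E = 0.5 * ||pred - y||^2 via the chain rule."""
    h = sigmoid(W1 @ x + b1)              # forward pass
    pred = W2 @ h + b2
    err = pred - y                        # dE/dpred
    dW2 = np.outer(err, h)                # gradients for the output layer
    db2 = err
    dh = W2.T @ err                       # propagate the error backwards
    dz = dh * h * (1.0 - h)               # through the sigmoid: sigma' = h(1-h)
    dW1 = np.outer(dz, x)                 # gradients for the hidden layer
    db1 = dz
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```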
Outline
• Deep Neural Networks
• Convolutional Neural Networks (CNNs)
Convolutional Neural Networks (CNN, ConvNet, DCN)
• A CNN is a multi-layer neural network with:
  – Local connectivity: neurons in a layer are connected only to a small region of the layer before it
  – Weight parameters shared across spatial positions: learning shift-invariant filter kernels (a parameter-count comparison follows below)
Image credit: A. Karpathy
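A back-of-the-envelope comparison of why local connectivity and weight sharing matter; the layer sizes below are hypothetical:

```python
# Hypothetical sizes: a 224x224 RGB input, 64 output channels, 3x3 filters.
H, W, C_in, C_out, k = 224, 224, 3, 64, 3

fully_connected = (H * W * C_in) * (H * W * C_out)  # every unit sees every pixel
convolutional = (k * k * C_in) * C_out              # one small filter per output
                                                    # channel, shared across positions
print(f"fully connected: {fully_connected:,} weights")  # ~483 billion
print(f"convolutional:   {convolutional:,} weights")    # 1,728
```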
Neocognitron [Fukushima, Biological Cybernetics 1980]
Deformation-resistant recognition: S-cells (simple) extract local features; C-cells (complex) allow for positional errors.
LeNet [LeCun et al. 1998]
Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998] (LeNet-1 dates from 1993)
• Stack multiple stages of feature extractors
• Higher stages compute more global, more invariant features
• A classification layer at the end
Convolutional Neural Networks
Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Feature maps
What is a Convolution?
• A weighted moving sum (sketched below)
[Figure: a learned filter slides over the input to produce a feature (activation) map]
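A minimal NumPy sketch of that weighted moving sum (strictly a cross-correlation, since CNNs typically skip the kernel flip); the example filter is a hypothetical vertical-edge detector:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and take a
    weighted sum (dot product) at every position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

image = np.random.rand(8, 8)
vertical_edge = np.array([[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]])
print(conv2d(image, vertical_edge).shape)   # (6, 6) activation map
```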
Why Convolution?
• Few parameters (filter weights)
• Dependencies are local
• Translation invariance
[Figure: local receptive fields mapping the input to a feature map]
Convolutional Neural Networks: the non-linearity stage uses a Rectified Linear Unit (ReLU).
slide credit: S. Lazebnik
Non-Linearity
• Per-element (independent)
• Options (all three sketched below):
  – Tanh
  – Sigmoid: 1/(1+exp(-x))
  – Rectified linear unit (ReLU): max(0, x)
    • Makes learning faster
    • Simplifies backpropagation
    • Avoids saturation issues → the preferred option
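The three options, sketched element-wise in NumPy:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                 # saturates at -1 and +1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # saturates at 0 and 1

def relu(x):
    return np.maximum(0.0, x)         # no saturation for x > 0: the preferred option
```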
Convolutional Neural Networks: the spatial pooling stage
Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps
Max pooling is a non-linear down-sampling that provides translation invariance.
Spatial Pooling
• Average or max (both sketched below)
• Non-overlapping / overlapping regions
• Role of pooling:
  – Invariance to small transformations
  – Larger receptive fields (see more of the input)
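A minimal sketch of non-overlapping max and average pooling over 2×2 windows:

```python
import numpy as np

def pool2d(feature_map, size=2, reduce=np.max):
    """Non-overlapping pooling: partition the map into size x size windows
    and reduce each to one value (np.max or np.mean), shrinking the spatial
    extent and giving invariance to small translations."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    blocks = feature_map[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    return reduce(blocks, axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
print(pool2d(fmap, reduce=np.max))    # max pooling   -> 2x2 output
print(pool2d(fmap, reduce=np.mean))   # average pooling
```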
Engineered vs. learned features
Engineered: Image → Feature extraction → Pooling → Classifier → Label
Learned: Image → Convolution/pool (×5) → Dense (×3) → Label
Convolutional filters are trained in a supervised manner by back-propagating classification error.
Compare: SIFT Descriptor [Lowe IJCV 2004]
Image Pixels → Apply oriented filters → Spatial pool (sum) → Normalize to unit length → Feature Vector
Compare: Spatial Pyramid Matching [Lazebnik, Schmid, Ponce CVPR 2006]
SIFT features → Filter with Visual Words → Take max VW response → Multi-scale spatial pool (sum) → Global image descriptor
Previous ConvNet successes
• Handwritten text/digits:
  – MNIST (0.17% error [Ciresan et al. 2011])
  – Arabic & Chinese [Ciresan et al. 2012]
• Simpler recognition benchmarks:
  – CIFAR-10 (9.3% error [Wan et al. 2013])
  – Traffic sign recognition: 0.56% error vs. 1.16% for humans [Ciresan et al. 2011]
ImageNet Challenge 2012 [Deng et al. CVPR 2009]
• ~14 million labeled images, 20k classes
• Images gathered from the Internet
• Human labels via Amazon Mechanical Turk
• ImageNet Challenge: 1.2 million training images, 1000 classes
A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
AlexNet
Similar framework to LeCun ’98, but:
• Bigger model (7 hidden layers, 650,000 units, 60,000,000 parameters)
• More data (10^6 vs. 10^3 images)
• GPU implementation (50× speedup over CPU); trained on two GPUs for a week
AlexNet for image classification
Fixed input size: 224×224×3; output: a class label (e.g., “car”)
ImageNet Classification Challenge
http://image-net.org/challenges/talks/2016/ILSVRC2016_10_09_clsloc.pdf
Industry Deployment
• Used at Facebook, Google, Microsoft
• Startups
• Image recognition, speech recognition, …
• Fast at test time
Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14
Visualizing CNNs
• What input pattern originally caused a given activation in the feature maps?
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]
Layer 1
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]
Layer 2
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]
Layer 3
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]
Layers 4 and 5
Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]
Beyond classification
• Detection
• Segmentation
• Regression
• Pose estimation
• Matching patches
• Synthesis
…and many more
R-CNN: Regions with CNN features
• CNN trained on ImageNet classification
• Fine-tune the CNN on PASCAL
R-CNN [Girshick et al. CVPR 2014]
Labeling Pixels: Semantic Labels
Fully Convolutional Networks for Semantic Segmentation [Long et al. CVPR 2015]
Labeling Pixels: Edge Detection
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection [Bertasius et al. CVPR 2015]
CNN for Regression
DeepPose [Toshev and Szegedy CVPR 2014]
CNN as a Similarity Measure for Matching
FaceNet [Schroff et al. 2015] Stereo matching [Zbontar and LeCun CVPR 2015] Compare patch [Zagoruyko and Komodakis 2015]
Match ground and aerial images [Lin et al. CVPR 2015] FlowNet [Fischer et al 2015]
CNN for Image Generation
Learning to Generate Chairs with Convolutional Neural Networks [Dosovitskiy et al. CVPR 2015]
Chair Morphing
Learning to Generate Chairs with Convolutional Neural Networks [Dosovitskiy et al. CVPR 2015]
Transfer Learning
• Improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned
• Weight initialization for CNNs (a hedged sketch below)
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks [Oquab et al. CVPR 2014]
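A hedged sketch of initializing a new task's network from pretrained weights; the file name, layer name, and class count below are all hypothetical:

```python
import numpy as np

# Load weights learned on the source task (file name is hypothetical).
pretrained = np.load("imagenet_pretrained.npz")
weights = {name: pretrained[name] for name in pretrained.files}

# Replace the task-specific classification layer with a fresh random one
# ("fc8" is an assumed layer name; 20 classes, e.g. the PASCAL VOC classes).
feat_dim = weights["fc8"].shape[1]
weights["fc8"] = 0.01 * np.random.randn(20, feat_dim)

# Fine-tune all layers on the new task's data, typically with a smaller
# learning rate for the transferred layers than for the new one.
```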
Deep learning libraries
• TensorFlow
• Caffe
• Torch
• MatConvNet
Fooling CNNs
Intriguing properties of neural networks [Szegedy et al., ICLR 2014]
What is going on?
Gradient ascent on the loss with respect to the input: x ← x + α ∂E/∂x (sketched below)
http://karpathy.github.io/2015/03/30/breaking-convnets/
Explaining and Harnessing Adversarial Examples [Goodfellow et al., ICLR 2015]
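A sketch of that update in NumPy, given the gradient of the loss with respect to the input (how that gradient is computed is framework-specific and omitted here); the step sizes are illustrative:

```python
import numpy as np

def adversarial_step(x, grad_E_wrt_x, alpha=0.01):
    """Ascend the loss E w.r.t. the *input*: x <- x + alpha * dE/dx.
    A small, often imperceptible perturbation that can flip the prediction."""
    return x + alpha * grad_E_wrt_x

def fgsm_step(x, grad_E_wrt_x, eps=0.007):
    """Goodfellow et al.'s fast gradient sign method keeps only the sign."""
    return x + eps * np.sign(grad_E_wrt_x)
```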
Questions?
See you Tuesday!