
Convolutional Neural Networks

Srikumar Ramalingam

Reference

Many of the slides are prepared using the following resources:

• Marc'Aurelio Ranzato Deep learning tutorial in CVPR 2014


Example: 300x300 image, 40K hidden units

~4B parameters!!!

- Spatial correlation is local
- Waste of resources + we don't have enough training samples anyway

Fully Connected Layer

Slide Credit: Marc'Aurelio Ranzato

Fully connected layers do not take into account the spatial structure of images: they treat input pixels that are far apart and pixels that are close together on exactly the same footing.


Example: 300x300 image, 40K hidden units, filter size: 10x10

4M parameters

Note: This parameterization is good when input image is registered (e.g., face recognition).

Locally Connected Layer

Slide Credit: Marc'Aurelio Ranzato
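The two parameter counts above can be checked directly (biases omitted, as on the slides):

```python
# Fully connected: every one of the 40K hidden units sees all 300x300 pixels.
fc_params = 300 * 300 * 40_000
print(f"{fc_params:,}")  # 3,600,000,000 -- the ~4B figure on the slide

# Locally connected: each hidden unit sees only a 10x10 window,
# with its own private (unshared) weights.
local_params = 40_000 * 10 * 10
print(f"{local_params:,}")  # 4,000,000 -- the 4M figure on the slide
```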


Share the same parameters across different locations (assuming the input is stationary): convolutions with learned kernels.

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato

(Figure: the input image convolved with the 3x3 kernel [-1 0 1; -1 0 1; -1 0 1], a vertical-edge detector, produces an edge map.)


Local Receptive Fields in CNNs

• Local receptive fields – each neuron in the hidden layer will be connected to a small window of input neurons, say 5x5 region, corresponding to 25 input neurons. This input area of a neuron is called its receptive field.

• Each hidden neuron can be thought of as analyzing its local receptive field.

Local Receptive Fields in CNNs

• We slide the local receptive field over by one pixel to the right (i.e., by one neuron), to connect to a second hidden neuron.

• If we have a 28x28 image and 5x5 receptive field, we will have 24x24 hidden neurons.

Stride Length

• Sometimes we slide the local receptive field over by more than one pixel to the right (or down). In that case the stride length could be 2 or more.

• This will lead to fewer hidden neurons. For example, in the case of a 28x28 image with 5x5 receptive field and stride length 2, we will have just 12x12 hidden neurons.
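The two counts above (24x24 at stride 1, 12x12 at stride 2) follow from one formula; a small sketch, with an illustrative function name:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of hidden neurons along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# 28x28 image, 5x5 receptive field, stride 1 -> 24x24 hidden neurons
print(conv_output_size(28, 5, stride=1))  # 24

# Same setup with stride 2 -> 12x12 hidden neurons
print(conv_output_size(28, 5, stride=2))  # 12
```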

Shared Weights and Biases

• 𝜎 is the activation function (e.g., the sigmoid), 𝑏 is the shared bias, 𝑤_{𝑙,𝑚} are the shared weights, and 𝑎_{𝑥,𝑦} is the input activation at location (𝑥, 𝑦).

• Note that the weights and the bias terms are the same for all the hidden neurons in one feature map.

• All the neurons in the first hidden layer detect exactly the same feature, just at different locations in the image, e.g, cats.

Output of the (𝑗, 𝑘)-th hidden neuron:

$$\sigma\!\left(b + \sum_{l=0}^{4} \sum_{m=0}^{4} w_{l,m}\, a_{j+l,\, k+m}\right)$$
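The shared-weight computation above translates directly into code. A minimal NumPy sketch for one 5x5 feature map (the function name and the sigmoid choice are illustrative):

```python
import numpy as np

def feature_map(a, w, b):
    """One feature map: a 5x5 filter with shared weights w and shared bias b.

    a: (H, W) input activations, w: (5, 5) shared weights, b: scalar shared bias.
    Returns the (H-4, W-4) array of hidden-neuron activations.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H, W = a.shape
    out = np.empty((H - 4, W - 4))
    for j in range(H - 4):
        for k in range(W - 4):
            # sigma(b + sum over l, m of w[l, m] * a[j + l, k + m])
            out[j, k] = sigmoid(b + np.sum(w * a[j:j + 5, k:k + 5]))
    return out

# A 28x28 input gives a 24x24 feature map, as in the slides.
print(feature_map(np.zeros((28, 28)), np.zeros((5, 5)), 0.0).shape)  # (24, 24)
```

The same `w` and `b` are used at every (j, k), which is exactly what "shared weights and biases" means.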

Feature maps

• We call this map from the input layer to the hidden layer a feature map.

• The weights are called shared weights and the biases shared biases; they are shared only within one feature map. Different feature maps have different weights and biases. In this example we use 20 different filters, each of dimension 5x5x10.

(Figure: a 28x28 input with 10 channels is mapped to 20 feature maps of size 24x24. Conv kernel = 5x5, input channels = 10, output channels = 20, stride = 1.)

(Figure: an input of size D x D with M channels, convolved with N kernels of size K x K, produces N output channels of size (D − K + 1) x (D − K + 1). Assumption: zero padding (P = 0) and stride 1.)

With padding P and Stride S

• Input : D x D x M – there are M input channels (M@D x D)

• Let us assume that we have kernels of size K x K

• Let us assume that we have N output channels. What are the dimensions of the output?

N @ ( ((D − K + 2P)/S + 1) x ((D − K + 2P)/S + 1) )

Example in 1D: input D=5 with padding P=1 and filter size K=3. With stride S=1 the output has (5 − 3 + 2)/1 + 1 = 5 entries; with stride S=2 it has (5 − 3 + 2)/2 + 1 = 3 entries.
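The general formula above can be checked on the 1D example (function name is illustrative):

```python
def output_size(D, K, P, S):
    # ((D - K + 2P) / S) + 1 outputs along each dimension
    return (D - K + 2 * P) // S + 1

# The 1D example: D=5, K=3, P=1
print(output_size(5, 3, 1, 1))  # stride 1 -> 5 outputs
print(output_size(5, 3, 1, 2))  # stride 2 -> 3 outputs
```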

Pooling Layers

• A pooling layer takes each feature map output from the convolutional layer and prepares a condensed feature map.

• For instance, each unit in the pooling layer may summarize a region of neurons in the previous layer.

• As a concrete example, one common procedure for pooling is known as max-pooling, e.g., a pooling unit can output the maximum activation in the 2x2 input region.
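The max-pooling operation described above can be sketched in a few lines of NumPy (the reshape trick assumes non-overlapping 2x2 windows, i.e., stride 2):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2 on one (H, W) feature map."""
    H, W = fmap.shape
    # Crop to even dimensions, split into 2x2 blocks, take the max of each block.
    return fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [0, 0, 1, 0],
              [0, 9, 0, 0]])
print(max_pool_2x2(x))
# [[4 8]
#  [9 1]]
```

Each output unit reports the maximum activation in its 2x2 input region, so a 24x24 feature map condenses to 12x12.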

Pooling Layers

• L2 pooling – the square root of the sum of squares of the activations in the 2x2 region. Several other pooling options exist (e.g., average pooling).

(Figure: stride S = 2, padding P = 1, input size D = 5, filter size K = 3.)

Why use padding?

• If there is no padding, then the size of the output would reduce by a small amount after each CONV, and the information at the borders would be “washed away” too quickly.

(Figure: non-overlapping K x K pooling reduces a D x D x M input to (D/K) x (D/K) x M.)

With overlap?

• Stride S and Filter size K

• Output dimension?

M@( ((D-K)/S+1) x ((D-K)/S + 1) )

MNIST data

• Each grayscale image is of size 28x28.

• 60,000 training images and 10,000 test images

• 10 possible labels (0,1,2,3,4,5,6,7,8,9)

Performance on MNIST is near perfect

Using several ideas: convolutions, pooling, GPUs to do far more training than with shallow networks, algorithmic expansion of the training data, dropout, etc.

33 out of 10,000 test images are misclassified. (Figure: top right shows the correct label; bottom right shows the misclassified prediction.)

Thank You