Deep Nets for image classification - Colorado School of Mines

Transcript of Deep Nets for image classification - Colorado School of Mines

Page 1: Deep Nets for image classification - Colorado School of Mines

Image Classification with Deep Neural Networks

Greg Schoeninger

Page 2: Deep Nets for image classification - Colorado School of Mines

The Problem (General)

• Transform raw input (pixels) into higher-level representations.

• Edges, local shapes and colors, object parts, etc.

• How do we as humans recognize objects in scenes?

Page 3: Deep Nets for image classification - Colorado School of Mines

The Problem (simplified)

• Train a neural network to recognize 4 classes of images.

• STL‐10 Data Set (Stanford)

• 100,000 Unlabeled Image Patches

• 5,000 Labeled Images

Page 4: Deep Nets for image classification - Colorado School of Mines

Complexity

• Natural images have a high dimensionality.

• Varying position, orientation, lighting, etc. (Factors of variation)

• Many different features could be considered (edges, colors, SIFT, Gabor filters, etc.).

Page 5: Deep Nets for image classification - Colorado School of Mines

Deep Architectures

• Learn feature hierarchies from lower level features to higher level ones.

• Do not rely on hand engineered features.

• Inspired by the depth of the brain.

• Natural images are “stationary” ‐ features learned in one part can be applied to others.

• Invariant to small changes in input (translation, rotation, etc.)

Page 6: Deep Nets for image classification - Colorado School of Mines

Deep Architectures Continued

• # of variations in input greater than # of training examples.

• We now have sufficient computational power.

• Unsupervised learning performed locally at each level.

• Minimal supervised learning at the end.

• Learn good properties and representations of images, then learn what combinations of these properties are called (labels).

Page 7: Deep Nets for image classification - Colorado School of Mines

Solution (Overview)

• Self taught learning with a sparse auto encoder.

• Convolution

• Mean pooling of features

• Softmax regression of pooled features for classification.

• Unsupervised Feature Learning and Deep Learning ‐ Stanford

Page 8: Deep Nets for image classification - Colorado School of Mines

Simple Neuron

Page 9: Deep Nets for image classification - Colorado School of Mines

Neural Network

• Hook up neurons so that output of a neuron goes into input of another.

• 3 input units, 3 hidden units, 1 output unit.

• Notation – (x, a, w, b, l, h(x))

Page 10: Deep Nets for image classification - Colorado School of Mines

Neural Networks Activations

• x – Input

• a – Activations

• z – Total weighted sum of inputs and bias

• W – Parameters or weights associated with the connections between unit j in layer l, and unit i in layer l + 1.

• h(x) – Hypothesis, real number output. 

Page 11: Deep Nets for image classification - Colorado School of Mines

Forward Propagation

• a(1) = x
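
The slide lists only a(1) = x; below is a minimal NumPy sketch of the rest of the forward pass for the 3-3-1 network above. The sigmoid activation and the weight shapes are assumptions consistent with the notation on the previous slides, and the talk itself used MATLAB.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation for the 3-3-1 network: a(1) = x, then each
    layer computes z = W*a + b and a = f(z)."""
    a1 = x                    # a(1) = x
    z2 = W1 @ a1 + b1         # total weighted input to the hidden layer
    a2 = sigmoid(z2)          # hidden activations a(2)
    z3 = W2 @ a2 + b2         # weighted input to the output unit
    h = sigmoid(z3)           # hypothesis h(x)
    return a1, a2, h
```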

Page 12: Deep Nets for image classification - Colorado School of Mines

Bias Units

• Bias unit – enables activation function to be shifted as well as scaled.

Page 13: Deep Nets for image classification - Colorado School of Mines

Gradient Descent and Back Propagation

• Batch Gradient Descent.

• Try to minimize cost function J(W, b; x, y)

Page 14: Deep Nets for image classification - Colorado School of Mines

Gradient Descent

• Initialize weights (W) and biases (b) to small random values near zero.

• alpha = learning rate.

• Back propagation is an efficient way to calculate the partial derivatives of J(W, b).

Page 15: Deep Nets for image classification - Colorado School of Mines

Back Propagation

• Given a training example (x,y), run forward propagation to compute all the activations, including final hypothesis.

• Then for each neuron (i) in layer (L), compute an error term delta that measures how responsible that node was for any errors in the output.

Page 16: Deep Nets for image classification - Colorado School of Mines

Back Propagation

• Perform a feed-forward pass to calculate the activations.

• For each output unit in the final output layer set the delta term. This is just the error between the output nodes and the true expected values.

• Work backwards from the output layer to the first hidden layer and set the delta terms. Weighted average of error terms that use a(L) as an input.

• Use these delta terms to calculate the partial derivatives of weights and biases.
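
A sketch of those steps for the same small network, reusing the forward helper above and assuming a squared-error cost with sigmoid units (variable names are illustrative, not from the slides).

```python
def backprop(x, y, W1, b1, W2, b2):
    """Back propagation: compute the partial derivatives of J(W, b; x, y)."""
    a1, a2, h = forward(x, W1, b1, W2, b2)
    # Output-layer error term: difference from the target, scaled by the
    # sigmoid derivative f'(z) = f(z) * (1 - f(z)).
    delta3 = -(y - h) * h * (1 - h)
    # Hidden-layer error terms: errors of the nodes that use a(2) as input,
    # weighted by the connecting weights, times f'(z(2)).
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)
    # Partial derivatives with respect to the weights and biases.
    grad_W1 = np.outer(delta2, a1)
    grad_b1 = delta2
    grad_W2 = np.outer(delta3, a2)
    grad_b2 = delta3
    return grad_W1, grad_b1, grad_W2, grad_b2
```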

Page 17: Deep Nets for image classification - Colorado School of Mines

Gradient Descent and Back Prop

• Set initial weights and biases to random values close to 0.

• Go through the training examples and use back propagation to compute the error terms (delta)

• Set the change in the weights and biases by adding their respective delta terms.

• Update the parameters, minimizing the error
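
Putting the forward and backward sketches together, one batch gradient descent update might look like this (the loop structure and the alpha value are assumptions):

```python
def gradient_descent_step(examples, W1, b1, W2, b2, alpha=0.1):
    """One batch update: average the back-prop gradients over all training
    examples, then move the parameters downhill to reduce the error."""
    m = len(examples)
    sums = [np.zeros_like(p) for p in (W1, b1, W2, b2)]
    for x, y in examples:
        for s, g in zip(sums, backprop(x, y, W1, b1, W2, b2)):
            s += g
    gW1, gb1, gW2, gb2 = (s / m for s in sums)
    return W1 - alpha * gW1, b1 - alpha * gb1, W2 - alpha * gW2, b2 - alpha * gb2
```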

Page 18: Deep Nets for image classification - Colorado School of Mines

Auto encoders

• Unsupervised training

• Set target values equal to the inputs (identity function)

Page 19: Deep Nets for image classification - Colorado School of Mines

Auto encoders with sparsity constraint

• Make sure that the average activation of each hidden unit over the training set is constrained to a target sparsity value rho.

• Add an extra penalty to the overall cost function, based on the KL divergence between Bernoulli random variables.
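
A sketch of that penalty in NumPy: rho_hat is the vector of average hidden-unit activations over the training set; rho = 0.035 matches the sparsity parameter quoted later in the talk, while the beta weight here is an assumption.

```python
def sparsity_penalty(rho_hat, rho=0.035, beta=3.0):
    """KL divergence between Bernoulli variables with means rho and rho_hat,
    summed over the hidden units and added to the overall cost J(W, b)."""
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * np.sum(kl)
```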

Page 20: Deep Nets for image classification - Colorado School of Mines

Auto encoder continued

• Auto encoders learn what input image would most likely cause an activation.

• Each hidden unit is now learning to look for certain features.

• Example of training an auto encoder on 10x10 whitened image patches, with 100 hidden units.

Page 21: Deep Nets for image classification - Colorado School of Mines

Linear Decoders (Sparse Auto Encoder Variation)

• Some neurons use a different activation function.

• The sigmoid activation function constrains the range of the outputs (and therefore the inputs an auto encoder can reconstruct) to [0,1].

• Linear activation function: Set a(3) = z(3) instead of a(3) = f(z(3)) for the output layer. (Identity function)

• Output is now linear function of hidden unit activations.

Page 22: Deep Nets for image classification - Colorado School of Mines

Simplified Gradients and Back propagation

• New activation function, so the gradients change for output units.

• y = x is the desired output.

• f(z) = z

• f’(z) = 1

• Hidden layer still uses the sigmoid activation, f’(z(2))
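
A sketch of the resulting error terms, following the notation of the earlier back propagation sketch (not taken from the slides):

```python
def output_deltas_linear(y, a3, W2, a2):
    """Error terms when the output layer is linear: f(z) = z, so f'(z) = 1
    and the sigmoid-derivative factor drops out of delta(3)."""
    delta3 = -(y - a3)                        # a3 = z3 for a linear decoder
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)  # hidden layer is still sigmoid
    return delta3, delta2
```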

Page 23: Deep Nets for image classification - Colorado School of Mines

ZCA Whitening

• Goal is to make the input data less redundant.

– Pixels are highly correlated with nearby pixels, and weakly correlated with faraway pixels. This is similar to how we think the biological eye processes images.

– Adjacent pixels tend to have similar values, so it is inefficient to transmit every single pixel separately.

• We are not interested in the overall brightness, so subtract the mean value for normalization.

Page 24: Deep Nets for image classification - Colorado School of Mines

PCA and ZCA Whitening

• Subtract the mean value of all patches from the input patch.

• Sigma – the covariance matrix, since x has a 0 mean now.

• Compute the eigenvectors of sigma using:

– [U,S,V] = svd(sigma)

– U = eigenvectors

– S = eigenvalues

– V is transpose(U)

• You can reduce the dimensionality by only considering the top (k) eigenvalues of the data.
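
A minimal NumPy version of these steps (the talk used MATLAB's svd; the epsilon regularizer and its value are assumptions, and the optional dimensionality reduction to the top k components is omitted):

```python
def zca_whiten(X, epsilon=0.1):
    """ZCA-whiten image patches, one patch per column of X."""
    X = X - X.mean(axis=1, keepdims=True)    # subtract the mean patch
    sigma = X @ X.T / X.shape[1]             # covariance matrix (x is zero mean)
    U, S, _ = np.linalg.svd(sigma)           # U = eigenvectors, S = eigenvalues
    # Rotate into the eigenbasis, rescale each direction, then rotate back;
    # the rotation back is what makes this ZCA rather than PCA whitening.
    return U @ np.diag(1.0 / np.sqrt(S + epsilon)) @ U.T @ X
```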

Page 25: Deep Nets for image classification - Colorado School of Mines

Linear Decoder Implementation

• Learn color image patch features; flatten the intensities from each channel into a vector.

• 100,000 8x8 random image patches from 13,000 96x96 color images. (Cats, dogs, deer, airplanes, birds, horses, monkeys, ships, trucks).

• 192 (8*8*3) input units

• 400 hidden units

• 192 output units

• 0.035 sparsity parameter.

Page 26: Deep Nets for image classification - Colorado School of Mines

Convolution

• We have learned features over random 8x8 patches from large images.

• Convolve these feature detectors over a new, larger image.

– This gives us different feature activation values at each location of the image.

• Run 8x8 window over 64x64 image to get sets of 57x57 convolved features (400 sets in our case).

Page 27: Deep Nets for image classification - Colorado School of Mines

Convolution Implementation

• Compute activations for every 8x8 patch in new image.

• Loop through the features, and convolve the image with each feature using MATLAB's conv2 function over the “valid” region.

• Then run the resulting convolved image plus the bias for this feature through the sigmoid function to get the activations.
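
A single-channel sketch of that loop using SciPy's convolve2d in place of MATLAB's conv2, reusing the sigmoid helper from the earlier sketches (handling of the three color channels and of the folded-in ZCA whitening is omitted for brevity):

```python
from scipy.signal import convolve2d

def convolve_features(image, features, b):
    """Convolve each learned 8x8 feature over a 64x64 image, add the feature's
    bias, and pass through the sigmoid. Returns (num_features, 57, 57)."""
    maps = []
    for k in range(features.shape[0]):
        # Flipping the kernel makes convolve2d act like the sliding-window
        # correlation the auto encoder's hidden units compute; "valid" keeps
        # only windows fully inside the image: 64 - 8 + 1 = 57.
        kernel = np.flipud(np.fliplr(features[k]))
        maps.append(sigmoid(convolve2d(image, kernel, mode='valid') + b[k]))
    return np.array(maps)
```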

Page 28: Deep Nets for image classification - Colorado School of Mines

Pooling

• In theory we could run the convolved features right through a classifier – but this is computationally challenging.

– 57*57*400 = 1,299,600 features per example.

• Aggregate statistics of features over windows.

• Mean pooling or max pooling.

• PoolDim = 19, so 3x3 pooling (57/19 = 3).
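
A sketch of mean pooling over non-overlapping 19x19 windows of each 57x57 convolved feature map:

```python
def mean_pool(conv_maps, pool_dim=19):
    """Mean-pool each convolved feature map over non-overlapping
    pool_dim x pool_dim windows: (400, 57, 57) -> (400, 3, 3) here."""
    n, h, w = conv_maps.shape
    ph, pw = h // pool_dim, w // pool_dim
    trimmed = conv_maps[:, :ph * pool_dim, :pw * pool_dim]
    return trimmed.reshape(n, ph, pool_dim, pw, pool_dim).mean(axis=(2, 4))
```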

Page 29: Deep Nets for image classification - Colorado School of Mines

Classification

• We can now use the pooled features for classification.

• Softmax Regression

– Supervised

– Similar to logistic regression (binary classification), but we can have multiple class labels.

– Compute the probability of a label given an input.

Page 30: Deep Nets for image classification - Colorado School of Mines

Softmax Regression

• There is no closed-form way to solve for the minimum of J(theta).

• Use gradient descent or L-BFGS to solve for the minimum.

• Add a weight decay parameter to guarantee convergence to a unique solution.
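
A sketch of the softmax cost with the weight decay term (the decay strength lam is an assumption; in practice this function and its gradient would be handed to gradient descent or L-BFGS):

```python
def softmax_cost(theta, X, y, lam=1e-4):
    """Softmax regression cost with weight decay.
    theta: (num_classes, num_features), X: (num_features, m),
    y: integer class labels, one per example (length-m array)."""
    m = X.shape[1]
    scores = theta @ X
    scores = scores - scores.max(axis=0)              # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=0)
    data_term = -np.log(probs[y, np.arange(m)]).mean()
    decay_term = (lam / 2.0) * np.sum(theta ** 2)     # makes the minimum unique
    return data_term + decay_term, probs
```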

Page 31: Deep Nets for image classification - Colorado School of Mines

Demo

Page 32: Deep Nets for image classification - Colorado School of Mines

Architecture Overview

• Self taught learning with sparse auto encoder.

– Preprocessed with ZCA whitening.

• Use learned features for convolution on large image.

• Pool convolutions to reduce dimensionality and overfitting.

• Softmax regression for classification.

Example of self taught learning.

Page 33: Deep Nets for image classification - Colorado School of Mines

Layers of Depth

• Deep networks have multiple hidden layers.

– Remember our auto encoder had 1 hidden layer.

– You can stack auto encoders to achieve greater depth.

– Ditch the “decoding” layer and attach to next layer or classifier.
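
A sketch of that idea, keeping only the encoding half of each trained auto encoder and reusing the sigmoid helper from the earlier sketches (the layer-list format is an assumption):

```python
def stacked_features(X, layers):
    """Greedy layer-wise stacking: run the input through the encoder half of
    each trained auto encoder in turn; the decoding layers are discarded.
    X has one example per column; layers is a list of (W, b) pairs."""
    a = X
    for W, b in layers:
        a = sigmoid(W @ a + b[:, None])   # hidden activations feed the next layer
    return a  # feed these top-level features to the softmax classifier
```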

Page 34: Deep Nets for image classification - Colorado School of Mines

Questions?

Page 35: Deep Nets for image classification - Colorado School of Mines

Sources

• http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial

• http://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf

• http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

• http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf

• http://www.cs.toronto.edu/~hinton/absps/ranzato_cvpr2011.pdf

• http://www.cs.toronto.edu/~hinton/science.pdf