Deep Learning in Theano
Massimo Quadrana
PhD Student @ Politecnico di Milano
Research Intern @ Telefonica I+D
[email protected] @mxqdr
Original slides are available here: https://goo.gl/VLYsnR
Before starting
OS: Linux / Mac OS (sorry Windows guys :) )
Required software:
Python 2.7.x, git, OpenBLAS
Optional software (for faster math and better package/virtualenv support):
Anaconda (https://www.continuum.io/downloads)
Anaconda Intel MKL (free student licence) (https://www.continuum.io/anaconda-academic-subscriptions-available)
Before starting
Open your terminal and create a new virtualenv
> virtualenv -p /usr/bin/python2.7 theano-env
Activate the virtualenv
> source theano-env/bin/activate
Install the Theano package with its dependencies
> pip install Theano
(To exit the virtualenv)
> deactivate
Before starting
To check that your Theano environment is correctly configured, run the following
> python -c 'import theano'
It should complete without errors
Before starting
Get the lab code here
> git clone https://github.com/mquad/DNN_Lab_UPF
Structure of the repo:
● exercise/: directory with the code for the lab (it won’t run)
● complete/: directory with the code completed with the missing parts (it should run :-) )
● notebooks/: some Jupyter notebooks to show you some cool stuff
If you spot an error or have a feature request, open a new issue. I’ll do my best to keep the repo up to date :-)
Outline
Image classification
● Logistic Regression
● “Modern” Multi-layer NN
● Convolutional Neural Networks
Sequence Modeling
● Character Based RNN
Open your editor and write the following. Save it as example_mul.py, then run python example_mul.py
Theano intro
The official documentation:
http://deeplearning.net/software/theano/index.html
Theano intro
MNIST Dataset
60000 grayscale images (28 x 28 pixels each)
10 classes
[Figure: Model: Inputs, Computation, Outputs]
Logistic Regression on MNIST
[Figure: input image -> T.dot(X, W) + b -> softmax -> probabilities for the classes Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine]
Logistic Regression on MNIST
Open exercise/logreg_raw.py
Many parts have already been coded for you (library imports, data loading and splitting, evaluation)
Write the code for the Logistic Regression classifier
LogReg: input vars and model parameters
Shared variables in Theano maintain their state across functions.
Use them to store your model’s parameters.
When running on a GPU, shared variables are stored in GPU memory for faster access.
LogReg: model and cost function
Softmax: generalization of sigmoid over multiple classes
Predicted class: the class with the highest predicted probability
LogReg: model and cost function
Cross-entropy loss
y is the one-hot encoding of the correct class of the input features x (y_i = 1 iff the class of x is i)
Here we keep y as an integer label and index into y_hat to save computation
Note: average the loss over the minibatch (the cost must be a scalar)
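The indexing shortcut can be sketched in plain Python (no Theano, so it runs anywhere). The function names and the toy probabilities are illustrative assumptions, not the lab's code. The sketch shows that picking the entry y_hat[y] gives the same loss as the full one-hot sum, and that the minibatch cost is the mean of the per-example losses.

```python
import math

def cross_entropy_onehot(y_hat, y_onehot):
    # Full definition for one example: -sum_i y_i * log(y_hat_i).
    return -sum(t * math.log(p) for t, p in zip(y_onehot, y_hat))

def cross_entropy_indexed(y_hat, y):
    # Shortcut: only the entry of the correct class contributes.
    return -math.log(y_hat[y])

def mean_loss(batch_y_hat, batch_y):
    # Average over the minibatch so that the cost is a scalar.
    losses = [cross_entropy_indexed(p, y) for p, y in zip(batch_y_hat, batch_y)]
    return sum(losses) / len(losses)

# Toy minibatch: 2 examples, 3 classes (made-up probabilities).
batch_y_hat = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
batch_y = [0, 1]
cost = mean_loss(batch_y_hat, batch_y)
```

In Theano the same idea is commonly written with fancy indexing, for example -T.mean(T.log(y_hat)[T.arange(y.shape[0]), y]); treat that expression as a pointer, not as the lab's exact code.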
LogReg: SGD
T.grad() does the automatic differentiation of the loss function
updates tells Theano how to update the model (shared) parameters (it can be a list of tuples, a dict or OrderedDict)
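The update rule that the updates list encodes can be sketched in plain Python. This is only an illustration under stated assumptions: the toy loss (w - 3)^2, the hand-written gradient, and the names sgd_updates, loss and grad are all hypothetical, and in Theano the gradient would come from T.grad() instead.

```python
def sgd_updates(params, grads, lr=0.1):
    # Mirrors the (parameter, new_value) pairs of a Theano updates list.
    return [(p, p - lr * g) for p, g in zip(params, grads)]

def loss(w):
    # Toy scalar loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Hand-derived gradient of the toy loss.
    return 2.0 * (w - 3.0)

w = 0.0
for step in range(100):
    # One (old, new) pair per parameter; apply the new value.
    (old_w, new_w), = sgd_updates([w], [grad(w)])
    w = new_w
```

After the loop w is very close to 3, the minimizer of the toy loss.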
LogReg: Training, Loss and Predictions
LogReg: Softmax
The exp function can easily overflow: subtract the maximum value of x before exponentiating to get numerically stable results (the output is mathematically unchanged)
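A minimal plain-Python sketch of the max-subtraction trick (not the lab's Theano code; the function names are illustrative). The naive version is included only to show the overflow.

```python
import math

def softmax(x):
    # Subtracting max(x) leaves the result unchanged but keeps exp() in range.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def naive_softmax(x):
    # Direct definition; math.exp raises OverflowError for large inputs.
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Large logits are fine for the stable version.
probs = softmax([1000.0, 1001.0, 1002.0])
```

Both versions agree on small inputs; only the stable one survives large logits.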
LogReg: File logreg.py contains a cleaner version of the Logistic Regression classifier.
init(): defines model parameters
model(): defines our model
fit() and predict(): fit the model on training data and predict the class of new data
Logistic Regression on MNIST
Test accuracy: ~92%
“Modern” multi-layer network
[Figure: input image -> two ReLU hidden layers -> softmax over the classes Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine, with noise applied at three points, one of them labeled "Noise (or augmentation)"]
h0 = relu(T.dot(X, Wh0) + b0)
h1 = relu(T.dot(h0, Wh1) + b1)
y = softmax(T.dot(h1, Wy) + by)
“Modern” multi-layer network
Open and complete mlp.py. The missing parts are:
● init(): initialize the MLP parameters
● model(): define the model using dropout
● dropout(): apply dropout to the input
● apply_momentum(): apply momentum over the given updates
MLP: init()
MLP: model()
MLP: dropout()
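A minimal plain-Python sketch of dropout, assuming the common "inverted" variant in which surviving units are rescaled at training time. The signature and names are hypothetical; the lab's Theano version may differ.

```python
import random

def dropout(x, p_drop, rng):
    # Zero each unit with probability p_drop and rescale the survivors
    # by 1 / (1 - p_drop) so the expected activation is unchanged.
    if p_drop <= 0.0:
        return list(x)
    retain = 1.0 - p_drop
    return [v / retain if rng.random() < retain else 0.0 for v in x]

rng = random.Random(0)  # fixed seed, only for reproducibility
h = [1.0] * 10000
h_drop = dropout(h, p_drop=0.5, rng=rng)
```

With p_drop = 0.5 roughly half of the units are zeroed and the rest are doubled, so the mean activation stays close to 1.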
MLP: apply_momentum()
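A minimal plain-Python sketch of classical momentum on a toy loss. It works on a single scalar parameter; the lab's apply_momentum() operates on Theano updates, so the signature, the toy loss (w - 3)^2 and the default values here are assumptions for illustration.

```python
def momentum_step(param, velocity, grad, lr=0.1, mu=0.9):
    # Classical momentum: the velocity accumulates past gradients,
    # and the parameter moves along the velocity.
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity

def grad(w):
    # Gradient of the toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0
for step in range(300):
    w, v = momentum_step(w, v, grad(w))
```

The iterate oscillates around the minimum at w = 3 before settling, which is the typical behavior of momentum on a quadratic.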
“Modern” multi-layer network
Test accuracy: ~98%
Convolutional Neural Networks
[Figure from deeplearning.net]
CNNs in Theano
Open convnet.py and complete the following parts
● get_conv_output_shape(): compute the output shape of the convolutional layer
● init(): complete the initialization of the convolutional filters
● model(): define the entire CNN model
● adagrad(): define the update rules for Adagrad
● rmsprop(): define the update rules for RMSprop (easy if you do Adagrad first)
Dealing with Convolutions
Inputs have 3 dimensions: width and height (the spatial dimensions, W) and depth
Convolutions are
● local in width and height (receptive field F)
● full in depth
Dealing with Convolutions
Convolution hyper-parameters
● depth: number of neurons connected to the same input region
● stride: spacing between depth columns in the spatial dimensions
● padding: how to treat borders (not covered in the examples)
The spatial size of the output volume is given by the formula
(W - F + 2P) / S + 1
where W is the input size, F the receptive field size, P the padding and S the stride
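The formula can be checked with a few lines of plain Python. The function name is hypothetical, and the 2x2 pooling window with stride 2 is an assumption made for this worked example (pooling follows the same size formula).

```python
def conv_output_size(w, f, stride=1, pad=0):
    # Spatial output size: (W - F + 2P) / S + 1.
    span = w - f + 2 * pad
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input")
    return span // stride + 1

# A 28x28 MNIST image through a LeNet-style stack, without padding.
size = 28
size = conv_output_size(size, 5)            # 5x5 convolution
after_conv1 = size
size = conv_output_size(size, 2, stride=2)  # 2x2 max pooling (assumed)
after_pool1 = size
size = conv_output_size(size, 5)            # 5x5 convolution
after_conv2 = size
size = conv_output_size(size, 2, stride=2)  # 2x2 max pooling (assumed)
after_pool2 = size
```

Under these assumptions the spatial size goes 28 -> 24 -> 12 -> 8 -> 4.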
Our CNN (variation of LeNet5)
INPUT, CONV(5,5)*, MAX POOL, CONV(5,5)*, MAX POOL, FC*
*The actual number of filters and Fully Connected layers is programmable
CNNs: get_conv_output_volume()
We don’t consider padding for simplicity
CNNs: init()
First CONV(5,5), MAX POOL
CNNs: init()
Analogously for the second CONV(5,5), MAX POOL
CNNs: model()
CNNs: adagrad()
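A minimal plain-Python sketch of the Adagrad update on a toy scalar loss. The signature, the default values and the toy loss (w - 3)^2 are assumptions for illustration; the lab's version builds Theano updates instead.

```python
import math

def adagrad_step(param, cache, grad, lr=0.5, eps=1e-6):
    # Accumulate the squared gradient and divide the step by its square
    # root, so parameters with large past gradients take smaller steps.
    cache = cache + grad ** 2
    param = param - lr * grad / (math.sqrt(cache) + eps)
    return param, cache

def grad(w):
    # Gradient of the toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)

w, cache = 0.0, 0.0
for step in range(500):
    w, cache = adagrad_step(w, cache, grad(w))
```

Because the accumulator only grows, the effective learning rate can only shrink over time.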
CNNs: rmsprop()
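A minimal plain-Python sketch of the RMSprop update, assuming the usual formulation in which Adagrad's running sum is replaced by an exponential moving average of squared gradients. The signature, default values and toy loss are illustrative assumptions.

```python
import math

def rmsprop_step(param, cache, grad, lr=0.05, rho=0.9, eps=1e-6):
    # Exponential moving average of the squared gradient: old gradients
    # are forgotten, so the step size does not decay monotonically.
    cache = rho * cache + (1.0 - rho) * grad ** 2
    param = param - lr * grad / (math.sqrt(cache) + eps)
    return param, cache

def grad(w):
    # Gradient of the toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)

w, cache = 0.0, 0.0
for step in range(500):
    w, cache = rmsprop_step(w, cache, grad(w))
```

With a fixed learning rate the normalized steps do not shrink to zero, so the iterate ends up hovering near the minimum at w = 3 instead of converging to it exactly.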
Convolutional Neural Networks
[Figure from deeplearning.net]
Test accuracy: 99.5%
SGD/Adagrad/Rmsprop in training convnets
Recurrent Neural Networks
Recurrent Neural Networks
Open char_rnn/char_rnn_vanilla.py and complete the following
● init(): define and initialize the parameters of the Vanilla RNN
● model(): compute the updates of the hidden states of the RNN
● model_sample(): compute the update of the hidden state of the RNN after only one step
RNN: init()
RNN: model()
theano.scan() defines symbolic loops in Theano.
It has 4 main arguments (+ several additional ones):
● fn: function to be applied at every iteration
● sequences: variables scan has to iterate over (iteration is done over the first dimension of each variable)
● outputs_info: initial state of the outputs computed recurrently
● non_sequences: list of additional arguments passed to fn
At each iteration, fn receives the parameters in the following order:
sequences (if any), outputs_info (if needed), non_sequences (if any)
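The argument order can be illustrated with a plain-Python stand-in for scan. This is not theano.scan: scan_like and rnn_step are hypothetical names, the loop runs eagerly on Python numbers, and only a single recurrent output is supported.

```python
import math

def scan_like(fn, sequences, outputs_info, non_sequences):
    # Iterate over the first dimension of every sequence, feed the
    # previous output back in, and collect the output of every step.
    state = outputs_info
    outputs = []
    for elems in zip(*sequences):
        # Argument order: sequence elements, previous output, non-sequences.
        state = fn(*(list(elems) + [state] + list(non_sequences)))
        outputs.append(state)
    return outputs

def rnn_step(x_t, h_prev, w_x, w_h):
    # Scalar vanilla-RNN update: h_t = tanh(w_x * x_t + w_h * h_prev).
    return math.tanh(w_x * x_t + w_h * h_prev)

hidden = scan_like(rnn_step, sequences=[[1.0, 0.0, -1.0]],
                   outputs_info=0.0, non_sequences=[0.5, 0.8])
```

The real theano.scan builds a symbolic graph instead of running the loop, and it returns both the outputs and an updates dictionary.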
RNN: model()
RNN: model_sample()
RNN - LSTM
RNN - LSTM
Under the complete/ folder you have the code for the LSTM version of char-rnn
● char_rnn_lstm.py: standard LSTM
● char_rnn_lstm_fast.py: fast LSTM, makes better use of vectorized operations (>2x faster)
● sampler.py: to sample from your RNN
EXERCISE: They differ from VanillaRNN in their init(), model(), model_sample() and sampler() methods. Try to figure out how to get from one model to the other.
Additional remarks
How do I choose the optimal hyperparameters of my DNN?
● Grid search (overly expensive)
● Bayesian Optimization (effective but quite complex)
● Random search (cheap, effective and easy to implement)
Check out mlp_opt.py to run a random hyperparameter search for the MLP.
EXERCISE: Try it with CNNs and RNNs
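A minimal plain-Python sketch of random search. Everything here is an assumption for illustration: the search space, the parameter names and the toy scoring function are made up and are not taken from mlp_opt.py. In practice evaluate would train the model and return its validation accuracy.

```python
import math
import random

def sample_config(rng):
    # Hypothetical search space; names and ranges are illustrative only.
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform
        "n_hidden": rng.choice([128, 256, 512, 1024]),
        "p_drop": rng.uniform(0.0, 0.7),
    }

def random_search(evaluate, n_trials, seed=0):
    # Draw n_trials independent configurations and keep the best one.
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

def toy_score(cfg):
    # Stand-in for "train the model, return validation accuracy".
    return (-abs(math.log10(cfg["learning_rate"]) + 2.5)
            - abs(cfg["p_drop"] - 0.5))

best_score, best_cfg = random_search(toy_score, n_trials=50)
```

Sampling the learning rate on a log scale is a common choice because useful values span several orders of magnitude.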
Additional Remarks (2)
Packages worth checking
● Built on top of Theano: Lasagne, Keras
● Standalone packages: Caffe (Berkeley), TensorFlow (Google), CNTK (Microsoft)
Repositories
● gitxiv.com
Credits
The slides and code used in this lab were inspired by great work from some great Deep Learning researchers
Alec Radford’s slides: “Introduction to Deep Learning with Python”, http://www.slideshare.net/indicods/general-sequence-learning-with-recurrent-neural-networks-for-next-ml
Andrej Karpathy’s blog post “The Unreasonable Effectiveness of Recurrent Neural Networks”, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Andrej Karpathy’s char-rnn repo, https://github.com/karpathy/char-rnn