Lecture 1: Introduction to Deep Learning
CSE599W: Spring 2018
Lecturers
ML Applications need more than algorithms
Learning Systems: this course
What’s this course
● Not about the learning aspect of deep learning (except for the first two lectures)
● The systems aspect of deep learning: faster training, efficient serving, lower memory consumption
Logistics
● Location/Time: Tue/Thu 11:30 am - 12:50 pm, MUE 153
● Join Slack: https://uw-cse.slack.com, dlsys channel
● We may use other times and locations for invited speakers.
● Compute Resources: AWS Education, instruction sent via email.
● Office hour by appointment
Homeworks and Projects
● Two code assignments
● Group project
  ○ Two- to three-person team
  ○ Poster presentation and write-up
A Crash Course on Deep Learning
Elements of Machine Learning
Model
Objective
Training
What’s Special About Deep Learning
Compositional Model
End to End Training
layer 1 extractor
layer 2 extractor
predictor
Ingredients in Deep Learning
● Model and architecture
● Objective function, training techniques
  ○ Which feedback should we use to guide the algorithm?
  ○ Supervised, RL, adversarial training
● Regularization, initialization (coupled with modeling)
  ○ Dropout, Xavier initialization
● Get a sufficient amount of data
Major Architectures
Image Modeling: Convolutional Nets
Language/Speech: Recurrent Nets
Image Modeling and Convolutional Nets
Breakthrough of Image Classification
Evolution of ConvNets
• LeNet (LeCun, 1998)
  – Basic structures: convolution, max-pooling, softmax
• AlexNet (Krizhevsky et al., 2012)
  – ReLU, Dropout
• GoogLeNet (Szegedy et al., 2014)
  – Multiple independent pathways (sparse weight matrix)
• Inception BN (Ioffe et al., 2015)
  – Batch normalization
• Residual net (He et al., 2015)
  – Residual pathway
Fully Connected Layer
Output
Input
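As a minimal sketch (using plain NumPy rather than a framework), a fully connected layer maps every input unit to every output unit via a weight matrix; the shapes below are illustrative assumptions:

```python
import numpy as np

def fully_connected(x, W, b):
    # Every output unit is connected to every input unit: y = W x + b.
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal(784)        # e.g. a flattened 28x28 image
W = rng.standard_normal((10, 784))  # one weight row per output unit
b = np.zeros(10)

y = fully_connected(x, W, b)
print(y.shape)  # (10,)
```

Note that this layer has 10 x 784 parameters; the convolutional layers below avoid that cost by sharing weights.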
Convolution = Spatial Locality + Sharing
Spatial Locality
Without Sharing
With Sharing
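The two ideas on this slide can be sketched in a few lines of NumPy: each output looks only at a local patch (spatial locality), and the same kernel is reused at every location (sharing), so the layer has k x k parameters instead of one weight set per output pixel. A single-channel "valid" cross-correlation, as an illustrative sketch:

```python
import numpy as np

def conv2d(image, kernel):
    # Valid 2D cross-correlation with one shared kernel.
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Spatial locality: only a k x k patch feeds each output.
            # Sharing: the same `kernel` is used at every (i, j).
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0      # 3x3 averaging filter
print(conv2d(image, kernel).shape)  # (2, 2)
```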
Convolution with Multiple Channels
Source: http://cs231n.github.io/convolutional-networks/
Pooling Layer
Source: http://cs231n.github.io/convolutional-networks/
Can be replaced by strided convolution
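A quick sketch of non-overlapping max pooling (window and stride values here are the usual 2x2 choice, assumed for illustration):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    # Keep the largest value in each pooling window, downsampling the map.
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
print(max_pool2d(x))  # max of each 2x2 window: [[4. 8.], [9. 7.]]
```

A strided convolution achieves the same downsampling with learned weights instead of a fixed max.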
LeNet (LeCun 1998)
• Convolution
• Pooling
• Flatten
• Fully connected
• Softmax output
AlexNet (Krizhevsky et al., 2012)
Challenges: From LeNet to AlexNet
● Need much more data: ImageNet
● Much heavier computation: GPUs
● Overfitting prevention
  ○ Dropout regularization
● Stable initialization and training
  ○ Exploding/vanishing gradient problems
  ○ Requires careful tuning of initialization and data normalization
ReLU Unit
• ReLU
• Why ReLU?
  – Cheap to compute
  – Roughly linear, so gradients do not saturate for positive inputs
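The unit is just an elementwise max with zero, which a one-line NumPy sketch makes concrete:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): a single comparison per element,
    # identity for positive inputs, zero for negative ones.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # negatives clipped to 0, positives passed through
```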
Dropout Regularization
Dropout Mask
● Randomly zero out neurons with probability 0.5
● During prediction, use expectation value (keep all neurons but scale output by 0.5)
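The two bullets above can be sketched directly; this follows the original (non-inverted) dropout formulation on the slide, with the random generator as an illustrative assumption:

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    if train:
        # Training: zero each unit with probability p.
        mask = rng.random(x.shape) >= p   # keep with probability 1 - p
        return x * mask
    # Prediction: keep all units but scale by (1 - p),
    # so the expected activation matches training.
    return x * (1.0 - p)

x = np.ones(8)
print(dropout(x, train=True))   # roughly half the entries zeroed
print(dropout(x, train=False))  # all entries 0.5
```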
GoogLeNet: Multiple Pathways, Fewer Parameters
Vanishing and Explosive Value Problem
● Imagine each layer multiplies its input by the same weight W
  ○ W > 1: exponential explosion
  ○ W < 1: exponential vanishing
● In ConvNets the weights are not tied, but their magnitudes still matter
  ○ Deep net training was sensitive to initialization
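A tiny numeric demonstration of the scalar case: after L layers of multiplying by the same w, the signal is w**L times its original size.

```python
# Repeated multiplication by a fixed factor w over 50 "layers".
for w in (1.1, 0.9):
    x = 1.0
    for _ in range(50):
        x *= w
    print(w, x)
# w = 1.1 -> ~117   (explodes)
# w = 0.9 -> ~0.005 (vanishes)
```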
Batch Normalization: Stabilize the Magnitude
• Subtract the mean
• Divide by the standard deviation
• Output is invariant to input scale!
  – Scale the input by a constant
  – The output of BN remains the same
• Impact
  – Easier to tune the learning rate
  – Less sensitive to initialization

(Ioffe et al., 2015)
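A bare-bones sketch of the normalization step (omitting the learned scale/shift parameters of full batch normalization), which also checks the scale-invariance claim numerically:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch: subtract the mean,
    # divide by the standard deviation.
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4))       # batch of 32 examples, 4 features
y1 = batch_norm(x)
y2 = batch_norm(10.0 * x)              # scale the input by a constant
print(np.allclose(y1, y2, atol=1e-3))  # True: the output is unchanged
```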
The Scale Normalization (Assumes zero mean)
Scale Normalization
Invariance to Magnitude!
Residual Net (He et al., 2015)
● Instead of applying the transformation directly, add the transformation result to the input
● Partly solves the vanishing/exploding value problem
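The idea fits in one line: compute y = x + F(x) instead of y = F(x). The toy transform below (a small linear map plus ReLU) is an illustrative assumption:

```python
import numpy as np

def residual_block(x, transform):
    # The identity path x passes through unchanged, so gradients can
    # flow directly even when `transform` shrinks or grows the signal.
    return x + transform(x)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
f = lambda x: np.maximum(0.0, W @ x)   # toy transform F(x)

x = rng.standard_normal(4)
y = residual_block(x, f)
print(y.shape)  # (4,)
```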
Evolution of ConvNets
• LeNet (LeCun, 1998)
  – Basic structures: convolution, max-pooling, softmax
• AlexNet (Krizhevsky et al., 2012)
  – ReLU, Dropout
• GoogLeNet (Szegedy et al., 2014)
  – Multiple independent pathways (sparse weight matrix)
• Inception BN (Ioffe et al., 2015)
  – Batch normalization
• Residual net (He et al., 2015)
  – Residual pathway
More Resources
● Deep Learning book (Goodfellow et al.)
● Stanford CS231n: Convolutional Neural Networks for Visual Recognition
● http://dlsys.cs.washington.edu/materials
Lab1 on Thursday
● Walk through how to implement a simple model for digit recognition using MXNet Gluon
● Focus on data I/O, model definition, and a typical training loop
● Get familiar with typical framework APIs for vision tasks
● Before class: sign up for AWS Educate credits
  ○ https://aws.amazon.com/education/awseducate/apply/
  ○ Create an AWS Educate Starter Account to avoid being charged
  ○ We will email instructions, but it is very simple to do yourself, so do it today!