OverFeat( - Stanford...
Transcript of OverFeat( - Stanford...
![Page 1: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/1.jpg)
OverFeat Integrated Recogni.on, Localiza.on and Detec.on using Convolu.onal Networks
Sermanet et. al
Presenta.on by Eric Holmdahl
![Page 2: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/2.jpg)
Roadmap 1. Goal 2. Background 3. Related Work 4. Algorithm Overview 5. Breakdown By Task
1. Classifica.on 2. Localiza.on 3. Detec.on
![Page 3: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/3.jpg)
Goal
Perform classifica.on, localiza.on, and detec.on on the ImageNet Dataset
![Page 4: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/4.jpg)
Classi3ication Determining what is the main object in an image
![Page 5: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/5.jpg)
Localization • Determining where an object is located in an image
![Page 6: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/6.jpg)
Detection • Performing localiza.on for all objects present in an image
![Page 7: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/7.jpg)
Background: Feed Forward Neural Networks
![Page 8: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/8.jpg)
Background: Convolutional Nets
• Alterna.ng convolu.on and max pooling layers feed into fully connected neural net
• Max pooling: with window size kxk, outputs highest intensity value in window size
• Convolu.on: Scanning window, shared weights within window
![Page 9: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/9.jpg)
Related Work
![Page 10: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/10.jpg)
Krizhevsky et. Al: ImageNet Classi-ication With Deep Convolutional Neural Networks
![Page 11: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/11.jpg)
Review: Krizhevsky Architecture • Large CNNs used to densely process images with overlapping windows
• ReLU Nonlinear neuron output • DropOut
![Page 12: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/12.jpg)
Krizhevksy Results • Brought CNNs to forefront of classifica.on/localiza.on/detec.on problem
![Page 13: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/13.jpg)
Giusti et. al: Fast Image Scanning With Deep Convolutional Networks
![Page 14: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/14.jpg)
Giusti Fast Scanning
• Problem: CNNs perform a great deal of redundant compu.ng of convolu.ons due to overlapping patches • Solu.on: Apply convolu.on to en.re image at once!
![Page 15: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/15.jpg)
Giusti et. al: Fast Image Scanning With Deep Convolutional Networks Convolu.onal Layer: Max Pooling Layer:
![Page 16: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/16.jpg)
Giusti et. al: Fast Image Scanning With Deep Convolutional Networks
![Page 17: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/17.jpg)
Giusti et. al: Results
Provides massive improvements in speed for sliding window CNNs!
![Page 18: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/18.jpg)
Algorithm Overview
![Page 19: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/19.jpg)
Algorithm Overview: Training
Train Classifier Train
Localiza.on Regressor
![Page 20: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/20.jpg)
Algorithm Overview: Training
Train Classifier Train
Localiza.on Regressor
![Page 21: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/21.jpg)
Algorithm Overview: Training
Train Classifier Train
Localiza.on Regressor
![Page 22: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/22.jpg)
Algorithm Overview: Training
Train Classifier Train
Localiza.on Regressor
• Input: Images with classifica.on and bounding box • Training objec.ve: Minimize l2 norm between generated bounding box and ground truth
• One regressor generated for each possible image class
• Output: (x,y) coordinates of top le[, top right corner of bounding box
![Page 23: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/23.jpg)
Algorithm Overview: Runtime
1. Perform classifica.on at each loca.on using trained CNN
![Page 24: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/24.jpg)
Algorithm Overview: Runtime
2. Perform localiza.on on all classified regions generated by classifier
![Page 25: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/25.jpg)
Algorithm Overview: Runtime
3. Merge bounding boxes with sufficient overlap from localiza.on and sufficient confidence of being same object from classifier
![Page 26: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/26.jpg)
Breakdown By Task
![Page 27: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/27.jpg)
Classi3ication
![Page 28: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/28.jpg)
OverFeat Feature Extraction • First 5 layers of Deep Convolu.onal Neural Net: similar to Krizhevsky’s
• Images downsampled to 256x256 • No contrast normaliza.on, non-‐overlapping pooling
![Page 29: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/29.jpg)
OverFeat Classi3ication: Dense Sliding Window
![Page 30: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/30.jpg)
Multi-‐Scale Classi3ication • Classifica.on performed at 6 scales at test .me, but only 1 scale at run.me
• Increases robustness of model
![Page 31: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/31.jpg)
ConvNets for Detection
● Single output: ○ ○ ○ ○ ○
1x1 output no feature space blue: feature maps green: operation kernel typical training setup
OverFeat • Pierre Sermanet • New York University
Classi3ication: CNNs and Sliding Windows
![Page 32: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/32.jpg)
ConvNets for Detection
● Multiple outputs: ○ ○ ○
2x2 output input stride 2x2 recompute only extra yellow areas
OverFeat • Pierre Sermanet • New York University
Classi3ication: CNNs and Sliding Windows
![Page 33: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/33.jpg)
ConvNets for Detection
● With feature space ○ ○ ○ ○ ○
3 input channels 4 feature maps 2 feature maps 4 feature maps 2 outputs (e.g. 2-class classifier)
OverFeat • Pierre Sermanet • New York University
Classi3ication: CNNs and Sliding Windows
![Page 34: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/34.jpg)
Classi3ication: Results
![Page 35: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/35.jpg)
Localization
![Page 36: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/36.jpg)
Training Localizer • Use same first 5 layers as trained classifier • Remove fully connected layers, replace with regressor • Train again on labeled input with bounding boxes
![Page 37: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/37.jpg)
Localization: Fully Connected Layers
![Page 38: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/38.jpg)
Localization: Bounding Boxes Produced By Regression
![Page 39: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/39.jpg)
Localization: Combing Predictions Algorithm:
![Page 40: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/40.jpg)
Localization: Results
![Page 41: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/41.jpg)
Detection
![Page 42: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/42.jpg)
ImageNet Challenge 2013
OverFeat • Pierre Sermanet • New York University
● Detection: ○ ○ ○ ○
200 classes Smaller objects than classification/localization Any number of objects (including zero) Penalty for false positives
![Page 43: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/43.jpg)
Differences Between Detection and Location
• Can now have many objects instead of just one • Penalized for incorrect guesses • Need to dis.nguish background from objects
![Page 44: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/44.jpg)
Training Detector
• Almost iden.cal to classifica.on/localiza.on training • New class added – background • Background class updated on the fly: extremely incorrect classifica.ons are used to train background class
![Page 45: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/45.jpg)
Detection Results
![Page 46: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/46.jpg)
Conclusion
OverFeat provides a way to extract powerful CNN based features for image classifica.on, localiza.on and detec.on with high speed and precision
![Page 47: OverFeat( - Stanford Universityvision.stanford.edu/teaching/cs231b_spring1415/slides/overfeat_eric.pdfOverFeat • Pierre Sermanet • New York University Classiication:CNNsandSliding(Windows(ConvNets](https://reader033.fdocuments.in/reader033/viewer/2022042022/5e79957310a4e13fda2602b6/html5/thumbnails/47.jpg)
Thanks!