Intro to Deep Learning for Computer Vision
![Page 1: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/1.jpg)
Applications of Deep Learning in Computer Vision
Christoph Körner
![Page 2: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/2.jpg)
Outline
1) Introduction to Neural Networks
2) Deep Learning
3) Applications in Computer Vision
4) Conclusion
![Page 3: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/3.jpg)
Why Deep Learning?
● Wins every computer vision challenge (classification, segmentation, etc.)
● Can be applied in various domains (speech recognition, game prediction, computer vision, etc.)
● Beats human accuracy
● Big communities and resources
● Hardware for Deep Learning
![Page 4: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/4.jpg)
Perceptron (1958)
● Weighted sum of inputs
● Threshold operator
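The two bullets above can be sketched in a few lines. This is a minimal illustration, not the original 1958 formulation: the function name `perceptron`, the bias term, and the hand-picked AND weights are assumptions chosen for the example.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0  # threshold operator

# Hand-picked (hypothetical) weights that make the unit behave like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
and_outputs = [perceptron([a, b], and_weights, and_bias)
               for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # [0, 0, 0, 1]
```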
![Page 5: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/5.jpg)
Artificial Neural Network (1960)
● Universal function approximator
● Can solve the XOR problem
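A single threshold unit cannot represent XOR, but one hidden layer is enough. The sketch below uses hand-set weights rather than trained ones; the function names and the OR/NAND decomposition are illustrative assumptions, one of several possible solutions.

```python
def step(z):
    """Threshold activation."""
    return 1 if z > 0 else 0

def xor_network(a, b):
    # Hidden layer: one unit computes OR, the other computes NAND.
    h_or = step(a + b - 0.5)
    h_nand = step(-a - b + 1.5)
    # Output unit: AND of the two hidden units gives XOR.
    return step(h_or + h_nand - 1.5)

xor_outputs = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)  # [0, 1, 1, 0]
```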
![Page 6: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/6.jpg)
Backpropagation (1982)
● Propagate the error through the network
● Allows Optimization (SGD, etc.)
● Enables training of multi-layer networks
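To make the idea concrete, here is a minimal sketch of backpropagation on a deliberately tiny network: one sigmoid hidden unit followed by a linear output, trained with a squared-error loss. The network shape, the starting weights, and the learning rate are assumptions for illustration. The analytic gradients are checked against finite differences, and a single gradient-descent step is applied.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)   # hidden activation
    y = w2 * h            # linear output
    return h, y

def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backward(w1, w2, x, target):
    """Propagate the output error back to both weights via the chain rule."""
    h, y = forward(w1, w2, x)
    d_y = y - target                 # dL/dy
    d_w2 = d_y * h                   # dL/dw2
    d_h = d_y * w2                   # error propagated to the hidden unit
    d_w1 = d_h * h * (1.0 - h) * x   # through the sigmoid derivative
    return d_w1, d_w2

w1, w2, x, target = 0.5, -0.3, 1.0, 1.0
g1, g2 = backward(w1, w2, x, target)

# Finite-difference check of the analytic gradients.
eps = 1e-6
n1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
n2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)

# One gradient-descent step with a hypothetical learning rate.
learning_rate = 0.1
loss_before = loss(w1, w2, x, target)
loss_after = loss(w1 - learning_rate * g1, w2 - learning_rate * g2, x, target)
print(loss_after < loss_before)
```

With more than one training example, SGD repeats the same update on one example (or a small batch) at a time.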
![Page 7: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/7.jpg)
Convolution and Pooling (1989)
● Fewer parameters than fully connected hidden layers
● More efficient training
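The parameter saving comes from sliding one small kernel over the whole image instead of giving every input-output pair its own weight. The following sketch is plain Python under a few assumptions: no padding, stride 1, a toy 4×5 image, and a hypothetical 1×2 edge kernel. As in most deep learning libraries, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]) - 1, 2)]
            for i in range(0, len(feature_map) - 1, 2)]

image = [[0, 0, 1, 1, 1] for _ in range(4)]  # dark left, bright right
kernel = [[-1, 1]]                           # responds to a left-to-right brightness increase

feature_map = conv2d_valid(image, kernel)    # 4 x 4 map, 1 where the edge is
pooled = max_pool_2x2(feature_map)           # 2 x 2 map

# Weight counts (biases ignored) for producing the same 4 x 4 output.
conv_weights = len(kernel) * len(kernel[0])  # shared kernel
dense_weights = (4 * 5) * (4 * 4)            # one weight per input-output pair
print(pooled, conv_weights, dense_weights)
```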
![Page 8: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/8.jpg)
Handwritten ZIP Codes (1989)
● 30 training passes
● Achieved 92% accuracy
![Page 9: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/9.jpg)
What happened until 2011?
● Better Initialization
● Better Non-linearities: ReLU
● 1000 times more training data
● More computing power
● Factor 1 million speedup in training time through parallelization on GPUs
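One commonly cited reason ReLU counts as a better non-linearity is that its gradient does not shrink for large positive inputs, whereas the sigmoid gradient is at most 0.25 and falls toward zero as the unit saturates. The short sketch below compares the two derivatives; the function names and the sample input of 10 are assumptions for illustration.

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

# At the origin the sigmoid gradient peaks at 0.25; far from it, it nearly vanishes.
peak_sigmoid_grad = sigmoid_grad(0.0)
saturated_sigmoid_grad = sigmoid_grad(10.0)
active_relu_grad = relu_grad(10.0)
print(peak_sigmoid_grad, saturated_sigmoid_grad, active_relu_grad)
```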
![Page 10: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/10.jpg)
Deep Learning
● Conv-, Pool- and Fully-Connected Layers
● ReLU activations
● Deep nested models with many parameters
● New layer types and structures
● New techniques to reduce overfitting
● Loads of training data and compute power
● 10,000,000 images
● Weeks of training on multi-GPU machines
![Page 11: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/11.jpg)
AlexNet (2012)
● 62,378,344 parameters (250 MB)
● 24 layers
![Page 12: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/12.jpg)
VGGNet (2013)
● 102,908,520 parameters (412 MB)
● 23 layers
![Page 13: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/13.jpg)
GoogLeNet (2014)
● 6,998,552 parameters (28 MB)
● 143 layers
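The model sizes quoted for AlexNet, VGGNet and GoogLeNet are consistent with storing each parameter as a 32-bit float and counting 1 MB as 10^6 bytes. Both of these are assumptions, since the slides do not state the storage format; the helper name `model_size_mb` is hypothetical.

```python
BYTES_PER_PARAMETER = 4  # assumption: 32-bit floating-point weights

def model_size_mb(parameter_count):
    """Approximate size in megabytes (1 MB = 10**6 bytes), rounded to the nearest MB."""
    return round(parameter_count * BYTES_PER_PARAMETER / 1e6)

# Parameter counts as given on the slides.
sizes = {
    "AlexNet": model_size_mb(62_378_344),
    "VGGNet": model_size_mb(102_908_520),
    "GoogLeNet": model_size_mb(6_998_552),
}
print(sizes)  # {'AlexNet': 250, 'VGGNet': 412, 'GoogLeNet': 28}
```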
![Page 14: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/14.jpg)
Inception Module
● Heavy use of 1x1 convolutions (applied along the depth dimension)
● Very efficient
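A 1x1 convolution mixes the channels at each pixel independently, so it can shrink the depth of a feature volume before a more expensive convolution is applied. The sketch below shows the per-pixel operation and a weight-count comparison. The channel counts (256, 32, 64) and the 5x5 kernel size are hypothetical numbers chosen for illustration, not taken from GoogLeNet.

```python
def conv1x1(feature_maps, weights):
    """feature_maps: [C_in][H][W]; weights: [C_out][C_in]; returns [C_out][H][W]."""
    c_in = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[[sum(weights[o][c] * feature_maps[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weights))]

# Three 2x2 input channels reduced to a single output channel.
channels = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[1, 1], [1, 1]],
]
reduced = conv1x1(channels, [[1, 0, -1]])  # channel 0 minus channel 2
print(reduced)  # [[[0, 1], [2, 3]]]

# Weight counts (biases ignored) for a 5x5 convolution from 256 to 64 channels,
# applied directly versus after a 1x1 reduction to 32 channels.
direct_weights = 5 * 5 * 256 * 64
reduced_weights = 1 * 1 * 256 * 32 + 5 * 5 * 32 * 64
print(direct_weights, reduced_weights)  # 409600 59392
```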
![Page 15: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/15.jpg)
ResNet (2015)
● Residual learning
● 152 layers
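In residual learning, a block outputs its input plus a learned correction, y = F(x) + x, so a block that learns nothing still passes the signal through unchanged. The sketch below illustrates only that skip connection on plain lists; the two residual functions are hypothetical stand-ins for the learned convolutional layers.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where F is the block's residual function."""
    return [xi + ri for xi, ri in zip(x, residual_fn(x))]

def zero_residual(x):
    # Stand-in for a block that has learned no correction at all.
    return [0.0] * len(x)

def small_residual(x):
    # Stand-in for a block that has learned a small correction.
    return [0.1 * xi for xi in x]

signal = [1.0, -2.0, 0.5]

# Stacking 152 blocks with a zero residual leaves the input untouched.
unchanged = signal
for _ in range(152):
    unchanged = residual_block(unchanged, zero_residual)

adjusted = residual_block(signal, small_residual)
print(unchanged == signal, adjusted)
```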
![Page 16: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/16.jpg)
Applications in Computer Vision
![Page 17: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/17.jpg)
Classification
● One class per image
● Softmax layer at the end
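The softmax layer turns the network's raw class scores into probabilities that sum to one, and the predicted class is the one with the highest probability. A minimal sketch follows; the example scores are made up, and subtracting the maximum is a standard numerical-stability step.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(probabilities, predicted_class)
```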
![Page 18: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/18.jpg)
Localization
● Bounding box Regression
● Sigmoid layer with 4 outputs at the end
● Via Classification
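For the regression variant, one way to read four sigmoid outputs is as a box in normalized image coordinates that is then scaled to pixels. The slides do not specify the box parameterization, so the (center x, center y, width, height) layout, the function name `decode_box`, and the image size below are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(raw_outputs, image_width, image_height):
    """Map 4 raw network outputs to a pixel-space box (cx, cy, w, h).

    Assumed layout: each sigmoid output is a fraction of the image size.
    """
    cx, cy, w, h = (sigmoid(v) for v in raw_outputs)
    return (cx * image_width, cy * image_height, w * image_width, h * image_height)

# Raw outputs of zero map to 0.5 each: a centered box covering half of each dimension.
box = decode_box([0.0, 0.0, 0.0, 0.0], 640, 480)
print(box)  # (320.0, 240.0, 320.0, 240.0)
```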
![Page 19: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/19.jpg)
Detection
● Multiple Objects, multiple classes
● Solved using multiple networks
![Page 20: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/20.jpg)
Segmentation
![Page 21: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/21.jpg)
More Applications
● Compression
● Auto-encoders, Self-organizing maps
● Image Captioning
● Solved with Recurrent Architecture
● Image Stylization
● Clustering
● Many more...
![Page 22: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/22.jpg)
Conclusion
● Powerful, learn from data instead of hand-crafted feature extraction
● Better than humans
● Deeper is always better
● Overfitting
● More data is always better
● Data quality
● Ground truth
![Page 23: Intro to Deep Learning for Computer Vision](https://reader034.fdocuments.in/reader034/viewer/2022051101/58720de81a28ab176b8b7e81/html5/thumbnails/23.jpg)
Thank you!
Christoph Körner