Intro to Deep Learning for Computer Vision

Applications of Deep Learning in Computer Vision

Christoph Körner

Outline

1) Introduction to Neural Networks

2) Deep Learning

3) Applications in Computer Vision

4) Conclusion

Why Deep Learning?

● Wins most major computer vision challenges (classification, segmentation, etc.)

● Can be applied in various domains (speech recognition, game prediction, computer vision, etc.)

● Beats human accuracy

● Big communities and resources

● Hardware for Deep Learning

Perceptron (1958)

● Weighted sum of inputs

● Threshold operator
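The two bullets above can be sketched in a few lines of Python. This is a minimal illustration, not Rosenblatt's original formulation: the function name `perceptron` and the hand-picked AND weights are assumptions made for the example, and nothing is learned here.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0  # threshold operator


# Hand-picked (not learned) parameters that make the unit act like logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

Only the input (1, 1) pushes the weighted sum above the threshold, so the unit fires for that case alone.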

Artificial Neural Network (1960)

● Universal function approximator

● Can solve the XOR problem
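A single threshold unit cannot represent XOR, but two layers of them can. The sketch below wires one possible solution by hand (an OR unit and a NAND unit feeding an AND unit); the weights are illustrative assumptions, not values from the slides, and a trained network would usually find different ones.

```python
def step(z):
    """Threshold operator: 1 for positive input, 0 otherwise."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by the threshold operator."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(a, b):
    """Two-layer network with hand-set weights that computes XOR."""
    # Hidden layer: one OR unit and one NAND unit.
    h_or = neuron([a, b], [1.0, 1.0], -0.5)
    h_nand = neuron([a, b], [-1.0, -1.0], 1.5)
    # Output layer: AND of the two hidden units.
    return neuron([h_or, h_nand], [1.0, 1.0], -1.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```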

Backpropagation (1982)

● Propagate the error through the network

● Allows optimization (SGD, etc.)

● Enables training of multi-layer networks
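To make the idea concrete, here is a minimal sketch of backpropagation for a tiny network with one sigmoid hidden layer, a linear output and a squared-error loss. The architecture, the omission of bias terms and all names (`forward`, `backprop`, `sgd_step`) are simplifying assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass: returns the hidden activations and the scalar output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output


def loss(x, target, w_hidden, w_out):
    """Squared error between the network output and the target."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


def backprop(x, target, w_hidden, w_out):
    """Return the gradients of the loss w.r.t. w_hidden and w_out."""
    hidden, output = forward(x, w_hidden, w_out)
    d_output = output - target                   # dL/d(output)
    grad_out = [d_output * h for h in hidden]    # dL/d(w_out)
    # Propagate the error back through the output weights and the sigmoid.
    d_hidden = [d_output * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in d_hidden]
    return grad_hidden, grad_out


def sgd_step(x, target, w_hidden, w_out, lr=0.1):
    """One stochastic gradient descent update on a single training example."""
    grad_hidden, grad_out = backprop(x, target, w_hidden, w_out)
    new_hidden = [[w - lr * g for w, g in zip(w_row, g_row)]
                  for w_row, g_row in zip(w_hidden, grad_hidden)]
    new_out = [w - lr * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out


# Arbitrary example values, chosen only for the demonstration.
x, target = [0.5, -1.0], 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.5, -0.3]

before = loss(x, target, w_hidden, w_out)
w_hidden_new, w_out_new = sgd_step(x, target, w_hidden, w_out)
after = loss(x, target, w_hidden_new, w_out_new)
print(before, after)
```

A common sanity check for such code is to compare the backpropagated gradients against finite differences of the loss.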

Convolution and Pooling (1989)

● Fewer parameters than fully connected hidden layers

● More efficient training
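The parameter saving can be illustrated with a small pure-Python sketch of a convolution followed by 2x2 max pooling. The sizes (6x6 input, 3x3 kernel) are arbitrary assumptions for the example. As in most deep learning frameworks, the "convolution" here does not flip the kernel, so strictly speaking it is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


image = [[1] * 6 for _ in range(6)]    # 6x6 input
kernel = [[1] * 3 for _ in range(3)]   # 3x3 kernel: 9 shared weights

feature_map = conv2d_valid(image, kernel)   # 4x4 output
pooled = max_pool_2x2(feature_map)          # 2x2 output

# A fully connected layer mapping 36 inputs to 16 outputs needs 36 * 16 weights.
conv_weights = len(kernel) * len(kernel[0])
dense_weights = (6 * 6) * (4 * 4)
print(conv_weights, dense_weights)
```

The convolution reuses the same 9 weights at every position, whereas the fully connected layer with the same input and output size needs 576.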

Handwritten ZIP Codes (1989)

● 30 training passes

● Achieved 92% accuracy

What happened until 2011?

● Better initialization

● Better non-linearities: ReLU

● 1000 times more training data

● More computing power

● Factor 1 million speedup in training time through parallelization on GPUs
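One commonly cited reason why ReLU counts as a better non-linearity is that its gradient does not shrink for large positive inputs, whereas the sigmoid saturates. The sketch below compares the two; the function names are chosen for the example only.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: at most 0.25, close to 0 for large |z|."""
    s = sigmoid(z)
    return s * (1.0 - s)


for z in (0.5, 2.0, 10.0):
    print(z, relu_grad(z), sigmoid_grad(z))
```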

Deep Learning

● Conv-, Pool- and Fully-Connected layers

● ReLU activations

● Deep nested models with many parameters

● New layer types and structures

● New techniques to reduce overfitting

● Loads of training data and compute power

● 10,000,000 images

● Weeks of training on multi-GPU machines

AlexNet (2012)

● 62,378,344 parameters (250 MB)

● 24 layers
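The 250 MB figure follows from the parameter count if each parameter is stored as a 32-bit float, which is an assumption here since the slide does not state the data type. A quick check, with 1 MB taken as 10^6 bytes:

```python
ALEXNET_PARAMETERS = 62_378_344
BYTES_PER_FLOAT32 = 4  # assumption: weights stored as 32-bit floats


def model_size_megabytes(num_parameters, bytes_per_parameter=BYTES_PER_FLOAT32):
    """Approximate storage size of the weights in megabytes (10^6 bytes)."""
    return num_parameters * bytes_per_parameter / 1_000_000


size_mb = model_size_megabytes(ALEXNET_PARAMETERS)
print(round(size_mb))
```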

VGGNet (2014)


GoogLeNet (2014)


Inception Module

● Heavy use of 1x1 convolutions (applied along the depth dimension)

● Very efficient
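A 1x1 convolution applies the same small linear map to the channel vector of every pixel, so it can shrink the depth of a feature map before more expensive convolutions run. The sketch below shows only this depth reduction, not a full Inception module; the data layout `[height][width][channels]` and the weight values are assumptions for the example.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: nested list indexed as [height][width][in_channels]
    weights:     nested list indexed as [out_channels][in_channels]
    """
    return [
        [
            [sum(w * c for w, c in zip(filter_weights, pixel))
             for filter_weights in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


# 2x2 feature map with 4 channels, reduced to 2 channels.
feature_map = [
    [[1, 2, 3, 4], [0, 0, 0, 0]],
    [[4, 3, 2, 1], [1, 1, 1, 1]],
]
weights = [
    [1, 0, 0, 0],               # output channel 0: copy input channel 0
    [0.25, 0.25, 0.25, 0.25],   # output channel 1: average of all channels
]

reduced = conv1x1(feature_map, weights)
print(reduced)
```

The spatial size stays the same while the depth drops from 4 to 2.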

ResNet (2015)

● Residual learning

● 152 layers
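In residual learning, a block computes a correction F(x) and adds it to its own input, so the output is F(x) + x. The sketch below is heavily simplified: it uses small fully connected layers instead of the convolutions and normalization of the real ResNet, and all names are assumptions for illustration.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def linear(vector, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    residual = linear(relu(linear(x, w1)), w2)           # F(x)
    return relu([r + xi for r, xi in zip(residual, x)])  # skip connection


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

print(residual_block([1.0, 2.0], zeros, zeros))
print(residual_block([1.0, 2.0], identity, identity))
```

With all-zero weights the block passes a non-negative input through unchanged, which is one intuition for why very deep stacks of such blocks remain trainable.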

Applications in Computer Vision

Classification

● One class per image

● Softmax layer at the end
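The softmax layer turns the raw scores of the last layer into probabilities that sum to one, and the predicted class is the one with the highest probability. A minimal sketch follows; the class names and scores are made up for the example.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum avoids overflow without changing the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


CLASS_NAMES = ["cat", "dog", "car"]  # hypothetical labels
scores = [2.0, 1.0, 0.1]             # hypothetical network outputs

probabilities = softmax(scores)
predicted = CLASS_NAMES[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```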

Localization

● Bounding box regression

● Sigmoid layer with 4 outputs at the end

● Via Classification
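For the regression variant, the four sigmoid outputs lie between 0 and 1 and can be read as a box in normalized image coordinates. The slides do not say how the four values are encoded; the sketch below assumes (center x, center y, width, height), and the name `decode_box` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(raw_outputs, image_width, image_height):
    """Map 4 raw network outputs to a pixel box (x_min, y_min, x_max, y_max).

    Assumed encoding: sigmoid of the outputs gives the normalized
    (center_x, center_y, width, height) of the box.
    """
    cx, cy, w, h = (sigmoid(z) for z in raw_outputs)
    cx, w = cx * image_width, w * image_width
    cy, h = cy * image_height, h * image_height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Raw outputs of 0 map to 0.5 each: a centered box half the size of the image.
print(decode_box([0.0, 0.0, 0.0, 0.0], image_width=200, image_height=100))
```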

Detection

● Multiple objects, multiple classes

● Solved using multiple networks

Segmentation

More Applications

● Compression: auto-encoders, self-organizing maps

● Image captioning: solved with recurrent architectures

● Image stylization

● Clustering

● Many more...

Conclusion

● Powerful: learns from data instead of relying on hand-crafted feature extraction

● Better than humans

● Deeper is always better (caveat: overfitting)

● More data is always better (caveats: data quality, ground truth)

Thank you!

Christoph Körner
