Spatial transformer networks

Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu [arXiv] [GitXiv] [slides] [video]

Slides by Víctor Campos [GDoc]
Computer Vision Reading Group (28/03/2016)

http://arxiv.org/abs/1506.02025

http://gitxiv.com/posts/5WTXTLuEA4Hd8W84G/spatial-transformer-networks

http://lear.inrialpes.fr/workshop/allegro2015/slides/zisserman02.pdf

https://www.youtube.com/watch?v=Ywv0Xi2-14Y

https://docs.google.com/presentation/d/1lXUnecjeGJh2xnYUR1phr8ivt5ms5DUlJHeq_1bNWuM/edit?usp=sharing

https://imatge.upc.edu/web/teaching/computer-vision-reading-group

1. Introduction: Why do we need spatial transformer networks?

Why do we need spatial transformer networks?

Are Convolutional Neural Networks invariant to…

▪ Scale?
▪ Rotation?
▪ Translation?


CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)

http://cs231n.github.io/convolutional-networks/

http://cs231n.stanford.edu/index.html




▪ Scale? No
▪ Rotation? No
▪ Translation? Partially
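The "partially" for translation can be illustrated with a toy example. The sketch below assumes that the partial invariance comes from max pooling, which is one common explanation and not something the slide states explicitly; `max_pool_1d` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np

def max_pool_1d(x, size=2):
    """Non-overlapping 1-D max pooling (window == stride == size)."""
    x = np.asarray(x)
    return x.reshape(-1, size).max(axis=1)

signal = np.array([0, 0, 1, 0, 0, 0, 0, 0])   # a single "feature" at index 2
shift_by_1 = np.roll(signal, 1)               # feature moves to index 3
shift_by_2 = np.roll(signal, 2)               # feature moves to index 4

pooled = max_pool_1d(signal)
pooled_shift_1 = max_pool_1d(shift_by_1)      # feature stays in the same window
pooled_shift_2 = max_pool_1d(shift_by_2)      # feature crosses into the next window

print(pooled, pooled_shift_1, pooled_shift_2)
```

A one-pixel shift that stays inside a pooling window leaves the pooled output unchanged, while a larger shift changes it, so the invariance only holds for small displacements.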


A. W. Harley, "An Interactive Node-Link Visualization of Convolutional Neural Networks," in ISVC, pages 867-877, 2015

http://scs.ryerson.ca/~aharley/vis/conv/

2. Spatial transformers: Understanding how they work

Intuition behind spatial transformers

Sampling!

Formulating spatial transformers

Three main differentiable blocks:
▪ Localisation network
▪ Grid generator
▪ Sampler
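As a rough illustration of the first block, the sketch below uses a hypothetical single linear layer as the localisation network; in practice this block can be any network that regresses the transformation parameters from the input feature map. The names `localisation_net`, `weights` and `bias` are made up for this example, and starting from the identity transform is an assumed, sensible initialisation, not something shown on the slide.

```python
import numpy as np

def localisation_net(feature_map, weights, bias):
    """Hypothetical localisation network: one linear layer that
    regresses the six parameters of a 2x3 affine matrix."""
    theta = weights @ feature_map.ravel() + bias
    return theta.reshape(2, 3)

rng = np.random.default_rng(0)
feature_map = rng.random((4, 4))   # toy single-channel input feature map

# Zero weights plus an identity bias: the module starts as a no-op warp.
weights = np.zeros((6, feature_map.size))
bias = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])

theta = localisation_net(feature_map, weights, bias)
print(theta)
```

The resulting `theta` is what the grid generator consumes; because every step is differentiable, `weights` and `bias` can be trained with the rest of the network.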

Grid generator: Examples

▪ Affine transform
▪ Attention model

(x_i^t, y_i^t): coordinates in the target (output) feature map

(x_i^s, y_i^s): coordinates in the source (input) feature map
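To make the two examples concrete, here is a minimal sketch of an affine grid generator. It assumes the formulation from the original paper, where each target coordinate is mapped to a source coordinate by a 2x3 matrix, and where the attention model is the constrained case with only an isotropic scale `s` and a translation `(tx, ty)`. The function name `affine_grid`, the normalisation of coordinates to [-1, 1], and the numeric values are choices made for this illustration.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """For every target pixel, compute the source coordinates (x_s, y_s)
    given a 2x3 affine matrix theta. Coordinates are normalised to [-1, 1]."""
    theta = np.asarray(theta, dtype=float)
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w),
                         indexing="ij")
    # Homogeneous target coordinates, shape (3, out_h * out_w).
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    source = theta @ target                      # shape (2, out_h * out_w)
    x_s = source[0].reshape(out_h, out_w)
    y_s = source[1].reshape(out_h, out_w)
    return x_s, y_s

# General affine transform: six free parameters (identity shown here).
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0]]

# Attention model: isotropic scale s and translation (tx, ty) only.
s, tx, ty = 0.5, 0.25, -0.25
attention = [[s, 0.0, tx],
             [0.0, s, ty]]

x_id, y_id = affine_grid(identity, 3, 3)
x_att, y_att = affine_grid(attention, 3, 3)
print(x_att[0], y_att[:, 0])
```

With the attention matrix, the sampling grid covers a half-size window of the input centred at `(tx, ty)`, which amounts to cropping and zooming.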

Sampler: Mathematical formulation

Generic sampling kernel

From the grid generator

All pixels in the output feature map
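The three annotations above refer to a sum over all input pixels, weighted by a sampling kernel evaluated at the source coordinates supplied by the grid generator. The sketch below picks the bilinear kernel, k(d) = max(0, 1 - |d|), as one possible instance of the generic kernel. It assumes a single-channel input and source coordinates expressed in pixel units (not normalised); `bilinear_sample` is a name chosen for this example.

```python
import numpy as np

def bilinear_sample(U, x_s, y_s):
    """Sample input U (H, W) at source coordinates x_s, y_s (H', W'),
    given in pixel units, using a bilinear kernel."""
    H, W = U.shape
    m = np.arange(W)   # input column indices
    n = np.arange(H)   # input row indices
    # Kernel weights for every output pixel against every input column/row.
    kx = np.maximum(0.0, 1.0 - np.abs(x_s[..., None] - m))   # (H', W', W)
    ky = np.maximum(0.0, 1.0 - np.abs(y_s[..., None] - n))   # (H', W', H)
    # V[i, j] = sum_n sum_m U[n, m] * kx[i, j, m] * ky[i, j, n]
    return np.einsum("ijn,nm,ijm->ij", ky, U, kx)

U = np.arange(16, dtype=float).reshape(4, 4)

# Identity grid: every output pixel samples the matching input pixel.
ys, xs = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij")
V_identity = bilinear_sample(U, xs, ys)

# A single sample halfway between four pixels averages them.
V_half = bilinear_sample(U, np.array([[0.5]]), np.array([[0.5]]))
print(V_half)
```

Because the output is a weighted sum, gradients flow both to the input values and, through the kernel weights, to the sampling coordinates. Coordinates that fall outside the input receive zero weight in this sketch.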

3. Experiments: Evaluating spatial transformer networks

Experiment #1: Distorted MNIST

Distortions: Rotation, Translation, Projective, Elastic
Transformations: Affine, Projective, Thin Plate Spline (TPS)

Experiment #2: Multiple spatial transformers

Experiment #2: Adding two digits in an image

Other experiments: Applications of spatial transformers

▪ Street View House Numbers

▪ Fine-grained classification

4. Conclusions

Conclusions: Spatial transformer networks

▪ A module that applies spatial transformations to feature maps has been presented
▪ It is differentiable and learnt in an end-to-end fashion
▪ No modifications to the loss function are needed
▪ Outperforms the state of the art in some tasks

http://www.youtube.com/watch?v=Ywv0Xi2-14Y

Thanks! Any questions?
