Spatial transformer networks

Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
Slides by Víctor Campos
Computer Vision Reading Group (28/03/2016)

Transcript of Spatial transformer networks

1. Introduction
Why do we need spatial transformer networks?

Why do we need spatial transformer networks?

Are Convolutional Neural Networks invariant to…

▪ Scale?
▪ Rotation?
▪ Translation?

CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)

Are Convolutional Neural Networks invariant to…

▪ Scale? No
▪ Rotation? No
▪ Translation? Partially
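
The "partially" for translation can be made concrete with a toy example. The sketch below assumes a plain 2x2 max-pooling layer with stride 2; the helper name `max_pool_2x2` and the 4x4 test image are illustrative and do not come from the slides. A one-pixel shift that stays inside a pooling window leaves the pooled output unchanged, while a two-pixel shift moves the activation into the next window and changes it.

```python
import numpy as np

def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling (stride 2); height and width must be even."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy image: a single active pixel in the top-left corner.
image = np.zeros((4, 4))
image[0, 0] = 1.0

shift_1 = np.roll(image, 1, axis=1)  # pixel moves to column 1 (same pooling window)
shift_2 = np.roll(image, 2, axis=1)  # pixel moves to column 2 (next pooling window)

same_after_small_shift = np.array_equal(max_pool_2x2(image), max_pool_2x2(shift_1))
same_after_large_shift = np.array_equal(max_pool_2x2(image), max_pool_2x2(shift_2))
print(same_after_small_shift, same_after_large_shift)  # True False
```

Pooling therefore absorbs only shifts smaller than its window, which is one reason a dedicated transformation module can help with larger displacements.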

A. W. Harley, "An Interactive Node-Link Visualization of Convolutional Neural Networks," in ISVC, pages 867-877, 2015

2. Spatial transformers
Understanding how they work

Intuition behind spatial transformers

Sampling!

Formulating spatial transformers

Three main differentiable blocks:
▪ Localisation network
▪ Grid generator
▪ Sampler

Grid generator: Examples

▪ Affine transform
▪ Attention model

(x_i^t, y_i^t): coordinates in the target (output) feature map

(x_i^s, y_i^s): coordinates in the source (input) feature map
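
A grid generator of this kind can be sketched in a few lines of NumPy. The version below assumes a 2x3 parameter matrix `theta` and coordinates normalised to [-1, 1], as in the original paper; the function name `affine_grid` and the example values are illustrative. For every target coordinate it computes the source coordinate as `theta` times (x_t, y_t, 1). The attention model is the constrained case theta = [[s, 0, t_x], [0, s, t_y]], which allows only isotropic scaling and translation.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map every target pixel to its source coordinate: (x_s, y_s) = theta @ (x_t, y_t, 1)."""
    # Regular target grid, normalised to [-1, 1] along both axes.
    xs = np.linspace(-1.0, 1.0, out_w)
    ys = np.linspace(-1.0, 1.0, out_h)
    x_t, y_t = np.meshgrid(xs, ys)  # each has shape (out_h, out_w)
    # Homogeneous target coordinates, one column per output pixel: shape (3, out_h * out_w).
    target = np.stack([x_t.ravel(), y_t.ravel(), np.ones(out_h * out_w)])
    source = theta @ target  # shape (2, out_h * out_w)
    return source[0].reshape(out_h, out_w), source[1].reshape(out_h, out_w)

# Identity transform: the source grid equals the target grid.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
x_id, y_id = affine_grid(identity, 3, 3)

# Attention-style transform: isotropic scale s and translation (t_x, t_y).
s, t_x, t_y = 0.5, 0.25, 0.0
attention = np.array([[s, 0.0, t_x],
                      [0.0, s, t_y]])
x_att, y_att = affine_grid(attention, 3, 3)
print(x_att[0])  # source x coordinates for the top row of the output
```

With s = 0.5 the output grid covers half the input extent, so the sampler would effectively zoom in on a region shifted to the right by t_x.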

Sampler: Mathematical formulation

Generic sampling kernel

From the grid generator

All pixels in the output feature map
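
Choosing a bilinear kernel as the generic sampling kernel, each output pixel becomes a weighted sum over the (at most four) input pixels around its sampling point: V_i = sum_n sum_m U_nm max(0, 1 - |x_i^s - m|) max(0, 1 - |y_i^s - n|). The sketch below is one way to write this for a single-channel feature map. It assumes source coordinates given in pixel units (not the normalised [-1, 1] range) and gives zero weight to anything outside the input; the function name `bilinear_sample` is illustrative.

```python
import numpy as np

def bilinear_sample(U, x_s, y_s):
    """Sample feature map U of shape (H, W) at source coordinates given in pixel units.

    Computes V_i = sum_n sum_m U[n, m] * max(0, 1 - |x_i - m|) * max(0, 1 - |y_i - n|).
    """
    H, W = U.shape
    m = np.arange(W)
    n = np.arange(H)
    # Kernel weights along each axis; nonzero only for the nearest columns / rows.
    wx = np.maximum(0.0, 1.0 - np.abs(x_s[..., None] - m))  # shape (..., W)
    wy = np.maximum(0.0, 1.0 - np.abs(y_s[..., None] - n))  # shape (..., H)
    # Sum over all input pixels (n, m) for every output location.
    return np.einsum('...n,nm,...m->...', wy, U, wx)

U = np.array([[0.0, 1.0],
              [2.0, 3.0]])
x_s = np.array([0.0, 0.5, 1.0])
y_s = np.array([0.0, 0.5, 1.0])
V = bilinear_sample(U, x_s, y_s)
print(V)  # values at the top-left pixel, the centre, and the bottom-right pixel
```

Because each weight is piecewise linear in x_s and y_s, (sub-)gradients with respect to the sampling coordinates exist, which is what allows the localisation network to be trained by backpropagation.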

3. Experiments
Evaluating spatial transformer networks

Experiment #1: Distorted MNIST

Distortions: Rotation, Translation, Projective, Elastic
Transformations: Affine, Projective, Thin Plate Spline (TPS)

Experiment #2: Multiple spatial transformers

Experiment #2: Adding two digits in an image

Other experiments: Applications of spatial transformers

▪ Street View House Numbers

▪ Fine-grained classification

4. Conclusions

Conclusions: Spatial transformer networks

▪ A module that performs spatial transformations on features has been presented
▪ It is differentiable and learnt in an end-to-end fashion
▪ No modifications to the loss function are needed
▪ It outperforms the state of the art in some tasks

Thanks! Any questions?