Context Encoders - Stanford...

Post on 25-Jun-2020

Context Encoders: Feature Learning by Inpainting
By Pathak et al. (2016)

Photo: Live on the Edge Photography

Unsupervised Semantic Feature Learning

Intro Related Work Main Contributions Results Conclusion

[Figure: tasks and methods placed along two axes, "More supervised" and "More semantic": ImageNet, Image Captioning, Learning to Generate Chairs, Image reconstruction, Semantic Inpainting, GAN, Image denoising, Context Prediction, Odometry Prediction]

Semantic Inpainting

Inputs: ( [image], [image] )

Task: Learn a function f( [image] ) = [image]

Photo: Live on the Edge Photography

Semantic Inpainting
+ For large regions, requires semantics
+ Unsupervised
- Ill-posed (not well-defined)

Photo: Zhang et al (ECCV 2016)

Hypothesis Selection in Semantic Inpainting
How to choose between possibilities?

L2: Choose them all

Adversarial: Pick the most believable
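A minimal sketch of why an L2 loss "chooses them all": when several fills are equally plausible, the single prediction that minimizes the expected squared error is their average, which matches none of them. The two toy hypotheses and the helper `expected_l2` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Two equally plausible fills for the same hole (toy 1x4 "patches").
hypotheses = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
])

def expected_l2(prediction, candidates):
    """Squared error of one prediction, averaged over every candidate fill."""
    return float(np.mean(np.sum((candidates - prediction) ** 2, axis=1)))

# The L2-optimal single prediction is the mean of the candidates.
l2_optimal = hypotheses.mean(axis=0)

loss_mean = expected_l2(l2_optimal, hypotheses)     # blurry average
loss_pick = expected_l2(hypotheses[0], hypotheses)  # one sharp hypothesis

print(l2_optimal, loss_mean, loss_pick)
```

The averaged prediction has the lower expected L2 loss even though it is the least believable fill, which is the motivation for adding an adversarial term that rewards picking a single sharp hypothesis.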

Photo: Pathak et al. (2016)

Related Work

Visual Memex
Creates a graph of previously seen objects and compares the query image to the graph

Malisiewicz et al. (2009)

Learning to Generate Chairs

Dosovitskiy et al. (2015)

Autoencoders

Shinya Yuki (2016)

Context Prediction

Doersch et al. (2015)

Learning to See by Moving

Agrawal et al. (2015)

Main Contributions

Context Aware L2

10x scaled loss in context region,
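A minimal sketch of a weighted L2 reconstruction loss in the spirit of this slide. The function name `context_aware_l2`, the separate `overlap_mask` argument, and the choice of which pixels receive the 10x weight are assumptions made for illustration; the slide only states that the loss is scaled 10x in a region next to the context.

```python
import numpy as np

def context_aware_l2(target, prediction, hole_mask, overlap_mask, overlap_weight=10.0):
    """Weighted squared error between target and prediction.

    hole_mask:    1 where pixels are inpainted with weight 1, 0 elsewhere.
    overlap_mask: 1 where the loss is scaled by `overlap_weight` (assumed to be
                  supplied by the caller and disjoint from hole_mask).
    """
    weights = hole_mask + overlap_weight * overlap_mask
    return float(np.sum(weights * (target - prediction) ** 2))

# Toy example: a 2x2 hole in a 4x4 image, with everything else up-weighted.
target = np.zeros((4, 4))
prediction = np.ones((4, 4))
hole_mask = np.zeros((4, 4))
hole_mask[1:3, 1:3] = 1.0
overlap_mask = 1.0 - hole_mask

loss = context_aware_l2(target, prediction, hole_mask, overlap_mask)
print(loss)
```

With a unit error everywhere, the 4 hole pixels contribute 4 and the 12 up-weighted pixels contribute 120, so errors next to the context dominate the loss.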

Random Patches

Inputs: ( [image], [image] )
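A minimal sketch of how random patches could be dropped from an input image. The function `random_patch_mask` and its parameters (`num_patches`, `patch_size`, `seed`) are hypothetical; the slide does not give the patch count or size.

```python
import numpy as np

def random_patch_mask(height, width, num_patches=4, patch_size=8, seed=0):
    """Return a binary mask with 1 where pixels are dropped.

    Patches may overlap, so the masked area is at most
    num_patches * patch_size ** 2 pixels.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((height, width), dtype=np.float32)
    for _ in range(num_patches):
        top = rng.integers(0, height - patch_size + 1)
        left = rng.integers(0, width - patch_size + 1)
        mask[top:top + patch_size, left:left + patch_size] = 1.0
    return mask

image = np.random.default_rng(1).random((32, 32)).astype(np.float32)
mask = random_patch_mask(32, 32)
context = image * (1.0 - mask)  # the masked image an encoder would receive
```

The pair `(context, mask)` is one plausible reading of the two inputs on this slide.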

AlexNet Architecture

Channel-Wise Fully Connected

Followed by 1x1 convolution to propagate across channels

Parameter count: 100M → <0.4M
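A minimal sketch of the parameter arithmetic behind a channel-wise fully connected layer. The sizes `m = 256` feature maps of `n x n = 6 x 6` are an assumption (an AlexNet-style encoder output), and biases are ignored; with these sizes the counts come out to about 85M and 0.33M, the same order as the figures on the slide.

```python
import numpy as np

m, n = 256, 6  # assumed: m feature maps, each n x n

# Ordinary fully connected layer: every input unit feeds every output unit.
full_fc_params = (m * n * n) ** 2
# Channel-wise fully connected: one (n*n x n*n) weight matrix per feature map.
channel_wise_params = m * (n * n) ** 2

def channel_wise_fc(features, weights):
    """features: (m, n, n); weights: (m, n*n, n*n). No mixing across channels."""
    num_maps, size, _ = features.shape
    flat = features.reshape(num_maps, size * size)
    out = np.einsum("cij,cj->ci", weights, flat)
    return out.reshape(num_maps, size, size)

rng = np.random.default_rng(0)
x = rng.standard_normal((m, n, n))
w = rng.standard_normal((m, n * n, n * n))
y = channel_wise_fc(x, w)

print(full_fc_params, channel_wise_params, y.shape)
```

Because each feature map is transformed independently, information only moves within a channel here, which is why the slide follows the layer with a 1x1 convolution to propagate across channels.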

Context Encoder Architecture

Context Encoder Architecture Continued

GAN Objective:

Adversarial Loss Term:

Context Encoder Objective:
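A sketch of the three objectives named above, written in the notation used by Pathak et al. (2016). The symbols are taken from the paper rather than from this transcript: F is the context encoder, D the discriminator, \hat{M} the binary mask of dropped pixels, and \lambda_{rec}, \lambda_{adv} the loss weights.

```latex
% GAN objective: generator G against discriminator D
\min_G \max_D \; \mathbb{E}_{x \in \mathcal{X}}\big[\log D(x)\big]
              + \mathbb{E}_{z \in \mathcal{Z}}\big[\log\big(1 - D(G(z))\big)\big]

% Reconstruction (L2) term over the masked region
\mathcal{L}_{rec}(x) = \big\| \hat{M} \odot \big(x - F((1 - \hat{M}) \odot x)\big) \big\|_2^2

% Adversarial loss term: the generator is the context encoder F
\mathcal{L}_{adv} = \max_D \; \mathbb{E}_{x \in \mathcal{X}}\big[\log D(x)
                  + \log\big(1 - D(F((1 - \hat{M}) \odot x))\big)\big]

% Context encoder objective: weighted sum of both terms
\mathcal{L} = \lambda_{rec} \mathcal{L}_{rec} + \lambda_{adv} \mathcal{L}_{adv}
```

Written this way, the adversarial term is modular: it is added to the reconstruction loss with its own weight rather than replacing it.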

Results

Feature Transfer Evaluation Methodology

● Feature transfer capability evaluated on three tasks:
  a. Classification pretraining
  b. Detection pretraining
  c. Semantic segmentation pretraining

● Compared against:
  a. Random weight initialization
  b. Autoencoder initialization
  c. Learning to see by moving (Agrawal et al.)
  d. Context prediction (Doersch et al.)
  e. Unsupervised learning with videos (Wang et al.)

Further Details

● Classification
  ○ Pascal VOC 2007 Dataset
  ○ ~10000 images for training
  ○ Output generated by voting from 10 random croppings of input image
● Detection
  ○ Pascal VOC 2007 Detection Challenge Dataset
  ○ Fast R-CNN method (Girshick, 2015) used to generate detection hypotheses from features
● Segmentation
  ○ Pascal VOC 2012 Dataset
  ○ Fully convolutional network (FCN) (Shelhamer et al., 2015) used to generate segmentation hypothesis from features
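A minimal sketch of the classification step that combines 10 random croppings of one image. Combining crops by averaging their per-class scores is an assumption about what "voting" means here, and `toy_classifier` is a hypothetical stand-in for the fine-tuned network.

```python
import numpy as np

def random_crops(image, crop_size, num_crops=10, seed=0):
    """Return `num_crops` square crops taken at random positions."""
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]
    crops = []
    for _ in range(num_crops):
        top = rng.integers(0, height - crop_size + 1)
        left = rng.integers(0, width - crop_size + 1)
        crops.append(image[top:top + crop_size, left:left + crop_size])
    return crops

def predict_by_voting(image, classify, crop_size, num_crops=10, seed=0):
    """Average per-class scores over random crops; return (label, mean_scores)."""
    crops = random_crops(image, crop_size, num_crops, seed)
    scores = np.stack([classify(crop) for crop in crops])
    mean_scores = scores.mean(axis=0)
    return int(np.argmax(mean_scores)), mean_scores

def toy_classifier(crop):
    """Hypothetical two-class scorer based on mean brightness."""
    bright = float(crop.mean())
    return np.array([1.0 - bright, bright])

image = np.full((32, 32), 0.8)
label, mean_scores = predict_by_voting(image, toy_classifier, crop_size=24)
```

Averaging over several crops makes the prediction less sensitive to where any single crop happens to fall in the image.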

Pretraining Results

Doersch et al.   65.3%   51.1%   (Modified)

Inpainting Results

Encoded Features Nearest Neighbors

Recapitulation

Paper Contributions
● Idea of using semantic inpainting as a supervisory signal for unsupervised feature learning
● Idea of using adversarial loss as a modular loss function that can be combined with other losses
● Qualitatively nice inpainting results

Negatives of Paper
● Seemed to be two “separate tasks”
  a. Unsupervised feature learning
  b. Semantic inpainting
● No feature transfer results for context encoder
● No results for how adversarial loss affects pre-trainability of context encoder features
● Worked on par with other pre-training methods

[Diagram labels: Semantic Inpainting; Feature Learning]