
Deep Image Prior for Image Segmentation

Yuki M. Asano, Christian Rupprecht and Andrea Vedaldi. University of Oxford. [email protected]

Introduction

The paper [2] shows how a convolutional neural network itself is a strong prior model on natural images and that it can be used for image denoising and inpainting. We extend this work to find out how much the architecture `naturally' captures for the task of image segmentation. This can be seen either as inpainting for segmentation labels or, given a different test image, as one-shot learning. Here we show work in progress where we give the network only a sparse input of the labels on which the loss will be computed, together with the full RGB image (see Fig. 1). We compare our results with the unsupervised GrabCut algorithm [1].

Method & Data

[Figure 1: method overview and results.
Qualitative comparison panels: GrabCut, DIPSEG (wo/RGB), DIPSEG (w/RGB); reported mIOU values: 50.55% and 53.92%.
Pipeline: RGB image, ground truth, and sampled points (here: 185) are fed to a CNN (skip-net) together with random segmentation-map pixels.
Loss = MSE(RGB) + Cross-Entropy(mask ⊙ Segmap).
Per-example IOU scores: 80.56%, 82.63%, 84.05%, 86.86%.
Training progression at epochs 0, 40, and 200, showing ground truth, prediction, RGB reconstruction, and loss.]
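The loss shown in the figure combines a pixelwise RGB reconstruction term with a cross-entropy term evaluated only at the sampled label pixels. A minimal NumPy sketch of this combination (array shapes, names, and the softmax implementation are assumptions, not taken from the poster):

```python
import numpy as np

def dipseg_loss(pred_rgb, target_rgb, pred_logits, target_labels, mask):
    """Combined loss: MSE over all RGB pixels plus softmax cross-entropy
    over the sparsely labelled pixels selected by `mask`.

    pred_rgb, target_rgb : (H, W, 3) float arrays
    pred_logits          : (H, W, K) per-class scores
    target_labels        : (H, W) integer class ids
    mask                 : (H, W) bool array, True at sampled pixels
    """
    # RGB reconstruction term over the whole image
    mse = np.mean((pred_rgb - target_rgb) ** 2)

    # Cross-entropy, computed only at the sampled pixels
    logits = pred_logits[mask]                           # (N, K)
    labels = target_labels[mask]                         # (N,)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(labels)), labels])

    return mse + ce
```

In practice the actual implementation would use an autodiff framework so the loss can be backpropagated through the skip-net; the sketch only illustrates the arithmetic of the two terms.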

Pascal VOC 2012: images with segmentations for 20 different object classes. We sample different numbers of randomly chosen pixels per class (uniform, or three `fat', spread-out blobs) to train the neural network using the RGB loss and the cross-entropy of the predicted label at these sampled points.
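The uniform per-class sampling described above can be sketched as follows (function and variable names are illustrative assumptions, not from the poster):

```python
import numpy as np

def sample_pixels_per_class(segmap, n_per_class, rng=None):
    """Return a boolean mask with up to `n_per_class` randomly chosen
    pixels for each class id present in `segmap` (shape (H, W))."""
    rng = np.random.default_rng(rng)
    mask = np.zeros(segmap.shape, dtype=bool)
    for cls in np.unique(segmap):
        # all pixel coordinates belonging to this class
        ys, xs = np.nonzero(segmap == cls)
        n = min(n_per_class, len(ys))
        idx = rng.choice(len(ys), size=n, replace=False)
        mask[ys[idx], xs[idx]] = True
    return mask
```

The returned mask plays the role of the sampled points in Fig. 1: the cross-entropy term is evaluated only where the mask is True.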

Our Method: Architecture as in [2] with a different final number of output dimensions to predict the presence of a certain class at any pixel: Deep Image Prior for Image Segmentation (DIPSEG).

Comparison [1]: Unsupervised object segmentation extraction algorithm based on Markov random fields.

Conclusion and next steps

1. Feasibility and superior performance of the DIPSEG method compared to GrabCut

2. Good segmentation performance with relatively few input pixels, which can be uniformly sampled or `fat'

3. Fat blobs allow the method to be used for quickly generating segmentation maps from rough human drawings, e.g. from Amazon Mechanical Turk

Next steps: Explore the space of different architectures and data augmentation; use bounding boxes.
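As a concrete illustration of the `fat' blob inputs mentioned above, a rough human scribble can be simulated by growing a few seed pixels of a class into blobs. A minimal NumPy-only sketch (the seed count, square blob shape, radius, and function name are assumptions):

```python
import numpy as np

def fat_blobs(segmap, cls, n_seeds=3, radius=4, rng=None):
    """Pick `n_seeds` seed pixels of class `cls` and grow each into a
    square blob of the given radius, clipped to the class region."""
    rng = np.random.default_rng(rng)
    ys, xs = np.nonzero(segmap == cls)
    idx = rng.choice(len(ys), size=min(n_seeds, len(ys)), replace=False)
    mask = np.zeros(segmap.shape, dtype=bool)
    for y, x in zip(ys[idx], xs[idx]):
        # paint a (2*radius+1)-wide square around each seed, clipped to the image
        y0, y1 = max(0, y - radius), min(segmap.shape[0], y + radius + 1)
        x0, x1 = max(0, x - radius), min(segmap.shape[1], x + radius + 1)
        mask[y0:y1, x0:x1] = True
    # keep only pixels that actually carry the class label
    return mask & (segmap == cls)
```

Such blob masks supply many more labelled pixels per annotation click than uniform sampling, which is what makes crowd-sourced rough drawings a plausible input source.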

[1] Rother, C., Kolmogorov, V., and Blake, A. "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics 23, 3 (2004), 309.
[2] Ulyanov, D., Vedaldi, A., and Lempitsky, V. Deep Image Prior. CVPR 2018.