
Structured Models for Image Segmentation

Aurelien Lucchi

Wednesday February 13th, 2013

Joint work with Yunpeng Li, Kevin Smith, Raphael Sznitman, Radhakrishna Achanta, Bohumil Maco, Graham Knott, and Pascal Fua.


Image Segmentation

● Goal: partition an image into meaningful regions with respect to a particular application.


Understanding the Brain


Electron Microscopy Data

A 5 × 5 × 5 μm section taken from the CA1 hippocampus, corresponding to a 1024 × 1024 × 1000 volume (N ≈ 10^9 voxels in total).

● The human brain contains ~100 billion (10^11) neurons and ~100 trillion (10^14) synapses.


Image Segmentation

[Figure: the segmentation pipeline. Feature extraction, then classification / structured prediction; the predicted labeling is compared against the ground truth.]

Outline

1. CRF for Image Segmentation

2. Maximum Margin Training for CRFs - Cutting Plane (Structured SVM)

3. Maximum Margin Training of CRFs - Online Subgradient Descent (SGD)

4. SLIC superpixels/supervoxels


1. CRF for Image Segmentation


Structured Prediction

● Non-structured output:

  ● inputs x can be any kind of object
  ● output y is a real number

● Prediction of complex outputs:

  ● structured output y is complex (images, text, audio, ...)
  ● ad hoc definition of structured data: data that consists of several parts, where the information lies not only in the parts themselves but also in the way the parts belong together

Slide courtesy: Christoph Lampert

Structured Prediction for Images

[Figure: an input image represented by features such as histograms and filter responses.]

CRF for Image Segmentation

[Figure: data D, unary likelihood, pair-wise terms, and the resulting MAP solution.]

Maximum-a-posteriori (MAP) solution:

  y* = argmin_y E(y; D),  where  E(y; D) = Σ_i φ_i(y_i; D) + Σ_(i,j) ψ_ij(y_i, y_j; D)

Boykov and Jolly [ICCV 2001], Blake et al. [ECCV 2004]. Slide courtesy: Pushmeet Kohli

CRF for Image Segmentation

Pair-wise Terms

Favors the same label for neighboring nodes.
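A common concrete choice, not spelled out on the slide but standard in this line of work, is the contrast-sensitive Potts model (here x_i, x_j are the color values of the two neighboring pixels, and w_p, σ are model parameters):

```latex
\psi_{ij}(y_i, y_j) \;=\; w_p \,\mathbf{1}[y_i \neq y_j]\;
\exp\!\Big(-\frac{\lVert x_i - x_j\rVert^2}{2\sigma^2}\Big)
```

Disagreeing neighbors are penalized, but less so across strong image edges, so segment boundaries tend to snap to contrast.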


Energy Minimization

● MAP inference for discrete graphical models:

  ● Dynamic programming: exact on non-loopy (tree-structured) graphs.
  ● Graph cuts (Boykov, 2001): optimal solution if the energy function is submodular (see the sketch after this list).
  ● Belief propagation (Pearl, 1982): no theoretical guarantees on loopy graphs, but seems to work well in practice.
  ● Mean field (roots in statistical physics).
  ● ...
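As a minimal illustration of the graph-cut case, the sketch below builds the standard s-t graph for a binary, submodular pairwise energy and reads the MAP labeling off a minimum cut. It uses networkx for brevity (dedicated max-flow solvers are far faster); this is a textbook construction, not code from the talk.

```python
import networkx as nx

def graph_cut_map(unary, edges, w):
    """MAP labeling for E(y) = sum_i theta_i(y_i) + sum_(i,j) w_ij [y_i != y_j].

    unary[i] = (theta_i(0), theta_i(1)); w[(i, j)] >= 0 (submodularity)."""
    G = nx.DiGraph()
    for i, (theta0, theta1) in enumerate(unary):
        G.add_edge("s", i, capacity=theta1)  # cut iff node i takes label 1
        G.add_edge(i, "t", capacity=theta0)  # cut iff node i takes label 0
    for (i, j) in edges:
        G.add_edge(i, j, capacity=w[(i, j)])  # paid iff the cut separates i and j
        G.add_edge(j, i, capacity=w[(i, j)])
    energy, (source_side, _) = nx.minimum_cut(G, "s", "t")
    # nodes on the source side receive label 0, the rest label 1
    return [0 if i in source_side else 1 for i in range(len(unary))], energy

# Toy 1x3 "image": the ambiguous middle pixel is resolved by smoothness.
labels, energy = graph_cut_map(
    unary=[(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)],
    edges=[(0, 1), (1, 2)],
    w={(0, 1): 0.3, (1, 2): 0.3})
print(labels, energy)
```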


Training a Structured Model

● First rewrite the energy function as a log-linear model:

  E(x, y; w) = ⟨w, Ψ(x, y)⟩

where w is a vector of parameters to be learned from training data and Ψ is a joint feature map that maps the input-output pair into a linear feature space.
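For a pairwise CRF with K labels, one standard instantiation of the joint feature map (an illustration consistent with the slide, not its exact notation) stacks per-label unary feature sums and label-pair co-occurrence counts:

```latex
\Psi(x,y) \;=\; \Big( \sum_i \phi_i(x)\,\mathbf{1}[y_i = k] \Big)_{k=1}^{K}
\;\oplus\;
\Big( \sum_{(i,j)} \mathbf{1}[y_i = k,\; y_j = l] \Big)_{k,l=1}^{K},
\qquad
E(x,y;w) = \langle w, \Psi(x,y) \rangle
```

Both the unary and pair-wise terms of the CRF are then linear in w, which is what makes max-margin training applicable.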


Training a Structured Model

● The energy function is parametrized by the vector w. Toy example: a learned pair-wise term between labels −1 and 1:

            y_j = −1   y_j = 1
  y_i = −1     0          1
  y_i =  1     1          0

● Neighboring nodes with equal labels get low energy (0); nodes with different labels get high energy (1).

Training a Structured Model

● Maximum likelihood
● Pseudo-likelihood
● Variational approximation
● Contrastive divergence
● Maximum-margin framework

2. Maximum Margin Training for CRFs - Cutting Plane (Structured SVM)


Structured SVM

● Given a set of N training examples {x_n} with ground-truth labelings {y_n}, we can write:

  E(x_n, y_n; w) ≤ E(x_n, y; w)  for all n and all y ≠ y_n

i.e., the energy of the correct labeling must be at least as low as the energy of any incorrect labeling.

Structured SVM

[Figure: the energy of the ground-truth labeling should be lower than the energy of every other candidate labeling.]

Structured SVM

● Given a set of N training examples with ground-truth labelings, we optimize:

  min_{w, ξ≥0}  (1/2)‖w‖² + C Σ_n ξ_n
  s.t.  E(x_n, y; w) − E(x_n, y_n; w) ≥ Δ(y_n, y) − ξ_n  for all n and all y ≠ y_n

where Δ(y_n, y) is the task loss (e.g., per-pixel Hamming distance) and the ξ_n are slack variables.

Structured SVM

● To deal with the exponential number of constraints, Tsochantaridis et al.* proposed a cutting-plane algorithm.
● It iteratively finds the most violated constraint via loss-augmented inference,

  ŷ_n = argmin_y [ E(x_n, y; w) − Δ(y_n, y) ],

and adds it to the working set of constraints (a toy implementation is sketched below).

* I. Tsochantaridis et al., Support Vector Machine Learning for Interdependent and Structured Output Spaces, ICML, 2004.
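A self-contained toy version of the loop (a sketch: outputs are length-3 binary chains so both inferences can be brute-forced, and the inner QP is replaced by a crude subgradient solver; none of this is the authors' actual implementation):

```python
import itertools
import numpy as np

LABELINGS = list(itertools.product([0, 1], repeat=3))  # tiny output space

def psi(x, y):  # joint feature map: (label-weighted unary sum, #agreements)
    unary = sum(xi if yi else -xi for xi, yi in zip(x, y))
    pair = sum(1.0 if a == b else -1.0 for a, b in zip(y, y[1:]))
    return np.array([unary, pair])

def energy(w, x, y):          # E(x, y; w) = <w, psi(x, y)>, lower is better
    return float(w @ psi(x, y))

def delta(y, y_hat):          # Hamming task loss
    return float(sum(a != b for a, b in zip(y, y_hat)))

def most_violated(w, x, y):   # loss-augmented inference (brute force here)
    return min(LABELINGS, key=lambda yh: energy(w, x, yh) - delta(y, yh))

def solve_working_set(ws, C, dim=2, steps=500):
    # Crude stand-in for the QP: subgradient descent on the hinges in `ws`.
    w = np.zeros(dim)
    for t in range(1, steps + 1):
        g = w.copy()  # gradient of (1/2)||w||^2
        for (x, y, yh) in ws:
            if delta(y, yh) + energy(w, x, y) - energy(w, x, yh) > 0:
                g += C * (psi(x, y) - psi(x, yh))  # active hinge subgradient
        w -= g / t
    return w

def cutting_plane_ssvm(data, C=1.0, tol=1e-3, rounds=20):
    w, ws = np.zeros(2), []
    for _ in range(rounds):
        added = 0
        for (x, y) in data:
            yh = most_violated(w, x, y)       # candidate constraint
            if delta(y, yh) + energy(w, x, y) - energy(w, x, yh) > tol:
                ws.append((x, y, yh))         # add it to the working set
                added += 1
        if added == 0:                        # working set approximates the QP
            return w
        w = solve_working_set(ws, C)          # re-fit w on the working set
    return w

w = cutting_plane_ssvm([(np.array([1.0, 0.2, -0.5]), (1, 1, 0)),
                        (np.array([-0.8, -0.1, 0.6]), (0, 0, 1))])
print(w)
```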


Illustrative Example

SSVM problem:
● Exponentially many constraints.
● Most are dominated by a small set of “important” constraints.

Cutting-plane approach:
● Repeatedly finds the next most violated constraint...
● ...until the set of constraints is a good approximation.

An Introduction to Structured Output Learning Using Support Vector Machines, Yisong Yue and Thorsten Joachims.


Drawbacks of SSVM

● Finding the most violated constraint at each iteration of the cutting plane is intractable in loopy graphical models.
● Approximations can sometimes be imprecise enough to have a major impact on learning.
● An unsatisfactory constraint can cause the cutting-plane algorithm to terminate prematurely.

3. Maximum Margin Training of Structured Models: Online Subgradient Descent (SGD)

SGD Approach

● Reformulate the problem as an unconstrained optimization:

  min_w L(w) = (λ/2)‖w‖² + Σ_n max_y [ Δ(y_n, y) − E(x_n, y; w) + E(x_n, y_n; w) ]

● SGD approach: compute a subgradient g of L(w) and step in the negative direction of g.

● See N. Ratliff et al., (Online) Subgradient Methods for Structured Prediction, AISTATS, 2007.

Subgradient

● A subgradient of the convex loss function L at w is a vector g such that:

  L(w′) ≥ L(w) + ⟨g, w′ − w⟩  for all w′

● The set of all subgradients at w is called the subdifferential.

● If L is differentiable at w, then the gradient ∇L(w) is the only subgradient.

Subgradient

● How to compute a subgradient?

● A subgradient can be obtained by solving the loss-augmented inference problem

  ŷ_n = argmin_y [ E(x_n, y; w) − Δ(y_n, y) ]

and taking g = λw + Σ_n [ Ψ(x_n, y_n) − Ψ(x_n, ŷ_n) ].

SGD Approach
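A minimal sketch of the corresponding update, reusing the toy `psi`, `energy`, `delta`, and `most_violated` helpers from the cutting-plane example above (illustrative, not the talk's code): one loss-augmented inference per example, then a step along the negative subgradient.

```python
import numpy as np

def sgd_ssvm(data, lam=0.01, epochs=50):
    w, t = np.zeros(2), 0
    for _ in range(epochs):
        for (x, y) in data:
            t += 1
            yh = most_violated(w, x, y)        # loss-augmented inference
            # subgradient of (lam/2)||w||^2 + hinge, taken at the maximizer yh;
            # it vanishes automatically when the ground truth wins (yh == y)
            g = lam * w + (psi(x, y) - psi(x, yh))
            w -= g / (lam * t)                 # diminishing 1/(lam*t) step size
    return w
```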


SGD Approach

● Convergence guarantees: the suboptimality L(w_t) − min_w L(w) goes to 0 with an appropriate (diminishing) step size.

What Can We Say About Approximate Subgradients?

● ε-subgradients: g is an ε-subgradient of L at w if

  L(w′) ≥ L(w) + ⟨g, w′ − w⟩ − ε  for all w′

What Can We Say About Approximate Subgradients?

● Convergence guarantees (see S. M. Robinson, Linear convergence of epsilon-subgradient descent methods for a class of convex functions, Mathematical Programming, 86:41–50, 1999): with an appropriate step size, the suboptimality goes to 0 up to an error governed by ε.

Proposed Algorithm

● Goal: better estimate the subgradient by using working sets of constraints.

● Algorithm:
  ● First solve the loss-augmented inference to find a constraint and add it to the working set.
  ● Then step in the opposite direction of the subgradient, computed as an average over the violated constraints in the working set.


Proposed Algorithm
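The algorithm itself was shown as a figure; the sketch below captures the idea under the same toy setup as before (hypothetical helper names, one working set per example): grow a working set with each new loss-augmented constraint, then average the subgradients of the constraints that are still violated.

```python
import numpy as np

def working_set_sgd(data, lam=0.01, epochs=50):
    w, t = np.zeros(2), 0
    W = [[] for _ in data]                      # one working set per example
    for _ in range(epochs):
        for i, (x, y) in enumerate(data):
            t += 1
            W[i].append(most_violated(w, x, y))  # new constraint from inference
            violated = [yh for yh in W[i]        # keep only violated constraints
                        if delta(y, yh) + energy(w, x, y) - energy(w, x, yh) > 0]
            g = lam * w
            if violated:                         # average their subgradients
                g = g + np.mean([psi(x, y) - psi(x, yh) for yh in violated], axis=0)
            w -= g / (lam * t)
    return w
```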


EM Segmentation Results

Segmentation performance is measured with the Jaccard index.

[Lucchi] Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, A. Lucchi et al.
[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.
[SSVM] Support Vector Machine Learning for Interdependent and Structured Output Spaces, I. Tsochantaridis et al.
[Samplerank] SampleRank: Training Factor Graphs with Atomic Gradients, M. Wick et al.

MSRC Segmentation Results

All the methods in the top part of the table were optimized for the average score.

[Ladicky] What, Where and How Many? Combining Object Detectors and CRFs, L. Ladicky et al.
[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.
[SSVM] Support Vector Machine Learning for Interdependent and Structured Output Spaces, I. Tsochantaridis et al.
[Yao] Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation, J. Yao et al.

MSRC Segmentation Results

[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.


Time Analysis

● Sampling can replace the more expensive inference step without much performance loss, leading to significantly lower learning time.

Running time is reported for T = 1000 iterations. Computational overhead = the increase in running time caused by maintaining the working set.

[SGD+inference] (Online) Subgradient Methods for Structured Prediction, N. Ratliff et al.
[Samplerank] SampleRank: Training Factor Graphs with Atomic Gradients, M. Wick et al.

Summary

● Structured learning makes it possible to jointly learn all the parameters of the model (unary and pair-wise terms).

● Working sets of constraints produce better subgradient estimates and higher-quality solutions.

Future Challenges

● Better energy functions: fully connected CRFs, ...
● Higher-order potentials

[Figure: graphical models with increasing connectivity.]

4. SLIC superpixels/supervoxels

SLIC Superpixels Compared to State-of-the-Art Superpixel Methods, R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2012.

SLIC Superpixels

● Clusters pixels in the combined five-dimensional space of color and image-plane coordinates (CIELAB color plus x, y position, i.e. labxy).

● Efficiently generates compact, nearly uniform superpixels.

SLIC Superpixels Compared to State-of-the-Art Superpixel Methods, T-PAMI, 2012. Source code available online.
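For reference, the scikit-image re-implementation mentioned later in the talk can be used as below (a usage sketch; the file names are placeholders):

```python
from skimage import io, segmentation

image = io.imread("input.jpg")                  # placeholder image path
# n_segments sets the approximate superpixel count; compactness trades
# spatial proximity against color proximity in the 5-D labxy space.
labels = segmentation.slic(image, n_segments=400, compactness=10, start_label=1)
outlined = segmentation.mark_boundaries(image, labels)  # draw superpixel edges
io.imsave("superpixels.png", (outlined * 255).astype("uint8"))
```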


SLIC Supervoxels

Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, A. Lucchi, K. Smith, R. Achanta, G. Knott, P. Fua, IEEE Transactions on Medical Imaging, 2011.

SLIC Superpixels

Under-segmentation error = the amount by which superpixels overlap, or “leak across,” the segment boundaries of a known ground truth.

Boundary recall = the fraction of ground-truth edges that fall within one pixel of at least one superpixel boundary (a sketch of this metric follows the reference list below).

● GS04: Efficient Graph-Based Image Segmentation, P. Felzenszwalb and D. Huttenlocher.
● NC05: Normalized Cuts and Image Segmentation, J. Shi and J. Malik.
● QS09: Quick Shift and Kernel Methods for Mode Seeking, A. Vedaldi et al.
● TP09: TurboPixels: Fast Superpixels Using Geometric Flows, A. Levinshtein et al.
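A minimal sketch of the boundary-recall computation (assuming `gt` and `sp` are integer label maps of equal shape; not the paper's evaluation code):

```python
import numpy as np
from scipy import ndimage

def boundary_recall(gt, sp, tol=1):
    """Fraction of ground-truth boundary pixels lying within `tol` pixels
    of a superpixel boundary."""
    def boundaries(labels):
        b = np.zeros(labels.shape, dtype=bool)
        b[:-1, :] |= labels[:-1, :] != labels[1:, :]   # label changes between rows
        b[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # label changes between columns
        return b
    sp_near = ndimage.binary_dilation(boundaries(sp), iterations=tol)
    gt_b = boundaries(gt)
    return (gt_b & sp_near).sum() / max(gt_b.sum(), 1)
```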


SLIC Superpixels

● GKM: 10 iterations of k-means.
● GS04: Efficient Graph-Based Image Segmentation, P. Felzenszwalb and D. Huttenlocher.
● NC05: Normalized Cuts and Image Segmentation, J. Shi and J. Malik.
● QS09: Quick Shift and Kernel Methods for Mode Seeking, A. Vedaldi et al.
● TP09: TurboPixels: Fast Superpixels Using Geometric Flows, A. Levinshtein et al.

SLIC Superpixels

● A simple but successful approach: more than 100 citations, re-implemented in VLFeat and scikit-image, plus a GPU implementation.

● Code: http://cvlab.epfl.ch/~lucchi/code.php

Questions


Resources

● http://cvlab.epfl.ch/research/medical/em/mitochondria/
● http://cvlab.epfl.ch/research/medical/em/synapses/
● http://cvlab.epfl.ch/~lucchi/

Credits

● Slides courtesy of:
  ● Christoph Lampert (Learning with Structured Inputs and Outputs)
  ● Pushmeet Kohli (Efficiently Solving Dynamic Markov Random Fields using Graph Cuts)
  ● Ben Taskar (Structured Prediction: A Large Margin Approach, NIPS tutorial)
  ● Yisong Yue and Thorsten Joachims (An Introduction to Structured Output Learning Using Support Vector Machines)