POTTICS - THE POTTS TOPIC MODEL FOR SEMANTIC IMAGE SEGMENTATION

Christoph Dann¹, Peter Gehler², Stefan Roth¹, Sebastian Nowozin³

¹ TU Darmstadt, ² MPI-Intelligent Systems, ³ Microsoft Research

PROBLEM

Semantic Image Segmentation: Assign a semantic class in {1, ..., C} to each pixel of an image, e.g. person, dog, or car.

[Figure: example image with class legend: background, bottle, chair, diningtable, person, ...]

EXAMPLE RESULTS

[Figure: example segmentations in three panels: unary-only prediction, Pottics, ground truth]

ALGORITHM OVERVIEW

Main Idea: Capture higher-order spatial relations of segment labels through latent topics, defined on multiple binary figure-ground segmentations (from CPMC [1]). The bipartite graph structure allows efficient inference.

Input: image

1. Generate 100 binary proposal segmentations with Constrained Parametric Min-Cuts [1].

2. Intersect all proposal segmentations to produce a superpixelation (regions).

3. Label the superpixels with a probabilistic topic model.
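
Step 2 can be illustrated with a minimal NumPy sketch. It assumes that "intersection" means grouping pixels by their membership pattern across all proposals; the poster does not say whether regions are additionally split into connected components, so this sketch does not do that. The function name `superpixels_from_proposals` and the array layout are hypothetical, not taken from the authors' code.

```python
import numpy as np

def superpixels_from_proposals(masks):
    """Group pixels by their membership pattern across binary proposals.

    masks: array of shape (K, H, W), one figure-ground mask per proposal.
    Returns (regions, overlap):
      regions: int array (H, W) with region ids 0..J-1
      overlap: bool array (K, J); overlap[i, j] is True if proposal i
               contains region j.
    Assumption: a region is a set of pixels sharing the same pattern;
    it is not required to be spatially connected.
    """
    masks = np.asarray(masks).astype(np.uint8)
    k, h, w = masks.shape
    # Row p holds the K membership bits of pixel p (row-major pixel order).
    signatures = masks.reshape(k, h * w).T
    patterns, inverse = np.unique(signatures, axis=0, return_inverse=True)
    regions = inverse.reshape(h, w)
    overlap = patterns.T.astype(bool)
    return regions, overlap
```

The returned `overlap` matrix records which proposal contains which region, which is the information needed to decide where pairwise factors are placed.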

PROBABILISTIC TOPIC MODEL

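[Figure: bipartite factor graph. Each superpixel variable R_1, ..., R_5 carries a unary factor f_u; the topic variables S_1, S_2, S_3 are connected to the superpixels they overlap through pairwise factors f_p.]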

Variables:

• S_i ∈ {1, ..., T} (latent) assigns a topic to each binary proposal segmentation

• R_j ∈ {1, ..., C} specifies the class label for all pixels in the j-th superpixel

Factors:

• f_u: pre-trained unary potential based on TextonBoost; incorporates image information such as color, location, HOG, and bounding box detections [3]

• f_p: pairwise potential connecting every binary proposal segmentation to each superpixel it overlaps: f_p(S_i, R_j) = w_{S_i,R_j}, which models the consistency of a semantic class and a latent topic

Learning: the parameters w_{tc} (one weight per topic t and class c) are learned with a Structured Output SVM [2].
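
To make the factor definitions concrete, here is a hedged sketch of how a joint assignment of topics and class labels could be scored. It assumes the factors combine additively and that higher values are better; the poster does not state the sign convention. The function name `labeling_score` and the array layout are illustrative assumptions.

```python
import numpy as np

def labeling_score(unary, overlap, w, topics, labels):
    """Score of a joint assignment (assumed additive, higher is better).

    unary:   (J, C) array, unary[j, c] = f_u value of class c at superpixel j
    overlap: (K, J) bool array, True where proposal i overlaps superpixel j
    w:       (T, C) array of topic-class weights, f_p(S_i, R_j) = w[S_i, R_j]
    topics:  (K,) int array with S_i in 0..T-1
    labels:  (J,) int array with R_j in 0..C-1
    """
    unary = np.asarray(unary, dtype=float)
    overlap = np.asarray(overlap, dtype=bool)
    w = np.asarray(w, dtype=float)
    topics = np.asarray(topics)
    labels = np.asarray(labels)
    # Sum of unary values at the chosen labels.
    unary_term = unary[np.arange(len(labels)), labels].sum()
    # pair[i, j] = w[S_i, R_j]; only overlapping (i, j) pairs carry a factor.
    pair = w[topics][:, labels]
    pairwise_term = pair[overlap].sum()
    return float(unary_term + pairwise_term)
```

For example, with two superpixels, two proposals, and two topics, the pairwise term only sums the entries of `w` selected by overlapping proposal-superpixel pairs.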

NUMERICAL RESULTS

VOC (2010), all values in %

Class            UO*     RP*     SP*  Potts*     1T*  Pottics
background      80.3    80.5    77.5    78.8    78.7    80.8
plane           27.6    27.4    12.8    22.4    10.1    41.0
bicycle          0.6     0.6     0.0     0.6     0.1     3.9
bird            11.9    11.9     2.3    10.7     2.8    22.1
boat            16.0    16.1     4.8    13.7     4.8    25.3
bottle          15.2    14.9     2.3    12.7     3.8    24.2
bus             33.0    33.1    29.0    32.2    22.4    41.3
car             43.3    44.2    37.3    43.2    31.4    52.8
cat             28.8    30.4    25.4    26.5    25.0    25.3
chair            5.2     5.5     3.4     5.4     2.5     6.4
cow             10.7    10.5     2.2     7.8     2.5    20.2
table            5.4     4.7     1.0     4.5     2.7    12.5
dog             12.2    12.3     7.3    11.9     4.5    11.5
horse           16.1    16.3     3.7    13.4     6.1    18.6
motorbike       28.4    29.4    20.9    26.9    16.5    34.7
person          34.6    35.7    33.2    35.2    28.4    37.1
plant           11.0    10.3     8.4     9.7     5.7    16.2
sheep           20.0    20.9     9.9    17.4    10.8    21.0
sofa            12.8    12.9     5.7    11.7     7.4    12.3
train           30.1    30.3    19.1    28.8    18.2    39.9
tv              23.2    22.0     8.0    20.9     4.2    27.6
total           22.2    22.4    14.8    20.7    13.8    27.4
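
The poster does not say how the "total" row is computed. The short check below suggests it is consistent with an unweighted mean over the 21 classes (20 object classes plus background), at least for the UO and Pottics columns. The variable and function names are illustrative.

```python
# Per-class values copied from the UO and Pottics columns,
# in row order from background to tv.
unary_only = [80.3, 27.6, 0.6, 11.9, 16.0, 15.2, 33.0, 43.3, 28.8, 5.2, 10.7,
              5.4, 12.2, 16.1, 28.4, 34.6, 11.0, 20.0, 12.8, 30.1, 23.2]
pottics = [80.8, 41.0, 3.9, 22.1, 25.3, 24.2, 41.3, 52.8, 25.3, 6.4, 20.2,
           12.5, 11.5, 18.6, 34.7, 37.1, 16.2, 21.0, 12.3, 39.9, 27.6]

def mean_over_classes(values):
    """Unweighted mean over classes, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)

print(mean_over_classes(unary_only))  # compare with the reported total 22.2
print(mean_over_classes(pottics))     # compare with the reported total 27.4
```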

SUMMARY

• Novel CRF model beyond Potts for semantic image segmentation

• Latent topics may have a meaningful interpretation, such as "bus and car appear together"

• Substantial improvement over unary-only prediction and simple Gibbs models

• Efficient inference (here: iterated conditional modes, ICM), since topic assignments and superpixel labels form a bipartite graph
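
The bipartite structure means that, with all superpixel labels fixed, each topic can be chosen independently of the other topics, and vice versa. The following sketch shows a block-wise variant of ICM that exploits this. It is an illustration under assumptions, not the authors' implementation: it treats potentials as scores to maximize, initializes labels from the unary term alone, and uses a hypothetical function name `block_icm`.

```python
import numpy as np

def block_icm(unary, overlap, w, n_iters=20):
    """Alternate exact updates of all topics and all labels.

    unary:   (J, C) array of f_u scores per superpixel and class
    overlap: (K, J) array, nonzero where proposal i overlaps superpixel j
    w:       (T, C) array of topic-class weights
    Returns (topics, labels) at a local optimum of the assumed score
    sum_j unary[j, R_j] + sum over overlapping (i, j) of w[S_i, R_j].
    """
    unary = np.asarray(unary, dtype=float)
    overlap = np.asarray(overlap, dtype=float)
    w = np.asarray(w, dtype=float)
    labels = unary.argmax(axis=1)              # start from unary-only labels
    topics = np.zeros(overlap.shape[0], dtype=int)
    for _ in range(n_iters):
        # Topics given labels: topic_scores[i, t] = sum_j overlap[i, j] * w[t, R_j]
        topic_scores = overlap @ w[:, labels].T
        new_topics = topic_scores.argmax(axis=1)
        # Labels given topics: label_scores[j, c] = unary[j, c] + sum_i overlap[i, j] * w[S_i, c]
        label_scores = unary + overlap.T @ w[new_topics]
        new_labels = label_scores.argmax(axis=1)
        converged = (np.array_equal(new_topics, topics)
                     and np.array_equal(new_labels, labels))
        topics, labels = new_topics, new_labels
        if converged:
            break
    return topics, labels
```

Each half-step maximizes the assumed score exactly over one side of the bipartite graph, so the score never decreases. Like any ICM scheme, the result is only a local optimum and depends on the initialization.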

REFERENCES & SOURCE CODE

[1] Carreira, J., Sminchisescu, C.: Constrained Parametric Min-Cuts for Automatic Object Segmentation. In: CVPR (2010)

[2] Yu, C.N., Joachims, T.: Learning Structural SVMs with Latent Variables. In: ICML (2009)

[3] Krähenbühl, P., Koltun, V.: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In: NIPS (2011)

The source code is available at http://github.com/chrodan/pottics

* (UO) Pixel prediction [3]; (RP) f_u-only prediction; (SP) Unary potential for pixels in a proposal segmentation; (Potts) Potts model on superpixels; (1T) Pottics restricted to 1 topic