CVPR2010: higher order models in computer vision: Part 1, 2
Tractable Higher Order Models in Computer Vision
Carsten Rother Sebastian Nowozin
Microsoft Research Cambridge
Schedule
8:30-9:00 Introduction
9:00-10:00 Models: small cliques and special potentials
10:00-10:30 Tea break
10:30-12:00 Inference: relaxation techniques: LP, Lagrangian, dual decomposition
12:00-12:30 Models: global potentials and global parameters + discussion
A gentle intro to MRFs
Goal: given observed data z, infer the unknown (latent) variables x:

P(x|z) = P(z|x) P(x) / P(z) ∝ P(z|x) P(x)

with z = (R,G,B)^n the image and x ∈ {0,1}^n the labelling.

Posterior probability = likelihood (data-dependent) × prior (data-independent)

Maximum a Posteriori (MAP): x* = argmax_x P(x|z)
Likelihood: P(x|z) ∝ P(z|x) P(x)

[Figure: per-pixel log-likelihoods log P(z_i|x_i=0) and log P(z_i|x_i=1) plotted over red/green colour space]

Maximum likelihood: x* = argmax_x P(z|x) = argmax_x ∏_i P(z_i|x_i)
Prior: P(x|z) ∝ P(z|x) P(x)

P(x) = 1/f ∏_{i,j ∈ N} θ_ij(x_i,x_j),  with "partition function"  f = Σ_x ∏_{i,j ∈ N} θ_ij(x_i,x_j)

"Ising prior": θ_ij(x_i,x_j) = exp{-|x_i - x_j|}   (exp{-1} ≈ 0.37; exp{0} = 1)
Prior

Solutions with highest probability (modes): P(x) = 0.012, P(x) = 0.012, P(x) = 0.011

Pure prior model: fair samples

The smoothness prior needs the likelihood:  P(x) = 1/f ∏_{i,j ∈ N} exp{-|x_i - x_j|}
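On a tiny grid the pure Ising prior can be evaluated exhaustively; the sketch below (a 2×2 grid chosen only for illustration) confirms that the constant labellings are the modes of the prior:

```python
import itertools
import math

# Ising prior P(x) = (1/f) * prod_{i,j in N} exp(-|x_i - x_j|) on a 2x2 grid
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]  # 4-neighbourhood of pixels 0..3

def unnormalized(x):
    return math.exp(-sum(abs(x[i] - x[j]) for i, j in edges))

states = list(itertools.product((0, 1), repeat=4))
f = sum(unnormalized(x) for x in states)          # partition function
prior = {x: unnormalized(x) / f for x in states}  # normalized prior
modes = sorted(prior, key=prior.get, reverse=True)[:2]
```

Any non-constant labelling has at least two disagreeing edges, so its probability drops by a factor exp{-2} relative to the two constant labellings.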
Weighting prior and likelihood:

E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j)

[Figure: segmentation results for w = 0, 10, 40, 200]
Posterior distribution ("Gibbs" distribution):

P(x|z) = 1/f(z,w) exp{-E(x,z,w)}   with   P(x|z) ∝ P(z|x) P(x)

Energy:  E(x,z,w) = Σ_i θ_i(x_i,z_i) + w Σ_{i,j} θ_ij(x_i,x_j)
(unary terms = likelihood; pairwise terms = prior)

θ_i(x_i,z_i) = -log[ P(z_i|x_i=1) x_i + P(z_i|x_i=0) (1-x_i) ]
θ_ij(x_i,x_j) = |x_i - x_j|
Energy minimization

-log P(x|z) = log f(z,w) + E(x,z,w)

so the MAP solution is the same as the minimum-energy solution:

x* = argmax_x P(x|z) = argmin_x E(x,z,w)

with  P(x|z) = 1/f(z,w) exp{-E(x,z,w)}  and partition function  f(z,w) = Σ_x exp{-E(x,z,w)}
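The MAP = minimum-energy equivalence can be checked by brute force on a toy chain model (a sketch; the unary costs below are made up for illustration, not taken from the slides):

```python
import itertools
import math

# Toy binary MRF on a 4-pixel chain: E(x) = sum_i theta_i(x_i) + w * sum |x_i - x_j|
unary = [(0.2, 1.5), (1.2, 0.3), (1.0, 0.4), (0.1, 2.0)]  # (cost x_i=0, cost x_i=1)
w = 1.0

def energy(x):
    e = sum(unary[i][xi] for i, xi in enumerate(x))
    e += w * sum(abs(x[i] - x[i + 1]) for i in range(3))  # Ising pairwise terms
    return e

states = list(itertools.product((0, 1), repeat=4))
f = sum(math.exp(-energy(x)) for x in states)              # partition function f(z,w)
posterior = {x: math.exp(-energy(x)) / f for x in states}  # Gibbs distribution
x_map = max(posterior, key=posterior.get)                  # argmax P(x|z)
x_min = min(states, key=energy)                            # argmin E(x,z,w)
```

With this weight the smoothness term overrides the two pixels that individually prefer label 1, so the MAP labelling is constant.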
Model questions: Are the variables discrete or continuous? If discrete, how many labels? Is the space discrete or continuous? What are the dependencies between variables? How many variables are there? …
Random Field Models for Computer Vision

Inference:
• Combinatorial optimization, e.g. graph cut
• Message passing, e.g. BP, TRW
• Iterated Conditional Modes (ICM)
• LP relaxation, e.g. cutting plane
• Problem decomposition + subgradient
• …

Learning:
• Exhaustive search (grid search)
• Pseudo-likelihood approximation
• Training in pieces
• Max-margin
• …

Applications: 2D/3D image segmentation, object recognition, 3D reconstruction, stereo / optical flow, image denoising, texture synthesis, pose estimation, panoramic stitching, …
Introducing Factor Graphs

Ways to write probability distributions as graphical models:
• Directed graphical models
• Undirected graphical models ("traditionally used for MRFs")
• Factor graphs ("best way to visualize the underlying energy")

References:
• Pattern Recognition and Machine Learning [Bishop '08, book, chapter 8]
• Several lectures at the Machine Learning Summer School 2009 (see video lectures)
Factor Graphs

P(x) ∝ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)   ("4 factors")

[Factor graph over the unobserved variables x1,…,x5; variables in the same factor are connected via a factor node]

Gibbs distribution:  P(x) ∝ exp{-E(x)},  E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)
Definition "Order": the arity (number of variables) of the largest factor.

P(x) ∝ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)

[Factor graph of order 3: one factor of arity 3, three factors of arity 2]

Extras:
• I will use "factor" and "clique" in the same way (not fully correct, since a clique potential may or may not decompose into factors)
• The definition of "order" is the same for cliques and factors
• Markov property: a random field with low-order factors/cliques

[Undirected model over x1,…,x5 with a triple clique]
Random Fields in Vision

• 4-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j)
• Higher(8)-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
• Higher-order MRF (order n):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j) + θ(x_1,…,x_n)
• MRF with global variables ω (order 2):  E(x,ω) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
4-connected: Segmentation

E(x) = Σ_i θ_i(x_i,z_i) + Σ_{i,j ∈ N4} θ_ij(x_i,x_j)

[Factor graph: observed variables z_i; unobserved variables x_i, x_j]
4-connected: Segmentation (CRF)

E(x) = Σ_i θ_i(x_i,z_i) + Σ_{i,j ∈ N4} θ_ij(x_i,x_j,z_i,z_j)

Conditional Random Field (CRF): no pure prior, since the pairwise terms also depend on the observed variables.

[Factor graphs of the MRF and the CRF: observed variables z_i, z_j; unobserved (latent) variables x_i, x_j]
4-connected: Stereo matching  [Boykov et al. '01]

[Figure: left image (a), right image (b), ground-truth depth]

• Images are rectified
• Ignore occlusion for now

Labels: d_i (depth/shift);  Energy: E(d): {0,…,D-1}^n → R
Random Fields in Vision

• 4-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j)
• Higher(8)-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
• Higher-order MRF (order n):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j) + θ(x_1,…,x_n)
• MRF with global variables ω (order 2):  E(x,ω) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
Highly-connected: discretization artefacts

Higher connectivity can model the true Euclidean length  [Boykov et al. '03, '05]

[Figure: boundary length under a 4-connected vs. an 8-connected neighbourhood, compared with the Euclidean length]

3D reconstruction  [slide credits: Daniel Cremers]
Stereo with occlusion  [Kolmogorov et al. '02]

Each pixel is connected to D pixels in the other image:  E(d): {1,…,D}^2n → R

[Figure: ground truth; stereo with occlusion [Kolmogorov et al. '02]; stereo without occlusion [Boykov et al. '01]]
Texture De-noising

[Figure: test image; test image with 60% noise; training images; result of a 4-connected MRF (neighbours shown); result of a 9-connected MRF (7 attractive, 2 repulsive)]
Random Fields in Vision

• 4-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j)
• Higher(8)-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
• Higher-order MRF (order n):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j) + θ(x_1,…,x_n)
• MRF with global variables ω (order 2):  E(x,ω) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
Reason 4: Use non-local parameters: interactive segmentation (GrabCut)

[Figure: interactive segmentation, [Boykov and Jolly '01] vs. GrabCut [Rother et al. '04]]

MRF with global parameters: interactive segmentation (GrabCut)  [Rother et al. Siggraph '04]

An object is a compact set of colors.
[Figure: foreground/background colour distributions over the red channel]

Model segmentation and colour model w jointly:

E(x,w) = Σ_i θ_i(x_i,w) + Σ_{i,j ∈ N4} θ_ij(x_i,x_j),   E(x,w): {0,1}^n × {GMMs} → R
Latent/Hidden CRFs

• ObjCut [Kumar et al. '05]; Deformable Part Model [Felzenszwalb et al. CVPR '08]; PoseCut [Bray et al. '06]; LayoutCRF [Winn et al. '06]
• Hidden variables are never observed (neither in training nor at test time), e.g. parts
• Maximize over the hidden variables
• ML: Deep Belief Networks, Restricted Boltzmann Machines [Hinton et al.] (often done with sampling)

[Figure: "parts", "instance label", "instance"; LayoutCRF [Winn et al. '06]]
Random Fields in Vision

• 4-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j)
• Higher(8)-connected, pairwise MRF (order 2):  E(x) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
• Higher-order MRF (order n):  E(x) = Σ_{i,j ∈ N4} θ_ij(x_i,x_j) + θ(x_1,…,x_n)
• MRF with global variables ω (order 2):  E(x,ω) = Σ_{i,j ∈ N8} θ_ij(x_i,x_j)
First Example

This tutorial:
1. What higher-order models have been used in vision?
2. How is MAP inference done for those models?
3. What is the relationship between higher-order MRFs and MRFs with global variables?

[Figure: user input; standard MRF result; result with a connectivity constraint]

E(x) = P(x) + h(x)   with   h(x) = { ∞ if x is not 4-connected; 0 otherwise }

Ising prior alone: smooths the boundary and removes noise. With the connectivity term: smooths the boundary only.
Inference (very brief summary)

• Message passing techniques (BP, TRW, TRW-S)
  – Defined on the factor graph ([Potetz '07, Lan '06])
  – Can be applied to any model in theory (higher order, multi-label)
• LP relaxation (… more in part III)
  – Relax the original problem ({0,1} to [0,1]) and solve with existing techniques (e.g. subgradient)
  – Can be applied to any model (depending on the solver used)
  – Connections to TRW (message passing)
Inference (very brief summary)

• Dual/problem decomposition (… more in part III)
  – Decompose the (NP-)hard problem into tractable ones; solve with subgradient
  – Can be applied to any model (depending on the solver used)
• Combinatorial optimization (… more in part II)
  – Binary, pairwise MRF: graph cut, BHS (QPBO)
  – Multi-label, pairwise: move making; transformation
  – Binary, higher-order factors: transformation
  – Multi-label, higher-order factors: move making + transformation
Inference for higher-order models (very brief summary)

• Arbitrary potentials are only tractable for order < 7 (memory, computation time)
• For order ≥ 7, the potentials need some structure that can be exploited to make them tractable
Forthcoming book!  Advances in Markov Random Fields for Computer Vision (Blake, Kohli, Rother)

• MIT Press (probably end of 2010)
• Most topics of this course and much, much more
• Contributors: the usual suspects: the editors plus Boykov, Kolmogorov, Weiss, Freeman, Komodakis, …
Schedule
8:30-9:00 Introduction
9:00-10:00 Models: small cliques and special potentials
10:00-10:30 Tea break
10:30-12:00 Inference: relaxation techniques: LP, Lagrangian, dual decomposition
12:00-12:30 Models: global potentials and discussion
Small cliques (<7) (and transformation approach)
Optimization: Binary, Pairwise

E(x) = Σ_i θ_i(x_i) + Σ_{i,j ∈ N} θ_ij(x_i,x_j),   x_i ∈ {0,1}

Submodular:
• θ_ij(0,0) + θ_ij(1,1) ≤ θ_ij(0,1) + θ_ij(1,0)
• The condition holds naturally for many vision problems (e.g. segmentation: |x_i - x_j|)
• Graph cut computes the global optimum efficiently (~0.5 sec for 1 MPixel [Boykov, Kolmogorov '01])

Non-submodular:
• BHS algorithm (also called QPBO) ([Boros, Hammer, and Sun '91, Kolmogorov et al. '07])
• Graph cut on a special graph; output in {0, 1, '?'}
• Partial optimality (various solutions for the '?' nodes)
• Solves the underlying LP relaxation
• Quality depends on the application (see [Rother et al. CVPR '07])
• Extensions exist: QPBOP, QPBOI (see [Rother et al. CVPR '07, Woodford et al. '08])
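The submodularity condition on a pairwise term is easy to test directly; a minimal sketch with 2×2 cost tables (the tables are illustrative choices):

```python
# theta[a][b] holds theta_ij(x_i = a, x_j = b)
def is_submodular(theta):
    # theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0)
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]

ising = [[0, 1], [1, 0]]       # |x_i - x_j|: attractive, submodular
repulsive = [[1, 0], [0, 1]]   # prefers x_i != x_j: not submodular
```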
Optimization: Binary, Pairwise

Quadratic pseudo-Boolean optimization (QPBO): f: B^2 → R

f(x1,x2) = θ11 x1x2 + θ10 x1(1-x2) + θ01 (1-x1)x2 + θ00 (1-x1)(1-x2)
         = a x1x2 + b x1 + c x2 + d

Reminder: encoding for graph cut

[Graph with source s, sink t, and nodes x1, x2; edge weights Θ00, Θ11, Θ01 - Θ00, Θ10 - Θ11]

• A cut gives a labelling (its cost is the energy)
• All weights are positive if the energy is submodular (after re-parameterization to normal form)
Optimization: binary, higher-order

f(x1,x2,x3) = θ111 x1x2x3 + θ110 x1x2(1-x3) + θ101 x1(1-x2)x3 + …
            = a x1x2x3 + b x1x2 + c x2x3 + … + d

Quadratic polynomials can be handled, so the idea is to transform the terms of order > 2 into 2nd-order terms.

Methods:
1. Transformation by "substitution"
2. Transformation by "min-selection"
Transformation by "substitution"  [Rosenberg '75, Boros and Hammer '02, Ali et al. ECCV '08]

Example:  f(x1,x2,x3) = x1x2x3 + x1x2 + x2

Auxiliary function:  D(x1,x2,z) = x1x2 - 2x1z - 2x2z + 3z,   z ∈ {0,1}

It satisfies:  D(x1,x2,z) = 0 if x1x2 = z, and D(x1,x2,z) > 0 if x1x2 ≠ z.

"Substitution":  f(x1,x2,x3) = min_z g(x1,x2,x3,z) = min_z [ z x3 + z + x2 + K D(x1,x2,z) ]

Since K is very large, the minimum forces x1x2 = z. Apply this recursively…

Problem:
• Doesn't work in practice [Ishikawa CVPR '09]
• The function D is non-submodular and "K enforces this strongly"
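The defining property of the auxiliary term D can be verified exhaustively over all eight assignments:

```python
# Boros-Hammer auxiliary term: D(x1, x2, z) = x1*x2 - 2*x1*z - 2*x2*z + 3*z
def D(x1, x2, z):
    return x1 * x2 - 2 * x1 * z - 2 * x2 * z + 3 * z

# D must vanish exactly when z = x1*x2 and be strictly positive otherwise
violations = [
    (x1, x2, z)
    for x1 in (0, 1) for x2 in (0, 1) for z in (0, 1)
    if (D(x1, x2, z) == 0) != (z == x1 * x2) or D(x1, x2, z) < 0
]
```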
Transformation by "min-selection"  [Freedman and Drineas '05, Kolmogorov and Zabih '04, Ishikawa '09]

Useful identity:  -x1x2x3 = min_{z ∈ {0,1}} -z(x1 + x2 + x3 - 2)

Check: if x1 = x2 = x3 = 1 then z = 1; otherwise z = 0 is optimal.

Transform f(x1,x2,x3) = a x1x2x3:

Case a < 0:   f(x1,x2,x3) = min_z a z (x1 + x2 + x3 - 2)

Case a > 0 (similar trick):   f(x1,x2,x3) = min_z a { z(x1 + x2 + x3 - 1) + (x1x2 + x2x3 + x3x1) - (x1 + x2 + x3) + 1 }
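The min-selection identity for the cubic term can also be checked exhaustively:

```python
import itertools

# Identity: -x1*x2*x3 == min over z in {0,1} of -z*(x1 + x2 + x3 - 2)
def cubic(x1, x2, x3):
    return -x1 * x2 * x3

def min_selection(x1, x2, x3):
    return min(-z * (x1 + x2 + x3 - 2) for z in (0, 1))

mismatches = [x for x in itertools.product((0, 1), repeat=3)
              if cubic(*x) != min_selection(*x)]
```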
Transformation by "min-selection": the general case  [Ishikawa PAMI '09]

A term of degree d is reduced with n_d = floor((d-1)/2) new variables w.

Full procedure [Ishikawa '09]: for a general 5th-order potential f(x1,…,x5) = a x1x2x3x4x5 + b x1x2x3x4 + c x1x2x3x5 + d x1x2x4x5 + …, transform all terms of degree > 2 into degree-2 terms.

• Worst case exponential: a potential of order 5 gives up to 15 new variables
• Probably tractable for up to order 6
• May get very hard to solve (non-submodular)
• Code available online on Ishikawa's webpage
Application 1: De-noising with Fields of Experts  [Roth and Black '05, Ishikawa '09]

FoE prior: x_c ranges over the set of n×n patches (here 2×2), J_k is a set of learned filters  (from [Ishikawa PAMI '09, Roth et al. '05])

E(x) = Σ_i (z_i - x_i)^2 / (2σ^2) + Σ_c Σ_k α_k log(1 + 0.5 (J_k · x_c)^2)

The first sum is the unary likelihood of the noisy observation z; the result is a non-convex optimization problem.
How to handle continuous labels in a discrete MRF? Solve with the fusion move  [Lempitsky et al. ICCV '07, '08, '10, Woodford et al. '08]

Fusion-move optimization:
1. x = arbitrary labelling
2. E'(x') = binary MRF fusing x (x'=0) with a proposal (x'=1); solve for x' with the BHS algorithm
3. Go to 2) if the energy went down

[Figure: initial labelling, proposal, and fused result; "alpha expansion" corresponds to a constant proposal]

Properties of the fusion move:
1. Performance depends on the performance of the BHS algorithm (number of labelled nodes)
2. Guarantee: E does not go up
3. In practice most nodes get labelled, because the submodularity condition θ_ij(0,0) + θ_ij(1,1) ≤ θ_ij(0,1) + θ_ij(1,0) is often violated only at low cost
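The fusion loop can be sketched as follows. For clarity the binary fusion problem is solved by exhaustive search instead of the BHS algorithm, so this is only illustrative for tiny problems; the toy energy (quadratic data term plus truncated smoothness) is an assumption for the example:

```python
import itertools

def fuse(x, proposal, energy):
    """One fusion move: per pixel, keep the current label or take the proposal's,
    minimizing the fused energy (brute force here, BHS in practice)."""
    best, best_e = list(x), energy(x)
    for bits in itertools.product((0, 1), repeat=len(x)):
        fused = [p if b else c for c, p, b in zip(x, proposal, bits)]
        e = energy(fused)
        if e < best_e:
            best, best_e = fused, e
    return best, best_e   # bits = all zeros reproduces x, so E never goes up

# Toy energy on a 4-pixel chain
data = [0, 3, 3, 0]
def energy(x):
    return sum((xi - di) ** 2 for xi, di in zip(x, data)) + \
           sum(min(abs(x[i] - x[i + 1]), 2) for i in range(3))

x0 = [0, 0, 0, 0]
x1, e1 = fuse(x0, [3, 3, 3, 3], energy)  # fuse with a constant proposal ("expansion")
```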
Application 1: De-noising with Fields of Experts  [Lempitsky et al. ICCV '07, '08, '10, Woodford et al. '08]

Results (PSNR/E, from [Ishikawa PAMI '09]):
[Figure: original; noisy; pairwise model ("factor graph, BP-based"); FoE; TV-norm (continuous model)]

[Figure: comparison with "substitution": blurred and random, only 0.0002% of the nodes labelled]
Application 2: Curvature in stereo  [Woodford et al. CVPR '08]

Triple term: f(x1,x2,x3) = x1 - 2x2 + x3 (a discrete second derivative), where x_i ∈ {0,…,D} is depth.
Example: a slanted plane has zero cost, f(1,2,3) = 0.

[Figure from [Woodford et al. CVPR '08]: image; pairwise terms; triple terms]
Application 3: Higher-order likelihood for optical flow  [Glocker et al. ECCV '10]

• Pairwise MRF: the likelihood enters the unaries as an NCC cost: approximation error!
• Higher-order likelihood: done with triple cliques (ideally higher)
• A 3rd/4th-order term is also used so that affine motions are not penalized

[Figure: image 1 and image 2; one image, bi-layer triangulation, optical flow]
Any size, Special potentials (and transformation approach)
Label-Cost Potential  [Hoiem et al. '07, Delong et al. '10, Bleyer et al. '10]

E(x) = P(x) + Σ_{l ∈ L} c_l [∃p: x_p = l],   E: {1,…,L}^n → R
        ("pairwise MRF")   ("label cost")

(Same function as in [Zhu and Yuille '96])
[Figure from [Delong et al. '10]: image; GrabCut-style result; result with a cost for each new label (label costs 4c and 10c shown)]

Basic idea: penalize the complexity of the model
• Minimum description length (MDL)
• Bayesian information criterion (BIC)
• Akaike information criterion (AIC)
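A label-cost energy is straightforward to write down; the sketch below (made-up unary costs, pairwise part omitted) pays c_l once for every label that appears, so the single-label explanation can win even when a second label fits one pixel slightly better:

```python
# E(x) = sum_p unary[p][x_p] + sum over used labels l of c_l
def label_cost_energy(x, unary, c):
    e = sum(unary[p][xp] for p, xp in enumerate(x))
    e += sum(c[l] for l in set(x))   # each distinct label pays its cost once
    return e

# Two labels {0, 1}, three pixels; label 1 fits pixel 2 slightly better,
# but its label cost makes the one-label explanation cheaper overall.
unary = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4]]
c = [1.0, 1.0]
e_mixed = label_cost_energy([0, 0, 1], unary, c)   # uses both labels
e_single = label_cost_energy([0, 0, 0], unary, c)  # uses only label 0
```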
How is it done…  (from [Delong et al. '10])

In an alpha-expansion step:

E(x') = P(x') + Σ_{l ∈ L} ( c_l - c_l Π_{p ∈ P_l} x'_p )

For a < 0 the product is handled by min-selection:  a Π_i x_i = min_{w ∈ {0,1}} a w (Σ_i x_i - |P_l| + 1)  → submodular!

[Example with expansion indicator x' ∈ {0,1}^n: fusing labelling (a b b c b) with (a a a c a); if some pixel keeps label b the move pays c_b, and if label b disappears entirely the cost for b is 0]
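The min-selection identity used in the expansion step (for a < 0) can again be checked exhaustively; a and n below are arbitrary illustrative choices:

```python
import itertools

# For a < 0:  a * prod_{p in P_l} x_p == min over w in {0,1} of a*w*(sum x_p - |P_l| + 1)
a, n = -3, 4

def product_side(x):
    prod = 1
    for xp in x:
        prod *= xp
    return a * prod

def min_selection_side(x):
    return min(a * w * (sum(x) - n + 1) for w in (0, 1))

mismatches = [x for x in itertools.product((0, 1), repeat=n)
              if product_side(x) != min_selection_side(x)]
```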
Application: model fitting  [Delong et al. '10]
[Figure: fitting geometric models to points (no MRF)]

Example: surface-based stereo  [Bleyer et al. '10]: the 3D scene is explained by a small set of 3D surfaces.
[Figure: left image; surfaces and depth without the label-cost prior; surfaces and depth with the label-cost prior]

Example: 3DLayoutCRF: recognition and segmentation  [Hoiem et al. '07]
[Figure: results with an instance cost]
Robust (Soft) Pn Potts Model  [Kohli et al. CVPR '07, '08, PAMI '08, IJCV '09]

Image segmentation  [Boykov and Jolly '01, Blake et al. '04, Rother et al. '04]:

E(x) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i - x_j|,   E: {0,1}^n → R,  0 → fg, 1 → bg,  n = number of pixels
[Figure: image; unary cost; segmentation]

Pn Potts potentials: patch dictionary (tree) with

h(x_p) = { 0 if x_i = 0 for all i ∈ p;  C_max otherwise }

[slide credits: Kohli]
Pn Potts potentials:

E(x) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i - x_j| + Σ_p h_p(x_p),   E: {0,1}^n → R,  0 → fg, 1 → bg

h(x_p) = { 0 if x_i = 0 for all i ∈ p;  C_max otherwise }

[Figure: image; pairwise segmentation; final segmentation]
[slide credits: Kohli]
Application: Recognition and Segmentation  (from [Kohli et al. '08])

[Figure: image; unaries only (TextonBoost [Shotton et al. '06]); pairwise CRF only [Shotton et al. '06]; Pn Potts with one super-pixelization; Pn Potts with another super-pixelization]
Robust (soft) Pn Potts model  (from [Kohli et al. '08])

h(x_p) = { 0 if x_i = 0 for all i ∈ p;  f(Σ_{i ∈ p} x_i) otherwise }

[Figure: robust Pn Potts vs. Pn Potts]

Application: Recognition and Segmentation  (from [Kohli et al. '08])

[Figure: image; unaries only (TextonBoost [Shotton et al. '06]); pairwise CRF only [Shotton et al. '06]; Pn Potts; robust Pn Potts; robust Pn Potts (different f); each with one and with another super-pixelization]
The same idea applies to surface-based stereo  [Bleyer '10]

[Figure: one input image; ground-truth depth; stereo with hard segmentation; stereo with robust Pn Potts]

This approach gets the best result on the Middlebury Teddy image pair.
How is it done…

The most general binary function of this form:  H(x) = F(Σ_i x_i)  with F concave.

[Figure: concave H(x) as a function of Σ x_i]

The transformation is to a submodular pairwise MRF, hence the optimization is globally optimal.
[slide credits: Kohli]
Higher order to Quadratic

Start with the Pn Potts model:  f(x) = { 0 if all x_i = 0;  C1 otherwise },  x ∈ {0,1}^n

min_x f(x) = min_{x, a ∈ {0,1}}  C1 a + C1 (1-a) Σ_i x_i

(higher-order function → quadratic submodular function)

Σ x_i = 0  →  a = 0, f(x) = 0;   Σ x_i > 0  →  a = 1, f(x) = C1
[slide credits: Kohli]
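The Pn Potts reduction can be checked exhaustively on a small clique (C1 and the clique size are arbitrary illustrative choices):

```python
import itertools

C1 = 5

def pn_potts(x):
    return 0 if sum(x) == 0 else C1   # 0 iff all x_i = 0

def quadratic_min(x):
    # minimize the quadratic form over the auxiliary variable a
    return min(C1 * a + C1 * (1 - a) * sum(x) for a in (0, 1))

mismatches = [x for x in itertools.product((0, 1), repeat=4)
              if pn_potts(x) != quadratic_min(x)]
```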
Higher order to Quadratic

min_x f(x) = min_{x, a ∈ {0,1}}  C1 a + C1 (1-a) Σ_i x_i

[Figure: the constant C1 and the line C1 Σ x_i plotted over Σ x_i = 1, 2, 3; a = 1 selects the constant, a = 0 the line; the lower envelope of concave (here linear) functions is concave]
[slide credits: Kohli]
Higher order to Quadratic

In general:  min_x f(x) = min_{x, a ∈ {0,1}}  f1(x) a + f2(x) (1-a)

[Figure: f1(x) and f2(x) plotted over Σ x_i; a = 1 selects f1, a = 0 selects f2; the lower envelope of concave functions is concave]

Arbitrary concave functions: sum such potentials up (each breakpoint adds a new binary variable)  [Vicente et al. '09]
[slide credits: Kohli]
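The construction relies on the fact that a binary switching variable realizes the lower envelope min(f1, f2). A small numeric sketch, with hypothetical linear functions of s = Σ x_i:

```python
def f1(s):
    return 3          # flat linear function of s = sum(x_i)

def f2(s):
    return s          # rising linear function

def via_auxiliary(s):
    # min over a in {0,1} of f1(s)*a + f2(s)*(1-a)
    return min(f1(s) * a + f2(s) * (1 - a) for a in (0, 1))

envelope = [min(f1(s), f2(s)) for s in range(6)]
via_aux = [via_auxiliary(s) for s in range(6)]
```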
Beyond Pn Potts: soft pattern-based potentials  [Rother et al. '08, Komodakis et al. '08]

Motivation: binary image de-noising
[Figure: training image; test image; test image with noise; result of a pairwise MRF, 9-connected (7 attractive, 2 repulsive): higher-order structure is not preserved]

Higher-order MRF. Minimize:  E(x) = P(x) + Σ_c h_c(x_c),  where  h_c: {0,1}^|c| → R

is a higher-order function on a clique of |c| = 10×10 = 100 variables, i.e. it assigns a cost to 2^100 possible labellings!

Sparse higher-order functions: exploit the structure of the function to transform it into a pairwise function.
How this can be done… One clique, one pattern P0:

h_c(x) = { 0 if x_c = P0;  k otherwise }

h_c(x) = min_{a,b}  k a + k(1-b) - k a(1-b) + k Σ_{i ∈ S0(P0)} (1-a) x_i + k Σ_{i ∈ S1(P0)} b (1-x_i)

Check it: pattern off → a = 1, b = 0 (cost k); pattern on → a = 0, b = 1 (cost 0).

Problems: the term "kab" is non-submodular, and only BP and TRW worked for inference.

General potential: add all such terms up.
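The single-pattern construction can be verified exhaustively on a small clique (the pattern P0 and cost k below are illustrative choices; S0 and S1 are the pixels where the pattern is 0 and 1 respectively):

```python
import itertools

k = 2
P0 = (0, 1, 1, 0)                             # hypothetical 4-pixel pattern
S0 = [i for i, p in enumerate(P0) if p == 0]  # pixels where the pattern is 0
S1 = [i for i, p in enumerate(P0) if p == 1]  # pixels where the pattern is 1

def h_c(x):
    # minimize over the two auxiliary switching variables a, b
    return min(
        k * a + k * (1 - b) - k * a * (1 - b)
        + k * sum((1 - a) * x[i] for i in S0)
        + k * sum(b * (1 - x[i]) for i in S1)
        for a in (0, 1) for b in (0, 1)
    )

costs = {x: h_c(x) for x in itertools.product((0, 1), repeat=4)}
```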
Soft multi-label pattern-based potentials

Function per clique:

h_c(x) = min{ min_{a ∈ {1,…,L}}  k_a + Σ_i w_ia [x_i ≠ P_a(i)],  k_max }

[Figure: L patterns P1, P2, P3 (multi-label) with soft deviation functions w1, w2, w3]
How it is done… Introduce a pattern-switching variable z ∈ {1,…,L+1}:

h_c(x_c) = min_z  f(z) + Σ_{i ∈ c} g(z, x_i)

f(z) = { k_a if z = a;  k_max if z = L+1 }
g(z, x_i) = { w_ia if z = a and x_i ≠ P_a(i);  0 otherwise }

which recovers  h_c(x) = min{ min_{a ∈ {1,…,L}} k_a + Σ_i w_ia [x_i ≠ P_a(i)],  k_max }.

BP is used for optimization, since other solvers were inferior.
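The equivalence between the direct definition and the pattern-switching form can be checked on a toy instance (patterns, costs, and weights below are hypothetical; z = 2 plays the role of z = L+1):

```python
import itertools

# Hypothetical setup: L = 2 patterns on a 3-pixel clique
patterns = [(0, 1, 0), (1, 1, 1)]
k = [1, 2]                  # per-pattern base costs k_a
w = [[2, 2, 2], [1, 1, 1]]  # per-pixel deviation weights w_ia
kmax = 5

def h_direct(x):
    best = min(k[a] + sum(w[a][i] for i in range(3) if x[i] != patterns[a][i])
               for a in range(len(patterns)))
    return min(best, kmax)

def h_switch(x):
    # z in {0, 1} selects a pattern; z = 2 pays the truncation cost kmax
    def f(z):
        return kmax if z == 2 else k[z]
    def g(z, i):
        return 0 if z == 2 else w[z][i] * (x[i] != patterns[z][i])
    return min(f(z) + sum(g(z, i) for i in range(3)) for z in (0, 1, 2))

mismatches = [x for x in itertools.product((0, 1), repeat=3)
              if h_direct(x) != h_switch(x)]
```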
Results: multi-label

[Figure: training image (256 labels); test image (256 labels); test + noise (256 labels); pairwise model (15 labels; 5.6 sec, BP 10 iter.); 10 hard 10×10 patterns (15 labels; 48 sec, BP 10 iter.); 10 soft 10×10 patterns (15 labels; 48 sec, BP 10 iter.)]
Standard patch-based MRFs (multi-label)  [Learning Low-Level Vision, Freeman IJCV '04] (IMPORTANT)

E(x) = U(x) + P(x),   E: {1,…,L}^n → R

[Figure: overlapping patch variables x_i, x_j, x_k, x_l; the pairwise term measures patch overlap]

Not all labels are possible (a comparison is still to be done).
Tea break!