Advanced Computer Vision (Module 5F16), Carsten Rother and Pushmeet Kohli
Transcript of Advanced Computer Vision (Module 5F16), by Carsten Rother and Pushmeet Kohli.
Advanced Computer Vision (Module 5F16)
Carsten Rother, Pushmeet Kohli
Syllabus (updated)

• L1&2: Intro
  – Intro: Probabilistic models
  – Different approaches for learning
  – Generative/discriminative models, discriminative functions
• L3&4: Labelling Problems in Computer Vision
  – Graphical models
  – Expressing vision problems as labelling problems
• L5&6: Optimization
  – Message Passing (BP, TRW)
  – Submodularity and Graph Cuts
  – Move Making algorithms (Expansion/Swap/Range/Fusion)
  – LP Relaxations
  – Dual Decomposition
Syllabus (updated)

• L7&8 (8.2): Optimization and Learning – compare max-margin vs. maximum likelihood
• L9&10 (15.2): Case Studies – tbd … Decision Trees and Random Fields, Kinect person detection
• L11&12 (22.2): Optimization Comparison, Case Studies (tbd)
Books

1. Advances in Markov Random Fields for Computer Vision. MIT Press, 2011. Edited by Andrew Blake, Pushmeet Kohli and Carsten Rother.
2. Pattern Recognition and Machine Learning. Springer, 2006. By Chris Bishop.
3. Structured Learning and Prediction in Computer Vision. Sebastian Nowozin and Christoph H. Lampert. Foundations and Trends in Computer Graphics and Vision, now publishers, 2011.
4. Computer Vision: Algorithms and Applications. Springer, 2010. By Richard Szeliski.
A Gentle Start: Interactive Image Segmentation and Probabilities
Probabilities

• Probability distribution P(x): ∑_x P(x) = 1, P(x) ≥ 0; discrete x ∈ {0,…,L}
• Joint distribution: P(x,z)
• Conditional distribution: P(x|z)
• Sum rule: P(x) = ∑_z P(x,z)
• Product rule: P(x,z) = P(x|z) P(z)
• Bayes' rule: P(x|z) = P(z|x) P(x) / P(z)
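As a quick sanity check, here is a minimal Python sketch that verifies the sum rule, product rule and Bayes' rule numerically; the joint table is made up purely for illustration:

```python
import numpy as np

# A made-up joint distribution P(x,z) over x in {0,1} and z in {0,1,2}:
# rows index x, columns index z; entries are non-negative and sum to 1.
P_xz = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])
assert np.isclose(P_xz.sum(), 1.0)

# Sum rule: P(x) = sum_z P(x,z), and likewise P(z) = sum_x P(x,z)
P_x = P_xz.sum(axis=1)
P_z = P_xz.sum(axis=0)

# Product rule: P(x,z) = P(x|z) P(z)
P_x_given_z = P_xz / P_z                 # each column is a conditional
assert np.allclose(P_x_given_z * P_z, P_xz)

# Bayes' rule: P(x|z) = P(z|x) P(x) / P(z)
P_z_given_x = P_xz / P_x[:, None]
bayes = P_z_given_x * P_x[:, None] / P_z
assert np.allclose(bayes, P_x_given_z)
print("P(x) =", P_x, " P(z) =", P_z)
```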
Interactive Segmentation

Goal: given the image z = (R,G,B)^n and unknown variables x ∈ {0,1}^n, compute the posterior probability

P(x|z) = P(z|x) P(x) / P(z) ~ P(z|x) P(x)

where P(z|x) is the likelihood (data-dependent), P(x) is the prior (data-independent), and P(z) is constant.

Maximum a Posteriori (MAP): x* = argmax_x P(x|z)

We will express this as an energy minimization problem: x* = argmin_x E(x)
(user-specified pixels are not optimized for)
Likelihood: P(x|z) ~ P(z|x) P(x)

[Figure: foreground and background pixel colours plotted in the red–green plane]
Likelihood: P(x|z) ~ P(z|x) P(x)

Maximum likelihood: x* = argmax_x P(z|x) = argmax_x ∏_i P(zi|xi)

[Figure: the fitted colour models p(zi|xi=0) and p(zi|xi=1)]
Prior: P(x|z) ~ P(z|x) P(x)

P(x) = 1/f ∏_{i,j ∈ N4} θij(xi,xj)

f = ∑_x ∏_{i,j ∈ N4} θij(xi,xj)   ("partition function")

θij(xi,xj) = exp{-|xi-xj|}   ("Ising prior"; exp{-1} ≈ 0.37, exp{0} = 1)
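For a 4x4 grid this distribution can be evaluated exactly by brute force. The sketch below (a minimal illustration; the grid size and the coupling strength are free parameters) enumerates all 2^16 labellings, computes the partition function f and sorts configurations by probability, mirroring the next few slides:

```python
import itertools
import numpy as np

N = 4            # 4x4 grid, so 2^16 configurations
coupling = 1.0   # use 1.0 or 10.0 as on the slides

# 4-connected neighbour pairs (N4)
edges = [((i, j), (i, j + 1)) for i in range(N) for j in range(N - 1)] + \
        [((i, j), (i + 1, j)) for i in range(N - 1) for j in range(N)]

def prior_unnorm(x):
    """Unnormalized Ising prior: product over edges of exp(-coupling*|xi-xj|)."""
    return np.exp(-coupling * sum(abs(x[a] - x[b]) for a, b in edges))

configs = []
for bits in itertools.product([0, 1], repeat=N * N):
    x = {(i, j): bits[i * N + j] for i in range(N) for j in range(N)}
    configs.append((bits, prior_unnorm(x)))

f = sum(p for _, p in configs)                 # partition function
configs.sort(key=lambda c: c[1], reverse=True)
print("best:", configs[0][0], configs[0][1] / f)    # constant labellings
print("worst:", configs[-1][0], configs[-1][1] / f) # checkerboard labellings
```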
Prior – 4x4 Grid

Pure prior model: P(x) = 1/f ∏_{i,j ∈ N4} exp{-|xi-xj|}

[Figure: best and worst solutions sorted by probability]

"The smoothness prior needs the likelihood"
Prior – 4x4 Grid

Pure prior model: P(x) = 1/f ∏_{i,j ∈ N4} exp{-|xi-xj|}

[Figure: probability of each of the 2^16 configurations, with samples drawn from the distribution]
Prior – 4x4 Grid

Pure prior model: P(x) = 1/f ∏_{i,j ∈ N4} exp{-10|xi-xj|}

[Figure: best and worst solutions sorted by probability]
Prior – 4x4 Grid

Pure prior model: P(x) = 1/f ∏_{i,j ∈ N4} exp{-10|xi-xj|}

[Figure: probability of each of the 2^16 configurations, with samples drawn from the distribution]
Putting it together…

Posterior: P(x|z) = P(z|x) P(x) / P(z)
Joint: P(x,z) = P(z|x) P(x)   (… let us look at this later)

Rewriting the posterior:

P(x|z) = 1/P(z) · 1/f ∏_{i,j ∈ N4} exp{-|xi-xj|} · ∏_i p(zi|xi)

       = 1/f(z) exp{-( ∑_{i,j ∈ N4} |xi-xj| + ∑_i -log p(zi|xi) )}

       = 1/f(z) exp{-( ∑_{i,j ∈ N4} |xi-xj| + ∑_i [-log p(zi|xi=0) (1-xi) - log p(zi|xi=1) xi] )}

       = 1/f(z) exp{-E(x,z)}   with f(z) = ∑_x exp{-E(x,z)}

"Gibbs distribution"
Gibbs Distribution is more general

A Gibbs distribution does not have to decompose into prior and likelihood:

P(x|z) = 1/f(z) exp{-E(x,z)}   with f(z) = ∑_x exp{-E(x,z)}

Energy:
E(x,z) = ∑_i θi(xi,z) + w ∑_{i,j} θij(xi,xj,z) + ∑_{i,j,k} θijk(xi,xj,xk,z) + …
(unary terms, pairwise terms, higher-order terms)

In our case:
θi(xi,zi) = -log p(zi|xi=1) xi - log p(zi|xi=0) (1-xi)   (unary term: encodes our dependency on the data)
θij(xi,xj) = |xi-xj|   (pairwise term: encodes our prior knowledge over labellings)
Energy minimization

P(x|z) = 1/f(z) exp{-E(x,z)}   with f(z) = ∑_x exp{-E(x,z)}

-log P(x|z) = log f(z) + E(x,z)

Since f(z) does not depend on x, the minimum-energy solution x* = argmin_x E(x,z) is the same as the maximum-a-posteriori (MAP) solution x* = argmax_x P(x|z).

[Figure: energy landscape, with the MAP solution at the global minimum of E]
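On a small toy grid this equivalence can be checked directly. The sketch below (the unaries and "image" are invented for illustration) builds E(x,z), forms the Gibbs distribution by enumeration, and confirms that argmin E equals argmax P:

```python
import itertools
import numpy as np

N = 3                             # 3x3 grid keeps the 2^9 enumeration tiny
rng = np.random.default_rng(0)
z = rng.random((N, N))            # toy "image" with values in [0,1]

edges = [((i, j), (i, j + 1)) for i in range(N) for j in range(N - 1)] + \
        [((i, j), (i + 1, j)) for i in range(N - 1) for j in range(N)]

def energy(x, w=1.0):
    # unary: prefer xi=1 where zi is bright, xi=0 where it is dark
    unary = sum(z[p] * (1 - x[p]) + (1 - z[p]) * x[p] for p in x)
    pairwise = sum(abs(x[a] - x[b]) for a, b in edges)
    return unary + w * pairwise

configs = [{(i, j): b[i * N + j] for i in range(N) for j in range(N)}
           for b in itertools.product([0, 1], repeat=N * N)]
E = np.array([energy(x) for x in configs])
P = np.exp(-E); P /= P.sum()      # Gibbs distribution, f(z) by enumeration

assert np.argmin(E) == np.argmax(P)   # minimum energy == MAP solution
print("MAP labelling:", configs[int(np.argmin(E))])
```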
Recap

• Posterior, likelihood, prior: P(x|z) = P(z|x) P(x) / P(z)
• Gibbs distribution: P(x|z) = 1/f(z) exp{-E(x,z)}
• Energy minimization is the same as MAP estimation: x* = argmax_x P(x|z) = argmin_x E(x)
Weighting of Unary and Pairwise Terms

E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj)

[Figure: segmentation results for w = 0, 10, 40, 200]
Learning versus Optimization/Prediction

Gibbs distribution: P(x|z,w) = 1/f(z,w) exp{-E(x,z,w)}

Training phase: infer w, which does not depend on a test image z: {xt,zt} => w
Testing phase: infer x, which does depend on the test image z: z,w => x
A simple procedure to learn w

1. Iterate over w = 0,…,400:
   1. Compute x*t for all training images {xt,zt}
   2. Compute the average error Er = 1/|T| ∑_t Δ(xt,x*t), with the Hamming loss as loss function
2. Take the w with the smallest Er (a sketch of this grid search follows below)

Hamming loss, the number of misclassified pixels: Δ(x,x*) = ∑_i [xi ≠ x*i]

Questions:
- Is this the best and only way?
- Can we over-fit to the training data?
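A minimal sketch of this procedure, assuming exhaustive MAP inference on small toy grids and a made-up training-set generator; only the grid-search loop itself is meant to match the slide:

```python
import itertools
import numpy as np

N = 3
rng = np.random.default_rng(1)
edges = [((i, j), (i, j + 1)) for i in range(N) for j in range(N - 1)] + \
        [((i, j), (i + 1, j)) for i in range(N - 1) for j in range(N)]
configs = [np.array(b).reshape(N, N)
           for b in itertools.product([0, 1], repeat=N * N)]

def energy(x, z, w):
    unary = np.sum(z * (1 - x) + (1 - z) * x)
    pairwise = sum(abs(int(x[a]) - int(x[b])) for a, b in edges)
    return unary + w * pairwise

def map_solution(z, w):
    # brute-force MAP: fine for a 3x3 grid, hopeless for real images
    return min(configs, key=lambda x: energy(x, z, w))

def make_example():
    """Toy data as on the slides: a 2x2 square, then Gaussian noise."""
    x = np.zeros((N, N), int); x[0:2, 0:2] = 1
    z = x + rng.normal(0, 0.4, (N, N))
    return x, z

train = [make_example() for _ in range(20)]

best_w, best_err = None, np.inf
for w in np.linspace(0, 4, 21):           # grid search over the weight w
    err = np.mean([np.sum(xt != map_solution(zt, w)) for xt, zt in train])
    if err < best_err:
        best_w, best_err = w, err         # keep w with smallest Hamming error
print("learned w:", best_w, "avg Hamming error:", best_err)
```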
Big Picture: Statistical Models in Computer Vision

Model: discrete or continuous variables? discrete or continuous space? dependence between variables? …

Optimization/Prediction/Inference:
- Combinatorial optimization, e.g. graph cut
- Message passing, e.g. BP, TRW
- Iterated Conditional Modes (ICM)
- LP relaxation, e.g. cutting-plane
- Problem decomposition + subgradient
- …

Learning:
- Maximum likelihood learning
- Pseudo-likelihood approximation
- Loss-minimizing parameter learning
- Exhaustive search
- Constraint generation
- …

Applications: 2D/3D image segmentation, object recognition, 3D reconstruction, stereo matching, image denoising, texture synthesis, pose estimation, panoramic stitching, …
Machine Learning view: Structured Learning and Prediction

"Normal" machine learning:
f : Z → N (classification), f : Z → R (regression)
Input: image, text. Output: real number(s).

Structured output prediction:
f : Z → X
Input: image, text. Output: a complex structured object (a labelling, the parse tree of a sentence, an image labelling, a chemical structure).
Structured Output

Ad hoc definition (from [Nowozin et al. 2011]): data that consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.
Learning: A simple toy problem

Label generation: "a small deviation of a 2x2 foreground (white) square at an arbitrary position"

Data generation:
1. Foreground pixels are white, background pixels black
2. Flip the label of a few random pixels
3. Add some Gaussian noise

Example: man-made object detection [Nowozin and Lampert 2011]
A possible model for the data

Ising model on a 4x4 grid graph:
P(x|z,w) = 1/f(z,w) exp{-( ∑_i (zi(1-xi) + (1-zi)xi) + w ∑_{i,j ∈ N4} |xi-xj| )}
(the first sum contains the unary terms, the second the pairwise terms)

[Figure: example data z and label x]
Decision Theory

Assume w has been learned and P(x|z,w) is:

[Figure: the distribution over all 2^16 configurations; best and worst solutions sorted by probability]

Which solution x* would you choose?
How to make a decision

Assume the model P(x|z,w) is known.

The risk R is the expected loss, for a given "loss function" Δ:
R = ∑_x P(x|z,w) Δ(x,x*)

Goal: choose the x* which minimizes the risk R.
Decision Theory

Risk: R = ∑_x P(x|z,w) Δ(x,x*)

0/1 loss: Δ(x,x*) = 0 if x* = x, 1 otherwise.
Minimizing the risk under the 0/1 loss gives the MAP solution: x* = argmax_x P(x|z,w)

[Figure: best and worst solutions sorted by probability]
Decision Theory

Risk: R = ∑_x P(x|z,w) Δ(x,x*)

Hamming loss: Δ(x,x*) = ∑_i [xi ≠ x*i]
Minimizing the risk under the Hamming loss means maximizing the marginals: x*i = argmax_{xi} P(xi|z,w)

[Figure: best and worst solutions sorted by probability]
Decision Theory

Maximize marginals: x*i = argmax_{xi} P(xi|z,w)

Marginal: P(xi=k) = ∑_{xj, j≠i} P(x1,…,xi=k,…,xn)

Computing marginals is sometimes called "probabilistic inference", as opposed to MAP inference.

[Figure: best and worst solutions sorted by probability]
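By enumeration, the two decision rules are easy to compare on the toy grid. A minimal sketch (reusing the toy energy from the earlier snippets, with invented data) computes pixel-wise marginals and puts the max-marginal labelling next to the MAP labelling:

```python
import itertools
import numpy as np

N = 3
rng = np.random.default_rng(2)
z = rng.random((N, N))
edges = [((i, j), (i, j + 1)) for i in range(N) for j in range(N - 1)] + \
        [((i, j), (i + 1, j)) for i in range(N - 1) for j in range(N)]
configs = [np.array(b).reshape(N, N)
           for b in itertools.product([0, 1], repeat=N * N)]

def energy(x, w=1.0):
    unary = np.sum(z * (1 - x) + (1 - z) * x)
    return unary + w * sum(abs(int(x[a]) - int(x[b])) for a, b in edges)

P = np.exp(-np.array([energy(x) for x in configs]))
P /= P.sum()

# MAP decision: the single most probable configuration (optimal for 0/1 loss)
x_map = configs[int(np.argmax(P))]

# Marginal P(xi=1): sum P(x) over all configurations with xi=1, then take
# the max-marginal decision per pixel (optimal for the Hamming loss)
marg = sum(p * x for p, x in zip(P, configs))
x_maxmarg = (marg > 0.5).astype(int)

print("MAP:\n", x_map, "\nmax-marginal:\n", x_maxmarg)
```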
Recap

A different loss function gives a very different solution!
Two different approaches to learning

1. Probabilistic parameter learning: "P(x|z,w) is needed"
2. Loss-based parameter learning: "E(x,z,w) is sufficient"
Probabilistic Parameter Learning

Training: from the training database {xt,zt}, learn the weights w by regularized maximum likelihood estimation:

w* = argmin_w ∑_t -log P(xt|zt,w) + |w|²

(This is itself a MAP estimate of w, since P(w|zt,xt) ~ P(xt|w,zt) P(w|zt).)

Then choose a loss and construct the corresponding decision function:
- 0/1 loss: x* = argmax_x P(x|z,w)
- Hamming loss: x*i = argmax_{xi} P(xi|z,w)

Test time: optimize the decision function for a new test image z, e.g. x* = argmax_x P(x|z,w).
ML estimation for our toy images

P(x|z,w) = 1/f(z,w) exp{-( ∑_i (zi(1-xi) + (1-zi)xi) + w ∑_{i,j ∈ N4} |xi-xj| )}

Train: w* = argmin_w ∑_t -log P(xt|zt,w)

[Figure: training images zt with labels xt; plot of 1/|T| ∑_t -log P(xt|zt,w) as a function of w]

How many training images?
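On the toy model the negative log-likelihood curve can be traced exactly, since f(z,w) is computable by enumeration. A minimal sketch with a made-up training set:

```python
import itertools
import numpy as np

N = 3
rng = np.random.default_rng(3)
edges = [((i, j), (i, j + 1)) for i in range(N) for j in range(N - 1)] + \
        [((i, j), (i + 1, j)) for i in range(N - 1) for j in range(N)]
configs = [np.array(b).reshape(N, N)
           for b in itertools.product([0, 1], repeat=N * N)]

def energy(x, z, w):
    unary = np.sum(z * (1 - x) + (1 - z) * x)
    return unary + w * sum(abs(int(x[a]) - int(x[b])) for a, b in edges)

def nll(xt, zt, w):
    """-log P(xt|zt,w), with the partition function f(z,w) by enumeration."""
    energies = np.array([energy(x, zt, w) for x in configs])
    log_f = np.log(np.exp(-energies).sum())
    return energy(xt, zt, w) + log_f

def make_example():
    x = np.zeros((N, N), int); x[0:2, 0:2] = 1
    z = x + rng.normal(0, 0.4, (N, N))
    return x, z

train = [make_example() for _ in range(20)]
ws = np.linspace(0, 4, 21)
avg_nll = [np.mean([nll(xt, zt, w) for xt, zt in train]) for w in ws]
print("ML estimate w* =", ws[int(np.argmin(avg_nll))])
```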
ML estimation for our toy images

P(x|z,w) = 1/f(z,w) exp{-( ∑_i (zi(1-xi) + (1-zi)xi) + w ∑_{i,j ∈ N4} |xi-xj| )}

Train (by exhaustive search): w* = argmin_w ∑_t -log P(xt|zt,w) = 0.8

Testing (1000 images):
1. MAP (0/1 loss): av. error 0/1: 0.99; av. error Hamming: 0.32
2. Marginals (Hamming loss): av. error 0/1: 0.92; av. error Hamming: 0.17
ML estimation for our toy images

So probabilistic inference is better than MAP inference here, since it uses the better loss function.

[Figure: example test results]
Two different approaches to learning

1. Probabilistic parameter learning: "P(x|z,w) is needed"
2. Loss-based parameter learning: "E(x,z,w) is sufficient"
Loss-based Parameter Learning

Minimize the risk R = ∑_x P(x|z,w) Δ(x,x*), with x* = argmax_x P(x|z,w) and loss function Δ.

Replace the expectation by samples from the true distribution, i.e. the training data:

R ≈ 1/|T| ∑_t Δ(xt,x*t)

How much training data is needed?
Loss-based Parameter Learning

Minimize R = 1/|T| ∑_t Δ(xt,x*t), with x* = argmax_x P(x|z,w)

[Figure: search over w for the 0/1 loss and for the Hamming loss]

Testing:
1. 0/1 loss (w=0.2): error 0/1: 0.69; error Hamming: 0.11
2. Hamming loss (w=0.1): error 0/1: 0.7; error Hamming: 0.10
Loss-based Parameter Learning

[Figure: example test results for the 0/1 loss and the Hamming loss]
Which approach is better?

Hamming test error:
1. ML: MAP (0/1 loss) – error 0.32
2. ML: marginals (Hamming loss) – error 0.17
3. Loss-based: MAP (0/1 loss) – error 0.11
4. Loss-based: MAP (Hamming loss) – error 0.10

Why are the loss-based methods much better? Model mismatch: our model cannot represent the true distribution of the training data, and we probably always have that in vision.

Comment: marginals also give an uncertainty for every pixel, which can be used in a bigger system.
Check: sample from the true model (w=0.8)

[Figure: my toy data labelling; data and labels sampled from the model]

Re-training gives w=0.8.
A real world application: Image denoising

Model: 4-connected graph with 64 labels and 128 weights in total.
Training data: images z1..m with ground truths x1..m.

[Figure: noisy input test image, true test image, and zoomed results for ML training with MAP (image 0/1 loss) and with MMSE (pixel-wise squared loss)]

[see details in: Putting MAP back on the map, Pletscher et al., DAGM 2010]
Example – Image denoising

[Figure: noisy input test image, true test image, and the result of loss-based MAP training (pixel-wise squared loss); training images z1..m with ground truths x1..m]
Comparison of the two pipelines: models

The loss-minimizing and the probabilistic pipeline use the same model:
unary potential |zi-xi|, pairwise potential |xi-xj|.

[Figure: data z and label x]
Comparison of the two pipelines

[Figure: prediction error as a function of the deviation from the true model]

[see details in: Putting MAP back on the map, Pletscher et al., DAGM 2010]
Recap

• Loss functions
• Two pipelines for parameter learning: loss-based and probabilistic
• MAP inference is good, if trained well
Another Machine Learning view

We can identify 3 different approaches [see details in Bishop, page 42ff]:

• Generative (probabilistic) models
• Discriminative (probabilistic) models
• Discriminative functions
Generative model

Models that model, explicitly or implicitly, the distribution of both the input and the output.

Joint probability: P(x,z) = P(z|x) P(x)   (likelihood × prior)

Pros:
1. the most elaborate model
2. possible to sample both x and z

Cons: it might not always be possible to write down the full distribution (it involves a distribution over images).
Generative Model: Example

P(x,z) = P(z|x) P(x), with P(z|x) modelled as GMMs and the Ising prior P(x) = 1/f ∏_{i,j ∈ N4} exp{-|xi-xj|}

[Figure: the true image, the most likely sample, and further samples of x and z]
Why does segmentation still work?

We use the posterior, not the joint, so the image z is given:
P(x|z) = 1/P(z) P(z,x)
Remember: P(x|z) = 1/f(z) exp{-E(x,z)}

[Figure: samples x from the toy model given z (with a strong likelihood)]

Comments:
- a better likelihood p(z|x) may give a better model
- when you test models, keep in mind that data is never random; it is very structured!
Discriminative model
P(x|z) = 1/f(z) exp{-E(x,z)}
Models that model the Posterior directly are discriminative models:
We later call them: “Conditional random field”
Pros: 1. simpler to write down (no need to model z)and goes directly for the desired output x
2. probability can be used in bigger systems
Cons: we can not sample images z
Discriminative model – Example

Gibbs: P(x|z) = 1/f(z) exp{-E(x,z)}

E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj,zi,zj)

Edge-dependent Ising term:
θij(xi,xj,zi,zj) = |xi-xj| exp{-β||zi-zj||}
with β = 2 (Mean(||zi-zj||²))⁻¹

[Figure: θij as a function of ||zi-zj||]
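A minimal sketch of this contrast-sensitive term on an RGB image; the image array and the 4-neighbourhood handling are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
img = rng.random((32, 32, 3))        # stand-in RGB image, values in [0,1]

# Colour differences ||zi - zj|| along horizontal and vertical N4 edges
dh = np.linalg.norm(img[:, 1:] - img[:, :-1], axis=2)
dv = np.linalg.norm(img[1:, :] - img[:-1, :], axis=2)
all_d = np.concatenate([dh.ravel(), dv.ravel()])

# beta = 2 * (mean ||zi-zj||^2)^-1, as on the slide
beta = 2.0 / np.mean(all_d ** 2)

# Edge strength exp{-beta ||zi-zj||}; theta_ij = |xi-xj| * strength, so
# cutting across a strong colour edge (large ||zi-zj||) is cheap.
strength_h = np.exp(-beta * dh)
strength_v = np.exp(-beta * dv)
print("beta =", beta, "mean edge strength =", strength_h.mean())
```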
Discriminative functions

Models that model the classification problem via a function:
E(x,z): Lⁿ → R, with prediction x* = argmax_x E(x,z)

Examples:
- an energy which has been loss-based trained
- support vector machines
- decision trees

Pros: the most direct approach to model the problem
Cons: no probabilities
Recap
• Generative (probabilistic) models
• Discriminative (probabilistic) models
• Discriminative functions
Image segmentation … the full story
… a meeting with the Queen
Segmentation [Boykov & Jolly ICCV '01]

Input: image z and user input (foreground/background brushes). Output: x* = argmin_x E(x), x ∈ {0,1}ⁿ

E(x) = ∑_{p ∈ V} Fp xp + Bp (1-xp) + ∑_{pq ∈ E} wpq |xp-xq|

wpq = wi + wc exp(-wβ ||zp-zq||²)

Brushed pixels are clamped via the unaries: Fp = ∞, Bp = 0 forces xp = 0, and Fp = 0, Bp = ∞ forces xp = 1.

Graph cut: global optimum in polynomial time, ~0.3 sec for a 1 MPixel image [Boykov, Kolmogorov, PAMI '04]

How to prevent the trivial solution?
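A hedged sketch of this construction, assuming the third-party PyMaxflow package (pip install PyMaxflow); the unaries, brushes and the constant pairwise weight are illustrative:

```python
import numpy as np
import maxflow  # PyMaxflow, assumed installed: pip install PyMaxflow

rng = np.random.default_rng(5)
img = rng.random((64, 64))           # stand-in grayscale image
INF = 1e9                            # large constant to clamp brushed pixels

# Toy unaries: F_p favours foreground where the image is bright
F = 1.0 - img
B = img.copy()
F[:4, :], B[:4, :] = INF, 0.0                 # pretend background brush
F[30:34, 30:34], B[30:34, 30:34] = 0.0, INF   # pretend foreground brush

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes(img.shape)
g.add_grid_edges(nodes, 0.5)         # constant pairwise weight w_pq for brevity
g.add_grid_tedges(nodes, F, B)       # terminal edges carry the unary costs

g.maxflow()                          # global optimum of this submodular energy
seg = g.get_grid_segments(nodes)     # boolean labelling; which side of the cut
print("foreground pixels:", seg.sum())
```

The exact mapping of F and B onto the source/sink sides is a library convention; in a real system it should be checked against a pixel with a known clamp.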
What is a good segmentation?

Objects (fore- and background) are self-similar with respect to appearance:

Eunary(x,θF,θB) = -log p(z|x,θF,θB) = ∑_{p ∈ V} -log p(zp|θF) xp - log p(zp|θB) (1-xp)

[Figure: input image and three segmentation options, each with its own foreground/background models θF, θB; Eunary = 460000, 482000 and 483000 for options 1, 2 and 3]
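These unaries can be sketched with scikit-learn's GaussianMixture (an assumption; any density estimator would do): fit θF and θB to pretend-brushed pixels and evaluate -log p(zp|θ) per pixel:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed available

rng = np.random.default_rng(6)
img = rng.random((64, 64, 3))                   # stand-in RGB image
fg_pixels = img[30:40, 30:40].reshape(-1, 3)    # pretend-brushed foreground
bg_pixels = img[:8, :].reshape(-1, 3)           # pretend-brushed background

# theta_F, theta_B: one GMM per region, as in the slides
gmm_f = GaussianMixture(n_components=5, random_state=0).fit(fg_pixels)
gmm_b = GaussianMixture(n_components=5, random_state=0).fit(bg_pixels)

flat = img.reshape(-1, 3)
# score_samples returns log p(z_p | theta); negate to get the unary costs
F = -gmm_f.score_samples(flat).reshape(img.shape[:2])  # cost of xp = 1
B = -gmm_b.score_samples(flat).reshape(img.shape[:2])  # cost of xp = 0
print("unary energy of the all-foreground labelling:", F.sum())
```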
GrabCut [Rother, Kolmogorov, Blake, Siggraph '04]

E(x,θF,θB) = ∑_{p ∈ V} Fp(θF) xp + Bp(θB) (1-xp) + ∑_{pq ∈ E} wpq |xp-xq|

with Fp(θF) = -log p(zp|θF) and Bp(θB) = -log p(zp|θB).

Input: image z and user input (clamped pixels with Fp = ∞, Bp = 0 or Fp = 0, Bp = ∞).
Output: x ∈ {0,1}ⁿ and the GMMs θF, θB (foreground, background and "others").

Problem: the joint optimization of x, θF, θB is NP-hard.

[Figure: foreground and background colour GMMs in the R–G plane]
GrabCut: Optimization [Rother, Kolmogorov, Blake, Siggraph '04]

Given the image z, the user input and an initial segmentation x, alternate (sketched below):
1. Learn the colour distributions: min_{θF,θB} E(x, θF, θB)
2. Graph cut to infer the segmentation: min_x E(x, θF, θB)
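A hedged sketch of this alternation, reusing the GMM unaries and the PyMaxflow cut from the two snippets above (all names and parameters there are illustrative, and the clamping of user-specified pixels is simplified to the initial rectangle):

```python
import numpy as np
import maxflow                                  # PyMaxflow, assumed installed
from sklearn.mixture import GaussianMixture     # assumed available

def grabcut_like(img, init_mask, n_iter=5, w=0.5):
    """Toy GrabCut-style loop: alternate GMM fitting and graph cut."""
    flat = img.reshape(-1, 3)
    mask = init_mask.copy()
    for _ in range(n_iter):
        if mask.sum() < 10 or (~mask).sum() < 10:
            break                               # guard against a collapsed cut
        # Step 1: learn colour models theta_F, theta_B from current labelling
        gmm_f = GaussianMixture(5, random_state=0).fit(flat[mask.ravel()])
        gmm_b = GaussianMixture(5, random_state=0).fit(flat[~mask.ravel()])
        F = -gmm_f.score_samples(flat).reshape(mask.shape)
        B = -gmm_b.score_samples(flat).reshape(mask.shape)
        # Step 2: graph cut on E(x, theta_F, theta_B)
        g = maxflow.Graph[float]()
        nodes = g.add_grid_nodes(mask.shape)
        g.add_grid_edges(nodes, w)
        g.add_grid_tedges(nodes, F, B)
        g.maxflow()
        mask = g.get_grid_segments(nodes)
        mask &= init_mask   # pixels outside the user rectangle stay background
    return mask

rng = np.random.default_rng(7)
img = rng.random((64, 64, 3))
init = np.zeros((64, 64), bool); init[16:48, 16:48] = True  # user rectangle
print("foreground pixels:", grabcut_like(img, init).sum())
```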
GrabCut: Optimization

[Figure: result after iterations 0, 1, 2, 3, 4, and the energy after each iteration]
GrabCut: Optimization

[Figure: the iterated graph cut separates the initially overlapping foreground & background colour models in the R–G plane into distinct foreground and background models]
Summary
– Intro: Probabilistic models
– Two different approaches for learning
– Generative/discriminative models, discriminative functions
– Advanced segmentation system: GrabCut