Lecture IV: A Bayesian Viewpoint on Sparse Models
Yi Ma (Microsoft Research Asia) and John Wright (Columbia University)
(Slides courtesy of David Wipf, MSRA)
IPAM Computer Vision Summer School, 2013
Convex Approach to Sparse Inverse Problems

1. Ideal (noiseless) case:  min_x ||x||_0  s.t.  y = Φx,  with Φ ∈ R^{n×m}

2. Convex relaxation (lasso):  min_x ||y − Φx||_2^2 + λ||x||_1

• Note: These may need to be solved in isolation, or embedded in a larger system, depending on the application.
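The lasso relaxation above can be minimized with a simple proximal-gradient (ISTA) loop; a minimal numpy sketch, where `Phi`, `y`, and `lam` are placeholders for the problem data (this is not the solver used in the lecture):

```python
import numpy as np

def lasso_ista(Phi, y, lam, n_iter=500):
    """Minimize ||y - Phi x||_2^2 + lam * ||x||_1 by iterative
    soft-thresholding (ISTA); a minimal sketch."""
    L = 2.0 * np.linalg.norm(Phi, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Phi.T @ (Phi @ x - y)     # gradient of the quadratic term
        z = x - grad / L                       # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x
```

With a small λ and a noiseless y = Φx₀, the iterate approaches a sparse solution consistent with the observations.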
When Might This Strategy Be Inadequate?

Two representative cases:

1. The dictionary Φ has coherent columns.
2. There are additional parameters to estimate, potentially embedded in Φ.

The ℓ1 penalty favors solutions that are both sparse and low-variance; ℓ1 fails precisely when the latter influence dominates the former.
Dictionary Correlation Structure

Unstructured examples:
• Φ^(unstr) with iid N(0,1) entries
• Φ^(unstr) = random rows of a DFT matrix

Structured example:
• Φ^(str) = A Φ^(unstr) B, with A arbitrary and B block-diagonal

[Figure: Φ^T Φ for the unstructured vs. structured cases.]
Block Diagonal Example

Φ^(str) = Φ^(unstr) B, with B block-diagonal.

[Figure: (Φ^(str))^T Φ^(str), showing clusters of correlated columns.]

Problem:
• The ℓ1 solution typically selects either zero or one basis vector from each cluster of correlated columns.
• While the 'cluster support' may be partially correct, the chosen basis vectors likely will not be.
Dictionaries with Correlation Structures

• Most theory applies to unstructured, incoherent cases, but many (most?) practical dictionaries have significant coherent structure.
• Examples: [Figure: example structured dictionaries.]
MEG/EEG Example

[Figure: source space (x) → forward model Φ → sensor space (y).]

• The forward-model dictionary Φ can be computed using Maxwell's equations [Sarvas, 1987].
• It depends on the locations of the sensors, but is always highly structured by physical constraints.
MEG Source Reconstruction Example

[Figure: Ground Truth | Group Lasso | Bayesian Method]
Bayesian Formulation

• Assumptions on the distributions:

  p(y|x) ∝ exp( −(1/2λ) ||y − Φx||_2^2 ),  i.e.  y = Φx + n,  n ~ N(0, λI)
  p(x) ∝ exp( −(1/2) Σ_i g(x_i) ),  a general sparse prior

• This leads to the MAP estimate:

  x* = argmax_x p(x|y) = argmax_x p(y|x) p(x)
     = argmin_x ||y − Φx||_2^2 + λ Σ_i g(x_i),   e.g.  g(x_i) = |x_i|  or  g(x_i) = log|x_i|
Latent Variable Bayesian Formulation

Sparse priors can be specified via a variational form, in terms of maximizing scaled Gaussians:

  p(x) = Π_i p(x_i),   p(x_i) = max_{γ_i ≥ 0} N(x_i; 0, γ_i) φ(γ_i),

where γ = [γ_1, …, γ_m] ≥ 0 are latent variables. φ is a positive function, which can be chosen to define any sparse prior (e.g. Laplacian, Jeffreys, generalized Gaussian, etc.) [Palmer et al., 2006].
Posterior for a Gaussian Mixture

For a fixed γ, with the prior

  p(x) = Π_i N(x_i; 0, γ_i) = N(x; 0, Γ),   Γ ≡ diag(γ),

the posterior is a Gaussian distribution:

  p(x|y) ∝ p(y|x) p(x) = N(x; μ_x, Σ_x)
  μ_x = Γ Φ^T (λI + Φ Γ Φ^T)^{-1} y
  Σ_x = Γ − Γ Φ^T (λI + Φ Γ Φ^T)^{-1} Φ Γ

The "optimal estimate" for x would simply be the mean,

  x̂ = Γ Φ^T (λI + Φ Γ Φ^T)^{-1} y,

but this is obviously not optimal, since γ is unknown…
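The posterior moments above are easy to verify numerically; a small sketch with made-up problem sizes, which also checks that the posterior mean coincides with the regularized estimate argmin_x (1/λ)||y − Φx||_2^2 + x^T Γ^{-1} x (the identity used two slides below):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 5, 8, 0.1
Phi = rng.standard_normal((n, m))
gamma = rng.uniform(0.5, 2.0, size=m)        # fixed latent variances (made up)
Gamma = np.diag(gamma)
y = rng.standard_normal(n)

# Posterior moments for fixed gamma:
Sigma_y = lam * np.eye(n) + Phi @ Gamma @ Phi.T
mu = Gamma @ Phi.T @ np.linalg.solve(Sigma_y, y)                       # posterior mean
Sigma = Gamma - Gamma @ Phi.T @ np.linalg.solve(Sigma_y, Phi @ Gamma)  # posterior covariance

# Same mean via the Woodbury identity, as a regularized regression solution:
mu2 = np.linalg.solve(Phi.T @ Phi / lam + np.linalg.inv(Gamma), Phi.T @ y / lam)
assert np.allclose(mu, mu2)
```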
Approximation via Marginalization

We want to approximate

  x* = argmax_x p(x|y) = argmax_x p(y|x) max_γ Π_i N(x_i; 0, γ_i) φ(γ_i)

by replacing the inner maximization with a fixed γ*:

  p(y|x) max_γ p(x; γ) ≈ p(y|x) p(x; γ*)   for some fixed γ*.

Find the γ* that maximizes the likelihood averaged (marginalized) with respect to x:

  γ* = argmax_γ ∫ p(y|x) Π_i N(x_i; 0, γ_i) φ(γ_i) dx
Latent Variable Solution

  γ* = argmax_γ ∫ p(y|x) Π_i N(x_i; 0, γ_i) φ(γ_i) dx
     = argmin_γ −2 log ∫ p(y|x) Π_i N(x_i; 0, γ_i) φ(γ_i) dx
     = argmin_γ y^T Σ_y^{-1} y + log|Σ_y| − 2 Σ_i log φ(γ_i),   with Σ_y ≡ λI + Φ Γ Φ^T.

Using the identity

  y^T Σ_y^{-1} y = min_x (1/λ) ||y − Φx||_2^2 + x^T Γ^{-1} x,

the resulting estimate is  x* = Γ* Φ^T (λI + Φ Γ* Φ^T)^{-1} y.
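The γ minimization above can be carried out with the classic sparse Bayesian learning EM update γ_i ← μ_i^2 + Σ_ii, where μ and Σ are the posterior moments from the previous slide; a sketch assuming the flat case f(γ_i) = constant (one of several update rules in the literature, not necessarily the one used in the lecture's experiments):

```python
import numpy as np

def type2_cost(Phi, gamma, y, lam):
    """Evidence cost y^T Sigma_y^{-1} y + log|Sigma_y| (f constant)."""
    Sigma_y = lam * np.eye(len(y)) + (Phi * gamma) @ Phi.T
    _, logdet = np.linalg.slogdet(Sigma_y)
    return y @ np.linalg.solve(Sigma_y, y) + logdet

def sbl_em(Phi, y, lam, n_iter=100):
    """EM updates for the latent variances: gamma_i <- mu_i^2 + Sigma_ii."""
    n, m = Phi.shape
    gamma = np.ones(m)
    for _ in range(n_iter):
        Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T
        PhiT_Sinv = np.linalg.solve(Sigma_y, Phi).T            # Phi^T Sigma_y^{-1}
        mu = gamma * (PhiT_Sinv @ y)                           # posterior mean
        Sigma_diag = gamma - gamma**2 * np.sum(PhiT_Sinv * Phi.T, axis=1)
        gamma = mu**2 + Sigma_diag                             # EM update
    Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T
    return gamma * (Phi.T @ np.linalg.solve(Sigma_y, y)), gamma  # x*, gamma*
```

Each EM sweep is guaranteed not to increase the evidence cost, so the routine can be sanity-checked by comparing the cost before and after.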
MAP-like Regularization

  (x*, γ*) = argmin_{x, γ ≥ 0} (1/λ) ||y − Φx||_2^2 + x^T Γ^{-1} x + log|λI + Φ Γ Φ^T| + Σ_i f(γ_i)

  x* = argmin_x (1/λ) ||y − Φx||_2^2 + g(x),

where f(γ_i) ≡ −2 log φ(γ_i), and

  g(x) ≡ min_{γ ≥ 0} Σ_i x_i^2 / γ_i + log|λI + Φ Γ Φ^T| + Σ_i f(γ_i).

Very often, for simplicity, we choose f(γ_i) = b (a constant).

Notice that g(x) is in general not separable:  g(x) ≠ Σ_i g_i(x_i).
Properties of the Regularizer

Theorem. When f(γ_i) = b (a constant), g(x) is a concave, non-decreasing function of |x|. Also, any local solution x* has at most n nonzeros.

Theorem. When f(γ_i) = b and Φ^T Φ = I, the program has no local minima. Furthermore, g(x) becomes separable, g(x) = Σ_i g_i(x_i), with the closed form

  g_i(x_i) = 2|x_i| / ( |x_i| + sqrt(x_i^2 + 4λ) ) + log( 2λ + x_i^2 + |x_i| sqrt(x_i^2 + 4λ) ),

which is a non-decreasing, strictly concave function of |x_i|.

[Tipping, 2001; Wipf and Nagarajan, 2008]
Smoothing Effect: 1D Feasible Region

[Figure: penalty value along the 1D feasible region x = x_0 + t v, where t is a scalar, v ∈ Null(Φ), and x_0 is the maximally sparse solution of y = Φx. Curves shown for λ = 0.01 and λ → 0.]
Noise-Aware Sparse Regularization

  λ → 0:   g_i(x_i) → log|x_i|   (ℓ0-like behavior)
  λ → ∞:  g_i(x_i) → a scaled |x_i|   (ℓ1-like behavior)
Philosophy

• Literal Bayesian: Assume some prior distribution on unknown parameters, and then justify a particular approach based only on the validity of these priors.
• Practical Bayesian: Invoke Bayesian methodology to arrive at potentially useful cost functions; then validate these cost functions with independent analysis.
Aggregate Penalty Functions

• Candidate sparsity penalties:

  primal:  g_primal(x) = log| λI + Φ diag(|x|) Φ^T |
  dual:    g_dual(x) = min_{γ ≥ 0} Σ_i x_i^2 / γ_i + log| λI + Φ diag(γ) Φ^T |

[Tipping, 2001; Wipf and Nagarajan, 2008]

NOTE: As λ → 0, both penalties have the same minimum as the ℓ0 norm; as λ → ∞, both converge to scaled versions of the ℓ1 norm.
How Might This Philosophy Help?

• Consider reweighted ℓ1 updates using the primal-space penalty.

Initial ℓ1 iteration, with w^(0) = 1:

  x^(1) = argmin_x Σ_i w_i^(0) |x_i|   s.t.  y = Φx

Weight update:

  w_i^(1) = [ ∂ g_primal(x) / ∂|x_i| ]_{x = x^(1)} = φ_i^T ( λI + Φ diag(|x^(1)|) Φ^T )^{-1} φ_i

• The update reflects the subspace of all active columns *and* any columns of Φ that are nearby.
• Correlated columns will produce similar weights: small if in the active subspace, large otherwise.
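The two steps above can be sketched in numpy; here the equality constraint is replaced by a strong quadratic penalty so a weighted ISTA inner loop suffices (a convenience of this sketch, not the lecture's algorithm, and `eps` plus the iteration counts are made-up knobs):

```python
import numpy as np

def weighted_l1(Phi, y, w, eps=1e-4, n_iter=2000):
    """Approximate min_x sum_i w_i |x_i| s.t. y = Phi x, by minimizing
    (1/eps)||y - Phi x||^2 + sum_i w_i |x_i| with weighted ISTA."""
    L = 2.0 * np.linalg.norm(Phi, 2) ** 2 / eps
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - (2.0 / eps) * (Phi.T @ (Phi @ x - y)) / L   # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)  # weighted soft-threshold
    return x

def reweighted_l1(Phi, y, lam=1e-2, n_outer=4):
    """Reweighted l1 with the primal-space weight
    w_i = phi_i^T (lam I + Phi diag(|x|) Phi^T)^{-1} phi_i."""
    n, m = Phi.shape
    w = np.ones(m)
    for _ in range(n_outer):
        x = weighted_l1(Phi, y, w)
        S = lam * np.eye(n) + (Phi * np.abs(x)) @ Phi.T
        w = np.sum(np.linalg.solve(S, Phi) * Phi, axis=0)   # phi_i^T S^{-1} phi_i
        w = np.maximum(w, 1e-10)                            # guard against tiny weights
    return x
```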
Basic Idea

• Initial iteration(s) locate appropriate groups of correlated basis vectors and prune irrelevant clusters.
• Once the support is sufficiently narrowed down, regular ℓ1 is sufficient.
• Reweighted ℓ1 iterations naturally handle this transition.
• The dual-space penalty accomplishes something similar, and has additional theoretical benefits…
Alternative Approach

What about designing an ℓ1 reweighting function f directly?

• Iterate:

  x^(k+1) = argmin_x Σ_i w_i^(k) |x_i|   s.t.  y = Φx
  w^(k+1) = f( x^(k+1) )

• Note: If f satisfies relatively mild properties, there will exist an associated sparsity penalty that is being minimized.
• We can therefore select f without regard to a specific penalty function.
Example f^(p,q)

  w_i^(k+1) = [ φ_i^T ( λI + Φ diag(|x^(k+1)|^q) Φ^T )^{-1} φ_i ]^p,   p, q ≥ 0

• The implicit penalty function can be expressed in integral form for certain selections of p and q.
• For the right choice of p and q, the iteration has some guarantees for clustered dictionaries…
Numerical Simulations

Toy example:
• Generate 50-by-100 dictionaries: Φ^(unstr) with iid N(0,1) entries, and Φ^(str) = Φ^(unstr) B with B block-diagonal.
• Generate a sparse x.
• Estimate x from the observations y^(unstr) = Φ^(unstr) x and y^(str) = Φ^(str) x.

• Convenient optimization via reweighted ℓ1 minimization [Candès, 2008].
• Provable performance gains in certain situations [Wipf, 2013].

[Figure: success rate vs. ||x||_0 for bayesian Φ^(unstr), bayesian Φ^(str), standard Φ^(unstr), standard Φ^(str).]
Summary

• In practical situations, dictionaries are often highly structured.
• But standard sparse estimation algorithms may be inadequate in this situation (existing performance guarantees do not generally apply).
• We have suggested a general framework that compensates for dictionary structure via dictionary-dependent penalty functions.
• This could lead to new families of sparse estimation algorithms.
Dictionary Has Embedded Parameters

1. Ideal (noiseless) case:  min_{x,k} ||x||_0  s.t.  y = Φ(k) x

2. Relaxed version:  min_{x,k} ||y − Φ(k) x||_2^2 + λ||x||_1

• Applications: bilinear models, blind deconvolution, blind image deblurring, etc.
Blurry Image Formation

• Relative movement between camera and scene during exposure causes blurring:

[Figure: single-blurry / multi-blurry / blurry-noisy examples; Whyte et al., 2011.]
Blurry Image Formation

• Basic observation model (can be generalized):

  blurry image = blur kernel ∗ sharp image + noise
Blurry Image Formation

• Basic observation model (can be generalized):

  blurry image (√) = blur kernel (?) ∗ sharp image (?) + noise

  where √ marks the known quantity and ? marks the unknown quantities we would like to estimate.
Gradients of Natural Images are Sparse

Hence we work in the gradient domain:
• x: vectorized derivatives of the sharp image
• y: vectorized derivatives of the blurry image
Blind Deconvolution

• Observation model:

  y = k ∗ x + n = T(k) x + n,

  where ∗ is the convolution operator and T(k) is the corresponding Toeplitz matrix.

• We would like to estimate the unknown x blindly, since k is also unknown.
• We will assume the unknown x is sparse.
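The equivalence between convolution and multiplication by a Toeplitz matrix is easy to demonstrate; a minimal sketch with made-up 1D signals, using 'full' convolution:

```python
import numpy as np

def conv_matrix(k, m):
    """(len(k)+m-1) x m Toeplitz matrix T(k) such that T(k) @ x == k * x
    (full 1D convolution)."""
    n = len(k) + m - 1
    T = np.zeros((n, m))
    for j in range(m):
        T[j:j + len(k), j] = k      # column j holds k shifted down by j
    return T

x = np.array([0., 1., 0., 0., -2., 0.])     # toy sparse signal
k = np.array([0.25, 0.5, 0.25])             # toy blur kernel
assert np.allclose(conv_matrix(k, len(x)) @ x, np.convolve(k, x))
```

Note that `conv_matrix([1.0], m)` is the identity: the delta kernel leaves the signal unchanged, which is exactly the degenerate solution discussed on the next slide.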
Attempt via Convex Relaxation

Solve:

  min_{x,k} ||x||_1   s.t.  y = k ∗ x,   k ∈ Ω_k ≡ { k : Σ_i k_i = 1, k_i ≥ 0 }

Problem: any feasible blurry image is a superposition of translated copies x^(t) of the sharp image, and

  || Σ_t k_t x^(t) ||_1 ≤ Σ_t k_t ||x^(t)||_1 = ||x||_1.

• So the degenerate, non-deblurred solution (k, x) = (δ, y) is favored.
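The degeneracy is easy to check numerically: by the triangle inequality, blurring with any kernel on the simplex never increases the ℓ1 norm, so the no-blur solution always wins. A small sketch with made-up signals:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.zeros(50)
x[rng.choice(50, 5, replace=False)] = rng.standard_normal(5)  # sparse sharp signal
k = rng.uniform(size=7)
k /= k.sum()                                                  # k_i >= 0, sum_i k_i = 1

# ||k * x||_1 <= sum_t k_t ||x||_1 = ||x||_1 (equality holds when k is a delta):
assert np.abs(np.convolve(k, x)).sum() <= np.abs(x).sum() + 1e-12
delta = np.zeros(7); delta[0] = 1.0
assert np.isclose(np.abs(np.convolve(delta, x)).sum(), np.abs(x).sum())
```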
Bayesian Inference

• Assume priors p(x) and p(k), and a likelihood p(y|x,k).
• Compute the posterior distribution via Bayes' rule:

  p(x, k | y) = p(y | x, k) p(x) p(k) / p(y)

• Then infer x and/or k using estimators derived from p(x, k | y), e.g., the posterior means, or marginalized means.
Bayesian Inference: MAP Estimation

• Assumptions:

  p(x) ∝ exp( −(1/2) Σ_i g(x_i) ),  with g estimated from natural images
  p(k): uniform over a constraint set Ω_k (say ||k||_1 = 1, k ≥ 0)
  p(y | x, k) = N(y; k ∗ x, λI)

• Solve:

  (x*, k*) = argmax_{x,k} p(x, k | y) = argmin_{x,k} −log p(y | x, k) − log p(x)
           = argmin_{x, k ∈ Ω_k} (1/λ) ||y − k ∗ x||_2^2 + Σ_i g(x_i)

• This is just regularized regression with a sparsity penalty that reflects natural image statistics.
Failure of Natural Image Statistics

(Standardized) natural image gradient statistics suggest

  p(x) ∝ exp( −(1/2) Σ_i |x_i|^p ),   p ∈ [0.5, 0.8]   [Simoncelli, 1999]

• Shown in red are 15 × 15 patches where Σ_i |x_i|^p > Σ_i |y_i|^p with y = k ∗ x, i.e., where this prior assigns the blurry gradients higher probability than the sharp ones.
The Crux of the Problem

• MAP only considers the mode, not the entire location of prominent posterior mass.
• Blurry images are closer to the origin in image-gradient space: they have higher probability, but lie in a restricted region of relatively low overall mass, which ignores the heavy tails.

Natural image statistics are not the best choice with MAP; they favor blurry images more than sharp ones!

[Figure: feasible set; sharp solution: sparse, high variance; blurry solution: non-sparse, low variance.]
An "Ideal" Deblurring Cost Function

• Rather than accurately reflecting natural image statistics, for MAP to work we need a prior/penalty such that

  Σ_i g(x_i) < Σ_i g(y_i)   for sharp/blurry pairs (x, y).

Lemma: Under very mild conditions, the ℓ0 norm (invariant to changes in variance) satisfies

  ||x||_0 ≤ ||k ∗ x||_0,

with equality iff k = δ. (A similar concept holds when x is not exactly sparse.)

• Theoretically ideal… but now we have a combinatorial optimization problem, and the convex relaxation provably fails.
Local Minima Example

• A 1D signal is convolved with a 1D rectangular kernel.
• MAP estimation using the ℓ0 norm, implemented with an IRLS minimization technique.

→ Provable failure because of convergence to local minima.
Motivation for Alternative Estimators

• With the ℓ0 norm, we get stuck at local minima.
• With natural image statistics (or the ℓ1 norm), we favor the degenerate, blurry solution.
• But perhaps natural image statistics can still be valuable if we use an estimator that is sensitive to the entire posterior distribution (not just its mode).
Latent Variable Bayesian Formulation

• Assumptions:

  p(x) = Π_i p(x_i),  with  p(x_i) = max_{γ_i ≥ 0} N(x_i; 0, γ_i) exp( −f(γ_i)/2 )
  p(k): uniform over a constraint set Ω_k (say ||k||_1 = 1, k ≥ 0)
  p(y | x, k) = N(y; k ∗ x, λI)

• Following the same process as in the general case, we have:

  min_{x, k ∈ Ω_k} (1/λ) ||y − k ∗ x||_2^2 + g_VB(x, k, λ),

  where  g_VB(x, k, λ) ≡ inf_{γ ≥ 0} Σ_i [ x_i^2 / γ_i + log( λ/||k||_2^2 + γ_i ) + f(γ_i) ].
Choosing an Image Prior to Use

• Choosing p(x) is equivalent to choosing the function f embedded in g_VB.
• Natural image statistics seem like the obvious choice [Fergus et al., 2006; Levin et al., 2009].
• Let f_nat denote the f function associated with such a prior (it can be computed using tools from convex analysis [Palmer et al., 2006]).

(Di)Lemma:

  g_VB(x, k, λ) ≡ inf_{γ ≥ 0} Σ_i [ x_i^2 / γ_i + log( λ/||k||_2^2 + γ_i ) + f_nat(γ_i) ]

is less concave in |x| than the original image prior [Wipf and Zhang, 2013].

• So the implicit VB image penalty actually favors the blurry solution even more than the original natural image statistics!
Practical Strategy

• Analyze the reformulated cost function independently of its Bayesian origins.
• The best prior (or equivalently f) can then be selected based on properties directly beneficial to deblurring.
• This is just like the lasso: we do not use such an ℓ1 model because we believe the data actually come from a Laplacian distribution.

Theorem. When f(γ_i) = b (a constant), g_VB(x, k, λ) has the closed form g_VB = Σ_i g_VB(x_i), with

  g_VB(x_i) = 2|x_i| / ( |x_i| + sqrt(x_i^2 + 4ρ) ) + log( 2ρ + x_i^2 + |x_i| sqrt(x_i^2 + 4ρ) ),   ρ ≡ λ / ||k||_2^2.
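The closed form as reconstructed here (it should be checked against Wipf and Zhang, 2013) is simple to evaluate; the sketch below confirms numerically that it is a non-decreasing, concave function of |x_i|, with ρ = λ/||k||_2^2 controlling the shape:

```python
import numpy as np

def g_vb(x, rho):
    """Reconstructed closed-form penalty (f constant); rho = lam / ||k||_2^2."""
    ax = np.abs(x)
    s = np.sqrt(ax**2 + 4.0 * rho)
    return 2.0 * ax / (ax + s) + np.log(2.0 * rho + ax**2 + ax * s)

xs = np.linspace(0.0, 5.0, 200)
for rho in (0.01, 1.0):          # small rho: more l0-like; large rho: more l1-like
    g = g_vb(xs, rho)
    assert np.all(np.diff(g) >= -1e-12)   # non-decreasing in |x|
    assert np.all(np.diff(g, 2) <= 1e-8)  # concave in |x|
```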
Sparsity-Promoting Properties

If and only if f is constant, g_VB satisfies the following:

• Sparsity: jointly concave, non-decreasing function of |x_i| for all i.
• Scale invariance: the constraint set Ω_k on k does not affect the solution.
• Limiting cases:
  If λ/||k||_2^2 → 0, then g_VB(x, k) → a scaled version of ||x||_0.
  If λ/||k||_2^2 → ∞, then g_VB(x, k) → a scaled version of ||x||_1.
• General case:
  If λ_a/||k_a||_2^2 < λ_b/||k_b||_2^2, then g_VB(x, k_a; λ_a) is concave relative to g_VB(x, k_b; λ_b).

[Wipf and Zhang, 2013]
Why Does This Help?

• g_VB is a scale-invariant sparsity penalty that interpolates between the ℓ1 and ℓ0 norms.
• It is more concave (more sparsity-promoting) when:
  λ is small (low noise, little modeling error);
  the norm of k is big (meaning the kernel is sparse);
  these are the easy cases.
• It is less concave when:
  λ is big (large noise, or kernel errors near the beginning of estimation);
  the norm of k is small (the kernel is diffuse, before fine-scale details are resolved).

[Figure: relative sparsity curves of the penalty, shown for ρ_1 = 0.01 and ρ_2 = 1.]

This shape modulation allows VB to avoid local minima initially, while automatically introducing additional non-convexity to resolve fine details as estimation progresses.
Local Minima Example Revisited

• A 1D signal is convolved with a 1D rectangular kernel.
• MAP using the ℓ0 norm, versus VB with the adaptive shape.
Remarks

• The original Bayesian model, with f constant, results from the image prior (a Jeffreys prior)

  p(x) ∝ Π_i 1/|x_i|.

• This prior does not resemble natural image statistics at all!
• Ultimately, the type of estimator may completely determine which prior should be chosen.
• Thus we cannot use the true statistics to justify the validity of our model.
Variational Bayesian Approach

• Instead of MAP:  max_{x,k} p(x, k | y)
• Solve:

  max_k p(k | y) = max_k ∫ p(x, k | y) dx

• Here we are first averaging over all possible sharp images, and natural image statistics now play a vital role.

Lemma: Under mild conditions, in the limit of large images, maximizing p(k|y) will recover the true blur kernel k if p(x) reflects the true statistics. [Levin et al., 2011]
Approximate Inference

• The integral required for computing p(k|y) is intractable.
• Variational Bayes (VB) provides a convenient family of lower bounds for maximizing p(k|y) approximately.
• The technique can be applied whenever p(x) is expressible in a particular variational form.
Maximizing a Free Energy Bound

• Assume p(k) is flat within the constraint set, so we want to solve:  max_k p(y | k)
• Useful bound [Bishop, 2006]:

  log p(y | k) ≥ F( q(x, γ), k ) ≡ ∫∫ q(x, γ) log [ p(y, x, γ | k) / q(x, γ) ] dx dγ,

  with equality iff q(x, γ) = p(x, γ | y, k).

• Maximization strategy (equivalent to an EM algorithm):  max_{q(x,γ), k} F( q(x, γ), k )
• Unfortunately, the updates are still not tractable.
Practical Algorithm

• New, looser bound:

  log p(y | k) ≥ F( Π_i q(x_i) Π_i q(γ_i), k )

• Iteratively solve:

  max_{q(x), q(γ), k} F( q(x), q(γ), k )   s.t.  q(x, γ) = Π_i q(x_i) Π_i q(γ_i)

• Efficient, closed-form updates are now possible, because the factorization decouples the intractable terms.

[Palmer et al., 2006; Levin et al., 2011]
Questions

• The above VB has been motivated as a way of approximating the marginal likelihood p(y|k).
• However, several things remain unclear:
  What is the nature of this approximation, and how good is it?
  Are natural image statistics a good choice for p(x) when using VB?
  How is the underlying cost function intrinsically different from MAP?
• A reformulation of VB can help here…
Equivalence

Solving the VB problem

  max_{q(x), q(γ), k} F( q(x), q(γ), k )   s.t.  q(x, γ) = Π_i q(x_i) Π_i q(γ_i)

is equivalent to solving the MAP-like problem

  min_{x,k} (1/λ) ||y − k ∗ x||_2^2 + g_VB(x, k, λ),

where

  g_VB(x, k, λ) ≡ inf_{γ ≥ 0} Σ_i [ x_i^2 / γ_i + log( λ/||k||_2^2 + γ_i ) + f(γ_i) ],

and f is a function that depends only on p(x). [Wipf and Zhang, 2013]
Remarks

• VB (via averaging out x) looks just like standard penalized regression (MAP), but with a non-standard image penalty g_VB whose shape depends on both the noise variance λ and the kernel norm ||k||_2.
• Ultimately, it is this unique dependency which contributes to VB's success.
Blind Deblurring Results

Levin et al. dataset [CVPR, 2009]:
• 4 images of size 255 × 255 and 8 different empirically measured ground-truth blur kernels, giving 32 blurry images in total.

[Figure: images x1–x4 and blur kernels k1–k8.]
Comparison of VB Methods

Note: VB-Levin and VB-Fergus are based on natural image statistics [Levin et al., 2011; Fergus et al., 2006]; VB-Jeffreys is based on the theoretically motivated image prior.
Comparison with MAP Methods

Note: MAP methods [Shan et al., 2008; Cho and Lee, 2009; Xu and Jia, 2010] rely on carefully-defined structure-selection heuristics to locate salient edges, etc., so as to avoid the no-blur (delta) solution. VB requires no such added complexity.
Extensions

Can easily adapt the VB model to more general scenarios:

1. Non-uniform convolution models: the blurry image is a superposition of translated and rotated sharp images.
2. Multiple images for simultaneous denoising and deblurring. [Figure: blurry/noisy pair; Yuan et al., SIGGRAPH, 2007.]
Non-Uniform Real-World Deblurring

[Figure: Blurry | Whyte et al. | Zhang and Wipf]

O. Whyte et al., Non-uniform deblurring for shaken images, CVPR, 2010.
Non-Uniform Real-World Deblurring

[Figure: Blurry | Gupta et al. | Zhang and Wipf]

A. Gupta et al., Single image deblurring using motion density functions, ECCV, 2010.
Non-Uniform Real-World Deblurring

[Figure: Blurry | Joshi et al. | Zhang and Wipf]

N. Joshi et al., Image deblurring using inertial measurement sensors, SIGGRAPH, 2010.
Non-Uniform Real-World Deblurring

[Figure: Blurry | Hirsch et al. | Zhang and Wipf]

M. Hirsch et al., Fast removal of non-uniform camera shake, ICCV, 2011.
Dual Motion Blind Deblurring: Real-World Image

[Figure: Blurry I]

Test images from: J.-F. Cai, H. Ji, C. Liu, and Z. Shen, Blind motion deblurring using multiple images, J. Comput. Physics, 228(14):5057–5071, 2009.
Dual Motion Blind Deblurring: Real-World Image

[Figure: Blurry II]
Dual Motion Blind Deblurring: Real-World Image

[Figure: Cai et al. result]

J.-F. Cai, H. Ji, C. Liu, and Z. Shen, Blind motion deblurring using multiple images, J. Comput. Physics, 228(14):5057–5071, 2009.
Dual Motion Blind Deblurring: Real-World Image

[Figure: Sroubek et al. result]

F. Sroubek and P. Milanfar, Robust multichannel blind deconvolution via fast alternating minimization, IEEE Trans. on Image Processing, 21(4):1687–1700, 2012.
Dual Motion Blind Deblurring: Real-World Image

[Figure: Zhang et al. result]

H. Zhang, D. P. Wipf, and Y. Zhang, Multi-Image Blind Deblurring Using a Coupled Adaptive Sparse Prior, CVPR, 2013.
Dual Motion Blind Deblurring: Real-World Image

[Figure: Cai et al. | Sroubek et al. | Zhang et al. side-by-side comparison]
Dual Motion Blind Deblurring: Real-World Image

[Figure: Cai et al. | Sroubek et al. | Zhang et al. side-by-side comparison, second detail]
Take-away Messages

• In a wide range of applications, convex relaxations are extremely effective and efficient.
• However, there remain interesting cases where non-convexity still plays a critical role.
• Bayesian methodology provides one source of inspiration for useful non-convex algorithms.
• These algorithms can then often be independently justified, without reliance on the original Bayesian statistical assumptions.
Thank you. Questions?

References

• D. Wipf and H. Zhang, "Revisiting Bayesian Blind Deconvolution," arXiv:1305.2362, 2013.
• D. Wipf, "Sparse Estimation Algorithms that Compensate for Coherent Dictionaries," MSRA Tech Report, 2013.
• D. Wipf, B. Rao, and S. Nagarajan, "Latent Variable Bayesian Models for Promoting Sparsity," IEEE Trans. Info. Theory, 2011.
• A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, "Understanding and evaluating blind deconvolution algorithms," Computer Vision and Pattern Recognition (CVPR), 2009.