Learning Inhomogeneous Gibbs Models
How to Describe the Virtual World
Histogram: marginal distribution of image variances
Non-Gaussian distributed
Texture Synthesis (Heeger et al., 95)
• Image decomposition by steerable filters
• Histogram matching
FRAME (Zhu et al., 97)
• Homogeneous Markov random field (MRF)
• Minimax entropy principle to learn a homogeneous Gibbs distribution
• Gibbs sampling and feature selection
Our Problem
To learn the distribution of structural signals
Challenges
• How to learn non-Gaussian distributions in high dimensions from few observations?
• How to capture the sophisticated properties of the distribution?
• How to optimize parameters with global convergence?
Inhomogeneous Gibbs Models (IGM)
A framework to learn arbitrary high-dimensional distributions
• 1D histograms on linear features to describe the high-dimensional distribution
• Maximum entropy principle – Gibbs distribution
• Minimum entropy principle – feature pursuit
• Markov chain Monte Carlo in parameter optimization
• Kullback-Leibler feature (KLF)
1D Observation: Histograms
Feature φ(x): R^d → R
• Linear feature: φ(x) = φ^T x
• Kernel distance: φ(x) = ||φ − x||
Marginal distribution (histogram):
    h_φ(z) = ∫ δ(z − φ^T x) f(x) dx
Empirical histogram over N samples:
    H_φ = (1/N) Σ_{i=1}^N δ(φ^T x_i),   δ(φ^T x_i) = (0, …, 0, 1, 0, …, 0)
(an indicator vector with a single 1 in the bin containing φ^T x_i)
Intuition
[Figure: samples from f(x) projected onto two feature directions φ_1 and φ_2, giving marginal histograms H_{φ_1} and H_{φ_2}]
Learning Descriptive Models
[Figure: the underlying distribution f(x) projected onto features φ_1, φ_2; the model is learnt so that the synthesized histograms H^syn_{φ_1}, H^syn_{φ_2} match the observed histograms H^obs_{φ_1}, H^obs_{φ_2}, driving p(x) toward f(x)]
Learning Descriptive Models
Sufficient features make the learnt model p(x) converge to the underlying distribution f(x)
Linear features and histograms are robust compared with other high-order statistics
Descriptive models:
    Ω = { p(x) | h^p_{φ_i}(z) = h_{φ_i}(z), i = 1, …, m }
Maximum Entropy Principle
Maximum entropy model
• To generalize the statistical properties of the observed data
• To make the learnt model carry no more information than what is available
Mathematical formulation:
    p*(x) = argmax_p entropy(p(x))
          = argmax_p { −∫ p(x) log p(x) dx }
    subject to  H^p_{φ_i} = H^obs_{φ_i},  i = 1, …, m
Intuition of Maximum Entropy Principle
[Figure: Ω_1 = { p(x) | H_{φ_1}(z) = H^p_{φ_1}(z) } — among all models matching the observed histogram H_{φ_1} along φ_1, the maximum entropy solution p*(x) is the most spread out]
Solution Form of Maximum Entropy Model
    p(x; Λ) = (1/Z(Λ)) exp{ −Σ_{i=1}^m ⟨λ_i(z), δ(φ_i^T x)⟩ }
Parameter: Λ = {λ_i}
This is an inhomogeneous Gibbs distribution:
• λ_i(z): Gibbs potential of the feature φ_i^T x
• ⟨λ_i(z), δ(φ_i^T x)⟩: energy contribution of feature i
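A minimal sketch of evaluating the Gibbs energy E(x) = Σ_i ⟨λ_i, δ(φ_i^T x)⟩, so that p(x) ∝ exp(−E(x)). The binning scheme, projection range, and names are illustrative assumptions:

```python
import numpy as np

def gibbs_energy(x, features, potentials, value_range=(-3.0, 3.0)):
    """Energy of the inhomogeneous Gibbs model,
    E(x) = sum_i <lambda_i, delta(phi_i^T x)>, so p(x) ∝ exp(-E(x)).

    features:   list of direction vectors phi_i.
    potentials: list of binned potential vectors lambda_i.
    """
    lo, hi = value_range
    E = 0.0
    for phi, lam in zip(features, potentials):
        z = float(phi @ x)                          # projection phi_i^T x
        b = int((z - lo) / (hi - lo) * len(lam))    # histogram bin of z
        b = min(max(b, 0), len(lam) - 1)            # clamp to valid bins
        E += lam[b]                                 # <lambda_i, delta(.)>
    return E

# Usage: a 2-feature model on R^2; zero potentials give zero energy.
phis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
lams = [np.zeros(32), np.zeros(32)]
E0 = gibbs_energy(np.array([0.5, -0.5]), phis, lams)
```

Because each δ(φ_i^T x) is a one-hot indicator, the inner product simply selects one potential value per feature.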
Estimating Potential Function
Distribution form:
    p(x; Λ) = (1/Z(Λ)) exp{ −Σ_{i=1}^m ⟨λ_i, δ(φ_i^T x)⟩ }
Normalization:
    Z(Λ) = ∫ exp{ −Σ_{i=1}^m ⟨λ_i, δ(φ_i^T x)⟩ } dx
Maximum likelihood estimation (MLE). Let L(Λ) = log p(x; Λ) and Λ* = argmax_Λ L(Λ).
1st-order derivative:
    ∂L/∂λ_i = −(1/Z) ∂Z/∂λ_i − H^obs_{φ_i} = E_{p(x;Λ)}[δ(φ_i^T x)] − H^obs_{φ_i}
2nd-order derivative: the negative covariance of the feature statistics, so L(Λ) is concave and gradient ascent converges globally.
Parameter Learning
Monte Carlo integration:
    E_{p(x;Λ)}[δ(φ_i^T x)] ≈ H^syn_{φ_i}
    ∂L/∂λ_i ≈ H^syn_{φ_i} − H^obs_{φ_i}
Algorithm
    Input: {φ_i}, {H^obs_{φ_i}(z)}
    Initialize: {λ_i}, step size s
    Loop:
        Sampling: {x_i} ~ p(x; Λ)
        Compute histograms: H^syn_{φ_i}, i = 1:m
        Update parameters: λ_i ← λ_i + s (H^syn_{φ_i} − H^obs_{φ_i})
        Histogram divergence: D = Σ_{i=1}^m KL(H^obs_{φ_i}, H^syn_{φ_i})
    Until D is small enough; then reduce s
    Output: Λ, {x_i}
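The update rule above can be sketched in a toy 1-D setting. This assumes a single identity feature on a discretized state space, so the model can be sampled exactly; the actual algorithm samples with MCMC in high dimensions, and all constants here are illustrative:

```python
import numpy as np

# Toy 1-D illustration of the update rule lambda += s * (H_syn - H_obs):
# one identity feature on a discretized state space, so the model
# p(z) ∝ exp(-lambda(z)) can be sampled exactly instead of by MCMC.
rng = np.random.default_rng(1)
bins, n = 20, 5000

obs = rng.normal(0.7, 0.15, n).clip(0.0, 0.999)   # "observed" data
H_obs, _ = np.histogram(obs, bins=bins, range=(0.0, 1.0))
H_obs = H_obs / H_obs.sum()

lam = np.zeros(bins)                # Gibbs potential, one value per bin
s = 4.0                             # step size
for _ in range(200):
    p = np.exp(-lam)
    p /= p.sum()                    # current model distribution
    z = rng.choice(bins, size=n, p=p)             # sampling step
    H_syn = np.bincount(z, minlength=bins) / n    # synthesized histogram
    lam += s * (H_syn - H_obs)      # gradient ascent on log-likelihood
    D = np.sum(H_obs * np.log((H_obs + 1e-9) / (H_syn + 1e-9)))
```

At the fixed point the synthesized histogram matches the observed one, which is exactly the zero-gradient condition of the MLE.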
Gibbs Sampling
Sample each coordinate in turn from its conditional distribution:
    x_1^(t+1) ~ π(x_1 | x_2^(t), x_3^(t), …, x_K^(t))
    x_2^(t+1) ~ π(x_2 | x_1^(t+1), x_3^(t), …, x_K^(t))
    ⋮
    x_K^(t+1) ~ π(x_K | x_1^(t+1), …, x_{K−1}^(t+1))
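The coordinate-wise scheme above can be sketched on a toy target where the conditionals are known in closed form: a correlated bivariate Gaussian, not the IGM model itself. All names are illustrative:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng):
    """Gibbs sampler for a toy target: a 2-D standard normal with
    correlation rho. Each sweep draws x1 ~ pi(x1 | x2), then
    x2 ~ pi(x2 | x1); both conditionals are 1-D Gaussians.
    """
    x1 = x2 = 0.0
    cond_sd = np.sqrt(1.0 - rho ** 2)
    samples = np.empty((n_steps, 2))
    for t in range(n_steps):
        x1 = rng.normal(rho * x2, cond_sd)    # x1 ~ pi(x1 | x2)
        x2 = rng.normal(rho * x1, cond_sd)    # x2 ~ pi(x2 | x1)
        samples[t] = x1, x2
    return samples

rng = np.random.default_rng(0)
S = gibbs_bivariate_normal(rho=0.8, n_steps=20000, rng=rng)
```

After discarding burn-in, the empirical correlation of the chain approaches the target value rho.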
Minimum Entropy Principle
Minimum entropy principle
• To make the learnt distribution close to the observed one
Feature selection criterion:
    KL(f(x), p*(x; Λ)) = ∫ f(x) log [f(x) / p*(x; Λ)] dx
                       = E_f[log f(x)] − E_f[log p*(x; Λ)]
                       = entropy(p*(x; Λ)) − entropy(f(x))
Select the feature set minimizing model entropy:
    {φ_i}* = argmin entropy(p*(x; Λ*))
Model with a new feature φ_+ added:
    p(x; Λ_+) = (1/Z(Λ_+)) exp{ −Σ_{i=1}^m ⟨λ_i, δ(φ_i^T x)⟩ − ⟨λ_+, δ(φ_+^T x)⟩ }
Current model:
    p(x; Λ) = (1/Z(Λ)) exp{ −Σ_{i=1}^m ⟨λ_i, δ(φ_i^T x)⟩ }
})(,exp{)(
1);(1
-
m
i
Tii x
Zxp f
Feature Pursuit
A greedy procedure to learn the feature set {φ_i}_{i=1}^K
Reference model: the current maximum entropy model p_ref(x; Λ) built from the features selected so far
Approximate information gain of a new feature φ:
    d(φ) = KL(f(x), p_ref(x)) − KL(f(x), p_{+φ}(x))
Proposition
The approximate information gain for a new feature is
    d(φ) ≈ KL(H^obs_φ, H^p_φ)
and the optimal energy function for this feature is
    λ_φ = log (H^p_φ / H^obs_φ)
Kullback-Leibler Feature
Kullback-Leibler feature (KLF):
    φ_KL = argmax_φ KL(H^obs_φ, H^syn_φ) = argmax_φ Σ_z H^obs_φ(z) log [H^obs_φ(z) / H^syn_φ(z)]
Pursue the feature by
• Hybrid Monte Carlo
• Sequential 1D optimization
• Feature selection
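The gain of a candidate feature and its optimal potential can be sketched as a comparison of observed and synthesized histograms. The helper name and the smoothing constant `eps` are illustrative implementation choices:

```python
import numpy as np

def information_gain(H_obs, H_syn, eps=1e-9):
    """Gain d(phi) = KL(H_obs, H_syn) of a candidate feature phi, and
    its optimal potential lambda_phi = log(H_syn / H_obs).

    eps smooths empty bins (an implementation choice).
    """
    H_obs = H_obs + eps
    H_syn = H_syn + eps
    d = np.sum(H_obs * np.log(H_obs / H_syn))   # KL divergence
    lam = np.log(H_syn / H_obs)                 # optimal energy function
    return d, lam

# Usage: the KL feature is the candidate with the largest gain.
H_obs = np.array([0.1, 0.4, 0.4, 0.1])
H_syn = np.array([0.25, 0.25, 0.25, 0.25])
d, lam = information_gain(H_obs, H_syn)
```

A feature whose synthesized histogram already matches the observed one yields zero gain and a zero potential, so pursuit naturally skips it.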
Acceleration by Importance Sampling
Gibbs sampling is too slow…
Importance sampling by the reference model:
    p_ref(x; Λ_ref) = (1/Z_ref(Λ_ref)) exp{ −Σ_{i=1}^{m−1} ⟨λ_i^ref, δ(φ_i^T x)⟩ }
Importance weights for reference samples x_j^ref ~ p_ref(x; Λ_ref):
    w_j ∝ exp{ −Σ_{i=1}^m ⟨λ_i − λ_i^ref, δ(φ_i^T x_j^ref)⟩ }
(with λ_m^ref = 0 for the newly added feature)
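The reweighting idea can be sketched as generic self-normalized importance sampling, assuming unnormalized log-densities for the target and reference models. The Gaussian example is illustrative, not the face model:

```python
import numpy as np

def self_normalized_is(xs, log_p_tilde, log_q_tilde, stat):
    """Self-normalized importance sampling: reuse samples drawn from a
    reference model q to estimate E_p[stat(x)] under the target p.

    Both log-densities may be unnormalized; the partition functions
    cancel when the weights are normalized.
    """
    log_w = log_p_tilde(xs) - log_q_tilde(xs)
    log_w -= log_w.max()                 # stabilize the exponentials
    w = np.exp(log_w)
    w /= w.sum()                         # normalized importance weights
    return w @ stat(xs)

# Usage: reweight N(0, 1) reference samples toward the target N(0.5, 1).
rng = np.random.default_rng(0)
xs = rng.standard_normal(50_000)
est = self_normalized_is(
    xs,
    log_p_tilde=lambda x: -0.5 * (x - 0.5) ** 2,   # target, unnormalized
    log_q_tilde=lambda x: -0.5 * x ** 2,           # reference
    stat=lambda x: x,                              # estimate the mean
)
```

Because the unknown normalizers Z and Z_ref cancel, no new MCMC run is needed when only the potentials change.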
Flowchart of IGM
[Flowchart: observed samples → observed histograms; IGM runs MCMC to produce synthesized samples; if the KL divergence is below threshold (Y), output; otherwise (N), feature pursuit selects a new KL feature and the loop repeats]
Toy Problems (1)
[Figure: feature pursuit, Gibbs potentials, observed histograms, synthesized histograms, and synthesized samples for two toy distributions: a mixture of two Gaussians and a circle]
Toy Problems (2)
Swiss Roll
Applied to High Dimensions
In high-dimensional space:
• Too many features are needed to constrain every dimension
• MCMC sampling is extremely slow
Solution: dimension reduction by PCA
Application: learning a face prior model
• 83 landmarks defined to represent a face (166d)
• 524 samples
Face Prior Learning (1)
Observed face examples / Synthesized face samples without any features
Face Prior Learning (2)
Synthesized with 10 features / Synthesized with 20 features
Face Prior Learning (3)
Synthesized with 30 features / Synthesized with 50 features
Observed Histograms
Synthesized Histograms
Gibbs Potential Functions
Learning Caricature Exaggeration
Synthesis Results
Learning 2D Gibbs Process
Observed pattern / Triangulation / Random pattern
[Figure: observed histograms (1)–(4) with the corresponding synthesized histograms and synthesized patterns (1)–(4)]