Variational Bayes Model Selection for Mixture Distributions
Presented by Shihao Ji
Duke University Machine Learning Group
Jan. 20, 2006
Authors: Adrian Corduneanu & Christopher M. Bishop
Outline
• Introduction – model selection
• Automatic Relevance Determination (ARD)
• Experimental Results
• Application to HMMs
Introduction
• Cross validation
• Bayesian approaches
– MCMC and Laplace approximation
– (Traditional) variational method
– (Type II) variational method
$p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta$
$\log p(D \mid M) \ge L(Q)$
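The bound $\log p(D \mid M) \ge L(Q)$ can be checked numerically on a toy model. The sketch below assumes a discrete parameter with three values so that the evidence integral becomes a sum; the prior and likelihood numbers and the function names are hypothetical, chosen only for illustration.

```python
import math

def log_evidence(prior, lik):
    """log p(D|M) = log sum_theta p(D|theta, M) p(theta|M) for a discrete theta."""
    return math.log(sum(p * l for p, l in zip(prior, lik)))

def lower_bound(q, prior, lik):
    """L(Q) = sum_theta Q(theta) * log[ p(D|theta, M) p(theta|M) / Q(theta) ]."""
    return sum(qi * math.log(p * l / qi)
               for qi, p, l in zip(q, prior, lik) if qi > 0)

# Hypothetical toy numbers: prior p(theta|M) and likelihood p(D|theta, M).
prior = [0.5, 0.3, 0.2]
lik = [0.10, 0.40, 0.05]

# Exact posterior p(theta|D, M), available here because theta is discrete.
joint = [p * l for p, l in zip(prior, lik)]
posterior = [j / sum(joint) for j in joint]

log_ev = log_evidence(prior, lik)
bound_uniform = lower_bound([1 / 3, 1 / 3, 1 / 3], prior, lik)
bound_posterior = lower_bound(posterior, prior, lik)

print(log_ev, bound_uniform, bound_posterior)
```

Any distribution Q gives a value no larger than the log evidence, and the bound becomes tight when Q equals the true posterior, which is why maximizing L(Q) serves as a proxy for the evidence.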
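Automatic Relevance Determination (ARD)
• relevance vector regression
Given a dataset $\{\mathbf{x}_n, t_n\}_{n=1}^{N}$, we assume $p(t \mid \mathbf{x})$ is Gaussian, $N(t \mid y(\mathbf{x}), \sigma^2)$.
Likelihood: $p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma^2} \|\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}\|^2 \right\}$
Prior: $p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} N(w_i \mid 0, \alpha_i^{-1})$
Posterior: $p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) \propto p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})$
Determination of hyperparameters (Type II ML): maximize $p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2) = \int p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w}$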
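To see how Type II ML switches off an irrelevant weight, the following minimal sketch evaluates the marginal likelihood $p(\mathbf{t} \mid \alpha, \sigma^2)$ in closed form. It assumes a model with a single basis function (so the weight and its hyperparameter are scalars) and a fixed noise variance; the data, the basis vectors, and the function names are hypothetical. In this scalar case the maximizing $\alpha$ is finite only when the basis function explains enough of the targets; otherwise the marginal likelihood keeps increasing as $\alpha \to \infty$, which pins the weight at zero.

```python
import math

def _stats(phi, t, sigma2):
    """S = phi^T phi / sigma^2 and Q = phi^T t / sigma^2 for one basis vector phi."""
    S = sum(p * p for p in phi) / sigma2
    Q = sum(p * y for p, y in zip(phi, t)) / sigma2
    return S, Q

def log_marginal(phi, t, alpha, sigma2):
    """log p(t | alpha, sigma^2) with the single weight integrated out analytically."""
    n = len(t)
    S, Q = _stats(phi, t, sigma2)
    tt = sum(y * y for y in t) / sigma2
    # log|C| with C = sigma^2 I + phi phi^T / alpha
    log_det_c = n * math.log(sigma2) + math.log(alpha + S) - math.log(alpha)
    # t^T C^{-1} t
    quad = tt - Q * Q / (alpha + S)
    return -0.5 * (n * math.log(2 * math.pi) + log_det_c + quad)

def type2_ml_alpha(phi, t, sigma2):
    """Maximizer of log_marginal over alpha; math.inf means the weight is pruned."""
    S, Q = _stats(phi, t, sigma2)
    if Q * Q > S:
        return S * S / (Q * Q - S)
    return math.inf

# Hypothetical data: targets roughly equal to 2 * x.
t = [2.1, 3.9, 6.2, 7.8]
sigma2 = 0.04
phi_relevant = [1.0, 2.0, 3.0, 4.0]      # aligned with the targets
phi_irrelevant = [1.0, -1.0, -1.0, 1.0]  # nearly orthogonal to the targets

alpha_relevant = type2_ml_alpha(phi_relevant, t, sigma2)
alpha_irrelevant = type2_ml_alpha(phi_irrelevant, t, sigma2)
print(alpha_relevant, alpha_irrelevant)
```

For the relevant basis function $1/\alpha$ comes out close to the squared weight, while the irrelevant one is driven to infinite precision and therefore eliminated.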
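Automatic Relevance Determination (ARD)
• mixture of Gaussians
Given an observed dataset $D = \{\mathbf{x}_n\}_{n=1}^{N}$, we assume each data point is drawn independently from a mixture of Gaussians density
$p(\mathbf{x} \mid \pi, \mu, T) = \sum_{i=1}^{M} \pi_i\, N(\mathbf{x} \mid \mu_i, T_i)$
Likelihood: $p(D \mid \pi, \mu, T) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \pi, \mu, T)$
Prior: $p(\mu_i) = N(\mu_i \mid \mathbf{0}, \beta\mathbf{I})$, $p(T_i) = W(T_i \mid v, V)$
Posterior (VB): $p(\mu, T \mid D, \pi)$
Determination of mixing coefficients (Type II ML): maximize $p(D \mid \pi) = \int p(D \mid \pi, \mu, T)\, p(\mu, T)\, d\mu\, dT$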
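The effect of re-estimating the mixing coefficients can be illustrated with a simplified sketch. It is not the variational treatment itself: the component means and precisions are held at fixed point values instead of being integrated out, the data are one-dimensional, and all names and numbers are hypothetical. Each step sets a mixing coefficient to the average responsibility of its component, so a component that explains no data sees its coefficient collapse toward zero.

```python
import math
import random

def gauss_pdf(x, mean, precision):
    """1-D Gaussian density parameterized by mean and precision (inverse variance)."""
    return math.sqrt(precision / (2 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)

def mixture_pdf(x, pis, means, precisions):
    """p(x) = sum_i pi_i N(x | mean_i, precision_i)."""
    return sum(p * gauss_pdf(x, m, T) for p, m, T in zip(pis, means, precisions))

def log_likelihood(data, pis, means, precisions):
    """log p(D) = sum_n log p(x_n) for independently drawn points."""
    return sum(math.log(mixture_pdf(x, pis, means, precisions)) for x in data)

def update_mixing(data, pis, means, precisions):
    """One step pi_i <- (1/N) sum_n r_ni, component parameters held fixed."""
    totals = [0.0] * len(pis)
    for x in data:
        weighted = [p * gauss_pdf(x, m, T) for p, m, T in zip(pis, means, precisions)]
        norm = sum(weighted)
        for i, w in enumerate(weighted):
            totals[i] += w / norm  # responsibility r_ni
    return [tot / len(data) for tot in totals]

# Hypothetical data from two clusters; the third component (mean 20) is redundant.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
means, precisions = [0.0, 5.0, 20.0], [1.0, 1.0, 1.0]

pis0 = [1 / 3, 1 / 3, 1 / 3]
ll_before = log_likelihood(data, pis0, means, precisions)
pis = pis0
for _ in range(20):
    pis = update_mixing(data, pis, means, precisions)
ll_after = log_likelihood(data, pis, means, precisions)
print(pis, ll_before, ll_after)
```

In the full method the responsibilities come from the variational posterior over the component parameters, but the mechanism that suppresses unused components is the same.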
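Automatic Relevance Determination (ARD)
• model selection
Bayesian method: $p_m(D \mid \pi)$, $m \in \{1, 2, \ldots, M_{\max}\}$
Component elimination: component $i \in \{1, 2, \ldots, M_{\max}\}$ is removed if its mixing coefficient is negligible, i.e., $\pi_i < 10^{-4}$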
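The elimination rule can be sketched in a few lines. The threshold of 1e-4 is the one quoted on the slide; renormalizing the surviving coefficients so that they sum to one is an assumption of this sketch, and the function name is hypothetical.

```python
def eliminate_components(pis, threshold=1e-4):
    """Return indices of surviving components and their renormalized coefficients."""
    keep = [i for i, p in enumerate(pis) if p >= threshold]
    total = sum(pis[i] for i in keep)
    # Renormalization is an assumption of this sketch, not stated in the slides.
    return keep, [pis[i] / total for i in keep]

# Hypothetical mixing coefficients after optimization.
keep, new_pis = eliminate_components([0.5, 0.49995, 0.00005])
print(keep, new_pis)
```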
Experimental Results
• Bayesian method vs. cross-validation
600 points drawn from a mixture of 5 Gaussians.
$p_m(D \mid \pi)$
Experimental Results
• Component elimination
The model initially had 15 mixture components and was finally pruned down to 3.
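Automatic Relevance Determination (ARD)
• hidden Markov model
Given an observed dataset $D = \{\mathbf{x}_n\}_{n=1}^{N}$, we assume each data sequence is generated independently from an HMM
$p(\mathbf{x} \mid \pi, A, \theta) = \sum_{s_1, \ldots, s_T} \pi_{s_1} \prod_{t=1}^{T-1} a_{s_t s_{t+1}} \prod_{t=1}^{T} p(x_t \mid \theta_{s_t})$
Likelihood: $p(D \mid \pi, A, \theta) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \pi, A, \theta)$
Prior: $p(\theta_i) = NW(\mu_i, T_i \mid m, \beta, v, V)$
Posterior (VB): $p(\theta \mid D, \pi, A)$
Determination of $\pi$ and $A$ (Type II ML): maximize $p(D \mid \pi, A) = \int p(D \mid \pi, A, \theta)\, p(\theta)\, d\theta$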
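The HMM sequence likelihood is a sum over all state paths, which the standard forward recursion evaluates without enumerating them. The sketch below compares both on a tiny example. It assumes discrete observations with an emission table B (a simplification; the slides use a Normal-Wishart prior, which implies Gaussian emissions), and all parameter values and names are hypothetical.

```python
import itertools

def brute_force_likelihood(x, pi, A, B):
    """Sum over every state path s_1..s_T of pi[s_1] * prod a[s_t][s_t+1] * prod B[s_t][x_t]."""
    n_states = len(pi)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(x)):
        prob = pi[path[0]]
        for t in range(len(x) - 1):
            prob *= A[path[t]][path[t + 1]]
        for t, s in enumerate(path):
            prob *= B[s][x[t]]
        total += prob
    return total

def forward_likelihood(x, pi, A, B):
    """Same quantity via the forward recursion, linear in the sequence length."""
    n_states = len(pi)
    alpha = [pi[i] * B[i][x[0]] for i in range(n_states)]
    for obs in x[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
                 for j in range(n_states)]
    return sum(alpha)

# Hypothetical 2-state HMM with 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
x = [0, 2, 1, 2]

brute = brute_force_likelihood(x, pi, A, B)
fwd = forward_likelihood(x, pi, A, B)
print(brute, fwd)
```

The brute-force version mirrors the formula term by term and is only practical for very short sequences; the forward recursion is what one would use inside the likelihood product over the N sequences.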
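Automatic Relevance Determination (ARD)
• model selection
Bayesian method: $p_m(D \mid \pi, A)$, $m \in \{1, 2, \ldots, M_{\max}\}$
Define $vf(i)$ -- visiting frequency
$vf(i) = \sum_{n} \sum_{t} \gamma_t^{(n)}(i)$
where $\gamma_t^{(n)}(i) = p(s_t^{(n)} = i \mid \mathbf{x}_n, \theta)$
State elimination: state $i \in \{1, 2, \ldots, M_{\max}\}$ is removed if its visiting frequency $vf(i)$ is negligible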
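The visiting frequency can be computed by summing the per-time-step state posteriors over all sequences. The sketch below obtains those posteriors with an unscaled forward-backward pass, which is adequate only for short sequences. It assumes discrete emissions and hypothetical parameter values; the elimination threshold is also hypothetical, since the slides do not state one.

```python
def state_posteriors(x, pi, A, B):
    """gamma[t][i] = p(s_t = i | x) via unscaled forward-backward."""
    M, T = len(pi), len(x)
    fwd = [[0.0] * M for _ in range(T)]
    bwd = [[1.0] * M for _ in range(T)]
    for i in range(M):
        fwd[0][i] = pi[i] * B[i][x[0]]
    for t in range(1, T):
        for j in range(M):
            fwd[t][j] = sum(fwd[t - 1][i] * A[i][j] for i in range(M)) * B[j][x[t]]
    for t in range(T - 2, -1, -1):
        for i in range(M):
            bwd[t][i] = sum(A[i][j] * B[j][x[t + 1]] * bwd[t + 1][j] for j in range(M))
    evidence = sum(fwd[T - 1])
    return [[fwd[t][i] * bwd[t][i] / evidence for i in range(M)] for t in range(T)]

def visiting_frequency(sequences, pi, A, B):
    """vf(i) = sum over sequences n and time steps t of gamma_t^(n)(i)."""
    vf = [0.0] * len(pi)
    for x in sequences:
        for gamma_t in state_posteriors(x, pi, A, B):
            for i, g in enumerate(gamma_t):
                vf[i] += g
    return vf

# Hypothetical 3-state HMM in which state 2 can never be entered.
pi = [0.6, 0.4, 0.0]
A = [[0.7, 0.3, 0.0], [0.2, 0.8, 0.0], [0.3, 0.3, 0.4]]
B = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
sequences = [[0, 0, 1, 1], [1, 1, 0]]

vf = visiting_frequency(sequences, pi, A, B)
threshold = 1e-4  # hypothetical value, not given in the slides
surviving_states = [i for i, v in enumerate(vf) if v >= threshold]
print(vf, surviving_states)
```

Because the posteriors at each time step sum to one, the visiting frequencies add up to the total number of observations, so a state with a near-zero share is one the data never uses.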
Experimental Results (1)
[Figure: marginal log-likelihood versus number of iterations; x-axis 0 to 60, y-axis -1600 to -1350]
Experimental Results (2)
[Figure: marginal log-likelihood versus number of iterations; x-axis 0 to 60, y-axis -1680 to -1500]
Experimental Results (3)
[Figure: marginal log-likelihood versus number of iterations; x-axis 0 to 180, y-axis -1820 to -1640]
Questions?