Variational Bayes Model Selection for Mixture Distribution
Presented by Shihao Ji
Duke University Machine Learning Group
Jan. 20, 2006
Authors: Adrian Corduneanu & Christopher M. Bishop
Outline
• Introduction – model selection
• Automatic Relevance Determination (ARD)
• Experimental Results
• Application to HMMs
Introduction
• Cross validation
• Bayesian approaches
– MCMC and Laplace approximation
– (Traditional) variational method
– (Type II) variational method
$p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta$
$\log p(D \mid M) \ge \mathcal{L}(Q)$
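The variational methods listed above rest on a lower bound on the log evidence. A short derivation is sketched here, writing $\theta$ for the model parameters and $Q(\theta)$ for an arbitrary distribution over them (both symbols are assumptions, since the slide text lost them):

```latex
\begin{align}
\log p(D \mid M)
  &= \log \int Q(\theta)\, \frac{p(D, \theta \mid M)}{Q(\theta)}\, d\theta \\
  &\ge \int Q(\theta) \log \frac{p(D, \theta \mid M)}{Q(\theta)}\, d\theta
   \equiv \mathcal{L}(Q)
   && \text{(Jensen's inequality)} \\
\log p(D \mid M) - \mathcal{L}(Q)
  &= \mathrm{KL}\big( Q(\theta) \,\|\, p(\theta \mid D, M) \big) \ge 0 .
\end{align}
```

The gap is the KL divergence from $Q$ to the true posterior, so maximizing $\mathcal{L}(Q)$ over $Q$ tightens the bound and improves the posterior approximation at the same time.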
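Automatic Relevance Determination (ARD)
• relevance vector regression
Given a dataset $\{\mathbf{x}_n, t_n\}_{n=1}^{N}$, we assume $p(t \mid \mathbf{x})$ is Gaussian: $p(t \mid \mathbf{x}) = \mathcal{N}(t \mid y(\mathbf{x}), \sigma^2)$
Likelihood: $p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma^2} \|\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}\|^2 \right\}$
Prior: $p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$
Posterior: $p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) \propto p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})$
Determination of hyperparameters (Type II ML): $p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2) = \int p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w}$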
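For a linear-Gaussian model, the Type II ML objective has a closed form: integrating out the weights gives a zero-mean Gaussian over the targets. The sketch below evaluates that log evidence with NumPy. The names `log_evidence`, `Phi`, `alpha` and `sigma2` are illustrative assumptions, and the toy data are made up for the demonstration; this is not the presenter's code.

```python
import numpy as np

def log_evidence(Phi, t, alpha, sigma2):
    """Log marginal likelihood log p(t | alpha, sigma^2).

    With w ~ N(0, diag(alpha)^-1) integrated out, t ~ N(0, C) where
    C = sigma2 * I + Phi diag(alpha)^-1 Phi^T.
    """
    N = Phi.shape[0]
    # Phi / alpha scales column j by 1 / alpha_j (broadcast over rows).
    C = sigma2 * np.eye(N) + (Phi / alpha) @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    quad = t @ np.linalg.solve(C, t)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

# Toy illustration (assumed data): only the first basis function matters.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 2))
t = 2.0 * Phi[:, 0] + 0.1 * rng.normal(size=50)

# A very large alpha_2 effectively switches the second weight off.
print(log_evidence(Phi, t, np.array([1.0, 1.0]), 0.01))
print(log_evidence(Phi, t, np.array([1.0, 1e6]), 0.01))
```

Comparing the two printed values shows how the evidence responds when an irrelevant weight is driven towards zero through its precision hyperparameter, which is the mechanism ARD relies on.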
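Automatic Relevance Determination (ARD)
• mixture of Gaussians
Given an observed dataset $D = \{\mathbf{x}_n\}_{n=1}^{N}$, we assume each data point is drawn independently from a mixture of Gaussian densities: $p(\mathbf{x} \mid \pi, \mu, T) = \sum_{i=1}^{M} \pi_i\, \mathcal{N}(\mathbf{x} \mid \mu_i, T_i)$
Likelihood: $p(D \mid \pi, \mu, T) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \pi, \mu, T)$
Prior: $p(\mu_i) = \mathcal{N}(\mu_i \mid 0, \beta I)$, $p(T_i) = \mathcal{W}(T_i \mid v, V)$
Posterior (VB): $p(\mu, T \mid D, \pi)$
Determination of mixing coefficients (Type II ML): $p(D \mid \pi) = \iint p(D \mid \pi, \mu, T)\, p(\mu, T)\, d\mu\, dT$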
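As a concrete reading of the mixture likelihood, the sketch below evaluates the log-likelihood of a dataset under a Gaussian mixture. It assumes each component is parameterized by a precision (inverse covariance) matrix, which is why a Wishart prior applies; the function names are illustrative and not taken from the slides.

```python
import numpy as np

def log_gauss_precision(X, mu, T):
    """log N(x | mu, T^-1) for every row x of X; T is a precision matrix."""
    d = X.shape[1]
    diff = X - mu
    _, logdet_T = np.linalg.slogdet(T)
    # Squared Mahalanobis distance (x - mu)^T T (x - mu) for each row.
    maha = np.einsum("nd,de,ne->n", diff, T, diff)
    return 0.5 * (logdet_T - d * np.log(2 * np.pi) - maha)

def mixture_log_likelihood(X, pi, mus, Ts):
    """sum_n log sum_i pi_i N(x_n | mu_i, T_i^-1), assuming all pi_i > 0."""
    log_terms = np.stack(
        [np.log(p) + log_gauss_precision(X, m, T) for p, m, T in zip(pi, mus, Ts)],
        axis=1,
    )
    # Log-sum-exp over components for numerical stability.
    mx = log_terms.max(axis=1, keepdims=True)
    per_point = mx[:, 0] + np.log(np.exp(log_terms - mx).sum(axis=1))
    return float(per_point.sum())
```

Working in the log domain avoids underflow when a point lies far from every component, which happens routinely once some components start to lose support.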
Automatic Relevance Determination (ARD)
• model selection
Bayesian method: $p_m(D \mid \pi)$, $m \in \{1, 2, \dots, M_{\max}\}$
Component elimination: if $\pi_i \approx 0$ (i.e., $\pi_i < 10^{-4}$), $i \in \{1, 2, \dots, M_{\max}\}$
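Component elimination can be sketched as a thresholding step on the mixing coefficients. The example below assumes a threshold of $10^{-4}$ (the value that appears to survive in the slide text) and assumes the remaining coefficients are renormalized after pruning; `prune_components` is a hypothetical helper name.

```python
import numpy as np

def prune_components(pi, threshold=1e-4):
    """Drop components whose mixing coefficient is below `threshold`.

    Returns the indices of the surviving components and their mixing
    coefficients, renormalized to sum to one (renormalization is an
    assumption of this sketch).
    """
    pi = np.asarray(pi, dtype=float)
    keep = np.flatnonzero(pi >= threshold)
    return keep, pi[keep] / pi[keep].sum()

keep, pi_new = prune_components([0.6, 0.39995, 0.00005])
print(keep, pi_new)
```

The returned indices can be used to drop the matching means and precision matrices, so the model shrinks as optimization proceeds instead of being refit for every candidate size.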
Experimental Results
• Bayesian method vs. cross-validation
600 points drawn from a mixture of 5 Gaussians.
[Figure; axis label: $p_m(D \mid \pi)$]
Experimental Results
• Component elimination
Initially the model had 15 mixture components; it was finally pruned down to 3.
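Automatic Relevance Determination (ARD)
• hidden Markov model
Given an observed dataset $D = \{\mathbf{x}_n\}_{n=1}^{N}$, we assume each data sequence is generated independently from an HMM: $p(\mathbf{x} \mid \pi, A, \theta) = \sum_{s_1, \dots, s_T} \pi_{s_1} \prod_{t=1}^{T-1} a_{s_t s_{t+1}} \prod_{t=1}^{T} p(x_t \mid \theta_{s_t})$
Likelihood: $p(D \mid \pi, A, \theta) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \pi, A, \theta)$
Prior: $p(\theta_i) = \mathcal{NW}(\mu_i, T_i \mid m, \beta, \dots)$
Posterior (VB): $p(\theta \mid D, \pi, A)$
Determination of $\pi$ and $A$ (Type II ML): $p(D \mid \pi, A) = \int p(D \mid \pi, A, \theta)\, p(\theta)\, d\theta$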
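The likelihood of one sequence under an HMM is a sum over all state paths, which the forward recursion evaluates in time linear in the sequence length. The sketch below shows both the direct path sum and a scaled forward pass. It assumes the emission probabilities have already been evaluated into a matrix `B` with `B[t, i]` the probability of observation `x_t` in state `i`; all function names are illustrative.

```python
import itertools
import numpy as np

def hmm_likelihood_brute_force(pi, A, B):
    """Sum over every state path s_1..s_T (exponential cost; tiny T only)."""
    T, M = B.shape
    total = 0.0
    for path in itertools.product(range(M), repeat=T):
        p = pi[path[0]] * B[0, path[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[t, path[t]]
        total += p
    return total

def hmm_log_likelihood(pi, A, B):
    """log p(x | pi, A, theta) via the scaled forward recursion.

    pi: (M,) initial state probabilities
    A:  (M, M) transitions, A[i, j] = p(s_{t+1} = j | s_t = i)
    B:  (T, M) emission probabilities, B[t, i] = p(x_t | state i)
    """
    alpha = pi * B[0]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c
    for t in range(1, B.shape[0]):
        alpha = (alpha @ A) * B[t]
        c = alpha.sum()          # scaling factor, p(x_t | x_1..x_{t-1})
        log_lik += np.log(c)
        alpha = alpha / c
    return float(log_lik)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.1], [0.4, 0.3], [0.2, 0.9]])
print(np.log(hmm_likelihood_brute_force(pi, A, B)), hmm_log_likelihood(pi, A, B))
```

The two printed numbers should agree up to floating-point error; the brute-force version is there only to make the path-sum definition explicit.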
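Automatic Relevance Determination (ARD)
• model selection
Bayesian method: $p_m(D \mid \pi, A)$, $m \in \{1, 2, \dots, M_{\max}\}$
State elimination: if $vf_i \approx 0$, $i \in \{1, 2, \dots, M_{\max}\}$
Define $vf_i = \sum_{n} \sum_{t} \gamma_t^{(n)}(i)$ -- visiting frequency
where $\gamma_t^{(n)}(i) = p(s_t = i \mid \mathbf{x}^{(n)}, \cdot)$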
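State elimination by visiting frequency can be sketched as follows: sum the per-time-step state posteriors over all sequences, then drop states that are essentially never visited. The threshold value, the renormalization of the surviving parameters, and the helper names are all assumptions of this sketch; the slide text does not preserve the exact criterion.

```python
import numpy as np

def visiting_frequency(gammas):
    """vf_i = sum over sequences n and time steps t of gamma_t^(n)(i).

    gammas: list of (T_n, M) arrays of state posteriors, one per sequence.
    """
    return sum(g.sum(axis=0) for g in gammas)

def states_to_keep(gammas, threshold=1e-4):
    """Indices of states whose visiting frequency reaches `threshold`
    (the threshold is a hypothetical choice)."""
    return np.flatnonzero(visiting_frequency(gammas) >= threshold)

def prune_hmm(pi, A, keep):
    """Restrict pi and A to the kept states and renormalize (assumed step)."""
    pi_new = pi[keep] / pi[keep].sum()
    A_new = A[np.ix_(keep, keep)]
    A_new = A_new / A_new.sum(axis=1, keepdims=True)
    return pi_new, A_new

# Two toy sequences in which the third state is never visited.
g1 = np.array([[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]])
g2 = np.array([[0.0, 1.0, 0.0]])
keep = states_to_keep([g1, g2])
print(keep, prune_hmm(np.array([0.5, 0.4, 0.1]), np.full((3, 3), 1 / 3), keep))
```

In practice the posteriors would come from a forward-backward pass over each sequence; they are supplied directly here to keep the example short.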
Experimental Results (1)
[Figure: marginal log-likelihood vs. number of iterations]
Experimental Results (2)
[Figure: marginal log-likelihood vs. number of iterations]
Experimental Results (3)
[Figure: marginal log-likelihood vs. number of iterations]
Questions?