Post on 15-Jan-2016
description
Daphne Koller
Parameter Estimation
Max Likelihoodfor BNs
ProbabilisticGraphicalModels
Learning
Daphne Koller
MLE for Bayesian Networks• Parameters:
• Data instances: <x[m],y[m]> X
Y
Y
X y0 y1
x0 0.95 0.05
x1 0.2 0.8
X
x0 x1
0.7 0.3
Daphne Koller
MLE for Bayesian Networks• Parameters:
):][],[():(1
M
m
mymxPDL
):][|][():][(
):][|][():][(
1|
1
11M
mXY
M
mX
M
m
M
m
mxmyPmxP
mxmyPmxP
):][|][():][(1
M
m
mxmyPmxP
X
X
Y
Y|X
Data d
Daphne Koller
MLE for Bayesian Networks• Likelihood for Bayesian network
if Xi|Ui are disjoint, then MLE can be computed by
maximizing each local likelihood separately
iii
i miii
m iiii
m
DL
mmxP
mmxP
mxPDL
):(
):][|][(
):][|][(
):][():(
U
U
Daphne Koller
MLE for Table CPDs
uu
Uu
u][,][:
|,
):][|][(mxmxm
Xx
mmxP
][
],[
],'[
],[
'
| u
u
u
uu M
xM
xM
xM
x
x
):][|][():][|][(1
|1
M
mX
M
m
mmxPmmxP Uuu
uu
uu ][,][:
|, mxmxm
xx
],[
,|
u
uu
xM
xx
Daphne Koller
MLE for Linear Gaussians
Daphne Koller
Shared ParametersS(1)S(0) S(2) S(3)
S’|S
Daphne Koller
Shared Parameters
S(1)
O(1)
S(0) S(2)
O(2)
S(3)
O(3)
S’|S
O’|S’
Daphne Koller
Summary• For BN with disjoint sets of parameters in
CPDs, likelihood decomposes as product of local likelihood functions, one per variable
• For table CPDs, local likelihood further decomposes as product of likelihood for multinomials, one for each parent combination
• For networks with shared CPDs, sufficient statistics accumulate over all uses of CPD
Daphne Koller
END END END