Learning Bayesian Networks


Dimensions of Learning

• Model: Bayes net / Markov net

• Data: complete / incomplete

• Structure: known / unknown

• Objective: generative / discriminative

Learning Bayes nets from data

[Figure: a data table

  X1: true, false, false, true
  X2: 1, 5, 3, 2
  X3: 0.7, -1.6, 5.9, 6.3
  ...

plus prior/expert information feed into a Bayes-net learner, which outputs Bayes net(s) over X1, ..., X9.]

From thumbtacks to Bayes nets

The thumbtack problem can be viewed as learning the probability for a very simple BN:

[Figure: a single variable X (heads/tails), with one node per toss: X1, X2, ..., XN (toss 1, toss 2, ..., toss N), all sharing the same parameter.]

The next simplest Bayes net

X (heads/tails)    Y (heads/tails)    (no arc between them)

[Figure: θX points to X1, X2, ..., XN and θY points to Y1, Y2, ..., YN; case i pairs Xi with Yi. Is there a dependency between θX and θY?]

With "parameter independence" (θX and θY independent a priori), this decomposes into two separate thumbtack-like learning problems.

A bit more difficult...

X (heads/tails) → Y (heads/tails)

Three probabilities to learn:

• θX=heads
• θY=heads|X=heads
• θY=heads|X=tails

[Figure: θX points to X1 and X2; θY|X=heads points to Y1 (case 1, where X1 = heads); θY|X=tails points to Y2 (case 2, where X2 = tails).]

With parameter independence, these are 3 separate thumbtack-like problems.

In general …

Learning probabilities in a Bayes net is straightforward if:

• Complete data

• Local distributions from the exponential family (binomial, Poisson, gamma, ...)

• Parameter independence

• Conjugate priors
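To make this concrete, here is a minimal sketch (my own, not from the slides) of conjugate parameter learning with complete data for the two-node net X → Y, assuming binary variables and Beta(1, 1) priors; the toy cases are hypothetical.

```python
# With complete data, parameter independence, and conjugate priors, each
# parameter is just a thumbtack: posterior = Beta(alpha_h + #h, alpha_t + #t).

def beta_posterior(heads, tails, alpha_h=1.0, alpha_t=1.0):
    """Posterior hyperparameters for one thumbtack-like parameter."""
    return alpha_h + heads, alpha_t + tails

# Toy complete data: (x, y) pairs, True = heads.
cases = [(True, True), (True, False), (False, True), (True, True)]

# theta_X: count heads/tails of X over all cases.
x_heads = sum(1 for x, _ in cases if x)
post_x = beta_posterior(x_heads, len(cases) - x_heads)

# theta_Y|X=v: count Y only over the cases with X = v.
post_y = {}
for v in (True, False):
    ys = [y for x, y in cases if x == v]
    post_y[v] = beta_posterior(sum(ys), len(ys) - sum(ys))

print(post_x, post_y)  # posterior mean of each parameter is a_h / (a_h + a_t)
```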

Incomplete data makes parameters dependent

X (heads/tails) → Y (heads/tails)

[Figure: the same network as before, but with a missing value (e.g., X2 unobserved) both θY|X=heads and θY|X=tails connect to Y2, so the parameters are no longer independent given the data.]

Solution: Use EM

• Initialize parameters ignoring missing data

• E step: Infer missing values using current parameters

• M step: Estimate parameters using completed data

• Can also use gradient descent
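To make the loop concrete, here is a minimal EM sketch under assumptions of my own (not from the slides): the two-node net X → Y, binary variables, Y always observed, X missing in some cases, and a maximum-likelihood M step.

```python
def e_step(cases, p_x, p_yh, p_yt):
    """Return expected counts; for missing X, split the case by p(X | Y)."""
    counts = {"xh": 0.0, "xt": 0.0, "yh_xh": 0.0, "yh_xt": 0.0}
    for x, y in cases:
        if x is None:                        # infer missing X from Y
            py_h = p_yh if y else 1 - p_yh   # p(y | X=heads)
            py_t = p_yt if y else 1 - p_yt   # p(y | X=tails)
            w = p_x * py_h / (p_x * py_h + (1 - p_x) * py_t)  # p(X=heads | y)
        else:
            w = 1.0 if x else 0.0
        counts["xh"] += w
        counts["xt"] += 1 - w
        if y:
            counts["yh_xh"] += w
            counts["yh_xt"] += 1 - w
    return counts

def m_step(counts, n):
    """Re-estimate parameters from the expected counts."""
    return (counts["xh"] / n,
            counts["yh_xh"] / counts["xh"],
            counts["yh_xt"] / counts["xt"])

cases = [(True, True), (None, True), (False, False), (None, False)]
theta = (0.5, 0.6, 0.4)  # initial guess (slides suggest ignoring missing data)
for _ in range(50):      # alternate E and M steps until convergence
    theta = m_step(e_step(cases, *theta), len(cases))
print(theta)
```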

Learning Bayes-net structure

Given data, which model is correct?

model 1: X   Y (no arc)
model 2: X → Y

Bayesian approach

Given data, which model is more likely?

model 1: X   Y    p(m1) = 0.7
model 2: X → Y    p(m2) = 0.3

Data d:  p(m1 | d) = 0.1,  p(m2 | d) = 0.9
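As a check on these numbers (a worked step, not on the slide): Bayes' theorem gives p(m | d) ∝ p(m) p(d | m), so the Bayes factor is

p(d | m2) / p(d | m1) = [p(m2 | d) / p(m1 | d)] · [p(m1) / p(m2)] = (0.9 / 0.1) · (0.7 / 0.3) = 21

i.e., the data favor model 2 by a factor of 21 even though the prior favored model 1.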

Bayesian approach: Model averaging

Same example as above; with p(m1 | d) = 0.1 and p(m2 | d) = 0.9, average the models' predictions, weighted by the posteriors p(m | d).

Bayesian approach: Model selection

Same example; keep the single best model (here m2), for:

• Explanation
• Understanding
• Tractability

To score a model, use Bayes' theorem

Given data d:

p(m | d) ∝ p(m) p(d | m)    ← model score

p(d | m) = ∫ p(d | θ, m) p(θ | m) dθ    ← "marginal likelihood" (the likelihood averaged over the prior)

Thumbtack example

X (heads/tails), with a conjugate (Beta) prior p(θ | m) ∝ θ^(αh − 1) (1 − θ)^(αt − 1):

p(d | m) = ∫ θ^#h (1 − θ)^#t p(θ | m) dθ
         = [Γ(αh + αt) / (Γ(αh) Γ(αt))] · [Γ(αh + #h) Γ(αt + #t) / Γ(αh + αt + #h + #t)]

where #h and #t are the numbers of heads and tails observed in d.
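A small sketch of computing this quantity in log space (the function name and the example counts are mine):

```python
from math import lgamma  # log of the Gamma function

def log_marginal_likelihood(n_heads, n_tails, alpha_h=1.0, alpha_t=1.0):
    """log p(d | m) for the thumbtack under a Beta(alpha_h, alpha_t) prior."""
    a = alpha_h + alpha_t
    n = n_heads + n_tails
    return (lgamma(a) - lgamma(alpha_h) - lgamma(alpha_t)
            + lgamma(alpha_h + n_heads) + lgamma(alpha_t + n_tails)
            - lgamma(a + n))

print(log_marginal_likelihood(7, 3))  # e.g., 7 heads, 3 tails, uniform prior
```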

More complicated graphs

X (heads/tails) → Y (heads/tails): 3 separate thumbtack-like learning problems, one per parameter (θX, θY|X=heads, θY|X=tails), so the marginal likelihood is a product of three Beta terms:

p(d | m) = ∏ over θ ∈ {θX, θY|X=heads, θY|X=tails} of
           [Γ(αh + αt) / (Γ(αh) Γ(αt))] · [Γ(αh + #h) Γ(αt + #t) / Γ(αh + αt + #h + #t)]

where the counts #h, #t and hyperparameters in each factor belong to that parameter: all cases for θX; only cases with X = heads for θY|X=heads; only cases with X = tails for θY|X=tails.
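Reusing log_marginal_likelihood from the thumbtack sketch, the factored score for X → Y might look like this (the counts are hypothetical):

```python
# Hypothetical data: 4 cases; X is heads in 3 of them. Among the 3 X=heads
# cases, Y is heads twice; in the lone X=tails case, Y is tails.
log_score = (log_marginal_likelihood(3, 1)    # theta_X
             + log_marginal_likelihood(2, 1)  # theta_Y|X=heads (X=heads cases)
             + log_marginal_likelihood(0, 1)) # theta_Y|X=tails (X=tails cases)
```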

Model score for a discrete Bayes net

p(d | m) = ∏_{i=1..n} ∏_{j=1..qi} [Γ(αij) / Γ(αij + Nij)] ∏_{k=1..ri} [Γ(αijk + Nijk) / Γ(αijk)]

where:

Nijk : # cases where Xi = xi^k and Pai = pai^j
ri : number of states of Xi
qi : number of instances of the parents of Xi
Nij = Σ_k Nijk,  αij = Σ_k αijk
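In code, the score above could be computed like this (a sketch of my own; counts[i][j][k] holds Nijk and alphas[i][j][k] holds αijk):

```python
from math import lgamma

def log_bd_score(counts, alphas):
    """log p(d | m) for a discrete Bayes net with Dirichlet priors."""
    score = 0.0
    for n_i, a_i in zip(counts, alphas):        # variables i = 1..n
        for n_ij, a_ij in zip(n_i, a_i):        # parent configurations j = 1..q_i
            n, a = sum(n_ij), sum(a_ij)         # N_ij and alpha_ij
            score += lgamma(a) - lgamma(a + n)
            for n_k, a_k in zip(n_ij, a_ij):    # states k = 1..r_i
                score += lgamma(a_k + n_k) - lgamma(a_k)
    return score

# Example: one binary variable, no parents, 7 heads / 3 tails, Beta(1, 1);
# this equals log_marginal_likelihood(7, 3) from the thumbtack sketch.
print(log_bd_score([[[7, 3]]], [[[1.0, 1.0]]]))
```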

Computation ofmarginal likelihood

Efficient closed form if

• Local distributions from the exponential family (binomial, Poisson, gamma, ...)

• Parameter independence

• Conjugate priors

• No missing data (including no hidden variables)

Structure search

• Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)

• Heuristic methods (a greedy sketch follows the flowchart below):
  – Greedy
  – Greedy with restarts
  – MCMC methods

[Flowchart: initialize structure → score all possible single changes → any changes better? If yes, perform best change and repeat; if no, return saved structure.]
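Here is a sketch of the greedy variant, following the flowchart; the names and the add/remove move set are my simplifications (a real implementation would also reverse arcs and check acyclicity):

```python
def greedy_search(variables, score, structure=frozenset()):
    """Hill-climb over arc sets, applying the best single change while it helps."""
    best = score(structure)
    while True:
        # Score all possible single changes (toggle one directed arc).
        candidates = [
            structure ^ {(u, v)}
            for u in variables for v in variables if u != v
        ]
        top = max(candidates, key=score)
        if score(top) <= best:   # no change is better:
            return structure     # return saved structure
        structure, best = top, score(top)  # perform best change and repeat

# Example: with a score that penalizes arcs, the empty structure is kept.
print(greedy_search(["X", "Y"], score=lambda s: -len(s)))
```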

Structure priors

1. All possible structures equally likely

2. Partial ordering, required / prohibited arcs

3. Prior(m) ∝ Similarity(m, prior BN)
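As one concrete (hypothetical) realization of option 3, the prior could decay geometrically in the number of arcs where m differs from the prior network, with a penalty 0 < κ < 1 of my choosing:

```python
from math import log

def log_structure_prior(arcs, prior_arcs, kappa=0.5):
    """log Prior(m): kappa raised to the number of arc differences."""
    delta = len(set(arcs) ^ set(prior_arcs))  # symmetric difference of arc sets
    return delta * log(kappa)
```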

Parameter priors

• All uniform: Beta(1,1)

• Use a prior Bayes net

Parameter priors

Recall the intuition behind the Beta prior for the thumbtack:

• The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"

• Equivalent sample size = αh + αt

• The larger the equivalent sample size, the more confident we are about the long-run fraction
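For a concrete sense of scale (a worked example of mine, not on the slides): a uniform Beta(1, 1) prior has equivalent sample size 2, and after observing #h heads and #t tails the predictive probability of heads is (αh + #h) / (αh + αt + #h + #t); with 7 heads and 3 tails this is (1 + 7) / (2 + 10) = 8/12 ≈ 0.67.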

Parameter priors

[Figure: a prior Bayes net over x1, ..., x9 plus an equivalent sample size gives an imaginary count for any variable configuration; by "parameter modularity", this yields parameter priors for any Bayes net structure for X1…Xn.]

Combining knowledge & data

[Figure: a prior network over x1, ..., x9 with an equivalent sample size, combined with data

  x1: true, false, false, true
  x2: false, false, false, true
  x3: true, true, false, false
  ...

yields improved network(s).]