
CS 3750 Advanced Machine Learning

CS 3750 Machine Learning

Milos Hauskrecht

[email protected]

5329 Sennott Square, x4-8845

http://www.cs.pitt.edu/~milos/courses/cs3750/

Lecture 2

Graphical models

CS 3750 Advanced Machine Learning

Graphical models

Aim to represent complex multivariate probabilistic models

– multivariate -> covers multiple random variables

• Parametric distribution models:

– Bernoulli (outcome of coin flip)

– Binomial (outcome of multiple coin flips)

– Multinomial (outcome of die)

– Poisson

– Exponential

– Gamma distribution

– Gaussian (this one is multivariate)

$P(\mathbf{X}) = P(X_1, X_2, \ldots, X_d)$

$p(\mathbf{X}) = p(X_1, X_2, \ldots, X_d)$


CS 3750 Advanced Machine Learning

Modeling complex multivariate distributions

How to model/parameterize complex multivariate distributions $P(\mathbf{X})$ with a large number of variables?

One solution:

• Decompose the distribution. Reduce the number of parameters,

using some form of independence.

Two graphical models:

• Bayesian belief networks (BBNs)

• Markov Random Fields (MRFs)

• Learning of these models relies on the decomposition.


CS 3750 Advanced Machine Learning

Bayesian belief network.

[Figure: alarm network DAG — Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls; local models P(B), P(E), P(A|B,E), P(J|A), P(M|A)]

1. Directed acyclic graph

• Nodes = random variables

• Links = direct (causal) dependencies between variables

• Missing links encode independences


CS 3750 Advanced Machine Learning

Bayesian belief network.

2. Local conditional distributions

• relate variables and their parents

[Figure: alarm network DAG with the local conditional distributions P(B), P(E), P(A|B,E), P(J|A), P(M|A) attached to the nodes]

CS 3750 Advanced Machine Learning

Bayesian belief network.

[Figure: alarm network with the conditional probability tables attached to the nodes]

P(B):
  B=T 0.001   B=F 0.999

P(E):
  E=T 0.002   E=F 0.998

P(A|B,E):
  B E | A=T   A=F
  T T | 0.95  0.05
  T F | 0.94  0.06
  F T | 0.29  0.71
  F F | 0.001 0.999

P(J|A):
  A | J=T  J=F
  T | 0.90 0.10
  F | 0.05 0.95

P(M|A):
  A | M=T  M=F
  T | 0.70 0.30
  F | 0.01 0.99


CS 3750 Advanced Machine Learning

Full joint distribution in BBNs

Full joint distribution is defined in terms of local conditional

distributions (obtained via the chain rule):

$P(X_1, X_2, \ldots, X_n) = \prod_{i=1,\ldots,n} P(X_i \mid pa(X_i))$

[Figure: alarm network — B, E → A → J, M]

Example: Assume the following assignment of values to the random variables: B=T, E=T, A=T, J=T, M=F. Then its probability is:

$P(B=T, E=T, A=T, J=T, M=F) = P(B=T)\, P(E=T)\, P(A=T \mid B=T, E=T)\, P(J=T \mid A=T)\, P(M=F \mid A=T)$
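As a quick numerical check (not on the original slide), here is a minimal Python sketch that evaluates this product using the CPT values shown earlier; the variable names and dictionary layout are my own.

```python
# CPTs of the alarm network (numbers taken from the earlier slide).
# Each table stores P(X = True | parent values); False is the complement.
p_b = 0.001
p_e = 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}   # P(J=T | A)
p_m = {True: 0.70, False: 0.01}   # P(M=T | A)

def val(p_true, value):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the BBN factorization."""
    return (val(p_b, b) * val(p_e, e) * val(p_a[(b, e)], a)
            * val(p_j[a], j) * val(p_m[a], m))

print(joint(True, True, True, True, False))   # 0.001*0.002*0.95*0.90*0.30 ~ 5.13e-07
```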

Parameter complexity problem

• In the BBN the full joint distribution is defined as:

$P(X_1, X_2, \ldots, X_n) = \prod_{i=1,\ldots,n} P(X_i \mid pa(X_i))$

• What did we save?

Alarm example: 5 binary (True, False) variables

[Figure: alarm network — Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]

# of parameters of the full joint: ?


Parameter complexity problem

• In the BBN the full joint distribution is defined as:

$P(X_1, X_2, \ldots, X_n) = \prod_{i=1,\ldots,n} P(X_i \mid pa(X_i))$

• What did we save?

Alarm example: 5 binary (True, False) variables

[Figure: alarm network — Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]

# of parameters of the full joint: $2^5 = 32$

One parameter depends on the rest: $2^5 - 1 = 31$

# of parameters of the BBN: ?

Bayesian belief network: parameters count

[Figure: alarm network with the CPTs from the earlier slide; parameter counts per table]

P(B): 2 parameters    P(E): 2 parameters

P(A|B,E): 8 parameters

P(J|A): 4 parameters    P(M|A): 4 parameters

Total: 20


Parameter complexity problem

• In the BBN the full joint distribution is defined as:

$P(X_1, X_2, \ldots, X_n) = \prod_{i=1,\ldots,n} P(X_i \mid pa(X_i))$

• What did we save?

Alarm example: 5 binary (True, False) variables

[Figure: alarm network — Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]

# of parameters of the full joint: $2^5 = 32$

One parameter depends on the rest: $2^5 - 1 = 31$

# of parameters of the BBN: $2 + 2 + 2^3 + 2(2) + 2(2) = 20$

One parameter in every conditional depends on the rest: ?

Bayesian belief network: free parameters

[Figure: alarm network with the CPTs from the earlier slide; free-parameter counts per table]

P(B): 1    P(E): 1

P(A|B,E): 4

P(J|A): 2    P(M|A): 2

Total free params: 10

(one entry in every table row is determined by the rest, e.g. 0.05 = 1 - 0.95 and 0.998 = 1 - 0.002)


Parameter complexity problem

• In the BBN the full joint distribution is defined as:

$P(X_1, X_2, \ldots, X_n) = \prod_{i=1,\ldots,n} P(X_i \mid pa(X_i))$

• What did we save?

Alarm example: 5 binary (True, False) variables

[Figure: alarm network — Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]

# of parameters of the full joint: $2^5 = 32$

One parameter depends on the rest: $2^5 - 1 = 31$

# of parameters of the BBN: $2 + 2 + 2^3 + 2(2) + 2(2) = 20$

One parameter in every conditional depends on the rest: $2(1) + 2^2 + 2(2) = 10$
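The two counts can be reproduced with a short sketch (my own illustration, not from the slides); the dictionaries just encode the alarm network's structure and binary cardinalities.

```python
from math import prod

card = {"B": 2, "E": 2, "A": 2, "J": 2, "M": 2}              # all variables binary
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

# Free parameters of the full joint: one probability per joint configuration, minus 1.
full_joint = prod(card.values()) - 1                          # 2**5 - 1 = 31

# Free parameters of the BBN: (card - 1) per parent configuration, for each variable.
bbn = sum((card[x] - 1) * prod(card[p] for p in parents[x]) for x in card)
print(full_joint, bbn)                                        # 31 10
```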

Inference in Bayesian network

• Bad news:

– Exact inference problem in BBNs is NP-hard (Cooper)

– Approximate inference is NP-hard (Dagum, Luby)

• But very often we can achieve significant improvements

• Assume our Alarm network:

[Figure: alarm network — Burglary, Earthquake → Alarm → JohnCalls, MaryCalls]

• Assume we want to compute: $P(J=T)$


Inference in Bayesian networks

• Full joint uses the decomposition

• Calculation of marginals:

$P(J=T) = \sum_{b \in \{T,F\}} \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} \sum_{m \in \{T,F\}} P(B=b, E=e, A=a, J=T, M=m)$

– Requires summation over the variables we want to take out

• How to compute sums and products more efficiently?

$\sum_{x} a\, f(x) = a \sum_{x} f(x)$

[Figure: alarm network — B, E → A → J, M]

Variable elimination

• Variable elimination:

– E.g., the query $P(J=T)$ requires us to eliminate A, B, E, M, and this can be done in different orders

$P(J=T) = \sum_{b \in \{T,F\}} \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} \sum_{m \in \{T,F\}} P(J=T \mid A=a)\, P(M=m \mid A=a)\, P(A=a \mid B=b, E=e)\, P(B=b)\, P(E=e)$


Variable elimination

Assume order: M, E, B, A to calculate $P(J=T)$:

$P(J=T) = \sum_{b \in \{T,F\}} \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} \sum_{m \in \{T,F\}} P(J=T \mid A=a)\, P(M=m \mid A=a)\, P(A=a \mid B=b, E=e)\, P(B=b)\, P(E=e)$

Eliminate M (push the sum over m inward):

$= \sum_{b \in \{T,F\}} \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} P(J=T \mid A=a)\, P(A=a \mid B=b, E=e)\, P(B=b)\, P(E=e) \sum_{m \in \{T,F\}} P(M=m \mid A=a)$

$= \sum_{b \in \{T,F\}} \sum_{e \in \{T,F\}} \sum_{a \in \{T,F\}} P(J=T \mid A=a)\, P(A=a \mid B=b, E=e)\, P(B=b)\, P(E=e) \cdot 1$

Eliminate E (push the sum over e inward):

$= \sum_{a \in \{T,F\}} P(J=T \mid A=a) \sum_{b \in \{T,F\}} P(B=b) \sum_{e \in \{T,F\}} P(A=a \mid B=b, E=e)\, P(E=e)$

The innermost sum defines a new factor $\tau_1(A=a, B=b) = \sum_{e \in \{T,F\}} P(A=a \mid B=b, E=e)\, P(E=e)$, a table over the four value combinations of A and B, e.g. $\tau_1(A=F, B=T) = \sum_{e \in \{T,F\}} P(A=F \mid B=T, E=e)\, P(E=e)$ and $\tau_1(A=F, B=F) = \sum_{e \in \{T,F\}} P(A=F \mid B=F, E=e)\, P(E=e)$.

$= \sum_{a \in \{T,F\}} P(J=T \mid A=a) \sum_{b \in \{T,F\}} P(B=b)\, \tau_1(A=a, B=b)$

Eliminate B: the inner sum defines $\tau_2(A=a) = \sum_{b \in \{T,F\}} P(B=b)\, \tau_1(A=a, B=b)$, so

$= \sum_{a \in \{T,F\}} P(J=T \mid A=a)\, \tau_2(A=a)$

Eliminate A: summing out the last variable gives the desired marginal,

$\sum_{a \in \{T,F\}} P(J=T \mid A=a)\, \tau_2(A=a) = P(J=T)$
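A minimal Python sketch of this elimination order (my own illustration; the helper names tau1/tau2 mirror the factors above, and the CPT numbers are the ones from the slides):

```python
vals = (True, False)
# CPTs: P(X = True | parents); the False case is the complement.
p_b = {True: 0.001, False: 0.999}
p_e = {True: 0.002, False: 0.998}
p_a_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}
p_j_true = {True: 0.90, False: 0.05}   # P(J=T | A)

def p_a(a, b, e):                      # P(A=a | B=b, E=e)
    return p_a_true[(b, e)] if a else 1.0 - p_a_true[(b, e)]

# Eliminate M: sum_m P(M=m | A=a) = 1, so M simply drops out.
# Eliminate E: tau_1(a, b) = sum_e P(A=a | B=b, E=e) P(E=e)
def tau1(a, b):
    return sum(p_a(a, b, e) * p_e[e] for e in vals)

# Eliminate B: tau_2(a) = sum_b P(B=b) tau_1(a, b)
def tau2(a):
    return sum(p_b[b] * tau1(a, b) for b in vals)

# Eliminate A: P(J=T) = sum_a P(J=T | A=a) tau_2(a)
print(sum(p_j_true[a] * tau2(a) for a in vals))   # ~0.0521
```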


CS 3750 Advanced Machine Learning

Markov random fields

• Probabilistic models with symmetric dependences.

– Typically models spatially varying quantities

$P(\mathbf{x}) \sim \prod_{c \in cl(\mathbf{x})} \phi_c(\mathbf{x}_c)$

$\phi_c(\mathbf{x}_c)$ — a potential function (defined over factors)

If $\phi_c(\mathbf{x}_c)$ is strictly positive we can rewrite the definition in terms of a log-linear model:

$P(\mathbf{x}) = \frac{1}{Z} \prod_{c \in cl(\mathbf{x})} \exp\{-E_c(\mathbf{x}_c)\}$ — Gibbs (Boltzmann) distribution, where $E_c(\mathbf{x}_c)$ is an energy function

$Z = \sum_{\mathbf{x}} \prod_{c \in cl(\mathbf{x})} \exp\{-E_c(\mathbf{x}_c)\}$ — a partition function
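To make the partition function concrete, here is a tiny sketch (my own toy example, not from the slides) for a chain MRF A - B - C with made-up pairwise potentials:

```python
import itertools

# Two cliques {A, B} and {B, C}; potential values are made up for illustration.
phi_ab = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi_bc = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def unnormalized(a, b, c):
    return phi_ab[(a, b)] * phi_bc[(b, c)]

# Partition function: sum of the potential product over all joint configurations.
Z = sum(unnormalized(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))

def P(a, b, c):                 # Gibbs distribution: P(x) = (1/Z) * product of potentials
    return unnormalized(a, b, c) / Z

print(Z, P(1, 1, 1))
```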

CS 3750 Advanced Machine Learning

Graphical representation of MRFs

An undirected network (also called an independence graph)

• G = (S, E)

– S = {1, 2, ..., N} — the nodes correspond to the random variables

– $(i, j) \in E$ if $x_i$ and $x_j$ appear within the same factor c, i.e. $\{i, j\} \subseteq c$

Example:

– variables A, B, ..., H

– Assume the full joint of the MRF

$P(A, \ldots, H) \sim \phi_1(A, B, C)\, \phi_2(B, D, E)\, \phi_3(A, G)\, \phi_4(C, F)\, \phi_5(G, H)\, \phi_6(F, H)$

[Figure: the corresponding undirected graph over A, B, C, D, E, F, G, H]


CS 3750 Advanced Machine Learning

Markov random fields

• Regular lattice (Ising model)

• Arbitrary graph



CS 3750 Advanced Machine Learning

Markov random fields: independence

relations

• Pairwise Markov property

– Two nodes in the network that are not directly connected

can be made independent given all other nodes

• Local Markov property

– A set of nodes (variables) can be made independent from the rest of the nodes given its immediate neighbors

• Global Markov property

– A vertex set A is independent of a vertex set B (A and B disjoint) given a set C if all chains (paths) between elements of A and B intersect C

CS 3750 Advanced Machine Learning

Types of Markov random fields

• MRFs with discrete random variables

– Clique potentials can be defined by mapping all clique-

variable instances to R

– Example: Assume two discrete variables A and B, with values {a1, a2, a3} and {b1, b2}, that are in the same clique c. Then:

$\phi_c(A, B)$:
  a1 b1 0.5
  a1 b2 0.2
  a2 b1 0.1
  a2 b2 0.3
  a3 b1 0.2
  a3 b2 0.4


CS 3750 Advanced Machine Learning

Types of Markov random fields

• Gaussian Markov Random Field

$\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$

$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$

• Precision matrix $\Sigma^{-1}$

• Variables in x are connected in the network only if they have a nonzero entry in the precision matrix

– All zero entries are not directly connected
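A small numpy sketch of this zero-pattern reading (the 3-variable precision matrix below is made up for illustration):

```python
import numpy as np

# Made-up precision matrix (inverse covariance) for x = (x1, x2, x3).
# The zero in positions (1,3)/(3,1) means no edge x1 - x3 in the GMRF,
# i.e. x1 and x3 are conditionally independent given x2.
precision = np.array([[ 2.0, -0.8,  0.0],
                      [-0.8,  2.0, -0.5],
                      [ 0.0, -0.5,  1.5]])

Sigma = np.linalg.inv(precision)     # covariance of the corresponding Gaussian
edges = [(i + 1, j + 1) for i in range(3) for j in range(i + 1, 3)
         if precision[i, j] != 0.0]
print("edges:", edges)               # [(1, 2), (2, 3)] -- no (1, 3) edge
print("cov(x1, x3):", Sigma[0, 2])   # nonzero: marginally dependent, just not directly connected
```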

CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example:

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF over A, ..., H]

$= \frac{1}{Z} \sum_{A, C, D, \ldots, H} \phi_1(A, B, C)\, \phi_2(B, D, E)\, \phi_3(A, G)\, \phi_4(C, F)\, \phi_5(G, H)\, \phi_6(F, H)$

Eliminate E:

$= \frac{1}{Z} \sum_{A, C, D, F, G, H} \phi_1(A, B, C)\, \phi_3(A, G)\, \phi_4(C, F)\, \phi_5(G, H)\, \phi_6(F, H) \sum_{E} \phi_2(B, D, E)$

where the last sum defines the new factor $\tau_1(B, D) = \sum_{E} \phi_2(B, D, E)$.

[Figure: the MRF graph with E removed]


CS 3750 Advanced Machine Learning

Factors

• Factor: a function that maps value assignments for a subset of random variables to $\mathbb{R}$ (the reals)

• The scope of the factor:

– the set of variables defining the factor

• Example:

– Assume discrete random variables x (with values a1, a2, a3) and y (with values b1 and b2)

– Factor $\phi(x, y)$:
  a1 b1 0.5
  a1 b2 0.2
  a2 b1 0.1
  a2 b2 0.3
  a3 b1 0.2
  a3 b2 0.4

– Scope of the factor: $\{x, y\}$

CS 3750 Advanced Machine Learning

Factor Product

b1 c1 0.1

b1 c2 0.6

b2 c1 0.3

b2 c2 0.4

a1 b1 0.5

a1 b2 0.2

a2 b1 0.1

a2 b2 0.3

a3 b1 0.2

a3 b2 0.4

a1 b1 c1 0.5*0.1

a1 b1 c2 0.5*0.6

a1 b2 c1 0.2*0.3

a1 b2 c2 0.2*0.4

a2 b1 c1 0.1*0.1

a2 b1 c2 0.1*0.6

a2 b2 c1 0.3*0.3

a2 b2 c2 0.3*0.4

a3 b1 c1 0.2*0.1

a3 b1 c2 0.2*0.6

a3 b2 c1 0.4*0.3

a3 b2 c2 0.4*0.4

Variables: A,B,C

),( CB ),( BA

),,( CBA),(),(),,( BACBCBA
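The same product, written as a short Python sketch over dictionary-backed tables (the representation is my own; the numbers are the ones in the tables above):

```python
# Factors as dicts from value tuples to reals; scopes are (A, B) and (B, C).
psi1 = {('a1', 'b1'): 0.5, ('a1', 'b2'): 0.2, ('a2', 'b1'): 0.1,
        ('a2', 'b2'): 0.3, ('a3', 'b1'): 0.2, ('a3', 'b2'): 0.4}
psi2 = {('b1', 'c1'): 0.1, ('b1', 'c2'): 0.6,
        ('b2', 'c1'): 0.3, ('b2', 'c2'): 0.4}

# Factor product: rows that agree on the shared variable B get multiplied.
psi_abc = {(a, b, c): psi1[(a, b)] * psi2[(b2, c)]
           for (a, b) in psi1
           for (b2, c) in psi2
           if b2 == b}

print(len(psi_abc))                   # 12 rows, as in the table above
print(psi_abc[('a1', 'b1', 'c1')])    # 0.5 * 0.1 = 0.05
```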


CS 3750 Advanced Machine Learning

Factor Marginalization

Variables: A, B, C

$\psi(A, B, C)$:
  a1 b1 c1 0.2
  a1 b1 c2 0.35
  a1 b2 c1 0.4
  a1 b2 c2 0.15
  a2 b1 c1 0.5
  a2 b1 c2 0.1
  a2 b2 c1 0.3
  a2 b2 c2 0.2
  a3 b1 c1 0.25
  a3 b1 c2 0.45
  a3 b2 c1 0.15
  a3 b2 c2 0.25

$\psi(A, C) = \sum_{B} \psi(A, B, C)$:
  a1 c1 0.2 + 0.4 = 0.6
  a1 c2 0.35 + 0.15 = 0.5
  a2 c1 0.8
  a2 c2 0.3
  a3 c1 0.4
  a3 c2 0.7
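And summing B out of that table in Python (same numbers as above; the dictionary representation is my own):

```python
psi_abc = {('a1', 'b1', 'c1'): 0.2,  ('a1', 'b1', 'c2'): 0.35,
           ('a1', 'b2', 'c1'): 0.4,  ('a1', 'b2', 'c2'): 0.15,
           ('a2', 'b1', 'c1'): 0.5,  ('a2', 'b1', 'c2'): 0.1,
           ('a2', 'b2', 'c1'): 0.3,  ('a2', 'b2', 'c2'): 0.2,
           ('a3', 'b1', 'c1'): 0.25, ('a3', 'b1', 'c2'): 0.45,
           ('a3', 'b2', 'c1'): 0.15, ('a3', 'b2', 'c2'): 0.25}

# Marginalize (sum out) B: group rows by (A, C) and add their values.
psi_ac = {}
for (a, b, c), v in psi_abc.items():
    psi_ac[(a, c)] = psi_ac.get((a, c), 0.0) + v

print(psi_ac[('a1', 'c1')])   # 0.2 + 0.4 = 0.6
print(psi_ac[('a3', 'c2')])   # 0.45 + 0.25 = 0.7
```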

CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example (cont):

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF graph with E already removed]

$= \frac{1}{Z} \sum_{A, C, D, F, G, H} \phi_1(A, B, C)\, \tau_1(B, D)\, \phi_3(A, G)\, \phi_4(C, F)\, \phi_5(G, H)\, \phi_6(F, H)$

Eliminate D:

$= \frac{1}{Z} \sum_{A, C, F, G, H} \phi_1(A, B, C)\, \phi_3(A, G)\, \phi_4(C, F)\, \phi_5(G, H)\, \phi_6(F, H) \sum_{D} \tau_1(B, D)$

where the last sum defines $\tau_2(B) = \sum_{D} \tau_1(B, D)$.

[Figure: the MRF graph with E and D removed]


CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example (cont):

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF graph with E and D removed]

$= \frac{1}{Z} \sum_{A, C, F, G, H} \phi_1(A, B, C)\, \tau_2(B)\, \phi_3(A, G)\, \phi_4(C, F)\, \phi_5(G, H)\, \phi_6(F, H)$

Eliminate H: combine $\tau_3(F, G, H) = \phi_5(G, H)\, \phi_6(F, H)$ and sum H out to get $\tau_4(F, G) = \sum_{H} \tau_3(F, G, H)$:

$= \frac{1}{Z} \sum_{A, C, F, G} \phi_1(A, B, C)\, \tau_2(B)\, \phi_3(A, G)\, \phi_4(C, F)\, \tau_4(F, G)$

[Figure: the MRF graph with E, D and H removed]

CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example (cont):

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF graph with E, D and H removed]

$= \frac{1}{Z} \sum_{A, C, F, G} \phi_1(A, B, C)\, \tau_2(B)\, \phi_3(A, G)\, \phi_4(C, F)\, \tau_4(F, G)$

Eliminate F: combine $\tau_5(C, F, G) = \phi_4(C, F)\, \tau_4(F, G)$ and sum F out to get $\tau_6(C, G) = \sum_{F} \tau_5(C, F, G)$:

$= \frac{1}{Z} \sum_{A, C, G} \phi_1(A, B, C)\, \tau_2(B)\, \phi_3(A, G)\, \tau_6(C, G)$

[Figure: the MRF graph with E, D, H and F removed]


CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example (cont):

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF graph with E, D, H and F removed]

$= \frac{1}{Z} \sum_{A, C, G} \phi_1(A, B, C)\, \tau_2(B)\, \phi_3(A, G)\, \tau_6(C, G)$

Eliminate G: combine $\tau_7(A, C, G) = \phi_3(A, G)\, \tau_6(C, G)$ and sum G out to get $\tau_8(A, C) = \sum_{G} \tau_7(A, C, G)$:

$= \frac{1}{Z} \sum_{A, C} \phi_1(A, B, C)\, \tau_2(B)\, \tau_8(A, C)$

[Figure: the MRF graph with only A, B and C remaining]

CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example (cont):

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF graph with only A, B and C remaining]

$= \frac{1}{Z} \sum_{A, C} \phi_1(A, B, C)\, \tau_2(B)\, \tau_8(A, C)$

Eliminate C: combine $\tau_9(A, B, C) = \phi_1(A, B, C)\, \tau_8(A, C)$ and sum C out to get $\tau_{10}(A, B) = \sum_{C} \tau_9(A, B, C)$:

$= \frac{1}{Z} \sum_{A} \tau_2(B)\, \tau_{10}(A, B)$

[Figure: the MRF graph with only A and B remaining]


CS 3750 Advanced Machine Learning

MRF variable elimination inference

Example (cont):

$P(B) = \sum_{A, C, D, \ldots, H} P(A, B, \ldots, H)$

[Figure: the MRF graph with only A and B remaining]

$= \frac{1}{Z}\, \tau_2(B) \sum_{A} \tau_{10}(A, B)$

Eliminate A: $\tau_{11}(B) = \sum_{A} \tau_{10}(A, B)$, so

$P(B) = \frac{1}{Z}\, \tau_2(B)\, \tau_{11}(B)$

(Z can be recovered by summing $\tau_2(B)\, \tau_{11}(B)$ over the values of B.)

[Figure: only the node B remains]
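The whole loop can be written generically with the factor product and marginalization operations from the earlier slides. Below is a compact Python sketch (my own illustration: the factor representation, helper names and the random potential values are not from the slides; only the clique structure and the elimination order E, D, H, F, G, C, A are).

```python
import random
from itertools import product as assignments

domains = {v: (0, 1) for v in "ABCDEFGH"}        # toy binary domains

def factor_product(f1, f2):
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    table = {}
    for vals in assignments(*(domains[v] for v in scope)):
        a = dict(zip(scope, vals))
        table[vals] = t1[tuple(a[v] for v in s1)] * t2[tuple(a[v] for v in s2)]
    return scope, table

def marginalize(factor, var):
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    out = {}
    for vals, val in table.items():
        key = tuple(x for x, name in zip(vals, scope) if name != var)
        out[key] = out.get(key, 0.0) + val
    return new_scope, out

def eliminate(factors, var):
    """Multiply all factors mentioning `var`, then sum `var` out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    combined = touching[0]
    for f in touching[1:]:
        combined = factor_product(combined, f)
    return rest + [marginalize(combined, var)]

random.seed(0)
def rand_factor(scope):                           # made-up potential values
    return scope, {v: random.random() for v in assignments(*(domains[x] for x in scope))}

factors = [rand_factor(s) for s in
           [("A","B","C"), ("B","D","E"), ("A","G"), ("C","F"), ("G","H"), ("F","H")]]
for v in "EDHFGCA":                               # elimination order from the slides
    factors = eliminate(factors, v)

# Two factors over B remain (tau_2 and tau_11); their normalized product is P(B).
scope, table = factor_product(factors[0], factors[1])
Z = sum(table.values())
print({b: p / Z for b, p in table.items()})
```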

CS 3750 Advanced Machine Learning

Induced graph

• A graph induced by a specific variable elimination order:

• the graph G extended by links that represent the intermediate factors created during the elimination

[Figure: the MRF over A, ..., H with the fill-in edges added by the elimination order]
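A short sketch (my own; the edge list just encodes the cliques of the example MRF) that simulates an elimination order and reports the fill-in edges it adds to form the induced graph:

```python
# Undirected edges implied by the factors phi_1(A,B,C), ..., phi_6(F,H).
edges = {frozenset(e) for e in
         [("A","B"), ("A","C"), ("B","C"), ("B","D"), ("B","E"), ("D","E"),
          ("A","G"), ("C","F"), ("G","H"), ("F","H")]}

def fill_in_edges(edges, order):
    """Eliminate nodes in `order`; connecting the remaining neighbors of each
    eliminated node may add new (fill-in) edges, which are returned."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    fill = set()
    for x in order:
        nbrs = list(adj.pop(x, ()))
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                u, v = nbrs[i], nbrs[j]
                if v not in adj[u]:
                    fill.add(frozenset((u, v)))
                    adj[u].add(v)
                    adj[v].add(u)
        for n in nbrs:
            adj[n].discard(x)
    return fill

print(fill_in_edges(edges, "EDHFGCA"))   # adds F-G and C-G for this order
```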


CS 3750 Advanced Machine Learning

Tree decomposition of the graph

• A tree decomposition of a graph G:

– A tree T with a vertex set associated to every node.

– For all edges {v, w} ∈ G: there is a set containing both v and w in T.

– For every v ∈ G: the nodes in T that contain v form a connected subtree.

[Figure: the graph G over A, ..., H and a tree decomposition of G; the bags correspond to cliques in the graph]



CS 3750 Advanced Machine Learning

Tree decomposition of the graph

• Another tree decomposition of a graph G:

– A tree T with a vertex set associated to every node.

– For all edges {v, w} ∈ G: there is a set containing both v and w in T.

– For every v ∈ G: the nodes in T that contain v form a connected subtree.

[Figure: the graph G over A, ..., H and a different tree decomposition of G]


CS 3750 Advanced Machine Learning

Treewidth of the graph

[Figure: the graph G over A, ..., H and a tree decomposition of G with vertex sets $X_i$, $i \in I$]

• Width of the tree decomposition: $\max_{i \in I} |X_i| - 1$

• Treewidth of a graph G: tw(G) = minimum width over all tree decompositions of G.
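For instance, one valid tree decomposition of the A, ..., H example graph uses the vertex sets (bags) below; this particular decomposition is my own illustration, chosen to match the cliques produced by the elimination order used earlier, and is not necessarily the one drawn on the slide. Its width is 2, so tw(G) ≤ 2 for this graph:

```python
# One tree decomposition of the A..H example graph: bags X_i and the tree edges.
bags = [{"A", "B", "C"}, {"B", "D", "E"}, {"A", "C", "G"},
        {"C", "F", "G"}, {"F", "G", "H"}]
tree = [(0, 1), (0, 2), (2, 3), (3, 4)]      # which bags are adjacent in T

# Width of a tree decomposition: max_i |X_i| - 1
width = max(len(b) for b in bags) - 1
print(width)                                  # 2, hence tw(G) <= 2
```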


CS 3750 Advanced Machine Learning

Treewidth of the graph

[Figure: the graph G over A, ..., H and a tree decomposition of G]

• Treewidth of a graph G: tw(G) = minimum width over all tree decompositions of G

• Why is it important?

• The calculations can take advantage of the structure and be performed more efficiently

• treewidth gives the best case complexity

[Figure: two graph structures over A, ..., H compared side by side]

CS 3750 Advanced Machine Learning

Trees

Why do we like trees?

• Inference in tree structures can be done in time linear in the number of nodes in the tree

[Figure: a tree-structured network over A, B, C, D, E]


CS 3750 Advanced Machine Learning

Converting BBNs to MRFs

Moral graph H[G] of a Bayesian network G over X: an undirected graph over X that contains an edge between x and y if:

– There exists a directed edge between them in G, or

– They are both parents of the same node in G.

[Figure: a Bayesian network over C, D, I, G, S, L, J, H and its moral graph]
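A small sketch of the construction (my own; the parent sets are read off the factorization used on the next slide):

```python
# Parent sets of the example network, read off the factorization
# P(C) P(D|C) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J).
parents = {"C": [], "D": ["C"], "I": [], "G": ["I", "D"],
           "S": ["I"], "L": ["G"], "J": ["L", "S"], "H": ["G", "J"]}

def moral_graph(parents):
    """Edges of the moral graph H[G]: drop edge directions and 'marry'
    (connect) every pair of parents that share a child."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:                               # undirected versions of directed edges
            edges.add(frozenset((p, child)))
        for i in range(len(ps)):                   # connect co-parents
            for j in range(i + 1, len(ps)):
                edges.add(frozenset((ps[i], ps[j])))
    return edges

print(sorted(tuple(sorted(e)) for e in moral_graph(parents)))
# the marrying step adds D-I, L-S and G-J on top of the original edges
```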

CS 3750 Advanced Machine Learning

Moral Graphs

Why moralization?

$P(C, D, G, I, S, L, J, H) = P(C)\, P(D \mid C)\, P(G \mid I, D)\, P(S \mid I)\, P(L \mid G)\, P(J \mid L, S)\, P(H \mid G, J)$

$= \phi_1(C)\, \phi_2(C, D)\, \phi_3(G, I, D)\, \phi_4(S, I)\, \phi_5(L, G)\, \phi_6(J, L, S)\, \phi_7(H, G, J)$

Each conditional becomes a factor over a child and its parents, e.g. $P(G \mid I, D) \rightarrow \phi_3(G, I, D)$, and the moral graph connects exactly the variables that appear together in such a factor.

[Figure: the Bayesian network and its moral graph]


CS 3750 Advanced Machine Learning

Chordal graphs

Chordal graph: an undirected graph G in which

• all cycles of four or more vertices have a chord (another edge breaking the cycle)

• equivalently, every minimal cycle in the graph contains only 3 vertices

[Figure: two undirected graphs over C, D, I, G, S, L, J, H — one chordal, one not chordal]

CS 3750 Advanced Machine Learning

Chordal Graphs

Properties:

– There exists an elimination ordering that adds no edges.

– The minimal induced treewidth of the graph is equal to the

size of the largest clique - 1.

[Figure: a chordal graph over C, D, I, G, S, L, J, H and an elimination sequence on it that adds no fill-in edges]


CS 3750 Advanced Machine Learning

Triangulation

The process of converting a graph G into a chordal graph is

called Triangulation.

A new graph obtained via triangulation is:

1) Guaranteed to be chordal.

2) Not guaranteed to be (treewidth) optimal.

• There exist exact algorithms for finding the minimal

chordal graphs, and heuristic methods with a

guaranteed upper bound.

CS 3750 Advanced Machine Learning

Chordal Graphs

• Given a minimum triangulation for a graph G, we can carry

out the variable-elimination algorithm in the minimum

possible time.

• Complexity of the optimal triangulation:

– Finding the minimal triangulation is NP-Hard.

• The inference limit:

– Inference time is exponential in the size of the largest clique (factor) in G.


CS 3750 Advanced Machine Learning

Conclusions: MRFs

• We cannot escape exponential costs in the treewidth

• But in many graphs the tree-width is much smaller than the total

number of variables

• Still a problem: Finding the optimal decomposition is hard

– But, paying the cost up front may be worth it.

– Triangulate once, query many times.

– Real cost savings, even if not a bounded one.