Transcript of PGM 2 — Inference

Page 1: PGM 2 — Inference

Qinfeng (Javen) Shi

ML session 10

Page 2: Table of Contents

1. What are MAP and Marginal Inferences? (Marginal and MAP Queries; Marginal and MAP Inference; How to infer?)

2. Variable elimination (VE for marginal inference; VE for MAP inference)

3. Message Passing (Sum-product; Max-product)

Page 3: Marginal and MAP Queries

Given a joint distribution P(Y, E), where

Y: query random variable(s), unknown.

E: evidence random variable(s), observed, i.e. E = e.

Two types of queries:

Marginal queries (a.k.a. probability queries): the task is to compute P(Y | E = e).

MAP queries (a.k.a. most probable explanation): the task is to find y* = argmax_{y ∈ Val(Y)} P(Y = y | E = e).

Pages 4–5: Marginal and MAP Inference

[Figures: (a) a directed graph, (b) an undirected graph, and (c) a factor graph, each over X1, X2, X3; the factor graph has factors fc1, fc2, fc3.]

Marginal inference: P(xi) = ∑_{xj : j ≠ i} P(x1, x2, x3)

MAP inference: (x1*, x2*, x3*) = argmax_{x1, x2, x3} P(x1, x2, x3)

Warning: xi* ≠ argmax_{xi} P(xi) in general.
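A minimal numerical check of the warning above, using a hypothetical two-variable joint distribution (the numbers are made up purely for illustration): the jointly most likely configuration differs from the configuration obtained by maximising each marginal separately.

```python
import numpy as np

# A hypothetical joint distribution over two binary variables (x1, x2);
# the numbers are made up purely to illustrate the warning above.
P = np.array([[0.3, 0.3],   # P(x1=0, x2=0), P(x1=0, x2=1)
              [0.4, 0.0]])  # P(x1=1, x2=0), P(x1=1, x2=1)

# Marginal inference: sum out the other variable.
P_x1 = P.sum(axis=1)        # [0.6, 0.4]
P_x2 = P.sum(axis=0)        # [0.7, 0.3]

# MAP inference: the jointly most likely configuration.
x_map = np.unravel_index(P.argmax(), P.shape)    # x1=1, x2=0

# Maximising each marginal separately gives a different configuration,
# which is strictly less probable under the joint: 0.3 < 0.4.
x_marg = (P_x1.argmax(), P_x2.argmax())          # x1=0, x2=0
print(x_map, P[x_map], x_marg, P[x_marg])
```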


Page 6: Marginal and MAP Inference

[Figures: the same directed graph, undirected graph and factor graph over X1, X2, X3 as on the previous page.]

Extends to seeing the evidence E:

Marginal inference: P(xi | E) = ∑_{xj : j ≠ i} P(x1, x2, x3 | E)

MAP inference: (x1*, x2*, x3*) = argmax_{x1, x2, x3} P(x1, x2, x3 | E)

Pages 7–8: Example of 4WD

P(¬g, a | s)?  (i.e. P(G = ¬g, A = a | S = s))

P(S)?

argmax_{G, A, S} P(G, A, S)?


Pages 9–11: Marginals

When do we need marginals? Marginals are used to compute:

queries for probabilities, like in the 4WD example.

the normalisation constant Z = ∑_{xi} q(xi) = ∑_{xj} q(xj) for all i, j = 1, ..., where q(xi) is a belief (not necessarily a probability) produced by marginal inference.

the log loss in Conditional Random Fields (CRFs), −log P(x1, ..., xn) = log(Z) + ...

expectations such as E_{P(xi)}[φ(xi)] and E_{P(xi, xj)}[φ(xi, xj)], where ψ(xi) = ⟨φ(xi), w⟩ and ψ(xi, xj) = ⟨φ(xi, xj), w⟩; the gradient of the CRF risk contains these expectations.
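A small sketch of how these quantities are read off from beliefs, assuming a single unnormalised belief vector q over the states of one variable xi; the numbers and the feature map phi are made up for illustration.

```python
import numpy as np

# Hypothetical unnormalised belief q(x_i) over 3 states of x_i,
# e.g. as produced by marginal inference on a tree model.
q = np.array([2.0, 6.0, 4.0])

Z = q.sum()                  # normalisation constant: Z = sum_{x_i} q(x_i)
P_xi = q / Z                 # marginal P(x_i)

# Hypothetical per-state feature map phi(x_i) (one row per state).
phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

E_phi = P_xi @ phi           # E_{P(x_i)}[phi(x_i)], as appears in CRF gradients
log_Z = np.log(Z)            # appears in the CRF log loss: -log P(x) = log Z + ...
print(Z, P_xi, E_phi, log_Z)
```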


Pages 12–13: MAP

When do we need MAP?

To find the most likely configuration for (xi)_{i∈V} in testing.

To find the most violated constraint generated by (xi†)_{i∈V} in training (i.e. learning), e.g. by the cutting-plane method (used in SVM-Struct) or by the Bundle Method for Risk Minimisation (Teo et al., JMLR 2010).


Pages 14–16: How to infer?

We saw how to infer by hand for Bayesian Networks (previous lecture).

Problems: doing it by hand is tedious for many variables, and it only covers Bayesian Networks.

How do we infer for other graphical models, and how do we do it in a computer program?


Page 17: Variable elimination

Variable elimination: infer by eliminating variables (works for both marginal and MAP inference).

P(A) = ∑_{S,G} P(A, S, G)
     = ∑_{S,G} P(S) P(A|S) P(G|S)
     = ∑_S P(S) P(A|S) (∑_G P(G|S))
     = ∑_S P(S) P(A|S)    (since ∑_G P(G|S) = 1)
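A minimal numpy sketch of this elimination on the same structure P(S) P(A|S) P(G|S); the CPT numbers are hypothetical, chosen only for illustration and not taken from the 4WD example.

```python
import numpy as np

# Hypothetical CPTs for a network with factorisation P(S) P(A|S) P(G|S)
# (binary variables; made-up numbers).
P_S = np.array([0.6, 0.4])                 # P(S)
P_A_given_S = np.array([[0.7, 0.3],        # P(A|S), rows indexed by S
                        [0.2, 0.8]])
P_G_given_S = np.array([[0.9, 0.1],        # P(G|S), rows indexed by S
                        [0.5, 0.5]])

# Eliminate G first: sum_G P(G|S) = 1 for every S, so G drops out.
m_G = P_G_given_S.sum(axis=1)              # -> [1., 1.]

# Then eliminate S: P(A) = sum_S P(S) P(A|S) m_G(S)
P_A = (P_S[:, None] * P_A_given_S * m_G[:, None]).sum(axis=0)

# Sanity check against brute-force summation over the full joint.
joint = P_S[:, None, None] * P_A_given_S[:, :, None] * P_G_given_S[:, None, :]
assert np.allclose(P_A, joint.sum(axis=(0, 2)))
print(P_A)
```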

Pages 18–21: VE for marginal inference

Step by step:

1. Sum over the missing variables (marginalisation) of the full distribution.

2. Factorise the full distribution.

3. Rearrange the sum operators to reduce the computation.

4. Eliminate the variables.
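A compact sketch of steps 2–4 above as a single reusable operation, assuming each factor is stored as a pair (variable names, numpy table) with single-letter variable names; this representation and the helper name eliminate_var are my own choices for illustration, not from any particular library.

```python
import numpy as np

def eliminate_var(factors, var):
    """One VE step: multiply all factors that mention `var`, then sum `var` out.

    Each factor is a pair (vars, table): a tuple of single-letter variable
    names and a numpy array with one axis per variable, in that order.
    """
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]

    # Variables of the resulting message: everything the touching factors
    # mention, except the variable being eliminated.
    new_vars = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))

    # einsum does the "multiply, then sum out" in one go, e.g. 'sg->s'.
    spec = ",".join("".join(vs) for vs, _ in touching) + "->" + "".join(new_vars)
    new_table = np.einsum(spec, *(t for _, t in touching))
    return rest + [(new_vars, new_table)]

# Example: P(A) on the network above, eliminating G then S
# (using the hypothetical CPTs from the previous sketch):
# factors = [(('s',), P_S), (('s', 'a'), P_A_given_S), (('s', 'g'), P_G_given_S)]
# factors = eliminate_var(factors, 'g')
# factors = eliminate_var(factors, 's')   # -> single factor over 'a', i.e. P(A)
```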


Page 22: Variable elimination — BayesNets

[Figure: a Bayesian network over A, B, C in which B and C are the parents of A.]

Marginal inference P(A)?

P(A) = ∑_{B,C} P(A, B, C)
     = ∑_{B,C} P(B) P(C) P(A|B, C)
     = ∑_B P(B) ∑_C P(C) P(A|B, C)
     = ∑_B P(B) m1(A, B)    (C eliminated)
     = m2(A)                (B eliminated)

Page 23: Variable elimination — MRFs

[Figure: an MRF with nodes X1, X2, X3 and edges X1–X2 and X1–X3.]

P(x1, x2, x3) = (1/Z) ψ(x1, x2) ψ(x1, x3) ψ(x1) ψ(x2) ψ(x3)

The ψ are given. (Example shown using the document camera.)

P(x1) = ∑_{x2, x3} (1/Z) ψ(x1, x2) ψ(x1, x3) ψ(x1) ψ(x2) ψ(x3)
      = (1/Z) ∑_{x2, x3} ψ(x1, x2) ψ(x1, x3) ψ(x1) ψ(x2) ψ(x3)
      = (1/Z) ψ(x1) ∑_{x2} (ψ(x1, x2) ψ(x2)) ∑_{x3} (ψ(x1, x3) ψ(x3))
      = (1/Z) ψ(x1) m2→1(x1) m3→1(x1)

Page 24: Variable elimination — MRFs

[Figure: the same MRF with nodes X1, X2, X3 and edges X1–X2 and X1–X3.]

P(x2) = ∑_{x1, x3} (1/Z) ψ(x1, x2) ψ(x1, x3) ψ(x1) ψ(x2) ψ(x3)
      = (1/Z) ψ(x2) ∑_{x1} (ψ(x1, x2) ψ(x1) ∑_{x3} [ψ(x1, x3) ψ(x3)])
      = (1/Z) ψ(x2) ∑_{x1} ψ(x1, x2) ψ(x1) m3→1(x1)
      = (1/Z) ψ(x2) m1→2(x2)
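A numpy sketch of these two computations for the 3-node MRF, with made-up binary potentials. Note that m3→1 is computed once and reused for P(x2), which is exactly the observation that leads to message passing later.

```python
import numpy as np

# Hypothetical potentials for the 3-node MRF above (binary states; made-up numbers).
psi1 = np.array([1.0, 2.0])              # psi(x1)
psi2 = np.array([3.0, 1.0])              # psi(x2)
psi3 = np.array([1.0, 1.5])              # psi(x3)
psi12 = np.array([[4.0, 1.0],            # psi(x1, x2), rows indexed by x1
                  [1.0, 4.0]])
psi13 = np.array([[2.0, 1.0],            # psi(x1, x3), rows indexed by x1
                  [1.0, 2.0]])

# Messages into x1 (sum out the neighbour).
m2_to_1 = (psi12 * psi2[None, :]).sum(axis=1)   # m_{2->1}(x1) = sum_{x2} psi(x1,x2) psi(x2)
m3_to_1 = (psi13 * psi3[None, :]).sum(axis=1)   # m_{3->1}(x1) = sum_{x3} psi(x1,x3) psi(x3)

# Unnormalised belief at x1; its sum is Z.
q1 = psi1 * m2_to_1 * m3_to_1
Z = q1.sum()
P_x1 = q1 / Z

# P(x2): reuse m_{3->1} instead of recomputing it.
m1_to_2 = (psi12 * (psi1 * m3_to_1)[:, None]).sum(axis=0)   # m_{1->2}(x2)
P_x2 = psi2 * m1_to_2 / Z

print(P_x1, P_x2, Z)
```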

Page 25: Variable elimination — factor graphical models

Variable elimination works for factor graphs too: replace the ψ by factors f1, f2, ...

Pages 26–29: VE for MAP inference

MAP inference:

(x1*, x2*, x3*, ..., xn*) = argmax_{x1, x2, x3, ..., xn} P(x1, x2, x3, ..., xn)

Step by step:

1. Max over the full distribution.

2. Factorise the full distribution.

3. Rearrange the max operators to reduce the computation.

4. Eliminate the variables.
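The only change from the marginal case is that the sum becomes a max, and we record the argmax so the best assignment can be recovered later. A sketch in the same hypothetical (vars, table) factor representation used earlier, with single-letter variable names; the helper name is illustrative.

```python
import numpy as np

def max_eliminate_var(factors, var):
    """One MAP-VE step: multiply the factors that mention `var`, max `var` out,
    and record the maximising value of `var` for every setting of the others."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]

    new_vars = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))

    # Build the product table with `var` as the last axis, then max/argmax over it.
    spec = ",".join("".join(vs) for vs, _ in touching) + "->" + "".join(new_vars) + var
    prod = np.einsum(spec, *(t for _, t in touching))
    new_table = prod.max(axis=-1)
    best_of_var = prod.argmax(axis=-1)   # kept for backtracking the MAP assignment

    return rest + [(new_vars, new_table)], (new_vars, best_of_var)
```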


Pages 30–31: Variable elimination — BayesNets

[Figure: the same Bayesian network over A, B, C in which B and C are the parents of A.]

MAP inference argmax_{A,B,C} P(A, B, C)?

max_{A,B,C} P(A, B, C) = max_{A,B,C} P(B) P(C) P(A|B, C)
  = max_A { max_B [ P(B) max_C ( P(C) P(A|B, C) ) ] }
  = max_A { max_B [ P(B) m1(A, B) ] }    (C eliminated, record its best assignment)
  = max_A m2(A)    (B eliminated, record its best assignment, and A's best assignment)

MAP solution? argmax = the recorded best assignments of A, B, C.


Pages 32–38: Variable elimination — MRFs

[Figure: an MRF with nodes X1, X2, X3, X4, where X2 is connected to X1, X3 and X4.]

max_{x1,x2,x3,x4} P(x1, x2, x3, x4)
  ∝ max_{x1,x2,x3,x4} ψ(x1, x2) ψ(x2, x3) ψ(x2, x4) ψ(x1) ψ(x2) ψ(x3) ψ(x4)    (dropping 1/Z, which does not change the argmax)
  = max_{x1,x2} [ ... max_{x3} (ψ(x2, x3) ψ(x3)) max_{x4} (ψ(x2, x4) ψ(x4)) ]
  = max_{x1} [ ψ(x1) max_{x2} (ψ(x2) ψ(x1, x2) m3→2(x2) m4→2(x2)) ]
  = max_{x1} (ψ(x1) m2→1(x1))

argmax = the recorded best assignments. What if you didn't (or don't want to) record the assignments? How do you get them back?

Hint:
x1* = argmax_{x1} (ψ(x1) m2→1(x1))
x2*? x2* = argmax_{x2} (ψ(x2) ψ(x1*, x2) m3→2(x2) m4→2(x2))
x3*, x4*? x3* = argmax_{x3} (ψ(x2*, x3) ψ(x3)),  x4* = argmax_{x4} (ψ(x2*, x4) ψ(x4))

Answer: backtrack the best assignments (in the reverse of the elimination order).
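A numpy sketch of this forward max-elimination and the backtracking answer, on the same 4-node star MRF; the binary potentials are made up for illustration.

```python
import numpy as np

# Hypothetical binary potentials for the star MRF above (made-up numbers).
psi = {1: np.array([1.0, 2.0]), 2: np.array([1.5, 1.0]),
       3: np.array([1.0, 3.0]), 4: np.array([2.0, 1.0])}
psi12 = np.array([[3.0, 1.0], [1.0, 3.0]])   # psi(x1, x2)
psi23 = np.array([[2.0, 1.0], [1.0, 2.0]])   # psi(x2, x3)
psi24 = np.array([[1.0, 2.0], [2.0, 1.0]])   # psi(x2, x4)

# Forward pass (elimination order x3, x4, x2), keeping only the max values.
m3_to_2 = (psi23 * psi[3][None, :]).max(axis=1)                  # over x3, indexed by x2
m4_to_2 = (psi24 * psi[4][None, :]).max(axis=1)                  # over x4, indexed by x2
m2_to_1 = (psi12 * (psi[2] * m3_to_2 * m4_to_2)[None, :]).max(axis=1)  # over x2, indexed by x1

# Backtrack in the reverse of the elimination order: x1, then x2, then x3 and x4.
x1 = int(np.argmax(psi[1] * m2_to_1))
x2 = int(np.argmax(psi[2] * psi12[x1, :] * m3_to_2 * m4_to_2))
x3 = int(np.argmax(psi23[x2, :] * psi[3]))
x4 = int(np.argmax(psi24[x2, :] * psi[4]))
print((x1, x2, x3, x4))
```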


Page 39: Variable elimination — factor graphical models

This works too: replace the ψ by factors f1, f2, ...

Page 40: Message Passing

Reuse the intermediate results (called messages) of VE ⇒ Message Passing:

VE for marginal inference ⇒ sum-product message passing

VE for MAP inference ⇒ max-product message passing

Page 41: Revisit VE for marginal

Assume P(x1, x2, x3) = (1/Z) ψ(x1, x2) ψ(x1, x3) ψ(x1) ψ(x2) ψ(x3).

[Figure: the MRF with nodes X1, X2, X3 and edges X1–X2 and X1–X3.]

P(x1) = (1/Z) ψ(x1) ∑_{x2} (ψ(x1, x2) ψ(x2)) ∑_{x3} (ψ(x1, x3) ψ(x3))
      = (1/Z) ψ(x1) m2→1(x1) m3→1(x1)

P(x2) = (1/Z) ψ(x2) ∑_{x1} (ψ(x1, x2) ψ(x1) ∑_{x3} [ψ(x1, x3) ψ(x3)])
      = (1/Z) ψ(x2) ∑_{x1} ψ(x1, x2) ψ(x1) m3→1(x1)
      = (1/Z) ψ(x2) m1→2(x2)

m3→1(x1) can be reused instead of being computed twice.

Pages 42–44: Sum-product

Can we compute all messages first, and then use them to compute all marginal distributions? Yes; this is called sum-product.

In general,

P(xi) = (1/Z) ψ(xi) ∏_{j ∈ Ne(i)} m_{j→i}(xi)

m_{j→i}(xi) = ∑_{xj} (ψ(xj) ψ(xi, xj) ∏_{k ∈ Ne(j)\{i}} m_{k→j}(xj))

Ne(i): the neighbouring nodes of i (i.e. the nodes connected to i).
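A sketch of sum-product on a tree-structured pairwise MRF, directly implementing the two formulas above; the data layout (dicts of unary and pairwise potentials) and the function name are my own choices for illustration.

```python
import numpy as np

def sum_product_marginals(nodes, edges, psi_n, psi_e):
    """Sum-product on a tree-structured pairwise MRF (a sketch; assumes the
    graph really is a tree). psi_n[i] is the unary potential of node i;
    psi_e[(i, j)] is the pairwise potential indexed [x_i, x_j]."""
    neighbours = {i: set() for i in nodes}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    def pairwise(j, i):
        # psi(x_j, x_i) as an array indexed [x_j, x_i], whichever way it was stored.
        return psi_e[(j, i)] if (j, i) in psi_e else psi_e[(i, j)].T

    msgs = {}
    def message(j, i):
        # m_{j->i}(x_i) = sum_{x_j} psi(x_j) psi(x_i, x_j) prod_{k in Ne(j)\{i}} m_{k->j}(x_j)
        if (j, i) not in msgs:
            incoming = psi_n[j].copy()
            for k in neighbours[j] - {i}:
                incoming *= message(k, j)
            msgs[(j, i)] = pairwise(j, i).T @ incoming
        return msgs[(j, i)]

    marginals = {}
    for i in nodes:
        q = psi_n[i].copy()
        for j in neighbours[i]:
            q *= message(j, i)
        marginals[i] = q / q.sum()        # q.sum() equals Z for every i on a tree
    return marginals

# Usage on the star graph from the earlier pages (hypothetical uniform potentials):
# marginals = sum_product_marginals(
#     nodes=[1, 2, 3, 4], edges=[(1, 2), (2, 3), (2, 4)],
#     psi_n={i: np.ones(2) for i in [1, 2, 3, 4]},
#     psi_e={(1, 2): np.ones((2, 2)), (2, 3): np.ones((2, 2)), (2, 4): np.ones((2, 2))})
```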


Page 45: Revisit VE for MAP

[Figure: the MRF with nodes X1, X2, X3, X4, where X2 is connected to X1, X3 and X4.]

max_{x1,x2,x3,x4} P(x1, x2, x3, x4)
  ∝ max_{x1,x2,x3,x4} ψ(x1, x2) ψ(x2, x3) ψ(x2, x4) ψ(x1) ψ(x2) ψ(x3) ψ(x4)
  = max_{x1,x2} [ ... max_{x3} (ψ(x2, x3) ψ(x3)) max_{x4} (ψ(x2, x4) ψ(x4)) ]
  = max_{x1} [ ψ(x1) max_{x2} (ψ(x2) ψ(x1, x2) m3→2(x2) m4→2(x2)) ]
  = max_{x1} (ψ(x1) m2→1(x1))

x1* = argmax_{x1} (ψ(x1) m2→1(x1))
x2* = argmax_{x2} (ψ(x2) ψ(x1*, x2) m3→2(x2) m4→2(x2))
x3* = argmax_{x3} (ψ(x2*, x3) ψ(x3))
x4* = argmax_{x4} (ψ(x2*, x4) ψ(x4))

Page 46: Max-product

Variable elimination for MAP ⇒ Max-product:

xi* = argmax_{xi} (ψ(xi) ∏_{j ∈ Ne(i)} m_{j→i}(xi))

m_{j→i}(xi) = max_{xj} (ψ(xj) ψ(xi, xj) ∏_{k ∈ Ne(j)\{i}} m_{k→j}(xj))

Ne(i): the neighbouring nodes of i (i.e. the nodes connected to i).

Ne(j)\{i} = ∅ if j has only one edge, e.g. x1, x3, x4 in the star graph above. For such a node j, m_{j→i}(xi) = max_{xj} (ψ(xj) ψ(xi, xj)), an easier computation!
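Relative to the sum-product sketch earlier, only the message update changes: the sum over xj becomes a max over xj. A minimal sketch of the modified update, using the same hypothetical data layout:

```python
import numpy as np

# In the sum-product sketch above, the message line was
#     msgs[(j, i)] = pairwise(j, i).T @ incoming
# For max-product, replace it with an elementwise product followed by a max:
def max_message(pairwise_ji, incoming):
    # m_{j->i}(x_i) = max_{x_j} psi(x_j, x_i) * [psi(x_j) * prod of incoming messages]
    # pairwise_ji is indexed [x_j, x_i]; incoming is indexed by x_j.
    return (pairwise_ji * incoming[:, None]).max(axis=0)

# Decoding then follows the slide: x_i* = argmax_{x_i} psi(x_i) * prod_j m_{j->i}(x_i),
# or backtrack the recorded argmaxes as on the VE pages.
```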

Page 47: Max-product

Order matters: message m2→3(x3) requires m1→2(x2) and m4→2(x2).

[Figure: one valid schedule on the star graph (X2 connected to X1, X3, X4): first compute the messages into X2 (m1→2(x2), m3→2(x2), m4→2(x2)), then the messages out of X2 (m2→1(x1), m2→3(x3), m2→4(x4)).]

Alternatively, leaves to root, and root to leaves.

[Figure: with X1 as root: m3→2(x2), m4→2(x2), then m2→1(x1); then back down: m1→2(x2), then m2→3(x3) and m2→4(x4).]

Page 48: Extension

To avoid numerical overflow/underflow, one often operates in log space (a sketch follows below).

Max-product/sum-product is also known as Message Passing and Belief Propagation (BP).

In graphs with loops, running BP for several iterations is known as Loopy BP (with neither convergence nor optimality guarantees in general).

This extends to the Junction Tree Algorithm (exact, but expensive) and cluster-based BP.
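A minimal sketch of the log-space versions of the two message updates, using scipy's logsumexp for sum-product; the argument names (log-potentials and stacked incoming log-messages) are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def log_sum_product_message(log_psi_j, log_psi_ij, incoming_log_msgs):
    """m_{j->i} in log space: logsumexp over x_j replaces the sum over x_j.

    log_psi_j: (K_j,) log unary potential of x_j
    log_psi_ij: (K_i, K_j) log pairwise potential, indexed [x_i, x_j]
    incoming_log_msgs: (num_msgs, K_j) stacked log m_{k->j} for k in Ne(j)\{i}
    """
    a = log_psi_j + incoming_log_msgs.sum(axis=0)       # log of psi(x_j) * prod_k m_{k->j}(x_j)
    return logsumexp(log_psi_ij + a[None, :], axis=1)   # indexed by x_i

def log_max_product_message(log_psi_j, log_psi_ij, incoming_log_msgs):
    """Same update for max-product: max replaces logsumexp (log is monotone)."""
    a = log_psi_j + incoming_log_msgs.sum(axis=0)
    return (log_psi_ij + a[None, :]).max(axis=1)
```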

Page 49: That's all

Thanks!