INFERENCE IN BAYESIAN NETWORKS

Page 1

INFERENCE IN BAYESIAN NETWORKS

Page 2

AGENDA

Reading off independence assumptions
Efficient inference in Bayesian Networks
  - Top-down inference
  - Variable elimination
  - Monte-Carlo methods

Page 3

SOME APPLICATIONS OF BN

Medical diagnosis
Troubleshooting of hardware/software systems
Fraud/uncollectible debt detection
Data mining
Analysis of genetic sequences
Data interpretation, computer vision, image understanding

Page 4

MORE COMPLICATED SINGLY-CONNECTED BELIEF NET

[Diagram: singly-connected belief net with nodes Battery, Radio, SparkPlugs, Gas, Starts, Moves]

Page 5

Region = {Sky, Tree, Grass, Rock}

[Diagram: image regions R1–R4 linked by "Above" relations; each region variable takes a value in {Sky, Tree, Grass, Rock}]

Page 6

BN to evaluate insurance risks

Page 7

BN FROM LAST LECTURE

[Diagram: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls. Causes at the top, effects at the bottom.]

Directed acyclic graph

Intuitive meaning of arc from x to y:

“x has direct influence on y”

Page 8

ARCS DO NOT NECESSARILY ENCODE CAUSALITY!

[Diagram: two chains, A → B → C and C → B → A]

Two BNs that can encode the same joint probability distribution

Page 9

READING OFF INDEPENDENCE RELATIONSHIPS

Given B, does the value of A affect the probability of C? Is P(C|B,A) = P(C|B)?

No! C's parent (B) is given, so C is independent of its non-descendants (here, A).

Independence is symmetric: C ⊥ A | B  ⇒  A ⊥ C | B

[Diagram: chain A → B → C]

Page 10

WHAT DOES THE BN ENCODE?

Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm

[Alarm network diagram]

A node is independent of its non-descendants, given its parents

Page 11

READING OFF INDEPENDENCE RELATIONSHIPS

How about Burglary ⊥ Earthquake | Alarm? No! Why?

[Alarm network diagram]

Page 12

READING OFF INDEPENDENCE RELATIONSHIPS

How about Burglary ⊥ Earthquake | Alarm? No! Why?
P(B,E|A) = P(A|B,E) P(B,E) / P(A) = 0.00075
P(B|A) P(E|A) = 0.086
The two values differ, so Burglary and Earthquake are dependent given Alarm.

[Alarm network diagram]

Page 13

READING OFF INDEPENDENCE RELATIONSHIPS

How about Burglary ⊥ Earthquake | JohnCalls? No! Why? Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent.

[Alarm network diagram]

Page 14

INDEPENDENCE RELATIONSHIPS

Rough intuition (this holds for tree-like graphs, i.e., polytrees):
- Evidence on the (directed) road between two variables makes them independent
- Evidence on an "A" node (a common cause) makes its descendants independent
- Evidence on a "V" node (a common effect), or below the V, makes the ancestors of the variables dependent (otherwise they are independent)

Formal property in the general case: d-separation ⇒ independence (see R&N)

Page 15

BENEFITS OF SPARSE MODELS

Modeling
- Fewer relationships need to be encoded (either through understanding or statistics)
- Large networks can be built up from smaller ones
Intuition
- Dependencies/independencies between variables can be inferred through the network structure
Tractable inference

Page 16

[Alarm network diagram: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, MaryCalls]

P(B) = 0.001    P(E) = 0.002

B E   P(A|B,E)
T T   0.95
T F   0.94
F T   0.29
F F   0.001

A   P(J|A)
T   0.90
F   0.05

A   P(M|A)
T   0.70
F   0.01

TOP-DOWN INFERENCE

Suppose we want to compute P(Alarm)

Page 17

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE

Suppose we want to compute P(Alarm)
1. P(Alarm) = Σ_{b,e} P(A, b, e)
2. P(Alarm) = Σ_{b,e} P(A | b, e) P(b) P(e)

Page 18

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE

Suppose we want to compute P(Alarm)
1. P(Alarm) = Σ_{b,e} P(A, b, e)
2. P(Alarm) = Σ_{b,e} P(A | b, e) P(b) P(e)
3. P(Alarm) = P(A|B,E) P(B) P(E) + P(A|B,¬E) P(B) P(¬E) + P(A|¬B,E) P(¬B) P(E) + P(A|¬B,¬E) P(¬B) P(¬E)

Page 19

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE

Suppose we want to compute P(Alarm)
1. P(A) = Σ_{b,e} P(A, b, e)
2. P(A) = Σ_{b,e} P(A | b, e) P(b) P(e)
3. P(A) = P(A|B,E) P(B) P(E) + P(A|B,¬E) P(B) P(¬E) + P(A|¬B,E) P(¬B) P(E) + P(A|¬B,¬E) P(¬B) P(¬E)
4. P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
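As a sanity check, here is a minimal Python sketch of this computation (not from the lecture; the CPT values are the ones in the tables above, and the variable names are my own):

```python
# Top-down computation of P(Alarm = true), using the CPTs from the slides.
P_B = 0.001   # P(Burglary = true)
P_E = 0.002   # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)

def pr(value, p_true):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

# P(A) = sum over b, e of P(A | b, e) * P(b) * P(e)
p_alarm = sum(P_A[(b, e)] * pr(b, P_B) * pr(e, P_E)
              for b in (True, False) for e in (True, False))
print(p_alarm)   # ≈ 0.002516, i.e. 0.00252 as on the slide
```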

Page 20

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE

Now, suppose we want to compute P(MaryCalls)

Page 21

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE

Now, suppose we want to compute P(MaryCalls)
1. P(M) = P(M|A) P(A) + P(M|¬A) P(¬A)

Page 22

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE

Now, suppose we want to compute P(MaryCalls)
1. P(M) = P(M|A) P(A) + P(M|¬A) P(¬A)
2. P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117
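Continuing the same sketch (the value 0.00252 for P(Alarm) is taken from the previous step):

```python
# P(MaryCalls = true) by conditioning on Alarm.
p_alarm = 0.00252                       # P(A) computed top-down above
P_M = {True: 0.70, False: 0.01}         # P(MaryCalls = true | A)

p_m = P_M[True] * p_alarm + P_M[False] * (1.0 - p_alarm)
print(p_m)   # ≈ 0.0117
```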

Page 23

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE WITH EVIDENCE

Suppose we want to compute P(Alarm|Earthquake)

Page 24

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE WITH EVIDENCE

Suppose we want to compute P(A|e)
1. P(A|e) = Σ_b P(A, b | e)
2. P(A|e) = Σ_b P(A | b, e) P(b)

Page 25

(Alarm network and CPTs as above.)

TOP-DOWN INFERENCE WITH EVIDENCE

Suppose we want to compute P(A|e)
1. P(A|e) = Σ_b P(A, b | e)
2. P(A|e) = Σ_b P(A | b, e) P(b)
3. P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066

Page 26

TOP-DOWN INFERENCE

- Only works if the graph of ancestors of a variable is a polytree
- Evidence given on ancestor(s) of the query variable
- Efficient: O(d·2^k) time, where d is the number of ancestors of a variable and k is a bound on the number of parents
- Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node

Page 27

QUERYING THE BN

The BN gives P(T|C). What about P(C|T)?

[Diagram: Cavity → Toothache]

P(C) = 0.1

C   P(T|C)
T   0.4
F   0.01111

Page 28

BAYES’ RULE

P(A,B) = P(A|B) P(B) = P(B|A) P(A)

So… P(A|B) = P(B|A) P(A) / P(B)

Page 29

APPLYING BAYES' RULE

Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)

What's P(B)?

Page 30

APPLYING BAYES' RULE

Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)

What's P(B)?
P(B) = Σ_a P(B, A=a)                      [marginalization]
P(B, A=a) = P(B|A=a) P(A=a)               [conditional probability]
So, P(B) = Σ_a P(B|A=a) P(A=a)

Page 31

APPLYING BAYES' RULE

Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)

What's P(A|B)?

Page 32

APPLYING BAYES' RULE

Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)

What's P(A|B)?
P(A|B) = P(B|A) P(A) / P(B)               [Bayes' rule]
P(B) = Σ_a P(B|A=a) P(A=a)                [last slide]
So, P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]

Page 33

HOW DO WE READ THIS?

P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]

[An equation that holds for all values A can take on, and all values B can take on]

P(A=a|B=b) =

Page 34

HOW DO WE READ THIS?

P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]

[An equation that holds for all values A can take on, and all values B can take on]

P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]

Are these the same a?

Page 35

HOW DO WE READ THIS?

P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]

[An equation that holds for all values A can take on, and all values B can take on]

P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]

Are these the same a?

NO!

Page 36

HOW DO WE READ THIS?

P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]

[An equation that holds for all values A can take on, and all values B can take on]

P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_{a'} P(B=b|A=a') P(A=a')]

Be careful about indices!

Page 37

QUERYING THE BN

The BN gives P(T|C). What about P(C|T)?

P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)     [Bayes' rule]

Querying a BN is just applying Bayes' rule on a larger scale…

[Diagram: Cavity → Toothache]

P(C) = 0.1

C   P(T|C)
T   0.4
F   0.01111

The denominator is computed by summing out the numerator over Cavity and ¬Cavity
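A small numeric check of this query with the CPT values on the slide (a sketch, assuming the 0.01111 entry is P(T|¬C)):

```python
# P(Cavity | Toothache) via Bayes' rule, with the slide's numbers.
p_c = 0.1                                   # P(Cavity)
P_T = {True: 0.4, False: 0.01111}           # P(Toothache = true | Cavity)

p_t = P_T[True] * p_c + P_T[False] * (1.0 - p_c)   # denominator: sum out Cavity
p_c_given_t = P_T[True] * p_c / p_t
print(p_c_given_t)   # ≈ 0.8
```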

Page 38

PERFORMING INFERENCE

Variables X; evidence set E=e; query variable Q
Want to compute the posterior probability distribution over Q, given E=e
Let the non-evidence variables be Y (= X \ E)

Straightforward method:
1. Compute the joint P(Y, E=e)
2. Marginalize to get P(Q, E=e)
3. Divide by P(E=e) to get P(Q|E=e)

Page 39

INFERENCE IN THE ALARM EXAMPLE

(Alarm network and CPTs as above.)

P(J|M) = ??    (JohnCalls is the query Q; MaryCalls is the evidence E=e)

Page 40

INFERENCE IN THE ALARM EXAMPLE

(Alarm network and CPTs as above.)

P(J|MaryCalls) = ??

1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)

   P(x1, x2, …, xn) = Π_{i=1,…,n} P(xi | parents(Xi))

   (full joint distribution table: 2^4 entries)

Page 41

INFERENCE IN THE ALARM EXAMPLE

(Alarm network and CPTs as above.)

P(J|MaryCalls) = ??

1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
2. P(J, MaryCalls) = Σ_{a,b,e} P(J, A=a, B=b, E=e, MaryCalls)

   (2 entries: one for JohnCalls, the other for ¬JohnCalls)

Page 42

INFERENCE IN THE ALARM EXAMPLE

(Alarm network and CPTs as above.)

P(J|MaryCalls) = ??

1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
2. P(J, MaryCalls) = Σ_{a,b,e} P(J, A=a, B=b, E=e, MaryCalls)
3. P(J|MaryCalls) = P(J, MaryCalls) / P(MaryCalls) = P(J, MaryCalls) / (Σ_j P(j, MaryCalls))
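A sketch of carrying out steps 1–3 by brute-force enumeration (my own code, not the lecture's; the CPTs are the ones from the slides):

```python
from itertools import product

# Alarm-network CPTs (probability of "true").
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def pr(value, p_true):
    return p_true if value else 1.0 - p_true

def joint(j, a, b, e, m):
    """Step 1: P(J, A, B, E, M) from the chain-rule factorization of the BN."""
    return (pr(j, P_J[a]) * pr(m, P_M[a]) * pr(a, P_A[(b, e)])
            * pr(b, P_B) * pr(e, P_E))

# Step 2: marginalize out A, B, E with MaryCalls fixed to true.
p_jm = {j: sum(joint(j, a, b, e, True)
               for a, b, e in product((True, False), repeat=3))
        for j in (True, False)}

# Step 3: normalize by P(MaryCalls = true).
print(p_jm[True] / (p_jm[True] + p_jm[False]))   # P(JohnCalls = true | MaryCalls = true)
```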

Page 43

HOW EXPENSIVE?

P(X) = P(x1, x2, …, xn) = Π_{i=1,…,n} P(xi | parents(Xi))

Straightforward method:
1. Use the above to compute P(Y, E=e)
2. P(Q, E=e) = Σ_{y1} … Σ_{yk} P(Y, E=e)
3. P(E=e) = Σ_q P(q, E=e)

Step 1: O(2^(n-|E|)) entries!
Step 3 is just a normalization factor – no big deal once we have P(Q, E=e)

Can we do better?

Page 44

VARIABLE ELIMINATION

Consider the linear network X1 → X2 → X3

P(X) = P(X1) P(X2|X1) P(X3|X2)

P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)

Page 45

VARIABLE ELIMINATION

Consider the linear network X1 → X2 → X3

P(X) = P(X1) P(X2|X1) P(X3|X2)

P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)

Rearrange the equation…

Page 46

VARIABLE ELIMINATION

Consider the linear network X1 → X2 → X3

P(X) = P(X1) P(X2|X1) P(X3|X2)

P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)
      = Σ_{x2} P(X3|x2) P(x2)        ← P(x2) computed for each value of X2

Cache P(x2) for both values of X3!

Page 47

VARIABLE ELIMINATION

Consider the linear network X1 → X2 → X3

P(X) = P(X1) P(X2|X1) P(X3|X2)

P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)
      = Σ_{x2} P(X3|x2) P(x2)        ← P(x2) computed for each value of X2

How many multiplications (*) and additions (+) are saved?
*: 2*4*2 = 16 vs. 4+4 = 8
+: 2*4 = 8 vs. 2+1 = 3

Can lead to huge gains in larger networks
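A sketch of the rearranged computation on the chain (the CPT numbers here are made up, since the slide does not give any; the point is that P(x2) is computed once and then reused for both values of X3):

```python
# Variable elimination on the chain X1 -> X2 -> X3, with hypothetical binary CPTs.
P_X1 = 0.3                         # P(X1 = true), made up
P_X2 = {True: 0.8, False: 0.1}     # P(X2 = true | X1), made up
P_X3 = {True: 0.9, False: 0.2}     # P(X3 = true | X2), made up

def pr(value, p_true):
    return p_true if value else 1.0 - p_true

# Inner sum first: P(x2) = sum over x1 of P(x1) * P(x2 | x1)  -- cached factor
p_x2 = {x2: sum(pr(x1, P_X1) * pr(x2, P_X2[x1]) for x1 in (True, False))
        for x2 in (True, False)}

# Outer sum: P(X3) = sum over x2 of P(X3 | x2) * P(x2), reusing the cached factor
p_x3 = {x3: sum(pr(x3, P_X3[x2]) * p_x2[x2] for x2 in (True, False))
        for x3 in (True, False)}
print(p_x3)
```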

Page 48

VE IN ALARM EXAMPLE

P(E|j,m) = P(E,j,m) / P(j,m)

P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)

Page 49

VE IN ALARM EXAMPLE

P(E|j,m) = P(E,j,m) / P(j,m)

P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)

Page 50

VE IN ALARM EXAMPLE

P(E|j,m) = P(E,j,m) / P(j,m)

P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) P(j,m|E,b)        ← computed for all values of E, b

Page 51

VE IN ALARM EXAMPLE

P(E|j,m) = P(E,j,m) / P(j,m)

P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) P(j,m|E,b)
         = P(E) P(j,m|E)                   ← computed for all values of E
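The same nested-sum order, written out for the alarm CPTs (a sketch; evidence j = m = true is assumed, matching the query above):

```python
# P(E, j, m) = P(E) * sum_b P(b) * sum_a P(a | E, b) * P(j | a) * P(m | a)
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(value, p_true):
    return p_true if value else 1.0 - p_true

def p_ejm(e):
    """P(E = e, JohnCalls = true, MaryCalls = true), summed inside-out."""
    return pr(e, P_E) * sum(
        pr(b, P_B) * sum(pr(a, P_A[(b, e)]) * P_J[a] * P_M[a]
                         for a in (True, False))
        for b in (True, False))

# Normalize over both values of E to get P(E | j, m).
print(p_ejm(True) / (p_ejm(True) + p_ejm(False)))
```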

Page 52

WHAT ORDER TO PERFORM VE?

For tree-like BNs (polytrees), order the elimination so that parents come before children
- Each intermediate probability table then has 2^(# of parents of a node) entries
- If the number of parents of a node is bounded, then VE runs in linear time!

Other networks: intermediate factors may become large

Page 53

NON-POLYTREE NETWORKS

P(D) = Σ_a Σ_b Σ_c P(a) P(b|a) P(c|a) P(D|b,c)
     = Σ_b Σ_c P(D|b,c) Σ_a P(a) P(b|a) P(c|a)

[Diagram: A → B, A → C, B → D, C → D]

No more simplifications…

Page 54

APPROXIMATE INFERENCE TECHNIQUES

Based on the idea of Monte Carlo simulation

Basic idea: to estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed

Conditional simulation: to estimate the probability P(H) that a coin picked out of bucket B flips heads, I can:
1. Pick a coin C out of B (occurs with probability P(C))
2. Flip C and observe whether it flips heads (occurs with probability P(H|C))
3. Put C back and repeat from step 1 many times
4. Return the fraction of heads observed (an estimate of P(H))
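A tiny simulation of the bucket-of-coins idea (the coin types and probabilities below are made up for illustration):

```python
import random

# Estimate P(H) for a coin drawn at random from a bucket, by repeated simulation.
p_heads = {"fair": 0.5, "biased": 0.9}    # P(heads | coin type), hypothetical
p_pick  = {"fair": 0.7, "biased": 0.3}    # P(coin type), hypothetical

def sample_flip():
    coin = random.choices(list(p_pick), weights=list(p_pick.values()))[0]  # step 1
    return random.random() < p_heads[coin]                                 # step 2

n = 100_000
estimate = sum(sample_flip() for _ in range(n)) / n     # steps 3-4
print(estimate)   # should approach 0.7*0.5 + 0.3*0.9 = 0.62
```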

Page 55

APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION

Sample from the joint distribution

(Alarm network and CPTs as above.)

Sample: B=0, E=0, A=0, J=1, M=0
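A sketch of how one such sample can be drawn by forward sampling, always sampling parents before children (CPTs as in the slides):

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def flip(p_true):
    return random.random() < p_true

def sample_joint():
    """Draw (B, E, A, J, M) from the joint distribution, parents before children."""
    b = flip(P_B)
    e = flip(P_E)
    a = flip(P_A[(b, e)])
    j = flip(P_J[a])
    m = flip(P_M[a])
    return b, e, a, j, m

print(sample_joint())   # e.g. (False, False, False, True, False)
```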

Page 56

APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION

As more samples are generated, the distribution of the samples approaches the joint distribution!

B=0, E=0, A=0, J=1, M=0
B=0, E=0, A=0, J=0, M=0
B=0, E=0, A=0, J=0, M=0
B=1, E=0, A=1, J=1, M=0

Page 57

APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION

Inference: given evidence E=e (e.g., J=1), remove the samples that conflict with it

B=0, E=0, A=0, J=1, M=0   (kept)
B=0, E=0, A=0, J=0, M=0   (removed)
B=0, E=0, A=0, J=0, M=0   (removed)
B=1, E=0, A=1, J=1, M=0   (kept)

Distribution of remaining samples approximates the conditional distribution!
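A sketch of this rejection-style estimate, for example for P(Burglary | J=1) (the query is my own choice; the sampler is the same forward sampler as above):

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def flip(p_true):
    return random.random() < p_true

def sample_joint():
    b, e = flip(P_B), flip(P_E)
    a = flip(P_A[(b, e)])
    return b, e, a, flip(P_J[a]), flip(P_M[a])

# Keep only the samples that agree with the evidence J = 1, then read off B.
kept = [s for s in (sample_joint() for _ in range(200_000)) if s[3]]
print(len(kept), sum(s[0] for s in kept) / len(kept))   # estimate of P(B | J=1)
```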

Page 58

HOW MANY SAMPLES?

Error of the estimate, for n samples, is on average O(1/√n)

Variance-reduction techniques

Page 59

RARE EVENT PROBLEM:

What if some events are really rare (e.g., Burglary & Earthquake)?
→ the # of samples must be huge to get a reasonable estimate

Solution: likelihood weighting
- Enforce that each sample agrees with the evidence
- While generating a sample, keep track of the ratio of
  (how likely the sampled value is to occur in the real world) /
  (how likely you were to generate the sampled value)
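A sketch of likelihood weighting for the setup used on the following slides (evidence Alarm = 1 and MaryCalls = 1; B and E drawn from a 0.5/0.5 proposal, with the real-world/proposal ratio folded into the weight):

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_M = {True: 0.70, False: 0.01}

def weighted_sample():
    """One likelihood-weighted sample under evidence Alarm = 1, MaryCalls = 1."""
    w = 1.0
    b = random.random() < 0.5                 # proposal: P = 0.5
    w *= (P_B if b else 1.0 - P_B) / 0.5      # real-world prob / proposal prob
    e = random.random() < 0.5
    w *= (P_E if e else 1.0 - P_E) / 0.5
    w *= P_A[(b, e)]                          # Alarm = 1 is enforced: weight by its likelihood
    w *= P_M[True]                            # MaryCalls = 1 is enforced likewise
    return b, w

samples = [weighted_sample() for _ in range(100_000)]
total = sum(w for _, w in samples)
print(sum(w for b, w in samples if b) / total)   # estimate of P(B | A, M); exact ≈ 0.375
```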

Page 60

LIKELIHOOD WEIGHTING

Suppose the evidence is Alarm & MaryCalls. Sample B and E, each with P=0.5.

(Alarm network and CPTs as above.)

w=1

Page 61

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=0, E=1    w = 0.008

Page 62

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=0, E=1, A=1    w = 0.0023

A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs

Page 63

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=0, E=1, A=1, M=1, J=1    w = 0.0016

Page 64

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=0, E=0    w = 3.988

Page 65

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=0, E=0, A=1    w = 0.004

Page 66

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=0, E=0, A=1, M=1, J=1    w = 0.0028

Page 67

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=1, E=0, A=1    w = 0.00375

Page 68

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=1, E=0, A=1, M=1, J=1    w = 0.0026

Page 69

LIKELIHOOD WEIGHTING

(Same setup as above: evidence Alarm & MaryCalls; B, E sampled with P=0.5; alarm CPTs as before.)

B=1, E=1, A=1, M=1, J=1    w ≈ 5e-6

Page 70

LIKELIHOOD WEIGHTING

Suppose the evidence is Alarm & MaryCalls. Sample B and E, each with P=0.5.

N=4 samples gives P(B|A,M) ≈ 0.371; exact inference gives P(B|A,M) = 0.375

The four weighted samples:
B=0, E=1, A=1, M=1, J=1    w = 0.0016
B=0, E=0, A=1, M=1, J=1    w = 0.0028
B=1, E=0, A=1, M=1, J=1    w = 0.0026
B=1, E=1, A=1, M=1, J=1    w ≈ 0

Page 71

RECAP

Efficient inference in BNs
Variable elimination
Approximate methods: Monte-Carlo sampling

Page 72

NEXT LECTURE

Statistical learning: from data to distributions (R&N 20.1–20.2)