Bayesian Networks


Page 1: Bayesian Networks

BAYESIAN NETWORKS

Page 2: Bayesian Networks

SOME APPLICATIONS OF BN
- Medical diagnosis
- Troubleshooting of hardware/software systems
- Fraud/uncollectible debt detection
- Data mining
- Analysis of genetic sequences
- Data interpretation, computer vision, image understanding

Page 3: Bayesian Networks

MORE COMPLICATED SINGLY-CONNECTED BELIEF NET

[Diagram: singly-connected belief net over a car, with nodes Battery, Radio, SparkPlugs, Gas, Starts, Moves]

Page 4: Bayesian Networks

[Diagram: image-understanding example with regions R1, R2, R3, R4 linked by "Above" relations; each Region ∈ {Sky, Tree, Grass, Rock}]

Page 5: Bayesian Networks
Page 6: Bayesian Networks

CALCULATION OF JOINT PROBABILITY

[Diagram: burglary network. Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = 0.001        P(E) = 0.002

B E | P(A|B,E)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001

A | P(J|A)          A | P(M|A)
T | 0.90            T | 0.70
F | 0.05            F | 0.01

P(J, M, A, ¬B, ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E) = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062

P(x_1, x_2, …, x_n) = Π_{i=1,…,n} P(x_i | parents(X_i))   (this product specifies every entry of the full joint distribution table)
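As a concrete illustration of this product, here is a minimal Python sketch (not from the slides) that evaluates one entry of the joint for the burglary network from its CPTs; the numbers are the slide's, the code is illustrative.

```python
# Minimal sketch: evaluate one entry of the burglary network's joint
# distribution as a product of local conditionals.

P_B = {True: 0.001, False: 0.999}      # P(Burglary)
P_E = {True: 0.002, False: 0.998}      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}        # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}        # P(MaryCalls=T | Alarm)

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pj * pm * pa * P_B[b] * P_E[e]

# The slide's example entry P(J, M, A, ¬B, ¬E):
print(joint(True, True, True, False, False))   # ≈ 0.00062
```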

Page 7: Bayesian Networks

WHAT DOES THE BN ENCODE?

Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm

[Diagram: burglary network as above]

A node is independent of its non-descendants, given its parents.

Page 8: Bayesian Networks

PROBABILISTIC INFERENCE is the following problem.

Given:
- A belief state P(X_1,…,X_n) in some form (e.g., a Bayes net or a joint probability table)
- A query variable indexed by q
- Some subset of evidence variables indexed by e1,…,ek

Find: P(X_q | X_{e1}, …, X_{ek})

Page 9: Bayesian Networks

[Diagram: burglary network, with the same CPTs as on the joint-probability slide above]

TOP-DOWN INFERENCE: RECURSIVE COMPUTATION OF ALL MARGINALS DOWNSTREAM OF EVIDENCE

P(A|E) = P(A|B,E) P(B) + P(A|¬B,E) P(¬B)

P(J|E) = P(J|A) P(A|E) + P(J|¬A) P(¬A|E)

P(M|E) = P(M|A) P(A|E) + P(M|¬A) P(¬A|E)
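With the CPT numbers above, the top-down pass is a few lines. This sketch (illustrative, not from the slides) propagates the evidence Earthquake = T downward one layer at a time:

```python
# Top-down inference on the burglary net with evidence E = T.

P_B = 0.001                                    # P(Burglary=T)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}                # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}                # P(MaryCalls=T | Alarm)

# Layer 1: P(A=T | E=T), summing out the unobserved parent B.
pA = P_A[(True, True)] * P_B + P_A[(False, True)] * (1 - P_B)

# Layer 2: J and M depend on E only through A, so condition on A and sum it out.
pJ = P_J[True] * pA + P_J[False] * (1 - pA)
pM = P_M[True] * pA + P_M[False] * (1 - pA)

print(pA, pJ, pM)   # ≈ 0.2907, 0.2971, 0.2106
```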

Page 10: Bayesian Networks

TOP-DOWN INFERENCE
- Only works if the graph of ancestors of the query variable is a polytree, with evidence given on ancestor(s) of the query variable
- Efficient: O(d·2^k) time, where d is the number of ancestors of a variable and k is a bound on the number of parents
- Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node

Page 11: Bayesian Networks

QUERYING THE BN
The BN gives P(T|C). P(C|T) can be computed using Bayes' rule:

P(A|B) = P(B|A) P(A) / P(B)

[Diagram: Cavity → Toothache]

P(C) = 0.1

C | P(T|C)
T | 0.4
F | 0.01111

Page 12: Bayesian Networks

QUERYING THE BN
The BN gives P(T|C). What about P(C|T)?

P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)   [Bayes' rule]

Querying a BN is just applying Bayes' rule on a larger scale…

[Diagram: Cavity → Toothache, with P(C) = 0.1 and the P(T|C) table as above]

The denominator is computed by summing out the numerator over Cavity and ¬Cavity: P(Toothache) = P(T|C) P(C) + P(T|¬C) P(¬C).
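A quick numeric sketch of this query with the slide's numbers (illustrative code, not the lecture's):

```python
# Bayes' rule query P(Cavity | Toothache) using the slide's CPTs.
p_c = 0.1          # P(Cavity)
p_t_c = 0.4        # P(Toothache | Cavity)
p_t_nc = 0.01111   # P(Toothache | ¬Cavity)

# Denominator: sum the numerator over Cavity and ¬Cavity.
p_t = p_t_c * p_c + p_t_nc * (1 - p_c)

print(p_t_c * p_c / p_t)   # ≈ 0.80
```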

Page 13: Bayesian Networks

NAÏVE BAYES MODELS

P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i | Cause)

[Diagram: Cause → Effect_1, Effect_2, …, Effect_n]

Page 14: Bayesian Networks

NAÏVE BAYES CLASSIFIER

P(Class, Feature_1, …, Feature_n) = P(Class) Π_i P(Feature_i | Class)

[Diagram: Class → Feature_1, Feature_2, …, Feature_n]

P(C|F_1,…,F_k) = P(C,F_1,…,F_k) / P(F_1,…,F_k) = (1/Z) P(C) Π_i P(F_i|C)

Given features, what class? Examples: Spam / Not Spam, or English / French / Latin, from word occurrences.
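To make the classifier concrete, here is a small sketch with made-up word probabilities (the slides give no numbers); the classes follow the spam example, and scoring is done in log space to avoid underflow when there are many features.

```python
# Minimal naïve Bayes classifier sketch; all probabilities are hypothetical.
import math

priors = {"spam": 0.4, "ham": 0.6}                    # P(Class)
p_word = {                                            # P(word | Class)
    "spam": {"offer": 0.30, "meeting": 0.02, "free": 0.25},
    "ham":  {"offer": 0.03, "meeting": 0.20, "free": 0.02},
}

def classify(words):
    """Return argmax_c P(c) * prod_i P(word_i | c), computed in log space."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in words:
            if w in p_word[c]:        # missing features are simply skipped
                score += math.log(p_word[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

print(classify(["free", "offer"]))    # -> "spam"
```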

Page 15: Bayesian Networks

COMMENTS ON NAÏVE BAYES MODELS
- Very scalable (thousands or millions of features!), easy to implement
- Easily handles missing data: just ignore the feature
- Conditional independence of the features is the main weakness: what if two features are actually correlated? What happens with many features?

Page 16: Bayesian Networks

VARIABLE ELIMINATION: PROBABILISTIC INFERENCE IN GENERAL NETWORKS

[Diagram: student network. Coherence → Difficulty; Difficulty, Intelligence → Grade; Intelligence → SAT; Grade → Letter; SAT, Letter → Job; Grade, Job → Happy]

Basic idea: eliminate "nuisance" variables one at a time via marginalization.

Example: compute P(J).
Elimination order: C, D, I, H, G, S, L (a code sketch of the factor operations follows below).
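The two factor operations behind VE are easy to state in code. Below is a minimal, illustrative Python sketch, assuming factors over Boolean variables represented as (variable list, table) pairs; it shows the mechanics of the walkthrough that follows, not the lecture's own code.

```python
from itertools import product

# A factor is a pair (vars, table): vars is a list of variable names and
# table maps a tuple of Boolean values (one per var) to a number.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product((True, False), repeat=len(out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(a[v] for v in vars1)] *
                       t2[tuple(a[v] for v in vars2)])
    return (out_vars, table)

def sum_out(var, factor):
    """Marginalize a single variable out of a factor."""
    vars_, t = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (out_vars, table)

def eliminate(factors, order):
    """Repeatedly multiply all factors touching a variable, then sum it out."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(var, prod))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# For P(J): eliminate(cpts, ['C', 'D', 'I', 'H', 'G', 'S', 'L']), where cpts
# holds the eight CPT factors; the slides give no numeric entries for them.
```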

Page 17: Bayesian Networks

[Diagram: student network annotated with its factors — P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|S,L), P(H|G,J)]

Page 18: Bayesian Networks

ELIMINATING C

[Diagram: full student network; the factors mentioning C are P(C) and P(D|C)]

Page 19: Bayesian Networks

C IS ELIMINATED, GIVING A NEW FACTOR OVER D

P(D) = Σ_C P(D|C) P(C)

[Diagram: network with Coherence removed]

Page 20: Bayesian Networks

ELIMINATING D

[Diagram: the factors mentioning D are P(D) and P(G|I,D)]

Page 21: Bayesian Networks

D IS ELIMINATED, GIVING A NEW FACTOR OVER G, I

P(G|I) = Σ_d P(G|I,d) P(d)

Page 22: Bayesian Networks

ELIMINATING I

[Diagram: the factors mentioning I are P(I), P(G|I), and P(S|I)]

Page 23: Bayesian Networks

I IS ELIMINATED, PRODUCING A NEW FILL EDGE AND FACTOR OVER G AND S

P(G,S) = Σ_i P(i) P(G|i) P(S|i)

[Diagram: a new undirected fill edge joins Grade and SAT]

Page 24: Bayesian Networks

ELIMINATING H

[Diagram: the only factor mentioning H is P(H|G,J)]

Page 25: Bayesian Networks

ELIMINATING H

f_GJ(G,J) = Σ_h P(h|G,J) = 1
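A quick numeric check of the identity above, using a hypothetical CPT for P(H=T|G,J) (the slides give no numbers here): summing any CPT over its own child variable yields 1 for every parent assignment.

```python
# Hypothetical CPT for P(H=T | G, J); the numbers are made up.
p_h = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.6, (False, False): 0.1}

# f_GJ(g, j) = P(H=T|g,j) + P(H=F|g,j) = 1 for every parent assignment.
for g in (True, False):
    for j in (True, False):
        print(g, j, p_h[(g, j)] + (1 - p_h[(g, j)]))   # always 1.0
```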

Page 26: Bayesian Networks

H IS ELIMINATED, PRODUCING A NEW FILL EDGE AND FACTOR OVER G, J

f_GJ(G,J)

[Diagram: a new fill edge joins Grade and Job]

Page 27: Bayesian Networks

ELIMINATING G

[Diagram: the factors mentioning G are f_GJ(G,J), P(G,S), and P(L|G)]

Page 28: Bayesian Networks

G IS ELIMINATED, MAKING A NEW TERNARY FACTOR OVER S, L, J AND A NEW FILL EDGE

f_SLJ(S,L,J) = Σ_g P(g,S) P(L|g) f_GJ(g,J)

Page 29: Bayesian Networks

ELIMINATING S

[Diagram: the factors mentioning S are f_SLJ(S,L,J) and P(J|S,L)]

Page 30: Bayesian Networks

S IS ELIMINATED, CREATING A NEW FACTOR OVER L, J

f_LJ(L,J) = Σ_s f_SLJ(s,L,J) P(J|s,L)

Page 31: Bayesian Networks

ELIMINATING L

[Diagram: the remaining factor mentioning L is f_LJ(L,J)]

Page 32: Bayesian Networks

L IS ELIMINATED, GIVING A NEW FACTOR OVER J (WHICH TURNS OUT TO BE P(J))

P(J) = Σ_l f_LJ(l,J)

Page 33: Bayesian Networks

[Diagram: only Job remains]

P(J)

Page 34: Bayesian Networks

JOINT DISTRIBUTION

P(X) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Apply elimination ordering C, D, I, H, G, S, L.

Page 35: Bayesian Networks

GOING THROUGH VE

P(X) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Apply elimination ordering C, D, I, H, G, S, L. First, eliminate C:

f_D(D) = Σ_C P(C) P(D|C)

Page 36: Bayesian Networks

GOING THROUGH VE

Σ_C P(X) = f_D(D) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

f_D(D) = Σ_C P(C) P(D|C)

Page 37: Bayesian Networks

GOING THROUGH VE

Σ_C P(X) = f_D(D) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Next, eliminate D:

f_GI(G,I) = Σ_D f_D(D) P(G|I,D)

Page 38: Bayesian Networks

GOING THROUGH VE

Σ_{C,D} P(X) = f_GI(G,I) P(I) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

f_GI(G,I) = Σ_D f_D(D) P(G|I,D)

Page 39: Bayesian Networks

GOING THROUGH VE

Σ_{C,D} P(X) = f_GI(G,I) P(I) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Next, eliminate I:

f_GS(G,S) = Σ_I f_GI(G,I) P(I) P(S|I)

Page 40: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I} P(X) = f_GS(G,S) P(L|G) P(J|L,S) P(H|G,J)

f_GS(G,S) = Σ_I f_GI(G,I) P(I) P(S|I)

Page 41: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I} P(X) = f_GS(G,S) P(L|G) P(J|L,S) P(H|G,J)

Next, eliminate H:

f_GJ(G,J) = Σ_H P(H|G,J)

What values does this factor store? (All ones: a CPT summed over its own child variable gives 1 for every parent assignment.)

Page 42: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H} P(X) = f_GS(G,S) P(L|G) P(J|L,S) f_GJ(G,J)

f_GJ(G,J) = Σ_H P(H|G,J)

Page 43: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H} P(X) = f_GS(G,S) P(L|G) P(J|L,S) f_GJ(G,J)

Next, eliminate G:

f_SLJ(S,L,J) = Σ_G f_GS(G,S) P(L|G) f_GJ(G,J)

Page 44: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G} P(X) = f_SLJ(S,L,J) P(J|L,S)

f_SLJ(S,L,J) = Σ_G f_GS(G,S) P(L|G) f_GJ(G,J)

Page 45: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G} P(X) = f_SLJ(S,L,J) P(J|L,S)

Next, eliminate S:

f_LJ(L,J) = Σ_S f_SLJ(S,L,J) P(J|L,S)

Page 46: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G,S} P(X) = f_LJ(L,J)

f_LJ(L,J) = Σ_S f_SLJ(S,L,J) P(J|L,S)

Page 47: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G,S} P(X) = f_LJ(L,J)

Finally, eliminate L:

f_J(J) = Σ_L f_LJ(L,J)

Page 48: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G,S,L} P(X) = f_J(J)

f_J(J) = Σ_L f_LJ(L,J), which is exactly P(J).

Page 49: Bayesian Networks

ORDER-DEPENDENCE

Page 50: Bayesian Networks

ORDER MATTERS

[Diagram: full student network]

If we were to eliminate G first, we'd create a factor over D, I, L, and H (their distributions become coupled).

Page 51: Bayesian Networks

ELIMINATION ORDER MATTERS

[Diagram: student network with Grade eliminated; fill edges now couple Difficulty, Intelligence, Letter, and Happy]

If we were to eliminate G first, we'd create a factor over D, I, L, and H (their distributions become coupled):

f_DILH(D,I,L,H) = Σ_g P(g|D,I) P(L|g) P(H|g)

Page 52: Bayesian Networks

COMPLEXITY
- In polytree networks where each node has at most k parents: O(n·2^k) with the top-down ordering
- In other networks, intermediate factors may involve more than k variables; worst case O(2^n)
- Good ordering heuristics exist, e.g. min-neighbors and min-fill (see the sketch below)

Exact inference on non-polytree networks is NP-hard!
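As an illustration of one such heuristic, here is a sketch of min-neighbors ordering on the moralized student graph (parents of a common child are "married", then all edges are undirected). The adjacency below is derived from the network's factors; on this graph the greedy heuristic happens to reproduce the slides' order C, D, I, H, G, S, L.

```python
def min_neighbors_order(graph, keep):
    """Greedy elimination order: repeatedly remove the variable with the
    fewest neighbors, adding fill edges among its neighbors as it goes."""
    g = {v: set(ns) for v, ns in graph.items()}
    order = []
    while len(g) > len(keep):
        v = min((u for u in g if u not in keep), key=lambda u: len(g[u]))
        for a in g[v]:                  # fill edges: connect v's neighbors
            g[a].update(g[v] - {a, v})
            g[a].discard(v)
        order.append(v)
        del g[v]
    return order

# Moralized student graph: each factor's variables pairwise connected.
graph = {
    'C': {'D'},
    'D': {'C', 'G', 'I'},
    'I': {'D', 'G', 'S'},
    'G': {'D', 'I', 'L', 'H', 'J'},
    'S': {'I', 'J', 'L'},
    'L': {'G', 'S', 'J'},
    'J': {'S', 'L', 'G', 'H'},
    'H': {'G', 'J'},
}

print(min_neighbors_order(graph, keep={'J'}))
# -> ['C', 'D', 'I', 'H', 'G', 'S', 'L']
```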

Page 53: Bayesian Networks

VARIABLE ELIMINATION WITH EVIDENCE

[Diagram: full student network]

Two-step process (sketched in code below):
1. Find P(X,e) with VE
2. Normalize by P(e)

Example: P(J|H)
1. Run VE, enforcing H = T when H is eliminated.
2. This produces P(J, H=T) (a factor over J).
3. P(J=T|H=T) = P(J=T,H=T) / (P(J=T,H=T) + P(J=F,H=T))
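A sketch of this two-step procedure, reusing multiply/sum_out/eliminate from the earlier VE sketch; restrict is the only new piece, and the CPT list cpts is left unspecified since the slides give no numbers for this network.

```python
def restrict(factor, var, value):
    """Fix var = value in a factor and drop var from its scope."""
    vars_, t = factor
    if var not in vars_:
        return factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {vals[:i] + vals[i + 1:]: p
             for vals, p in t.items() if vals[i] == value}
    return (out_vars, table)

def query(factors, order, evidence):
    """P(remaining vars | evidence): restrict to e, eliminate, normalize."""
    for var, val in evidence.items():
        factors = [restrict(f, var, val) for f in factors]
    vars_, table = eliminate(factors, order)    # unnormalized P(X, e)
    z = sum(table.values())                     # = P(e)
    return (vars_, {vals: p / z for vals, p in table.items()})

# Shape of the slides' query P(J | H=T); H is fixed, so it leaves the order:
# query(cpts, ['C', 'D', 'I', 'G', 'S', 'L'], {'H': True})
```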

Page 54: Bayesian Networks

RECAP
Exact inference techniques:
- Top-down inference: linear time when the ancestors of the query variable form a polytree and evidence is on ancestors
- Bottom-up inference in Naïve Bayes models
- General inference using Variable Elimination

(We'll come back to approximation techniques in a week.)

Page 55: Bayesian Networks

NEXT TIME
Learning Bayes nets (R&N 20.1-2)