Bayesian Networks


Page 1: Bayesian Networks

BAYESIAN NETWORKS

Page 2: Bayesian Networks

SOME APPLICATIONS OF BN
- Medical diagnosis
- Troubleshooting of hardware/software systems
- Fraud/uncollectible debt detection
- Data mining
- Analysis of genetic sequences
- Data interpretation, computer vision, image understanding

Page 3: Bayesian Networks

MORE COMPLICATED SINGLY-CONNECTED BELIEF NET

[Diagram: singly-connected belief net over a car, with nodes Battery, Radio, SparkPlugs, Gas, Starts, Moves]

Page 4: Bayesian Networks

[Diagram: image-understanding example with regions R1, R2, R3, R4 linked by "Above" relations; each Region ∈ {Sky, Tree, Grass, Rock}]

Page 5: Bayesian Networks
Page 6: Bayesian Networks

CALCULATION OF JOINT PROBABILITY

[Diagram: burglary network. Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = 0.001        P(E) = 0.002

B E | P(A|B,E)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001

A | P(J|A)          A | P(M|A)
T | 0.90            T | 0.70
F | 0.05            F | 0.01

P(J, M, A, ¬B, ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E) = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062

P(x_1, x_2, …, x_n) = Π_{i=1,…,n} P(x_i | parents(X_i))   (this product specifies every entry of the full joint distribution table)
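As a concrete illustration of this product, here is a minimal Python sketch (not from the slides) that evaluates one entry of the joint for the burglary network from its CPTs; the numbers are the slide's, the code is illustrative.

```python
# Minimal sketch: evaluate one entry of the burglary network's joint
# distribution as a product of local conditionals.

P_B = {True: 0.001, False: 0.999}      # P(Burglary)
P_E = {True: 0.002, False: 0.998}      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}        # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}        # P(MaryCalls=T | Alarm)

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pj * pm * pa * P_B[b] * P_E[e]

# The slide's example entry P(J, M, A, ¬B, ¬E):
print(joint(True, True, True, False, False))   # ≈ 0.00062
```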

Page 7: Bayesian Networks

WHAT DOES THE BN ENCODE?

Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm

[Diagram: burglary network as above]

A node is independent of its non-descendants, given its parents.

Page 8: Bayesian Networks

PROBABILISTIC INFERENCE is the following problem.

Given:
- A belief state P(X_1,…,X_n) in some form (e.g., a Bayes net or a joint probability table)
- A query variable indexed by q
- Some subset of evidence variables indexed by e1,…,ek

Find: P(X_q | X_{e1}, …, X_{ek})

Page 9: Bayesian Networks

[Diagram: burglary network, with the same CPTs as on the joint-probability slide above]

TOP-DOWN INFERENCE: RECURSIVE COMPUTATION OF ALL MARGINALS DOWNSTREAM OF EVIDENCE

P(A|E) = P(A|B,E) P(B) + P(A|¬B,E) P(¬B)

P(J|E) = P(J|A) P(A|E) + P(J|¬A) P(¬A|E)

P(M|E) = P(M|A) P(A|E) + P(M|¬A) P(¬A|E)
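With the CPT numbers above, the top-down pass is a few lines. This sketch (illustrative, not from the slides) propagates the evidence Earthquake = T downward one layer at a time:

```python
# Top-down inference on the burglary net with evidence E = T.

P_B = 0.001                                    # P(Burglary=T)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}                # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}                # P(MaryCalls=T | Alarm)

# Layer 1: P(A=T | E=T), summing out the unobserved parent B.
pA = P_A[(True, True)] * P_B + P_A[(False, True)] * (1 - P_B)

# Layer 2: J and M depend on E only through A, so condition on A and sum it out.
pJ = P_J[True] * pA + P_J[False] * (1 - pA)
pM = P_M[True] * pA + P_M[False] * (1 - pA)

print(pA, pJ, pM)   # ≈ 0.2907, 0.2971, 0.2106
```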

Page 10: Bayesian Networks

TOP-DOWN INFERENCE
- Only works if the graph of ancestors of the query variable is a polytree, with evidence given on ancestor(s) of the query variable
- Efficient: O(d·2^k) time, where d is the number of ancestors of a variable and k is a bound on the number of parents
- Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node

Page 11: Bayesian Networks

QUERYING THE BN
The BN gives P(T|C). P(C|T) can be computed using Bayes' rule:

P(A|B) = P(B|A) P(A) / P(B)

[Diagram: Cavity → Toothache]

P(C) = 0.1

C | P(T|C)
T | 0.4
F | 0.01111

Page 12: Bayesian Networks

QUERYING THE BN
The BN gives P(T|C). What about P(C|T)?

P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)   [Bayes' rule]

Querying a BN is just applying Bayes' rule on a larger scale…

[Diagram: Cavity → Toothache, with P(C) = 0.1 and the P(T|C) table as above]

The denominator is computed by summing out the numerator over Cavity and ¬Cavity: P(Toothache) = P(T|C) P(C) + P(T|¬C) P(¬C).
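A quick numeric sketch of this query with the slide's numbers (illustrative code, not the lecture's):

```python
# Bayes' rule query P(Cavity | Toothache) using the slide's CPTs.
p_c = 0.1          # P(Cavity)
p_t_c = 0.4        # P(Toothache | Cavity)
p_t_nc = 0.01111   # P(Toothache | ¬Cavity)

# Denominator: sum the numerator over Cavity and ¬Cavity.
p_t = p_t_c * p_c + p_t_nc * (1 - p_c)

print(p_t_c * p_c / p_t)   # ≈ 0.80
```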

Page 13: Bayesian Networks

NAÏVE BAYES MODELS

P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i | Cause)

[Diagram: Cause → Effect_1, Effect_2, …, Effect_n]

Page 14: Bayesian Networks

NAÏVE BAYES CLASSIFIER

P(Class, Feature_1, …, Feature_n) = P(Class) Π_i P(Feature_i | Class)

[Diagram: Class → Feature_1, Feature_2, …, Feature_n]

P(C|F_1,…,F_k) = P(C,F_1,…,F_k) / P(F_1,…,F_k) = (1/Z) P(C) Π_i P(F_i|C)

Given features, what class? Examples: Spam / Not Spam, or English / French / Latin, from word occurrences.
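To make the classifier concrete, here is a small sketch with made-up word probabilities (the slides give no numbers); the classes follow the spam example, and scoring is done in log space to avoid underflow when there are many features.

```python
# Minimal naïve Bayes classifier sketch; all probabilities are hypothetical.
import math

priors = {"spam": 0.4, "ham": 0.6}                    # P(Class)
p_word = {                                            # P(word | Class)
    "spam": {"offer": 0.30, "meeting": 0.02, "free": 0.25},
    "ham":  {"offer": 0.03, "meeting": 0.20, "free": 0.02},
}

def classify(words):
    """Return argmax_c P(c) * prod_i P(word_i | c), computed in log space."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in words:
            if w in p_word[c]:        # missing features are simply skipped
                score += math.log(p_word[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

print(classify(["free", "offer"]))    # -> "spam"
```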

Page 15: Bayesian Networks

COMMENTS ON NAÏVE BAYES MODELS
- Very scalable (thousands or millions of features!), easy to implement
- Easily handles missing data: just ignore the feature
- Conditional independence of the features is the main weakness: what if two features are actually correlated? What happens with many features?

Page 16: Bayesian Networks

VARIABLE ELIMINATION: PROBABILISTIC INFERENCE IN GENERAL NETWORKS

[Diagram: student network. Coherence → Difficulty; Difficulty, Intelligence → Grade; Intelligence → SAT; Grade → Letter; SAT, Letter → Job; Grade, Job → Happy]

Basic idea: eliminate "nuisance" variables one at a time via marginalization.

Example: compute P(J).
Elimination order: C, D, I, H, G, S, L (a code sketch of the factor operations follows below).
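The two factor operations behind VE are easy to state in code. Below is a minimal, illustrative Python sketch, assuming factors over Boolean variables represented as (variable list, table) pairs; it shows the mechanics of the walkthrough that follows, not the lecture's own code.

```python
from itertools import product

# A factor is a pair (vars, table): vars is a list of variable names and
# table maps a tuple of Boolean values (one per var) to a number.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product((True, False), repeat=len(out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(a[v] for v in vars1)] *
                       t2[tuple(a[v] for v in vars2)])
    return (out_vars, table)

def sum_out(var, factor):
    """Marginalize a single variable out of a factor."""
    vars_, t = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (out_vars, table)

def eliminate(factors, order):
    """Repeatedly multiply all factors touching a variable, then sum it out."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(var, prod))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# For P(J): eliminate(cpts, ['C', 'D', 'I', 'H', 'G', 'S', 'L']), where cpts
# holds the eight CPT factors; the slides give no numeric entries for them.
```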

Page 17: Bayesian Networks

[Diagram: student network annotated with its factors — P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|S,L), P(H|G,J)]

Page 18: Bayesian Networks

ELIMINATING C

[Diagram: full student network; the factors mentioning C are P(C) and P(D|C)]

Page 19: Bayesian Networks

C IS ELIMINATED, GIVING A NEW FACTOR OVER D

P(D) = Σ_C P(D|C) P(C)

[Diagram: network with Coherence removed]

Page 20: Bayesian Networks

ELIMINATING D

[Diagram: the factors mentioning D are P(D) and P(G|I,D)]

Page 21: Bayesian Networks

D IS ELIMINATED, GIVING A NEW FACTOR OVER G, I

P(G|I) = Σ_d P(G|I,d) P(d)

Page 22: Bayesian Networks

ELIMINATING I

[Diagram: the factors mentioning I are P(I), P(G|I), and P(S|I)]

Page 23: Bayesian Networks

I IS ELIMINATED, PRODUCING A NEW FILL EDGE AND FACTOR OVER G AND S

P(G,S) = Σ_i P(i) P(G|i) P(S|i)

[Diagram: a new undirected fill edge joins Grade and SAT]

Page 24: Bayesian Networks

ELIMINATING H

[Diagram: the only factor mentioning H is P(H|G,J)]

Page 25: Bayesian Networks

ELIMINATING H

f_GJ(G,J) = Σ_h P(h|G,J) = 1
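A quick numeric check of the identity above, using a hypothetical CPT for P(H=T|G,J) (the slides give no numbers here): summing any CPT over its own child variable yields 1 for every parent assignment.

```python
# Hypothetical CPT for P(H=T | G, J); the numbers are made up.
p_h = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.6, (False, False): 0.1}

# f_GJ(g, j) = P(H=T|g,j) + P(H=F|g,j) = 1 for every parent assignment.
for g in (True, False):
    for j in (True, False):
        print(g, j, p_h[(g, j)] + (1 - p_h[(g, j)]))   # always 1.0
```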

Page 26: Bayesian Networks

H IS ELIMINATED, PRODUCING A NEW FILL EDGE AND FACTOR OVER G, J

f_GJ(G,J)

[Diagram: a new fill edge joins Grade and Job]

Page 27: Bayesian Networks

ELIMINATING G

[Diagram: the factors mentioning G are f_GJ(G,J), P(G,S), and P(L|G)]

Page 28: Bayesian Networks

G IS ELIMINATED, MAKING A NEW TERNARY FACTOR OVER S, L, J AND A NEW FILL EDGE

f_SLJ(S,L,J) = Σ_g P(g,S) P(L|g) f_GJ(g,J)

Page 29: Bayesian Networks

ELIMINATING S

[Diagram: the factors mentioning S are f_SLJ(S,L,J) and P(J|S,L)]

Page 30: Bayesian Networks

S IS ELIMINATED, CREATING A NEW FACTOR OVER L, J

f_LJ(L,J) = Σ_s f_SLJ(s,L,J) P(J|s,L)

Page 31: Bayesian Networks

ELIMINATING L

[Diagram: the remaining factor mentioning L is f_LJ(L,J)]

Page 32: Bayesian Networks

L IS ELIMINATED, GIVING A NEW FACTOR OVER J (WHICH TURNS OUT TO BE P(J))

P(J) = Σ_l f_LJ(l,J)

Page 33: Bayesian Networks

[Diagram: only Job remains]

P(J)

Page 34: Bayesian Networks

JOINT DISTRIBUTION

P(X) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Apply elimination ordering C, D, I, H, G, S, L.

Page 35: Bayesian Networks

GOING THROUGH VE

P(X) = P(C) P(D|C) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Apply elimination ordering C, D, I, H, G, S, L. First, eliminate C:

f_D(D) = Σ_C P(C) P(D|C)

Page 36: Bayesian Networks

GOING THROUGH VE

Σ_C P(X) = f_D(D) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

f_D(D) = Σ_C P(C) P(D|C)

Page 37: Bayesian Networks

GOING THROUGH VE

Σ_C P(X) = f_D(D) P(I) P(G|I,D) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Next, eliminate D:

f_GI(G,I) = Σ_D f_D(D) P(G|I,D)

Page 38: Bayesian Networks

GOING THROUGH VE

Σ_{C,D} P(X) = f_GI(G,I) P(I) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

f_GI(G,I) = Σ_D f_D(D) P(G|I,D)

Page 39: Bayesian Networks

GOING THROUGH VE

Σ_{C,D} P(X) = f_GI(G,I) P(I) P(S|I) P(L|G) P(J|L,S) P(H|G,J)

Next, eliminate I:

f_GS(G,S) = Σ_I f_GI(G,I) P(I) P(S|I)

Page 40: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I} P(X) = f_GS(G,S) P(L|G) P(J|L,S) P(H|G,J)

f_GS(G,S) = Σ_I f_GI(G,I) P(I) P(S|I)

Page 41: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I} P(X) = f_GS(G,S) P(L|G) P(J|L,S) P(H|G,J)

Next, eliminate H:

f_GJ(G,J) = Σ_H P(H|G,J)

What values does this factor store? (All ones: a CPT summed over its own child variable gives 1 for every parent assignment.)

Page 42: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H} P(X) = f_GS(G,S) P(L|G) P(J|L,S) f_GJ(G,J)

f_GJ(G,J) = Σ_H P(H|G,J)

Page 43: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H} P(X) = f_GS(G,S) P(L|G) P(J|L,S) f_GJ(G,J)

Next, eliminate G:

f_SLJ(S,L,J) = Σ_G f_GS(G,S) P(L|G) f_GJ(G,J)

Page 44: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G} P(X) = f_SLJ(S,L,J) P(J|L,S)

f_SLJ(S,L,J) = Σ_G f_GS(G,S) P(L|G) f_GJ(G,J)

Page 45: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G} P(X) = f_SLJ(S,L,J) P(J|L,S)

Next, eliminate S:

f_LJ(L,J) = Σ_S f_SLJ(S,L,J) P(J|L,S)

Page 46: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G,S} P(X) = f_LJ(L,J)

f_LJ(L,J) = Σ_S f_SLJ(S,L,J) P(J|L,S)

Page 47: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G,S} P(X) = f_LJ(L,J)

Finally, eliminate L:

f_J(J) = Σ_L f_LJ(L,J)

Page 48: Bayesian Networks

GOING THROUGH VE

Σ_{C,D,I,H,G,S,L} P(X) = f_J(J)

f_J(J) = Σ_L f_LJ(L,J), which is exactly P(J).

Page 49: Bayesian Networks

ORDER-DEPENDENCE

Page 50: Bayesian Networks

ORDER MATTERS

[Diagram: full student network]

If we were to eliminate G first, we'd create a factor over D, I, L, and H (their distributions become coupled).

Page 51: Bayesian Networks

ELIMINATION ORDER MATTERS

[Diagram: student network with Grade eliminated; fill edges now couple Difficulty, Intelligence, Letter, and Happy]

If we were to eliminate G first, we'd create a factor over D, I, L, and H (their distributions become coupled):

f_DILH(D,I,L,H) = Σ_g P(g|D,I) P(L|g) P(H|g)

Page 52: Bayesian Networks

COMPLEXITY
- In polytree networks where each node has at most k parents: O(n·2^k) with the top-down ordering
- In other networks, intermediate factors may involve more than k variables; worst case O(2^n)
- Good ordering heuristics exist, e.g. min-neighbors and min-fill (see the sketch below)

Exact inference on non-polytree networks is NP-hard!
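As an illustration of one such heuristic, here is a sketch of min-neighbors ordering on the moralized student graph (parents of a common child are "married", then all edges are undirected). The adjacency below is derived from the network's factors; on this graph the greedy heuristic happens to reproduce the slides' order C, D, I, H, G, S, L.

```python
def min_neighbors_order(graph, keep):
    """Greedy elimination order: repeatedly remove the variable with the
    fewest neighbors, adding fill edges among its neighbors as it goes."""
    g = {v: set(ns) for v, ns in graph.items()}
    order = []
    while len(g) > len(keep):
        v = min((u for u in g if u not in keep), key=lambda u: len(g[u]))
        for a in g[v]:                  # fill edges: connect v's neighbors
            g[a].update(g[v] - {a, v})
            g[a].discard(v)
        order.append(v)
        del g[v]
    return order

# Moralized student graph: each factor's variables pairwise connected.
graph = {
    'C': {'D'},
    'D': {'C', 'G', 'I'},
    'I': {'D', 'G', 'S'},
    'G': {'D', 'I', 'L', 'H', 'J'},
    'S': {'I', 'J', 'L'},
    'L': {'G', 'S', 'J'},
    'J': {'S', 'L', 'G', 'H'},
    'H': {'G', 'J'},
}

print(min_neighbors_order(graph, keep={'J'}))
# -> ['C', 'D', 'I', 'H', 'G', 'S', 'L']
```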

Page 53: Bayesian Networks

VARIABLE ELIMINATION WITH EVIDENCE

[Diagram: full student network]

Two-step process (sketched in code below):
1. Find P(X,e) with VE
2. Normalize by P(e)

Example: P(J|H)
1. Run VE, enforcing H = T when H is eliminated.
2. This produces P(J, H=T) (a factor over J).
3. P(J=T|H=T) = P(J=T,H=T) / (P(J=T,H=T) + P(J=F,H=T))
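A sketch of this two-step procedure, reusing multiply/sum_out/eliminate from the earlier VE sketch; restrict is the only new piece, and the CPT list cpts is left unspecified since the slides give no numbers for this network.

```python
def restrict(factor, var, value):
    """Fix var = value in a factor and drop var from its scope."""
    vars_, t = factor
    if var not in vars_:
        return factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {vals[:i] + vals[i + 1:]: p
             for vals, p in t.items() if vals[i] == value}
    return (out_vars, table)

def query(factors, order, evidence):
    """P(remaining vars | evidence): restrict to e, eliminate, normalize."""
    for var, val in evidence.items():
        factors = [restrict(f, var, val) for f in factors]
    vars_, table = eliminate(factors, order)    # unnormalized P(X, e)
    z = sum(table.values())                     # = P(e)
    return (vars_, {vals: p / z for vals, p in table.items()})

# Shape of the slides' query P(J | H=T); H is fixed, so it leaves the order:
# query(cpts, ['C', 'D', 'I', 'G', 'S', 'L'], {'H': True})
```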

Page 54: Bayesian Networks

RECAP
Exact inference techniques:
- Top-down inference: linear time when the ancestors of the query variable form a polytree and evidence is on ancestors
- Bottom-up inference in Naïve Bayes models
- General inference using Variable Elimination

(We'll come back to approximation techniques in a week.)

Page 55: Bayesian Networks

NEXT TIME
Learning Bayes nets (R&N 20.1-2)