Bayesian Statistics and Belief Networks. Overview Book: Ch 13,14 Refresher on Probability Bayesian...

Bayesian Statistics and Belief Networks





Overview

• Book: Ch 13,14

• Refresher on Probability

• Bayesian classifiers

• Belief Networks / Bayesian Networks


Why Should We Care?

• Theoretical framework for machine learning, classification, knowledge representation, analysis

• Bayesian methods are capable of handling noisy, incomplete data sets

• Bayesian methods are in common use today


Bayesian Approach To Probability and Statistics

• Classical Probability: A physical property of the world (e.g., the 50% chance that a fair coin lands heads). True probability.

• Bayesian Probability: A person’s degree of belief in event X. Personal probability.

• Unlike classical probability, Bayesian probabilities benefit from, but do not require, repeated trials; they can focus on the next event alone, e.g., the probability that the Seawolves win their next game.


Uncertainty


Methods for Handling Uncertainty


Probability


Making Decisions Under Uncertainty


Probability Basics


Random Variables


Prior Probability


Conditional Probability


Inference by Enumeration




Bayes Rule

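Product Rule:

$$P(A \wedge B) = P(A \mid B)\,P(B)$$

$$P(A \wedge B) = P(B \mid A)\,P(A)$$

Equating sides:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

i.e.

$$P(\text{Class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{Class})\,P(\text{Class})}{P(\text{evidence})}$$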

All classification methods can be seen as estimates of Bayes’ Rule, with different techniques to estimate P(evidence|Class).
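As a quick numeric check of the derivation, the sketch below builds a small joint distribution over two Boolean events A and B (the four probabilities are hypothetical, chosen only for illustration), computes P(B|A) straight from the definition of conditional probability, and compares it with the Bayes' rule expression. Helper names such as `p_a` and `p_b` are illustrative.

```python
# Hypothetical joint distribution P(A, B) over two Boolean events; sums to 1.
joint = {
    (True, True): 0.20,
    (True, False): 0.10,
    (False, True): 0.15,
    (False, False): 0.55,
}

def p_a(a):
    """Marginal P(A=a), summing the joint over B."""
    return sum(p for (av, _), p in joint.items() if av == a)

def p_b(b):
    """Marginal P(B=b), summing the joint over A."""
    return sum(p for (_, bv), p in joint.items() if bv == b)

# Definition of conditional probability: P(B|A) = P(A and B) / P(A).
direct = joint[(True, True)] / p_a(True)

# Bayes' rule, with P(A|B) taken from the product rule P(A and B) = P(A|B) P(B).
p_a_given_b = joint[(True, True)] / p_b(True)
via_bayes = p_a_given_b * p_b(True) / p_a(True)

print(direct, via_bayes)
```

Both routes give 2/3 here; Bayes' rule is useful precisely when P(A|B) and the priors are known but the full joint table is not.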


Inference by Enumeration


Simple Bayes Rule Example

Probability your computer has a virus, V, = 1/1000.

If virused, the probability of a crash that day, C, = 4/5.

Probability your computer crashes in one day, C, = 1/10.

P(C|V) = 0.8, P(V) = 1/1000, P(C) = 1/10

$$P(V \mid C) = \frac{P(C \mid V)\,P(V)}{P(C)} = \frac{(0.8)(0.001)}{(0.10)} = 0.008$$

Even though a virus makes a crash very likely, we expect only 8 in 1000 crashes to be caused by viruses.

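Why not compute P(V|C) directly from evidence? Causal vs. diagnostic knowledge: P(C|V) describes how a virus causes crashes and stays stable, whereas the diagnostic P(V|C) changes whenever the base rates change (consider what happens if P(C) suddenly drops).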
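A minimal sketch of the arithmetic above, using Python's `fractions.Fraction` so the result comes out exact rather than as a rounded float; the helper name `bayes` is illustrative.

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Bayes' rule: P(H|e) = P(e|H) * P(H) / P(e)."""
    return likelihood * prior / evidence

# Numbers from the example: P(C|V) = 4/5, P(V) = 1/1000, P(C) = 1/10.
p_v_given_c = bayes(Fraction(4, 5), Fraction(1, 1000), Fraction(1, 10))
print(p_v_given_c, float(p_v_given_c))
```

The exact result is 1/125, i.e. 0.008.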


Bayesian Classifiers

$$P(\text{Class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{Class})\,P(\text{Class})}{P(\text{evidence})}$$

If we’re selecting the single most likely class, we only need to find the class that maximizes P(e|Class)P(Class), since the denominator P(e) is the same for every class.

The hard part is estimating P(e|Class).

Evidence e typically consists of a set of observations:

$$E = (e_1, e_2, \ldots, e_n)$$

Usual simplifying assumption is conditional independence:

$$P(e \mid C) = \prod_{i=1}^{n} P(e_i \mid C)$$

$$P(C \mid e) = \frac{P(C)\prod_{i=1}^{n} P(e_i \mid C)}{P(e)}$$


Bayesian Classifier Example

Probability      C=Virus   C=Bad Disk
P(C)             0.4       0.6
P(crashes|C)     0.1       0.2
P(diskfull|C)    0.6       0.1

Given a case where the disk is full and the computer crashes, the classifier chooses Virus as most likely since (0.4)(0.1)(0.6) > (0.6)(0.2)(0.1).
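A minimal naive Bayes sketch of the example, assuming (as the slide's arithmetic does) that only the observed features, crashes and diskfull, enter the product. It also normalizes the two scores into posteriors, which assumes Virus and Bad Disk are the only possible classes; the function and variable names are hypothetical.

```python
import math

# Table from the example: class priors and P(feature observed | class).
priors = {"Virus": 0.4, "Bad Disk": 0.6}
likelihoods = {
    "Virus": {"crashes": 0.1, "diskfull": 0.6},
    "Bad Disk": {"crashes": 0.2, "diskfull": 0.1},
}

def naive_bayes_scores(observed):
    """Unnormalized P(C) * prod_i P(e_i | C) under conditional independence.

    Only the observed features are multiplied in, matching the example.
    """
    return {
        c: priors[c] * math.prod(likelihoods[c][f] for f in observed)
        for c in priors
    }

scores = naive_bayes_scores(["crashes", "diskfull"])
best = max(scores, key=scores.get)  # argmax over P(e|C) P(C)
total = sum(scores.values())        # assumes these two classes are exhaustive
posterior = {c: s / total for c, s in scores.items()}
print(best, scores, posterior)
```

The scores are 0.024 for Virus and 0.012 for Bad Disk, so Virus wins with a normalized posterior of 2/3.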


Beyond Conditional Independence

• Include second-order dependencies; i.e. pairwise combination of variables via joint probabilities:

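Linear Classifier: [figure separating classes C1 and C2]

The second-order estimate P_2(e|c) multiplies the first-order estimate P_1(e|c) by a correction factor. The correction factor is difficult to compute: there are $\binom{n}{2}$ joint probabilities to consider.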


Belief Networks

• A DAG that represents the dependencies between variables and specifies the joint probability distribution

• Random variables make up the nodes

• Directed links represent direct causal influences

• Each node has a conditional probability table quantifying the effects of its parents

• No directed cycles
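The "no directed cycles" condition can be checked mechanically. Below is a sketch using Kahn's topological-sort algorithm (a standard technique, not one named on the slide) on the parent lists of the burglary network from the next example; node names such as `JohnCalls` are illustrative. A successful sort also yields an order in which every node comes after its parents, which is the order the chain-rule factorization of the joint distribution relies on.

```python
from collections import deque

# Parent lists for the burglary network (node -> list of parents).
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def topological_order(parents):
    """Kahn's algorithm; raises ValueError if the graph has a directed cycle."""
    children = {n: [] for n in parents}
    indegree = {n: len(ps) for n, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(n for n, d in indegree.items() if d == 0)  # root nodes
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all parents already placed
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("directed cycle: not a valid belief network")
    return order

print(topological_order(parents))
```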


Burglary Alarm Example

Network structure: Burglary → Alarm ← Earthquake; Alarm → John Calls; Alarm → Mary Calls.

P(B) = 0.001
P(E) = 0.002

B  E  P(A)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

A  P(J)
T  0.90
F  0.05

A  P(M)
T  0.70
F  0.01


Sample Bayesian Network


Using the Belief Network

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(X_i))$$

Probability of alarm, no burglary or earthquake, both John and Mary call:

$$P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B \wedge \neg E)\,P(\neg B)\,P(\neg E) = (0.9)(0.7)(0.001)(0.999)(0.998) \approx 0.000628$$
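The factorization can be evaluated directly. A minimal sketch, assuming the CPTs are stored as Python dicts holding P(X = True | parents); `pr` and `joint` are hypothetical helpers, not part of any library.

```python
# CPTs from the burglary network tables (probability that each variable is True).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}  # keyed by A
P_M = {True: 0.70, False: 0.01}  # keyed by A

def pr(p_true, value):
    """P(X = value) for a Boolean X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of P(x_i | Parents(X_i))."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# Alarm, no burglary, no earthquake, John and Mary both call.
p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.6f}")
```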


Belief Computations

• Two types; both are NP-hard

• Belief Revision
  – Model explanatory/diagnostic tasks
  – Given evidence, what is the most likely hypothesis to explain the evidence?
  – Also called abductive reasoning

• Belief Updating
  – Queries
  – Given evidence, what is the probability of some other random variable occurring?


Belief Revision

• Given some evidence variables, find the state of all other variables that maximizes the probability.

• E.g.: We know John calls and Mary does not. What is the most likely state? Only consider assignments where J=T and M=F, and maximize. Best:

$$P(\neg B)\,P(\neg E)\,P(\neg A \mid \neg B \wedge \neg E)\,P(J \mid \neg A)\,P(\neg M \mid \neg A) = (0.999)(0.998)(0.999)(0.05)(0.99) \approx 0.049$$
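Belief revision over this small network can be done by brute force: fix J=T and M=F and try all eight assignments to B, E and A. This is a sketch of exhaustive search, assuming a dict-based CPT encoding chosen for illustration (`pr` and `joint` are hypothetical helpers); it is exponential in the number of free variables, which fits the NP-hardness noted above.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=T | A)
P_M = {True: 0.70, False: 0.01}  # P(M=T | A)

def pr(p_true, value):
    """P(X = value) for a Boolean X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Chain-rule product of each node's CPT entry given its parents."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# Evidence J=T, M=F; search all 2^3 assignments (b, e, a) of the other variables.
candidates = product((True, False), repeat=3)
best = max(candidates, key=lambda bea: joint(*bea, j=True, m=False))
best_p = joint(*best, j=True, m=False)
print(dict(zip(("B", "E", "A"), best)), round(best_p, 3))
```

The winning assignment is B=F, E=F, A=F, matching the slide.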


Belief Updating

• Causal Inferences

• Diagnostic Inferences

• Intercausal Inferences

• Mixed Inferences



Causal Inferences

Inference from cause to effect. E.g., given a burglary, what is P(J|B)?

$$P(A \mid B) = P(E)(0.95) + P(\neg E)(0.94) = (0.002)(0.95) + (0.998)(0.94) \approx 0.94$$

$$P(J \mid B) = P(A \mid B)(0.9) + P(\neg A \mid B)(0.05) = (0.94)(0.9) + (0.06)(0.05) \approx 0.85$$

P(M|B) ≈ 0.66 via similar calculations.
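The two summations can be written out directly. A minimal sketch, assuming P(E|B) = P(E) because Burglary and Earthquake are independent roots of the network; the variable names are illustrative. The same two steps give P(M|B) ≈ 0.659.

```python
# CPT entries from the burglary network tables.
P_E = 0.002
P_A_GIVEN_B = {True: 0.95, False: 0.94}  # P(A=T | B=T, E), keyed by E
P_J = {True: 0.90, False: 0.05}          # P(J=T | A)
P_M = {True: 0.70, False: 0.01}          # P(M=T | A)

# Sum out Earthquake: P(A|B) = P(A|B,E) P(E) + P(A|B,~E) P(~E).
p_a_given_b = P_A_GIVEN_B[True] * P_E + P_A_GIVEN_B[False] * (1 - P_E)

# Sum out Alarm: P(J|B) = P(J|A) P(A|B) + P(J|~A) P(~A|B); likewise for Mary.
p_j_given_b = P_J[True] * p_a_given_b + P_J[False] * (1 - p_a_given_b)
p_m_given_b = P_M[True] * p_a_given_b + P_M[False] * (1 - p_a_given_b)

print(round(p_a_given_b, 2), round(p_j_given_b, 2), round(p_m_given_b, 2))
```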



Diagnostic Inferences

From effect to cause. E.g., given that John calls, what is P(B|J)?

$$P(B \mid J) = \frac{P(J \mid B)\,P(B)}{P(J)}$$

What is P(J)? Need P(A) first:

$$P(A) = P(B)P(E)(0.95) + P(\neg B)P(E)(0.29) + P(B)P(\neg E)(0.94) + P(\neg B)P(\neg E)(0.001)$$

$$P(A) = (0.001)(0.002)(0.95) + (0.999)(0.002)(0.29) + (0.001)(0.998)(0.94) + (0.999)(0.998)(0.001) \approx 0.002516$$

$$P(J) = P(A)(0.9) + P(\neg A)(0.05) = (0.002516)(0.9) + (0.9975)(0.05) \approx 0.052$$

$$P(B \mid J) = \frac{(0.85)(0.001)}{0.052} \approx 0.016$$

Many false positives: even when John calls, the probability of a burglary is only about 0.016.
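A sketch of the same diagnostic computation, assuming a dict-based CPT encoding chosen for illustration (names are hypothetical). Carrying full precision gives P(A) ≈ 0.002516, P(J) ≈ 0.052 and P(B|J) ≈ 0.016.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=T | A)

def pr(p_true, value):
    """P(X = value) for a Boolean X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

# Prior P(A): sum over all four (B, E) combinations.
p_a = sum(pr(P_B, b) * pr(P_E, e) * P_A[(b, e)]
          for b, e in product((True, False), repeat=2))

# P(J): sum out the alarm.
p_j = P_J[True] * p_a + P_J[False] * (1 - p_a)

# P(J|B): sum out E, then A, with B fixed to True.
p_a_given_b = sum(pr(P_E, e) * P_A[(True, e)] for e in (True, False))
p_j_given_b = P_J[True] * p_a_given_b + P_J[False] * (1 - p_a_given_b)

# Bayes' rule: P(B|J) = P(J|B) P(B) / P(J).
p_b_given_j = p_j_given_b * P_B / p_j
print(f"P(A)={p_a:.6f}  P(J)={p_j:.3f}  P(B|J)={p_b_given_j:.3f}")
```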


Intercausal Inferences

Explaining-away inferences.

Given an alarm, P(B|A) = 0.37. But if we add the evidence that an earthquake occurred, then P(B|A∧E) = 0.003.

Even though B and E are independent a priori, once the alarm is observed, evidence for one cause can make the other more or less likely.
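Explaining away can be reproduced with two applications of Bayes' rule: one summing over both earthquake values, one with the earthquake clamped to true. A minimal sketch under an illustrative dict-based CPT encoding; `p_alarm_and_b` and `p_b_given_alarm` are hypothetical helpers.

```python
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)

def pr(p_true, value):
    """P(X = value) for a Boolean X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def p_alarm_and_b(b, e_values):
    """P(A=T, B=b, E=e) summed over e in e_values (B and E are root nodes)."""
    return sum(pr(P_B, b) * pr(P_E, e) * P_A[(b, e)] for e in e_values)

def p_b_given_alarm(e_values):
    """P(B=T | A=T, E restricted to e_values), normalizing over both values of B."""
    yes, no = p_alarm_and_b(True, e_values), p_alarm_and_b(False, e_values)
    return yes / (yes + no)

p_b_given_a = p_b_given_alarm((True, False))  # earthquake unobserved
p_b_given_ae = p_b_given_alarm((True,))        # earthquake observed true
print(round(p_b_given_a, 2), round(p_b_given_ae, 3))
```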


Mixed Inferences

Simultaneous intercausal and diagnostic inference.

E.g., if John calls and Earthquake is false:

$$P(B \mid J \wedge \neg E) \approx 0.016$$

$$P(A \mid J \wedge \neg E) \approx 0.03$$

Computing these values exactly is somewhat complicated.
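On a network this small, these values can be obtained exactly by inference by enumeration: sum the full joint over every world consistent with the evidence. A minimal sketch, assuming an illustrative dict-based CPT encoding and a hypothetical `query` helper; it visits all 2^5 worlds, so it only suits tiny networks. It gives P(A|J∧¬E) ≈ 0.034 and P(B|J∧¬E) ≈ 0.016.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=T | A)
P_M = {True: 0.70, False: 0.01}  # P(M=T | A)
NAMES = ("B", "E", "A", "J", "M")  # argument order of joint()

def pr(p_true, value):
    """P(X = value) for a Boolean X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Chain-rule product of each node's CPT entry given its parents."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

def query(var, evidence):
    """P(var=True | evidence), summing the joint over consistent worlds."""
    num = den = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # skip worlds that contradict the evidence
        p = joint(*values)
        den += p
        if world[var]:
            num += p
    return num / den

evidence = {"J": True, "E": False}
print(round(query("A", evidence), 3), round(query("B", evidence), 3))
```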


Exact Computation: Polytree Algorithm

• Judea Pearl, 1982

• Only works on singly connected networks: at most one undirected path between any two nodes

• Backward-chaining message-passing algorithm for computing posterior probabilities for query node X
  – Compute causal support for X: evidence variables “above” X
  – Compute evidential support for X: evidence variables “below” X


Polytree Computation

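[Figure: query node X with parents U(1) … U(m) above it, children Y(1) … Y(n) below it, and the children's other parents Z(1,j) … Z(n,j); E_X^+ is the evidence above X and E_X^- the evidence below X.]

$$P(X \mid E) = \alpha\, P(X \mid E_X^{+})\, P(E_X^{-} \mid X)$$

$$P(X \mid E_X^{+}) = \sum_{\mathbf{u}} P(X \mid \mathbf{u}) \prod_{i} P(u_i \mid E_{U_i \setminus X})$$

$$P(E_X^{-} \mid X) = \beta \prod_{i} \sum_{y_i} P(E_{Y_i}^{-} \mid y_i) \sum_{\mathbf{z}_i} P(y_i \mid X, \mathbf{z}_i) \prod_{j} P(z_{ij} \mid E_{Z_{ij} \setminus Y_i})$$

where α and β are normalizing constants.

The algorithm is recursive: it works by passing a chain of messages between neighboring nodes.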


Other Query Methods

• Exact Algorithms
  – Clustering: cluster nodes to make a single cluster, then message-pass along that cluster
  – Symbolic Probabilistic Inference: uses d-separation to find expressions to combine

• Approximate Algorithms
  – Select a sampling distribution, then conduct trials sampling from root to evidence nodes, accumulating a weight for each node. Still tractable for dense networks.
  – Forward Simulation
  – Stochastic Simulation
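The approximate scheme described above (sample from the roots toward the evidence, accumulating a weight) matches what is usually called likelihood weighting. A minimal sketch on the burglary network, assuming that reading; the function name `likelihood_weighting`, the `NODES` table, the sample count and the seed are illustrative choices. With enough samples the estimate of P(B|J) should approach the exact value of about 0.016.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=T | A)
P_M = {True: 0.70, False: 0.01}  # P(M=T | A)

# Nodes in topological order, each paired with a function that returns
# P(node = True | parents) from the values assigned so far.
NODES = [
    ("B", lambda w: P_B),
    ("E", lambda w: P_E),
    ("A", lambda w: P_A[(w["B"], w["E"])]),
    ("J", lambda w: P_J[w["A"]]),
    ("M", lambda w: P_M[w["A"]]),
]

def likelihood_weighting(query, evidence, n_samples, seed=0):
    """Estimate P(query = True | evidence) by likelihood weighting."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        world, weight = {}, 1.0
        for name, p_true in NODES:
            p = p_true(world)
            if name in evidence:
                # Evidence nodes are clamped; their likelihood scales the weight.
                world[name] = evidence[name]
                weight *= p if evidence[name] else 1.0 - p
            else:
                # Non-evidence nodes are sampled from their CPT, root to leaves.
                world[name] = rng.random() < p
        den += weight
        if world[query]:
            num += weight
    return num / den

estimate = likelihood_weighting("B", {"J": True}, n_samples=300_000)
print(round(estimate, 3))
```

Because burglaries are rare, only a few hundred of the samples carry B=T, so the estimate is noisy; rarer queries need many more samples.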


Summary

• Bayesian methods provide a sound theory and framework for implementing classifiers

• Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables

• Computing exact values is NP-complete or NP-hard, so it is typical to make simplifying assumptions or use approximate methods

• Many Bayesian tools and systems exist


References

• Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.

• Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.

• Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06.

• Internet Resources on Bayesian Networks and Machine Learning: http://www.cs.orst.edu/~wangxi/resource.html