Belief Networks & Bayesian Classification


Page 1: Belief Networks & Bayesian Classification

AN INTRODUCTION TO BAYESIAN BELIEF NETWORKS AND NAÏVE BAYESIAN CLASSIFICATION

ADNAN MASOOD
SCIS.NOVA.EDU/~ADNAN

[email protected]


Page 2: Belief Networks & Bayesian Classification

Overview

Probability and Uncertainty
Probability Notation
Bayesian Statistics
Notation of Probability
Axioms of Probability
Probability Table
Bayesian Belief Network
Joint Probability Table
Probability of Disjunctions
Conditional Probability
Conditional Independence
Bayes' Rule
Classification with Bayes' Rule
Bayesian Classification
Conclusion & Further Reading

Page 3: Belief Networks & Bayesian Classification

Probability and Uncertainty

Probability provides a way of summarizing uncertainty:
60% chance of rain today
85% chance of alarm in case of a burglary

Probability is calculated based upon past performance, or degree of belief.

Page 4: Belief Networks & Bayesian Classification

Bayesian Statistics

Three approaches to probability:
Axiomatic: probability by definition and properties
Relative frequency: repeated trials
Degree of belief (subjective): personal measure of uncertainty

Examples:
The chance that a meteor strikes earth is 1%
The probability of rain today is 30%
The chance of getting an A on the exam is 50%

Page 5: Belief Networks & Bayesian Classification

Notation of Probability

Random variables (RVs) are (usually) capitalized, e.g., Sky, RoadCurvature, Temperature. They refer to attributes of the world whose "status" is unknown, have one and only one value at a time, and have a domain of values that are the possible states of the world:

Boolean: domain is {true, false}

Discrete: domain is countable (includes Boolean); the values are exhaustive and mutually exclusive, e.g., a Weather variable with domain {sunny, overcast, rainy}, where Weather=sunny is abbreviated as sunny and P(Weather=sunny) as P(sunny)

Continuous: domain is the real numbers

Page 6: Belief Networks & Bayesian Classification

Notation of Probability

Uncertainty is represented by P(X = x), or simply P(x): the degree of belief that variable X takes on value x, given no other information. Because it relates to a single probability with no evidence, this is called an unconditional or prior probability.

Properties of P(X):
0 ≤ P(x) ≤ 1 for every value x
The sum over all values in the domain of variable X is 1, because the domain is exhaustive and mutually exclusive: Σx P(X=x) = 1

Page 7: Belief Networks & Bayesian Classification

Axioms of Probability

S – sample space (the set of possible outcomes)
E – some event (some subset of outcomes)

Axioms:
1. 0 ≤ P(E) ≤ 1
2. P(S) = 1
3. For any sequence of mutually exclusive events E1, E2, ...: P(E1 ∪ E2 ∪ ...) = Σi P(Ei)

Page 8: Belief Networks & Bayesian Classification

Probability Table

Calculate probabilities from data:
P(Weather=sunny) = P(sunny) = 5/14
P(Weather) = {5/14, 4/14, 5/14}

Outlook
sunny      5/14
overcast   4/14
rainy      5/14
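As a minimal sketch of this counting step (in Python, with the 14-instance Outlook column written out by hand; variable names here are illustrative):

```python
from collections import Counter

# Outlook column of the 14-day weather dataset
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5

counts = Counter(outlook)
n = len(outlook)
p_weather = {value: c / n for value, c in counts.items()}

print(p_weather["sunny"])  # P(sunny) = 5/14 ≈ 0.357
print(p_weather)           # the full distribution P(Weather)
```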

Page 9: Belief Networks & Bayesian Classification

An expert-built belief network using the weather dataset (Mitchell; Witten & Frank)

Bayesian inference can help answer questions such as the probability of game play if:

a. Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong

b. Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong

Page 10: Belief Networks & Bayesian Classification

Bayesian Belief Network

A Bayesian belief network allows a subset of the variables to be conditionally independent.
It is a graphical model of causal relationships.

Several cases of learning Bayesian belief networks:
• Given both the network structure and all the variables: easy
• Given the network structure but only some variables
• When the network structure is not known in advance

Page 11: Belief Networks & Bayesian Classification

Bayesian Belief Network

Nodes: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea

The conditional probability table (CPT) for the variable LungCancer, given its parents FamilyHistory (FH) and Smoker (S):

       (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC     0.8       0.5        0.7        0.1
~LC    0.2       0.5        0.3        0.9
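A minimal sketch of how such a CPT could be stored and queried in Python (the dictionary layout and function names are illustrative, not from the slides):

```python
# CPT for LungCancer given its parents (FamilyHistory, Smoker)
# Keys are (family_history, smoker); values are P(LungCancer=true | parents)
cpt_lung_cancer = {
    (True, True):   0.8,
    (True, False):  0.5,
    (False, True):  0.7,
    (False, False): 0.1,
}

def p_lung_cancer(lc, fh, s):
    """Return P(LungCancer=lc | FamilyHistory=fh, Smoker=s)."""
    p_true = cpt_lung_cancer[(fh, s)]
    return p_true if lc else 1.0 - p_true

print(p_lung_cancer(True, True, True))     # 0.8
print(p_lung_cancer(False, False, False))  # 0.9
```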

Page 12: Belief Networks & Bayesian Classification

A Hypothesis for playing tennis

Page 13: Belief Networks & Bayesian Classification

Joint Probability Table

P(Outlook=sunny, Temperature=hot) = P(sunny, hot) = 2/14

P(Temperature=hot) = P(hot) = 2/14 + 2/14 + 0/14 = 4/14

With N random variables that can each take k values, the full joint probability table has k^N entries.

Temperature \ Outlook   sunny   overcast   rainy
hot                     2/14    2/14       0/14
mild                    2/14    1/14       3/14
cool                    1/14    1/14       2/14
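A small sketch of building this joint table by counting and then marginalizing (the pair counts are read off the table above; names are illustrative):

```python
from collections import Counter

# (outlook, temperature) pairs for the 14 days, matching the counts above
pairs = (
    [("sunny", "hot")] * 2 + [("overcast", "hot")] * 2 +
    [("sunny", "mild")] * 2 + [("overcast", "mild")] * 1 + [("rainy", "mild")] * 3 +
    [("sunny", "cool")] * 1 + [("overcast", "cool")] * 1 + [("rainy", "cool")] * 2
)

n = len(pairs)
joint = {pair: c / n for pair, c in Counter(pairs).items()}
print(joint[("sunny", "hot")])  # P(sunny, hot) = 2/14 ≈ 0.143

# Marginalize out Outlook to recover P(Temperature=hot)
p_hot = sum(p for (o, t), p in joint.items() if t == "hot")
print(p_hot)  # 4/14 ≈ 0.286
```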

Page 14: Belief Networks & Bayesian Classification

Example: Calculating Global Probabilistic Beliefs

P(PlayTennis) = 9/14 = 0.64
P(~PlayTennis) = 5/14 = 0.36
P(Outlook=sunny | PlayTennis) = 2/9 = 0.22
P(Outlook=sunny | ~PlayTennis) = 3/5 = 0.60
P(Temperature=cool | PlayTennis) = 3/9 = 0.33
P(Temperature=cool | ~PlayTennis) = 1/5 = 0.20
P(Humidity=high | PlayTennis) = 3/9 = 0.33
P(Humidity=high | ~PlayTennis) = 4/5 = 0.80
P(Wind=strong | PlayTennis) = 3/9 = 0.33
P(Wind=strong | ~PlayTennis) = 3/5 = 0.60
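These estimates fall out of simple counting. A sketch, assuming Mitchell's standard 14-example PlayTennis dataset (reproduced by hand below):

```python
# Mitchell's PlayTennis data: (outlook, temperature, humidity, wind, play)
data = [
    ("sunny", "hot", "high", "weak", "no"),          ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),      ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),       ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"), ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),      ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),    ("rain", "mild", "high", "strong", "no"),
]

def p_feature_given_class(index, value, play):
    """Estimate P(feature=value | PlayTennis=play) as a relative frequency."""
    in_class = [row for row in data if row[4] == play]
    return sum(row[index] == value for row in in_class) / len(in_class)

print(p_feature_given_class(0, "sunny", "yes"))  # 2/9 ≈ 0.22
print(p_feature_given_class(2, "high", "no"))    # 4/5 = 0.80
```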

Page 15: Belief Networks & Bayesian Classification

Probability of Disjunctions

P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

Page 16: Belief Networks & Bayesian Classification

Conditional Probability

Probabilities discussed so far are called prior or unconditional probabilities; they depend only on the data, not on any other variable.

But what if you have some evidence or knowledge about the situation? Say you now have a toothache. Now what is the probability of having a cavity?

Page 17: Belief Networks & Bayesian Classification

Conditional Probability

Calculate conditional probabilities from data as follows:

P(A | B) = P(A, B) / P(B)

For example: what is P(cavity | toothache)?
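Applying this definition to the weather tables from pages 8 and 13 gives a quick worked example:

```python
# P(hot | sunny) = P(sunny, hot) / P(sunny)
p_sunny_hot = 2 / 14   # joint, from the table on page 13
p_sunny = 5 / 14       # marginal, from the table on page 8

print(p_sunny_hot / p_sunny)  # 0.4, i.e. P(hot | sunny) = 2/5
```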

Page 18: Belief Networks & Bayesian Classification

Conditional Probability

You can think of P(B) as just a normalization constant that makes P(A|B) sum to 1 over all values of A.

Product rule: P(A, B) = P(A|B) P(B) = P(B|A) P(A)

The chain rule is successive application of the product rule:
P(X1, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ... P(Xn | X1, ..., Xn-1)
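A quick numeric check of the chain rule on a made-up joint distribution (everything here is illustrative):

```python
import itertools, random

# A tiny random joint distribution over three binary variables A, B, C
random.seed(0)
states = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}

def marginal(fixed):
    """P of the event matching every (position, value) pair in `fixed`."""
    return sum(p for s, p in joint.items() if all(s[i] == v for i, v in fixed))

a, b, c = 1, 0, 1
p_a   = marginal([(0, a)])
p_ab  = marginal([(0, a), (1, b)])
p_abc = joint[(a, b, c)]

# Chain rule: P(A,B,C) = P(A) * P(B|A) * P(C|A,B)
rhs = p_a * (p_ab / p_a) * (p_abc / p_ab)
print(abs(p_abc - rhs) < 1e-12)  # True
```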

Page 19: Belief Networks & Bayesian Classification

Conditional Independence

What if I know Weather=cloudy today; now what is P(cavity)?

If knowing some evidence doesn't change the probability of some other random variable, then we say the two random variables are independent.

A and B are independent if P(A|B) = P(A). Other, equivalent ways of seeing this: P(B|A) = P(B), and P(A, B) = P(A) P(B).

Absolute Independence is powerful but rare!
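Using the weather tables from earlier pages, a quick check shows, for instance, that Outlook and Temperature are not independent:

```python
p_hot = 4 / 14                           # marginal, from the joint table on page 13
p_hot_given_sunny = (2 / 14) / (5 / 14)  # = 0.4, as computed above

# P(hot | sunny) != P(hot), so Outlook and Temperature are not independent
print(p_hot, p_hot_given_sunny)  # 0.2857... vs 0.4
```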

Page 20: Belief Networks & Bayesian Classification

The independence hypothesis…

… makes computation possible
… yields optimal classifiers when satisfied
… but is seldom satisfied in practice, as attributes (variables) are often correlated.

Attempts to overcome this limitation:
• Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
• Decision trees, which reason on one attribute at a time, considering the most important attributes first

Page 21: Belief Networks & Bayesian Classification

Conditional Independence

P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries.

If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
P(catch | toothache, cavity) = P(catch | cavity)

The same independence holds if I haven't got a cavity:
P(catch | toothache, ~cavity) = P(catch | ~cavity)

Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Page 22: Belief Networks & Bayesian Classification

Bayes’ Rule

Remember conditional probabilities:
P(A|B) = P(A,B) / P(B), so P(B) P(A|B) = P(A,B)
P(B|A) = P(B,A) / P(A), so P(A) P(B|A) = P(B,A)
Since P(B,A) = P(A,B):
P(B) P(A|B) = P(A) P(B|A)

Bayes' Rule: P(A|B) = P(B|A) P(A) / P(B)

Page 23: Belief Networks & Bayesian Classification

Bayes’ Rule

A more general form, conditioned on background evidence e, is:
P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e)

Bayes' rule lets you turn conditional probabilities on their head; it is useful for assessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

E.g., let M be meningitis, S be stiff neck:
P(M | S) = P(S | M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008

Note posterior probability of meningitis still very small!
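The same arithmetic in Python (numbers taken from the slide):

```python
p_s_given_m = 0.8   # P(stiff neck | meningitis), the causal direction
p_m = 0.0001        # prior P(meningitis)
p_s = 0.1           # P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # 0.0008 -- still very small
```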

Page 24: Belief Networks & Bayesian Classification

Classification with Bayes Rule

Naïve Bayes Model

Classify with the highest probability
One of the most widely used classifiers
Very fast to train and to classify:
• One pass over all the data to train
• One lookup for each feature/class combination to classify
Assumes the features are independent given the class (conditional independence)

Page 25: Belief Networks & Bayesian Classification

Naïve Bayes Classifier

Simplified assumption: features are conditionally independent given the class:

P(x1, ..., xn | C) = P(x1 | C) P(x2 | C) ... P(xn | C)
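A minimal sketch of such a classifier for categorical features (class and method names are illustrative; unseen feature values simply get probability 0, matching the relative-frequency estimates on the later slides):

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal naive Bayes for categorical features (illustrative sketch)."""

    def fit(self, rows, labels):
        n = len(labels)
        class_sizes = Counter(labels)
        self.priors = {c: size / n for c, size in class_sizes.items()}
        # cond[(feature_index, value, class)] = relative frequency within the class
        self.cond = defaultdict(float)
        for row, c in zip(rows, labels):
            for i, value in enumerate(row):
                self.cond[(i, value, c)] += 1 / class_sizes[c]
        return self

    def predict(self, row):
        def score(c):
            p = self.priors[c]
            for i, value in enumerate(row):
                p *= self.cond[(i, value, c)]  # conditional independence
            return p
        return max(self.priors, key=score)
```

Trained on the PlayTennis rows from the page 14 sketch, `NaiveBayes().fit([r[:4] for r in data], [r[4] for r in data]).predict(("rain", "hot", "high", "weak"))` returns 'no', matching the worked example on page 35 (where Wind=false corresponds to "weak" here).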

Page 26: Belief Networks & Bayesian Classification

Bayesian Classification: Why?

Probabilistic learning: Computation of explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems

Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities

Benchmark: Even if Bayesian methods are computationally intractable, they can provide a benchmark for other algorithms

Page 27: Belief Networks & Bayesian Classification

Classification with Bayes Rule

Courtesy, Simafore - http://www.simafore.com/blog/bid/100934/Beware-of-2-facts-when-using-Naive-Bayes-classification-for-analytics

Page 28: Belief Networks & Bayesian Classification

Issues with naïve Bayes

Change in classifier data (on the fly, during classification)

The conditional independence assumption is violated. Consider the task of classifying whether or not a certain word is a corporation name, e.g., "Google," "Microsoft," "IBM," and "ACME."

Two useful features we might want to use are capitalized and all-capitals.

Naïve Bayes will assume that these two features are independent given the class, but this clearly isn't the case (things that are all-caps must also be capitalized)!

However, naïve Bayes seems to work well in practice even when this assumption is violated.

Page 29: Belief Networks & Bayesian Classification

Naïve Bayes Classifier

Page 30: Belief Networks & Bayesian Classification

Naive Bayesian Classifier

Given a training set, we can compute the probabilities:

Outlook      P     N
sunny        2/9   3/5
overcast     4/9   0
rain         3/9   2/5

Temperature  P     N
hot          2/9   2/5
mild         4/9   2/5
cool         3/9   1/5

Humidity     P     N
high         3/9   4/5
normal       6/9   1/5

Windy        P     N
true         3/9   3/5
false        6/9   2/5

Page 31: Belief Networks & Bayesian Classification

Bayesian classification

The classification problem may be formalized using a-posteriori probabilities:

P(C | X) = probability that the sample tuple X = <x1, ..., xk> is of class C

E.g., P(class=N | outlook=sunny, windy=true, …)

Idea: assign sample X the class label C such that P(C | X) is maximal.

Page 32: Belief Networks & Bayesian Classification

Estimating a-posteriori probabilities

Bayes' theorem:

P(C | X) = P(X | C) P(C) / P(X)

P(X) is constant for all classes.
P(C) = relative frequency of class C samples.
The C such that P(C | X) is maximum is the C such that P(X | C) P(C) is maximum.
Problem: computing P(X | C) directly is infeasible!

Page 33: Belief Networks & Bayesian Classification

Naïve Bayesian Classification

Naïve assumption: attribute independence

P(x1, ..., xk | C) = P(x1 | C) · ... · P(xk | C)

If the i-th attribute is categorical: P(xi | C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C.

If the i-th attribute is continuous: P(xi | C) is estimated through a Gaussian density function.

Computationally easy in both cases
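For the continuous case, a sketch of the Gaussian estimate (the temperature values are taken from the numeric version of the weather data in Witten & Frank; treat them as illustrative):

```python
import math

def gaussian_likelihood(x, values):
    """Estimate P(x | C) for a continuous attribute: fit a Gaussian to the
    attribute's values within class C, then evaluate its density at x."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

temps_play = [83, 70, 68, 64, 69, 75, 75, 72, 81]  # temperature on 'play' days
print(gaussian_likelihood(66, temps_play))  # density of Temperature=66 given play
```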

Page 34: Belief Networks & Bayesian Classification

Play-tennis example: estimating P(xi | C)

P(p) = 9/14

P(n) = 5/14

outlook

P(sunny|p) = 2/9 P(sunny|n) = 3/5

P(overcast|p) =4/9 P(overcast|n) = 0

P(rain|p) = 3/9 P(rain|n) = 2/5

temperature

P(hot|p) = 2/9 P(hot|n) = 2/5

P(mild|p) = 4/9 P(mild|n) = 2/5

P(cool|p) = 3/9 P(cool|n) = 1/5

humidity

P(high|p) = 3/9 P(high|n) = 4/5

P(normal|p) = 6/9 P(normal|n) = 1/5

windy

P(true|p) = 3/9 P(true|n) = 3/5

P(false|p) = 6/9 P(false|n) = 2/5

Page 35: Belief Networks & Bayesian Classification

Play Tennis example

An unseen sample X = <rain, hot, high, false>

P(X | p) · P(p) = P(rain | p) · P(hot | p) · P(high | p) · P(false | p) · P(p)
               = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X | n) · P(n) = P(rain | n) · P(hot | n) · P(high | n) · P(false | n) · P(n)
               = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class n (don't play).
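The same computation in Python confirms the numbers:

```python
p_yes = (3/9) * (2/9) * (3/9) * (6/9) * (9/14)
p_no  = (2/5) * (2/5) * (4/5) * (2/5) * (5/14)

print(round(p_yes, 6))  # 0.010582
print(round(p_no, 6))   # 0.018286
print("n (don't play)" if p_no > p_yes else "p (play)")
```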

Page 36: Belief Networks & Bayesian Classification

Conclusion & Further Reading

Probabilities
Joint probabilities
Conditional probabilities
Independence and conditional independence
Naïve Bayes classifier