Lecture9 - Bayesian-Decision-Theory

Introduction to Machine Learning

Lecture 9

Bayesian decision theory – An introduction

Albert Orriols i Puig

Artificial Intelligence – Machine Learning

Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Recap of Lectures 5-8

LET’S START WITH DATA CLASSIFICATION


Recap of Lectures 5-8

We want to build decision trees

How can I automatically generate these types of trees?

Decide which attribute we should put in each node

Decide a split point

Rely on information theory

We also saw many other improvements
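The recap says tree learners rely on information theory to decide which attribute goes in each node. As a minimal sketch, assuming the ID3-style information-gain criterion (entropy of the labels minus the weighted entropy after the split) and a hypothetical two-attribute toy dataset, attribute selection could look like this:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    n = len(labels)
    groups = {}  # attribute value -> labels of the rows taking that value
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: each row is (outlook, windy); the label is "play?"
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "yes", "no", "no"]
print(best_attribute(rows, labels))  # 0: outlook separates the classes perfectly
```

Here "windy" has zero gain because each of its values still contains one "yes" and one "no", so the sketch would place "outlook" at the root.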


Recap of Lectures 5-8

From kNN to CBR

[Figure: 15-NN vs. 1-NN]

Key aspects

Value of k

Distance functions
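The two key aspects of kNN, the value of k and the distance function, can be made concrete with a minimal majority-vote sketch. The toy points and the helper names (`knn_predict`, `euclidean`, `manhattan`) are hypothetical; the point is only that the same query receives a different label with k = 1 and k = 3.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=1, distance=euclidean):
    """Majority vote among the k training examples closest to `query`."""
    # Sort training indices by their distance to the query point
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.45, 0.45)]
train_y = ["a", "a", "a", "b", "b"]
query = (0.5, 0.5)
print(knn_predict(train_X, train_y, query, k=1))  # b: the single nearest point
print(knn_predict(train_X, train_y, query, k=3))  # a: outvoted by two "a" neighbours
```

Swapping `distance=manhattan` changes the neighbour ranking in general, which is why the distance function is listed as a design choice alongside k.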


Today’s Agenda

Could we use probability to classify?

Where it all began

Some anecdotes on the correct use of probabilities


Why Bother about Prob.?

The world is a very uncertain place

Almost 40 years of AI and ML dealing with uncertain domains

Some researchers decided to employ ideas from probability to model concepts

Before saying more… let’s go to the beginning


Meeting the Reverend Thomas Bayes

Two main works:

Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731)

An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of The Analyst (published anonymously in 1736)

But we are especially interested in: An Essay Towards Solving a Problem in the Doctrine of Chances (1764), which was actually published posthumously by Richard Price


Where Did These Ideas Come From?

Bayes built his theory upon several ideas

Immanuel Kant (1724-1804)

Copernican revolution: our understanding of the external world had its foundations not merely in experience, but in both experience and a priori concepts, thus offering a non-empiricist critique of rationalist philosophy

Isaac Newton (1643-1727)

Universal gravitation

Three laws of motion, which dominated the scientific view of the physical universe for the next three centuries


What Was Bayes’ Point?

Bayesian probability

Notion of probability interpreted as partial belief rather than as frequency

Bayesian estimation

Calculate the validity of a proposition

On the basis of a prior estimate of its probability and new relevant evidence

E.g.: Before Bayes, forward probability

Given a specified number of white and black balls in an urn, what is the probability of drawing a black ball?

Bayes turned his attention to the converse problem: given that one or more balls have been drawn, what can be said about the number of white and black balls in the urn?
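The forward and inverse urn problems can be contrasted in a few lines. This sketch assumes, purely for illustration, an urn of `n_balls` balls, draws made with replacement, and a uniform prior over the number of black balls; `forward_prob_black` and `inverse_posterior` are hypothetical helper names.

```python
from fractions import Fraction

def forward_prob_black(black, white):
    """Forward problem: probability of drawing a black ball from a known urn."""
    return Fraction(black, black + white)

def inverse_posterior(n_balls, n_black_drawn, n_white_drawn):
    """Inverse problem: posterior over the number of black balls in the urn.

    Assumes a uniform prior over 0..n_balls black balls and draws made with
    replacement, so the draws are independent given the urn's composition.
    """
    prior = Fraction(1, n_balls + 1)
    unnormalised = {}
    for b in range(n_balls + 1):
        p_black = Fraction(b, n_balls)
        likelihood = p_black ** n_black_drawn * (1 - p_black) ** n_white_drawn
        unnormalised[b] = likelihood * prior
    evidence = sum(unnormalised.values())  # probability of the observed draws
    return {b: p / evidence for b, p in unnormalised.items()}

print(forward_prob_black(3, 7))  # 3/10
print(inverse_posterior(4, 2, 0))
```

For example, with 4 balls and two black draws, `inverse_posterior(4, 2, 0)` puts probability 8/15 on an all-black urn and 0 on an urn with no black balls.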


Bayes’ Theorem

Lets us find the most probable hypothesis h ∈ H, given the data D plus knowledge about the prior probabilities of the hypotheses in H

Terminology:

P(h|D): probability that h holds given data D. Posterior probability of h; confidence that h holds given D.

P(h): prior probability of h (background knowledge we have about whether h is a correct hypothesis)

P(D): prior probability that training data D will be observed

P(D|h): probability of observing D given h holds

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
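A minimal sketch of the formula applied to a finite hypothesis space. P(D) is computed by summing P(D|h)P(h) over H, which assumes the hypotheses are mutually exclusive and exhaustive. The function names and the priors and likelihoods in the usage lines are made up for illustration.

```python
def posteriors(priors, likelihoods):
    """P(h|D) for every h, given priors[h] = P(h) and likelihoods[h] = P(D|h).

    P(D) is obtained by summing P(D|h) P(h) over H, which assumes the
    hypotheses are mutually exclusive and exhaustive.
    """
    evidence = sum(likelihoods[h] * priors[h] for h in priors)  # P(D)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

def map_hypothesis(priors, likelihoods):
    """Most probable hypothesis given D; P(D) is common to all h, so it is dropped."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

# Made-up numbers: h1 is a priori more likely, but h2 explains D much better
priors = {"h1": 0.7, "h2": 0.3}
likelihoods = {"h1": 0.1, "h2": 0.5}
print(posteriors(priors, likelihoods))
print(map_hypothesis(priors, likelihoods))  # h2
```

Because P(D) is the same for every hypothesis, picking the most probable one only needs the numerator P(D|h)P(h); the normalisation is needed only when the posterior values themselves are wanted.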


Is the Pope the Pope?

The chances that a randomly chosen human being is the Pope are about 1 in 6 billion

Benedict XVI is the Pope

What are the chances that Benedict XVI is human? (Beck-Bornholdt and Dubben, 1996)


By analogy with syllogistic reasoning, the answer would be: 1 in 6 billion


So, Is the Pope an Alien?

Where is the trick?

Probability of the data given a hypothesis H: P(D|H)

Probability of the hypothesis given the data: P(H|D)

P(D|H) is different from P(H|D)

So, is the Pope an alien?

Probability of being an alien: P(Alien)

Probability of being human: P(Human)

Probability that the Pope is an alien:

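$$P(\mathrm{Alien} \mid \mathrm{Pope}) = \frac{P(\mathrm{Pope} \mid \mathrm{Alien})\,P(\mathrm{Alien})}{P(\mathrm{Pope} \mid \mathrm{Human})\,P(\mathrm{Human}) + P(\mathrm{Pope} \mid \mathrm{Alien})\,P(\mathrm{Alien})}$$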

So, Is the Pope an Alien?

What’s missing?

P(Pope|Alien)

P(Human)

P(Alien)

Considering

Low values of P(Alien) and P(Pope|Alien)

And large values of P(Human)

We could “probably” say that the pope is not an alien!
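To see why the missing quantities matter, the two-hypothesis formula can be evaluated numerically. Only P(Pope|Human) ≈ 1 in 6 billion comes from the slides; P(Alien) and P(Pope|Alien) are hypothetical values chosen for illustration, and the sketch assumes Human and Alien are the only two options.

```python
def p_alien_given_pope(p_pope_given_human, p_pope_given_alien, p_alien):
    """Bayes' theorem for the two-hypothesis case {Human, Alien}."""
    p_human = 1.0 - p_alien  # assumes Human and Alien are the only two options
    num = p_pope_given_alien * p_alien
    den = p_pope_given_human * p_human + num
    return num / den

P_POPE_GIVEN_HUMAN = 1 / 6e9  # from the slide: about 1 in 6 billion
# Hypothetical values, chosen only for illustration
P_ALIEN = 1e-12
P_POPE_GIVEN_ALIEN = 1e-6

p = p_alien_given_pope(P_POPE_GIVEN_HUMAN, P_POPE_GIVEN_ALIEN, P_ALIEN)
print(f"P(Alien|Pope) = {p:.2e}")
print(f"P(Human|Pope) = {1 - p:.10f}")
```

With these assumed inputs P(Human|Pope) stays extremely close to 1, even though P(Pope|Human) is tiny: the small likelihood is divided by an evidence term that is equally small.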


More Examples: Monty Hall

Stick or switch

Stick or Switch

I chose door number 3

Door 2 is uncovered and contains a sheep

They give me the chance to change the door. Should I?

Use probability, not faith, to give an answer!


Stick or Switch

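I should switch!

Assuming the host knows where the prize is and always opens a door hiding a sheep, switching wins with probability 2/3, while sticking wins only with probability 1/3.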
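The answer can be checked both exactly, with Bayes' theorem, and by simulation. This sketch assumes the standard host behaviour: the host never opens the contestant's door or the prize door, and picks uniformly at random when two doors are available. The function names are hypothetical.

```python
import random
from fractions import Fraction

DOORS = (1, 2, 3)

def host_likelihood(opened, car, chosen):
    """P(host opens `opened` | car location, contestant's choice)."""
    # Assumed host behaviour: never opens the chosen door or the car door,
    # and picks uniformly among the doors that remain.
    options = [d for d in DOORS if d != chosen and d != car]
    return Fraction(1, len(options)) if opened in options else Fraction(0)

def posterior_car(chosen, opened):
    """P(car behind each door | contestant's choice, door the host opened)."""
    prior = Fraction(1, 3)
    unnorm = {car: host_likelihood(opened, car, chosen) * prior for car in DOORS}
    evidence = sum(unnorm.values())
    return {car: p / evidence for car, p in unnorm.items()}

def simulate(n_games, switch, rng):
    """Fraction of games won by always sticking or always switching."""
    wins = 0
    for _ in range(n_games):
        car = rng.choice(DOORS)
        chosen = rng.choice(DOORS)
        opened = rng.choice([d for d in DOORS if d != chosen and d != car])
        if switch:
            chosen = next(d for d in DOORS if d != chosen and d != opened)
        wins += chosen == car
    return wins / n_games

print(posterior_car(chosen=3, opened=2))  # door 1: 2/3, door 2: 0, door 3: 1/3
rng = random.Random(0)
print(simulate(100_000, switch=True, rng=rng), simulate(100_000, switch=False, rng=rng))
```

The exact posterior and the simulated win rates agree: after door 2 is opened, the prize is twice as likely to be behind door 1 as behind door 3.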

Yet Another Example: The Defendant’s Fallacy

The story of a murder

A suspect was caught

DNA test was positive

The DNA test fails only once in 1 million times

So, my suspect must be guilty, right?

More specifically, the suspect is guilty with p = 0.999999. Agree?


The Defendant’s Fallacy

Where is the trick now?

P(coincides | innocent) as opposed to P(innocent | coincides)

P(coincides | innocent) is commonly misused as the probability of being innocent

P(innocent | coincides) is the actual probability of being innocent, given that the test was positive!

Does this really matter?

Let’s assume a city of 10 million inhabitants

We apply the test to all the 10 million inhabitants

How many of them will test positive? About 10 (10,000,000 × 1/1,000,000)


The Defendant’s Fallacy

Two arguments

The prosecutor: there is a probability of 0.000001 that the suspect is innocent

The defendant: in this city of 10M people, the probability of the suspect being innocent is approximately 90%

Who is right?

The defendant

Proof? You do the math
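One way to do the math, as a sketch under explicit assumptions: exactly one culprit lives in the city, the culprit always tests positive, each innocent person tests positive independently with probability 10⁻⁶ (the prosecutor's number, P(coincides | innocent)), and there is no other evidence, so the prior probability of guilt is 1/population. The function name is hypothetical.

```python
from fractions import Fraction

def p_innocent_given_match(population, false_match_rate):
    """P(innocent | test positive) for a suspect found by testing everyone.

    Assumptions (illustrative): exactly one culprit lives in the city, the
    culprit always tests positive, every innocent person tests positive
    independently with probability `false_match_rate`, and there is no other
    evidence, so the prior P(guilty) is 1 / population.
    """
    p_guilty = Fraction(1, population)
    p_innocent = 1 - p_guilty
    num = false_match_rate * p_innocent      # P(coincides | innocent) P(innocent)
    den = num + Fraction(1) * p_guilty       # + P(coincides | guilty) P(guilty)
    return num / den

p = p_innocent_given_match(10_000_000, Fraction(1, 1_000_000))
print(float(p))  # roughly 0.909: about 10 innocent matches for every guilty one
```

The result, 9,999,999/10,999,999 ≈ 0.909, matches the defendant's figure, while the prosecutor's 0.000001 is P(coincides | innocent), not P(innocent | coincides).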


Next Class

How we can use these concepts in machine learning
