Lecture9 - Bayesian-Decision-Theory

Introduction to Machine Learning. Lecture 9: Bayesian decision theory – An introduction. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.

Transcript of Lecture9 - Bayesian-Decision-Theory

Page 1: Lecture9 - Bayesian-Decision-Theory

Introduction to Machine Learning

Lecture 9: Bayesian decision theory – An introduction

Albert Orriols i Puig (aorriols@salle.url.edu)

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture9 - Bayesian-Decision-Theory

Recap of Lectures 5-8

LET’S START WITH DATA CLASSIFICATION

Slide 2Artificial Intelligence Machine Learning

Page 3: Lecture9 - Bayesian-Decision-Theory

Recap of Lectures 5-8

We want to build decision trees

How can I automatically generate these types of trees?

Decide which attribute we should put in each node

Decide a split point

Rely on information theory

We also saw many other improvements
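As a reminder of how information theory guides the choice of attribute, here is a minimal sketch of entropy and information gain for categorical attributes. The toy data and the names `entropy` and `information_gain` are illustrative assumptions, not material from the earlier lectures.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data set: two attributes, binary class.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

# The attribute with the highest gain would be placed in the node.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy set, `outlook` separates the classes perfectly while `windy` tells us nothing, so `outlook` is selected.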

Page 4: Lecture9 - Bayesian-Decision-Theory

Recap of Lectures 5-8

From kNN to CBR

[Figure: 15-NN vs. 1-NN]

Key aspects

Value of k

Distance functions
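The two key aspects can be made concrete with a small sketch of a kNN classifier in which both `k` and the distance function are parameters. The data points are hypothetical and chosen so that a single noisy neighbour changes the 1-NN answer.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k, distance=euclidean):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (point, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set; (0.45, 0.45) acts as a noisy "b" example.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((0.45, 0.45), "b"),
]
query = (0.4, 0.4)

print(knn_predict(train, query, k=1))  # follows the single nearest (noisy) point
print(knn_predict(train, query, k=3))  # a larger k smooths the decision
```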

Page 5: Lecture9 - Bayesian-Decision-Theory

Today’s Agenda

Could we use probability to classify?

Where it all began

Some anecdotes on the correct use of probabilities

Page 6: Lecture9 - Bayesian-Decision-Theory

Why Bother about Prob.?

The world is a very uncertain place

Almost 40 years of AI and ML dealing with uncertain domains

Some researchers decided to employ ideas from probability to model concepts

Before saying more… let’s go to the beginning

Page 7: Lecture9 - Bayesian-Decision-Theory

Meeting the Reverend Thomas Bayes

Two main works:

Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731)

An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of the Analyst (published anonymously in 1736)

But we are especially interested in:

Essay Towards Solving a Problem in the Doctrine of Chances (1764), which was actually published posthumously by Richard Price

Page 8: Lecture9 - Bayesian-Decision-Theory

Where Did These Ideas Come From?

Bayes built his theory upon several ideas

Immanuel Kant (1724-1804)

Copernican revolution: our understanding of the external world had its foundations not merely in experience, but in both experience and a priori concepts, thus offering a non-empiricist critique of rationalist philosophy

Isaac Newton (1643-1727)

Universal gravitation

Three laws of motion, which dominated the scientific view of the physical universe for the next three centuries

Page 9: Lecture9 - Bayesian-Decision-Theory

What Was Bayes’ Point?

Bayesian probability

Notion of probability interpreted as partial belief rather than as frequency

Bayesian estimation

Calculate the validity of a proposition

On the basis of a prior estimate of its probability and new relevant evidence

E.g.:

Before Bayes, forward probability

Given a specified number of white and black balls in an urn, what is the probability of drawing a black ball?

Bayes turned his attention to the converse problem

Given that one or more balls have been drawn, what can be said about the number of white and black balls in the urn?
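The converse urn problem can be sketched in a few lines. The set-up is an assumption made for illustration: the urn holds 4 balls, every number of black balls from 0 to 4 is equally likely a priori, and balls are drawn with replacement. The function name `urn_posterior` is hypothetical.

```python
from fractions import Fraction


def urn_posterior(draws, n_balls=4):
    """Posterior over the number of black balls, given the observed draws.

    Assumes a uniform prior and drawing with replacement.
    """
    prior = Fraction(1, n_balls + 1)
    unnormalised = {}
    for black in range(n_balls + 1):
        p_black = Fraction(black, n_balls)
        likelihood = Fraction(1)
        for colour in draws:
            likelihood *= p_black if colour == "black" else 1 - p_black
        unnormalised[black] = likelihood * prior
    evidence = sum(unnormalised.values())  # probability of the observed draws
    return {black: value / evidence for black, value in unnormalised.items()}


posterior = urn_posterior(["black", "black", "white"])
for black, probability in posterior.items():
    print(black, probability)
```

Under these assumptions, after drawing black, black, white the posterior favours three black balls (9/20), followed by two (2/5) and one (3/20). Urns with zero or four black balls are ruled out by the data.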

Page 10: Lecture9 - Bayesian-Decision-Theory

Bayes’ Theorem

Lets us find the most probable hypothesis h ∈ H, given the data D plus knowledge about the prior probabilities of the hypotheses in H

Terminology:

P(h|D): probability that h holds given data D. Posterior probability of h; confidence that h holds given D.

P(h): prior probability of h (our background knowledge about whether h is a correct hypothesis)

P(D): prior probability that training data D will be observed

P(D|h): probability of observing D given h holds

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
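A minimal numeric sketch of the theorem follows. All probabilities are hypothetical values chosen for illustration, and P(D) is obtained from the law of total probability over h and its complement.

```python
def bayes_posterior(p_h, p_d_given_h, p_d):
    """P(h|D) = P(D|h) * P(h) / P(D)."""
    return p_d_given_h * p_h / p_d


# Hypothetical numbers, not taken from the lecture.
p_h = 0.01              # prior P(h)
p_d_given_h = 0.9       # likelihood P(D|h)
p_d_given_not_h = 0.05  # likelihood of the data when h does not hold

# P(D) by total probability: P(D|h)P(h) + P(D|not h)P(not h).
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

p_h_given_d = bayes_posterior(p_h, p_d_given_h, p_d)
print(round(p_h_given_d, 4))
```

Even with a likelihood of 0.9, the posterior stays well below 0.5 here because the prior is so small. The anecdotes later in the lecture rely on the same effect.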

Page 11: Lecture9 - Bayesian-Decision-Theory

Bayes’ Theorem

Given H, the space of possible hypotheses

The most probable hypothesis is the one that maximizes P(h|D):

$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
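P(D) can be dropped in the last step because it does not depend on h. The sketch below illustrates this with a hypothetical three-hypothesis space. The priors and likelihoods are made-up values, and `map_hypothesis` is an illustrative name.

```python
def map_hypothesis(priors, likelihoods):
    """Return the h maximising P(D|h) * P(h); P(D) is not needed."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


# Hypothetical values for a three-hypothesis space H.
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}       # P(h)
likelihoods = {"h1": 0.1, "h2": 0.4, "h3": 0.5}  # P(D|h)

h_map = map_hypothesis(priors, likelihoods)

# Normalising by P(D) rescales every score equally, so the argmax is unchanged.
p_d = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / p_d for h in priors}
h_map_normalised = max(posteriors, key=posteriors.get)

# For comparison: the hypothesis with the highest likelihood alone.
h_ml = max(likelihoods, key=likelihoods.get)

print(h_map, h_map_normalised, h_ml)
```

With these numbers the MAP hypothesis is `h2`, whereas the likelihood alone would pick `h3`. The prior is what makes the difference.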

Page 12: Lecture9 - Bayesian-Decision-Theory

Is the Pope the Pope?

The chances that a randomly chosen human being is the Pope are about 1 in 6 billion

Benedict XVI is the Pope

What are the chances that Benedict XVI is human? (Beck-Bornholdt and Dubben, 1996)

By analogy to syllogistic reasoning: 1 in 6 billion

Page 13: Lecture9 - Bayesian-Decision-Theory

So, Is the Pope an Alien?

Where is the trick?

Probability of the data given a hypothesis H: P(D|H)

Probability of the hypothesis given the data: P(H|D)

P(D|H) is different from P(H|D)

So, is the Pope an alien?

Probability of being an alien: P(Alien)

Probability of being human: P(Human)

Probability that the Pope is an alien:

$$P(\text{Alien} \mid \text{Pope}) = \frac{P(\text{Pope} \mid \text{Alien})\,P(\text{Alien})}{P(\text{Pope} \mid \text{Human})\,P(\text{Human}) + P(\text{Pope} \mid \text{Alien})\,P(\text{Alien})}$$

Page 14: Lecture9 - Bayesian-Decision-Theory

So, Is the Pope an Alien?

What’s missing?

P(Pope|Alien)

P(Human)

P(Alien)

Considering

Low values of P(Alien) and P(Pope|Alien)

And large values of P(Human)

We could “probably” say that the pope is not an alien!
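To see how small the posterior becomes, one can plug in some numbers. Only P(Pope|Human) = 1 in 6 billion comes from the slides. The values for P(Alien) and P(Pope|Alien) are pure assumptions chosen to be "low", as suggested above.

```python
# From the slides: a random human is the Pope with probability 1 in 6 billion.
p_pope_given_human = 1 / 6e9

# Hypothetical values (assumptions for illustration only).
p_alien = 1e-6              # prior probability that a random being is an alien
p_pope_given_alien = 1e-12  # probability that an alien is the Pope

p_human = 1 - p_alien

# Bayes' theorem with the evidence expanded by total probability.
numerator = p_pope_given_alien * p_alien
evidence = p_pope_given_human * p_human + numerator
p_alien_given_pope = numerator / evidence

print(p_alien_given_pope)
```

Under these assumed values the probability that the Pope is an alien is of the order of a few in a billion, which supports the "probably not" conclusion.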

Page 15: Lecture9 - Bayesian-Decision-Theory

More examples: Monty Hall

Stick or switch

Page 16: Lecture9 - Bayesian-Decision-Theory

Stick or Switch

I chose door number 3

Door 2 is uncovered and contains a sheep

They give me the chance to change the door. Should I?

Use probability, not faith, to give an answer!

Page 17: Lecture9 - Bayesian-Decision-Theory

Stick or Switch

I should switch!
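The answer can be checked empirically. The simulation below is a sketch under the usual Monty Hall assumptions: the prize is placed uniformly at random, and the host always opens a door that is neither the contestant's pick nor the prize. Function names and the number of trials are illustrative.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that hides a sheep and is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of games won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stick_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stick: {stick_rate:.3f}  switch: {switch_rate:.3f}")
```

Sticking wins only when the first pick was right (probability 1/3), so switching wins in the remaining 2/3 of the games. The simulated rates should land close to those values.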

Page 18: Lecture9 - Bayesian-Decision-Theory

Yet Another Example: The Defendant’s Fallacy

The story of a murder

A suspect was caught

The DNA test was positive

The DNA test fails only 1 time in 1 million

So, my suspect must be guilty, right?

More specifically, he will be guilty with p = 0.999999. Agree?

Page 19: Lecture9 - Bayesian-Decision-Theory

The Defendant’s Fallacy

Where is the trick now?

P(coincides | innocent) as opposed to P(innocent | coincides)

P(coincides | innocent) is commonly misused as the probability of being innocent

P(innocent | coincides) is the probability of being innocent given that the test was positive!

Does this really matter?

Let’s assume a city of 10 million inhabitants

We apply the test to all the 10 million inhabitants

How many of them will be positive? About 10

Page 20: Lecture9 - Bayesian-Decision-Theory

The Defendant’s Fallacy

Two arguments

The prosecutor: There is a probability of 0.000001 that the suspect is innocent

The defendant: In this city of 10M people, the probability of the suspect being innocent is approximately 90%

Who is right?

The defendant

Proof of that? You do the math
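One way to do the math is sketched below. The population size and the error rate come from the slides. It is assumed, for illustration, that exactly one inhabitant is guilty, that the test is always positive for the guilty person, and that before the test every inhabitant is equally likely to be the culprit.

```python
from fractions import Fraction

population = 10_000_000                      # inhabitants of the city (from the slides)
false_positive_rate = Fraction(1, 1_000_000) # P(coincides | innocent) (from the slides)

# Assumptions: one guilty inhabitant, who always tests positive.
guilty = 1
innocent = population - guilty

expected_false_positives = innocent * false_positive_rate  # roughly 10 people
expected_true_positives = guilty

# P(innocent | coincides): share of positive tests that belong to innocent people.
p_innocent_given_positive = expected_false_positives / (
    expected_false_positives + expected_true_positives
)

print(float(expected_false_positives))
print(float(p_innocent_given_positive))
```

Roughly 10 of about 11 expected positives are innocent, so P(innocent | coincides) is about 0.91. That is in line with the defendant's "approximately 90%" and very far from the prosecutor's 0.000001.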

Page 21: Lecture9 - Bayesian-Decision-Theory

Next Class

How we can use these concepts in machine learning
