Transcript of Lecture 9 - Bayesian Decision Theory
Introduction to Machine Learning
Lecture 9: Bayesian decision theory – An introduction
Albert Orriols i Puig ([email protected])
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lectures 5-8
LET'S START WITH DATA CLASSIFICATION
Recap of Lectures 5-8
We want to build decision trees
How can I automatically generate these types of trees?
Decide which attribute we should put in each node
Decide a split point
Rely on information theory
We also saw many other improvements
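As a quick refresher (this sketch is not on the original slide), the information-theoretic criterion behind those decisions can be computed as follows: pick the attribute and split point whose partition of the data yields the largest information gain. The toy labels below are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into the given `groups`."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Toy example: a binary attribute that separates the two classes perfectly
labels = ['yes', 'yes', 'yes', 'no', 'no', 'no']
split = [['yes', 'yes', 'yes'], ['no', 'no', 'no']]
print(information_gain(labels, split))  # 1.0 bit of information gained
```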
Recap of Lectures 5-8
From kNN to CBR
[Figure: decision boundaries obtained with 15-NN vs. 1-NN]
Key aspects:
Value of k
Distance functions
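A minimal sketch (not from the slides) of those two key aspects, assuming a plain Euclidean distance and a majority vote over the k nearest neighbours; the toy training points are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k):
    """Majority vote among the k training examples (point, label) closest to `query`."""
    nearest = sorted(train, key=lambda example: euclidean(example[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
         ((2.6, 2.5), 'B'), ((3.8, 4.0), 'B'), ((4.0, 4.2), 'B')]
print(knn_predict(train, (1.5, 1.5), k=1))  # 'A': the single nearest neighbour decides
print(knn_predict(train, (1.5, 1.5), k=5))  # 'B': a larger k smooths the decision
```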
Today’s Agenda
Could we use probability to classify?
Where it all began
Some anecdotes on the correct use of probabilities
Why Bother about Probability?
The world is a very uncertain place
Almost 40 years of AI and ML dealing with uncertain domains
Some researchers decided to employ ideas from probability to model concepts
Before saying more… let's go to the beginning
Meeting the Reverend Thomas Bayes
Two main works:
Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731)
An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of the Analyst (published anonymously in 1736)
But we are especially interested in:
Essay Towards Solving a Problem in the Doctrine of Chances (1764), which was actually published posthumously by Richard Price
Where Did These Ideas Come From?
Bayes built his theory upon several ideas
Immanuel Kant (1724-1804)
Copernican revolution: our understanding of the external world had its foundations not merely in experience, but in both experience and a priori concepts, thus offering a non-empiricist critique of rationalist philosophy
Isaac Newton (1643-1727)
Universal gravitation
Three laws of motion, which dominated the scientific view of the physical universe for the next three centuries
What Was Bayes' Point?
Bayesian probability
Notion of probability interpreted as partial belief rather than as frequency
Bayesian estimation
Calculate the validity of a proposition
On the basis of a prior estimate of its probability and new relevant evidence
E.g.: before Bayes, forward probability:
Given a specified number of white and black balls in an urn, what is the probability of drawing a black ball?
Bayes turned his attention to the converse problem: given that one or more balls have been drawn, what can be said about the number of white and black balls in the urn?
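To make the converse problem concrete, here is a small hypothetical sketch: an urn holds 3 balls, each black or white, all four compositions assumed equally likely a priori, and we update our belief after observing a few draws (with replacement).

```python
compositions = [0, 1, 2, 3]               # possible numbers of black balls in the urn
prior = {c: 1 / 4 for c in compositions}  # assumed uniform prior over compositions
draws = ['black', 'black', 'white']       # hypothetical observations

def likelihood(c, draws, n=3):
    """Probability of the observed draws (with replacement) if c of the n balls are black."""
    p_black = c / n
    prob = 1.0
    for d in draws:
        prob *= p_black if d == 'black' else 1 - p_black
    return prob

unnormalised = {c: likelihood(c, draws) * prior[c] for c in compositions}
evidence = sum(unnormalised.values())
posterior = {c: p / evidence for c, p in unnormalised.items()}
print(posterior)  # the 2-black-ball composition is now the most probable one
```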
Bayes' Theorem
Outputs the most probable hypothesis h∈H, given the data D plus knowledge about prior probabilities of hypotheses in H
Terminology:
P(h|D): probability that h holds given data D. Posterior probability of h; confidence that h holds given D.
P(h): prior probability of h (background knowledge we have about h being a correct hypothesis)
P(D): prior probability that training data D will be observed
P(D|h): probability of observing D given h holds
P(h|D) = P(D|h) P(h) / P(D)
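Read as code, the theorem is a one-liner; the numbers below are purely hypothetical and only illustrate how observing D updates the prior P(h) into the posterior P(h|D).

```python
def posterior(p_d_given_h, p_h, p_d):
    """Bayes' theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return p_d_given_h * p_h / p_d

p_h = 0.1                      # P(h): prior belief in the hypothesis (hypothetical)
p_d_given_h = 0.8              # P(D|h): how well h explains the data (hypothetical)
p_d = 0.8 * 0.1 + 0.2 * 0.9    # P(D): law of total probability over h and its complement
print(posterior(p_d_given_h, p_h, p_d))  # ≈ 0.31: the data raised our confidence in h
```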
Bayes’ Theorem
Given H, the space of possible hypotheses
The most probable hypothesis is the one that maximizes P(h|D):
h_MAP ≡ argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h) P(h) / P(D) = argmax_{h∈H} P(D|h) P(h)
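A minimal sketch of MAP selection under hypothetical priors and likelihoods; note that P(D) is identical for every hypothesis, so it drops out of the argmax, as in the last equality above.

```python
# Hypothetical hypothesis space with priors P(h) and likelihoods P(D|h)
hypotheses = {
    'h1': {'prior': 0.70, 'likelihood': 0.05},
    'h2': {'prior': 0.25, 'likelihood': 0.40},
    'h3': {'prior': 0.05, 'likelihood': 0.90},
}

# P(D) does not depend on h, so argmax P(D|h)P(h)/P(D) = argmax P(D|h)P(h)
h_map = max(hypotheses, key=lambda h: hypotheses[h]['likelihood'] * hypotheses[h]['prior'])
print(h_map)  # 'h2': 0.40 * 0.25 = 0.100 beats 0.035 (h1) and 0.045 (h3)
```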
Is the Pope the Pope?
The chances that a randomly chosen human being is the Pope are about 1 in 6 billion
Benedict XVI is the Pope
What are the chances that Benedict XVI is human? (Beck-Bornholdt and Dubben, 1996)
Analogy to syllogistic reasoning: 1 in 6 billion
So, Is the Pope an Alien?
Where is the trick?
Probability of the data given a hypothesis H: P(D|H)
Probability of the hypothesis given the data: P(H|D)
P(D|H) is different from P(H|D)
So, is the Pope an alien?
Probability of being an alien: P(Alien)
Probability of being human: P(Human)
Probability that the Pope is an alien:
P(Alien|Pope) = P(Pope|Alien) P(Alien) / (P(Pope|Alien) P(Alien) + P(Pope|Human) P(Human))
So, Is the Pope an Alien?
What's missing?
P(Pope|Alien)
P(Human)
P(Alien)
Considering:
Low values of P(Alien) and P(Pope|Alien)
And large values of P(Human)
We could “probably” say that the pope is not an alien!
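Plugging hypothetical numbers into the formula above (the only value taken from the slide is the 1-in-6-billion chance of being the Pope given that one is human) makes the point numerically:

```python
p_alien = 1e-12               # P(Alien): assumed tiny prior
p_human = 1.0 - p_alien       # P(Human)
p_pope_given_alien = 1e-9     # P(Pope|Alien): assumed, also very small
p_pope_given_human = 1 / 6e9  # P(Pope|Human): about 1 in 6 billion, as on the slide

p_alien_given_pope = (p_pope_given_alien * p_alien) / (
    p_pope_given_alien * p_alien + p_pope_given_human * p_human)
print(p_alien_given_pope)  # ≈ 6e-12: the Pope is almost certainly not an alien
```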
More examples: Monty Hall
Stick or switch?
Stick or Switch
I chose door number 3
Door 2 is uncovered and contains a sheep
They give me the chance to change the door. Should I?
Use probability, not faith, to give an answer!
Stick or Switch
Sticking wins only if my initial pick was right (probability 1/3); switching wins whenever it was wrong (probability 2/3).
I should switch!
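A quick simulation (not part of the original slides) that confirms the switch strategy under the standard Monty Hall assumptions:

```python
import random

def play(switch, doors=3, trials=100_000):
    """Simulate Monty Hall and return the fraction of games won."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(doors)
        pick = random.randrange(doors)
        # The host opens a door that hides no car and is not the contestant's pick
        opened = random.choice([d for d in range(doors) if d not in (pick, car)])
        if switch:
            pick = next(d for d in range(doors) if d not in (pick, opened))
        wins += (pick == car)
    return wins / trials

print(play(switch=False))  # ≈ 1/3
print(play(switch=True))   # ≈ 2/3
```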
Yet Another Example: The Defendant’s Fallacy
The story of a murder:
A suspect was caught
DNA test was positive
The DNA test fails only 1 in 1 million times
So, my suspect must be guilty, right? More specifically, the suspect would be guilty with p = 0.999999. Agree?
The Defendant's Fallacy
Where is the trick now?
P(coincides | innocent) as opposed to P(innocent | coincides)
P(coincides | innocent), the probability that the DNA matches given that the person is innocent, is commonly misused as the probability of being innocent
P(innocent | coincides) is the actual probability of being innocent given that the test was positive!
Does this really matter?
Let's assume a city of 10 million inhabitants
We apply the test to all the 10 million inhabitants
How many of them will test positive? About 10
The Defendant's Fallacy
Two arguments
The prosecutor: there is a probability of 0.000001 that the suspect is innocent
The defendant: in this city of 10M people, the probability of the suspect being innocent is approximately 90%
Who is right? The defendant
Proof? You do the math (a sketch follows below)
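The math, sketched in code; the only inputs are the figures from the slides (10 million inhabitants, a 1-in-a-million test failure rate) plus the assumption that the one true culprit is among them:

```python
population = 10_000_000
false_match_rate = 1e-6

innocent_matches = (population - 1) * false_match_rate  # ≈ 10 innocent people match
guilty_matches = 1                                      # the actual culprit always matches

p_innocent_given_match = innocent_matches / (innocent_matches + guilty_matches)
print(p_innocent_given_match)  # ≈ 0.91: absent other evidence, the suspect is probably innocent
```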
Next Class
How we can use these concepts in machine learning