CS1800 Probability 3: Bayes’ Rule · The Uses of Bayes’ Rule • Bayes’ Rule or Bayes’...

CS1800 Probability 3: Bayes’ Rule Professor Kevin Gold


CS1800 Probability 3: Bayes’ Rule

Professor Kevin Gold

Brief Review of Probability - Counting Outcomes and Axioms

• Recall from before that we can calculate probabilities by counting “success” outcomes and dividing by the total number of outcomes.

• Chance that a deck of 5 cards numbered 1-5 ends up in order after shuffling: 1/5! = 1/120

• Chance that it’s in order or reverse order: 2/120 = 1/60

• Much of the discussion today will just take some probabilities as a given, like “you think there’s a 1% chance your friend is lying.” This is fine too, even though we’re not counting anything, as long as we obey the axioms of probability:

• Number in range [0,1], generally representing our degree of belief

• Union of all outcomes has probability 1

• Pr(A v B) = Pr(A) + Pr(B) - Pr(A ^ B)
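The card-deck numbers above can be checked by brute-force counting. This is a minimal sketch, not part of the original slides; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

cards = (1, 2, 3, 4, 5)
orderings = list(permutations(cards))  # all 5! = 120 equally likely shuffles

# Count the "success" outcomes for each event.
in_order = [p for p in orderings if p == cards]
in_order_or_reversed = [p for p in orderings if p in (cards, cards[::-1])]

# Probability = successes / total outcomes.
p_in_order = Fraction(len(in_order), len(orderings))
p_either = Fraction(len(in_order_or_reversed), len(orderings))
print(p_in_order, p_either)  # 1/120 1/60
```

The two events "in order" and "in reverse order" cannot both happen, so Pr(A ^ B) = 0 and the union rule reduces to a simple sum.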


Brief Review of Probability - Independence

• Events are independent if learning the outcome of one does not affect our belief about the other

• Two rolls of a die, events happening far from each other…

• The definition of independence also tells us something useful we can do: Pr(A ^ B) = Pr(A)Pr(B) for all outcomes of A and B iff A and B are independent

• Probability that two rolls of a 6-sided die are both 6: (1/6)*(1/6) = 1/36, same result as counting 1 success (6,6) out of 36 roll pairs

• Pr(6-sided die is even) = 1/2 and Pr(6-sided die is odd) = 1/2, but Pr(particular 6-sided die roll is both even and odd) = 0, so those events are not independent (if they’re about the same roll)


Brief Review of Probability - Conditional Probability

• Pr(A | B) is the probability that event A happens given that we know B is true

• Pr(rains today | dark clouds in the sky) > Pr(rains today)

• The definition is Pr(A | B) = Pr(A ^ B)/Pr(B)

• We’re shrinking the sample space to include just outcomes where B is true

• Pr(exactly 2 out of 3 coin flips heads | at least one head flipped) = (C(3,2)/2³)/(1 − (1/2)³) = (3/8)/(7/8) = 3/7

• Which we could also get by listing the actual at-least-one-H outcomes: TTH, THT, HTT, THH, HTH, HHT, HHH (3 of the 7 have exactly two heads)

• From the definition, it follows that Pr(A ^ B) = Pr(A|B)Pr(B), which is a generalization of the multiplication rule used for independent events

• Pr (rains today ^ dark clouds) = Pr(dark clouds)Pr(rains today | dark clouds)

• We can calculate the probability of both events happening even though they aren’t independent, as long as we know the conditional probability
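The coin-flip example can be verified directly from the definition Pr(A | B) = Pr(A ^ B)/Pr(B) by enumerating all eight outcomes. This is a small sketch; the helper names `pr`, `exactly_two_heads`, and `at_least_one_head` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=3))  # 8 equally likely outcomes, e.g. ('H', 'T', 'H')

def pr(event):
    """Probability of an event by counting outcomes where it holds."""
    return Fraction(sum(1 for o in flips if event(o)), len(flips))

def exactly_two_heads(o):
    return o.count("H") == 2

def at_least_one_head(o):
    return "H" in o

p_b = pr(at_least_one_head)                                              # Pr(B)
p_a_and_b = pr(lambda o: exactly_two_heads(o) and at_least_one_head(o))  # Pr(A ^ B)
p_a_given_b = p_a_and_b / p_b                                            # Pr(A | B)
print(p_a_given_b)  # 3/7
```

Multiplying back, `p_a_given_b * p_b` recovers Pr(A ^ B), which is the generalized multiplication rule from the last bullets.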

The Uses of Bayes’ Rule

• Bayes’ Rule or Bayes’ Theorem is a powerful mathematical tool that allows you to decide which of several explanations for some data is best

• It explains how to combine:

• your prior degree of belief in each hypothesis (probabilities)

• some evidence

• a model of how likely it is that each hypothesis would produce the available evidence (conditional probabilities)

• And arrive at a posterior likelihood of each hypothesis, in light of the data


Specific Applications of Bayes’ Rule

Medical Diagnosis

• Evidence: Symptoms

• Hypotheses: Underlying diseases

Speech Recognition

• Evidence: Sound signal

• Hypotheses: Words

Robot Navigation

• Evidence: Local images and depth camera results

• Hypotheses: Locations

Political Polling

• Evidence: Multiple polls

• Hypotheses: True underlying likely-voter counts

Bayes’ Rule in Lay Terms

• The likelihood of the hypothesis in light of the evidence is proportional to two factors:

• Our prior degree of belief in the hypothesis, before we obtained the evidence

• The likelihood that we would see this evidence if the hypothesis were true

• For each competing hypothesis, we can multiply these two factors and compare the results to determine which hypothesis is the most likely explanation.

Bayes’ Rule as a Formula

• The same statement in symbols:

Pr(hypothesis | evidence) ∝ Pr(hypothesis)*Pr(evidence|hypothesis)

• Here ∝ means “proportional to”, Pr(hypothesis) is the prior, and Pr(evidence|hypothesis) is the likelihood of the evidence.

Bayes’ Rule as an Equality

• The “proportional to” symbol means that the left side is equal to the right, multiplied by some constant α.

• This constant is the same for all hypotheses, which lets us compare the results of the prior*likelihood calculation across different hypotheses.

Pr(hypothesis | evidence) = α*Pr(hypothesis)*Pr(evidence|hypothesis)

Bayes’ Rule as an Equality

• We can be more precise about the equation and remove the “proportional to”: we know what the scaling factor is (it’s 1/Pr(evidence))

• But for many practical purposes, we don’t know this factor at first, and we may not need to calculate it exactly

Pr(hypothesis | evidence) = Pr(hypothesis)*Pr(evidence|hypothesis) / Pr(evidence)


Bayes’ Rule as an Abstraction

• Lastly, we don’t particularly need to talk about hypotheses and evidence — Bayes’ Rule is true generally

• However, replacing everything with A’s and B’s makes it much less clear what we’re usually trying to do

Pr(A | B) = Pr(A)*Pr(B|A) / Pr(B)


A Quick Proof of the Abstract Version of Bayes’ Rule

• Bayes’ rule follows naturally from the definition of conditional probability.

• Pr(A ^ B) = Pr(A | B) Pr(B) = Pr(B | A)Pr(A) by the definition of conditional probability

• Drop the Pr(A ^ B) and divide both sides by Pr(B) to get Pr(A | B) = Pr(B | A)Pr(A) / Pr(B) (which is Bayes’ rule)


Rewind a Bit

• I think the best balance of being memorable and accurate is the second way:

Pr(hypothesis | evidence) ∝ Pr(hypothesis)*Pr(evidence|hypothesis)

• In words: the posterior is proportional to the prior times the likelihood of the evidence (the first probability “flipped”)


Using Bayes’ Rule: A Basic Example

• I just rolled a die that was drawn at random from a bag containing an equal number of 6-sided dice (numbered 1-6) and 20-sided dice (numbered 1-20). I tell you the die roll is a 6. Which hypothesis is more likely — that it’s 6-sided, or 20-sided?

• Pr(“6” | 6-sided die) = 1/6

• Pr(“6” | 20-sided die) = 1/20

• Pr(6-sided die | “6”) ∝ Pr(“6” | 6-sided)Pr(6-sided) = 1/6 * 1/2 = 1/12

• Pr(20-sided | “6”) ∝ Pr(“6” | 20-sided)Pr(20-sided) = 1/20 * 1/2 = 1/40

• The die is more likely to be 6-sided since 1/12 > 1/40.

Changing the Prior

• Suppose that the bag contained four 20-sided dice and just two 6-sided dice before I drew a die and rolled a “6”. Which die is more likely now?

• Pr(“6” | 6-sided die) = 1/6

• Pr(“6” | 20-sided die) = 1/20

• Pr(6-sided die | “6”) ∝ Pr(“6” | 6-sided)Pr(6-sided) = 1/6 * 1/3 = 1/18

• Pr(20-sided | “6”) ∝ Pr(“6” | 20-sided)Pr(20-sided) = 1/20 * 2/3 = 1/30

• Despite the larger number of 20-sided dice, the die is still more likely to be 6-sided due to how much more unlikely a “6” is on a 20-sided die.

Changing the Prior, II

• Suppose that the bag contained nine 20-sided dice and just one 6-sided die before I drew a die and rolled a “6”. Which die is more likely now?

• Pr(“6” | 6-sided die) = 1/6

• Pr(“6” | 20-sided die) = 1/20

• Pr(6-sided die | “6”) ∝ Pr(“6” | 6-sided)Pr(6-sided) = 1/6 * 1/10 = 1/60 = 0.0167

• Pr(20-sided | “6”) ∝ Pr(“6” | 20-sided)Pr(20-sided) = 1/20 * 9/10 = 9/200 = 0.045

• With enough 20-sided dice to begin with, it finally becomes more likely that the die was actually 20-sided, despite the evidence.
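The three bags can be compared side by side with a short script. This is only a sketch; the function name `scores` and its parameters are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def scores(n_six_sided, n_twenty_sided):
    """Unnormalized prior*likelihood scores for a rolled "6", given the bag contents."""
    total = n_six_sided + n_twenty_sided
    score_6 = Fraction(1, 6) * Fraction(n_six_sided, total)      # Pr("6" | 6-sided) * prior
    score_20 = Fraction(1, 20) * Fraction(n_twenty_sided, total)  # Pr("6" | 20-sided) * prior
    return score_6, score_20

# (6-sided count, 20-sided count) for the three bags discussed so far.
for bag in [(1, 1), (2, 4), (1, 9)]:
    s6, s20 = scores(*bag)
    winner = "6-sided" if s6 > s20 else "20-sided"
    print(bag, s6, s20, winner)
```

The two scores tie exactly when the 20-sided dice outnumber the 6-sided dice 10 to 3 (the ratio of the likelihoods 1/6 and 1/20), so the 20-sided hypothesis only wins once the prior is more lopsided than that.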

Finding Exact Probabilities

• The numbers we derived tell us which hypothesis is most likely, but they are not the true probabilities.

• We need to divide each quantity by Pr(evidence) to get the true probabilities

• Pr(evidence) = Pr(evidence ^ hypothesis1) + Pr(evidence ^ hypothesis2) + … + Pr(evidence ^ hypothesisN) = Pr(evidence | hypothesis1)Pr(hypothesis1) + Pr(evidence | hypothesis2)Pr(hypothesis2) + … + Pr(evidence | hypothesisN)Pr(hypothesisN)

• The first equality assumes the hypotheses are mutually exclusive and that we’ve exhausted the possible explanations for the evidence. The terms in the final sum are the prior*likelihood numbers we calculated before!

• In other words, we need to divide each of our results by the sum of the results to get the true probabilities


Finding Exact Probabilities - Example

• I just rolled a die that was drawn at random from a bag containing an equal number of 6-sided dice (numbered 1-6) and 20-sided dice (numbered 1-20). I tell you the die roll is a 6. What is the probability of each hypothesis in light of the evidence (6-sided vs 20-sided)?

• Pr(6-sided die | “6”) ∝ Pr(“6” | 6-sided)Pr(6-sided) = 1/6 * 1/2 = 1/12

• Pr(20-sided | “6”) ∝ Pr(“6” | 20-sided)Pr(20-sided) = 1/20 * 1/2 = 1/40

• Likelihood of the evidence (overall chance of a 6) = Pr(“6” | 6-sided)Pr(6-sided) + Pr(“6” | 20-sided)Pr(20-sided) = 1/12 + 1/40 = 13/120

• Pr(6-sided die | “6”) = (1/12)/(1/12 + 1/40) = 10/13 ≈ 0.77

• Pr(20-sided die | “6”) = (1/40)/(1/12 + 1/40) = 3/13 ≈ 0.23

• Notice how dividing by the sum forces the actual probabilities to sum to 1.
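The divide-by-the-sum step works the same way for any complete set of hypotheses, so it is easy to wrap in a helper. This is a minimal sketch assuming the hypotheses are mutually exclusive and exhaustive; the function name `posterior` and the dictionary layout are illustrative choices, not something from the slides.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Normalize prior*likelihood over a complete set of hypotheses.

    priors[h] is Pr(h); likelihoods[h] is Pr(evidence | h).
    """
    scores = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(scores.values())  # Pr(evidence), the sum of all the scores
    return {h: s / evidence for h, s in scores.items()}

priors = {"6-sided": Fraction(1, 2), "20-sided": Fraction(1, 2)}
likelihoods = {"6-sided": Fraction(1, 6), "20-sided": Fraction(1, 20)}
result = posterior(priors, likelihoods)
print(result)
```

With exact fractions the two posteriors come out to 10/13 and 3/13, roughly 0.77 and 0.23, and they sum to exactly 1.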

A Science-Themed Example

• Bayes’ rule can be used in scientific applications to balance prior knowledge with new observations.

• Scientists estimate that an asteroid is 60% likely to be composition A, and 40% likely to be composition B. Then they get some readings that would have had a 20% probability of being generated under composition A, and a 50% probability of being generated under composition B. Which hypothesis is more likely now?

• Pr(A | readings) ∝ Pr(readings | A) Pr(A) = 0.2 * 0.6 = 0.12

• Pr(B | readings) ∝ Pr(readings | B) Pr(B) = 0.5 * 0.4 = 0.2

• So hypothesis B is more likely - the new readings shifted our uncertain opinion

• Specifically, we should think Pr(B | readings) = 0.2/(0.12 + 0.2) = 20/32 = 0.625 (still pretty uncertain)


This Really Happened the Other Day

• I was playing a board game with friends (Gloomhaven) when a friend drew a “curse” card.

• “I’m not sure whether I shuffled since adding this to the [top of the] deck,” my friend said.

• There were 20 other cards in the deck (all not “curse” cards), and I trust my friend to be honest about his uncertainty.

• Pr(no shuffle | curse) ∝ Pr(curse | no shuffle)Pr(no shuffle) = 1*0.5 = 0.5

• Pr(shuffle | curse) ∝ Pr(curse | shuffle)Pr(shuffle) = 1/21*0.5 = 0.024

• “I’m pretty sure you didn’t shuffle, then,” I said.

• If I’d actually done the math, I could have claimed 0.5/(0.5+0.024) = 95% confidence.

A Practice Problem

• A friend flips a coin to determine who will pay for lunch, and it comes up heads, so you pay.

• The first time this happens, it seems normal. But the coin comes up heads 8 times in a row.

• Before, you would have said there was only a 1% chance that your friend would use a two-headed coin. But what should the probability be in light of all these “heads” results?

• Bayes’ Rule reminder: Pr(hypothesis | evidence) ∝ Pr(evidence|hypothesis)Pr(hypothesis)

A Practice Problem

• A friend flips a coin to determine who will pay for lunch, and it comes up heads, so you pay.

• The first time this happens, it seems normal. But the coin comes up heads 8 times in a row.

• Before, you would have said there was only a 1% chance that your friend would use a two-headed coin. But what’s the more likely hypothesis now? What are the probabilities?

• Pr(two-headed | results) ∝ Pr(results | two-headed)Pr(two-headed) = 1*0.01 = 0.01

• Pr(not two-headed | results) ∝ Pr(results | not two-headed)Pr(not two-headed) = 1/256*0.99 = 0.00387

• It’s now more likely that the coin is two-headed. Pr(two-headed | results) = 0.01/(0.01+0.00387) = 0.72
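It can be instructive to watch the suspicion grow flip by flip. The sketch below treats “not two-headed” as a fair coin, which is the assumption behind the 1/256 above; the function name `pr_two_headed` and its default prior are illustrative.

```python
def pr_two_headed(n_heads, prior=0.01):
    """Posterior that the coin is two-headed after n_heads heads in a row.

    Assumes the only alternative to a two-headed coin is a fair coin.
    """
    score_two_headed = 1.0 * prior                   # a two-headed coin always shows heads
    score_fair = (0.5 ** n_heads) * (1.0 - prior)    # a fair coin shows n heads w.p. (1/2)^n
    return score_two_headed / (score_two_headed + score_fair)

for n in range(0, 11, 2):
    print(n, round(pr_two_headed(n), 3))
```

Under these assumptions the two-headed hypothesis first becomes the more likely one at 7 heads in a row, since that is the first n with 2^n > 99.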


Bayes’ Rule Tells Us It’s Hard to Detect Rare Events Confidently

• Bayes’ rule tells us to pay attention to not just the immediate evidence, but base rates of events (our prior belief before receiving evidence)

• Our methods for detecting rare events — rare diseases, terrorists, etc. — all have “false positive” rates where “false alarms” happen.

• Combined with Bayes’ rule, our prior belief that the event is rare should make it extremely difficult for an even slightly unreliable technology to convince us otherwise

• Bayes’ rule tells us that most detections of rare events will be false positives, unless the method being used is extremely accurate.

[Figure: the general population, with a “bad stuff” region and a “False Positive” region]

Security Example

• Suppose we have an airport face-detection based security measure for catching terrorists that has a 99% chance of going “boop” if a terrorist passes through, but has a 2% chance of going “boop” on a regular person. Suppose that 1 in 10,000 people is a terrorist. Our detector is going “boop”. What is the chance that it’s a false alarm?

• Pr(terrorist | boop) ∝ Pr(boop | terrorist)Pr(terrorist) = 0.99 * 0.0001 = 0.000099

• Pr(not terrorist | boop) ∝ Pr(boop | not terrorist)Pr(not terrorist) = 0.02 * 0.9999 ≈ 0.02

• Pr(not terrorist | boop) ≈ 0.02/(0.02 + 0.000099) = 0.02/0.020099 ≈ 0.995, a 99.5% chance the boop is a false alarm.
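The same calculation can be parameterized to see how accurate a detector would need to be. This is a sketch only; `false_alarm_probability` and its parameter names are hypothetical, and the extra false positive rates in the loop are made-up values for exploration, not figures from the lecture.

```python
def false_alarm_probability(base_rate, hit_rate, false_positive_rate):
    """Pr(not a real event | alarm) for a detector with the given rates."""
    true_alarm = hit_rate * base_rate                       # Pr(alarm | event) * Pr(event)
    false_alarm = false_positive_rate * (1.0 - base_rate)   # Pr(alarm | no event) * Pr(no event)
    return false_alarm / (true_alarm + false_alarm)

p = false_alarm_probability(base_rate=1 / 10_000, hit_rate=0.99, false_positive_rate=0.02)
print(round(p, 3))  # 0.995

# Exploratory: lower false positive rates (illustrative values only).
for fpr in (0.02, 0.002, 0.0002, 0.0001):
    print(fpr, false_alarm_probability(1 / 10_000, 0.99, fpr))
```

Because the base rate is 1 in 10,000, the false positive rate has to fall to about the same order of magnitude before a “boop” is more likely to be real than not.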

A Real Example of the Difficulty of Reliable Detection

• From Nate Silver’s The Signal and the Noise (p. 245):

• “Studies show that if a woman does not have cancer, a mammogram will incorrectly claim that she does only about 10 percent of the time. If she does have cancer, on the other hand, they will detect it about 75 percent of the time. When you see those statistics, a positive mammogram seems like very bad news indeed. But if you apply Bayes’s theorem to these numbers, you’ll come to a different conclusion: the chance that a woman in her forties has breast cancer given that she’s had a positive mammogram is still only about 10 percent….For this reason, many doctors recommend that women do not begin getting regular mammograms until they are in their fifties and the prior probability of having breast cancer is higher.”


Updating Beliefs With More Evidence

• We can sometimes get around the problem of strong priors by incorporating multiple pieces of evidence.

• As long as the pieces of evidence are “conditionally independent” (independent besides their shared dependence on the hypothesis), we can treat the probabilities derived after one Bayesian calculation as our new prior going forward.

• This approach allows scientists to incorporate multiple experimental results, robots to combine observations across different sensors, and law enforcement to use multiple lines of evidence

Updating Beliefs Example

• Besides the face recognition software, suppose we also employ voice recognition software that has a 1% false positive rate and 5% false negative rate, and it goes “bing” (positive) on our suspect from before

• From the face recognition results we have new priors of Pr(normal) = 0.995, Pr(terrorist) = 0.005

• Pr(terrorist | bing) ∝ Pr(bing | terrorist)Pr(terrorist) = 0.95*0.005 = 0.00475

• Pr(not terrorist | bing) ∝ Pr(bing | not terrorist)Pr(not terrorist) = 0.01*0.995 = 0.00995

• Pr(terrorist | bing) = 0.00475/(0.00475 + 0.00995) = 0.32 or 32%, which maybe at least makes this worth investigating further
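Chaining updates like this is mechanical: each posterior becomes the next prior. The sketch below assumes the two detectors are conditionally independent given the hypothesis, as the previous slide requires; the function name `update` and the variable names are illustrative.

```python
def update(prior, pr_evidence_if_true, pr_evidence_if_false):
    """One Bayesian update for a yes/no hypothesis; returns the posterior."""
    score_true = pr_evidence_if_true * prior
    score_false = pr_evidence_if_false * (1.0 - prior)
    return score_true / (score_true + score_false)

belief = 1 / 10_000                      # base rate before any evidence
after_face = update(belief, 0.99, 0.02)  # face detector goes "boop"
after_voice = update(after_face, 0.95, 0.01)  # voice detector goes "bing"
print(round(after_face, 4), round(after_voice, 2))

# Using both pieces of evidence at once gives the same answer,
# because their likelihoods multiply under conditional independence.
joint = update(belief, 0.99 * 0.95, 0.02 * 0.01)
print(round(joint, 2))
```

Without rounding the intermediate prior to 0.005, the final figure is still about 0.32, so the slide’s rounding does not change the conclusion.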


Nate Silver and Being Bayesian

• Nate Silver, founder of fivethirtyeight.com, rose to fame in 2008 for successfully predicting 49 of 50 states’ outcomes in that presidential election

• He’s argued that his primary innovation was simply being “Bayesian” about the polls - instead of choosing one poll to believe, he would constantly update his belief with the evidence of each new poll

• His book The Signal and the Noise is all about how different scientific fields seem to be progressing or not, depending on how thoroughly they’ve adopted Bayesian methods


A Quote from The Signal and the Noise (p. 452)

• “Bayes’s theorem encourages us to be disciplined about how we weigh new information….Most of the time, we do not appreciate how noisy the data is, and so our bias is to place too much weight on the newest data point….

• “But we can have the opposite bias when we become too personally or professionally invested in a problem, failing to change our minds when the facts do…The more often you are willing to test your ideas, the sooner you can begin to avoid these problems and learn from your mistakes.”


Application to the Monty Hall Problem

• Speaking of being too attached to priors: here’s a classic probability problem that seems slightly paradoxical, named after the host of the show “Let’s Make a Deal”

• A game show host offers you the choice of three doors. Behind one is a new car you’d like to win. Behind the other two doors are goats.

• The host lets you pick a door, which you basically do at random.

• Then the host says, “I’m going to choose a door that you didn’t pick, and show you what’s behind that door.” He opens a door and reveals a goat. (You’re certain he would not reveal the car at this stage, and that he must reveal a door you did not pick.)

• “Knowing what you know now, would you like to switch doors?”


Bayes and the Monty Hall Problem

• For the sake of clarity, let’s assume (without loss of generality) we picked door 2 and the host opened door 3.

• There are now only two valid hypotheses - car behind 1, car behind 2.

• Our evidence is that door 3 was opened.

• What was the likelihood door 3 was opened if 2 had the car, and we are right?

• What was the likelihood door 3 was opened if 1 had the car, and we were wrong?


Bayes and the Monty Hall Problem

• What was the likelihood door 3 was opened if 2 had the car, and we were right? 1/2, since the host could open either door 1 or door 3 with equal likelihood

• What was the likelihood door 3 was opened if 1 had the car, and we were wrong? 1, since the host is forced to open the non-pick, non-car door

[Figure: the “Door 1 Has a Car” world and the “Door 2 Has a Car” world, marking which doors the host can and can’t reveal in each]


Bayes and the Monty Hall Problem

• Pr(door 1 has car | door 3 revealed) ∝ Pr(door 3 revealed | door 1 has car)Pr(door 1 has car) = 1*1/3 = 1/3

• Pr(door 2 has car | door 3 revealed) ∝ Pr(door 3 revealed | door 2 has car)Pr(door 2 has car) = 1/2*1/3 = 1/6

• So in fact, door 1 is the more likely hypothesis (with probability (1/3)/(1/3 + 1/6) = 2/3), and we should switch

Monty Hall Without Bayes’ Rule

• Bayes’ rule is just a convenient shortcut when reasoning about conditional probabilities

• We could see the same results by mapping out all the possibilities

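             | your pick 1  | your pick 2  | your pick 3
car door 1   | switch loses | switch wins  | switch wins
car door 2   | switch wins  | switch loses | switch wins
car door 3   | switch wins  | switch wins  | switch loses

• Switching wins in 6 of the 9 equally likely (car door, pick) combinations, which is the same 2/3 we got from Bayes’ rule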
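The map of possibilities can also be generated by a program rather than drawn by hand. This is a sketch under the rules stated earlier (the host always opens an unpicked goat door, choosing at random when he has two options); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

doors = (1, 2, 3)
p_switch_wins = Fraction(0)

# Each (car door, your pick) combination has probability 1/9.
for car, pick in product(doors, doors):
    # The host may open any door that is neither your pick nor the car.
    options = [d for d in doors if d != pick and d != car]
    for opened in options:
        weight = Fraction(1, 9) * Fraction(1, len(options))
        # Switching means taking the one door that is neither picked nor opened.
        switched_to = next(d for d in doors if d not in (pick, opened))
        if switched_to == car:
            p_switch_wins += weight

print(p_switch_wins)  # 2/3
```

Switching loses only in the three cases where the first pick was already the car, which is why staying wins with probability 1/3.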


Bayes is Old News But Also Currently Hot

• Bayes’ original work appeared in 1763, two years after his death

• Laplace, the French mathematician and astronomer, developed the theory further (1774)

• But when statistics took off, its inventors emphasized the importance of a single experiment, and didn’t like the idea of incorporating priors (especially Fisher, 1890-1962)

• Non-Bayesian “frequentist” statistics is still the most common kind taught in undergraduate curricula, though Bayesian statistics is gaining traction

• Artificial intelligence got a major boost in the 1980s thanks to the efforts of Judea Pearl, who showed how to use Bayes’ rule to incorporate evidence over time and across sources of evidence

• Sebastian Thrun demonstrated how useful Bayesian reasoning could be to self-driving cars in the 2005 DARPA Grand Challenge; Bayesian reasoning drives Google’s self-driving cars today

Summary

• Bayes’ rule tells us that the likelihood of a hypothesis in light of the evidence is proportional to the product of:

• our prior probability of the hypothesis

• the likelihood of the evidence if the hypothesis is true

• In other words, Pr(hypothesis | evidence) ∝ Pr(evidence | hypothesis)Pr(hypothesis)

• Given a complete set of hypotheses, we can calculate their probabilities by dividing these results by their sum, thus scaling them to sum to 1

• Bayes’ rule is a useful way to take into account everything we know and believe, including prior beliefs, base rates of uncommon events, and experiment results