
Business Statistics 41000:

Probability 1

Drew D. Creal

University of Chicago, Booth School of Business

Week 3: January 24 and 25, 2014


Class information

• Drew D. Creal
• Email: [email protected]
• Office: 404 Harper Center
• Office hours: email me for an appointment
• Office phone: 773.834.5249

Course homepage

http://faculty.chicagobooth.edu/drew.creal/teaching/index.html


Course schedule

• Week 1: Plotting and summarizing univariate data
• Week 2: Plotting and summarizing bivariate data
• Week 3: Probability 1
• Week 4: Probability 2
• Week 5: Probability 3
• Week 6: In-class exam
• Week 7: Statistical inference 1
• Week 8: Statistical inference 2
• Week 9: Simple linear regression
• Week 10: Multiple linear regression


Outline of today’s topics

I. Discrete random variables (AWZ p. 215-216)
   • Discrete probability distributions (AWZ p. 949-950)
   • The Bernoulli distribution
   • Computing the probabilities of subsets of outcomes

II. Expectation and variance of a discrete random variable

III. Mode of a discrete random variable

IV. Conditional, marginal, and joint distributions (AWZ p. 230-236)

V. Several random variables


Why probability?


Why probability?

• In lectures #1 and #2, we looked at various types of data in different ways.
• We learned to use plots and numerical summary statistics to identify patterns in the data and see how variables relate to one another.
• If we find patterns, we can use them to predict.
• For example, we used regression to predict the sales price of a house given its size.


Why probability?

• To make predictions, we use a mathematical model for the relationship.
• However, in business and economic applications, these specifications are rarely exact.

Instead of saying: “if x is this, then y must be that”

we want to say: “if x is this, then y will probably be within this range of values.”

Probability is a way of modelling uncertainty mathematically.


Example: Gallup poll

Gallup (1/22/14): In U.S., 65% Dissatisfied With How Gov’t System Works

“Sixty-five percent of Americans are dissatisfied with the nation’s system of government and how well it works, the highest percentage in Gallup’s trend since 2001. Dissatisfaction is up five points since last year, and has edged above the previous high from 2012 (64%)....

Results:....are based on telephone interviews conducted Jan. 5-8, 2014, with a random sample of 1,018 adults....the margin of sampling error is ±3 percentage points at the 95% confidence level.”

Source: www.gallup.com/poll


Example: Rasmussen poll

Rasmussen: 68% Expect NSA Phone Spying To Stay the Same or Increase

“Despite President Obama’s announcement of tighter controls on the National Security Agency’s domestic spying efforts, two-out-of-three U.S. voters think spying on the phone calls of ordinary Americans will stay the same or increase.

.....The margin of sampling error for the full sample of 1,000 Likely Voters is ±3 percentage points with a 95% level of confidence.”

Source: www.rasmussenreports.com


Why probability?

• In the previous examples, the pollsters mention the sampling “error.”
• What do they mean by this?
• How are they estimating the “error”?


Why probability?

Answer: they took a random sample and computed

$$2 \times \frac{0.5}{\sqrt{1018}} = 0.0313 \approx 3\%$$

$$2 \times \frac{0.5}{\sqrt{1000}} = 0.0316 \approx 3\%$$

These calculations come from a probability model, which we will study extensively!

Importantly, this model is based on a set of assumptions that could be wrong!

You need to understand these assumptions and be able to think critically about them!
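The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name `margin_of_error` and its defaults are illustrative, and it assumes the usual reading of the slide's numbers, namely a multiplier of 2 (an approximation to the 95% level) and a worst-case proportion of 0.5.

```python
import math

def margin_of_error(n, p=0.5, z=2.0):
    """Approximate margin of sampling error for a sample proportion.

    Assumptions (not stated on the slide): p = 0.5 is the worst case,
    and z = 2 approximates the 95% confidence multiplier.
    """
    return z * math.sqrt(p * (1 - p) / n)

gallup = margin_of_error(1018)     # Gallup sample of 1,018 adults
rasmussen = margin_of_error(1000)  # Rasmussen sample of 1,000 likely voters

print(round(gallup, 4), round(rasmussen, 4))  # 0.0313 0.0316
```

With p = 0.5 the expression reduces to 2 × 0.5 / √n, which matches the two calculations above.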


Discrete Random Variables


Discrete random variables

Suppose you are a manager trying to estimate the number of units of a product you will sell next quarter.

Suppose you know (unrealistically) that sales will be 1, 2, 3, or 4 (thousand) units.

But you are not sure which one it will be.

First, why is sales a discrete random variable?


Discrete random variables

Let the random variable S denote sales.

Since S can only take on the values 1, 2, 3, or 4, it is a discrete random variable.

A probability distribution is a way to express this uncertainty mathematically.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

The s column lists the possible values (or outcomes); the p(s) column gives the probability of each value.


Discrete random variables

In a probability distribution, the probabilities always sum to one by definition.

s        p(s)
1        0.095
2        0.230
3        0.440
4        0.235
Total    1.0

The s column lists the possible values (or outcomes); the p(s) column gives the probability of each value.

p(s) = Prob(S = s)
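One way to hold such a table in code is a dictionary that maps each outcome to its probability. The sketch below uses hypothetical names (`sales_dist`, `is_valid_distribution`) and simply checks the two requirements: every probability lies between 0 and 1, and the probabilities sum to one.

```python
import math

# Sales distribution from the slide: outcome (thousands of units) -> probability
sales_dist = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

def is_valid_distribution(dist, tol=1e-9):
    """Return True if all probabilities are in [0, 1] and they sum to one."""
    probs = list(dist.values())
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

print(is_valid_distribution(sales_dist))  # True
```

The tolerance is there only because decimal probabilities are stored as binary floating-point numbers, so their sum may differ from 1 by a tiny rounding error.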


Remarks on notation

• In words, the notation Prob(X = x) means “the probability that the random variable X takes on the number x.”
• It is common convention to use capital letters (or words) such as X or Z to denote a random variable.
• The possible values that a random variable can take on are also known as outcomes.
• It is common for lower case letters such as x or z to denote the outcomes.
• It is common to abbreviate random variable as r.v.


A picture of the discrete random variable’s distribution

[Figure: plot of p(s) against s for the sales example]


Discrete random variable

A discrete random variable is a numeric quantity that can take on any one of a countable number of possible values. However, it is unknown in advance which value will occur.

Remarks:

• This is how we quantify or model uncertainty when a random event or experiment can take on a countable number of values.
• We list the possible values the variable can take on (i.e. the outcomes).
• We assign to each number (or outcome) a probability.


Discrete random variable

Remarks continued:

• A probability is a number between 0 and 1.
• When the probabilities are summed up over the possible outcomes, the probabilities always sum to one.
• The word “discrete” emphasizes that the outcomes are countable (we can create a list of them).
• In our sales example, there were only 4 possible outcomes for the r.v. S.
• Later, we will study continuous random variables which may take on a continuous range of values.


Example: coin tossing

• Imagine a random experiment where we toss two coins.
• We define the random variable X to be the number of heads in two tosses.
• We assume each coin is “fair” so that the probability of tossing a head or tail is 1/2.
• Before tossing the coins, we know that there are 3 possible outcomes: x = 0, 1, and 2.

x    p(x)
0    0.25
1    0.50
2    0.25

The probability distribution of the random variable X.
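The table above can be derived by listing the four equally likely results of two fair tosses and counting heads in each. A minimal sketch, with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely results of two fair tosses: HH, HT, TH, TT
tosses = list(product("HT", repeat=2))

# Count how many of the four results give 0, 1, or 2 heads
counts = Counter(result.count("H") for result in tosses)

# Each result has probability 1/4, so p(x) = (number of results with x heads) / 4
coin_dist = {x: Fraction(n, len(tosses)) for x, n in sorted(counts.items())}

for x, p in coin_dist.items():
    print(x, p)  # 0 1/4, then 1 1/2, then 2 1/4
```

The outcome x = 1 gets probability 1/2 because two of the four results (HT and TH) contain exactly one head.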


Probability distribution of a discrete r.v.

The probability distribution of a discrete random variable has two parts:

1.) a list of the possible outcomes.
2.) a list of the probabilities for each outcome.

x      p(x)
x1     p1
x2     p2
...    ...

For a discrete r.v., we can think of a probability distribution as a table.


Remarks on notation

• You will often see probabilities written as p(x) or Prob(X = x) or Pr(X = x) or P(X = x) or pX(x).
• These are all common notation for the same thing. It just depends on the author’s preferences.
• With the notation p(x) it should be understood from the context that you are talking about the random variable X which may take on an outcome x.
• In our sales example, p(1) is the probability that our sales during the next quarter is 1,000 units.


Interpreting probabilities

The easiest way to interpret probabilities is...

• Probability is a measure of uncertainty with values between 0 and 1.
• An outcome with a probability of 0 will basically never happen.
• An outcome with a probability of 1 will basically always happen.


Interpreting probabilities

There are more “philosophical” ways of interpreting probabilities.

Two common ways are: frequentist and subjective (Bayesian).

Consider again the example where we toss two fair coins.

x    p(x)
0    0.25
1    0.50
2    0.25

Frequentist: In the long run, if I toss the two coins over and over and over..., I will get 1 head 50% of the time.

Subjective: I am indifferent between betting on the event “1 head” or the event “0 or 2 heads.”
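The frequentist reading can be illustrated with a small simulation: repeat the two-coin experiment many times and record how often exactly one head appears. This is a sketch under assumptions of my own choosing (the function name, the number of repetitions, and the fixed seed are all illustrative); the observed share should be close to, but not exactly, 0.5.

```python
import random

def share_with_one_head(n_trials, seed=0):
    """Simulate n_trials tosses of two fair coins; return the share with exactly 1 head."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        heads = rng.randint(0, 1) + rng.randint(0, 1)  # 1 = head, 0 = tail
        if heads == 1:
            hits += 1
    return hits / n_trials

freq = share_with_one_head(100_000)
print(freq)  # close to 0.5 in the long run
```

Increasing `n_trials` makes the observed share settle nearer to p(1) = 0.50, which is what "in the long run" means here.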


Interpreting probabilities

Consider the sales example again.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

[Figure: plot of p(s) against s for the sales example]

“It’s about twice as likely that we will sell 3,000 units as it is that we will sell 2,000 or 4,000 units.”

“If all our quarters were like this, we would see sales of 1,000 units about once in every 10 quarters.”


Assigning Probabilities to Categorical Variables

• Remember that part of our definition of a random variable is that its value always takes on a “number.”
• What about situations where we have a random experiment and the variable of interest is a categorical variable (which is typically not a number)?
• In this case, we just assign a number to each category.
• We can then assign a probability to each number.


Assigning Probabilities to Categorical Variables

Example: For the variable “Reg” from the British marketing data set, we assigned each region a number.

 1   “Scotland”
 2   “North West”
 3   “North”
 4   “Yorkshire & Humberside”
 5   “East Midlands”
 6   “East Anglia”
 7   “South East”
 8   “Greater London”
 9   “South West”
10   “Wales”
11   “West Midlands”


Assigning Probabilities to Categorical Variables

Example: Who will win the MVP at the Super Bowl?

• We let the random variable M be the football player that wins the MVP.
• We label each player with a number.

1   “Peyton Manning”    4   “Russell Wilson”
2   “Wes Welker”        5   “Marshawn Lynch”
3   “Eric Decker”       6   “Richard Sherman”

• The outcomes are m = 1, 2, 3, 4, 5, 6.
• We then assign probabilities to each outcome.
• P(M = 1) = p(1) = P(Peyton Manning wins the MVP)


The Bernoulli (and uniform) distributions


The Bernoulli distribution

Our fundamental discrete random variable is the dummy variable, with two outcomes 0 and 1.

Some examples are:

Clinical trials: T is a r.v. describing a test for a disease. T = 0 if the person does not have the disease. T = 1 if they do.

Marketing example: B is a r.v. describing whether a person buys a product. B = 0 if the person does not buy the product. B = 1 if they do.

Sports: Rafael Nadal is about to hit a first serve. A is a r.v. describing whether he hits an ace. A = 0 if he does not hit an ace. A = 1 if he does.


The Bernoulli distribution

• In general, we say a discrete random variable taking on only two values (such as a dummy variable) has a Bernoulli distribution.
• You may often hear it called a “Bernoulli Trial.”
• The value 1 is often called a “success.”
• Suppose we label this random variable X, then we have

x    p(x)
0    1 − p
1    p

Pr(X = 1) = p

• Notation: X ∼ Bernoulli(p)


The Bernoulli distribution

X ∼ Bernoulli(p) means that X is a discrete r.v. with the following probability distribution:

x    p(x)
0    1 − p
1    p

where p is the probability that X equals 1.

• In words, “the random variable X is distributed as Bernoulli with parameter p.”
• The “Bernoulli” is a family of probability distributions, where each probability distribution is indexed by the parameter p.


The Bernoulli distribution: further examples

Example: Tossing a fair coin.

Let X = 1 if the toss is heads and 0 otherwise.

Then X ∼ Bernoulli(0.5).

x    p(x)
0    0.5
1    0.5


The discrete Uniform distribution

X ∼ Discrete Uniform means that X is a discrete r.v. taking on a finite number of values with equal probabilities.

• If there are N outcomes, the probabilities are all 1/N.


Probabilities of Subsets of Outcomes

Example: Suppose we toss a fair six-sided die. Let Z denote the outcome of the toss.

What is Pr(2 < Z < 5)?

In other words, what is the probability that we roll a 3 or 4?

z    p(z)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6


Probabilities of Subsets of Outcomes

To compute the probability that any one of a group of outcomes occurs, we sum up their probabilities.

$$\Pr(a < X < b) = \sum_{a < x < b} p(x)$$

Example: Tossing a die.

Pr(2 < Z < 5) = p(3) + p(4) = 1/6 + 1/6 = 1/3


Probabilities of Subsets of Outcomes

Sometimes, we may also want to know if something is greater (less) than or equal to!

Example: Tossing a die.

Pr(2 ≤ Z < 5) = p(2) + p(3) + p(4) = 1/6 + 1/6 + 1/6 = 1/2
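The two die calculations can be written as one small helper that adds up p(z) over whichever outcomes satisfy a condition. The names `die` and `prob` are illustrative, and exact fractions are used so the results print as 1/3 and 1/2 rather than rounded decimals.

```python
from fractions import Fraction

# Fair six-sided die: a discrete uniform distribution on 1..6
die = {z: Fraction(1, 6) for z in range(1, 7)}

def prob(dist, event):
    """Sum p(x) over every outcome x for which event(x) is True."""
    return sum((p for x, p in dist.items() if event(x)), Fraction(0))

print(prob(die, lambda z: 2 < z < 5))   # 1/3  (outcomes 3 and 4)
print(prob(die, lambda z: 2 <= z < 5))  # 1/2  (outcomes 2, 3 and 4)
```

Changing the strict inequality to "less than or equal to" adds the outcome 2 to the event, which is why the probability rises from 1/3 to 1/2.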


Probabilities of Subsets of Outcomes

Example: Let’s return to our sales example where S denotes the sales of units of our product (in thousands).

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

What is the probability that we sell more than 1,000 units next quarter?

Pr(S > 1) = p(2) + p(3) + p(4)
          = 0.23 + 0.44 + 0.235
          = 0.905


Probabilities of Subsets of Outcomes

Example: let’s do it again!

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

What is the probability that we sell more than 1,000 units next quarter?

We could have done it like this:

Pr(S > 1) = Pr(S ≠ 1)
          = 1 − p(1)
          = 0.905


Probabilities of Subsets of Outcomes

Example: one more time!

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

What is the probability that we sell 3,000 units or fewer next quarter?

Pr(S ≤ 3) = p(1) + p(2) + p(3)
          = 0.095 + 0.23 + 0.44
          = 0.765


Probabilities of Subsets of Outcomes

Here are two helpful reminders:

1. “OR means ADD”
   Pr(X = a OR X = b) = p(a) + p(b)
   As long as two events cannot both happen, the probability of either is the sum of the probabilities.

2. “NOT means ONE MINUS”
   Pr(X ≠ a) = 1 − p(a)
   The probability that something does NOT happen is one minus the probability that it does.
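Both reminders can be checked against the sales example. The sketch below (variable names are illustrative) computes Pr(S > 1) twice, once by adding and once by subtracting from one, and then computes Pr(S ≤ 3).

```python
# Sales distribution: outcome (thousands of units) -> probability
sales_dist = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

# "OR means ADD": S > 1 means S = 2 OR S = 3 OR S = 4
p_more_than_1_add = sales_dist[2] + sales_dist[3] + sales_dist[4]

# "NOT means ONE MINUS": S > 1 is the same event as NOT (S = 1)
p_more_than_1_not = 1 - sales_dist[1]

# Pr(S <= 3): add the probabilities of the outcomes 1, 2 and 3
p_at_most_3 = sum(p for s, p in sales_dist.items() if s <= 3)

print(round(p_more_than_1_add, 3))  # 0.905
print(round(p_more_than_1_not, 3))  # 0.905
print(round(p_at_most_3, 3))        # 0.765
```

The rounding is only for display; the two routes to Pr(S > 1) agree up to floating-point error.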


Expectation and Variance of a Random Variable


Expectation of a discrete random variable

Example: consider again the random variable S denoting sales.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

Now, imagine your boss asks you to predict sales next quarter.

You have to come up with one number (a “guess”) even though you are not sure.

What number would you choose?


Expectation of a discrete random variable

One option (and not the only one) is to report the expected value.

The expected value of a discrete random variable is:

$$E[X] = \sum_{\text{all } x} x \, p(x)$$

In words, the expected value is “the sum of the possible outcomes x where each one is weighted by its probability p(x).”

IMPORTANT: This is similar to the sample mean, but it is NOT the same thing. We will discuss this later on.


Computing the expected value

Example: consider again the random variable S which denotes the sales of our product.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

E(S) = 0.095 × 1 + 0.23 × 2 + 0.44 × 3 + 0.235 × 4
     = 2.815

Yes. It does seem weird that 2.815, which is our “guess” for sales, is not one of the possible values. Think of this as saying “we think sales is likely to be somewhere around 3 thousand units, but it’s more likely to be under 3 thousand than over.”
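The weighted sum is short to write in code. A minimal sketch, assuming the distribution is stored as a dictionary from outcome to probability (the names `sales_dist` and `expected_value` are illustrative):

```python
# Sales distribution: outcome (thousands of units) -> probability
sales_dist = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

def expected_value(dist):
    """E[X]: each outcome x weighted by its probability p(x), then summed."""
    return sum(x * p for x, p in dist.items())

mean_sales = expected_value(sales_dist)
print(round(mean_sales, 3))  # 2.815

# Probability of landing under vs. over 3 thousand units
p_under_3 = sales_dist[1] + sales_dist[2]
p_over_3 = sales_dist[4]
print(round(p_under_3, 3), round(p_over_3, 3))  # 0.325 0.235
```

The last two numbers back up the remark above: sales under 3 thousand (probability 0.325) are more likely than sales over 3 thousand (probability 0.235), which pulls the expected value slightly below 3.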


Notation for the expected value

• Different authors use different notation for the expected value, including E(X) and E[X].
• It is common notation in statistics to use the Greek symbol μ or μ_X, which is pronounced as “mu.”
• We often say “mean” instead of “expected value.” What we mean by this is that the “expected value of X” is the “mean of the r.v. X.”


Sample Mean vs. the Expected Value

The Sample Mean

• variable: in a data set, it is the observed set of values.
• sample mean: of a variable in our data is $\frac{1}{n} \sum_{i=1}^{n} x_i$
• It is the average of the observed values in the data set.

The Expected Value

• random variable: a mathematical model for an uncertain quantity.
• expected value (mean): of a r.v. is $E[X] = \sum_{\text{all } x} x \, p(x)$
• Average of the possible values taken by a r.v., weighted by their probabilities.


Expected Value of a Function of a Discrete Random Variable

Sometimes we will be interested in the expected value of some function of a random variable. For example, let W be the prize a game show contestant ends up with.

Example: Deal or No Deal

George has cases worth $5, $400, $10,000, and $1,000,000 remaining. There are 4 outcomes and each is equally likely. The banker’s offer is $189,000.

The expected value is

E[W] = 0.25 × 5 + 0.25 × 400 + 0.25 × 10,000 + 0.25 × 1,000,000
     = $252,601.25

Is the banker’s offer a good or bad deal?


Expected Value of a Function of a Discrete Random Variable

BUT, this assumes people choose based on expected values. Economists believe in diminishing marginal utility of income. The more wealth you have, the less utility you get from each additional $1.

This is often modeled with a utility function over wealth.

Let’s assume something simple such as U(W) = √W.

[Figure: utility function U(W) plotted against wealth W]


Expected Value of a Function of a Discrete Random Variable

To compute the expected value E[f(W)] = E[√W] in the case of a discrete random variable, just take the function f(.) of each possible outcome, then multiply by the probability and add them together.

What is George’s expected utility?

E[√W] = 0.25 √5 + 0.25 √400 + 0.25 √10,000 + 0.25 √1,000,000
      = 280.56

Compare this to the utility of the banker’s offer: √189,000 = 434.74
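The Deal or No Deal numbers can be recomputed directly. This sketch assumes, as the slides do, four equally likely cases and the square-root utility function; the variable names are illustrative.

```python
import math

cases = [5, 400, 10_000, 1_000_000]  # remaining prizes in dollars
p = 1 / len(cases)                   # each case is equally likely (0.25)
offer = 189_000                      # banker's offer in dollars

# Expected prize: E[W] = sum of p * w
expected_prize = sum(p * w for w in cases)

# Expected utility: E[sqrt(W)] = sum of p * sqrt(w), not sqrt(E[W])
expected_utility = sum(p * math.sqrt(w) for w in cases)

# Utility of taking the sure offer
offer_utility = math.sqrt(offer)

print(round(expected_prize, 2))    # 252601.25
print(round(expected_utility, 2))  # 280.56
print(round(offer_utility, 2))     # 434.74
```

In dollars the gamble is worth more on average than the offer, but under this utility function the sure offer gives higher utility (434.74 versus 280.56), so a contestant with these preferences would take the deal.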


Variance of a Discrete Random Variable

To understand how much the discrete random variable X varies about its mean (expected value), we define the variance.

The variance of a discrete random variable X is:

$$\mathrm{Var}[X] = \sum_{\text{all } x} p(x) \, (x - \mu_X)^2 = E\left[(X - \mu_X)^2\right]$$

• In words, the variance “is the expected squared distance of the r.v. X from its mean.”
• If we take μ_X to be our prediction for X, you can think of it as a weighted average of the “squared prediction error.”


Variance of a Discrete Random Variable

Example: consider again the random variable S denoting sales next quarter.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

Imagine your boss asks you to also report the uncertainty associated with your predicted sales next quarter.

E[S] = 2.815

V[S] = 0.095 (1 − 2.815)² + 0.23 (2 − 2.815)² + 0.44 (3 − 2.815)² + 0.235 (4 − 2.815)²
     = 0.811

The units are in squared thousands of units sold.


Remarks on Notation

• For the variance Var[X] of X, it is also common to use the abbreviated version V[X].
• It is common notation in statistics to use the Greek symbol σ² or σ²_X, which is pronounced as “sigma squared.”


Standard deviation of a Discrete Random Variable

The standard deviation of a discrete random variable X is:

$$\sigma_X = \sqrt{\sigma_X^2}$$

The standard deviation of a random variable X is the square root of the variance of X.
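For the sales example, the variance and standard deviation follow from the definitions in a few lines. A sketch with illustrative function names:

```python
import math

# Sales distribution: outcome (thousands of units) -> probability
sales_dist = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

def expected_value(dist):
    """E[X] = sum of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = sum of p(x) * (x - mu)^2, with mu = E[X]."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

var_sales = variance(sales_dist)
sd_sales = math.sqrt(var_sales)

print(round(var_sales, 3))  # 0.811 (squared thousands of units)
print(round(sd_sales, 3))   # 0.9   (thousands of units)
```

The standard deviation is back in the original units, so it can be read as "sales typically differ from 2.815 thousand by roughly 0.9 thousand units."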


Example: consider again the random variable S denoting sales next quarter.

Consider two different distributions for sales denoted by p1(s) and p2(s).

s    p1(s)   p2(s)
1    0.01    0.30
2    0.10    0.30
3    0.80    0.20
4    0.09    0.20

[Figure: plot of p1(s) and p2(s) against S]

Which distribution (p1(s) or p2(s)) has the larger expected value and/or variance? (Answers on the next slide.)


Example: If these were the distributions, what are the expected values and variances?

E1(S) = 0.01 × 1 + 0.1 × 2 + 0.8 × 3 + 0.09 × 4 = 2.97

E2(S) = 0.3 × 1 + 0.3 × 2 + 0.2 × 3 + 0.2 × 4 = 2.3

V1(S) = 0.01 (1 − 2.97)² + 0.1 (2 − 2.97)² + 0.8 (3 − 2.97)² + 0.09 (4 − 2.97)² = 0.2291

V2(S) = 0.3 (1 − 2.3)² + 0.3 (2 − 2.3)² + 0.2 (3 − 2.3)² + 0.2 (4 − 2.3)² = 1.21

(NOTE: E1(S) and V1(S) denote the mean and variance of the first probability distribution p1(s).)
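The same two helper functions reproduce these four numbers. In this sketch the dictionaries `p1` and `p2` hold the two distributions from the table; the function names are illustrative.

```python
# Two candidate distributions for sales (thousands of units)
p1 = {1: 0.01, 2: 0.10, 3: 0.80, 4: 0.09}
p2 = {1: 0.30, 2: 0.30, 3: 0.20, 4: 0.20}

def expected_value(dist):
    """E[X] = sum of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = sum of p(x) * (x - mu)^2."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

for name, dist in [("p1", p1), ("p2", p2)]:
    print(name, round(expected_value(dist), 4), round(variance(dist), 4))
# p1 2.97 0.2291
# p2 2.3 1.21
```

p1 puts 80% of its probability on a single outcome, so it has the larger mean and the much smaller variance; p2 spreads its probability more evenly and is therefore more uncertain.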


Example: consider again the random variable S denoting sales next quarter.

Consider three more distributions for sales denoted by p3(s), p4(s), and p5(s). (NOTE: p4(s) is the same as the original distribution above.)

s    p3(s)   p4(s)   p5(s)
1    0.20    0.095   0.05
2    0.30    0.230   0.20
3    0.30    0.440   0.50
4    0.20    0.235   0.25

[Figure: plot of p3(s), p4(s), and p5(s) against S]

The means are E3(S) = 2.5, E4(S) = 2.815, E5(S) = 2.95, while the variances are V3(S) = 1.05, V4(S) = 0.811, V5(S) = 0.648.


Mean and Variance of a Bernoulli Distribution

Suppose X ∼ Bernoulli(p); then the mean and variance are

E(X) = p × 1 + (1 − p) × 0 = p

V(X) = p (1 − p)² + (1 − p) (0 − p)²
     = p (1 − p) [(1 − p) + p]
     = p (1 − p)

• For what value of p is the mean the smallest (biggest)?
• For what value of p is the variance the smallest (biggest)?
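One way to explore the two questions is to evaluate the mean and variance over a grid of values of p. The sketch below is only an illustration: the helper names and the grid of eleven values from 0 to 1 are my own choices, and it computes the variance from the definition rather than from the shortcut p (1 − p).

```python
def bernoulli(p):
    """Probability distribution of a Bernoulli(p) r.v.: outcome -> probability."""
    return {0: 1 - p, 1: p}

def expected_value(dist):
    """E[X] = sum of x * p(x)."""
    return sum(x * prob for x, prob in dist.items())

def variance(dist):
    """Var[X] = sum of p(x) * (x - mu)^2."""
    mu = expected_value(dist)
    return sum(prob * (x - mu) ** 2 for x, prob in dist.items())

grid = [i / 10 for i in range(11)]  # p = 0.0, 0.1, ..., 1.0
variances = {p: variance(bernoulli(p)) for p in grid}

p_max = max(variances, key=variances.get)
print(p_max, variances[p_max])          # 0.5 0.25
print(variances[0.0], variances[1.0])   # 0.0 0.0
```

On this grid the variance is largest at p = 0.5, where the outcome is most uncertain, and it is zero at p = 0 and p = 1, where the outcome is known in advance. The mean equals p, so it simply grows from 0 to 1 with p.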


Final Comments on the Mean and Variance

• The sample mean, sample variance, and sample standard deviation of a set of numbers are sample statistics computed from observed data.
• The mean, variance, and standard deviation of a random variable are properties of its probability distribution, which is a mathematical model of uncertainty.
• They do share a lot of the same properties.
• The distinction between them is subtle but important for later on in the course!


The mode of a discrete distribution

For a discrete r.v. X, the mode of its probability distribution is the most likely value.

I In other words, the mode is the outcome x that has the largest probability.

I The mode does not have to be unique because there could be multiple outcomes that share the largest probability.


The mode of a discrete distribution

Consider the two different distributions for sales S denoted by p1(s) and p2(s).

s    p1(s)    p2(s)
1    0.01     0.30
2    0.10     0.30
3    0.80     0.20
4    0.09     0.20

[Figure: plot of p1(s) and p2(s); horizontal axis S, vertical axis p(s).]

What are the modes of the distributions p1(s) and p2(s)?
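A short sketch of how the mode(s) could be found programmatically is given below. The function name `modes` and the tolerance used to detect ties are assumptions made for illustration; the probabilities come from the table above.

```python
def modes(pmf, tol=1e-12):
    """Return every outcome whose probability ties for the largest value."""
    top = max(pmf.values())
    return [x for x, p in pmf.items() if abs(p - top) <= tol]


p1 = {1: 0.01, 2: 0.10, 3: 0.80, 4: 0.09}
p2 = {1: 0.30, 2: 0.30, 3: 0.20, 4: 0.20}

print("modes of p1:", modes(p1))
print("modes of p2:", modes(p2))
```

Returning a list rather than a single value reflects the point that the mode need not be unique.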


Remarks on the mode

I The mode of a probability distribution is not the same thing as the sample mode.

I The sample mode is the value that occurs most frequently in a dataset.

I For discrete numeric data, you may occasionally see the sample mode reported.


Conditional, Marginal, and Joint Distributions


Conditional, Marginal, and Joint Distributions

I What happens when there are two (or more) variables that we are uncertain about?

I How do we describe them probabilistically?

I We want to use probability to understand how two (or more) variables are related.

I In this section, we extend the results above to more than one variable.


Extending the sales example to include economic conditions

Example: consider a slightly more complicated but (potentially) more realistic version of our sales example where we also take into account the condition of the economy.

We want to think about the economy and our sales “together,” that is, jointly.

For simplicity, our model thinks of the economy next quarter as either up or down.

It is a Bernoulli random variable!


Extending the sales example to include economic conditions

Example continued:

Again, the random variable S denotes sales (in thousands of units) next quarter.

Let E denote the economy next quarter where E = 1 if the economy is up and E = 0 if it is down.

How can we think about E and S together?


Example continued:

First: what do we think will happen with the economy? Up or down?

Second: given the economy is up (down), what will happen to sales?

Suppose we know

p(E = 1) = p(Up) = 0.7

which of course implies that

p(E = 0) = p(Down) = 0.3

Our model for the economy is

E ∼ Bernoulli(0.7)


Example continued:

Question: If the economy is up, will it be more or less likely that sales will take on higher values? How can we represent this mathematically?


Example continued:

Answer: Specify two different probability distributions for S, one for each possible value that E can take on!

p(S = s|E = 1): the distribution of sales given that the economy is up.

p(S = s|E = 0): the distribution of sales given that the economy is down.


Example continued: Suppose we decide

s    p(s|E = 1)    p(s|E = 0)
1    0.05          0.20
2    0.20          0.30
3    0.50          0.30
4    0.25          0.20

These are called conditional probability distributions. (NOTE: p(s|E = 0) and p(s|E = 1) are the same as the earlier distributions p3(s) and p5(s), respectively.)

Conditional on the economy being up (E = 1), sales of our product are more likely to be higher than when the economy is down (E = 0).

If our product is actually procyclical, then this is likely to be a better model of reality than our earlier model.


We just defined two different probability distributions for the random variable S depending on the value of E.

We can easily compute the expected value and variance of each of these distributions.

s    p(s|E = 1)    p(s|E = 0)
1    0.05          0.20
2    0.20          0.30
3    0.50          0.30
4    0.25          0.20

E (S |E = 1) = .05 ∗ 1 + .2 ∗ 2 + .5 ∗ 3 + .25 ∗ 4 = 2.95

E (S |E = 0) = .2 ∗ 1 + .3 ∗ 2 + .3 ∗ 3 + .2 ∗ 4 = 2.5

These are called conditional means.


We can also compute the variances of the conditional probability distributions.

$$
\begin{aligned}
V(S \mid E = 1) &= 0.05\,(1 - 2.95)^2 + 0.2\,(2 - 2.95)^2 + 0.5\,(3 - 2.95)^2 + 0.25\,(4 - 2.95)^2 \\
&= 0.05 \cdot 3.8025 + 0.2 \cdot 0.9025 + 0.5 \cdot 0.0025 + 0.25 \cdot 1.1025 \\
&= 0.1901 + 0.1805 + 0.00125 + 0.2756 = 0.6475
\end{aligned}
$$

$$
\begin{aligned}
V(S \mid E = 0) &= 0.2\,(1 - 2.5)^2 + 0.3\,(2 - 2.5)^2 + 0.3\,(3 - 2.5)^2 + 0.2\,(4 - 2.5)^2 \\
&= 0.2 \cdot 2.25 + 0.3 \cdot 0.25 + 0.3 \cdot 0.25 + 0.2 \cdot 2.25 \\
&= 0.45 + 0.075 + 0.075 + 0.45 = 1.05
\end{aligned}
$$

These are called conditional variances.
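The arithmetic behind the conditional means and variances can be traced term by term. The sketch below is illustrative only; the names `p_s_given_e`, `conditional_mean`, `squared_deviations`, and `conditional_variance` are assumptions, and the probabilities are those of the two conditional distributions for sales.

```python
sales = [1, 2, 3, 4]

# p(s | E = e), keyed by the value of E (1 = up, 0 = down).
p_s_given_e = {
    1: [0.05, 0.20, 0.50, 0.25],
    0: [0.20, 0.30, 0.30, 0.20],
}


def conditional_mean(e):
    """E(S | E = e)."""
    return sum(s * p for s, p in zip(sales, p_s_given_e[e]))


def squared_deviations(e):
    """(s - E(S | E = e))^2 for each sales outcome s."""
    mu = conditional_mean(e)
    return [(s - mu) ** 2 for s in sales]


def conditional_variance(e):
    """V(S | E = e): probability-weighted sum of the squared deviations."""
    return sum(p * d for p, d in zip(p_s_given_e[e], squared_deviations(e)))


for e in (1, 0):
    print("E =", e, conditional_mean(e), squared_deviations(e), conditional_variance(e))
```

Printing the squared deviations separately makes it easy to check each intermediate term of the hand calculation.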


Conditional means and variances

I The mean (variance) of the conditional distribution is called a conditional mean (variance).

I The distributions p(E|S) and p(S|E) are both conditional distributions.

I Both of these distributions have a conditional mean.

$$E[E \mid S] = \sum_{\text{all } e} e \cdot p(E = e \mid S)$$

$$E[S \mid E] = \sum_{\text{all } s} s \cdot p(S = s \mid E)$$

I The conditional mean of p(E|S) depends on the outcome of the random variable S.


Example continued:

I We’ve said what we think will happen for the economy E .

I We’ve said what we think will happen for sales S given we know E.

I What will happen for E and S jointly?

70% of the time the economy goes up, and 1/4 of those times sales = 4.

25% of 70% is 17.5%

Pr(S = 4 and E = 1) = Pr(E = 1) ∗ Pr(S = 4|E = 1)

= .7 ∗ .25

= .175


Computing joint probabilities

There are eight possible outcomes for (S ,E ).

E = 1 (UP), with probability 0.7:

S = 4: P(S = 4 and E = 1) = 0.7 * 0.25 = 0.175

S = 3: P(S = 3 and E = 1) = 0.7 * 0.5 = 0.35

S = 2: P(S = 2 and E = 1) = 0.7 * 0.2 = 0.14

S = 1: P(S = 1 and E = 1) = 0.7 * 0.05 = 0.035

E = 0 (DOWN), with probability 0.3:

S = 4: P(S = 4 and E = 0) = 0.3 * 0.2 = 0.06

S = 3: P(S = 3 and E = 0) = 0.3 * 0.3 = 0.09

S = 2: P(S = 2 and E = 0) = 0.3 * 0.3 = 0.09

S = 1: P(S = 1 and E = 0) = 0.3 * 0.2 = 0.06


When both variables are discrete, we can display the joint probability distribution of (E, S) in a table:

(e, s)    Pr(E = e and S = s)
(1,4)     0.175
(1,3)     0.350
(1,2)     0.140
(1,1)     0.035
(0,4)     0.060
(0,3)     0.090
(0,2)     0.090
(0,1)     0.060

I There are eight possible values for the pair of random variables (E, S).

I We list the eight outcomes.

I Then, we list the probability of each outcome (which we calculated on the previous slide).
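The joint table can be rebuilt by multiplying the marginal of E by the conditional of S given E. The sketch below assumes the dictionary names `p_e`, `p_s_given_e`, and `joint` purely for illustration; the numbers are the ones used in the example.

```python
# Marginal distribution of the economy: p(E = e).
p_e = {1: 0.7, 0: 0.3}

# Conditional distributions of sales: p(S = s | E = e).
p_s_given_e = {
    1: {1: 0.05, 2: 0.20, 3: 0.50, 4: 0.25},
    0: {1: 0.20, 2: 0.30, 3: 0.30, 4: 0.20},
}

# Joint distribution: p(E = e and S = s) = p(E = e) * p(S = s | E = e).
joint = {
    (e, s): p_e[e] * p_s_given_e[e][s]
    for e in p_e
    for s in p_s_given_e[e]
}

for (e, s), prob in sorted(joint.items(), reverse=True):
    print((e, s), prob)
print("total probability:", sum(joint.values()))
```

The eight joint probabilities should sum to one (up to floating-point rounding), which is a quick sanity check on any table built this way.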


When there are only two discrete random variables, we can also display the joint distribution of E and S in a different table.

Rows are values of E, columns are values of S.

          S = 1    S = 2    S = 3    S = 4
E = 0     0.060    0.090    0.090    0.060
E = 1     0.035    0.140    0.350    0.175

I What is Pr(E = 1 and S = 4)?

I Answer: 0.175

I If we don’t know anything about E , what is Pr(S = 4)?

I Answer: .06 + .175 = .235 = pS(4)


Marginal distributions

What is the probability of S if we know nothing about E?

          S = 1    S = 2    S = 3    S = 4
E = 0     0.060    0.090    0.090    0.060
E = 1     0.035    0.140    0.350    0.175
pS(s)     0.095    0.230    0.440    0.235

I To obtain the probability distribution pS(s), we add the joint probabilities for each outcome (i.e. add downwards).

I For example:

PS(1) = P(S = 1, E = 0) + P(S = 1, E = 1)

= 0.06 + 0.035 = 0.095


Marginal distributions

What is the probability of E if we know nothing about S?

          S = 1    S = 2    S = 3    S = 4    pE(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
pS(s)     0.095    0.230    0.440    0.235

I To obtain the probability distribution pE(e), we add the joint probabilities for each outcome (i.e. add sideways).

I For example:

PE (0) = P(S = 1,E = 0) + P(S = 2,E = 0) + P(S = 3,E = 0) + P(S = 4,E = 0)

= 0.060 + 0.090 + 0.090 + 0.060 = 0.3
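Adding "downwards" and "sideways" amounts to summing the joint table over the other variable. A minimal sketch follows; the variable names `joint`, `p_s`, and `p_e` are assumptions, and the entries are the joint probabilities from the table.

```python
# Joint probabilities keyed by (e, s).
joint = {
    (0, 1): 0.060, (0, 2): 0.090, (0, 3): 0.090, (0, 4): 0.060,
    (1, 1): 0.035, (1, 2): 0.140, (1, 3): 0.350, (1, 4): 0.175,
}

p_s = {}  # marginal of S: sum over e (add downwards)
p_e = {}  # marginal of E: sum over s (add sideways)
for (e, s), prob in joint.items():
    p_s[s] = p_s.get(s, 0.0) + prob
    p_e[e] = p_e.get(e, 0.0) + prob

print("pS:", p_s)
print("pE:", p_e)
```

Each marginal is itself a probability distribution, so its entries should again sum to one up to rounding.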


Marginal distributions

          S = 1    S = 2    S = 3    S = 4    pE(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
pS(s)     0.095    0.230    0.440    0.235

I The distributions pE(e) and pS(s) are called marginal distributions.

I Why?


Conditional versus Marginals

Remember the three distributions p3(s), p4(s), and p5(s) from above?

It turns out that:

p3(s) = p(s|E = 0)

p4(s) = pS(s)

p5(s) = p(s|E = 1)

[Figure: plot of p3(s), p4(s), and p5(s); horizontal axis S, vertical axis p(s).]

I p4(s) is the marginal distribution.

I Notice that it lies “in-between” the two conditional distributions.


Conditional Probability Distribution

The conditional probability that Y turns out to be y given you know that X = x is denoted by

Pr(Y = y|X = x)

I In words, “the conditional probability distribution of the random variable Y given X gives the probability that Y = y given that we know X = x.”

I A conditional probability distribution is a new probability distribution for the random variable Y given that we know X = x.

(NOTE: In our example, S was analogous to Y and E to X.)


Joint Probability Distribution

The joint probability that Y turns out to be y and that X turns out to be x is denoted by

Pr(Y = y, X = x) = Pr(Y = y and X = x)

I In words, “a joint probability distribution specifies the probability that Y = y AND X = x.”

I It describes our uncertainty over both Y and X at the same time.

(NOTE: In our example, S was analogous to Y and E to X.)


Remarks on Notation

I The notation for the conditional, marginal, and joint distributions often gets abused and may be confusing.

I For the joint distribution, you may often see: P(Y = y, X = x) = Pr(Y = y and X = x) = p(y, x).

I The order in which the variables are written does not matter. Pr(Y = y and X = x) is the same as Pr(X = x and Y = y).

I For the conditional distribution, authors often write: P(Y = y|X = x) = p(y|x).

I For the marginal distribution, we can use pX(x) or p(x) or P(X = x) as before. These are all the same.


Two Important Relationships

Relationship between Joint and Conditional

p(y , x) = p(x) ∗ p(y |x) = p(y) ∗ p(x |y)

Relationship between Joint and Marginal

$$p(x) = \sum_{y} p(y, x)$$

$$p(y) = \sum_{x} p(y, x)$$


Example: consider again the sales example with the r.v.s (S, E).

JOINT: p(4, 1) = 0.175

In words, “What’s the chance the economy is up AND sales is 4 units?”

CONDITIONAL: p(4|1) = 0.25

In words, “GIVEN you know the economy is up, what is the chance sales turns out to be 4 units?”


Example continued:

MARGINAL: p(4) = pS(4) = .235 = .175 + .06

In words, “What’s the chance sales turns out to be 4 units?”

MARGINAL: p(1) = pE (1) = .7 = .175 + .35 + .14 + .035

In words, “What’s the chance the economy will be up?”

(NOTE: This last one can be a bit confusing because p(1) is ambiguous as both E

and S can take on the value 1.)


Conditionals from Joints

We derived the joint distribution of (E, S) by first considering the marginal of E and then thinking about the conditional distribution of S|E.

An alternative approach is to start with a joint distribution p(y, x) and the marginal pX(x) and then obtain the conditional distribution.

$$p(y, x) = p_X(x)\,p(y \mid x) \quad\Longrightarrow\quad p(y \mid x) = \frac{p(y, x)}{p_X(x)}$$

(Note: in the expression for p(y|x), the denominator is the marginal probability.)


Example: given that the economy is up (E = 1), what is the probability that sales is 4?

          S = 1    S = 2    S = 3    S = 4    pE(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
pS(s)     0.095    0.230    0.440    0.235

Using the marginal P(E = 1) and joint probability P(S = 4, E = 1) we have

$$P(S = 4 \mid E = 1) = \frac{P(S = 4, E = 1)}{P(E = 1)} = \frac{0.175}{0.7} = 0.25$$


Example: given that sales is 4 (S = 4), what is the probability that the economy is up?

          S = 1    S = 2    S = 3    S = 4    pE(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
pS(s)     0.095    0.230    0.440    0.235

Using the marginal P(S = 4) and joint probability P(S = 4, E = 1) we have

$$P(E = 1 \mid S = 4) = \frac{P(S = 4, E = 1)}{P(S = 4)} = \frac{0.175}{0.235} \approx 0.745$$

(NOTE: even though we started with distributions for E and S|E, we can still calculate p(E|S).)
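Dividing a column of the joint table by its marginal total gives the whole conditional distribution p(E|S = s) at once. The following sketch is illustrative; the function name `conditional_e_given_s` is an assumption and the joint probabilities are taken from the table.

```python
# Joint probabilities keyed by (e, s).
joint = {
    (0, 1): 0.060, (0, 2): 0.090, (0, 3): 0.090, (0, 4): 0.060,
    (1, 1): 0.035, (1, 2): 0.140, (1, 3): 0.350, (1, 4): 0.175,
}


def conditional_e_given_s(joint, s):
    """p(E = e | S = s) = p(E = e, S = s) / pS(s) for every value e."""
    column = {e: prob for (e, s_val), prob in joint.items() if s_val == s}
    p_s = sum(column.values())  # marginal probability pS(s)
    return {e: prob / p_s for e, prob in column.items()}


cond = conditional_e_given_s(joint, 4)
print("p(E = 1 | S = 4) =", cond[1])
print("p(E = 0 | S = 4) =", cond[0])
```

Because each entry is divided by the same column total, the conditional probabilities automatically sum to one.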


In general, you can compute the joint from marginals and conditionals and the other way around.

Which way you do it will depend on the problem.

Example: suppose you toss two fair coins: X is the first, Y is the second. (NOTE: X = 1 is a head.)

What is P(X = 1 and Y = 1) = P(two heads)?

I There are 4 possible outcomes for the two coins and each is equally likely, so it is 1/4.

I P(X = 1 and Y = 1) = P(X = 1) P(Y = 1|X = 1) = (1/2)(1/2) = 1/4.


Bayes Theorem


Bayes Theorem

In many situations, you will know one conditional distribution p(y|x) and the marginal distribution pX(x) but you are really interested in the other conditional distribution p(x|y).

Given that we know p(y|x) and pX(x), can we compute p(x|y)?


Example: Testing for a Disease

Let D = 1 indicate you have a certain (rare) disease and let T = 1 indicate that you tested positive for it.

Suppose we know the marginal probability P(D = 1) and the conditional probabilities P(T = 1|D = 1) and P(T = 1|D = 0).

D = 1, with probability 0.02:

T = 1: P(D = 1 and T = 1) = 0.02 * 0.95 = 0.019

T = 0: P(T = 0 and D = 1) = 0.02 * 0.05 = 0.001

D = 0, with probability 0.98:

T = 0: P(T = 0 and D = 0) = 0.98 * 0.99 = 0.9702

T = 1: P(T = 1 and D = 0) = 0.98 * 0.01 = 0.0098


We start with info about D and T|D. But if you are the patient who tests positive for a disease you care about P(D = 1|T = 1)!

Given that you have tested positive, what is the probability that you have the disease?

          D = 0     D = 1
T = 0     0.9702    0.001
T = 1     0.0098    0.019

$$P(D = 1 \mid T = 1) = \frac{P(D = 1, T = 1)}{P(T = 1)} = \frac{0.019}{0.019 + 0.0098} \approx 0.66$$


Bayes Theorem

Computing p(x|y) from pX(x) and p(y|x) is called Bayes Theorem.

$$p(x \mid y) = \frac{p(y, x)}{p_Y(y)} = \frac{p(y, x)}{\sum_{\text{all } x} p(y, x)} = \frac{p_X(x)\,p(y \mid x)}{\sum_{\text{all } x} p_X(x)\,p(y \mid x)}$$

Example: (from the last slide...)

$$p(D = 1 \mid T = 1) = \frac{p(T = 1 \mid D = 1)\,p(D = 1)}{p(T = 1 \mid D = 1)\,p(D = 1) + p(T = 1 \mid D = 0)\,p(D = 0)}$$
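The formula translates almost directly into code. The sketch below is one possible way to write it, not a definitive implementation; the function name `bayes_posterior` and its dictionary-based arguments are assumptions, and the inputs are the disease-testing probabilities from the example (P(D = 1) = 0.02, P(T = 1|D = 1) = 0.95, P(T = 1|D = 0) = 0.01).

```python
def bayes_posterior(prior, likelihood):
    """Return p(x | y) for every x.

    prior:      dict mapping x to pX(x)
    likelihood: dict mapping x to p(y | x) for the one observed value y
    """
    # Numerators of Bayes Theorem: pX(x) * p(y | x), i.e. the joint p(y, x).
    joint = {x: prior[x] * likelihood[x] for x in prior}
    # Denominator: the marginal pY(y), obtained by summing the joint over x.
    p_y = sum(joint.values())
    return {x: joint[x] / p_y for x in joint}


prior = {1: 0.02, 0: 0.98}                 # p(D = d)
likelihood_positive = {1: 0.95, 0: 0.01}   # p(T = 1 | D = d)

posterior = bayes_posterior(prior, likelihood_positive)
print("P(D = 1 | T = 1) =", posterior[1])
```

The same function can be reused for any two-variable discrete problem by swapping in a different prior and likelihood.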


Bayes Theorem

Suppose that 52% of the U.S. population is currently Democrat and the remainder is Republican.

Let the r.v. D = 1 if a person is a Democrat and zero otherwise.

Recently, a poll was taken asking each voter their party and whether or not they would vote for the healthcare bill.

Let the r.v. H = 1 if they would vote for the bill and zero otherwise.

The results of the poll indicated that 55% of Democrats would vote for the healthcare bill while only 10% of Republicans would.

A distant friend of yours said that if given the chance she would vote for the bill. For her, what is P(D = 1|H = 1)?


Bayes Theorem

We can apply Bayes’ Theorem

$$
\begin{aligned}
p(D = 1 \mid H = 1) &= \frac{p(H = 1 \mid D = 1)\,p(D = 1)}{p(H = 1 \mid D = 1)\,p(D = 1) + p(H = 1 \mid D = 0)\,p(D = 0)} \\
&= \frac{(0.55)(0.52)}{(0.55)(0.52) + (0.10)(0.48)} \\
&= \frac{0.286}{0.286 + 0.048} \approx 0.856
\end{aligned}
$$


Many Random Variables


Many Random variables

As we have seen in looking at data, we often want to think about more than two variables at a time.

We can extend the approach we used with two variables.

Suppose we have three random variables (Y1, Y2, Y3).

p(y1, y2, y3) = p(y3|y2, y1)p(y2|y1)p(y1)

The joint distribution of all three variables can be broken down into the marginal and conditional distributions.


Sampling without Replacement

Example:

Suppose we have 10 voters. 4 are Republican and 6 are Democrat.

We randomly choose 3. Let Yi = 1 if the i-th voter chosen is a Democrat and 0 if they are Republican for i = 1, 2, 3.

What is the probability of three Democrats?

In other words, what is P(Y1 = 1, Y2 = 1, Y3 = 1) = p(1, 1, 1)?


Sampling without Replacement

The answer is

$$p(Y_1 = 1)\,p(Y_2 = 1 \mid Y_1 = 1)\,p(Y_3 = 1 \mid Y_1 = 1, Y_2 = 1) = \frac{6}{10} \cdot \frac{5}{9} \cdot \frac{4}{8} = \frac{1}{6}$$

Step 1: p(Y1 = 1) = 6/10 because 6 out of 10 voters are Democrats.

Step 2: p(Y2 = 1|Y1 = 1) = 5/9 because conditional on Y1 = 1 there are now only 9 voters remaining and 5 are Democrats.

Step 3: p(Y3 = 1|Y2 = 1, Y1 = 1) = 4/8 because conditional on Y1 = 1 and Y2 = 1 there are only 8 voters remaining and 4 are Democrats.

Key Point: If Y1 = 1 then we do not “replace” the Democrat that was chosen first (and so on). This person can’t be chosen again.
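The three steps can be written as a short loop in which both the number of Democrats and the number of voters shrink by one after each draw. This is a minimal sketch using exact fractions; the function name `prob_all_democrats` and its parameters are assumptions made for illustration.

```python
from fractions import Fraction


def prob_all_democrats(n_democrats, n_voters, n_draws):
    """Probability that every one of n_draws voters drawn without replacement is a Democrat."""
    prob = Fraction(1)
    for k in range(n_draws):
        # After k Democrats have been removed, n_democrats - k remain
        # among n_voters - k voters.
        prob *= Fraction(n_democrats - k, n_voters - k)
    return prob


print(prob_all_democrats(6, 10, 3))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.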


Example continued: There are a total of 8 outcomes.

The logic behind how each probability is calculated is the same as on the last slide.

(y1, y2, y3)    p(y1, y2, y3)
(0,0,0)         1/30
(0,0,1)         1/10
(0,1,0)         1/10
(1,0,0)         1/10
(0,1,1)         1/6
(1,0,1)         1/6
(1,1,0)         1/6
(1,1,1)         1/6

What is the marginal distribution of Y1? Find all the outcomes where Y1 = 1 and add the probabilities.

$$p(Y_1 = 1) = p(1, 0, 0) + p(1, 0, 1) + p(1, 1, 0) + p(1, 1, 1) = \frac{1}{10} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{6}{10}$$
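The full table of eight outcomes, and the marginal of Y1, can be generated by applying the same sequential logic to every possible sequence. The sketch below is illustrative; the function name `joint_prob` and its default arguments (6 Democrats, 4 Republicans) are assumptions matching the example.

```python
from fractions import Fraction
from itertools import product


def joint_prob(outcome, n_democrats=6, n_republicans=4):
    """p(y1, y2, y3) for draws without replacement (1 = Democrat, 0 = Republican)."""
    dems, reps = n_democrats, n_republicans
    prob = Fraction(1)
    for y in outcome:
        remaining = dems + reps
        if y == 1:
            prob *= Fraction(dems, remaining)
            dems -= 1  # the chosen Democrat is not replaced
        else:
            prob *= Fraction(reps, remaining)
            reps -= 1  # the chosen Republican is not replaced
    return prob


# Joint distribution over all 2^3 = 8 sequences.
joint = {outcome: joint_prob(outcome) for outcome in product([0, 1], repeat=3)}

# Marginal of Y1: add the probabilities of all outcomes with y1 = 1.
p_y1 = sum(prob for outcome, prob in joint.items() if outcome[0] == 1)

for outcome, prob in joint.items():
    print(outcome, prob)
print("p(Y1 = 1) =", p_y1)
```

Note that `Fraction` prints values in lowest terms, so 6/10 appears as 3/5.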


Many Random variables

Above, we had three random variables (Y1,Y2,Y3).

Then, we decomposed their joint distribution as

p(y1, y2, y3) = p(y3|y2, y1)p(y2|y1)p(y1)

This is true for as many variables as you want.

p(y1, y2, . . . , yn) = p(yn|yn−1, yn−2, . . . , y2, y1) . . . p(y3|y2, y1)p(y2|y1)p(y1)

This is important because it allows us to extend our results to n random variables.
