
5.2 Sampling Distributions for Counts and Proportions

Binomial distributions for sample counts
Binomial distributions in statistical sampling
Finding binomial probabilities
Binomial mean and standard deviation
Sample proportions
Normal approximation for counts and proportions
Binomial formula


Sampling Distributions

The law of large numbers assures us that if we measure enough subjects, the statistic x-bar will eventually get very close to the unknown parameter µ.

If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we’d have a sampling distribution.

The population distribution of a variable is the distribution of values of the variable among all individuals in the population.

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Binomial distribution for sample counts

Example:

◦ n = 2500 adults asked whether shopping is frustrating (n is the number of trials)

◦ X = 1650 answered “Yes” (X is the number of “successes”)

◦ p-hat = X/n = 0.66 is the sample proportion (of successes)

Need to make sure we distinguish between the count and the sample proportion

Binomial Setting

1. Each observation falls into one of just two categories: Success/Failure, Heads/Tails, Yes/No
2. All observations are independent
3. Fixed number of trials, n
4. The probability of success, p, is the same in each trial

The distribution of the (total) count of successes in this binomial setting is:

Binomial distribution denoted B(n,p)

Examples:

Toss a fair coin 10 times and count the number X of heads
◦ Binomial or not?
◦ What about a biased coin?

Deal 10 cards from a shuffled deck of 52. X is the number of spades.
◦ Binomial?
◦ Suggestions?

Number of girls born among first 100 children in a (large) hospital this year

Number of girls born in this hospital so far this year

Binomial distribution in statistical sampling

An SRS is not quite a Binomial setting
◦ Why? Check the 4 properties!

However, if the population is at least 10 times larger than our sample size n, then the number of “successes” in the sample is approximately Binomial.
◦ We say X is approximately B(n,p)
◦ Here p is the population success rate, which is usually unknown
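To see why population size matters, here is a minimal Python sketch (not part of the original slides) that compares the exact without-replacement probabilities (hypergeometric) with B(n,p) for the spades example. The function names and the 100-deck population are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement from N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def max_gap(N, K, n):
    """Largest absolute difference between the two probability tables."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

# Deal 10 cards and count spades (p = 0.25 in both populations).
gap_one_deck = max_gap(52, 13, 10)           # population about 5 times the sample
gap_hundred_decks = max_gap(5200, 1300, 10)  # population 520 times the sample (hypothetical)

print(f"one deck:     {gap_one_deck:.4f}")
print(f"hundred decks: {gap_hundred_decks:.6f}")
```

The gap shrinks as the population grows relative to the sample, which is the sense in which the count in an SRS is "approximately" binomial.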

How to calculate Binomial probabilities?

We will just use Table C.

For given n and p, the table gives the probability of k successes.

The table only lists values of p of 0.5 or less.
◦ If you have a p greater than 0.5, you need to switch the roles of successes and failures.

Binomial Example:

Bill is the star player on his basketball team. Over his career, his free throw percentage is 75%. However, his three-point shot percentage is only 20%.
◦ If he tries 5 three-point shots, what is the probability he will make 2?
◦ If he tries 10 free throws, what is the probability he will make 7?
◦ If he tries 10 free throws again, what is the probability he makes at least 7 free throws?
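As a check on the table lookups, the three questions can be sketched in Python (an illustration, not the course's method, which is Table C). It assumes Bill's shots are independent with a constant success probability; `binom_pmf` is a helper name introduced here.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 5 three-point shots, p = 0.20: exactly 2 made
p_two_threes = binom_pmf(2, 5, 0.20)

# 10 free throws, p = 0.75: exactly 7 made
p_seven_free_throws = binom_pmf(7, 10, 0.75)

# 10 free throws, p = 0.75: at least 7 made (7, 8, 9, or 10)
p_at_least_seven = sum(binom_pmf(k, 10, 0.75) for k in range(7, 11))

print(round(p_two_threes, 4))         # 0.2048
print(round(p_seven_free_throws, 4))  # 0.2503
print(round(p_at_least_seven, 4))     # 0.7759
```

Note that "at least 7" adds the probabilities for 7 through 10; with Table C and p = 0.75 you would instead look up 3 or fewer misses with p = 0.25.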

Optional: Binomial formula

$$P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$$

where $n! = n \times (n-1) \times \cdots \times 2 \times 1$ and $0! = 1$.

With SAS (Optional)

We need to create a dataset with named variables for the probabilities we want.

For example, probbnml(p,n,k) gives the probability of k or fewer successes. SAS stores the result in a variable, so we need to name it, such as:

prob_less_than_or_equal_to_k = probbnml(p,n,k);

What if we want greater than?

prob_greater_than = 1 - probbnml(p,n,k);

What if we want equal to?

prob_equal = probbnml(p,n,k) - probbnml(p,n,k-1);

SAS Coding (Optional)

Calculate probabilities for binomial distribution: B(n,p)

data binomial;
  p=0.25;
  n=10;
  k=4;
  prob_less_than_or_equal_to_k = probbnml(p,n,k);
  prob_greater_than = 1 - probbnml(p,n,k);
  prob_equal = probbnml(p,n,k) - probbnml(p,n,k-1);
run;

proc print data=binomial;
run;

SAS output (Optional)

Binomial Example

Obs  p     n   k  prob_less_than_or_equal_to_k  prob_greater_than  prob_equal
1    0.25  10  4  0.92187                       0.078127           0.14600
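For readers without SAS, the same three quantities can be cross-checked with a short Python sketch. This is an illustration that mirrors the variable names in the SAS step; `binom_cdf` is a helper written here to play the role of probbnml, not a SAS or library function.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), the quantity probbnml(p,n,k) returns in the SAS step."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

p, n, k = 0.25, 10, 4
prob_less_than_or_equal_to_k = binom_cdf(k, n, p)
prob_greater_than = 1 - binom_cdf(k, n, p)
prob_equal = binom_cdf(k, n, p) - binom_cdf(k - 1, n, p)

print(f"{prob_less_than_or_equal_to_k:.5f}")  # 0.92187
print(f"{prob_greater_than:.6f}")             # 0.078127
print(f"{prob_equal:.5f}")                    # 0.14600
```

The printed values agree with the SAS output above to the digits shown.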

Binomial mean and standard deviation

If X has binomial distribution B(n,p), then

$$\mu_X = np$$

$$\sigma_X = \sqrt{np(1-p)}$$

Example

For 10 tosses of a fair coin, let X = number of heads

◦ What is the distribution of X?

◦ Mean of X =

◦ Standard Deviation of X =
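A quick way to check your answers is to plug n = 10 and p = 0.5 into the binomial mean and standard deviation formulas. The Python lines below are a small sketch of that arithmetic; the variable names are illustrative.

```python
from math import sqrt

n, p = 10, 0.5                # X ~ B(10, 0.5)

mean_x = n * p                # mu_X = np
sd_x = sqrt(n * p * (1 - p))  # sigma_X = sqrt(np(1 - p))

print(mean_x)          # 5.0
print(round(sd_x, 2))  # 1.58
```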

Furthermore

Let us take a binomial situation…
◦ We have many bags with 20 switches in each bag
◦ The probability that each individual switch is bad is 0.5

So, the number of bad switches in each bag has a Binomial distribution with n = 20 and p = 0.5
◦ B(20,0.5)

What if we look at how many switches are bad in many different bags? Draw a histogram!

This gives us a histogram of the B(20,0.5)! What do you notice about the shape? Hmm…
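Since the slide's histogram is not reproduced in this transcript, here is a hedged Python sketch that prints a text version of the exact B(20,0.5) probabilities (rather than counts from sampled bags). The bar scaling of 200 characters per unit of probability is an arbitrary choice.

```python
from math import comb

n, p = 20, 0.5

# Exact probability of k bad switches in a bag, for k = 0..20
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# One row per count; bar length is proportional to the probability
for k, prob in enumerate(pmf):
    print(f"{k:2d} {'#' * round(prob * 200)}")
```

The bars rise to a single peak at 10 and fall off symmetrically, which is the bell shape the slide is hinting at.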


Normal Approximation for Binomial Distributions

As n gets larger, something interesting happens to the shape of a binomial distribution.

Suppose that X has the binomial distribution with n trials and success probability p. When n is large, the distribution of X is approximately Normal with mean and standard deviation

$$\mu_X = np, \qquad \sigma_X = \sqrt{np(1-p)}$$

As a rule of thumb, we will use the Normal approximation when n is so large that np ≥ 10 and n(1 – p) ≥ 10.

What about sample proportion?

The sample proportion relates directly to the count X:

$$\hat{p} = \frac{X}{n}$$

Counts or X:

$$X \text{ is approximately } N\left(np,\ \sqrt{np(1-p)}\right)$$

Proportions or p-hat:

$$\hat{p} \text{ is approximately } N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$

Remember this concept?

$$X \text{ is approximately } N\left(np,\ \sqrt{np(1-p)}\right)$$

$$\hat{p} \text{ is approximately } N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$

Example:

In 2001, Barry Bonds hit 73 home runs. Was this feat as surprising as most of us thought? In the prior two seasons, Bonds hit a home run in 10% of his times at bat. If he went to bat 476 times in 2001, what is the probability that he hits 73 or more home runs just by chance? (Solve in terms of both X and p-hat.) Is it appropriate to use the normal approximation for this problem?

(The real probability from the Binomial is 0.0001)
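One way to set up the Bonds problem is sketched below in Python. It checks the rule of thumb, then standardizes 73 home runs on both the count scale and the proportion scale. It assumes at-bats are independent with p = 0.10 and uses no continuity correction; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 476, 0.10

# Rule of thumb: np >= 10 and n(1 - p) >= 10
approximation_ok = n * p >= 10 and n * (1 - p) >= 10

# Count scale: X is approximately N(np, sqrt(np(1 - p)))
mu_x = n * p
sd_x = sqrt(n * p * (1 - p))
z_count = (73 - mu_x) / sd_x

# Proportion scale: p-hat is approximately N(p, sqrt(p(1 - p)/n))
sd_phat = sqrt(p * (1 - p) / n)
z_phat = (73 / n - p) / sd_phat

# Upper-tail probability from the standard normal
tail = 1 - NormalDist().cdf(z_count)

print(approximation_ok)                        # True
print(round(z_count, 2), round(z_phat, 2))     # 3.88 3.88
print(f"{tail:.6f}")
```

Both scales give the same z-score, so they give the same probability. The normal tail this far out is only a rough guide to the exact binomial value quoted above.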

Examples:

What is the probability that the percentage of heads in 100 tosses is between 40% and 60%?

Assume that exactly 60% of the population does not like shopping. What is the chance of obtaining a sample proportion larger than 0.65 for a sample of size 2500?
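Both questions can be sketched with the normal approximation for p-hat. The Python below is an illustration under the assumptions of a fair coin (p = 0.5) and an SRS from a large population; `phat_dist` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def phat_dist(p, n):
    """Approximate sampling distribution of p-hat: N(p, sqrt(p(1 - p)/n))."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# 100 tosses of a fair coin: P(0.40 < p-hat < 0.60)
coin = phat_dist(0.5, 100)
p_between = coin.cdf(0.60) - coin.cdf(0.40)

# Shopping survey: p = 0.60, n = 2500, P(p-hat > 0.65)
shop = phat_dist(0.60, 2500)
z_shop = (0.65 - 0.60) / shop.stdev
p_above = 1 - shop.cdf(0.65)

print(round(p_between, 4))  # 0.9545
print(round(z_shop, 2))     # 5.1
print(p_above)
```

In the first case 0.40 and 0.60 sit two standard deviations (0.05 each) from 0.5. In the second, 0.65 is about five standard deviations above 0.60, so the chance is essentially zero.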

Summary

Sampling distribution of sample counts and proportions

Evaluating the Binomial probabilities

Using the approximate sampling distribution to assess certain probabilities

The probabilities evaluated using the normal distribution are not exact, but approximations

5.1 The Sampling Distribution of a Sample Mean

Population distribution vs. sampling distribution
The mean and standard deviation of the sample mean
Sampling distribution of a sample mean
Central limit theorem


Examples: [Figure: top panel, distribution of individual stocks; bottom panel, distribution of portfolios (several stocks combined)]

Connection

Because portfolios usually contain many individual stocks, when we look at the return of portfolios, we are looking at the return of the sum (or average) of many individual stocks

What happens to the distribution of the portfolios? Let’s look again…

Sample mean

Given an SRS of size n, we observe n values X1, X2,…, Xn, of a quantitative random variable

The sample mean of the SRS is:

$$\bar{x} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$$

Mean and Standard deviation of sample mean

Assume the population has mean µ and standard deviation σ.

Then if the observations are independent, the sample mean, x-bar, has mean and standard deviation given as follows:

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Example

The height in inches of a randomly chosen young woman is N(64.5, 2.5).

What are the mean and standard deviation of the average height of 100 randomly chosen young women?
◦ Think in terms of stocks and portfolios
◦ What will the normal distribution above do?

The mean stays the same, the variation decreases

Sampling distribution of the sample mean

If the variable X in the population is N(µ,σ), then

$$\bar{x} \text{ is } N\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$$

Kicker: This is often a good approximation even if the original distribution is not normal.

This is a HUGE result, called the Central Limit Theorem (or CLT).

It says that if we start with ANY distribution that has a finite standard deviation, the sample mean will be approximately normally distributed when n is large.

Example cont.

Take 100 randomly chosen young women and measure their height. What is the chance that the average height of these 100 women is between 64 and 65 inches?
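The height example can be worked through in a few lines of Python. This is a sketch that assumes the 100 heights are independent draws from N(64.5, 2.5); because the population is normal, the result for x-bar is exact under that assumption.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 64.5, 2.5, 100

# Sampling distribution of x-bar: N(mu, sigma / sqrt(n))
mean_of_xbar = mu
sd_of_xbar = sigma / sqrt(n)
xbar = NormalDist(mu=mean_of_xbar, sigma=sd_of_xbar)

# P(64 < x-bar < 65)
p_between = xbar.cdf(65) - xbar.cdf(64)

print(mean_of_xbar, sd_of_xbar)  # 64.5 0.25
print(round(p_between, 4))       # 0.9545
```

The mean stays at 64.5 while the standard deviation drops from 2.5 to 0.25, so 64 and 65 are each two standard deviations from the center.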

Example 2

The mean time for maintenance of an air conditioner is 60 minutes, with a standard deviation of 60 minutes.

What is the probability that the average maintenance time of 70 air conditioners will exceed 50 minutes?
◦ Note, we didn’t say the time for maintenance is normally distributed. In fact, it follows an exponential distribution.
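Even though individual times are exponential, the average of 70 of them can be treated as approximately normal. The Python below is a sketch of that calculation, assuming the 70 maintenance times are independent.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60, 60, 70

# CLT: x-bar is approximately N(mu, sigma / sqrt(n))
sd_of_xbar = sigma / sqrt(n)
z = (50 - mu) / sd_of_xbar

# P(x-bar > 50) under the normal approximation
p_exceeds_50 = 1 - NormalDist(mu, sd_of_xbar).cdf(50)

print(round(sd_of_xbar, 2))    # 7.17
print(round(z, 2))             # -1.39
print(round(p_exceeds_50, 3))  # 0.918
```

Because 50 is below the mean of 60, the probability of exceeding it is well above one half.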

Visually: Distributions of average time of servicing n=1, 2, 10, and 25 air conditioners

…and for 70 air conditioners

Remember this concept again? (Approximate) normality of sample mean

A particular application

If you know n, then the distributions for the sum and average are equivalent (if you know one, you know the other).
◦ So since x-bar has an (approximately) normal distribution, sums are also (approximately) normally distributed!

A count (think binomial) is just a sum!
◦ We are just adding up individual observations (1 for each success, 0 for each failure), so of course that is a sum and hence approximately normal!
◦ So of course counts are approximately normal for large n!
◦ Similarly, proportions function like averages, and are also approximately normally distributed!

The CLT is the key.

The Central Limit Theorem:

$$\bar{x} \text{ is approximately } N\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$$

This is our (familiar) approximate normality.

Important assumptions:
◦ SRS (Simple Random Sample)
◦ Population distribution of X has mean µ and standard deviation σ
◦ Last but not least, n needs to be “large enough”. Remember the air conditioning example. Generally, we say n ≥ 30 is “large enough”.

Warning: not all interesting distributions are normal
◦ But, the sample means are always roughly normal for large sample sizes.
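A small simulation can make the CLT concrete. The Python sketch below draws many samples of 70 exponential maintenance times (mean 60, as in the air conditioning example) and compares the simulated sample means with what the theorem predicts. The seed, the number of repetitions, and the variable names are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)   # arbitrary seed so the run is repeatable

MU = 60.0        # exponential distribution: mean = standard deviation = 60
n = 70           # air conditioners per sample
reps = 5000      # number of simulated samples

# One sample mean per simulated sample of n exponential times
sample_means = [
    sum(random.expovariate(1 / MU) for _ in range(n)) / n
    for _ in range(reps)
]

# Share of simulated averages above 50 minutes
frac_over_50 = sum(m > 50 for m in sample_means) / reps

print("mean of sample means:", round(mean(sample_means), 2))  # theory: 60
print("sd of sample means:  ", round(stdev(sample_means), 2)) # theory: 60/sqrt(70), about 7.17
print("share above 50:      ", round(frac_over_50, 3))
```

The simulated mean and standard deviation should land close to 60 and 60/√70, and the share above 50 should be close to the normal-approximation answer, even though individual times are strongly skewed.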

Summary

Approximate normal distribution of the sample mean from an SRS.

CLT holds for ANY population distribution with a finite standard deviation.

Also, if in fact the underlying population distribution is exactly normal (in some cases it is), then the result is also exact, not an approximation.

Use the CLT to evaluate probabilities regarding averages.

Summary of Chapter 5

How do you tell the “X-bar” problems apart from section 1.3 “X” problems?
◦ Section 1.3 “X” problems have a sample size of 1 (n = 1).
◦ Section 5.1 “X-bar” problems have a sample size bigger than 1.

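Type                           Mean                        Standard Deviation
Individual, $x$                $\mu_x$                     $\sigma_x$
Sample mean, $\bar{x}$         $\mu_{\bar{x}} = \mu_x$     $\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$
Sample proportion, $\hat{p}$   $\mu_{\hat{p}} = p$         $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$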

Extra Example 1

We flip cards from a stack of cards containing 10 normal decks of cards and count each time we flip an “Ace” as a success.
◦ What is the population proportion of success?
◦ If we only did 50 cards as a sample, and 4 aces were flipped, what is the sample proportion of success?
◦ If we did repeated samples of size n = 50, what is the mean and standard deviation of the sample proportion?
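The three parts can be sketched in Python as follows. This illustration assumes standard 52-card decks with 4 aces each (so 40 aces in 520 cards) and treats the 50 flips as approximately binomial, since the stack is a little more than 10 times the sample size.

```python
from math import sqrt

# Population: 10 decks of 52 cards, 4 aces per deck
p = (10 * 4) / (10 * 52)   # population proportion of aces = 1/13

# One observed sample: 4 aces in 50 cards
n = 50
p_hat = 4 / n              # sample proportion

# Sampling distribution of p-hat over repeated samples of size 50
mean_phat = p
sd_phat = sqrt(p * (1 - p) / n)

print(round(p, 4))        # 0.0769
print(p_hat)              # 0.08
print(round(sd_phat, 4))  # 0.0377
```

Note that np is only about 3.8 here, below the rule-of-thumb value of 10, so the mean and standard deviation formulas apply but the normal approximation would not be a good fit for this sample size.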

Extra Example 2

Bob is playing in the club golf tournament. Bob’s scores vary as he plays the course repeatedly and follow a N(77,3) distribution.
◦ What is the probability that Bob will shoot a 74 or lower in the first round of the club tournament?
◦ What is the probability that Bob will average 74 or lower for the 4 rounds of the club tournament?
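This example contrasts an "X" problem (one round) with an "X-bar" problem (the average of 4 rounds). The Python sketch below assumes Bob's four rounds are independent draws from N(77, 3); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 77, 3

# One round (n = 1): X is N(77, 3)
p_single_round = NormalDist(mu, sigma).cdf(74)

# Average of 4 rounds: x-bar is N(77, 3 / sqrt(4)) = N(77, 1.5)
n = 4
p_four_round_avg = NormalDist(mu, sigma / sqrt(n)).cdf(74)

print(round(p_single_round, 4))    # 0.1587
print(round(p_four_round_avg, 4))  # 0.0228
```

A score of 74 is one standard deviation below the mean for a single round but two standard deviations below for the four-round average, which is why the second probability is so much smaller.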