Binomial distributions for sample counts Binomial distributions in statistical sampling Finding...
Author
charlotteanderson 
 Slide 1
 Binomial distributions for sample counts
 Binomial distributions in statistical sampling
 Finding binomial probabilities
 Binomial mean and standard deviation
 Sample proportions
 Normal approximation for counts and proportions
 Binomial formula
 Slide 2
 Sampling Distributions
 The law of large numbers assures us that if we measure enough subjects, the statistic xbar will eventually get very close to the unknown parameter $\mu$. If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we'd have a sampling distribution.
 The population distribution of a variable is the distribution of values of the variable among all individuals in the population.
 The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
 Slide 3
 Example: n = 2500 adults were asked whether shopping is frustrating.
 n is the number of trials.
 X = 1650 answered Yes. X is the number of successes.
 phat = X/n = 0.66 is the sample proportion (of successes).
 We need to make sure we distinguish between the count and the sample proportion.
 Slide 4
 1. Each observation falls into one of just two categories: Success/Failure, Heads/Tails, Yes/No.
 2. All observations are independent.
 3. There is a fixed number of trials, n.
 4. The probability of success, p, is the same in each trial.
 The distribution of the (total) count of successes in this binomial setting is the binomial distribution, denoted B(n,p).
 Slide 5
 Toss a fair coin 10 times and count the number X of heads. Binomial or not? What about a biased coin?
 Deal 10 cards from a shuffled deck of 52. X is the number of spades. Binomial?
 Suggestions?
 Number of girls born among the first 100 children in a (large) hospital this year.
 Number of girls born in this hospital so far this year.
 Slide 6
 An SRS is not quite a binomial setting. Why? Check the 4 properties! However, if the population is at least 10 times larger than our sample size n, then the number of successes in the sample is approximately binomial. We say the count is approximately B(n,p). Here p is the population success rate, which is usually unknown.
 Slide 7
 We will just use Table C. For given n and p, the table gives the probability of k successes. The table only gives values of p of 0.5 or less. If you have a p greater than 0.5, you need to switch the roles of successes and failures.
 Slide 8
 Slide 9
 Slide 10
 Slide 11
 Slide 12
 Bill is the star player on his basketball team. Over his career, his free throw percentage is 75%. However, his three-point shot percentage is only 20%. If he tries 5 three-point shots, what is the probability he will make exactly 2? If he tries 10 free throws, what is the probability he will make exactly 7? If he tries 10 free throws again, what is the probability he makes at least 7 free throws?
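 The three questions about Bill can be worked directly from the binomial formula. The following is a minimal Python sketch, not the method used in the slides (which rely on Table C and SAS). It assumes each shot is an independent trial with a fixed success probability and reads "make 2" and "make 7" as exact counts; the function name `binom_pmf` is introduced here for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), using the binomial formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Three-point shots: n = 5 attempts, p = 0.20, exactly 2 made.
p_two_threes = binom_pmf(2, 5, 0.20)

# Free throws: n = 10 attempts, p = 0.75, exactly 7 made.
p_seven_free = binom_pmf(7, 10, 0.75)

# Free throws: "at least 7" means 7, 8, 9 or 10 made.
p_at_least_seven = sum(binom_pmf(k, 10, 0.75) for k in range(7, 11))

print(f"P(exactly 2 of 5 three-pointers)  = {p_two_threes:.4f}")
print(f"P(exactly 7 of 10 free throws)    = {p_seven_free:.4f}")
print(f"P(at least 7 of 10 free throws)   = {p_at_least_seven:.4f}")
```

 Because the free throw probability is above 0.5, a table limited to p of 0.5 or less would require counting misses instead; the direct formula avoids that switch.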
 Slide 13
 Slide 14
 In SAS, you need to create a dataset with variable names for the probabilities you want. For example, probbnml(p,n,k) gives the probability of k or fewer successes. The result is stored in a variable, so we need to name it, such as prob_less_than_or_equal_to_k = probbnml(p,n,k); What if we want greater than? prob_greater_than = 1 - probbnml(p,n,k); What if we want equal to? prob_equal = probbnml(p,n,k) - probbnml(p,n,k-1);
 Slide 15
 Calculate probabilities for binomial distribution: B(n,p)
 data binomial;
   p=0.25; n=10; k=4;
   prob_less_than_or_equal_to_k = probbnml(p,n,k);
   prob_greater_than = 1 - probbnml(p,n,k);
   prob_equal = probbnml(p,n,k) - probbnml(p,n,k-1);
 run;
 proc print data=binomial;
 run;
 Slide 16
 Binomial Example

 Obs   p      n    k   prob_less_than_or_equal_to_k   prob_greater_than   prob_equal
 1     0.25   10   4   0.92187                        0.078127            0.14600
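 The SAS output above can be cross-checked without SAS. This is a hedged Python sketch in which the hypothetical helper `binom_cdf` stands in for `probbnml(p,n,k)` by summing the binomial formula from 0 through k; the variable names are copied from the SAS example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); stands in for SAS probbnml(p, n, k)."""
    # range(k + 1) is empty when k = -1, so the sum is 0 in that case.
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

p, n, k = 0.25, 10, 4

prob_less_than_or_equal_to_k = binom_cdf(k, n, p)
prob_greater_than = 1 - binom_cdf(k, n, p)
prob_equal = binom_cdf(k, n, p) - binom_cdf(k - 1, n, p)

print(f"{prob_less_than_or_equal_to_k:.5f} {prob_greater_than:.6f} {prob_equal:.5f}")
```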
 Slide 17
 If X has binomial distribution B(n,p), then the mean and standard deviation of X are $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$.
 Slide 18
 For 10 tosses of a fair coin, let X = number of heads. What is the distribution of X? Mean of X = ? Standard deviation of X = ?
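 For the coin example, X is B(10, 0.5). A short Python sketch of the calculation, assuming the standard results that a B(n,p) count has mean np and standard deviation sqrt(np(1-p)); the helper name `binomial_mean_sd` is made up for this illustration.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a B(n, p) count."""
    return n * p, sqrt(n * p * (1 - p))

# 10 tosses of a fair coin: n = 10, p = 0.5.
mean_heads, sd_heads = binomial_mean_sd(10, 0.5)

print(f"mean = {mean_heads:.1f}, standard deviation = {sd_heads:.3f}")
```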
 Slide 19
 Let us take a binomial situation. We have many bags with 20 switches in each bag. The probability that each individual switch is bad is 0.5. So the number of bad switches in each bag has a binomial distribution with n = 20 and p = 0.5, that is, B(20,0.5). What if we look at how many switches are bad in many different bags? Draw a histogram!
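 The histogram over many bags should settle toward the B(20, 0.5) probabilities themselves. As a rough sketch (Python, text output only, and showing the theoretical probabilities rather than counts from real bags), the shape can be previewed like this:

```python
from math import comb

n, p = 20, 0.5

# Exact probability of k bad switches in a bag, for k = 0..20.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Crude text histogram: one '#' per half percentage point of probability.
for k, prob in enumerate(pmf):
    bar = "#" * round(prob * 200)
    print(f"{k:2d}  {prob:.4f}  {bar}")
```

 The bars are symmetric about 10 and already look bell-shaped, which is the pattern the Normal approximation relies on.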
 Slide 20
 Slide 21
 Normal Approximation for Binomial Distributions
 As n gets larger, something interesting happens to the shape of a binomial distribution. Suppose that X has the binomial distribution with n trials and success probability p. When n is large, the distribution of X is approximately Normal with mean $\mu_X = np$ and standard deviation $\sigma_X = \sqrt{np(1-p)}$. As a rule of thumb, we will use the Normal approximation when n is so large that $np \ge 10$ and $n(1-p) \ge 10$.
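 The rule of thumb (np ≥ 10 and n(1 − p) ≥ 10) is easy to check mechanically. A small Python sketch follows; the function name `normal_approx_ok` and the tiny tolerance for floating-point rounding are choices made here, not part of the slides.

```python
def normal_approx_ok(n, p, threshold=10):
    """True when both expected counts, np and n(1-p), reach the threshold."""
    eps = 1e-9  # guards against rounding, e.g. 100 * (1 - 0.9) = 9.999999999999998
    return n * p >= threshold - eps and n * (1 - p) >= threshold - eps

# Settings that appear elsewhere in these slides.
for n, p in [(10, 0.5), (20, 0.5), (476, 0.10), (2500, 0.66)]:
    print(f"B({n}, {p}): np = {n * p:.1f}, n(1-p) = {n * (1 - p):.1f}, "
          f"Normal approximation OK? {normal_approx_ok(n, p)}")
```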
 Slide 22
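 The sample proportion relates directly to the count X: $\hat{p} = X/n$.
 Counts, or X: $\mu_X = np$, $\sigma_X = \sqrt{np(1-p)}$
 Proportions, or phat: $\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$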
 Slide 23
 Slide 24
 In 2001, Barry Bonds hit 73 home runs. Was this feat as surprising as most of us thought? In the prior two seasons, Bonds hit a home run in 10% of his times at bat. If he went to bat 476 times in 2001, what is the probability that he hits 73 or more home runs just by chance? (Solve in terms of both X and phat.) Is it appropriate to use the normal approximation for this problem? (The real probability from the Binomial is 0.0001)
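 One way to work the Bonds problem is sketched below in Python, under the problem's own assumptions (476 independent at-bats, each with home run probability 0.10). It computes the z-score in terms of both X and phat, the Normal upper tail without a continuity correction, and the exact binomial tail for comparison. The helper names are illustrative.

```python
from math import erfc, exp, lgamma, log, sqrt

def normal_upper_tail(z):
    """P(Z > z) for a standard Normal Z."""
    return 0.5 * erfc(z / sqrt(2))

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), computed on the log scale to avoid underflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

n, p, hits = 476, 0.10, 73

# In terms of the count X.
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))
z_count = (hits - mean_x) / sd_x

# In terms of the sample proportion phat.
p_hat = hits / n
sd_phat = sqrt(p * (1 - p) / n)
z_prop = (p_hat - p) / sd_phat

normal_tail = normal_upper_tail(z_count)
exact_tail = sum(binom_pmf(k, n, p) for k in range(hits, n + 1))

print(f"np = {mean_x:.1f}, n(1-p) = {n * (1 - p):.1f}")
print(f"z from X = {z_count:.3f}, z from phat = {z_prop:.3f}")
print(f"Normal approximation: {normal_tail:.6f}")
print(f"Exact binomial:       {exact_tail:.6f}")
```

 Both routes give the same z-score, since phat is just X divided by n. Both np and n(1 − p) are well above 10, so the rule of thumb is met, although far out in the tail the approximation and the exact value can still differ noticeably in relative terms.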
 Slide 25
 What is the probability that the percentage of heads in 100 tosses is between 40% and 60%? Assume that exactly 60% of the population does not like shopping. What is the chance of obtaining a sample proportion larger than 0.65 for a sample of size 2500?
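 Both questions can be approached with the Normal approximation for a sample proportion, taking phat to have mean p and standard deviation sqrt(p(1-p)/n). The Python sketch below assumes a fair coin for the first question and applies no continuity correction; the function names are illustrative.

```python
from math import erf, erfc, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard Normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_upper_tail(z):
    """P(Z > z) for a standard Normal Z."""
    return 0.5 * erfc(z / sqrt(2))

# Question 1: proportion of heads in 100 tosses of a fair coin.
n_tosses, p_heads = 100, 0.5
sd_coin = sqrt(p_heads * (1 - p_heads) / n_tosses)
p_between = (normal_cdf((0.60 - p_heads) / sd_coin)
             - normal_cdf((0.40 - p_heads) / sd_coin))

# Question 2: p = 0.60 dislike shopping, sample of 2500, phat > 0.65.
n_adults, p_dislike = 2500, 0.60
sd_shop = sqrt(p_dislike * (1 - p_dislike) / n_adults)
z_shop = (0.65 - p_dislike) / sd_shop
p_above = normal_upper_tail(z_shop)

print(f"P(0.40 < phat < 0.60) is about {p_between:.4f}")
print(f"z = {z_shop:.2f}, P(phat > 0.65) is about {p_above:.2e}")
```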
 Slide 26
 Sampling distribution of sample counts and proportions.
 Evaluating the binomial probabilities.
 Using the approximate sampling distribution to assess certain probabilities.
 The probabilities evaluated using the normal distribution are not exact, but approximations.
 Slide 27
 Population distribution vs. sampling distribution
 The mean and standard deviation of the sample mean
 Sampling distribution of a sample mean
 Central limit theorem
 Slide 28
 Slide 29
 Because portfolios usually contain many individual stocks, when we look at the return of a portfolio, we are looking at the return of the sum (or average) of many individual stocks. What happens to the distribution of the portfolios? Let's look again.
 Slide 30
 Given an SRS of size n, we observe n values $X_1, X_2, \ldots, X_n$ of a quantitative random variable. The sample mean of the SRS is: $\bar{x} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$
 Slide 31
 Assume the population has mean $\mu$ and standard deviation $\sigma$. Then, if the observations are independent, the sample mean xbar has mean and standard deviation given as follows: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
 Slide 32
 The height in inches of a randomly chosen young woman is N(64.5, 2.5). What are the mean and standard deviation of the average height of 100 randomly chosen young women? Think in terms of stocks and portfolios. What will the normal distribution above do?
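 A minimal Python sketch of the height calculation, assuming the 100 heights are independent draws from N(64.5, 2.5), so that the average keeps the population mean and its standard deviation shrinks by a factor of sqrt(n):

```python
from math import sqrt

mu, sigma, n = 64.5, 2.5, 100

mean_of_average = mu            # the average is centered at the population mean
sd_of_average = sigma / sqrt(n)  # spread shrinks by sqrt(n)

print(f"mean of the average = {mean_of_average} inches")
print(f"standard deviation of the average = {sd_of_average} inches")
```

 Averaging 100 women reduces the spread tenfold, the same effect a portfolio has on the variability of individual stock returns.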
 Slide 33
 Slide 34
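 If the variable X in the population is $N(\mu, \sigma)$, then the sample mean xbar of n independent observations is $N(\mu, \sigma/\sqrt{n})$. Kicker: This is often a good approximation even if the original distribution is not normal. This is a HUGE result, called the Central Limit Theorem (or CLT). It says that if we start with ANY distribution that has a finite standard deviation, the sample mean will be approximately normally distributed when n is large.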
 Slide 35
 Take 100 randomly chosen young women and measure their height. What is the chance that the average height of these 100 women is between 64 and 65 inches?
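 This probability follows from standardizing the sample mean. A short Python sketch, assuming individual heights are N(64.5, 2.5) as stated earlier in the slides, so the average of 100 heights is Normal with mean 64.5 and standard deviation 2.5/sqrt(100):

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard Normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 64.5, 2.5, 100
sd_mean = sigma / sqrt(n)

z_low = (64 - mu) / sd_mean
z_high = (65 - mu) / sd_mean
p_between_64_and_65 = normal_cdf(z_high) - normal_cdf(z_low)

print(f"z from {z_low:.1f} to {z_high:.1f}: probability about {p_between_64_and_65:.4f}")
```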
 Slide 36
 The mean time for maintenance of an air conditioner is 60 minutes, with a standard deviation of 60 minutes. What is the probability that the average maintenance time of 70 air conditioners will exceed 50 minutes? Note, we didn't say the time for maintenance is normally distributed. In fact, it follows an exponential distribution.
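 Even though individual maintenance times are exponential, the CLT lets us treat the average of 70 of them as approximately Normal. The Python sketch below rests on that approximation and on the 70 times being independent; the result is an approximation, not an exact probability.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """P(Z > z) for a standard Normal Z."""
    return 0.5 * erfc(z / sqrt(2))

mu, sigma, n = 60, 60, 70

sd_mean = sigma / sqrt(n)        # standard deviation of the average time
z = (50 - mu) / sd_mean          # standardize the 50-minute cutoff
p_exceeds_50 = normal_upper_tail(z)

print(f"sd of the average = {sd_mean:.2f} minutes, z = {z:.2f}")
print(f"P(average > 50 minutes) is about {p_exceeds_50:.3f}")
```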
 Slide 37
 Slide 38
 Slide 39
 Slide 40
 If you know n, then the distributions for the sum and average are equivalent (if you know one, you know the other). So since xbar has an approximately normal distribution, sums are also approximately normally distributed! A count (think binomial) is just a sum! We are just adding up individual observations, so of course that is a sum and hence approximately normal for large n. Similarly, proportions function like averages, and are also approximately normally distributed. The CLT is the key.
 Slide 41
 This is our (familiar) approximate normality. Important assumptions: an SRS (Simple Random Sample); the population distribution of X has mean $\mu$ and standard deviation $\sigma$; last but not least, n needs to be large enough. Remember the air conditioning example. Generally, we say $n \ge 30$ is large enough. Warning: not all interesting distributions are normal. But the sample means are always roughly normal for large sample sizes.
 Slide 42
 Approximate normal distribution of the sample mean from an SRS. The CLT holds for ANY population distribution with a finite standard deviation. Also, if in fact the underlying population distribution is normal (in some cases it is), then the result is also exact, not an approximation. Use the CLT to evaluate probabilities regarding averages.
 Slide 43
 How do you tell the Xbar problems apart from section 1.3 X problems? Section 1.3 X problems have a sample size of 1 (n = 1). Section 5.2 Xbar problems have a sample size bigger than 1.
 Slide 44
 We flip cards from a stack of cards containing 10 normal decks of cards and count each time we flip an Ace as a success. What is the population proportion of success? If we flipped only 50 cards as a sample, and 4 aces were flipped, what is the sample proportion of success? If we took repeated samples of size n = 50, what are the mean and standard deviation of the sample proportion?
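 A possible working of the card problem in Python, assuming each normal deck has 52 cards with 4 aces and treating the 50 flips as approximately binomial (the stack of 520 cards is a little more than 10 times the sample size):

```python
from math import sqrt

decks, cards_per_deck, aces_per_deck = 10, 52, 4
n, aces_seen = 50, 4

# Population proportion of aces in the combined stack.
p_population = (decks * aces_per_deck) / (decks * cards_per_deck)

# Sample proportion from the one observed sample.
p_hat = aces_seen / n

# Sampling distribution of phat over repeated samples of size 50.
mean_p_hat = p_population
sd_p_hat = sqrt(p_population * (1 - p_population) / n)

print(f"population proportion = {p_population:.4f}")
print(f"sample proportion     = {p_hat:.2f}")
print(f"mean of phat = {mean_p_hat:.4f}, sd of phat = {sd_p_hat:.4f}")
print(f"np = {n * p_population:.2f}")
```

 Here np is below 10, so the mean and standard deviation are still valid but the Normal approximation would not be a safe choice for computing probabilities.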
 Slide 45
 Bob is playing in the club golf tournament. Bob's scores vary as he plays the course repeatedly and have a N(77,3) distribution. What is the probability that Bob will shoot a 74 or lower in the first round of the club tournament? What is the probability that Bob will average 74 or lower for the 4 rounds of the club tournament?
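 The two golf questions differ only in the standard deviation used: 3 for a single round (n = 1) and 3/sqrt(4) for the average of 4 rounds. A brief Python sketch, assuming the four rounds are independent N(77, 3) scores:

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard Normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 77, 3

# Single round: an X problem with n = 1.
z_single = (74 - mu) / sigma
p_single_round = normal_cdf(z_single)

# Average of 4 rounds: an xbar problem with n = 4.
n = 4
z_average = (74 - mu) / (sigma / sqrt(n))
p_four_round_average = normal_cdf(z_average)

print(f"one round:   z = {z_single:.1f}, probability about {p_single_round:.4f}")
print(f"four rounds: z = {z_average:.1f}, probability about {p_four_round_average:.4f}")
```

 Since the scores are themselves Normal, the result for the average is exact rather than a CLT approximation.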