Chapter 18 – Sampling Distribution Models
How accurate is our sample?
Sometimes different polls show different results for the same question.
Since each poll samples a different group of people, we should expect some variation in the results.
We could try drawing lots of samples and looking at the variation amongst those samples.
Experiment: Simulating a sample
A recent US Census Bureau study (source) reports that about 30% of Americans aged 25 or older have a bachelor’s degree.
Open a blank Minitab worksheet and generate some random data:
Calc > Random Data > Bernoulli
Generate 200 rows
Store in columns: C1-C20
Event probability: 0.3
Proportion estimates for samples of size 5
We can treat each row as a sample. Because every value is 1 (success) or 0 (failure), the mean of a row is the proportion of successes in that sample.
Samples of size 5: Calc > Row Statistics > Mean
Input variables: C1-C5
Store result in: C21
Look at these sample proportions. Are they close to the population proportion of 30%?
Draw a histogram of the sample proportions in C21
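For readers without Minitab, the worksheet experiment can be sketched in Python with only the standard library. This is a minimal sketch, not Minitab's own algorithm: the helper names `bernoulli_worksheet` and `row_proportions` are hypothetical, and the fixed seed is an arbitrary choice so that runs repeat. A text histogram stands in for the graph.

```python
import random
from collections import Counter

P = 0.3               # population proportion with a bachelor's degree
ROWS, COLS = 200, 20  # like 200 rows stored in C1-C20

def bernoulli_worksheet(p, rows, cols, seed=None):
    """Fill a rows x cols grid with Bernoulli(p) values: 1 = success, 0 = failure."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(cols)] for _ in range(rows)]

def row_proportions(data, n):
    """Mean of the first n columns of each row.
    Each value is 0 or 1, so the mean is the sample proportion of successes."""
    return [sum(row[:n]) / n for row in data]

worksheet = bernoulli_worksheet(P, ROWS, COLS, seed=1)
p_hats = row_proportions(worksheet, 5)  # plays the role of column C21

# Text histogram: with n = 5 the only possible proportions are 0, 0.2, ..., 1.0
counts = Counter(p_hats)
for value in sorted(counts):
    print(f"{value:.1f} | {'*' * counts[value]}")
```

With only 5 observations per sample, the proportions can take just six values, which is why the histogram for samples of size 5 looks lumpy.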
Proportion estimates for samples of size 10
Samples of size 10: Calc > Row Statistics > Mean
Input variables: C1-C10
Store result in: C22
Look at these sample proportions. Are they close to the population proportion of 30%?
Draw a histogram of the sample proportions in C22
Proportion estimates for samples of size 20
Samples of size 20: Calc > Row Statistics > Mean
Input variables: C1-C20
Store result in: C23
Look at these sample proportions. Are they close to the population proportion of 30%?
Draw a histogram of the sample proportions in C23
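To see how the spread of the sample proportions changes with sample size, the sketch below (an illustration with an arbitrary seed, not Minitab output) computes the row means over the first 5, 10, and 20 columns and compares their standard deviation with $\sqrt{pq/n}$, where $q = 1 - p$.

```python
import math
import random
import statistics

P = 0.3
rng = random.Random(2)  # arbitrary fixed seed
worksheet = [[1 if rng.random() < P else 0 for _ in range(20)] for _ in range(200)]

results = {}
for n in (5, 10, 20):  # the C21, C22, C23 steps
    p_hats = [sum(row[:n]) / n for row in worksheet]
    observed_sd = statistics.stdev(p_hats)
    predicted_sd = math.sqrt(P * (1 - P) / n)
    results[n] = (statistics.mean(p_hats), observed_sd, predicted_sd)
    print(f"n={n:2d}  mean={results[n][0]:.3f}  "
          f"SD={observed_sd:.3f}  predicted={predicted_sd:.3f}")
```

All three sets of proportions center near 0.3, while the spread shrinks as n grows.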
Sampling Distribution Model for a Proportion
As the sample size grew, the histograms of the sample proportions started to look more like a Normal model.
The larger the sample size, the better the Normal model works.
Assumptions:
Independence: sampled values must be independent of each other.
Sample Size: n must be large enough.
Conditions to check for assumptions
Randomization Condition: in an experiment, treatments should be randomly assigned; a survey sample should be a simple random sample or, otherwise, a representative, unbiased sample.
10% Condition: the sample size n must be no more than 10% of the population.
Success/Failure Condition: the sample must be large enough that we expect at least 10 successes and 10 failures, that is, $np \ge 10$ and $nq \ge 10$.
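The three conditions can be written as a small checklist. This is a hypothetical helper (`check_proportion_conditions` is not from any library). The Randomization Condition depends on how the data were collected, so it is passed in rather than computed, and the US population of roughly 330 million in the usage lines is an approximate figure supplied for illustration.

```python
def check_proportion_conditions(n, p, population_size, random_sample):
    """Hypothetical checklist for using a Normal model for a sample proportion.

    random_sample: True if the data come from a random sample or randomized
    experiment; this cannot be verified from the numbers alone.
    """
    q = 1 - p
    return {
        "randomization": random_sample,
        "ten_percent": n <= 0.10 * population_size,
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Usage: the vegetarian example (US population of roughly 330 million is an approximation)
veg = check_proportion_conditions(n=200, p=0.07, population_size=330_000_000, random_sample=True)
print(veg)

# A much smaller sample fails Success/Failure: 20 * 0.07 = 1.4 expected successes
small = check_proportion_conditions(n=20, p=0.07, population_size=330_000_000, random_sample=True)
print(small)
```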
Sampling Distribution Model for a Proportion
If the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with:
Mean: $\mu(\hat{p}) = p$
Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
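The two parameters of the model, mean $p$ and standard deviation $\sqrt{pq/n}$, translate directly into code. A minimal sketch; the helper name `proportion_sampling_model` is hypothetical, and it assumes the conditions have already been checked.

```python
import math

def proportion_sampling_model(p, n):
    """Return (mean, SD) of the Normal model for the sample proportion p-hat.

    Assumes the independence and sample-size conditions already hold.
    """
    q = 1 - p
    return p, math.sqrt(p * q / n)

mean_20, sd_20 = proportion_sampling_model(0.3, 20)
mean_80, sd_80 = proportion_sampling_model(0.3, 80)
print(f"n=20: mean={mean_20}, SD={sd_20:.4f}")
print(f"n=80: mean={mean_80}, SD={sd_80:.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the SD.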
Example: Proportion of Vegetarians
7% of the US population is estimated to be vegetarian. If a random sample of 200 people resulted in 20 people reporting themselves as vegetarians, is this an unusually high proportion?
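Conditions:
Randomization: the people were randomly sampled.
10% Condition: 200 people is far less than 10% of the US population.
Success/Failure: $np = 200(0.07) = 14 \ge 10$ and $nq = 200(0.93) = 186 \ge 10$.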
Vegetarians Example continued
Since our conditions were met, it’s ok to use a Normal model.
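$\hat{p} = 20/200 = 0.10$
$E(\hat{p}) = p = 0.07$
$SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.07)(0.93)}{200}} \approx 0.018$
$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{0.10 - 0.07}{0.018} \approx 1.67$
This result is within 2 standard deviations of the mean, so it is not unusual.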
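The vegetarian calculation can be reproduced in a few lines. This sketch keeps the SD unrounded, so its z-score comes out near 1.66 rather than the 1.67 obtained from the rounded SD of 0.018; the conclusion is the same. The "more than 2 SDs is unusual" cutoff follows the slide's reasoning.

```python
import math

n, vegetarians, p = 200, 20, 0.07
p_hat = vegetarians / n               # observed sample proportion
sd = math.sqrt(p * (1 - p) / n)       # SD(p-hat) = sqrt(pq/n)
z = (p_hat - p) / sd                  # standardized distance from E(p-hat) = p
verdict = "unusual" if abs(z) > 2 else "not unusual"
print(f"p_hat={p_hat:.2f}  SD={sd:.4f}  z={z:.2f}  -> {verdict}")
```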
68-95-99.7 Rule with Vegetarians
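Figure: Normal model for $\hat{p}$ centered at $p = 0.07$ with $\sigma = SD(\hat{p}) \approx 0.018$, marked at ±1σ, ±2σ, and ±3σ; about 68%, 95%, and 99.7% of sample proportions fall within 1, 2, and 3 standard deviations of p.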
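The 68-95-99.7 percentages come from the Normal model itself and can be recovered with the error function, $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$. The sketch also lists the ±kσ intervals for the vegetarian model; the helper name `fraction_within` is hypothetical.

```python
import math

p, n = 0.07, 200
sd = math.sqrt(p * (1 - p) / n)   # about 0.018

def fraction_within(k):
    """Share of a Normal model lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    low, high = p - k * sd, p + k * sd
    print(f"within {k} SD ({low:.3f} to {high:.3f}): {fraction_within(k):.1%}")

# The observed 0.10 lies inside the 2-SD interval
inside_two_sd = p - 2 * sd <= 0.10 <= p + 2 * sd
```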
Sampling Distribution of a Mean
Dice-rolling simulation: 10,000 rolls of a single die recorded
Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean
Rolling 2 dice 10,000 times and averaging them
Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean
Rolling 3 dice 10,000 times and averaging
Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean
Rolling 5 dice 10,000 times and averaging
Figure from DeVeaux, Intro to Stats
Sampling Distribution of a Mean
Rolling 20 dice 10,000 times and averaging
Once again, as the sample size increases, a Normal model appears.
Figure from DeVeaux, Intro to Stats
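The dice experiment can be approximated with a short simulation. It is a sketch under assumptions: fair six-sided dice, Python's `random` module with an arbitrary seed, and a printed summary in place of histograms. The SD of a single fair die is $\sqrt{35/12} \approx 1.71$, so the SD of the averages should track $1.71/\sqrt{k}$ for k dice.

```python
import math
import random
import statistics

rng = random.Random(18)        # arbitrary fixed seed for a repeatable run
DIE_SD = math.sqrt(35 / 12)    # SD of a single fair die, about 1.71

def dice_averages(k, trials=10_000):
    """Roll k fair dice `trials` times; return the average of each set of k dice."""
    return [sum(rng.randint(1, 6) for _ in range(k)) / k for _ in range(trials)]

summary = {}
for k in (1, 2, 3, 5, 20):
    avgs = dice_averages(k)
    summary[k] = (statistics.mean(avgs), statistics.stdev(avgs))
    print(f"{k:2d} dice: mean={summary[k][0]:.3f}  SD={summary[k][1]:.3f}  "
          f"sigma/sqrt(k)={DIE_SD / math.sqrt(k):.3f}")
```

The center stays near 3.5 for every k, while the spread narrows as more dice are averaged.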
Central Limit Theorem
The sampling distribution of any mean becomes more nearly Normal as the sample size grows. The larger the sample, the better the approximation will be.
Observations need to be independent and collected with randomization.
CLT Assumptions
Assumptions:
Independence: sampled values must be independent.
Sample Size: the sample size must be large enough.
Conditions:
Randomization Condition
10% Condition
Large Enough Sample Condition
Which Normal model to use?
A Normal model is determined by its mean and standard deviation.
Sampling Distribution Model for a Mean
When a random sample is drawn from any population with mean $\mu$ and standard deviation $\sigma$, its sample mean $\bar{y}$ has a sampling distribution with:
Mean: $\mu(\bar{y}) = \mu$
Standard deviation: $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
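The formula $SD(\bar{y}) = \sigma/\sqrt{n}$ holds whatever the shape of the population. As an illustration (the population choice is an assumption, not from the text), the sketch draws samples from a strongly right-skewed exponential population with $\mu = \sigma = 1$, using an arbitrary seed.

```python
import math
import random
import statistics

rng = random.Random(92)   # arbitrary fixed seed
SIGMA = 1.0               # an exponential population with rate 1 has mean 1 and SD 1

def sample_means(n, reps=5_000):
    """Draw `reps` samples of size n from the skewed population; return each sample mean."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

clt = {}
for n in (5, 50):
    means = sample_means(n)
    clt[n] = (statistics.mean(means), statistics.stdev(means), SIGMA / math.sqrt(n))
    print(f"n={n:2d}: mean of y-bar={clt[n][0]:.3f}  SD of y-bar={clt[n][1]:.3f}  "
          f"sigma/sqrt(n)={clt[n][2]:.3f}")
```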
Example: CEO compensation
Population: 800 CEOs, mean = 10,307.31 and SD = 17,964.62 (in thousands)
Samples of size 50: the sample means had mean = 10,343.93 and SD = 2,483.84
Samples of size 100: the sample means had mean = 10,329.94 and SD = 1,779.18
According to the CLT, what should the theoretical mean and SD of the sample means be?
Example from DeVeaux, Intro to Stats
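Plugging the CEO population values into $\mu$ and $\sigma/\sqrt{n}$ gives the theoretical answers to compare with the simulated results for samples of size 50 and 100. A minimal sketch; the variable names are illustrative.

```python
import math

POP_MEAN = 10_307.31   # mean of all 800 CEOs (in thousands)
POP_SD = 17_964.62     # SD of all 800 CEOs (in thousands)
simulated = {50: (10_343.93, 2_483.84), 100: (10_329.94, 1_779.18)}

theory = {}
for n, (sim_mean, sim_sd) in simulated.items():
    theory[n] = (POP_MEAN, POP_SD / math.sqrt(n))   # CLT: mean mu, SD sigma/sqrt(n)
    print(f"n={n:3d}: theory mean={theory[n][0]:,.2f} SD={theory[n][1]:,.2f} | "
          f"simulated mean={sim_mean:,.2f} SD={sim_sd:,.2f}")
```

The theoretical SDs come out near 2,540.58 for n = 50 and 1,796.46 for n = 100, close to the simulated values.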