Chapter 18 – Sampling Distribution Models

How accurate is our sample?

Sometimes different polls show different results for the same question.

Since each poll samples a different group of people, we should expect some variation in the results.

We could try drawing lots of samples and looking at the variation amongst those samples.

Experiment: Simulating a sample

A recent US Census Bureau study reports that about 30% of Americans 25 or older have a Bachelor’s degree (source: http://en.wikipedia.org/wiki/Educational_attainment_in_the_United_States).

Open a blank Minitab worksheet and generate some random data: Calc > Random Data > Bernoulli

Number of rows: 200

Store in columns: C1-C20

Event probability: .3

Proportion estimates for samples of size 5

We can treat each row as a sample and calculate the proportion of each sample using the mean.

Samples of size 5: Calc > Row Statistics > Mean

Input variables: C1-C5

Store result in: C21

Look at these sample proportions. Are they close to the population proportion of 30%?

Draw a histogram of the sample proportions in C21
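For readers without Minitab, the same experiment can be sketched in Python. This is only an illustration of the steps above, assuming NumPy is available; the seed and the variable names (`data`, `p_hat_5`) are arbitrary choices, not part of the original exercise.

```python
import numpy as np

# The seed is arbitrary; it only makes the run reproducible.
rng = np.random.default_rng(seed=18)

p = 0.30       # population proportion with a Bachelor's degree
n_rows = 200   # number of simulated samples (worksheet rows)
n_cols = 20    # worksheet columns C1-C20

# Each entry is 1 (has a degree) with probability p, otherwise 0.
data = rng.binomial(n=1, p=p, size=(n_rows, n_cols))

# The row mean over the first 5 columns is the sample proportion
# for a sample of size 5 (the Minitab column C21).
p_hat_5 = data[:, :5].mean(axis=1)

print("first ten sample proportions:", p_hat_5[:10])
print("mean of all sample proportions:", p_hat_5.mean())

# Text histogram: how many samples produced each possible proportion.
values, counts = np.unique(p_hat_5, return_counts=True)
for value, count in zip(values, counts):
    print(f"{value:.1f} {'*' * int(count)}")
```

With samples of size 5 the only possible proportions are 0, .2, .4, .6, .8 and 1, so individual samples can land far from 30% even though their average is close to it.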

Sampling Distribution Model for a Proportion

Our histogram of the sample proportions started to look like a Normal model

The larger our sample size gets, the better the Normal model works

Assumptions:

Independence: sampled values must be independent of each other

Sample Size: n must be large enough

Conditions to check for assumptions

Randomization Condition: Experiments should have treatments randomly assigned. Survey samples should be a simple random sample, or otherwise a representative, unbiased sample.

10% Condition: Sample size n must be no more than 10% of the population

Success/Failure Condition: Sample size needs to be large enough to expect at least 10 successes and 10 failures
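The two numeric conditions can be checked with a little arithmetic. The sketch below is one way to do it; the function name `check_proportion_conditions` and the population size of 1,000,000 are hypothetical, chosen only for illustration. The Randomization Condition depends on how the data were collected and cannot be checked by a calculation.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% and Success/Failure conditions for a sample proportion.

    n: sample size, p: population proportion,
    population_size: size of the population being sampled (assumed known).
    """
    q = 1 - p
    return {
        # 10% Condition: the sample is no more than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Success/Failure Condition: expect at least 10 of each outcome.
        "success_failure": n * p >= 10 and n * q >= 10,
    }


# Samples of size 5 with p = .3 expect only 1.5 successes, so the
# Success/Failure Condition fails; samples of size 50 expect 15 and 35.
small = check_proportion_conditions(n=5, p=0.30, population_size=1_000_000)
large = check_proportion_conditions(n=50, p=0.30, population_size=1_000_000)
print("n = 5: ", small)
print("n = 50:", large)
```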

Sampling Distribution Model for a Proportion

If the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with:

Mean: $\mu(\hat{p}) = p$

Standard Deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
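A quick simulation can show how closely the model's mean and standard deviation match simulated sample proportions. This is a sketch assuming NumPy; the sample size of 50, the 10,000 repetitions, and the seed are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary seed, used only so the run is reproducible.
rng = np.random.default_rng(seed=18)

p = 0.30            # population proportion
q = 1 - p
n = 50              # size of each sample (assumed for illustration)
n_samples = 10_000  # number of simulated samples

# rng.binomial counts the successes in n draws; dividing by n
# turns each count into a sample proportion.
p_hat = rng.binomial(n=n, p=p, size=n_samples) / n

theoretical_sd = np.sqrt(p * q / n)

print(f"mean of sample proportions: {p_hat.mean():.4f} (model: {p})")
print(f"SD of sample proportions:   {p_hat.std():.4f} (model: {theoretical_sd:.4f})")
```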

Example: Proportion of Vegetarians

7% of the US population is estimated to be vegetarian. If a random sample of 200 people resulted in 20 people reporting themselves as vegetarians, is this an unusually high proportion?

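Conditions:

Randomization: the 200 people were a random sample

10% Condition: 200 is far less than 10% of the US population

Success/Failure: np = 200(.07) = 14 and nq = 200(.93) = 186, both at least 10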

Vegetarians Example continued

Since our conditions were met, it’s ok to use a Normal model.

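$\hat{p} = 20/200 = .10$

$E(\hat{p}) = p = .07$

$SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.07)(.93)}{200}} \approx .018$

$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{.10 - .07}{.018} \approx 1.67$

This result is within 2 sd’s of the mean, so it is not unusual.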
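The vegetarian calculation can be reproduced in a few lines of Python. This is a minimal sketch using only the numbers given in the example; the variable names are illustrative.

```python
from math import sqrt

p = 0.07         # assumed population proportion of vegetarians
n = 200          # sample size
successes = 20   # people in the sample reporting as vegetarian

q = 1 - p
p_hat = successes / n    # observed sample proportion
sd = sqrt(p * q / n)     # standard deviation of the sampling model
z = (p_hat - p) / sd     # number of SDs the observation is from p

print(f"p_hat = {p_hat:.2f}")
print(f"SD(p_hat) = {sd:.4f}")
print(f"z = {z:.2f}")
print("unusual (more than 2 SDs from p)?", abs(z) > 2)
```

Using the unrounded standard deviation gives z of about 1.66; the 1.67 in the worked example comes from rounding the standard deviation to .018 first. Either way the sample proportion is within 2 standard deviations of p.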

68-95-99.7 Rule with Vegetarians

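[Figure: Normal model for the sample proportion, centered at p, with the axis marked at -3σ, -2σ, -1σ, p, 1σ, 2σ and 3σ. About 68% of samples fall within 1σ of p, 95% within 2σ, and 99.7% within 3σ.]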

Sampling Distribution of a Mean

Rolling dice simulation: 10,000 individual rolls recorded

Figure from DeVeaux, Intro to Stats


Rolling 2 dice 10,000 times and averaging the dice

Rolling 3 dice 10,000 times and averaging the dice

Rolling 5 dice 10,000 times and averaging the dice

Rolling 20 dice 10,000 times and averaging the dice

Once again, as sample size increases, Normal model appears
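The dice figures can be approximated with a short simulation. The sketch below assumes NumPy and uses an arbitrary seed; it reports the mean and standard deviation of the averages rather than drawing the histograms, and it adds a single-die row for comparison.

```python
import numpy as np

# Arbitrary seed, used only so the run is reproducible.
rng = np.random.default_rng(seed=18)

n_trials = 10_000   # number of times each set of dice is rolled
results = {}        # n_dice -> (mean of averages, SD of averages)

for n_dice in (1, 2, 3, 5, 20):
    # integers(1, 7) draws whole numbers from 1 to 6 inclusive.
    rolls = rng.integers(1, 7, size=(n_trials, n_dice))
    averages = rolls.mean(axis=1)   # one average per trial
    results[n_dice] = (averages.mean(), averages.std())
    print(f"{n_dice:2d} dice: mean of averages = {averages.mean():.3f}, "
          f"SD of averages = {averages.std():.3f}")
```

The averages stay centered near 3.5 while their spread shrinks as more dice are averaged, which is the narrowing visible in the figures.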


Central Limit Theorem

The sampling distribution of any mean becomes more nearly Normal as the sample size grows. The larger the sample, the better the approximation will be.

Observations need to be independent and collected with randomization.

CLT Assumptions

Assumptions:

Independence: sampled values must be independent

Sample Size: sample size must be large enough

Conditions:

Randomization

10% Condition

Large enough sample

Which Normal Model to use?

The Normal Model depends on a mean and sd

Sampling Distribution Model for a Mean

When a random sample is drawn from any population with mean µ and standard deviation σ, its sample mean $\bar{y}$ has a sampling distribution with:

Mean: µ

Standard Deviation: $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$

Example: CEO compensation

800 CEOs: Mean (in thousands) = 10,307.31, SD (in thousands) = 17,964.62

Samples of size 50 were drawn with: Mean = 10,343.93, SD = 2,483.84

Samples of size 100 were drawn with: Mean = 10,329.94, SD = 1,779.18

According to CLT, what should theoretical mean and sd be?
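One way to work this out is to apply the mean and σ/√n formulas to the population figures and set the results beside the simulated values. The sketch below treats the 800 CEOs as the population, as the example does; the variable names are illustrative.

```python
from math import sqrt

# Population figures from the example (in thousands).
mu = 10_307.31
sigma = 17_964.62

# (sample size, simulated mean, simulated SD) from the example.
simulated = [
    (50, 10_343.93, 2_483.84),
    (100, 10_329.94, 1_779.18),
]

theory = {}  # sample size -> theoretical SD of the sample mean
for n, sim_mean, sim_sd in simulated:
    theory[n] = sigma / sqrt(n)
    print(f"n = {n}: theoretical mean = {mu:,.2f} (simulated {sim_mean:,.2f}), "
          f"theoretical SD = {theory[n]:,.2f} (simulated {sim_sd:,.2f})")
```

The theoretical mean is the population mean for both sample sizes, and the theoretical standard deviations come out to roughly 2,540.58 for n = 50 and 1,796.46 for n = 100, close to the simulated values.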

Example from DeVeaux, Intro to Stats
