Sampling Distribution of a Sample Proportion Lecture 26 Sections 8.1 – 8.2 Wed, Mar 8, 2006.
Preview of the Central Limit Theorem
We looked at the distribution of the sum of 1, 2, and 3 uniform random variables U(0, 1).
We saw that the shapes of their distributions were moving towards the shape of the normal distribution.
If we replace "sum" with "average," we will obtain the same phenomenon, but on the scale from 0 to 1 each time.
Preview of the Central Limit Theorem
[Three figures: the distribution of the average of 1, 2, and 3 U(0, 1) random variables, each drawn on the horizontal scale from 0 to 1.]
Preview of the Central Limit Theorem
Some observations:
Each distribution is centered at the same place, ½.
The distributions are being "drawn in" towards the center.
That means that their standard deviation is decreasing.
Can we quantify this?
Preview of the Central Limit Theorem
[Figure: distribution of one U(0, 1) variable.] μ = ½, σ² = 1/12
[Figure: distribution of the average of two U(0, 1) variables.] μ = ½, σ² = 1/24
[Figure: distribution of the average of three U(0, 1) variables.] μ = ½, σ² = 1/36
Preview of the Central Limit Theorem
This tells us that a mean based on three observations is much more likely to be close to the population mean than is a mean based on only one or two observations.
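
The variances 1/12, 1/24, and 1/36 are consistent with a standard fact: one U(0, 1) variable has variance 1/12, and averaging n independent copies divides the variance by n. The following is a minimal Python sketch of that pattern, checked against a simulation. The function names are our own and are not part of the lecture.

```python
import random
from fractions import Fraction


def variance_of_uniform_average(n):
    """Exact variance of the average of n independent U(0, 1) variables."""
    # Var(U) = 1/12, and averaging n independent copies divides it by n.
    return Fraction(1, 12 * n)


def simulated_variance(n, trials=50_000, seed=0):
    """Estimate the same variance by simulating many averages."""
    rng = random.Random(seed)
    averages = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    center = sum(averages) / trials
    return sum((a - center) ** 2 for a in averages) / trials


for n in (1, 2, 3):
    print(n, variance_of_uniform_average(n), round(simulated_variance(n), 4))
```

The simulated values should land close to the exact fractions, though they will differ slightly from run to run if the seed is changed.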
Parameters and Statistics
THE PURPOSE OF A STATISTIC IS TO ESTIMATE A POPULATION PARAMETER.
A sample mean is used to estimate the population mean.
A sample proportion is used to estimate the population proportion.
Sample statistics, by their very nature, are variable.
Population parameters are fixed.
Some Questions
We hope that the sample proportion is close to the population proportion.
How close can we expect it to be?
Would it be worth it to collect a larger sample?
If the sample were larger, would we expect the sample proportion to be closer to the population proportion?
How much closer?
The Sampling Distribution of a Statistic
Sampling Distribution of a Statistic: The distribution of values of the statistic over all possible samples of size n from that population.
The Sample Proportion
Let p be the population proportion.
Then p is a fixed value (for a given population).
Let p̂ ("p-hat") be the sample proportion.
Then p̂ is a random variable; it takes on a new value every time a sample is collected.
The sampling distribution of p̂ is the probability distribution of all the possible values of p̂.
Example
Suppose that this class is 3/4 freshmen.
Suppose that we take a sample of 2 students, selected with replacement.
Find the sampling distribution of p̂.
Example
[Tree diagram: each of the two selections is a freshman (F) with probability 3/4 or a non-freshman (N) with probability 1/4.]
P(FF) = 9/16
P(FN) = 3/16
P(NF) = 3/16
P(NN) = 1/16
Example
Let X be the number of freshmen in the sample.
The probability distribution of X is
x    P(x)
0    1/16
1    6/16
2    9/16
Example
Let p̂ be the proportion of freshmen in the sample. (p̂ = X/n.)
The sampling distribution of p̂ is
x      P(p̂ = x)
0      1/16
1/2    6/16
1      9/16
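
The example can be reproduced by listing every ordered sample, multiplying the probabilities along each branch of the tree, and grouping the samples by their proportion of freshmen. The sketch below does this with exact fractions. The function name and the "F"/"N" labels are illustrative choices of ours, and the 3/4 comes from the example above.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def sampling_distribution_by_enumeration(p, n):
    """Sampling distribution of p-hat, found by listing every ordered sample."""
    outcomes = {"F": p, "N": 1 - p}  # freshman / non-freshman
    dist = defaultdict(Fraction)     # missing keys start at Fraction(0)
    for sample in product(outcomes, repeat=n):
        prob = Fraction(1)
        for person in sample:
            prob *= outcomes[person]  # independent draws (with replacement)
        p_hat = Fraction(sample.count("F"), n)
        dist[p_hat] += prob
    return dict(sorted(dist.items()))


dist = sampling_distribution_by_enumeration(Fraction(3, 4), 2)
for p_hat, prob in dist.items():
    print(p_hat, prob)
```

Note that `Fraction` reduces to lowest terms, so 6/16 is printed as 3/8.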
Samples of Size n = 3
If we sample 3 people (with replacement) from a population that is 3/4 freshmen, then the proportion of freshmen in the sample has the following distribution.
x      P(p̂ = x)
0      1/64 = .02
1/3    9/64 = .14
2/3    27/64 = .42
1      27/64 = .42
Samples of Size n = 4
If we sample 4 people (with replacement) from a population that is 3/4 freshmen, then the proportion of freshmen in the sample has the following distribution.
x      P(p̂ = x)
0      1/256 = .004
1/4    12/256 = .05
2/4    54/256 = .21
3/4    108/256 = .42
1      81/256 = .32
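
Because the draws are independent and made with replacement, the number of freshmen in the sample is binomial, so these tables can also be produced from the binomial formula instead of a tree. The slides do not name the binomial distribution at this point, so treat the following as a sketch under that assumption. The function name is ours.

```python
from fractions import Fraction
from math import comb


def sampling_distribution(p, n):
    """P(p-hat = k/n) for k = 0..n, using the binomial formula."""
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }


p = Fraction(3, 4)
for n in (3, 4):
    print(f"n = {n}")
    for p_hat, prob in sampling_distribution(p, n).items():
        print(f"  {p_hat}: {prob} = {float(prob):.3f}")
```

Fractions are shown in lowest terms, so 2/4 appears as 1/2 and 12/256 as 3/64.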
The Parameters of the Sampling Distributions
When n = 1, the sampling distribution is
p̂    P(p̂)
0    1/4
1    3/4
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/16 = 0.1875.
The Parameters of the Sampling Distributions
When n = 2, the sampling distribution is
p̂      P(p̂)
0      1/16
1/2    6/16
1      9/16
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/32 = 0.09375.
The Parameters of the Sampling Distributions
When n = 3, the sampling distribution is
p̂      P(p̂)
0      1/64 = .02
1/3    9/64 = .14
2/3    27/64 = .42
1      27/64 = .42
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/48 = 0.0625.
The Parameters of the Sampling Distributions
When n = 4, the sampling distribution is
p̂      P(p̂)
0      1/256 = .004
1/4    12/256 = .05
2/4    54/256 = .21
3/4    108/256 = .42
1      81/256 = .32
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/64 = 0.046875.
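
The means and variances for n = 1 through 4 can be checked directly from the tables: the mean is the probability-weighted sum of the values of p̂, and the variance is the probability-weighted sum of squared deviations from that mean. A short sketch, assuming the distribution is generated by the binomial formula with p = 3/4 (the function name is ours):

```python
from fractions import Fraction
from math import comb


def mean_and_variance(p, n):
    """Mean and variance of p-hat, computed directly from its distribution."""
    dist = {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }
    mean = sum(x * prob for x, prob in dist.items())
    variance = sum((x - mean) ** 2 * prob for x, prob in dist.items())
    return mean, variance


p = Fraction(3, 4)
for n in (1, 2, 3, 4):
    mean, variance = mean_and_variance(p, n)
    # Compare the variance with p(1 - p)/n = (3/16)/n.
    print(n, mean, variance, variance == p * (1 - p) / n)
```

In each case the mean stays at 3/4 while the variance is 3/16 divided by n, which is the pattern the slides go on to state in general.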
Sampling Distributions
Run the program Central Limit Theorem for Proportions.exe.
Use n = 30 and p = 0.75; generate 100 samples.
100 Samples of Size n = 30
μ = 0.75
σ = 0.079
Observations and Conclusions
Observation #1: The values of p̂ are clustered around p.
Conclusion #1: p̂ is probably close to p.
Larger Sample Size
Now we will select 100 samples of size 120 instead of size 30.
Run the program Central Limit Theorem for Proportions.exe.
Pay attention to the spread (standard deviation) of the distribution.
100 Samples of Size n = 120
μ = 0.75
σ = 0.0395
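
For readers without the classroom program, the experiment can be approximated in a few lines of Python. This is only a stand-in: the transcript does not describe how Central Limit Theorem for Proportions.exe works internally, so the function below, its name, and its seed parameter are our own assumptions. It draws each sample member as a success with probability p and records the sample proportion.

```python
import random
from statistics import mean, pstdev


def simulate_p_hats(n, p, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


for n in (30, 120):
    p_hats = simulate_p_hats(n, p=0.75, num_samples=100)
    print(n, round(mean(p_hats), 4), round(pstdev(p_hats), 4))
```

With only 100 samples, the simulated mean and standard deviation will wander around 0.75 and the values of σ quoted on the slides rather than match them exactly. The spread for n = 120 should come out at roughly half the spread for n = 30.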
Observations and Conclusions
Observation #2: As the sample size increases, the clustering is tighter.
Conclusion #2A: Larger samples give more reliable estimates.
Conclusion #2B: For sample sizes that are large enough, we can make very good estimates of the value of p.
Larger Sample Size
Now we will select 10,000 samples of size 120 instead of only 100 samples.
Run the program Central Limit Theorem for Proportions.exe.
Pay attention to the shape of the distribution.
10,000 Samples of Size n = 120
μ = 0.75
σ = 0.0395
10,000 Samples of Size n = 126
More Observations and Conclusions
Observation #3: The distribution of p̂ appears to be approximately normal.
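
One way to probe Observation #3 without a histogram is to compare exact probabilities with the familiar normal benchmarks: a normal variable falls within 1, 2, and 3 standard deviations of its mean about 68%, 95%, and 99.7% of the time. The sketch below computes the corresponding exact probabilities for p̂ with n = 120 and p = 0.75, assuming a binomial count. The function name is ours.

```python
from math import comb, sqrt


def prob_within_k_sd(n, p, k):
    """Exact P(|p-hat - p| <= k * sigma) from the binomial distribution."""
    sigma = sqrt(p * (1 - p) / n)
    total = 0.0
    for x in range(n + 1):
        if abs(x / n - p) <= k * sigma:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(120, 0.75, k), 4))
```

Because p̂ takes only the values 0, 1/120, 2/120, and so on, the exact figures will not equal the normal benchmarks, but they should be close.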
One More Conclusion
Conclusion #3: We can use the normal distribution to calculate just how close to p we can expect p̂ to be.
However, we must know the values of μ and σ for the distribution of p̂.
That is, we have to quantify the sampling distribution of p̂.
The Sampling Distribution of p̂
It turns out that the sampling distribution of p̂ is approximately normal with the following parameters.
Mean of p̂: $\mu_{\hat{p}} = p$
Variance of p̂: $\sigma^2_{\hat{p}} = \frac{p(1 - p)}{n}$
Standard deviation of p̂: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
This is the Central Limit Theorem for Proportions, summarized on page 519.
The Sampling Distribution of p̂
The approximation to the normal distribution is excellent if $np \geq 5$ and $n(1 - p) \geq 5$.
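
The parameters and the rule of thumb translate directly into code. The helper names below are hypothetical. The threshold of 5 is the one given on the slide.

```python
from math import sqrt


def p_hat_parameters(p, n):
    """Mean, variance, and standard deviation of p-hat."""
    variance = p * (1 - p) / n
    return p, variance, sqrt(variance)


def normal_approximation_ok(p, n, threshold=5):
    """Rule of thumb from the slide: np >= 5 and n(1 - p) >= 5."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(p_hat_parameters(0.75, 30))
print(normal_approximation_ok(0.75, 30))  # np = 22.5 and n(1 - p) = 7.5
print(normal_approximation_ok(0.75, 4))   # np = 3 and n(1 - p) = 1
```

For p = 0.75, the standard deviation comes out near 0.079 when n = 30 and near 0.0395 when n = 120, matching the values shown for the simulated samples.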
Why Surveys Work
Suppose 51% of the population plan to vote for candidate X, i.e., p = 0.51.
What is the probability that an exit survey of 1000 people would show candidate X with less than 45% support, i.e., p̂ < .45?
Why Surveys Work
First, describe the sampling distribution of p̂ if the sample size is n = 1000 and p = 0.51.
Check: np = 510 ≥ 5 and n(1 – p) = 490 ≥ 5.
p̂ is approximately normal.
$\mu_{\hat{p}} = 0.51$
$\sigma_{\hat{p}} = \sqrt{\frac{(0.51)(0.49)}{1000}} = 0.01581$
Why Surveys Work
The z-score of 0.45 is z = (0.45 – 0.51)/0.01581 = -3.795.
P(p̂ < 0.45) = P(Z < -3.795) = 0.00007385 (not likely!)
Or use normalcdf(-E99, 0.45, 0.51, 0.01581).
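
The same calculation can be done outside the calculator. The sketch below uses `NormalDist` from Python's standard `statistics` module as a substitute for the calculator's normalcdf. It carries the unrounded standard deviation, so the result may differ from the slide's figure in the last digit or two.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.51, 1000
sigma = sqrt(p * (1 - p) / n)   # about 0.01581
z = (0.45 - p) / sigma          # about -3.795
prob = NormalDist(mu=p, sigma=sigma).cdf(0.45)
print(round(sigma, 5), round(z, 3), prob)
```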
Why Surveys Work
Perform the same calculation, but with a smaller sample size, say n = 50.
The probability turns out to be 0.1980, nearly a 20% chance.
By symmetry, there is also a 20% chance that the sample proportion is greater than 57%.
Thus, there is a 40% chance that the sample proportion is off by at least 6 percentage points.
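
To see how strongly the sample size drives this result, the two-sided probability can be wrapped in a small function and evaluated for both n = 50 and n = 1000. This is a sketch under the normal approximation used in the slides. The function name and the `margin` parameter are ours.

```python
from math import sqrt
from statistics import NormalDist


def prob_off_by_at_least(p, n, margin):
    """Normal-approximation P(|p-hat - p| >= margin)."""
    sigma = sqrt(p * (1 - p) / n)
    lower_tail = NormalDist(mu=p, sigma=sigma).cdf(p - margin)
    return 2 * lower_tail  # the normal curve is symmetric about p


for n in (50, 1000):
    print(n, round(prob_off_by_at_least(0.51, n, 0.06), 5))
```

For n = 50 the result is close to the 40% quoted above, while for n = 1000 it is only about twice the tiny one-tail probability computed earlier.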