Chapter 9

Estimating a Population Proportion

Created by Kathy Fritz

Selecting an Estimator

What makes a statistic a good estimator of a population characteristic?

1. Choose a statistic that is unbiased

[Figure: sampling distributions compared with the true value. Unbiased, since the distribution is centered at the true value. Biased, since the distribution is NOT centered at the true value.]

A statistic with a sampling distribution that is centered at the actual value of the population characteristic is an unbiased estimator of that population characteristic.

In other words, a statistic that does not consistently tend to underestimate or to overestimate the value of a population characteristic is an unbiased estimator of that characteristic.

1. Choose a statistic that is unbiased

2. Choose a statistic with a small standard error

[Figure: two unbiased sampling distributions. One is unbiased, but has a larger standard error so it is not as precise. The other is unbiased, but has a smaller standard error so it is more precise.]

What makes a statistic a good estimator of a population characteristic?

A statistic that is unbiased and has a small standard error is likely to result in an estimate that is close to the actual value of that population characteristic.

If a sampling distribution is centered very close to the actual value of the population characteristic, a small standard error ensures that values of the statistic will cluster tightly around the actual value of the population characteristic.

The standard deviation of a sampling distribution is called the standard error.

In a review of ALL criminal cases heard by the Supreme Courts of 11 states from 2000 to 2004, 391 of the 1488 cases were decided in favor of the defendant. Let p be the proportion of all cases reviewed that were decided in favor of the defendant.

Suppose that the proportion p = 0.263 was not known. To estimate this proportion, you plan to select a sample and compute $\hat{p}$, the sample proportion of cases that were decided in favor of the defendant.

If n = 25, then the standard error of $\hat{p}$ is

standard error of $\hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{25}} = 0.088$

Supreme Court Cases Continued . . .

Let p be the proportion of all cases reviewed that were decided in favor of the defendant.

How does the sample size affect the standard error of $\hat{p}$?

If n = 100, then

standard error of $\hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.263(1-0.263)}{100}} = 0.044$

Supreme Court Cases Continued . . .

If n = 25 and p = 0.263, then the standard error of $\hat{p}$ is 0.088.

Suppose that p = 0.40. How does this affect the standard error of $\hat{p}$?

standard error of $\hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.40(1-0.40)}{25}} = 0.098$

Supreme Court Cases Continued . . .

If n = 25 and p = 0.263, then the standard error of $\hat{p}$ is 0.088.

Suppose that p = 0.04. How does this affect the standard error of $\hat{p}$?

standard error of $\hat{p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.04(1-0.04)}{25}} = 0.039$

Does it surprise you that $\hat{p}$ tends to produce more precise estimates the farther the population proportion is from 0.5?

For a fixed sample size, the standard error of $\hat{p}$ is greatest when p = 0.5.
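The standard error comparisons on these slides can be reproduced with a few lines of Python. This is only a sketch: the function name `standard_error` is chosen here for illustration, and the value p = 0.5 is added to show where the standard error peaks.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# (p, n) pairs from the Supreme Court slides, plus p = 0.5 for comparison.
for p, n in [(0.263, 25), (0.263, 100), (0.40, 25), (0.04, 25), (0.5, 25)]:
    print(f"p = {p}, n = {n}: standard error = {standard_error(p, n):.3f}")
```

Increasing n shrinks the standard error, and for a fixed n the value grows as p moves toward 0.5.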


Estimating a Population Proportion

Margin of Error

The value of the sample proportion $\hat{p}$ provides an estimate of the population proportion p.

Let p = 0.484

If a sample produces a value of $\hat{p}$ that differs from p by 0.058, then the estimate is “off” by 0.058. This difference represents the error in the estimate.

A different sample might produce a value of $\hat{p}$ that differs from p by only 0.014, resulting in an estimation error of 0.014.

The margin of error of a statistic is the maximum likely estimation error.

It is unusual for an estimate to differ from the actual value of the population characteristic by more than the margin of error.

Notice that different samples will produce different estimates that will have different estimation errors.

Recall the General Properties for Sampling Distributions of $\hat{p}$

1. The mean of the $\hat{p}$ sampling distribution is p.

2. The standard error (deviation) of the $\hat{p}$ sampling distribution is $\sqrt{\frac{p(1-p)}{n}}$.

3. If n is large, the $\hat{p}$ sampling distribution is approximately normal.

When these properties hold, we can use what we know about normal distributions to tell us about how $\hat{p}$ behaves as an estimator of p.

If a variable has a standard normal distribution, about 95% of the time the value of the variable will be between -1.96 and 1.96.

[Figure: standard normal curve centered at 0. Central area = 0.95 between -1.96 and 1.96; lower tail area = .025; upper tail area = .025.]

If n is large, the $\hat{p}$ sampling distribution is approximately normal with mean p and standard error $\sqrt{\frac{p(1-p)}{n}}$.

[Figure: normal curve centered at p. Central area = 0.95 within 1.96 standard errors of p; lower tail area = .025; upper tail area = .025.]

For any normal distribution, about 95% of the observed values will be within 1.96 standard deviations of the mean.

About 95% of the possible $\hat{p}$ values will fall within $1.96\sqrt{\frac{p(1-p)}{n}}$ of the population proportion p.

This is the margin of error for estimating a population proportion.
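One way to see the "about 95%" claim is a small simulation. The sketch below draws many random samples from a population with a known p and counts how often $\hat{p}$ lands within $1.96\sqrt{p(1-p)/n}$ of p. The sample size, number of samples, and seed are arbitrary choices made here for illustration; p = 0.484 is borrowed from the margin of error slide.

```python
import math
import random


def fraction_within_margin(p: float, n: int, num_samples: int, seed: int = 1) -> float:
    """Fraction of simulated sample proportions within 1.96 standard errors of p."""
    rng = random.Random(seed)
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    within = 0
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        if abs(p_hat - p) <= margin:
            within += 1
    return within / num_samples


print(fraction_within_margin(p=0.484, n=400, num_samples=2000))
```

The printed fraction should be close to 0.95, though it will not be exactly 0.95 because the normal distribution is only an approximation and the simulation is random.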

Margin of Error for Estimating a Population Proportion p

Appropriate when the following conditions are met

1. The sample is a random sample from the population of interest OR the sample is selected in a way that makes it reasonable to think the sample is representative of the population.

2. The sample size is large enough. This condition is met when either both $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ OR (equivalently) the sample includes at least 10 successes and at least 10 failures.

Margin of Error for Estimating a Population Proportion p Continued . . .

When these conditions are met

margin of error = $1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Interpretation of margin of error

It would be unusual for the sample proportion to differ from the actual value of the population proportion by more than the margin of error.

For 95% of all random samples, the estimation error will be less than the margin of error.

The formula given for the margin of error is actually the estimated margin of error, but it is common to refer to it without the “estimated”.

Any time a margin of error is reported, it is an estimated margin of error.

Based on a representative sample of 511 U.S. teenagers ages 12 to 17, International Communications Research estimated that the proportion of teens who support keeping the legal drinking age at 21 is $\hat{p} = 0.64$ with a margin of error of 0.04. Let’s see how this margin of error was computed.

Check conditions:

1. It is given that the sample was representative of the population.

2. The sample size is large enough because $n\hat{p} = 511(0.64) \approx 327 \ge 10$ and $n(1-\hat{p}) = 511(0.36) \approx 184 \ge 10$.

Legal Drinking Age Continued . . .

$\hat{p} = 0.64$ with a margin of error of 0.04

Compute margin of error

margin of error = $1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96\sqrt{\frac{0.64(1-0.64)}{511}} \approx 0.04$

Interpretation

An estimate of the proportion of U.S. teens who favor keeping the legal drinking age at 21 is 0.64. It is unlikely that this estimate differs from the actual population proportion by more than 0.04.
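The condition check and the margin of error calculation for the drinking age example can be sketched in Python as follows. The function names are illustrative choices, not part of the original slides; the formula is the 95% margin of error $1.96\sqrt{\hat{p}(1-\hat{p})/n}$.

```python
import math


def sample_size_condition_met(p_hat: float, n: int) -> bool:
    """Large-sample check: n * p_hat >= 10 and n * (1 - p_hat) >= 10."""
    return n * p_hat >= 10 and n * (1 - p_hat) >= 10


def margin_of_error(p_hat: float, n: int) -> float:
    """Estimated 95% margin of error for a sample proportion."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)


n, p_hat = 511, 0.64  # values from the legal drinking age example
print(sample_size_condition_met(p_hat, n))
print(round(margin_of_error(p_hat, n), 3))
```

The computed value is about 0.042, which rounds to the reported margin of error of 0.04.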

A Large Sample Confidence Interval for a Population Proportion

Confidence Interval

Confidence Level

Developing a Confidence Interval

[Figure: approximate sampling distribution of $\hat{p}$, centered at p. One vertical line at $p - 1.96\sqrt{\frac{p(1-p)}{n}}$ represents 1.96 standard deviations below the mean. Another vertical line at $p + 1.96\sqrt{\frac{p(1-p)}{n}}$ represents 1.96 standard deviations above the mean.]

Notice that the distance from p to each of these lines equals $1.96\sqrt{\frac{p(1-p)}{n}}$. We will use this to create an interval of values to estimate p.

Suppose we get this $\hat{p}$. Notice that the length of each half of the interval equals $1.96\sqrt{\frac{p(1-p)}{n}}$.

This $\hat{p}$ fell within 1.96 standard deviations of the value of p AND its interval “captures” p.

This $\hat{p}$ did not fall within 1.96 standard deviations of the value of p AND its interval does NOT “capture” p.

Using this method of calculation, the confidence interval will not capture p 5% of the time.

When n is large, a 95% confidence interval for p is

$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Confidence Intervals

A confidence interval (CI) for a population characteristic specifies an interval of plausible values for the characteristic.

The interval is constructed in such a way that the resulting interval will be successful in capturing the actual value of the population characteristic a specified percentage of the time.

The primary goal of a confidence interval is to estimate an unknown population characteristic.

Confidence level

The confidence level associated with a confidence interval is the success rate of the method used to construct the interval.

If this method were used to generate an interval estimate over and over again from different random samples, then for a 95% confidence level, in the long run 95% of the resulting intervals would include the actual value of the characteristic being estimated.

Our confidence is in the method – NOT in any one particular interval!

The diagram to the right shows 100 different 95% confidence intervals for p, computed from 100 different random samples.

Note that the ones with asterisks do not capture p.

7 out of the 100 confidence intervals do not contain p.

If we were to compute 100 more confidence intervals for p from 100 different random samples, would we get the same results? Why not?
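A simulation makes the answer concrete. The sketch below builds two batches of 100 large-sample 95% confidence intervals from random samples and counts how many intervals in each batch miss p. The population proportion p = 0.5, the sample size n = 200, and the seed are hypothetical values picked for illustration; the slides do not state the values behind the diagram.

```python
import math
import random


def count_misses(p: float, n: int, num_intervals: int, rng: random.Random) -> int:
    """Number of 95% confidence intervals (out of num_intervals) that fail to capture p."""
    misses = 0
    for _ in range(num_intervals):
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        if not (p_hat - half_width <= p <= p_hat + half_width):
            misses += 1
    return misses


rng = random.Random(2024)
first_batch = count_misses(p=0.5, n=200, num_intervals=100, rng=rng)
second_batch = count_misses(p=0.5, n=200, num_intervals=100, rng=rng)
print(first_batch, second_batch)
```

The two counts will typically be near 5 but need not match each other, because each batch uses different random samples. The 95% figure describes the long-run success rate of the method, not any single batch of 100 intervals.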

Other Confidence Levels

Suppose we wanted to create confidence intervals with a 90% confidence level . . .

Suppose we wanted to create confidence intervals with a 99% confidence level . . .

Notice that these critical values differ for different confidence levels.

Notice also that the larger the confidence level, the larger the critical value will be AND the wider the interval will be.

The Large-Sample Confidence Interval for p

Now let’s look at the general formula.

Appropriate when the following conditions are met:

1. The sample is a random sample from the population of interest or the sample is selected in a way that makes it reasonable to think the sample is representative of the population.

2. The sample size is large enough. This condition is met when either both $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ or (equivalently) the sample includes at least 10 successes and at least 10 failures.

The normal distribution is only an approximation of the sampling distribution of $\hat{p}$, and the true confidence level may differ somewhat from the reported level. If $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, the approximation is reasonable and the actual confidence level is usually quite close to the reported level. This is why it is important to verify this condition.

The Large-Sample Confidence Interval for p Continued . . .

When these conditions are met, a confidence interval for the population proportion is

$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Here $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the estimated standard error of $\hat{p}$.

This is a generic formula for a confidence interval:

statistic ± (critical value)(standard error of the statistic)

The desired confidence level determines which z critical value is used. The three most common confidence levels use the following z critical values:

Confidence Level    z Critical Value
90%                 1.645
95%                 1.96
99%                 2.58
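The tabled z critical values can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical_value` is an illustrative choice.

```python
from statistics import NormalDist


def z_critical_value(confidence_level: float) -> float:
    """z value that leaves (1 - confidence_level) / 2 in each tail of the standard normal."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical_value(level):.3f}")
```

To three decimal places the 99% value is 2.576, which the table rounds to 2.58.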

The Large-Sample Confidence Interval for p Continued . . .

Interpretation of Confidence Interval

You can be confident that the actual value of the population proportion is included in the computed interval.

Interpretation of Confidence Level

The confidence level specifies the approximate percentage of time that this method is expected to be successful in capturing the actual population proportion.

In any given problem, this statement should be worded in context.

Recall from Chapter 7 . . .

Four Key Questions:

Q Estimate or hypothesis testing?

S Sample data or experimental data?

T One variable or two? Categorical or numerical?

N How many samples or treatments?

5 Steps:

E (Estimate) – Explain what population characteristic you plan to estimate.

M (Method) – Select a method using QSTN

C (Check) – Verify that the conditions are met

C (Calculate) – Perform the necessary calculations

C (Communicate) – Interpret the confidence interval

Of 1100 drivers surveyed, 990 admitted to careless or aggressive driving during the previous 6 months. Assuming that it is reasonable to regard this sample of 1100 as representative of the population of drivers, compute a 90% confidence interval to estimate p, the proportion of all drivers who have engaged in careless or aggressive driving in the last 6 months.

Step 1 (E): The proportion of drivers who have engaged in careless or aggressive driving during the last 6 months, p, will be estimated.

Step 2 (M): Because the answers to the four key questions are Q: estimation, S: sample data, T: one categorical variable, N: one sample, a confidence interval for a population proportion will be considered.

Careless or Aggressive Driving Continued . . .

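Step 3 (C): There are two conditions that need to be met for the confidence interval of this section to be appropriate.

1. You do not know how the sample was selected. In order to proceed, you MUST assume that the sample was representative of the population.

2. Sample size is large enough because, with $\hat{p} = 990/1100 = 0.90$, $n\hat{p} = 1100(0.90) = 990 \ge 10$ and $n(1-\hat{p}) = 1100(0.10) = 110 \ge 10$.

Step 4 (C): Calculate the interval

$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.90 \pm 1.645\sqrt{\frac{0.90(0.10)}{1100}} = 0.90 \pm 0.015 = (0.885, 0.915)$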

Careless or Aggressive Driving Continued . . .

Step 5 (C): Communicate results

Interpret Confidence Interval: Assuming that the sample was representative of the population, you can be about 90% confident that the actual proportion of drivers who engaged in careless or aggressive driving in the past 6 months is somewhere between 0.885 and 0.915.

Interpret Confidence Level: The method used to construct this interval estimate is successful in capturing the actual value of the population proportion about 90% of the time.
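The Check and Calculate steps for this example can be sketched as a single Python function. The function name and its error message are illustrative choices; the z critical value is obtained from `statistics.NormalDist` rather than read from a table, so it is 1.6449 instead of the rounded 1.645.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, confidence_level: float = 0.95):
    """Large-sample z confidence interval for a population proportion."""
    # Check: at least 10 successes and at least 10 failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("need at least 10 successes and at least 10 failures")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Careless or aggressive driving: 990 of 1100 drivers, 90% confidence level.
lower, upper = proportion_confidence_interval(990, 1100, confidence_level=0.90)
print(f"({lower:.3f}, {upper:.3f})")
```

Rounded to three decimal places, the result agrees with the interval (0.885, 0.915) reported in Step 5. The function only checks the sample size condition; whether the sample is representative still has to be judged from how the data were collected.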

Three Things that Affect the Width of a Confidence Interval

1. The higher the confidence level, the wider the interval.

2. The larger the sample size, the narrower the interval.

3. The closer $\hat{p}$ is to 0.5, the wider the interval.

An Alternative to the Large-Sample z Interval

Even when the sample size conditions are met, sometimes the actual confidence level associated with the method may be noticeably different from the reported confidence level.

One way to correct this is to use a modified sample proportion, $\hat{p}_{mod} = \frac{\text{number of successes} + 2}{n + 4}$, the proportion of successes after adding two successes and two failures to the sample.

Use this modified sample proportion in place of $\hat{p}$ in the usual confidence interval formula.
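A minimal sketch of the modified interval follows. It takes the slide's wording literally: the modified proportion replaces $\hat{p}$ while n stays as it is in the standard error. That reading is an assumption; some presentations of this "plus four" idea also replace n with n + 4. The function names and the example counts (12 successes in 30 trials) are hypothetical.

```python
import math


def modified_proportion(successes: int, n: int) -> float:
    """Proportion of successes after adding two successes and two failures."""
    return (successes + 2) / (n + 4)


def modified_interval(successes: int, n: int, z: float = 1.96):
    """Usual interval formula with the modified proportion in place of p-hat.

    Assumption: n itself is left unchanged in the standard error.
    """
    p_mod = modified_proportion(successes, n)
    half_width = z * math.sqrt(p_mod * (1 - p_mod) / n)
    return p_mod - half_width, p_mod + half_width


# Hypothetical data: 12 successes in a sample of 30.
print(modified_proportion(12, 30))
print(modified_interval(12, 30))
```

Adding two successes and two failures pulls the estimate slightly toward 0.5, which matters most when the sample is small or the proportion is near 0 or 1.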

Choosing a Sample Size

Before collecting any data, you might wish to determine a sample size that ensures a certain margin of error.

The formula for the margin of error is

$M = 1.96\sqrt{\frac{p(1-p)}{n}}$

If we solve this for n . . .

$n = p(1-p)\left(\frac{1.96}{M}\right)^2$

Choosing a Sample Size to Achieve a Desired Margin of Error

Using a 95% confidence interval, the sample size required to estimate a population proportion p with a margin of error M is

$n = p(1-p)\left(\frac{1.96}{M}\right)^2$

The value of p may be estimated using prior information.

If there is no prior knowledge available, then the conservative estimate for p is 0.5.

Why is the conservative estimate for p = 0.5?

Since we are looking for the sample size that produces a certain margin of error, we need to focus on the possible values of p(1 – p):

0.1(0.9) = 0.09
0.2(0.8) = 0.16
0.3(0.7) = 0.21
0.4(0.6) = 0.24
0.5(0.5) = 0.25

By using 0.5 for p, we are using the largest possible value for p(1 – p) in our calculations.

Researchers have found biochemical markers of cancers in the exhaled breath of cancer patients, but chemical analysis of breath specimens has not yet proven effective in diagnosing cancer.

A study is to be performed to investigate whether a dog can be trained to identify the presence or absence of cancer by sniffing breath specimens.

How many different breath specimens should be used if you want to estimate the long-run proportion of correct identifications for this dog with a margin of error of 0.10?

With no prior information about p, use the conservative value p = 0.5, so that p(1 – p) = 0.25:

$n = p(1-p)\left(\frac{1.96}{M}\right)^2 = 0.25\left(\frac{1.96}{0.10}\right)^2 = 96.04$

A sample of at least 97 breath specimens should be used.

Always round the sample size up to the next whole number.
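The sample size calculation, including the round-up rule, can be sketched in Python. The function name and its default arguments (the conservative p = 0.5 and the 95% critical value 1.96) are illustrative choices.

```python
import math


def required_sample_size(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest whole-number n with n >= p(1 - p)(z / margin)^2."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# Dog example: margin of error 0.10, no prior information about p.
print(required_sample_size(0.10))
```

For the dog example this gives 97, matching the slide. If prior information suggests a value of p away from 0.5, passing it as `p` gives a smaller required sample size.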

Avoid These Common Mistakes

If a 90% confidence interval for p, the proportion of students at a particular college who own a computer, is (0.56, 0.78), you might say

Interpretation of interval: “You can be 90% confident that between 56% and 78% of the students at this college own a computer.”

Interpretation of confidence level: “You have used a method to produce this estimate that is successful in capturing the actual population proportion about 90% of the time.”

Don’t get these two statements confused!


1. In order for an estimate to be useful, you must know something about its accuracy. You should beware of a single number estimate that is not accompanied by a margin of error or some other measure of accuracy.


2. A confidence interval estimate that is wide indicates that you don’t have very precise information about the population characteristic being estimated.

The best strategy for decreasing the width of a confidence interval is to take a larger sample!

Don’t be fooled by a high confidence level.

High confidence is not the same thing as saying you have precise information about the value of a population characteristic.


3. The accuracy of an estimate depends on the sample size, not the population size.

Notice that the margin of error involves the sample size n, and decreases as n increases.

The size of the population, N, does need to be considered if sampling is done without replacement and the sample size is more than 10% of the population size.

In this case, the margin of error is adjusted by multiplying it by a finite population correction factor, $\sqrt{\frac{N-n}{N-1}}$.
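The adjustment can be sketched as follows, using the standard form of the finite population correction factor, $\sqrt{(N-n)/(N-1)}$. The function names and the example numbers (a sample of 200 from a population of 1000, with $\hat{p} = 0.5$) are hypothetical.

```python
import math


def finite_population_correction(N: int, n: int) -> float:
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def adjusted_margin_of_error(p_hat: float, n: int, N: int, z: float = 1.96) -> float:
    """Margin of error, corrected only when the sample exceeds 10% of the population."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if n > 0.10 * N:
        margin *= finite_population_correction(N, n)
    return margin


# Hypothetical example: n = 200 sampled without replacement from N = 1000.
print(adjusted_margin_of_error(0.5, 200, 1000))
# Same sample from a much larger population: no correction is applied.
print(adjusted_margin_of_error(0.5, 200, 1_000_000))
```

The correction factor is always less than 1 when n > 1, so it shrinks the margin of error, and it is close to 1 when the sample is a small fraction of the population.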


4. CONDITIONS ARE IMPORTANT!

If conditions are met, the large sample confidence interval provides a method for using sample data to estimate the population proportion with confidence, and the confidence level is a good approximation of the success rate for the method.


5. When reading published reports, don’t fall into the trap of thinking confidence interval every time you see a ± in an expression.

In addition to confidence intervals it is common to see both estimate ± margin of error and estimate ± standard error reported.
