Confidence Interval (CI) for a Proportion Choosing the Sample Size



Transcript of Confidence Interval (CI) for a Proportion Choosing the Sample Size

Page 1: Confidence Interval (CI) for a Proportion Choosing the Sample Size

Confidence Interval (CI) for a Proportion

Choosing the Sample Size

Page 2

Confidence Interval for p

When we collect data from our one random sample and compute the sample proportion $\hat{p}$, the interval of values

$$\hat{p} \pm E, \quad \text{where } E = z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

forms an (approximate) confidence interval (CI) for p.
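As a rough illustration of this interval, the sketch below computes the critical value z* from a confidence level and then the error margin and interval. The function names `z_star` and `proportion_ci`, and the data (a sample proportion of 0.40 from n = 100), are hypothetical and not taken from the slides.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Two-sided standard normal critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(p_hat, n, confidence=0.95):
    """Return (lower, upper, error_margin) for the approximate CI p_hat +/- E."""
    error_margin = z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - error_margin, p_hat + error_margin, error_margin

# Hypothetical data: sample proportion 0.40 from n = 100, 95% confidence.
lower, upper, e = proportion_ci(0.40, 100)
print(f"0.400 +/- {e:.3f} -> ({lower:.3f}, {upper:.3f})")
```

The rounded values 1.645, 1.96 and 2.576 correspond to 90%, 95% and 99% confidence.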

Page 3

Example

A marketing firm intends to survey newspaper readers to determine what percent of readers notice a particular ad campaign. They will summarize their data and form a 95% confidence interval; the interval will be reported back to the company purchasing the ads.

The company has requested an error margin no greater than 0.05.

What should be the sample size n?

Page 4

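Error Margin

Before the study is conducted, the actual error margin is unknown because it depends on $\hat{p}$. If the goal is to sample enough to have an error margin no greater than a target value E:

$$z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le E$$

$$n \ge \left(\frac{z^{*}}{E}\right)^{2} \hat{p}(1-\hat{p})$$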
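The step from the error-margin condition to the sample-size condition is a short rearrangement. As a sketch, with $z^{*}$ the critical value, $\hat{p}$ the sample proportion, $n$ the sample size and $E$ the target error margin (both sides are non-negative, so squaring keeps the inequality):

$$
\begin{aligned}
z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} &\le E \\
(z^{*})^{2}\,\frac{\hat{p}(1-\hat{p})}{n} &\le E^{2} \\
n &\ge \left(\frac{z^{*}}{E}\right)^{2}\hat{p}(1-\hat{p})
\end{aligned}
$$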

Page 5

Example

A marketing firm intends to survey newspaper readers to determine what percent of readers notice a particular ad campaign. They will summarize their data and form a 95% confidence interval; the interval will be reported back to the company purchasing the ads.

The company has requested an error margin no greater than 0.05.

Page 6

“Ignorant” Solution

Use 0.5 as the prevalence.

$$n = \left(\frac{z^{*}}{E}\right)^{2} \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.05}\right)^{2} (0.5)(0.5) = 384.16$$

It’s impossible to sample 0.16 of a reader.

Answer: Sample at least 385 readers.

Sample sizes must be whole numbers.
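The calculation on this slide can be sketched in Python. The function name `required_sample_size` and its parameter names are illustrative and not part of the original slides; z* = 1.96 is the rounded 95% value used above.

```python
import math

def required_sample_size(error_margin, z_star, p_guess=0.5):
    """Return (raw n, whole-number n) so that z* * sqrt(p(1-p)/n) <= error_margin."""
    raw_n = (z_star / error_margin) ** 2 * p_guess * (1 - p_guess)
    # Sample sizes must be whole numbers, so round up.
    return raw_n, math.ceil(raw_n)

# Marketing example: 95% confidence, target error margin 0.05, prevalence 0.5.
raw_n, n = required_sample_size(error_margin=0.05, z_star=1.96)
print(f"raw n = {raw_n:.2f}, sample at least {n} readers")
```

Rounding up rather than to the nearest whole number keeps the error margin at or below the target.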

Page 7

Implementing the Solution

320 of the 385 readers noticed the campaign.

$$\hat{p} = \frac{320}{385} = 0.8312$$

The actual error margin is

$$z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96\sqrt{\frac{(0.8312)(0.1688)}{385}} = 1.96\,(0.0191) = 0.0374$$

Had 0.83 been known in advance (it wasn’t), the required sample size would have been 217.
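Both numbers on this slide, the actual error margin and the sample size that would have sufficed in hindsight, can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
import math

z_star = 1.96          # rounded 95% critical value used in the slides
n = 385                # readers actually sampled
p_hat = 320 / n        # observed proportion, about 0.8312

# Actual error margin achieved with the observed proportion.
actual_margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Sample size that would have sufficed had 0.83 been used as the guess.
hindsight_n = math.ceil((z_star / 0.05) ** 2 * 0.83 * (1 - 0.83))

print(f"p_hat = {p_hat:.4f}, actual margin = {actual_margin:.4f}")
print(f"sample size needed with a guess of 0.83: {hindsight_n}")
```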

Page 8

Flaw in Ignorant Solution

If you use 0.5 to determine the sample size, you will get an error margin no more than the desired value.

But… unless the prevalence turns out to be 0.5 exactly, the error margin will be less than desired.

The error margin will be considerably less than desired if the prevalence is far from 0.5. Time and money are wasted.

Page 9

Example

How many trees should we sample in order to estimate the proportion expected to die with error margin no greater than 0.02 = 2%? Assume we want 99% confidence in our result.

Past history suggests a result in the vicinity of 0.190.

A 90% CI based on 216 trees yielded 0.190 ± 0.044.

We might try values 0.15 to 0.25 in order to obtain an indication of what the sample size should be.
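Trying several guesses across that range might look like the sketch below. It assumes the rounded 99% critical value z* = 2.576, which the slides do not state explicitly; the function name `required_n` is illustrative.

```python
import math

Z_99 = 2.576      # rounded two-sided 99% critical value (an assumption here)
TARGET_E = 0.02   # desired error margin

def required_n(p_guess, z_star=Z_99, error_margin=TARGET_E):
    """Whole-number sample size for a guessed proportion, rounded up."""
    return math.ceil((z_star / error_margin) ** 2 * p_guess * (1 - p_guess))

# Guesses spanning the vicinity of the historical value 0.190.
tree_sizes = {p: required_n(p) for p in (0.15, 0.19, 0.20, 0.25)}
for p, n in tree_sizes.items():
    print(f"assumed proportion {p:.2f}: sample at least {n} trees")
```

Under these assumptions the required sample runs from a little over 2,100 trees (guess 0.15) to a little over 3,100 trees (guess 0.25).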

Page 10

Example

The actual error margin will depend on the observed proportion. If it is closer to 0.5 than the value we use to get a sample size, then the actual error margin will exceed the desired one. We won’t meet the goal.

That is why using 0.5 is guaranteed to meet the desired error margin.

If it is further from 0.5 than what we use, then we will undershoot the desired error margin. This is fine, except that it adds to the expense of the study.

For values 0.35 – 0.65, using 0.5 as a guess generally doesn’t oversample by too much.
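How much extra sampling the 0.5 guess causes can be quantified. Because the required n is proportional to p(1 - p), the ratio of the sample size computed with 0.5 to the size computed with the proportion actually observed is 0.25 / (p(1 - p)), whatever the confidence level and error margin. The function name below is illustrative.

```python
def oversampling_factor(p_actual):
    """Ratio of n computed with 0.5 to n computed with p_actual (before rounding)."""
    return 0.25 / (p_actual * (1 - p_actual))

for p in (0.35, 0.50, 0.65, 0.83):
    factor = oversampling_factor(p)
    print(f"p = {p:.2f}: sample is {factor:.2f} times what was needed")
```

Between 0.35 and 0.65 the factor stays below about 1.10, while at 0.83 it is roughly 1.77, in line with the marketing example (385 readers versus 217).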

Page 11

Example

The proportion of students who have had the flu (through 11/4/2009) was estimated with a sample of n = 62.

The 90% confidence interval was:

0.177 ± 0.080, i.e. (0.097, 0.257)

How many students should be sampled to reduce the error margin by half to 0.04?

Page 12

Example

0.177 ± 0.080, i.e. (0.097, 0.257)

Our result supports a future result between about 0.10 and 0.26. Our best guess would be about 0.18.

Assuming    Required n
0.18

$$n = \left(\frac{z^{*}}{E}\right)^{2} \hat{p}(1-\hat{p})$$

Page 13

Example

0.177 ± 0.080, i.e. (0.097, 0.257)

Our result supports a future result between about 0.10 and 0.26. Our best guess would be about 0.18.

Assuming    Required n
0.18        250 (about 4 × 62)

$$n = \left(\frac{1.645}{0.04}\right)^{2} (0.18)(0.82) = 249.63$$

Page 14

Relation between n and E

In general, the larger n is, the smaller E is. If we only compare situations with the same confidence and proportion, then

Reducing the error margin by a multiplicative factor of k requires increasing the sample size by a factor of k².

Ex: Cutting the error margin in half (making it 2 times smaller) requires making the sample size 2² = 4 times bigger.

For E = 0.02 (4 times smaller than 0.08), n = 16(62) = 992.

For E = 0.01 (8 times smaller than 0.08), n = 64(62) = 3968.
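The k² rule can be sketched as a small helper that scales a known sample size to a new target error margin, holding the confidence level and proportion fixed. The function name is illustrative; the starting point (n = 62 with E = 0.08) is the flu example above.

```python
def scaled_sample_size(n_current, e_current, e_target):
    """Sample size needed to move from e_current to e_target (same confidence, same proportion)."""
    k = e_current / e_target   # factor by which the error margin shrinks
    return n_current * k ** 2  # sample size grows by k squared

for e_target in (0.04, 0.02, 0.01):
    n_needed = scaled_sample_size(62, 0.08, e_target)
    print(f"E = {e_target}: n is about {n_needed:.0f}")
```

For E = 0.04 the rule gives 4 × 62 = 248, close to the 250 obtained from the formula with 0.18 as the guess.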

Page 15

Example

0.177 ± 0.080, i.e. (0.097, 0.257)

Our result supports a future result between about 0.10 and 0.26. Our best guess would be about 0.18.

Assuming           Required n
0.18               250 (about 4 × 62)
0.10               153
0.26               326
0.50 (ignorant)    423
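The table can be regenerated with a short loop. This sketch uses the rounded 90% value z* = 1.645 and the target E = 0.04 from the flu example; the function name is illustrative. It always rounds up, so the 0.10 row comes out as 153 (the unrounded value is about 152.2).

```python
import math

def required_n(p_guess, z_star=1.645, error_margin=0.04):
    """Whole-number sample size for a guessed proportion, rounded up."""
    return math.ceil((z_star / error_margin) ** 2 * p_guess * (1 - p_guess))

flu_table = {p: required_n(p) for p in (0.18, 0.10, 0.26, 0.50)}
for p, n in flu_table.items():
    print(f"assuming {p:.2f}: required n = {n}")
```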

Page 16

Example

[Figure: required sample size (vertical axis, 150 to 450) plotted against the assumed proportion (horizontal axis, 0.1 to 0.5).]

Page 17

Example

[Figure: required sample size (vertical axis, 0 to 400) plotted against the assumed proportion (horizontal axis, 0 to 1).]

Page 18

Example

A computer manufacturer’s tech support office wants to assess the percent of customers who make service calls within the first month. The company wants to be 90% confident that the sample percentage is within two percentage points of the true percentage.

Past surveys have revealed this figure to be in the 5 – 15% range.

Page 19

Example

You try 0.10. (The confidence is 90%. The desired error margin is 2%.) What is the required sample size? (Give a whole number as the answer.)

$$n = \left(\frac{z^{*}}{E}\right)^{2} \hat{p}(1-\hat{p}) = \left(\frac{1.645}{0.02}\right)^{2} (0.1)(0.9) \approx 608.86$$

The required sample size is 609.
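Since past surveys put the figure in the 5 – 15% range, it may help to see the required sample size at both ends of that range as well as at 0.10. The sketch below assumes the rounded 90% value z* = 1.645 and E = 0.02; the function name is illustrative.

```python
import math

def required_n(p_guess, z_star=1.645, error_margin=0.02):
    """Whole-number sample size for a guessed proportion, rounded up."""
    return math.ceil((z_star / error_margin) ** 2 * p_guess * (1 - p_guess))

support_sizes = {p: required_n(p) for p in (0.05, 0.10, 0.15)}
for p, n in support_sizes.items():
    print(f"guessed value {p:.2f}: minimum n = {n}")
```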

Page 20

[Figure: necessary sample size (vertical axis, 0 to 1800) plotted against proportion (horizontal axis, 0.0 to 1.0).]

Solutions

Guessed value    0.05    0.10    0.15    0.50
Minimum n         322     609     863    1692

Page 21

[Figure: necessary sample size (vertical axis, 0 to 1000) plotted against proportion (horizontal axis, 0.050 to 0.150).]

Solutions

Guessed value    0.05    0.10    0.15    0.50
Minimum n         322     609     863    1692

Page 22

Trade-off for any solution

Some prevalence must be assumed to obtain a sample size.

If the result is closer to 0.5 than what was assumed, the error margin will be larger than desired.

If the result is farther from 0.5 than what was assumed, resources will have been wasted.

Page 23

Reasonable Compromise

Use 0.5 if you have no idea, or if you anticipate a prevalence close to 0.5.

In public opinion polls for a 2-candidate election, the prevalence is often near 0.5. So 0.5 is used. (Remember: 95% confidence for media polls.)

If you have a guessed (range of) value(s), use it, but recognize that: an actual result closer to 0.5 will cause you to miss the objective; a result further from 0.5 will cause you to oversample.
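One way to act on a guessed range is to plan with the value in the range that lies closest to 0.5, since that is where p(1 - p), and therefore the required n, is largest within the range. This is only one possible convention, not something the slides prescribe; the function names are hypothetical.

```python
import math

def conservative_guess(low, high):
    """Value in [low, high] closest to 0.5, where p(1 - p) is largest on the range."""
    return min(max(0.5, low), high)

def required_n(p_guess, z_star, error_margin):
    """Whole-number sample size for a guessed proportion, rounded up."""
    return math.ceil((z_star / error_margin) ** 2 * p_guess * (1 - p_guess))

# Tech support example: range 5% to 15%, 90% confidence, error margin 0.02.
guess = conservative_guess(0.05, 0.15)
print(f"plan with p = {guess}, n = {required_n(guess, 1.645, 0.02)}")
```

If the observed proportion does fall inside the guessed range, a sample sized this way meets the target error margin, at the cost of some oversampling when the result lands near the other end of the range.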