Estimating Proportions - California State University,...


Page 1: Estimating Proportions - California State University, Fullerton · mathfaculty.fullerton.edu/galpargu/M120/Lectures-Revis… · 2010-04-27

Estimating Proportions with Confidence

Chapter 10

Copyright ©2004 Brooks/Cole, a division of Thomson Learning, Inc.


Principal Idea: Survey 150 randomly selected students; 41% think marijuana should be legalized.

If we report that between 33% and 49% of all students at the college think that marijuana should be legalized, how confident can we be that we are correct?

Confidence interval: an interval of estimates that is likely to capture the population value.

Objective: how to calculate and interpret a confidence interval estimate of a population proportion


10.1 The Language and Notation of Estimation

• Unit: an individual person or object to be measured.

• Population (or universe): the entire collection of units about which we would like information, or the entire collection of measurements we would have if we could measure the whole population.

• Sample: the collection of units we will actually measure or the collection of measurements we will actually obtain.

• Sample size: the number of units or measurements in the sample, denoted by n.


More Language and Notation of Estimation

• Population proportion: the fraction of the population that has a certain trait/characteristic, or the probability of success in a binomial experiment, denoted by p. The value of the parameter p is not known.

• Sample proportion: the fraction of the sample that has a certain trait/characteristic, denoted by p̂. The statistic p̂ is an estimate of p.

The Fundamental Rule for Using Data for Inference is that available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question(s) of interest.

10.2 Margin of Error

Media Descriptions of Margin of Error:

• The difference between the sample proportion and the population proportion is less than the margin of error about 95% of the time, or for about 19 of every 20 sample estimates.

• The difference between the sample proportion and the population proportion is more than the margin of error about 5% of the time, or for about 1 of every 20 sample estimates.


Example 10.1 Teens and Interracial Dating

1997 USA Today/Gallup Poll of teenagers across the country: 57% of the 497 teens who go out on dates say they’ve been out with someone of another race or ethnic group.

Reported margin of error for this estimate was about 4.5%.

• In surveys of this size, the difference between the sample estimate of 57% and the true percent is likely* to be less than 4.5% one way or the other.

• There is, however, a small chance that the sample estimate might be off by more than 4.5%.

* The value of how ‘likely’ is often 95%.

Example 10.2 If I Won the Lottery …

If you won 10 million dollars in the lottery, would you continue to work or stop working?

1997 Gallup Poll: 59% of the 616 employed respondents said they would continue to work.

Reported information about this poll:

• Results based on telephone interviews with a randomly selected sample of 1014 adults, conducted Aug 22–25, 1997.

• Among this group, 616 are employed full-time/part-time.

• For results based on this sample of “workers,” one can say with 95% confidence that the error attributable to sampling could be plus or minus 4 percentage points.

10.3 Confidence Intervals

Confidence interval: an interval of values computed from sample data that is likely to include the true population value.

Interpreting the Confidence Level

• The confidence level is the probability that the procedure used to determine the interval will provide an interval that includes the population parameter.

• If we consider all possible randomly selected samples of the same size from a population, the confidence level is the fraction or percent of those samples for which the confidence interval includes the population parameter.

Note: The confidence level is often expressed as a percent. Common levels are 90%, 95%, 98%, and 99%.

Constructing a 95% Confidence Interval for a Population Proportion

Sample estimate ± Margin of error

In the long run, about 95% of all confidence intervals computed in this way will capture the population value of the proportion, and about 5% of them will miss it.

Be careful: The confidence level only expresses how often the procedure works in the long run. Any one specific interval either does or does not include the true unknown population value.
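The long-run reading of the confidence level can be checked by simulation. The sketch below is illustrative only. It assumes a hypothetical true proportion of 0.41 and samples of 150 (numbers borrowed from the opening example), and it assumes the margin of error is taken as two standard errors, 2·sqrt(p̂(1 − p̂)/n). The function name `coverage_rate` is made up for this example.

```python
import math
import random


def coverage_rate(p_true, n, n_samples, seed=0):
    """Fraction of simulated samples whose interval p_hat +/- 2 s.e. contains p_true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        # Draw one random sample of size n and count the "successes".
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / n_samples


# Assumed values: true proportion 0.41, samples of 150 students.
rate = coverage_rate(p_true=0.41, n=150, n_samples=2000)
print(f"Share of intervals that captured the true value: {rate:.3f}")
```

The printed share should land near 0.95. Each simulated interval either captures 0.41 or misses it, and the 95% describes the procedure over many samples.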


Example 10.1 Teens and Interracial Dating (cont)

Poll: 57% of dating teens sampled had gone out with somebody of another race/ethnic group. Margin of error was 4.5%.

95% Confidence Interval: 57% ± 4.5%, or 52.5% to 61.5%

We have 95% confidence that somewhere between 52.5% and 61.5% of all American teens who date have gone out with somebody of another race or ethnic group.

Example 10.2 Winning the Lottery and Work (cont)

Poll: 40% of employed workers sampled would quit working if they won the lottery. Margin of error was 4%.

95% Confidence Interval Estimate: Sample estimate ± Margin of error

40% ± 4%, or 36% to 44%

With 95% confidence, somewhere between 36% and 44% of working Americans would say they would quit working if they won $10 million in the lottery.

The interval does not cover 50%, so it appears that fewer than half of all working Americans think they would quit if they won the lottery.

10.4 Calculating A Margin of Error for 95% Confidence

For a 95% confidence level, the approximate margin of error for a sample proportion is

$$\text{Margin of error} \approx 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Note: The “95% margin of error” is simply two standard errors, or 2 s.e.(p̂).


Example 10.3 Pollen Count Must Be High

Poll: Random sample of 883 American adults.

“Are you allergic to anything?”

Results: 36% of the sample said “yes”, p̂ = .36

$$\text{95\% margin of error} \approx 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2\sqrt{\frac{.36(1-.36)}{883}} = .032$$

95% Confidence Interval: .36 ± .032, or about .33 to .39

We can be 95% confident that somewhere between 33% and 39% of all adult Americans have allergies.
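The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch. The helper name `margin_of_error_95` is invented here, and the only inputs are the sample proportion and sample size quoted in the example.

```python
import math


def margin_of_error_95(p_hat, n):
    """Approximate 95% margin of error: two standard errors of p_hat."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)


# Example 10.3: 36% of 883 adults said "yes".
p_hat, n = 0.36, 883
margin = margin_of_error_95(p_hat, n)
low, high = p_hat - margin, p_hat + margin
print(f"margin of error = {margin:.3f}")    # about .032
print(f"95% CI: {low:.2f} to {high:.2f}")   # about .33 to .39
```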


The Conservative Estimate of Margin of Error

$$\text{Conservative estimate of the margin of error} = \frac{1}{\sqrt{n}}$$

• It usually overestimates the actual size of the margin of error.

• It works (conservatively) for all survey questions based on the same sample size, even if the sample proportions differ from one question to the next.

• Obtained when p̂ = .5 in the margin of error formula.
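A short sketch comparing the two margins for one sample size. The function names are illustrative, not from the text. The last two lines apply 1/sqrt(n) to the sample sizes in Examples 10.1 and 10.2. The results are consistent with the reported 4.5% and 4% margins, although the source does not say how the pollsters actually computed them.

```python
import math


def conservative_margin(n):
    """Conservative 95% margin of error, 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def margin_of_error_95(p_hat, n):
    """Approximate 95% margin of error, 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)


n = 883  # sample size from Example 10.3
for p_hat in (0.03, 0.25, 0.36, 0.50):
    print(f"p_hat={p_hat:.2f}  margin={margin_of_error_95(p_hat, n):.3f}  "
          f"conservative={conservative_margin(n):.3f}")

# Sample sizes from Examples 10.1 and 10.2.
print(f"n=497: {conservative_margin(497):.3f}")  # about .045
print(f"n=616: {conservative_margin(616):.3f}")  # about .040
```

The two margins agree at p̂ = .5. The gap widens as p̂ moves toward 0 or 1.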


Example 10.3 Really Bad Allergies (cont)

Poll: Random sample of 883 American adults. 3% of the sample experience “severe” symptoms.

$$\text{Conservative margin of error} = \frac{1}{\sqrt{883}} = .034 \text{ or } 3.4\%$$

95% (conservative) Confidence Interval: 3% ± 3.4%, or -0.4% to 6.4%

When p̂ is far from .5, the conservative margin of error is too conservative. The 95% margin of error using p̂ = .03 is just .011 or 1.1%, for an interval from 1.9% to 4.1%.

10.5 General Theory of CIs for a Proportion

Developing the 95% Confidence Level

From the sampling distribution of p̂ we have: For 95% of all samples,
-2 standard deviations < p̂ - p < 2 standard deviations

We don’t know the true standard deviation of p̂, so we use the standard error.

For approximately 95% of all samples,
-2 standard errors < p̂ - p < 2 standard errors

which implies for approximately 95% of all samples,
p̂ - 2 standard errors < p < p̂ + 2 standard errors
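The last step is ordinary algebra on a two-sided inequality. Written out, with s.e. standing for the standard error of p̂ (a restatement of the lines above, not new material):

```latex
\begin{align*}
-2\,\text{s.e.} &< \hat{p} - p < 2\,\text{s.e.} \\
-\hat{p} - 2\,\text{s.e.} &< -p < -\hat{p} + 2\,\text{s.e.}
  && \text{(subtract } \hat{p} \text{ throughout)} \\
\hat{p} + 2\,\text{s.e.} &> p > \hat{p} - 2\,\text{s.e.}
  && \text{(multiply by } -1 \text{, which reverses the inequalities)} \\
\hat{p} - 2\,\text{s.e.} &< p < \hat{p} + 2\,\text{s.e.}
\end{align*}
```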


General Description of the Approximate 95% CI for a Proportion

Approximate 95% CI for the population proportion: p̂ ± 2 standard errors

The standard error is

$$\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Interpretation: For about 95% of all randomly selected samples from the population, the confidence interval computed in this manner captures the population proportion.

Necessary Conditions: np̂ and n(1 - p̂) are both greater than 10, and the sample is randomly selected.

General Format for Confidence Intervals

For any confidence level, a confidence interval for either a population proportion or a population mean can be expressed as

Sample estimate ± Multiplier × Standard error

The multiplier is affected by the choice of confidence level.

More about the Multiplier

Note: Increase confidence level => larger multiplier.

Multiplier, denoted as z*, is the standardized score such that the area between -z* and z* under the standard normal curve corresponds to the desired confidence level.
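As a sketch of how the multiplier follows from the confidence level, the snippet below uses Python's standard-library `statistics.NormalDist` in place of a printed normal table. The function name `multiplier` is an assumption made for this example.

```python
from statistics import NormalDist


def multiplier(confidence_level):
    """z* such that the area between -z* and z* under the standard normal curve
    equals confidence_level."""
    # The area to the left of z* is the central area plus one tail.
    area_left = confidence_level + (1 - confidence_level) / 2
    return NormalDist().inv_cdf(area_left)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {multiplier(level):.2f}")
```

For 95% confidence this gives a value of about 1.96, which the approximate 95% interval rounds up to 2.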


Formula for a Confidence Interval for a Population Proportion p

$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where

• p̂ is the sample proportion.
• z* denotes the multiplier.
• $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of p̂.

Example 10.6 Intelligent Life Elsewhere?

Poll: Random sample of 935 Americans

“Do you think there is intelligent life on other planets?”

Results: 60% of the sample said “yes”, p̂ = .60

$$\text{s.e.}(\hat{p}) = \sqrt{\frac{.6(1-.6)}{935}} = .016$$

90% Confidence Interval: .60 ± 1.65(.016), or .60 ± .026
98% Confidence Interval: .60 ± 2.33(.016), or .60 ± .037

Note: entire interval is above 50% => high confidence that a majority believe there is intelligent life.
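The two intervals can be recomputed with a small helper. This is a sketch under the example's own numbers (p̂ = .60, n = 935, multipliers 1.65 and 2.33). The name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(p_hat, n, z_star):
    """Confidence interval p_hat +/- z* x s.e.(p_hat); returns (low, high, margin)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin, margin


# Example 10.6: p_hat = .60, n = 935, multipliers as used in the example.
for label, z_star in (("90%", 1.65), ("98%", 2.33)):
    low, high, margin = proportion_ci(0.60, 935, z_star)
    print(f"{label} CI: .60 +/- {margin:.3f}  ({low:.3f} to {high:.3f})")
```

Both lower endpoints stay above .50, which is the basis for the note about a majority.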


Example 10.6 Intelligent Life Elsewhere? (cont)

Poll: Random sample of 935 Americans

“Do you think there is intelligent life on other planets?”

Results: 60% of the sample said “yes”, p̂ = .60

We want a 50% confidence interval. If the area between -z* and z* is .50, then the area to the left of z* is .75. From Table A.1 we have z* ≈ .67.

50% Confidence Interval: .60 ± .67(.016), or .60 ± .011

Note: Lower confidence level results in a narrower interval.

Conditions for Using the Formula

1. Sample is randomly selected from the population.
Note: Available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question(s) of interest.

2. Normal curve approximation to the distribution of possible sample proportions assumes a “large” sample size. Both np̂ and n(1 - p̂) should be at least 10 (although some say these need only to be at least 5).
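Condition 2 is easy to check mechanically. Condition 1 is a judgment about how the data were collected and cannot be checked from the numbers alone. A minimal sketch, with a hypothetical helper name and the threshold left as a parameter so that either 10 or 5 can be used:

```python
def large_sample_ok(p_hat, n, threshold=10):
    """True when both n*p_hat and n*(1 - p_hat) are at least `threshold`."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold


# Sample sizes and proportions taken from the examples in this chapter.
print(large_sample_ok(0.36, 883))   # Example 10.3: about 318 and 565
print(large_sample_ok(0.03, 883))   # severe symptoms: about 26 and 857
print(large_sample_ok(0.55, 60))    # Example 10.7: 33 and 27

# Hypothetical case, not from the text: a rare trait in a small sample.
print(large_sample_ok(0.03, 60))    # n * p_hat is only 1.8
```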


10.7 Using Confidence Intervals to Guide Decisions

Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an “acceptable” possibility for the value of a population proportion.

Principle 2. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude that the two population proportions are different.


Example 10.7 Which Drink Tastes Better?

Taste Test: A sample of 60 people taste both drinks and 55% like the taste of Drink A better than Drink B.

Makers of Drink A want to advertise these results. Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A.

95% Confidence Interval:

$$.55 \pm 2\sqrt{\frac{.55(1-.55)}{60}} \rightarrow .55 \pm .13$$

Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample.
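The decision in this example is Principle 1 applied to the value .50. The sketch below recomputes the interval and tests whether .50 falls inside it. Both function names are invented for illustration.

```python
import math


def approx_95_ci(p_hat, n):
    """Approximate 95% CI: p_hat +/- 2 standard errors."""
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def is_plausible(value, interval):
    """Principle 1: a value inside the interval is an acceptable possibility."""
    low, high = interval
    return low <= value <= high


ci = approx_95_ci(0.55, 60)  # Example 10.7
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print("Is .50 plausible?", is_plausible(0.50, ci))
```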


Case Study 10.2 Nicotine Patches vs Zyban

Study: New England Journal of Medicine (3/4/99)

• 893 participants randomly allocated to four treatment groups: placebo, nicotine patch only, Zyban only, and Zyban plus nicotine patch.

• Participants blinded: all used a patch (nicotine or placebo) and all took a pill (Zyban or placebo).

• Treatments used for nine weeks.


Case Study 10.2 Nicotine (cont)

Conclusions:

Zyban is effective (no overlap of Zyban and no Zyban CIs)

Nicotine patch is not particularly effective (overlap of patch and no patch CIs)

In Summary: Confidence Interval for a Population Proportion p

General CI for p:

$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Approximate 95% CI for p:

$$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Conservative 95% CI for p:

$$\hat{p} \pm \frac{1}{\sqrt{n}}$$
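To tie the three formulas together, the sketch below applies each to the opening example (150 students, 41% in favor). The function names are hypothetical, and z* is taken from Python's `statistics.NormalDist` rather than a table. The source does not say which formula produced the 33% to 49% figure quoted at the start. Under these assumptions all three agree with it after rounding.

```python
import math
from statistics import NormalDist


def general_ci(p_hat, n, confidence_level):
    """p_hat +/- z* x sqrt(p_hat(1 - p_hat)/n), with z* from the standard normal curve."""
    z_star = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def approx_95_ci(p_hat, n):
    """p_hat +/- 2 x sqrt(p_hat(1 - p_hat)/n)."""
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def conservative_95_ci(p_hat, n):
    """p_hat +/- 1 / sqrt(n)."""
    margin = 1 / math.sqrt(n)
    return p_hat - margin, p_hat + margin


p_hat, n = 0.41, 150  # opening example
for name, (low, high) in (
    ("general (95%)", general_ci(p_hat, n, 0.95)),
    ("approximate 95%", approx_95_ci(p_hat, n)),
    ("conservative 95%", conservative_95_ci(p_hat, n)),
):
    print(f"{name:>17}: {low:.3f} to {high:.3f}")
```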