IV. CONFIDENCE INTERVAL AND ESTIMATION

Mathacle

PSet ----- Stats, Confidence Intervals and Estimation

Level ---- 1

Number --- 1

Name: ___________________ Date: _____________

1

Unbiased Estimators – So we don’t have favorite.

4.1. Significance Level and Critical Values z and t

The significance level, often denoted by $\alpha$, is the probability threshold used to judge whether a sample statistic, such as the sample mean $\bar{x}$ or the sample proportion $\hat{p}$, is "significantly" different from the population mean $\mu$ or the population proportion $p$, respectively. The significance level is usually associated with the critical z-score $z_{\alpha}$ in the $N(0,1)$ distribution or the critical value $t_{\alpha}$ in the t-distribution, depending on which distribution the application uses.

Three cases associated with $\alpha$ are commonly used: left-tailed, right-tailed, and two-tailed. The two-tailed case is used to determine the confidence interval in the normal and t-distributions. $df$ is the degrees of freedom of the t-distribution.

On the TI-84:

Left-tailed: $z_{\alpha} = \text{invNorm}(\alpha)$, $t_{\alpha} = \text{invT}(\alpha, df)$;

Right-tailed: $z_{\alpha} = \text{invNorm}(1-\alpha)$, $t_{\alpha} = \text{invT}(1-\alpha, df)$;

Two-tailed: $z_{\alpha/2} = \text{invNorm}(\alpha/2)$, $t_{\alpha/2} = \text{invT}(\alpha/2, df)$. These calls return the negative (left) critical value; by symmetry, the two critical values are $\pm z_{\alpha/2}$ and $\pm t_{\alpha/2}$.
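The calculator lookups above can be mirrored in software. The following is a minimal sketch, not part of the worksheet: it assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of invNorm, and the function name `z_critical` and its tail labels are hypothetical. The standard library has no inverse t-distribution, so t critical values would still come from invT or a table.

```python
from statistics import NormalDist


def z_critical(alpha, tail="two"):
    """Return a critical z-score for significance level alpha.

    tail is "left", "right", or "two" (labels chosen for this sketch).
    For "two", the positive value z_{alpha/2} is returned.
    """
    std_normal = NormalDist()  # the N(0, 1) distribution
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # like invNorm(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)      # like invNorm(1 - alpha)
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # positive z_{alpha/2}
    raise ValueError("tail must be 'left', 'right', or 'two'")


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_critical(alpha, "two"), 3))
```

Returning the positive value in the two-tailed case is a design choice for this sketch; the negative critical value follows by symmetry.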

Example 4.1.1. For the given $\alpha$, $df$, and type of tail, find $z$ or $t$ as indicated.

| $\alpha$ | Type | $df$ | $z$ or $t$ | Graph |
|---|---|---|---|---|
| 0.1 | Z-distribution, right-tailed | - | | |
| 0.1 | Z-distribution, left-tailed | - | | |
| 0.1 | Z-distribution, two-tailed | - | | |
| 0.05 | Z-distribution, two-tailed | - | | |
| 0.01 | Z-distribution, two-tailed | - | | |
| 0.1 | t-distribution, two-tailed | 5 | | |
| 0.1 | t-distribution, two-tailed | 20 | | |
| 0.1 | t-distribution, two-tailed | 30 | | |
| 0.1 | t-distribution, two-tailed | 60 | | |
| 0.05 | t-distribution, two-tailed | 5 | | |
| 0.05 | t-distribution, two-tailed | 20 | | |
| 0.05 | t-distribution, two-tailed | 30 | | |
| 0.01 | t-distribution, two-tailed | 5 | | |
| 0.01 | t-distribution, two-tailed | 30 | | |

Solution:

| $\alpha$ | Type | $df$ | $z$ or $t$ |
|---|---|---|---|
| 0.1 | Z-distribution, right-tailed | - | $z = 1.282$ |
| 0.1 | Z-distribution, left-tailed | - | $z = -1.282$ |
| 0.1 | Z-distribution, two-tailed | - | $z = \pm 1.645$ |
| 0.05 | Z-distribution, two-tailed | - | $z = \pm 1.960$ |
| 0.01 | Z-distribution, two-tailed | - | $z = \pm 2.576$ |
| 0.1 | t-distribution, two-tailed | 5 | $t = \pm 2.015$ |
| 0.1 | t-distribution, two-tailed | 20 | $t = \pm 1.725$ |
| 0.1 | t-distribution, two-tailed | 30 | $t = \pm 1.697$ |
| 0.1 | t-distribution, two-tailed | 60 | $t = \pm 1.671$ |
| 0.05 | t-distribution, two-tailed | 5 | $t = \pm 2.571$ |
| 0.05 | t-distribution, two-tailed | 20 | $t = \pm 2.086$ |
| 0.05 | t-distribution, two-tailed | 30 | $t = \pm 2.042$ |
| 0.01 | t-distribution, two-tailed | 5 | $t = \pm 4.032$ |
| 0.01 | t-distribution, two-tailed | 30 | $t = \pm 2.750$ |

4.2. Confidence Interval (CI)

The confidence level (CL) is defined as $1 - \alpha$. In the two-tailed case, when $\alpha$ is given, the confidence interval (CI) is $(-z_{\alpha/2},\ z_{\alpha/2})$ for the Z-distribution and $(-t_{\alpha/2},\ t_{\alpha/2})$ for the t-distribution. The area in each tail of the two-sided case is $\alpha/2$.

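Example 4.2.1. To compare the Z-distribution and the t-distribution, find the CI for each case below.

| $\alpha$ | Distribution | $df$ | CI |
|---|---|---|---|
| 0.1 | Z | - | |
| 0.05 | Z | - | |
| 0.01 | Z | - | |
| 0.1 | t | 5 | |
| 0.1 | t | 30 | |
| 0.05 | t | 5 | |
| 0.05 | t | 30 | |
| 0.01 | t | 5 | |
| 0.01 | t | 30 | |

Solution:

| $\alpha$ | Distribution | $df$ | CI |
|---|---|---|---|
| 0.1 | Z | - | $(-1.645,\ 1.645)$ |
| 0.05 | Z | - | $(-1.960,\ 1.960)$ |
| 0.01 | Z | - | $(-2.576,\ 2.576)$ |
| 0.1 | t | 5 | $(-2.015,\ 2.015)$ |
| 0.1 | t | 30 | $(-1.697,\ 1.697)$ |
| 0.05 | t | 5 | $(-2.571,\ 2.571)$ |
| 0.05 | t | 30 | $(-2.042,\ 2.042)$ |
| 0.01 | t | 5 | $(-4.032,\ 4.032)$ |
| 0.01 | t | 30 | $(-2.750,\ 2.750)$ |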

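For the distribution $N(\mu, \sigma^2)$, the z-score $z = (x - \mu)/\sigma$ can be used to transform $N(\mu, \sigma^2) \to N(0, 1)$.

In summary:

| Distribution | Confidence Interval (CI) |
|---|---|
| $N(0,1)$ | $CI = (-z_{\alpha/2},\ z_{\alpha/2})$, or $0 \pm z_{\alpha/2}$ |
| $t(k)$ | $CI = (-t_{\alpha/2},\ t_{\alpha/2})$, or $0 \pm t_{\alpha/2}$, where $k = df$ |
| $N(\mu, \sigma^2)$ | $CI = (\mu - z_{\alpha/2}\sigma,\ \mu + z_{\alpha/2}\sigma)$, or $\mu \pm z_{\alpha/2}\sigma$, when $\mu$ and $\sigma$ are known |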

4.3. Definition of Unbiased Estimators

An estimator $T$ is called an unbiased estimator for the parameter $\theta$ if the expected value of $T$ is $\theta$:

$$E[T] = \theta$$

The difference $E[T] - \theta$ is called the bias of $T$. Intuitively, an unbiased estimator is one that does not systematically overestimate or underestimate $\theta$.

4.4. The Proportion Estimator for Binomial Distributions

Let $N$ denote the total number of units or items in the population. Suppose $X$ is the sum of $n$ independent Bernoulli random variables $X_1, X_2, \ldots, X_n$, each taking the value one or zero, with success probability $p$ on each trial:

$$X = X_1 + X_2 + \cdots + X_n$$

where $n \le N$. The mean and variance of $X$ are $E[X] = np$ and $Var(X) = np(1-p)$.

If the sample proportion $\hat{p} = \frac{X}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ is defined as the rescaled, or average, binomial variable $X$, then the mean of $\hat{p}$ is

$$\mu_{\hat{p}} = E[\hat{p}] = E\left[\frac{X}{n}\right] = \frac{E[X]}{n} = \frac{np}{n} = p$$
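The expectation above can be checked numerically by summing $x/n$ against the binomial probabilities. The following is a minimal sketch, assuming Python's standard-library `math.comb`; the function name `expected_p_hat` and the values $n = 20$, $p = 0.3$ are illustrative and not taken from the worksheet.

```python
from math import comb


def expected_p_hat(n, p):
    """Exact E[X/n] for X ~ Binomial(n, p), summed over the binomial pmf."""
    return sum(
        (x / n) * comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )


n, p = 20, 0.3  # illustrative values only
# The difference should be zero up to floating-point rounding.
print(abs(expected_p_hat(n, p) - p) < 1e-12)
```

Because the sum runs over the full distribution rather than simulated samples, any difference from $p$ is only floating-point rounding.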

That is, the estimator $\hat{p}$ has the property $E[\hat{p}] = p$. Therefore, the sample proportion $\hat{p}$ is an unbiased estimator for the population proportion $p$. Its variance is

$$\sigma_{\hat{p}}^2 = Var[\hat{p}] = Var\left[\frac{X}{n}\right] = \frac{Var[X]}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

The proportion problem should really be modeled as a hypergeometric problem, that is, $k$ successes in $n$ draws without replacement. The variance for the hypergeometric model is

$$\sigma_{\hat{p}}^2 = \frac{N-n}{N-1} \cdot \frac{p(1-p)}{n}$$

where $\frac{N-n}{N-1}$ is the modification factor. When the sampling fraction $\frac{n}{N}$ is small, say $\frac{n}{N} < 10\%$, the standard deviation can be approximated as

$$\sigma_{\hat{p}} = \sqrt{\frac{1 - \frac{n}{N}}{1 - \frac{1}{N}} \cdot \frac{p(1-p)}{n}} \approx \sqrt{\frac{p(1-p)}{n}}$$

This is the "10% rule", and in this case the binomial approximation is satisfactory.

When $np(1-p) \gg 1$, or, in the working form noted later, $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, the binomial distribution can be approximated by the Z-distribution. The z-scores in this case can be calculated as

$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}$$

$\hat{p}$ can be used to approximate $\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ when $p$ is unknown and the 10% rule is satisfied. For the significance level $\alpha$, the z-score is bounded by the two-tailed critical values:

$$-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} \le z_{\alpha/2}$$

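The confidence interval is then

$$CI = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

or in the interval form:

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

or in the form of error terms:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$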

In summary, the conditions to check before constructing a CI from the sample proportion (statistic) $\hat{p}$ with sample size $n$ are

1.) The sample is a simple random sample.

2.) The sample size is less than 10% of the population.

3.) $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, so that the binomial distribution can be approximated by the normal distribution.

Example 4.4.1. THS administrators wanted to know how many 10th graders and 11th graders did either internships or community services in the past summer. A random sample of 75 students indicated that 60 students did one of the two. Find the 95% confidence interval for the school proportion. Assume that the school population of these two classes is 800 and that both grades are equally likely to do summer internships or community services.

Solution:

$\hat{p} = 60/75 = 0.8$, $\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$ (from $\text{invNorm}(0.025) = -1.96$)

Conditions: $n = 75 < 10\% \times 800 = 80$, $n\hat{p} = 75 \times 0.8 = 60 \ge 10$, $n(1-\hat{p}) = 75 \times 0.2 = 15 \ge 10$

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.8 \pm 1.96\sqrt{\frac{0.8(1-0.8)}{75}} = 0.8 \pm 0.091$$

or

$$CI = (0.709,\ 0.891)$$

That is, the school is 95% confident that the true proportion of those two grades who did

the summer internships or community services is between 0.709 and 0.891.
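The calculation in Example 4.4.1 can be reproduced in a few lines. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of invNorm; the function name `proportion_ci` is hypothetical, and the sketch does not check the random-sample, 10%, or large-count conditions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI for a population proportion.

    Returns (lower, upper) = p_hat -/+ z_{alpha/2} * sqrt(p_hat(1-p_hat)/n).
    """
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # positive critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Numbers from Example 4.4.1: 60 of 75 sampled students, 95% confidence.
low, high = proportion_ci(60, 75, 0.95)
print(round(low, 3), round(high, 3))
```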

Example 4.4.2. A previous study has suggested that about 19.3% of teens (aged 12-19)

are obese. How large of a sample will be needed in order to estimate the true proportion

of obese teens with 95% confidence and a margin of error of no more than 1%?

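Solution:

$\hat{p} = 0.193$, $\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$ (from $\text{invNorm}(0.025) = -1.96$)

$$z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 0.01 \ \Rightarrow\ n \ge \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{0.0001} = \frac{1.96^2 (0.193)(1-0.193)}{0.0001} \approx 5983.3$$

Since $n$ must be a whole number that keeps the margin of error at or below 1%, round up: the sample needs at least 5984 teens.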
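The sample-size calculation in Example 4.4.2 can be sketched as follows. This is an illustration under stated assumptions: it uses Python's standard-library `statistics.NormalDist` for the critical value (about 1.95996 rather than the rounded 1.96), the function name `sample_size_for_proportion` is hypothetical, and the result is rounded up to the next whole number because a fractional sample size must be rounded up to keep the margin of error within the target.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve the margin inequality for n, then round up.
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


# Numbers from Example 4.4.2: prior estimate 0.193, margin 1%, 95% confidence.
print(sample_size_for_proportion(0.193, 0.01, 0.95))
```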

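4.5. The Mean Estimator for the Normal Distribution $N(\mu, \sigma^2)$

Suppose $X_1, X_2, \ldots, X_n$ are continuous independent random variables from a normal distribution with mean $\mu$ and known variance $\sigma^2$. Then

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

is the unbiased mean estimator:

$$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{n E[X_i]}{n} = \frac{n\mu}{n} = \mu$$

The variance is

$$\sigma_{\bar{x}}^2 = Var[\bar{X}] = Var\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{n\,Var[X_i]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$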

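Similar to the proportion problem, the correct hypergeometric model adds the correction factor to the variance:

$$\sigma_{\bar{x}}^2 = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$$

When the 10% rule is satisfied, the standard deviation is

$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}} = \sqrt{\frac{1 - \frac{n}{N}}{1 - \frac{1}{N}}} \cdot \frac{\sigma}{\sqrt{n}} \approx \frac{\sigma}{\sqrt{n}}$$

The Central Limit Theorem says that when the sample size is large, this unbiased estimator satisfies

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \ \Rightarrow\ \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

For the confidence level $1 - \alpha$,

$$\left|\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| \le z_{\alpha/2}.$$

The confidence interval is

$$CI = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

or

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

or

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$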

The conditions to obtain a CI from the sample mean (statistic) $\bar{x}$ with sample size $n$ are

1.) The sample is a simple random sample.

2.) The sample size is less than 10% of the population.

3.) $\sigma$ is known and $n \ge 30$, the condition for using the normal distribution.

Example 4.5.1. The President of a large university wishes to estimate the average age of the students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 50 students is selected randomly, and the sample mean is found to be 23.2 years. Find the 95% confidence interval of the school’s population mean.

Solution:

$\sigma = 2$, $n = 50$, $\bar{x} = 23.2$, $\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$, $CI = ?$

$$23.2 - 1.96\frac{2}{\sqrt{50}} \le \mu \le 23.2 + 1.96\frac{2}{\sqrt{50}}$$

or

$$CI = (22.6,\ 23.8)$$

That is, the President can say with 95% confidence that the average age of students is

between 22.6 and 23.8 years old.
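The interval in Example 4.5.1 can be reproduced with a short function. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for the critical value; the function name `mean_ci_known_sigma` is hypothetical, and the sketch presumes the conditions listed above have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """CI for a population mean when sigma is known: x_bar -/+ z * sigma/sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Numbers from Example 4.5.1: x_bar = 23.2, sigma = 2, n = 50, 95% confidence.
low, high = mean_ci_known_sigma(23.2, 2, 50, 0.95)
print(round(low, 1), round(high, 1))
```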

Example 4.5.2. From the last example, the President would like to be 99% confident that

the estimate of average age should be accurate within 1 year when the standard deviation

of the ages is 3 years. How large a sample is necessary?

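Solution:

$\sigma = 3$, $\alpha = 0.01$, $z_{\alpha/2} = z_{0.005} = 2.58$, $n = ?$

$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le 1 \ \Rightarrow\ \frac{2.58 \times 3}{\sqrt{n}} \le 1 \ \Rightarrow\ n \ge (2.58 \times 3)^2 \approx 59.9 \ \Rightarrow\ n \ge 60$$

The sample size needs to be at least 60.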
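The same rearrangement gives a general sample-size rule for estimating a mean. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for the critical value; the function name `sample_size_for_mean` is hypothetical, and the result is rounded up to the next whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # n >= (z * sigma / margin)^2, rounded up to a whole observation.
    return ceil((z * sigma / margin) ** 2)


# Numbers from Example 4.5.2: sigma = 3 years, margin 1 year, 99% confidence.
print(sample_size_for_mean(3, 1, 0.99))
```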

4.6. The Mean Estimator for t-Distribution

For a random variable $X \sim T(n-1)$, where $T(n-1)$ is the t-distribution with $k = n - 1$ degrees of freedom for a sample of size $n$, the mean is $E[X] = 0$, the variance is $Var[X] = \frac{k}{k-2}$ for $k > 2$, the skewness is 0 for $k > 3$, and the excess kurtosis is $\frac{6}{k-4}$ for $k > 4$.

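When the variance $\sigma^2$ is unknown for $X_1, X_2, \ldots, X_n$, the unbiased variance estimator is:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

That is, $E[s^2] = S^2$, where $S^2$ denotes the population variance, and $s_{\bar{x}}^2$, the estimate of the squared standard error of $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$, is

$$s_{\bar{x}}^2 = \frac{N-n}{N-1} \cdot \frac{s^2}{n}$$

When the 10% rule is satisfied, the standard error of $\bar{x}$ is

$$s_{\bar{x}} \approx \frac{s}{\sqrt{n}}$$

Note that the sample standard deviation estimator

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

is not an unbiased estimator of the population standard deviation $S$.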

When the sample size is less than 30 and/or $\sigma$ is unknown, the mean estimator $\bar{X}$ can be studentized, in analogy with the normal case, using the t-distribution

$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim T(n-1),$$

and the confidence level is determined by

$$\left|\frac{\bar{x} - \mu}{s/\sqrt{n}}\right| \le t_{\alpha/2}^{\,k=n-1}.$$

The confidence interval is

$$CI = \left(\bar{x} - t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}}\right)$$

or

$$\bar{x} - t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}}$$

or

$$\bar{x} \pm t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}}$$

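Example 4.6.1. A sample of 28 THS teachers travels an average (mean) of 14.3 miles to school. The standard deviation of their travel distance was 2 miles. Find the 95% confidence interval of the true mean, that is, the population mean.

Solution:

$n = 28$, $s = 2$, $\bar{x} = 14.3$, $\alpha = 0.05$, $k = 28 - 1 = 27$, $t_{\alpha/2}^{\,27} = t_{0.025}^{\,27} = 2.05$, $CI = ?$

$$14.3 - 2.05\frac{2}{\sqrt{28}} \le \mu \le 14.3 + 2.05\frac{2}{\sqrt{28}}$$

$$13.5 \le \mu \le 15.1$$

or

$$CI = (13.5,\ 15.1)$$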
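The t-interval in Example 4.6.1 can be sketched in code as well. Python's standard library does not provide an inverse t-distribution, so this sketch assumes the critical value is looked up separately (for example with invT or a t-table) and passed in; the function name `mean_ci_t` and its parameter `t_crit` are hypothetical.

```python
from math import sqrt


def mean_ci_t(x_bar, s, n, t_crit):
    """CI for a population mean with unknown sigma: x_bar -/+ t_crit * s/sqrt(n).

    t_crit is the positive two-tailed critical value for df = n - 1,
    supplied by the caller (e.g. from invT or a t-table).
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Numbers from Example 4.6.1: x_bar = 14.3, s = 2, n = 28, t = 2.05 (df = 27).
low, high = mean_ci_t(14.3, 2, 28, 2.05)
print(round(low, 1), round(high, 1))
```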

4.7. Compare the Difference of Two Means with Unequal Known Variances and Large Sample Size

It is much more common to compare the difference of two means or proportions than to estimate the means or proportions themselves.

For two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, when $n_1, n_2 \ge 30$ and the variances $\sigma_1^2$ and $\sigma_2^2$ are known and unequal, the difference of the two sample means $\bar{x}_1 - \bar{x}_2$, used to estimate the difference of population means $\mu_1 - \mu_2$, has the distribution $N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$. The confidence interval is

$$CI = \left((\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

or

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Example 4.7.1. A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. A sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 ml. A sample of 15 normal individuals of the same age and sex were found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95 percent confidence interval for $\mu_1 - \mu_2$.

Solution:

$n_1 = 12$, $\bar{x}_1 = 4.5$, $\sigma_1^2 = 1$

$n_2 = 15$, $\bar{x}_2 = 3.4$, $\sigma_2^2 = 1.5$

$$\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$$

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1}{12} + \frac{1.5}{15}} = 0.4282$$

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.1 \pm 1.96(0.4282) = 1.1 \pm 0.8393$$

or $CI = (0.26,\ 1.94)$

That is, the point estimate of the difference between the two population means is 1.1, and we are 95% confident that the true difference between the means lies between 0.26 and 1.94.
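The two-sample calculation in Example 4.7.1 can be sketched as follows. This is an illustration under stated assumptions: it uses Python's standard-library `statistics.NormalDist` for the critical value, takes the known population variances as inputs, and the function name `two_mean_z_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_ci(x1_bar, x2_bar, var1, var2, n1, n2, confidence=0.95):
    """CI for mu1 - mu2 with known variances var1 and var2."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = sqrt(var1 / n1 + var2 / n2)  # sd of x1_bar - x2_bar
    diff = x1_bar - x2_bar
    return diff - z * std_err, diff + z * std_err


# Numbers from Example 4.7.1.
low, high = two_mean_z_ci(4.5, 3.4, 1.0, 1.5, 12, 15, 0.95)
print(round(low, 2), round(high, 2))
```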

4.8. Compare the Difference of Two Means with Small Sample Size and/or with Unknown Equal Variances

For sample sizes $n_1, n_2 < 30$ and/or with unknown equal variances, the pooled standard deviation is used for estimating the difference of population means $\mu_1 - \mu_2$ from the difference of the two sample means $\bar{x}_1 - \bar{x}_2$:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

and

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2).$$

The confidence interval is

$$CI = \left((\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}^{\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}^{\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$$

or

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}^{\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Example 4.8.1. An experiment was done to compare the mean number of tapeworms in the stomachs of sheep that had been treated for worms versus those not treated. There were 7 sheep in the treatment group and 7 in the control group. The means and variances are

| | Treatment | Control |
|---|---|---|
| $\bar{x}$ | 28.57 | 40.0 |
| $s^2$ | 198.62 | 215.33 |
| $n$ | 7 | 7 |

What is the confidence interval for the difference of the two means at significance level $\alpha = 0.1$?

Solution:

The sample size is small and it is reasonable to assume the variances are equal. So the pooled estimate will be used.

$\bar{x}_T = 28.57$, $\bar{x}_C = 40.0$, $s_T^2 = 198.62$, $s_C^2 = 215.33$, $n_1 = 7$, $n_2 = 7$, $\alpha = 0.1$,

$$t_{\alpha/2}^{\,n_1+n_2-2} = t_{0.05}^{\,7+7-2} = t_{0.05}^{\,12} = 1.782$$

The pooled standard deviation is

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(7-1)(198.62) + (7-1)(215.33)}{7 + 7 - 2}} = 14.387$$

$$(\bar{x}_T - \bar{x}_C) \pm t_{\alpha/2}^{\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (28.57 - 40.00) \pm 1.782(14.387)\sqrt{\frac{1}{7} + \frac{1}{7}} = -11.43 \pm 13.70$$

or $CI = (-25.13,\ 2.27)$
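The pooled calculation in Example 4.8.1 can be sketched as follows. Python's standard library does not provide an inverse t-distribution, so this sketch assumes the critical value (1.782 for 12 degrees of freedom, as in the worked solution) is supplied by the caller; the function name `pooled_t_ci` and its parameter names are hypothetical.

```python
from math import sqrt


def pooled_t_ci(x1_bar, x2_bar, var1, var2, n1, n2, t_crit):
    """CI for mu1 - mu2 using the pooled standard deviation.

    var1 and var2 are the sample variances s1^2 and s2^2; t_crit is the
    positive two-tailed critical value for df = n1 + n2 - 2.
    """
    s_p = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    margin = t_crit * s_p * sqrt(1 / n1 + 1 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin


# Numbers from Example 4.8.1 (treatment minus control), alpha = 0.1.
low, high = pooled_t_ci(28.57, 40.0, 198.62, 215.33, 7, 7, 1.782)
print(round(low, 2), round(high, 2))
```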

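4.9. Compare the Difference of Two Proportions

For two binomial distributions $b(n_1, p_1, x_1)$ and $b(n_2, p_2, x_2)$, when $n_1$ and $n_2$ are less than 10% of their populations and $n_1\hat{p}_1,\ n_1(1-\hat{p}_1),\ n_2\hat{p}_2,\ n_2(1-\hat{p}_2) \ge 10$, the difference of the two sample proportions $\hat{p}_1 - \hat{p}_2$, used to estimate the difference of population proportions $p_1 - p_2$, approximately has the distribution $N\left(p_1 - p_2,\ \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}\right)$, with the variance estimated from the samples. The confidence interval is

$$CI = \left((\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},\ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right)$$

or

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$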

Example 4.9.1. Independent random samples of 100 luxury cars and 250 non-luxury cars in a certain city are examined to see if they have bumper stickers. Of the 250 non-luxury cars, 125 have bumper stickers and of the 100 luxury cars, 30 have bumper stickers. What is a 90% confidence interval for the difference in the proportion of non-luxury cars with bumper stickers and the proportion of luxury cars with bumper stickers for the population of cars represented by these samples?

Solution:

$n_1 = 250$, $\hat{p}_1 = \frac{125}{250} = 0.5$, $n_2 = 100$, $\hat{p}_2 = \frac{30}{100} = 0.3$, $\alpha = 0.1$

Assume that $n_1$ and $n_2$ are less than 10% of the total numbers of non-luxury cars and luxury cars, respectively. $z_{\alpha/2} = z_{0.05} = 1.645$, and $n_1\hat{p}_1,\ n_1(1-\hat{p}_1),\ n_2\hat{p}_2,\ n_2(1-\hat{p}_2) \ge 10$.

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (0.5 - 0.3) \pm 1.645\sqrt{\frac{0.5(1-0.5)}{250} + \frac{0.3(1-0.3)}{100}} = 0.2 \pm 0.092$$

or $CI = (0.108,\ 0.292)$
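The two-proportion interval in Example 4.9.1 can be sketched as follows. This is an illustration under stated assumptions: it uses Python's standard-library `statistics.NormalDist` for the critical value, the function name `two_proportion_ci` is hypothetical, and the 10% and large-count conditions are taken as already checked.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.90):
    """Normal-approximation CI for p1 - p2 from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * std_err, diff + z * std_err


# Numbers from Example 4.9.1: 125 of 250 non-luxury, 30 of 100 luxury cars.
low, high = two_proportion_ci(125, 250, 30, 100, 0.90)
print(round(low, 3), round(high, 3))
```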

4.10. Summary and Examples

Case #1: use the sample proportion to estimate the population proportion

Given: the significance level $\alpha$, the sample size $n$, the number of favorable outcomes $x$

Distribution: $b(n, p, x)$, assume the sample proportion follows the binomial distribution.

Sample Statistic: $\hat{p} = \frac{x}{n}$

Population Parameter: proportion $p$

Confidence Interval (CI): $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Conditions: $n < 10\%$ of population, $n\hat{p} \ge 10$, $n(1-\hat{p}) \ge 10$

Case #2: use the sample mean to estimate the population mean

Given: the significance level $\alpha$, the sample size $n$, the sample mean $\bar{x}$

Distribution: $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, assume the sample mean follows the normal distribution.

Sample Statistic: $\bar{x}$

Population Parameter: mean $\mu$

Confidence Interval (CI): $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Conditions: $n \ge 30$, $\sigma$ is known

Case #3: use the sample mean to estimate the population mean

Given: the significance level $\alpha$, the sample size $n$, the sample mean $\bar{x}$

Distribution: $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim T(n-1)$, assume $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t-distribution with $df = n - 1$.

Sample Statistic: $\bar{x}$

Population Parameter: mean $\mu$

Confidence Interval (CI): $\bar{x} \pm t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}}$

Conditions: $n < 30$, and/or $\sigma$ is unknown

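| Distribution | Sample Statistic | Confidence Interval | Comments |
|---|---|---|---|
| $b(n, p, x)$ | $\hat{p}$ | $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ | 1.) $n < 10\%$ of pop; 2.) $n\hat{p} \ge 10$; 3.) $n(1-\hat{p}) \ge 10$ |
| $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ | $\bar{x}$ | $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ | 1.) $\sigma$ is known; 2.) $n \ge 30$ |
| $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim T(n-1)$ | $\bar{x}$ | $\bar{x} \pm t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}}$ | 1.) $k = n - 1$ deg. of freedom; 2.) $5 \le n < 30$ or 3.) $\sigma$ is unknown |
| $N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ | $\bar{x}_1 - \bar{x}_2$ | $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | 1.) $\sigma_1, \sigma_2$ are known and unequal; 2.) $n_1, n_2 \ge 30$ |
| $\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2)$ | $\bar{x}_1 - \bar{x}_2$ | $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}^{\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, and $s_1, s_2$ are the sample standard deviations | 1.) $k = n_1 + n_2 - 2$ is deg. of freedom; 2.) $5 \le n_1, n_2 < 30$ and/or $\sigma_1, \sigma_2$ are equal; 3.) $\sigma_1, \sigma_2$ are unknown and equal |
| $b(n, p, x)$ | $\hat{p}_1 - \hat{p}_2$ | $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ | 1.) $n_1, n_2 < 10\%$ of pop; 2.) $n_1\hat{p}_1, n_2\hat{p}_2 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2(1-\hat{p}_2) \ge 10$ |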

Example 4.10.1. I want to construct a 99% confidence interval for the proportion of Americans who think that the government has placed too many regulations on businesses, and I want a margin of error of no more than 3%. Assume the population proportion is 0.5. How large of a sample will this require?

Solution:

$\alpha = 0.01$, $p = 0.5$, $z_{\alpha/2} = z_{0.005} = 2.576$, $z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le 3\%$, $n = ?$

$$2.576\sqrt{\frac{0.5(0.5)}{n}} \le 3\% \ \Rightarrow\ n \ge \left(\frac{2.576(0.5)}{0.03}\right)^2 \approx 1843.3 \ \Rightarrow\ n \ge 1844$$

Example 4.10.2. A study of 5302 people aged 60 or older in the US found 124 with rheumatoid arthritis. Construct a 90% confidence interval for the actual proportion of all people aged 60 and older who have rheumatoid arthritis.

Solution:

$\alpha = 0.1$, $\hat{p} = \frac{124}{5302} = 0.0234$, $z_{\alpha/2} = z_{0.05} = 1.6449$, $n = 5302$, $CI = ?$

$$0.0234 \pm 1.6449\sqrt{\frac{0.0234(1 - 0.0234)}{5302}} = 0.0234 \pm 1.6449(0.00208) = 0.0234 \pm 0.0034$$

or

$$CI = (0.0200,\ 0.0268)$$

Example 4.10.3. [AP practice question, College Board] A large company is considering opening a franchise in St. Louis and wants to estimate the mean household income for the area using a simple random sample of the households. Based on information from a pilot study, the company assumes that the standard deviation of household incomes is $\sigma = \$7{,}200$. What is the least number of households that should be surveyed to obtain an estimate that is within \$200 of the true mean household income with 95 percent confidence?

Solution:

The standard deviation is known. $\sigma = 7{,}200$, $\alpha = 0.05$, $z_{\alpha/2} = 1.96$, $z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le 200$

$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{200}\right)^2 = \left(\frac{1.96 \times 7200}{200}\right)^2 \approx 4978.7 \ \Rightarrow\ n \ge 4979$$

Example 4.10.4. [AP practice question, College Board] Courtney has constructed a

cricket out of paper and rubber bands. According to the instructions for making the

cricket, when it jumps it will land on its feet half of the time and on its back the other half

of the time. In the 50 jumps, Courtney’s cricket landed on its feet 35 times. In the next 10

jumps, it landed on its feet only twice. Based on this experience, Courtney can conclude

that

(A) the cricket was due to land on its feet less than half the time during the final 10 jumps,

since it had landed too often on its feet during the first 50 jumps.

(B) a confidence interval for estimating the cricket’s true probability of landing on its feet

is wider after the final 10 jumps than it was before the final 10 jumps.

(C) a confidence interval for estimating the cricket’s true probability of landing on its feet

after the final 10 jumps is exactly the same as it was before the final 10 jumps.

(D) a confidence interval for estimating the cricket’s true probability of landing on its feet

is more narrow after the final 10 jumps than it was before the final 10 jumps.

(E) a confidence interval for estimating the cricket’s true probability of landing on its feet

based on the initial 50 jumps does not include 0.2, so there must be a defect in the

cricket’s construction to account for the poor showing in the final 10 jumps.

Solution:

The answer is D. The proportion is assumed to be $p = 0.5$. The error term (half-width) of the confidence interval is calculated by $z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$. So, as the sample size increases, the error term decreases. That is, the CI narrows as the sample size increases.

Example 4.10.5. [2015 AP Stats FRQ, #2] To increase business, the owner of a restaurant

is running a promotion in which a customer’s bill can be randomly selected to receive a

discount. When a customer’s bill is printed, a program in the cash register randomly

determines whether the customer will receive a discount on the bill. The program was

written to generate a discount with a probability of 0.2, that is, giving 20 percent of the

bills a discount in the long run. However, the owner is concerned that the program has a

mistake that results in the program not generating the intended long-run proportion of 0.2.

The owner selected a random sample of bills and found that only 15 percent of them

received discounts. A confidence interval for p, the proportion of bills that will receive a

discount in the long run, is $0.15 \pm 0.06$. All conditions for inference were met.

a.) Consider the confidence interval $0.15 \pm 0.06$.

i. Does the confidence interval provide convincing statistical evidence that the

program is not working as intended? Justify your answer.

ii. Does the confidence interval provide convincing statistical evidence that the

program generates the discount with a probability of 0.2? Justify your answer.

A second random sample of bills was taken that was four times the size of the original

sample. In the second sample, 15 percent of the bills received the discount.

b.) Determine the value of the margin of error based on the second sample of bills that

would be used to compute an interval for p with the same confidence level as that of the

original interval.

c) Based on the margin of error in part (b) that was obtained from the second sample,

what do you conclude about whether the program is working as intended? Justify your

answer.

Solution:

a.)

i. No. The assumed proportion is 0.2, and it is within the CI. So, there is no

statistical evidence to claim that the program is not working.

ii. No. Any number within CI could be the probability.

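b.) The margin of error is proportional to $1/\sqrt{n}$, so a sample four times as large halves it: $\frac{0.06}{\sqrt{4}} = 0.03$.

c.) Now the CI is $0.15 \pm 0.03$, that is, $(0.12,\ 0.18)$, so 0.2 is not within the CI. So, there is convincing evidence that the program is not working as intended.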

Example 4.10.6. [2011 AP Stats FRQ, #6] Every year, each student in a nationally

representative sample is given tests in various subjects. Recently, a random sample of

9,600 12th-grade students from US were administered a multiple-choice US history exam.

One of the multiple-choice questions is below. (The correct answer is C.)

Of the 9,600 students, 28 percent answered the multiple-choice question correctly.

a.). Let p be the proportion of all United States twelfth-grade students who would answer

the question correctly. Construct and interpret a 99 percent confidence interval for p.

Assume that students who actually know the correct answer have a 100 percent chance of

answering the question correctly, and students who do not know the correct answer to the

question guess completely at random from among the four options. Let k represent the

proportion of all United States twelfth-grade students who actually know the correct

answer to the question.

b.) A tree diagram of the possible outcomes for a randomly selected twelfth-grade student

is provided below. Write the correct probability in each of the five empty boxes. Some of

the probabilities may be expressions in terms of k.

c.) Based on the completed tree diagram, express the probability, in terms of k, that a

randomly selected twelfth-grade student would correctly answer the history question.

d.) Using your interval from part (a) and your answer to part (c), calculate and interpret a

99 percent confidence interval for k, the proportion of all United States twelfth-grade

students who actually know the answer to the history question. You may assume that the

conditions for inference for the confidence interval have been checked and verified.

Solution:

a.) $\hat{p} = 0.280$, $n\hat{p} = 9600(0.280) = 2688 \ge 10$, $n(1-\hat{p}) = 9600(1 - 0.280) = 6912 \ge 10$, $\alpha = 0.01$, $z_{\alpha/2} = z_{0.005} = 2.576$, and the standard error is $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.280 \pm 2.576\sqrt{\frac{0.280(1 - 0.280)}{9600}} = 0.28 \pm 0.012$$

$$CI = (0.268,\ 0.292)$$

We are 99 percent confident that the interval from 0.268 to 0.292 contains the population proportion of all United States twelfth-grade students who would answer this question correctly.

b.) [Completed tree diagram not reproduced.]

c.)

$$P(\text{Ans\_Corr}) = P(\text{Know\_Corr}) + P(\text{Guess\_Corr}) = k + 0.25(1 - k) = 0.25 + 0.75k$$

d.)

Since $p = 0.25 + 0.75k$, then

$$0.268 \le p \le 0.292$$

$$0.268 \le 0.25 + 0.75k \le 0.292$$

$$0.024 \le k \le 0.056$$

We are 99 percent confident that the interval from 0.024 to 0.056 contains the proportion

of all United States twelfth-grade students who actually know the answer to the history

question.
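The step from the interval for $p$ to the interval for $k$ is a linear rearrangement, $k = (p - 0.25)/0.75$, applied to both endpoints. The following is a minimal sketch of that step; the function name `interval_for_k` and its `guess_prob` parameter (the chance of a correct random guess among four options) are hypothetical names introduced for this illustration.

```python
def interval_for_k(p_low, p_high, guess_prob=0.25):
    """Map a CI for p onto a CI for k using p = guess_prob + (1 - guess_prob) * k.

    The map is increasing in p, so the endpoints keep their order.
    """
    def to_k(p):
        return (p - guess_prob) / (1 - guess_prob)

    return to_k(p_low), to_k(p_high)


# Interval for p from part (a): (0.268, 0.292).
k_low, k_high = interval_for_k(0.268, 0.292)
print(round(k_low, 3), round(k_high, 3))
```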

Example 4.10.7. [2013 AP Stats FRQ, #1] An environmental group conducted a study to

determine whether crows in a certain region were ingesting food containing unhealthy

levels of lead. A biologist classified lead levels greater than 6.0 parts per million (ppm) as

unhealthy. The lead levels of a random sample of 23 crows in the region were measured

and recorded. The data are shown in the stemplot below.

a.) What proportion of crows in the sample had lead levels that are classified by the

biologist as unhealthy?

b.) The mean lead level of the 23 crows in the sample was 4.90 ppm and the standard

deviation was 1.12 ppm. Construct and interpret a 95 percent confidence interval for the

mean lead level of crows in the region.

Solution:

a.) $\hat{p} = \frac{4}{23} = 0.174$

b.) $\bar{x} = 4.90$, $s = 1.12$, $n = 23$, $df = 23 - 1 = 22$, $t_{\alpha/2}^{\,df=22} = t_{0.025}^{\,22} = 2.074$

$$\bar{x} \pm t_{\alpha/2}^{\,n-1}\frac{s}{\sqrt{n}} = 4.90 \pm 2.074\frac{1.12}{\sqrt{23}} = 4.90 \pm 0.484$$

$$CI = (4.416,\ 5.384)$$

We can be 95% confident that the population mean lead level among all crows in this region is between 4.416 and 5.384 parts per million.

Quiz -- Confidence Interval