
9 Confidence intervals

Chapter 9, p. 218


Point estimation: the first method involves using the sample mean $\bar{X}$ to estimate the population mean $\mu$. Point estimation does not provide any information about the variability of the estimator; we do not know how close the sample mean is to the population mean.

Interval estimation: the second method of estimation gives a range of values for the population mean. This range is called a confidence interval (CI).

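9.1 Two-sided confidence intervals

Given a random variable X with mean $\mu$ and standard deviation $\sigma$, the central limit theorem states that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

has a standard normal distribution (SND) if X is itself normally distributed, and an approximate SND if it is not but n is sufficiently large.

For the SND, 95% of the observations lie between -1.96 and 1.96, that is,

$$P(-1.96 \le Z \le 1.96) = 0.95$$

Substituting for Z and rearranging the terms,

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

We are 95% confident that the interval

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

will cover $\mu$. Since the estimator $\bar{X}$ is a random variable, the interval is random and has a 95% chance of covering $\mu$.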
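As a minimal sketch of the 95% interval above, assuming the population standard deviation is known: the function name `ci_95_known_sigma` and the input numbers are made up for illustration and are not from the slides.

```python
from math import sqrt

Z_95 = 1.96  # cuts off 2.5% in each tail of the SND (value used in the slides)

def ci_95_known_sigma(sample_mean, sigma, n):
    """Two-sided 95% CI for the mean when sigma is known."""
    half_width = Z_95 * sigma / sqrt(n)  # 1.96 * standard error
    return sample_mean - half_width, sample_mean + half_width

# Illustrative (hypothetical) inputs: sample mean 50, sigma 10, n = 25.
lower, upper = ci_95_known_sigma(50.0, 10.0, 25)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval is centered at the sample mean, and its half-width depends only on sigma and n, not on the data.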


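A 99% CI: since $P(-2.58 \le Z \le 2.58) = 0.99$, the interval is

$$\left(\bar{X} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}\right)$$

Approximately 99 out of 100 CIs obtained from 100 independent random samples of size n drawn from this population would cover the population mean $\mu$. The 99% CI is longer than the 95% CI.

A generic CI: $z_{\alpha/2}$ and $-z_{\alpha/2}$ are the values that cut off an area of $\alpha/2$ in the upper and lower tails of the SND, respectively. Therefore the general form of a $100(1-\alpha)\%$ CI for $\mu$ is

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$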
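The critical value for a generic confidence level can be looked up numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `z_critical` is an assumption for illustration. Note that the slides round the 99% value to 2.58.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Value z_{alpha/2} cutting off alpha/2 in the upper tail of the SND."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: z = {z_critical(level):.3f}")
```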


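As the sample size n increases, the standard error $\sigma/\sqrt{n}$ decreases, which results in a narrower CI.

n       95% CI for $\mu$                Length of CI
10      $\bar{x} \pm 0.620\sigma$       $1.240\sigma$
100     $\bar{x} \pm 0.196\sigma$       $0.392\sigma$
1000    $\bar{x} \pm 0.062\sigma$       $0.124\sigma$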
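The lengths in the table can be reproduced with a few lines of arithmetic. This is a small sketch in which the length is expressed in units of sigma (the default `sigma=1.0`); the function name `ci_length_95` is hypothetical.

```python
from math import sqrt

def ci_length_95(n, sigma=1.0):
    """Length of the two-sided 95% CI: 2 * 1.96 * sigma / sqrt(n)."""
    return 2 * 1.96 * sigma / sqrt(n)

for n in (10, 100, 1000):
    print(f"n = {n:4d}  length = {ci_length_95(n):.3f} sigma")
```

Because the length shrinks like $1/\sqrt{n}$, a hundredfold increase in n is needed to shorten the interval by a factor of ten.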


Example: consider the distribution of serum cholesterol levels for all males in the US who are hypertensive and who smoke. This distribution is approximately normal with an unknown mean $\mu$ and standard deviation $\sigma$ = 46 mg/100 ml. The 95% CI for the population mean is

$$\left(\bar{X} - 1.96\frac{46}{\sqrt{n}},\ \bar{X} + 1.96\frac{46}{\sqrt{n}}\right)$$

Suppose a sample of n = 12 of these men has a mean serum cholesterol of $\bar{x}$ = 217 mg/100 ml. The 95% CI is then (191, 243).

The 95% CI (191, 243) has a frequency interpretation. Suppose the true mean serum cholesterol is $\mu$ = 211 mg/100 ml. If we were to draw 100 random samples of size 12 from this population and use each one to construct a 95% CI, we would expect that, on average, 95 of the intervals would cover the population mean $\mu$ = 211 and 5 would not. The sample mean $\bar{X}$ varies from sample to sample. Although the centers of the intervals differ, they all have the same length. In the accompanying figure, each CI that does not contain $\mu$ is marked by a dot.
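The frequency interpretation can be checked by simulation. The sketch below assumes an exactly normal population with mean 211 and standard deviation 46, and uses a fixed random seed so the run is repeatable; the function name `coverage` and the seed are illustrative choices, not part of the slides.

```python
import random
from math import sqrt

def coverage(mu=211.0, sigma=46.0, n=12, num_samples=100, seed=0):
    """Fraction of 95% CIs (known sigma) that cover mu in repeated sampling."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)  # same for every sample
    covered = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n  # center of this sample's interval
        if x_bar - half_width <= mu <= x_bar + half_width:
            covered += 1
    return covered / num_samples

print("100 samples:  ", coverage(num_samples=100))
print("20000 samples:", coverage(num_samples=20000))
```

With only 100 samples the observed fraction will wander around 0.95; with many more samples it settles close to the nominal level.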


The 99% CI is given by

$$\left(217 - 2.58\frac{46}{\sqrt{12}},\ 217 + 2.58\frac{46}{\sqrt{12}}\right) = (183, 251)$$

The length of the 99% CI is 251 - 183 = 68 mg/100 ml. How large a sample would we need to reduce the length of this interval to only 20 mg/100 ml? That is, we are interested in the sample size necessary to produce the interval (217 - 10, 217 + 10) = (207, 227). Setting the half-length equal to 10,

$$10 = 2.58\frac{46}{\sqrt{n}} \quad\Longrightarrow\quad \sqrt{n} = \frac{2.58 \times 46}{10} \quad\Longrightarrow\quad n = 140.8$$

We would require a sample of 141 men to reduce the length of the 99% CI to 20 mg/100 ml.
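The sample-size calculation amounts to solving the half-length equation for n and rounding up. A minimal sketch, with a hypothetical helper name `sample_size_for_half_width`:

```python
from math import ceil

def sample_size_for_half_width(z, sigma, half_width):
    """Smallest n for which z * sigma / sqrt(n) <= half_width."""
    return ceil((z * sigma / half_width) ** 2)

# Values from the cholesterol example: z = 2.58, sigma = 46, half-length = 10.
n_required = sample_size_for_half_width(2.58, 46.0, 10.0)
print(n_required)
```

Rounding up rather than to the nearest integer guarantees the interval is no longer than requested.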

Chapter 9, p. 220

9.2 One-sided confidence intervals

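Children who have lead poisoning tend to have much lower levels of hemoglobin than children who do not. Therefore we might be interested in finding only an upper bound for $\mu$, that is, a one-sided 95% CI for the population mean. Assume $\sigma$ = 0.85 mg/100 ml.

For the SND, 95% of the observations lie above z = -1.645, that is,

$$P(Z \ge -1.645) = 0.95$$

Substituting for Z and rearranging,

$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge -1.645\right) = 0.95$$

$$P\left(\mu \le \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

If a sample of n = 74 children has sample mean $\bar{x}$ = 10.6 mg/100 ml, then the upper bound is

$$10.6 + 1.645\frac{0.85}{\sqrt{74}} = 10.8 \text{ mg/100 ml}$$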
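The one-sided bound uses 1.645 in place of 1.96 and only adds to the sample mean. A short sketch with the numbers from the hemoglobin calculation (sample mean 10.6, sigma 0.85, n = 74); the function name `upper_bound_95` is an assumption for illustration.

```python
from math import sqrt

def upper_bound_95(sample_mean, sigma, n, z=1.645):
    """Upper limit of a one-sided 95% CI for the mean (sigma known)."""
    return sample_mean + z * sigma / sqrt(n)

bound = upper_bound_95(10.6, 0.85, 74)
print(f"upper bound: {bound:.1f}")
```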

Chapter 9, p. 222

9.3 Student’s t distribution

If the population standard deviation $\sigma$ is unknown, we can estimate it by the sample standard deviation s. The ratio

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

does not have a SND; instead, it follows Student's t distribution with n - 1 degrees of freedom (df). We denote this using the notation $t_{n-1}$. The t distribution is unimodal and symmetric around its mean 0.


The t distribution has thicker tails than the SND.

Smaller df: the distribution is more spread out. Larger df: the distribution approaches the SND, because as n increases, s becomes a more reliable estimate of $\sigma$.


The values of $t_{n-1}$ that cut off the upper 2.5% of the distribution for various df:

df (n-1)    $t_{n-1}$
1           12.706
2           4.303
5           2.571
10          2.228
20          2.086
30          2.042
40          2.021
60          2.000
120         1.980
∞           1.960

For the SND, z = 1.96 marks the upper 2.5% of the distribution. Observe that as n increases, $t_{n-1}$ approaches this value.
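The tabulated values can be approximated numerically from the t density. The sketch below integrates the density with Simpson's rule and finds the cutoff by bisection; this is one straightforward numerical approach chosen for illustration, not necessarily how the table was produced, and the helper names (`t_pdf`, `upper_tail`, `t_critical`) are hypothetical. It uses only the standard library and is meant for moderate df (up to a few hundred).

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(df, tail_area=0.025):
    """Value cutting off tail_area in the upper tail, found by bisection."""
    lo, hi = 0.0, 100.0  # assumes the cutoff lies below 100
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid  # too much area above mid: move right
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 5, 10, 30, 120):
    print(f"df = {df:3d}  t = {t_critical(df):.3f}")
```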

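Example: a random sample of n = 10 is drawn from the population of infants receiving antacids that contain aluminum. These antacids are often used to treat peptic or digestive disorders. The distribution of plasma aluminum levels is known to be approximately normal. A 95% CI for $\mu$ is

$$\left(\bar{X} - 2.262\frac{s}{\sqrt{10}},\ \bar{X} + 2.262\frac{s}{\sqrt{10}}\right) = \left(37.2 - 2.262\frac{7.13}{\sqrt{10}},\ 37.2 + 2.262\frac{7.13}{\sqrt{10}}\right) = (32.1, 42.3)$$

If $\sigma$ were known to be 7.13 mg/L, the 95% CI for $\mu$ would have been

$$\left(37.2 - 1.96\frac{7.13}{\sqrt{10}},\ 37.2 + 1.96\frac{7.13}{\sqrt{10}}\right) = (32.8, 41.6)$$

Most of the time,

$$(\text{CI})_{t_{n-1}} > (\text{CI})_{\text{SND}}$$

that is, the interval based on the t distribution is longer. However, because of sampling variability, it is possible that s < $\sigma$ for a given sample.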
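The two intervals in this example differ only in the critical value. A brief sketch of the comparison, using the numbers from the slide (center 37.2, standard deviation 7.13, n = 10, t cutoff 2.262 for 9 df); the function name `interval` is made up for illustration.

```python
from math import sqrt

def interval(center, critical_value, sd, n):
    """CI of the form center +/- critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / sqrt(n)
    return center - half_width, center + half_width

t_ci = interval(37.2, 2.262, 7.13, 10)  # sigma unknown: t with 9 df
z_ci = interval(37.2, 1.96, 7.13, 10)   # sigma treated as known: SND

print("t-based CI:", tuple(round(v, 1) for v in t_ci))
print("z-based CI:", tuple(round(v, 1) for v in z_ci))
print("t-based interval is longer:", (t_ci[1] - t_ci[0]) > (z_ci[1] - z_ci[0]))
```

Here the same value 7.13 is used for both s and sigma, so the t-based interval is longer purely because 2.262 > 1.96.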

Chapter 9, p. 224

9.3 Student’s t distribution

LHS of Fig. 9.3 ($\sigma$ = 46 mg/100 ml) = Fig. 9.1

RHS of Fig. 9.3 ($\sigma$ is unknown) shows 100 additional intervals using the same samples. Again, 95 of the CIs contain $\mu$, and the other 5 do not. Note that this time, the intervals vary in length.

Chapter 9, p. 226


9.4 Further applications

Chapter 9, p. 227
