9 Confidence intervals Chapter9 p218 Confidence intervals Point estimation The first method...
gordon-roger-conley
Chapter9 p218
Confidence intervals
Point estimation: the first method uses the sample mean $\bar{X}$ to estimate the population mean $\mu$. A point estimate provides no information about the variability of the estimator, so we do not know how close $\bar{X}$ is to $\mu$.
Interval estimation: the second method of estimation gives a range of values that is intended to contain $\mu$ with a stated degree of confidence. This range of values is called a confidence interval (CI).
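9.1 Two-sided confidence intervals
Given a random variable $X$ with mean $\mu$ and standard deviation $\sigma$, the central limit theorem states that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

has a standard normal distribution (SND) if $X$ is itself normally distributed, and an approximately standard normal distribution if it is not but $n$ is sufficiently large.
For the SND, 95% of the observations lie between -1.96 and 1.96, that is, $P(-1.96 \le Z \le 1.96) = 0.95$.
Substituting for $Z$ and rearranging the terms gives

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

We are 95% confident that the interval

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

will cover $\mu$. Since the estimator $\bar{X}$ is a random variable, the interval is random; before the sample is drawn, it has a 95% chance of covering $\mu$.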
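As a numerical check of the 0.95 probability used above, the sketch below evaluates the SND with Python's standard-library `statistics.NormalDist` (the choice of tool is this sketch's assumption, not part of the notes). It also inverts the distribution to recover the 1.96 cutoff.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, sd 1

# Probability that Z falls between -1.96 and 1.96
coverage = Z.cdf(1.96) - Z.cdf(-1.96)

# Inverse of the SND: the value that cuts off 2.5% in the upper tail
z_upper = Z.inv_cdf(0.975)

print(f"P(-1.96 <= Z <= 1.96) = {coverage:.4f}")
print(f"z cutting off the upper 2.5%: {z_upper:.4f}")
```

The exact cutoff is about 1.95996, which the notes round to 1.96.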
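A 99% CI uses $P(-2.58 \le Z \le 2.58) = 0.99$, which gives

$$\left(\bar{X} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}\right).$$

Approximately 99 out of 100 CIs obtained from 100 independent random samples of size $n$ drawn from this population would cover the population mean $\mu$. The 99% CI is wider than the 95% CI.
For a generic CI, $z_{\alpha/2}$ and $-z_{\alpha/2}$ are the values that cut off an area of $\alpha/2$ in the upper and lower tails of the SND, respectively. Therefore the general form of a $100\%(1-\alpha)$ CI for $\mu$ is

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$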
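The general formula can be sketched as a small function. The name `z_interval` and the example inputs are hypothetical; `NormalDist().inv_cdf` from Python's standard library supplies $z_{\alpha/2}$, which for $\alpha = 0.01$ is about 2.576 (the notes round it to 2.58).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100%(1 - alpha) CI for mu when sigma is known."""
    # z_{alpha/2}: the SND value with upper-tail area alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, chosen only to compare the two confidence levels
lo95, hi95 = z_interval(0.0, 1.0, 25)
lo99, hi99 = z_interval(0.0, 1.0, 25, alpha=0.01)
print(f"95% CI length: {hi95 - lo95:.3f}")
print(f"99% CI length: {hi99 - lo99:.3f}")
```

With the same data, the 99% interval comes out wider than the 95% interval, as the text states.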
As the sample size $n$ increases, the standard error $\sigma/\sqrt{n}$ decreases, which results in a narrower CI.
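95% CIs for $\mu$ and their lengths for various sample sizes (the values correspond to $\sigma = 1$):

| n    | 95% CI for $\mu$    | Length of CI |
|------|---------------------|--------------|
| 10   | $\bar{x} \pm 0.620$ | 1.240        |
| 100  | $\bar{x} \pm 0.196$ | 0.392        |
| 1000 | $\bar{x} \pm 0.062$ | 0.124        |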
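The table's entries can be reproduced from the length formula $2 \times 1.96\,\sigma/\sqrt{n}$, assuming $\sigma = 1$ (the value the numbers correspond to). `ci_lengths` is a hypothetical helper name.

```python
from math import sqrt

def ci_lengths(sizes, sigma=1.0, z=1.96):
    """Length 2 * z * sigma / sqrt(n) of a 95% CI for each sample size n."""
    return {n: 2 * z * sigma / sqrt(n) for n in sizes}

lengths = ci_lengths((10, 100, 1000))
for n, length in lengths.items():
    print(f"n = {n:4d}: x-bar +/- {length / 2:.3f}, length {length:.3f}")
```

Each tenfold increase in $n$ shrinks the length by a factor of $\sqrt{10} \approx 3.16$.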
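Example: consider the distribution of serum cholesterol levels for all males in the US who are hypertensive and who smoke. This distribution is approximately normal with an unknown mean $\mu$ and standard deviation $\sigma = 46$ mg/100 ml. The 95% CI for the population mean is

$$\left(\bar{X} - 1.96\frac{46}{\sqrt{n}},\ \bar{X} + 1.96\frac{46}{\sqrt{n}}\right).$$

Suppose a sample of $n = 12$ such men has a mean serum cholesterol level of $\bar{x} = 217$ mg/100 ml. The 95% CI is then

$$\left(217 - 1.96\frac{46}{\sqrt{12}},\ 217 + 1.96\frac{46}{\sqrt{12}}\right) = (191, 243).$$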
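A sketch of the arithmetic for this example, plugging in the values from the text:

```python
from math import sqrt

sigma = 46.0   # known population sd, mg/100 ml
n = 12         # sample size
xbar = 217.0   # sample mean, mg/100 ml

half_width = 1.96 * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI: ({lower:.0f}, {upper:.0f})")  # 95% CI: (191, 243)
```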
This CI has a frequency interpretation. Suppose the true mean serum cholesterol level is $\mu = 211$ mg/100 ml. If we were to draw 100 random samples of size 12 from this population and use each one to construct a 95% CI, we would expect, on average, 95 of the intervals to cover $\mu = 211$ and 5 not to. Because $\bar{X}$ varies from sample to sample, the centers of the intervals differ, but since $\sigma$ is known, all the intervals have the same length. Each of the CIs that does not contain $\mu$ is marked by a dot (Fig. 9.1).
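The frequency interpretation can be illustrated by simulation. This sketch assumes normally distributed data with the hypothetical true mean $\mu = 211$ from the text, draws samples of size 12 with Python's `random` module (the seed is arbitrary), and counts how many 95% intervals cover $\mu$; the exact count varies with the seed.

```python
import random
from math import sqrt

MU, SIGMA, N = 211.0, 46.0, 12  # assumed true mean, known sd, sample size
Z = 1.96                        # SND value for a 95% CI

def simulate_intervals(num_samples, seed=1):
    """Draw num_samples samples of size N from N(MU, SIGMA^2); return their 95% CIs."""
    rng = random.Random(seed)
    half_width = Z * SIGMA / sqrt(N)  # identical for every sample because sigma is known
    intervals = []
    for _ in range(num_samples):
        xbar = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
        intervals.append((xbar - half_width, xbar + half_width))
    return intervals

intervals = simulate_intervals(100)
covered = sum(lo <= MU <= hi for lo, hi in intervals)
print(f"{covered} of 100 intervals cover mu = {MU}")
```

Over many more repetitions the coverage proportion settles near 0.95, while every interval keeps the same length.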
The 99% CI is given by

$$\left(217 - 2.58\frac{46}{\sqrt{12}},\ 217 + 2.58\frac{46}{\sqrt{12}}\right) = (183, 251).$$

The length of the 99% CI is 251 - 183 = 68 mg/100 ml. How large a sample would we need to reduce the length of this interval to only 20 mg/100 ml? That is, we are interested in the sample size necessary to produce the interval (217 - 10, 217 + 10) = (207, 227). Setting the half-width equal to 10,

$$10 = 2.58\frac{46}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{2.58 \times 46}{10}\right)^2 = 140.8.$$

Rounding up, we would require a sample of 141 men to reduce the length of the 99% CI to 20 mg/100 ml.
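The same calculation in code, assuming the notes' rounded value $z = 2.58$; `required_n` is a hypothetical helper that rounds up because a sample size must be a whole number.

```python
from math import ceil, sqrt

def required_n(z, sigma, half_width):
    """Smallest n for which z * sigma / sqrt(n) <= half_width."""
    return ceil((z * sigma / half_width) ** 2)

# 99% CI from the example: x-bar = 217, sigma = 46, n = 12
half = 2.58 * 46 / sqrt(12)
print(f"99% CI: ({217 - half:.0f}, {217 + half:.0f})")  # 99% CI: (183, 251)

# Sample size for a half-width of 10 (total length 20 mg/100 ml)
print(required_n(2.58, 46, 10))  # 141
```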
Chapter9 p220
9.2 One-sided confidence intervals
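Children who have lead poisoning tend to have much lower levels of hemoglobin than children who do not. Therefore we might be interested in finding an upper bound for $\mu$, the mean hemoglobin level of such children. For the SND, 95% of the observations lie above $z = -1.645$, that is, $P(Z \ge -1.645) = 0.95$. Assume $\sigma = 0.85$ g/100 ml. Substituting for $Z$,

$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge -1.645\right) = 0.95 \quad\Longrightarrow\quad P\left(\mu \le \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

If a sample of $n = 74$ children has mean $\bar{x} = 10.6$ g/100 ml, then the 95% upper confidence bound is

$$10.6 + 1.645\frac{0.85}{\sqrt{74}} = 10.8 \text{ g/100 ml}.$$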
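A sketch of the one-sided bound, assuming the values from this example ($\bar{x} = 10.6$, $\sigma = 0.85$, $n = 74$, $z = 1.645$); `upper_bound` is a hypothetical name.

```python
from math import sqrt

def upper_bound(xbar, sigma, n, z=1.645):
    """One-sided 95% upper confidence bound for mu when sigma is known."""
    return xbar + z * sigma / sqrt(n)

ub = upper_bound(10.6, 0.85, 74)
print(f"95% upper bound for mu: {ub:.1f}")  # 95% upper bound for mu: 10.8
```

All of the 5% error probability sits in one tail, which is why 1.645 replaces the two-sided 1.96.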
Chapter9 p222
9.3 Student’s t distribution
If the population standard deviation $\sigma$ is unknown, we can estimate it by the sample standard deviation $s$. The ratio

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

does not have an SND; instead, when $X$ is normally distributed, it follows Student's t distribution with $n - 1$ degrees of freedom (df). We denote this distribution by $t_{n-1}$. The t distribution is unimodal and symmetric around its mean 0.
The t distribution has thicker tails than the SND.
The smaller the df, the more spread out the distribution; as the df increase, it approaches the SND, because as $n$ increases, $s$ becomes a better estimate of $\sigma$.
The table below gives the values of $t_{n-1}$ that cut off the upper 2.5% of the distribution for various df. For the SND, $z = 1.96$ marks the upper 2.5% of the distribution; observe that as $n$ increases, $t_{n-1}$ approaches this value.
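| df ($n-1$) | $t_{n-1}$ |
|------------|-----------|
| 1          | 12.706    |
| 2          | 4.303     |
| 5          | 2.571     |
| 10         | 2.228     |
| 20         | 2.086     |
| 30         | 2.042     |
| 40         | 2.021     |
| 60         | 2.000     |
| 120        | 1.980     |
| ∞          | 1.960     |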
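The notes read these values from a table. As an illustration only, the sketch below recomputes several of them with the standard library by integrating the t density with Simpson's rule and searching for the cutoff by bisection; this numerical approach is this sketch's choice, and in practice one would normally call a statistics library (for example SciPy's `scipy.stats.t.ppf`). All function names here are hypothetical.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (symmetry about 0)."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, upper_tail=0.025, hi=100.0):
    """Value of t_df cutting off `upper_tail` in the upper tail, found by bisection."""
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 5, 10, 20, 30):
    print(f"df = {df:3d}: t = {t_critical(df):.3f}")
```

The printed values should agree with the table to three decimals and decrease toward 1.96 as df grows.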
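Example: consider the population of infants receiving antacids that contain aluminum. These antacids are often used to treat peptic or digestive disorders. The distribution of plasma aluminum levels is known to be approximately normal. A sample of $n = 10$ infants has mean $\bar{x} = 37.2$ μg/L and standard deviation $s = 7.13$ μg/L. Using $t_9 = 2.262$, the value that cuts off the upper 2.5% of the t distribution with 9 df, a 95% CI for $\mu$ is

$$\left(\bar{X} - 2.262\frac{s}{\sqrt{10}},\ \bar{X} + 2.262\frac{s}{\sqrt{10}}\right) = \left(37.2 - 2.262\frac{7.13}{\sqrt{10}},\ 37.2 + 2.262\frac{7.13}{\sqrt{10}}\right) = (32.1, 42.3).$$

If $\sigma$ were known to be 7.13 μg/L, the 95% CI for $\mu$ would have been

$$\left(37.2 - 1.96\frac{7.13}{\sqrt{10}},\ 37.2 + 1.96\frac{7.13}{\sqrt{10}}\right) = (32.8, 41.6).$$

Most of the time, the CI based on $t_{n-1}$ is longer than the CI based on the SND, because $t_{n-1} > z$ for the same confidence level. However, because of sampling variability, it is possible that $s < \sigma$ for a given sample, in which case the t-based CI can be shorter.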
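A sketch comparing the two intervals from this example. The critical values 2.262 and 1.96 and the summary statistics are taken from the text; the helper `interval` is hypothetical.

```python
from math import sqrt

n, xbar, s = 10, 37.2, 7.13
t_9 = 2.262   # t value with 9 df cutting off the upper 2.5%
z = 1.96      # SND value cutting off the upper 2.5%

def interval(center, crit, spread, size):
    """Two-sided CI center +/- crit * spread / sqrt(size), rounded to 1 decimal."""
    half = crit * spread / sqrt(size)
    return round(center - half, 1), round(center + half, 1)

print("t-based 95% CI:", interval(xbar, t_9, s, n))  # t-based 95% CI: (32.1, 42.3)
print("z-based 95% CI:", interval(xbar, z, s, n))    # z-based 95% CI: (32.8, 41.6)
```

With the same spread plugged in, the only difference is the critical value, so the t-based interval is the longer one.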
Chapter9 p224
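The left-hand side of Fig. 9.3 (with $\sigma = 46$ mg/100 ml known) is the same as Fig. 9.1.
The right-hand side of Fig. 9.3 ($\sigma$ unknown, estimated by $s$) shows 100 additional intervals computed from the same samples using the t distribution. Again, 95 of the CIs contain $\mu$ and the other 5 do not. Note that this time the intervals vary in length, because $s$ varies from sample to sample.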
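The contrast between the two panels can be mimicked by simulation. This sketch reuses the hypothetical setting of the cholesterol example ($\mu = 211$, $\sigma = 46$, $n = 12$), assumes normal data, and builds both a z-based and a t-based interval from each sample; the t value 2.201 for 11 df is a standard table value, not one given in the notes, and the seed is arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA, N = 211.0, 46.0, 12   # assumed true mean, sd, sample size
Z = 1.96                         # SND value for a 95% CI
T_11 = 2.201                     # t value with 11 df cutting off the upper 2.5%

def both_intervals(sample):
    """Return the (z-based, t-based) 95% CIs computed from one sample."""
    xbar = mean(sample)
    z_half = Z * SIGMA / sqrt(N)             # sigma known: same length for every sample
    t_half = T_11 * stdev(sample) / sqrt(N)  # sigma estimated by s: length varies
    return (xbar - z_half, xbar + z_half), (xbar - t_half, xbar + t_half)

rng = random.Random(2)
results = [both_intervals([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(100)]

t_lengths = [hi - lo for _, (lo, hi) in results]
t_covered = sum(lo <= MU <= hi for _, (lo, hi) in results)
print(f"t-based intervals covering mu: {t_covered} of 100")
print(f"t-based lengths range from {min(t_lengths):.1f} to {max(t_lengths):.1f}")
```

Both kinds of interval are centered at the same $\bar{x}$; only the t-based half-width changes with $s$.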