6.3 One- and Two-Sample Inferences for Means


If σ is known, a 95% confidence interval for μ is

$$\bar{x} \pm 1.96\,SE_{\bar{x}} = \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

But σ is never “known”.


If σ is unknown

• Estimate σ by the sample standard deviation s.

• The estimated standard error of the mean will be $SE_{\bar{x}} = s/\sqrt{n}$.

• Using the estimated standard error, we have a confidence interval of $\bar{x} \pm (\_\_\_\_)\,(s/\sqrt{n})$, where the blank is a multiplier still to be determined.

• The multiplier needs to be bigger than z (e.g., 1.96 for a 95% CI). The confidence interval needs to be wider to take into account the added uncertainty in using s to estimate σ.

• The correct multipliers were figured out by a Guinness Brewery worker (W. S. Gosset, who published as “Student”).


What is the correct multiplier? “t”

• 100(1−α)% confidence interval when σ is unknown: $\bar{x} \pm t_{1-\alpha/2}\,(s/\sqrt{n})$

• 95% CI = 100(1−0.05)% confidence interval when σ is unknown: $\bar{x} \pm t_{0.975}\,(s/\sqrt{n})$


Properties of t distribution

• The value of t depends on how much information we have about σ. The amount of information we have about σ depends on the sample size.

• This information is measured in “degrees of freedom”; for a sample from one normal population, df = n − 1.


t curve and z curve

Both the standard normal curve N(0,1) (the z distribution) and all t(ν) distributions are density curves, symmetric about a mean of 0, but t distributions have more probability in the tails.

As the sample size increases, the extra tail probability decreases and the t distribution more closely approximates the z distribution. By n = 1000 they are virtually indistinguishable from one another.


Quantiles of t distribution

The t table is given in the book: Table B.4.

The quantile depends on the degrees of freedom as well as the probability:

df    probability    t
5     0.90           1.476
10    0.95           1.812
20    0.99           2.528
25    0.975          2.060
∞     0.975          1.96
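The tabled quantiles can also be reproduced numerically. The following is a minimal sketch using only the Python standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The function names (`t_pdf`, `t_cdf`, `t_quantile`) and the numerical settings (step count, search bracket) are illustrative assumptions, not part of the slides or of Table B.4.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Quantile for 0.5 <= p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0  # assumed bracket; wide enough for table-style p
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Rows of the table above (finite df only)
for df, p in [(5, 0.90), (10, 0.95), (20, 0.99), (25, 0.975)]:
    print(df, p, round(t_quantile(p, df), 3))
```

With a large df (say 1000) the same function should return a value close to the z quantile 1.96, in line with the last row of the table.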


Confidence interval for the mean when σ is unknown

$$\bar{x} - t\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\,\frac{s}{\sqrt{n}}$$


Example

• Noise level of vacuum cleaners, n = 12:
  74.0 78.6 76.8 75.5 73.8 75.6 77.3 75.8 73.9 70.2 81.0 73.9

1. Point estimate for the average noise level of vacuum cleaners;
2. 95% confidence interval.


Solution

• n = 12, $\bar{x} = 75.53$, $s = 2.75$

• Critical value with df = 11: $t_{0.975} = 2.201$

• 95% CI: $75.53 \pm 2.201 \times \frac{2.75}{\sqrt{12}} = 75.53 \pm 1.75$, i.e. 73.78 to 77.28
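The arithmetic in this solution can be checked with a short script. This is a sketch using Python's standard `statistics` module; the critical value 2.201 is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Noise levels of the 12 vacuum cleaners from the example
noise = [74.0, 78.6, 76.8, 75.5, 73.8, 75.6,
         77.3, 75.8, 73.9, 70.2, 81.0, 73.9]
T_CRIT = 2.201  # t_{0.975} with df = 11, as quoted in the slides

n = len(noise)
xbar = statistics.mean(noise)   # point estimate of the mean noise level
s = statistics.stdev(noise)     # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)           # estimated standard error of the mean
margin = T_CRIT * se
ci = (xbar - margin, xbar + margin)

print(f"xbar = {xbar:.2f}, s = {s:.2f}, SE = {se:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

Because the script carries unrounded intermediate values, its limits may differ from the slide's in the last digit.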


Example 8 (page 366): failure times of 10 springs. The normal plot looks fairly straight. (If not, try transforming or a different distribution, e.g. Weibull.)

$\bar{x} = 168.3$, $s = 33.1$

$SE = \frac{33.1}{\sqrt{10}} = 10.47$

90% CI: $168.3 \pm 1.833 \times 10.47 = 168.3 \pm 19.2$, i.e. 149.1 to 187.5

If we were to test H0: μ = 150 vs Ha: μ ≠ 150 at the α = 0.10 level, we would not reject H0, since 150 is in the 90% confidence interval for μ.


To test H0: μ = #, analogous to the large-sample test with z test statistic

$$z = \frac{\bar{x} - \#}{\sigma/\sqrt{n}}$$

we would have

$$T = \frac{\bar{x} - \#}{s/\sqrt{n}}$$

The decision to reject or not reject H0, as well as p-values, are found using the t table with df = n − 1.


We could do the test using:

$SE = \frac{33.1}{\sqrt{10}} = 10.47$

$t = \frac{168.3 - 150}{10.47} = \frac{18.3}{10.47} = 1.74$

1.383 < 1.74 < 1.833, i.e. Q(0.9) < test statistic < Q(0.95), so 0.05 < P(t > 1.74) < 0.1

p-value = P(|t| > 1.74) = 2 × P(t > 1.74), so 0.1 < p-value < 0.2

But the confidence interval is more informative.
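The table-bracketing step above can be sketched in code. The helper names below are hypothetical. In the df = 9 table, only the 1.383 and 1.833 entries appear in the slides; the remaining rows are standard t-table values added here as an assumption so that the lookup is complete.

```python
import math

# (upper-tail area, quantile) rows for df = 9, from largest area to smallest.
# 1.383 and 1.833 are quoted in the slides; the other rows are standard
# t-table values, added for illustration.
T_TABLE_DF9 = [(0.10, 1.383), (0.05, 1.833), (0.025, 2.262),
               (0.01, 2.821), (0.005, 3.250)]

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0."""
    return (xbar - mu0) / (s / math.sqrt(n))

def bracket_upper_tail(t, table):
    """Return (low, high) bounds on P(T > t) for t >= 0 from a t table."""
    low, high = 0.0, 0.5
    for area, quantile in table:
        if t >= quantile:
            high = area   # tail area is at most this tabled area
        else:
            low = area    # tail area exceeds this tabled area
            break
    return low, high

t = t_statistic(xbar=168.3, mu0=150, s=33.1, n=10)
low, high = bracket_upper_tail(t, T_TABLE_DF9)
print(f"t = {t:.2f}")
print(f"one-sided: {low} < P(T > t) < {high}")
print(f"two-sided: {2 * low} < p-value < {2 * high}")
```

A table can only bracket the p-value; an exact value would need the t distribution's CDF.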


On the other hand

• $t_{0.95,9} = 1.833$, $t_{0.05,9} = -1.833$

• If we have a test statistic value that is either too small (< −1.833) or too big (> 1.833), then we have strong evidence against H0.

• Here t = 1.74, which is neither too small nor too big compared to the cutoff values above (the “critical values”), so we cannot reject the null hypothesis.


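Another method: Rejection Regions

Alternative Hypothesis    Rejection Region (z test)                          Rejection Region (t test)
$\mu > \mu_0$             $z > z_\alpha$                                     $t > t_\alpha$
$\mu < \mu_0$             $z < -z_\alpha$                                    $t < -t_\alpha$
$\mu \ne \mu_0$           $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$          $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$

Here $z_\alpha$ and $t_\alpha$ denote upper-tail critical values, i.e. the value with probability α to its right.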
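The three rejection regions can be written as one small decision function. This is only a sketch: the function name and the `alternative` labels are hypothetical, and the caller is assumed to supply the positive critical value looked up from a z or t table (upper α for one-sided tests, upper α/2 for two-sided tests).

```python
def in_rejection_region(stat, crit, alternative):
    """Return True if the test statistic falls in the rejection region.

    stat        -- observed z or t test statistic
    crit        -- positive critical value from the table
    alternative -- "greater", "less", or "two-sided"
    """
    if alternative == "greater":      # Ha: mu > mu0
        return stat > crit
    if alternative == "less":         # Ha: mu < mu0
        return stat < -crit
    if alternative == "two-sided":    # Ha: mu != mu0
        return abs(stat) > crit
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Springs example: t = 1.74, two-sided test at alpha = 0.10 with df = 9,
# so the critical value is t_{0.95,9} = 1.833.
print(in_rejection_region(1.74, 1.833, "two-sided"))
```

For the springs example this returns `False`, matching the earlier conclusion that H0 is not rejected.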


Rejection Region method and p-value method

• For Ha: μ < μ0, if the z test statistic is less than −1.645, then the p-value is less than 0.05. Comparing the p-value to 0.05 is the same as comparing the z value to −1.645.

• For t tests we can also find critical values corresponding to the level α that we can compare to our test statistic.

• The test statistic being in the rejection region is the same as the p-value being less than α.


Paired Data

T = top water zinc concentration (mg/L), B = bottom water zinc concentration (mg/L)

Location   1      2      3      4      5      6
Top        0.415  0.238  0.390  0.410  0.605  0.609
Bottom     0.430  0.266  0.567  0.531  0.707  0.716

1982 study of trace metals in South Indian River, at 6 random locations.


6.3.2 Paired Mean Difference

To compare Top & Bottom Water Zinc from a River:

Location   Bottom   Top
1          0.430    0.415
2          0.266    0.238
3          0.567    0.390
4          0.531    0.410
5          0.707    0.605
6          0.716    0.609

That is equivalent to asking: is it true that the difference > 0?


This is a special case of the mean of a single column of numbers. Create a new column for the difference between the 2 variables.

Top & Bottom Water Zinc from a River

Location   Bottom   Top     Difference = d
1          0.430    0.415   0.015
2          0.266    0.238   0.028
3          0.567    0.390   0.177
4          0.531    0.410   0.121
5          0.707    0.605   0.102
6          0.716    0.609   0.107

$\bar{d} = 0.092$, $s_d = 0.061$


Check normality

Ordered $d_i$    0.015   0.028   0.102   0.107   0.121   0.177
z quantiles      -1.38   -0.67   -0.21   0.21    0.67    1.38

[Figure: normal plot “Zinc in River”; x-axis: Zinc, y-axis: Standard Normal Quantiles]

Note: Even normal plots from random normal data are not perfectly straight
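The z quantiles paired with the ordered differences can be reproduced as follows. The plotting position (i − 0.5)/n is an assumption: it matches the values on the slide, but the slides do not state which convention was used.

```python
from statistics import NormalDist

# Differences d = Bottom - Top for the six river locations
d_sorted = sorted([0.015, 0.028, 0.177, 0.121, 0.102, 0.107])
n = len(d_sorted)

# Standard normal quantile at plotting position (i - 0.5)/n, i = 1..n
# (assumed convention; it reproduces the slide's z quantiles)
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Points of the normal plot: ordered difference vs. z quantile
for di, zi in zip(d_sorted, z):
    print(f"{di:.3f}  {zi:+.2f}")
```

Plotting these pairs gives the normal plot; a roughly straight pattern supports the normality assumption for the differences.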


n = 6 $d_i$ values, df = 6 − 1 = 5

$\bar{d} = 0.092$, $s_d = 0.061$, $SE_{\bar{d}} = \frac{s_d}{\sqrt{n}} = 0.025$

95% Confidence Interval, t = 2.571:

$0.092 \pm 2.571(0.025) = 0.092 \pm 0.064$, i.e. 0.028 to 0.156
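The paired analysis reduces to a one-sample interval on the differences, which the sketch below works through from the raw data. The critical value 2.571 is taken from the slide; the variable names are illustrative.

```python
import math
import statistics

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
T_CRIT = 2.571  # t_{0.975} with df = 5, as quoted in the slides

# One difference per location: d = Bottom - Top
d = [b - t for b, t in zip(bottom, top)]
n = len(d)
dbar = statistics.mean(d)
s_d = statistics.stdev(d)
se = s_d / math.sqrt(n)     # standard error of the mean difference
ci = (dbar - T_CRIT * se, dbar + T_CRIT * se)

print(f"dbar = {dbar:.3f}, s_d = {s_d:.3f}, SE = {se:.3f}")
print(f"95% CI: {ci[0]:.3f} to {ci[1]:.3f}")
```

The script keeps unrounded intermediate values, so its limits can differ from the slide's in the third decimal place.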


From the usual hypothesis testing perspective, the p-value for $H_0: \mu_d = 0$ vs $H_a: \mu_d \ne 0$ is less than 0.05, since $\mu_d = 0$ is not in the 95% confidence interval. Our results would be “statistically significant” evidence against $H_0: \mu_d = 0$.

$\bar{d} = 0.092$, $s_d = 0.061$, $SE_{\bar{d}} = \frac{s_d}{\sqrt{n}} = 0.025$

$t = \frac{0.092 - 0}{0.025} = 3.68$

|t| > 2.571, and 2 × 0.005 < p < 2 × 0.01, so 0.01 < p < 0.02


In hypothesis testing notation, the p-value must be less than α to claim statistical significance. A test is significant at the α level if the H0 value is not in the 100(1 − α)% confidence interval.

What about the α = 0.01 level? The p-value is > 0.01, so don’t reject H0.

99% CI: $0.092 \pm 4.032(0.025) = 0.092 \pm 0.100$, i.e. −0.008 to 0.192

Not statistically significant at the 0.01 level.
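The link between a two-sided test and a confidence interval can be illustrated with the rounded summary values from the slides. The helper names below are hypothetical, and the critical values 2.571 and 4.032 are the tabled df = 5 values quoted in the slides.

```python
def ci_from_summary(center, se, t_crit):
    """Two-sided confidence interval: center +/- t_crit * se."""
    return (center - t_crit * se, center + t_crit * se)

def rejects(null_value, ci):
    """Two-sided test decision: reject H0 iff null_value lies outside the CI."""
    return not (ci[0] <= null_value <= ci[1])

dbar, se = 0.092, 0.025  # rounded summary values from the slides

for alpha, t_crit in [(0.05, 2.571), (0.01, 4.032)]:
    ci = ci_from_summary(dbar, se, t_crit)
    decision = "reject H0" if rejects(0.0, ci) else "do not reject H0"
    print(f"alpha = {alpha}: CI {ci[0]:.3f} to {ci[1]:.3f}, {decision}")
```

The same data reject H0 at α = 0.05 but not at α = 0.01, which agrees with 0.01 < p < 0.02.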


Assumptions: The population of differences follows a normal distribution. A normal plot of differences, d’s, should be fairly straight. Note: We don’t need B or T to be normal.


Rejection region exercise

Tell when to reject H0: μ = 120 using a t-test. Answers would be of the form “reject H0 when t < −1.746”, or maybe “reject H0 when |t| > 1.746”, or maybe “reject H0 when t > 1.746”.

(a) HA: μ < 120, α = 0.05, n = 20

(b) HA: μ > 120, α = 0.10, n = 18

(c) HA: μ ≠ 120, α = 0.01, n = 9


Find p-value exercise

Answers would be of the form “0.05 < p < 0.10”, “p < 0.001”, or “p > 0.8”. After finding the p-value in each case, tell whether to reject or not reject H0 at the α = 0.05 level.

(a) HA: μ > 120, n = 7, t = −2.58

(b) HA: μ < 120, n = 7, t = −2.58

(c) HA: μ ≠ 120, n = 7, t = −2.58


6.3.3 Large Sample Comparisons of Two Means

Glue 1: $\mu_1$, $\sigma_1^2$

Glue 2: $\mu_2$, $\sigma_2^2$

Both populations normal.

$n_1$ independent values for glue 1, $n_2$ independent values for glue 2.

Not paired, blocked, …


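$\bar{x}_1 - \bar{x}_2$ is our guess at $\mu_1 - \mu_2$.

How much might $\bar{x}_1 - \bar{x}_2$ deviate from $\mu_1 - \mu_2$?

$$\mathrm{Var}(\bar{x}_1 - \bar{x}_2) = \mathrm{Var}(\bar{x}_1) + (-1)^2\,\mathrm{Var}(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$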
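The variance formula can be checked by simulation. In the sketch below, all population parameters (means, standard deviations, sample sizes) are hypothetical values chosen only for illustration; the slides give no numbers for the glue populations.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical populations (not from the slides)
mu1, sigma1, n1 = 10.0, 2.0, 8
mu2, sigma2, n2 = 11.0, 3.0, 12

def sample_mean(mu, sigma, n):
    """Mean of n independent normal draws."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Repeat the two-sample experiment many times, recording xbar1 - xbar2
diffs = [sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2)
         for _ in range(20000)]

simulated = statistics.variance(diffs)
theoretical = sigma1**2 / n1 + sigma2**2 / n2

print(f"simulated variance of xbar1 - xbar2: {simulated:.3f}")
print(f"sigma1^2/n1 + sigma2^2/n2:           {theoretical:.3f}")
```

The two numbers should agree to within simulation error, and the average of `diffs` should be close to `mu1 - mu2`.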


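Experiment   $\bar{x}_1$                $\bar{x}_2$                $\bar{x}_1 - \bar{x}_2$
1            10.1                       11.2                       -1.1
2            11.4                       10.6                       0.8
3            12.2                       10.4                       1.8
… ∞          …                          …                          …

Mean         $\mu_1$                    $\mu_2$                    $\mu_1 - \mu_2$
Variance     $\sigma_1^2/n_1 = SE_1^2$  $\sigma_2^2/n_2 = SE_2^2$  $\sigma_1^2/n_1 + \sigma_2^2/n_2 = SE_1^2 + SE_2^2$

A confidence interval for $\mu_1 - \mu_2$ is given by

$$\bar{x}_1 - \bar{x}_2 \pm z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \bar{x}_1 - \bar{x}_2 \pm z\sqrt{SE_1^2 + SE_2^2}$$

And for hypothesis testing

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

But we never really know $\sigma_1^2$ and $\sigma_2^2$.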


6.3.4 Small Sample Comparisons of Two Means

The confidence intervals need to be widened to account for additional uncertainty in $s_1^2$ and $s_2^2$ as estimators of $\sigma_1^2$ and $\sigma_2^2$.

Case 1: Assume equal variances, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Case 2: Don’t assume the variances are necessarily equal. But they may be.


Case 1: Both $s_1^2$ and $s_2^2$ are estimators of $\sigma^2$.

Pool $s_1^2$ and $s_2^2$ into a pooled, combined estimate of $\sigma^2$:

$s_p^2$ = weighted average of $s_1^2$ and $s_2^2$, weighted by df.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$


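$$\bar{x}_1 - \bar{x}_2 \pm t\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \bar{x}_1 - \bar{x}_2 \pm t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \bar{x}_1 - \bar{x}_2 \pm t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

For the tabled value of t, use df = (n1 − 1) + (n2 − 1).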


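To test $H_0: \mu_1 - \mu_2 = \#$, check whether # is in the confidence interval, or use

$$T = \frac{\bar{x}_1 - \bar{x}_2 - \#}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

and compare to the t table with df = (n1 − 1) + (n2 − 1).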


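Case 2: Don’t assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

$$\bar{x}_1 - \bar{x}_2 \pm t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \quad\text{or}\quad \bar{x}_1 - \bar{x}_2 \pm t\sqrt{SE_1^2 + SE_2^2}$$

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1 - 1)} + \frac{s_2^4}{n_2^2(n_2 - 1)}}$$

$$T = \frac{\bar{x}_1 - \bar{x}_2 - \#}{\sqrt{SE_1^2 + SE_2^2}}$$

Note: With $n_1 = n_2 = n$, only df changes compared with Case 1, since $\sqrt{SE_1^2 + SE_2^2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.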


Lifetime of Springs. Table 6.7 and Figure 6.15

[Figure 6.15: normal plots of spring lifetimes; x-axis: Lifetime, y-axis: Normal Quantile; series: 900 Stress and 950 Stress]

900 Stress:  216  162  153  216  225  216  306  225  243  189
950 Stress:  225  171  198  189  189  135  162  135  117  162


$n_1 = 10$, $n_2 = 10$

$\bar{x}_1 = 215.1$, $\bar{x}_2 = 168.3$

$s_1 = 42.9$, $s_2 = 33.1$

$SE_1 = 13.57$, $SE_2 = 10.47$

Note: Usually lifetimes are more lognormal than normal. To follow the book’s example, carry on in the original time scale.


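Case 1: Assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$

$$s_p^2 = \frac{9(42.9)^2 + 9(33.1)^2}{9 + 9} = 1468$$

$$s_p = \sqrt{1468} = 38.3$$

$$df = 9 + 9 = 18$$

$$\sqrt{\mathrm{Var}(\bar{x}_1 - \bar{x}_2)} = \sqrt{1468\left(\frac{1}{10} + \frac{1}{10}\right)} = 17.1$$

215.1 − 168.3 ± 2.101(17.1) = 46.8 ± 36.0, i.e. 10.8 to 82.8

Based on the confidence interval, we reject $H_0: \mu_1 - \mu_2 = 0$ in favor of $H_a: \mu_1 - \mu_2 \ne 0$ at the α = 0.05 level.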
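The pooled calculation can be repeated from the raw lifetimes. One assumption to flag: the split of the 20 values into the two samples is reconstructed from the flattened table, chosen so that the sample means match the reported 215.1 and 168.3. The critical value 2.101 is taken from the slide.

```python
import math
import statistics

# Assumed grouping of the Table 6.7 lifetimes (reconstructed; it reproduces
# the reported sample means 215.1 and 168.3)
stress_900 = [216, 162, 153, 216, 225, 216, 306, 225, 243, 189]
stress_950 = [225, 171, 198, 189, 189, 135, 162, 135, 117, 162]
T_CRIT = 2.101  # t_{0.975} with df = 18, as quoted in the slides

n1, n2 = len(stress_900), len(stress_950)
x1, x2 = statistics.mean(stress_900), statistics.mean(stress_950)
v1, v2 = statistics.variance(stress_900), statistics.variance(stress_950)

df = (n1 - 1) + (n2 - 1)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df       # pooled variance
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))     # SE of xbar1 - xbar2
diff = x1 - x2
ci = (diff - T_CRIT * se_diff, diff + T_CRIT * se_diff)
t_stat = diff / se_diff                          # tests H0: mu1 - mu2 = 0

print(f"s_p = {math.sqrt(sp2):.1f}, df = {df}, SE = {se_diff:.1f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f}, t = {t_stat:.2f}")
```

Because the slide rounds $s_1$ and $s_2$ before pooling, the script's values may differ from the slide's in the last digit.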


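Alternatively, to test $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \ne 0$:

$$t = \frac{215.1 - 168.3}{17.1} = 2.7, \qquad df = 9 + 9 = 18$$

2 × 0.005 < p < 2 × 0.01, so 0.01 < p < 0.02

To test $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 > 0$, 0.005 < p < 0.01. To test $H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 < 0$, 0.99 < p < 0.995.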


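Case 2: Not assuming $\sigma_1^2 = \sigma_2^2$

$$df = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\frac{SE_1^4}{n_1 - 1} + \frac{SE_2^4}{n_2 - 1}} = 16.9 \approx 17$$

$$\sqrt{SE_1^2 + SE_2^2} = 17.1$$

Note: With $n_1 = n_2 = n$, only df changes, since $\sqrt{SE_1^2 + SE_2^2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.

46.8 ± 2.110(17.1): a wider CI.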
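The approximate degrees of freedom and the comparison with Case 1 can be sketched from the summary statistics. The function name `welch_df` is an illustrative label for the Case 2 df formula, not a name used in the slides; the critical values 2.110 (df = 17) and 2.101 (df = 18) are those quoted in the slides.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate df when the two variances are not assumed equal."""
    a, b = s1**2 / n1, s2**2 / n2          # SE1^2 and SE2^2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# Summary statistics for the springs data, from the slides
s1, n1, s2, n2 = 42.9, 10, 33.1, 10

df = welch_df(s1, n1, s2, n2)
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

margin_case2 = 2.110 * se_diff   # t_{0.975}, df = 17 (unequal variances)
margin_case1 = 2.101 * se_diff   # t_{0.975}, df = 18 (pooled); same SE since n1 == n2

print(f"df = {df:.1f}, SE = {se_diff:.1f}")
print(f"margin, Case 2: {margin_case2:.1f}; margin, Case 1: {margin_case1:.1f}")
```

With equal sample sizes the standard error is identical in both cases, so the Case 2 interval is wider only because of the slightly larger multiplier.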
