Math 3680 Lecture #17 Two-Sample Inference


Page 1: Math 3680 Lecture #17 Two-Sample Inference

Math 3680

Lecture #17

Two-Sample Inference

Page 2: Math 3680 Lecture #17 Two-Sample Inference

Two-Sample Data:

Matched Pairs

Page 3: Math 3680 Lecture #17 Two-Sample Inference

Example. An industrial safety program was recently instituted. Ten similar plants recorded the average weekly loss (averaged over a month) in man-hours due to accidents. The chart shows the results both before and after the safety program was implemented.

Do the data provide statistically significant evidence that the safety program was effective?

Plant   Before   After
1       30.5     23
2       18.5     21
3       24.5     22
4       32       28.5
5       16       14.5
6       15       15.5
7       23.5     24.5
8       25.5     21
9       28       23.5
10      18       16.5

Page 4: Math 3680 Lecture #17 Two-Sample Inference

Note: We encountered this kind of problem before with the sign test. However, the sign test did not take into account the magnitudes of the differences between the before and after data; instead, the sign test only looked at which one was larger.

Using our improved techniques of hypothesis testing, we are now able to give a more powerful test to determine the effectiveness of the safety program.

Page 5: Math 3680 Lecture #17 Two-Sample Inference

Solution: Let X denote the before data, and Y the after data. Let D = X - Y, the difference between the two.

$H_0: \mu_D = 0$

$H_a: \mu_D > 0$

Significance level: $\alpha = 0.05$.

Page 6: Math 3680 Lecture #17 Two-Sample Inference

The problem now reduces to regular one-variable hypothesis testing, which we know well by now.

Using the P-value method, we reject the null hypothesis, because the P-value (about 0.025) is less than the significance level of 0.05. It appears that the safety program was effective.

Plant   Before   After   Difference
1       30.5     23       7.5
2       18.5     21      -2.5
3       24.5     22       2.5
4       32       28.5     3.5
5       16       14.5     1.5
6       15       15.5    -0.5
7       23.5     24.5    -1
8       25.5     21       4.5
9       28       23.5     4.5
10      18       16.5     1.5

Avg Diff:        2.15
SD Diff:         3.00046293
Test Statistic:  2.26594933
P Value:         0.02484552
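The spreadsheet calculation above can be sketched in a few lines of Python. This is only an illustration using the standard library: the helper names `t_density` and `t_upper_tail` are made up for this sketch, and the P-value is approximated by numerically integrating the t density with Simpson's rule rather than by a statistics package.

```python
import math
import statistics

before = [30.5, 18.5, 24.5, 32, 16, 15, 23.5, 25.5, 28, 18]
after = [23, 21, 22, 28.5, 14.5, 15.5, 24.5, 21, 23.5, 16.5]

# Matched pairs: reduce the two samples to one sample of differences D = X - Y.
diffs = [x - y for x, y in zip(before, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)            # sample SD (divides by n - 1)
t_stat = mean_diff / (sd_diff / math.sqrt(n))


def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) as 0.5 minus the Simpson's-rule integral on [0, t]."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


p_value = t_upper_tail(t_stat, n - 1)        # one-sided, since Ha is mu_D > 0
print(mean_diff, sd_diff, t_stat, p_value)
```

The printed values should agree with the Avg Diff, SD Diff, Test Statistic, and P Value entries of the table, up to rounding.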

Page 7: Math 3680 Lecture #17 Two-Sample Inference

Example. Mechanical science engineers studied the impact of infrasound (sound waves at a frequency below the audibility range of the human ear) on a person’s blood pressure. Five university students were exposed to infrasound for one hour. See table.

a) Does it appear that the mean systolic blood pressure changed as a result of the infrasound?

b) Find a 95% (two-sided) confidence interval for the mean difference in blood pressure.

Student   Before   After
1         105      118
2         113      129
3         106      117
4         126      134
5         113      115

C. Y. H. Qibai and H. Shi, Journal of Low Frequency Noise, Vibration and Active Control, Vol. 23 (2004)
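One possible way to set up this exercise is sketched below. It assumes the differences are taken as After minus Before, and it hard-codes the t-table value 2.776 (4 degrees of freedom, 2.5% in the upper tail); both choices are assumptions of this sketch, not part of the slides.

```python
import math
import statistics

before = [105, 113, 106, 126, 113]
after = [118, 129, 117, 134, 115]

# Assumed direction of the difference: After - Before.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference

# (a) Test statistic for H0: mu_D = 0 against the two-sided Ha: mu_D != 0.
t_stat = mean_diff / se

# (b) 95% two-sided confidence interval with n - 1 = 4 degrees of freedom.
T_CRIT = 2.776                                # t-table value, assumed here
ci_low = mean_diff - T_CRIT * se
ci_high = mean_diff + T_CRIT * se

print(f"t = {t_stat:.3f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

If the absolute value of the test statistic exceeds 2.776, the two-sided test at the 5% level rejects the null hypothesis; equivalently, the confidence interval does not contain 0.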

Page 8: Math 3680 Lecture #17 Two-Sample Inference
Page 9: Math 3680 Lecture #17 Two-Sample Inference

Two-Sample Data:

Independent Samples with

Different Variances

Page 10: Math 3680 Lecture #17 Two-Sample Inference

Previously, we had problems in which the data was obviously paired. However, it’s not uncommon to compare two different data sets which are not paired.

Example. The Ohio EPA collected Index of Biotic Integrity (IBI) measurements for sites located in two Ohio river basins; high IBIs indicate healthier fish populations. Does it appear that the IBI values are the same for both locations?

E. L. Boone, Y. Keying and E. P. Smith, Journal of Agricultural, Biological, and Environmental Sciences, Vol. 10 (2005)

River Basin   Sample Size   Mean    SD
Muskingum     53            0.035   1.046
Hocking       51            0.34    0.96

Page 11: Math 3680 Lecture #17 Two-Sample Inference

Notice that this is a different problem than the one-sample problems that we saw earlier. Before, a typical question would be “Is the mean less than 0.4?”. Now, the question is, “Is there a difference?”

For such problems, we can use all of the previous machinery of confidence intervals and hypothesis testing. However, a couple of things will be different:

• The computation of the standard error (and hence the test statistic), and

• The computation of the number of degrees of freedom (when using the Student’s t-distribution).

Page 12: Math 3680 Lecture #17 Two-Sample Inference

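Let’s define $\bar{D} = \bar{X} - \bar{Y}$. As discussed in the past, since the two samples are independent,

$$\mu_D = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y,$$

$$\sigma_D^2 = \mathrm{Var}(\bar{D}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y},$$

$$SE_D = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}.$$

We will typically be testing if the means are equal, in which case the null hypothesis will be $\mu_D = 0$.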

Page 13: Math 3680 Lecture #17 Two-Sample Inference

Furthermore, we will use Welch’s formula for computing the number of degrees of freedom:

$$df = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{1}{n_X - 1}\left(\dfrac{s_X^2}{n_X}\right)^2 + \dfrac{1}{n_Y - 1}\left(\dfrac{s_Y^2}{n_Y}\right)^2},$$

rounded down to the nearest integer. (There isn’t precise agreement on this, but we’ll defer this discussion to a more advanced statistics class.)
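To get a feel for Welch's formula, here is a small sketch with made-up inputs. The function name `welch_df` and the sample values are hypothetical and chosen only for illustration. When both samples have the same size and the same standard deviation, the formula returns n_X + n_Y - 2; when one sample is much noisier than the other, the result drops toward the smaller of n_X - 1 and n_Y - 1.

```python
import math


def welch_df(s_x, n_x, s_y, n_y):
    """Welch's degrees of freedom from sample SDs and sample sizes (before rounding)."""
    v_x = s_x ** 2 / n_x          # estimated variance of the first sample mean
    v_y = s_y ** 2 / n_y          # estimated variance of the second sample mean
    return (v_x + v_y) ** 2 / (v_x ** 2 / (n_x - 1) + v_y ** 2 / (n_y - 1))


# Hypothetical inputs, not data from the lecture.
balanced = welch_df(2.0, 10, 2.0, 10)    # equal SDs and equal sizes
lopsided = welch_df(1.0, 10, 5.0, 10)    # second sample far more variable

print(balanced, math.floor(balanced))
print(lopsided, math.floor(lopsided))    # rounded down, as in the lecture
```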

Page 14: Math 3680 Lecture #17 Two-Sample Inference

Example. The Ohio EPA collected Index of Biotic Integrity (IBI) measurements for sites located in two Ohio river basins; high IBIs indicate healthier fish populations. Does it appear that the IBI values are the same for both locations?

E. L. Boone, Y. Keying and E. P. Smith, Journal of Agricultural, Biological, and Environmental Sciences, Vol. 10 (2005)

River Basin   Sample Size   Mean    SD
Muskingum     53            0.035   1.046
Hocking       51            0.34    0.96

Page 15: Math 3680 Lecture #17 Two-Sample Inference

Solution.

$H_0: \mu_M = \mu_H$, or $\mu_D = 0$.

$H_a: \mu_M \neq \mu_H$, or $\mu_D \neq 0$.

Significance level: $\alpha = 0.05$.


Page 16: Math 3680 Lecture #17 Two-Sample Inference

Now the cumbersome part:

$$SE_D = \sqrt{\frac{s_M^2}{n_M} + \frac{s_H^2}{n_H}} = \sqrt{\frac{(1.046)^2}{53} + \frac{(0.96)^2}{51}} \approx 0.196759$$

Page 17: Math 3680 Lecture #17 Two-Sample Inference

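$$df = \frac{\left(\dfrac{s_M^2}{n_M} + \dfrac{s_H^2}{n_H}\right)^2}{\dfrac{1}{n_M - 1}\left(\dfrac{s_M^2}{n_M}\right)^2 + \dfrac{1}{n_H - 1}\left(\dfrac{s_H^2}{n_H}\right)^2} = \frac{\left(\dfrac{(1.046)^2}{53} + \dfrac{(0.96)^2}{51}\right)^2}{\dfrac{1}{52}\left(\dfrac{(1.046)^2}{53}\right)^2 + \dfrac{1}{50}\left(\dfrac{(0.96)^2}{51}\right)^2} \approx 101.776,$$

so we use 101 degrees of freedom.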

Page 18: Math 3680 Lecture #17 Two-Sample Inference

Test statistic:

$$t = \frac{\bar{D} - \mu_D}{SE_D} = \frac{(0.035 - 0.34) - 0}{0.196759} \approx -1.55012.$$

The critical values are $\pm 1.98373$.

Page 19: Math 3680 Lecture #17 Two-Sample Inference

[Figure: t-distribution curve marking the critical values ±1.984 and the test statistic at ±1.55; the two-tailed P-value is 12.42%.]

We fail to reject the null hypothesis. There is not enough evidence to think that the mean IBI values are different at these two locations.
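The whole IBI calculation can be retraced from the summary statistics alone. The following is a sketch, not a definitive implementation: the helper names `t_density` and `t_upper_tail` are invented for this example, and the P-value comes from a Simpson's-rule integration of the t density instead of a statistics library.

```python
import math

# Summary statistics from the table (M = Muskingum, H = Hocking).
n_m, mean_m, sd_m = 53, 0.035, 1.046
n_h, mean_h, sd_h = 51, 0.34, 0.96

v_m = sd_m ** 2 / n_m
v_h = sd_h ** 2 / n_h
se = math.sqrt(v_m + v_h)                                   # standard error of the difference
df_exact = (v_m + v_h) ** 2 / (v_m ** 2 / (n_m - 1) + v_h ** 2 / (n_h - 1))
df = math.floor(df_exact)                                   # rounded down, as in the lecture
t_stat = ((mean_m - mean_h) - 0) / se


def t_density(x, dof):
    """Density of Student's t-distribution with dof degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log1p(x * x / dof))


def t_upper_tail(t, dof, steps=2000):
    """Approximate P(T > t) as 0.5 minus the Simpson's-rule integral on [0, t]."""
    h = t / steps
    total = t_density(0.0, dof) + t_density(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, dof)
    return 0.5 - total * h / 3


p_value = 2 * t_upper_tail(abs(t_stat), df)                 # two-sided test
print(se, df_exact, t_stat, p_value)
```

The output should be close to the values worked out on the preceding slides: a standard error near 0.1968, about 101.8 degrees of freedom before rounding, a test statistic near -1.55, and a two-sided P-value of roughly 12.4%, which is above 5%.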

Page 20: Math 3680 Lecture #17 Two-Sample Inference

Example. While carpets are nice in hospitals, they may not be sanitary. In a Montana hospital, bacteria levels per cubic foot of air were tested in 8 carpeted and 8 uncarpeted rooms.

a) Are the bacteria levels in the uncarpeted rooms lower than in the carpeted rooms?

b) Find a 95% confidence interval for the difference in the mean number of bacteria per cubic foot of air.

W. G. Walter and A. Stober, Journal of Environmental Health, Vol. 30, p. 405 (1968)

Carpeted     11.8   8.2   7.1   13.0   10.8   10.1   14.6   14.0
Uncarpeted   12.1   8.3   3.8    7.2   12.0   11.1   10.1   13.7
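A sketch of how this exercise could be attacked with the independent-samples machinery follows. Several things here are assumptions rather than part of the slides: the difference is taken as carpeted minus uncarpeted, part (a) is tested at the 5% level, and the two t-table values (1.771 and 2.160, both for 13 degrees of freedom) are hard-coded on the expectation that Welch's formula rounds down to 13.

```python
import math
import statistics

carpeted = [11.8, 8.2, 7.1, 13.0, 10.8, 10.1, 14.6, 14.0]
uncarpeted = [12.1, 8.3, 3.8, 7.2, 12.0, 11.1, 10.1, 13.7]

# Assumed direction: D = carpeted mean - uncarpeted mean.
diff = statistics.mean(carpeted) - statistics.mean(uncarpeted)
v_c = statistics.variance(carpeted) / len(carpeted)
v_u = statistics.variance(uncarpeted) / len(uncarpeted)
se = math.sqrt(v_c + v_u)

# Welch's degrees of freedom, rounded down.
df_exact = (v_c + v_u) ** 2 / (
    v_c ** 2 / (len(carpeted) - 1) + v_u ** 2 / (len(uncarpeted) - 1)
)
df = math.floor(df_exact)

# (a) One-sided test of H0: mu_D = 0 against Ha: mu_D > 0.
t_stat = diff / se
T_CRIT_ONE_SIDED = 1.771      # t-table value: 13 df, 5% upper tail (assumed level)
reject = t_stat > T_CRIT_ONE_SIDED

# (b) 95% two-sided confidence interval for the difference in means.
T_CRIT_TWO_SIDED = 2.160      # t-table value: 13 df, 2.5% upper tail
ci_low = diff - T_CRIT_TWO_SIDED * se
ci_high = diff + T_CRIT_TWO_SIDED * se

print(df, round(t_stat, 3), reject, (round(ci_low, 2), round(ci_high, 2)))
```

If the printed degrees of freedom were not 13, the two table values would need to be looked up again for the correct row.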

Page 21: Math 3680 Lecture #17 Two-Sample Inference
Page 22: Math 3680 Lecture #17 Two-Sample Inference

Two-Sample Data:

Testing for Proportions with Independent Samples

Page 23: Math 3680 Lecture #17 Two-Sample Inference

Example. A mobile computer network consists of computers that maintain wireless communication with one another as they move about a given area. Two different protocols are compared. With protocol A, 170 of 200 (85%) sent messages were successfully received. With protocol B, 123 of 150 (82%) sent messages were successfully received. Can we conclude that protocol A has the higher success rate?

T. Camp et al., Proceedings of the IEEE International Conference on Communications pp. 3318-3324 (2002)

Page 24: Math 3680 Lecture #17 Two-Sample Inference

This problem is a case where the X and Y populations measure proportions which are assumed to be equal under the null hypothesis. Then

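$$\mu_D = E(\bar{X}) - E(\bar{Y}) = 0,$$

$$\mathrm{Var}(\bar{D}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\pi(1-\pi)}{n_X} + \frac{\pi(1-\pi)}{n_Y} = \pi(1-\pi)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right),$$

$$\sigma_D = \sqrt{\pi(1-\pi)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)},$$

$$SE_D = \sqrt{p(1-p)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}.$$

Here $\pi$ denotes the common population proportion under the null hypothesis.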

In the last formula, the estimate p is pooled, meaning that we compute the total number of successes over the total number of trials.

Page 25: Math 3680 Lecture #17 Two-Sample Inference

Also, just like our previous problems regarding proportions, we use the normal distribution and not the Student t-distribution. Our test statistic will therefore be labeled z.

As a consequence, we do not have to compute the degrees of freedom for this kind of problem.
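Because the reference distribution is the standard normal, tail areas can be computed directly, with no degrees of freedom involved. A minimal sketch using the error function from Python's standard library (the helper name `normal_cdf` is chosen just for this example):

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area to the right of z = 1.96: about 2.5%, which is why +/-1.96 are the
# critical values for a two-sided test at the 5% significance level.
upper_tail = 1.0 - normal_cdf(1.96)
print(upper_tail)
```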

Page 26: Math 3680 Lecture #17 Two-Sample Inference

Example. (Repeated for convenience) A mobile computer network consists of computers that maintain wireless communication with one another as they move about a given area. Two different protocols are compared. With protocol A, 170 of 200 (85%) sent messages were successfully received. With protocol B, 123 of 150 (82%) sent messages were successfully received. Can we conclude that protocol A has the higher success rate?

T. Camp et al., Proceedings of the IEEE International Conference on Communications pp. 3318-3324 (2002)

Page 27: Math 3680 Lecture #17 Two-Sample Inference

Solution.

$H_0: \pi_X = \pi_Y$

$H_a: \pi_X \neq \pi_Y$

Significance level: $\alpha = 0.05$

Under the null hypothesis, the proportion of received messages is the same under both protocols, and thus the population variance $\pi(1-\pi)$ is the same.

Page 28: Math 3680 Lecture #17 Two-Sample Inference

For p, we use the pooled proportion from both samples:

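$$p = \frac{170 + 123}{200 + 150} \approx 0.837$$

$$SE_D = \sqrt{p(1-p)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)} = \sqrt{(0.837)(0.163)\left(\frac{1}{200} + \frac{1}{150}\right)} \approx 0.0399$$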

Page 29: Math 3680 Lecture #17 Two-Sample Inference

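Test statistic:

$$z = \frac{\bar{D} - \mu_D}{SE_D} = \frac{(0.85 - 0.82) - 0}{0.0399} \approx 0.75222.$$

(The value 0.75222 uses the unrounded standard error; dividing by the rounded value 0.0399 gives 0.75188.)

The critical values are $\pm 1.96$.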

Page 30: Math 3680 Lecture #17 Two-Sample Inference

[Figure: standard normal curve marking the critical values ±1.96 and the test statistic at ±0.75188; the two-tailed P-value is 45.21%.]

We fail to reject the null hypothesis. There is not enough evidence to think that the proportions of successful deliveries are different.
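The protocol comparison can be reproduced with a short sketch. The variable names and the `normal_cdf` helper are illustrative choices; the test is two-sided to match the critical values ±1.96 used above.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Successes and trials for protocols A and B.
x_a, n_a = 170, 200
x_b, n_b = 123, 150

p_a = x_a / n_a
p_b = x_b / n_b

# Pooled estimate: total successes over total trials.
p_pool = (x_a + x_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = ((p_a - p_b) - 0) / se
p_value = 2 * (1 - normal_cdf(abs(z)))     # two-sided

print(p_pool, se, z, p_value)
```

Because the script keeps the standard error unrounded, z comes out near 0.7522 and the P-value near 45.2%, a hair below the 45.21% shown in the figure, which is based on the rounded standard error.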

Page 31: Math 3680 Lecture #17 Two-Sample Inference

Example. Researchers studied coliform bacteria counts among particles found in wastewater samples. Of 161 particles that were 75-80 μm in diameter, 19 contained coliform bacteria. Of 95 particles that were 90-95 μm in diameter, 22 contained coliform bacteria. Can we conclude that the larger particles are more likely to contain coliform bacteria?

R. Emerick et al., Water Environment Research pp. 432-438 (2000)
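One way to set up this last exercise is sketched below, following the same pooled-proportion recipe. The one-sided alternative (larger particles have the higher proportion) is read off the wording of the question; the variable names and the `normal_cdf` helper are assumptions of this sketch.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Particles containing coliform bacteria, by size class.
x_small, n_small = 19, 161     # 75-80 micrometer particles
x_large, n_large = 22, 95      # 90-95 micrometer particles

p_small = x_small / n_small
p_large = x_large / n_large

# Pooled proportion under H0: both size classes share the same proportion.
p_pool = (x_small + x_large) / (n_small + n_large)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_small + 1 / n_large))

# One-sided test of Ha: proportion for large particles > proportion for small ones.
z = (p_large - p_small) / se
p_value = 1 - normal_cdf(z)

print(round(z, 3), round(p_value, 4))
```

Comparing the printed P-value with the chosen significance level (the exercise does not state one) then settles the question.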

Page 32: Math 3680 Lecture #17 Two-Sample Inference