1 HYPOTHESIS TESTING: ABOUT TWO INDEPENDENT POPULATIONS.

In this lecture, we are going to study the procedures for making inferences about two populations.

When comparing two populations, we need two samples. Two basic kinds of samples can be used: independent and dependent. Whether samples are dependent or independent is determined by the sources used for the data.

If the same set of sources is used to obtain the data representing different situations, we have dependent sampling. If two unrelated sets of sources are used, one set from each population, we have independent sampling.

Two Independent Populations

The Significance Test for The Difference Between Two Population Means

Mann Whitney U Test

The Significance Test for The Difference Between Two Population Proportions

2x2 Chi Square Test

The Significance Test for The Difference Between Two Population Means

Hypothesis testing involving the difference between two population means is most frequently employed to determine whether or not it is reasonable to conclude that the two are unequal. In such cases, one of the following pairs of hypotheses may be formulated:

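$H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2 \qquad (1)$

$H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 > \mu_2 \qquad (2)$

$H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 < \mu_2 \qquad (3)$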

The difference between two population means will be discussed in three different contexts:

1. When sampling is from normally distributed populations with known population variances

2. When sampling is from normally distributed populations with unknown population variances

3. When sampling is from populations that are not normally distributed

Sampling from Normally Distributed Populations: Population Variances Known

When each of two independent simple random samples has been drawn from a normally distributed population with a known variance, the test statistic for testing the null hypothesis of equal population means is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Example

Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome. The data consist of serum uric acid readings on 12 individuals with Down syndrome and 15 normal individuals. The sample means are 4.5 mg/100 ml and 3.4 mg/100 ml, respectively. The data constitute two independent simple random samples, each drawn from a normally distributed population with a variance equal to 1.

$H_0: \mu_1 = \mu_2 \qquad H_a: \mu_1 \neq \mu_2$

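$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1}{15}}} = \frac{1.1}{0.39} = 2.82$$

(With the denominator left unrounded at 0.3873, $z \approx 2.84$; the decision is the same.)

$z_{\alpha/2} = z_{0.025} = 1.96 < z = 2.82$. Reject $H_0$.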

The two population means are not equal.
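The arithmetic of this example can be checked with a short Python sketch. The function name `two_sample_z` and its parameter names are illustrative and not part of the lecture; only the standard library is assumed. Because the denominator is not rounded, the sketch yields a z of about 2.84 rather than 2.82, with the same decision.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 with known population variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - delta0) / standard_error


# Serum uric acid example: means 4.5 and 3.4, both variances 1, n = 12 and 15.
z = two_sample_z(4.5, 3.4, 1.0, 1.0, 12, 15)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```

Computing the critical value from `NormalDist` rather than hard-coding 1.96 makes it easy to try other significance levels.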

Sampling from Normally Distributed Populations: Population Variances Unknown

When the population variances are unknown, two possibilities exist. The two population variances may be equal or unequal. When comparing two populations, it is quite natural that we compare their variances or standard deviations.

Testing the equality of two population variances:

$$F = \frac{S_{\max}^2}{S_{\min}^2}$$

$F > F_{(n_1-1),(n_2-1),\alpha}$: Variances are not equal

$F < F_{(n_1-1),(n_2-1),\alpha}$: Variances are equal

F Table ($\alpha = 0.05$)

Denominator                    Numerator Degrees of Freedom
Degrees of
Freedom       1       2       3       4       5       ...   120     ...   ∞
1             161.4   199.5   215.7   224.6   230.2   ...   253.3   ...   254.3
2             18.5    19.0    19.16   19.25   19.30   ...   19.49   ...   19.50
3             10.13   9.55    9.28    9.12    9.01    ...   8.55    ...   8.53
...           ...     ...     ...     ...     ...     ...   ...     ...   ...
120           3.92    3.07    2.68    2.45    2.29    ...   1.35    ...   1.25
...           ...     ...     ...     ...     ...     ...   ...     ...   ...
∞             3.84    3.00    2.60    2.37    2.21    ...   1.22    ...   1.00

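Population Variances Equal: When the population variances are unknown but equal, the two sample variances are pooled into a single estimate,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

and the test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

with $n_1 + n_2 - 2$ degrees of freedom.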

Example

A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. The data consist of serum amylase determinations on 22 hospitalized subjects and 15 healthy subjects, with means of 120 and 96 units/ml and standard deviations of 40 and 35 units/ml, respectively. The data constitute two independent random samples, each drawn from a normally distributed population. The population variances are unknown. The researchers wish to know if they would be justified in concluding that the population means are different.

$$F = \frac{S_{\max}^2}{S_{\min}^2} = \frac{1600}{1225} = 1.31$$

$F = 1.31 < F_{(14,21,0.05)} = 2.20$: Variances are equal

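$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$

$\bar{x}_1 = 120$ units/ml, $s_1 = 40$ units/ml; $\bar{x}_2 = 96$ units/ml, $s_2 = 35$ units/ml

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(22 - 1)40^2 + (15 - 1)35^2}{22 + 15 - 2} = 1450$$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \frac{(120 - 96) - 0}{\sqrt{\dfrac{1450}{22} + \dfrac{1450}{15}}} = 1.88$$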

$t_{\text{calculated}} = 1.88$

$t_{(n_1+n_2-2,\ \alpha/2)} = t_{(35,\ 0.025)} = 2.0301$

Since $t_{\text{table}} > t_{\text{calculated}}$, do not reject $H_0$.

The data do not provide sufficient evidence that the mean serum amylase level of hospitalized subjects differs from that of healthy subjects.
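The serum amylase calculation can be reproduced with a minimal Python sketch. The helper name `pooled_t` is illustrative, and the critical value 2.0301 is copied from the lecture rather than computed, since the standard library has no t distribution.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = delta0."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = ((mean1 - mean2) - delta0) / sqrt(sp2 / n1 + sp2 / n2)
    return sp2, t, df


# Serum amylase example: hospitalized (n = 22) vs. healthy (n = 15) subjects.
sd1, sd2 = 40.0, 35.0
f_ratio = max(sd1, sd2) ** 2 / min(sd1, sd2) ** 2   # larger variance on top

sp2, t, df = pooled_t(120.0, sd1, 22, 96.0, sd2, 15)

T_CRIT = 2.0301  # t(35, 0.025), taken from the lecture's table lookup
print(f"F = {f_ratio:.2f}, pooled variance = {sp2:.0f}, t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")
```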

Population Variances Unequal: When two independent simple random samples have been drawn from normally distributed populations with unknown and unequal variances, the test statistic for testing $H_0: \mu_1 = \mu_2$ is

$$t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The critical value of $t'$ for an $\alpha$ level of significance and a two-sided test is approximately

$$t'_{\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$$

where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, $t_1 = t_{(n_1-1,\ \alpha/2)}$, and $t_2 = t_{(n_2-1,\ \alpha/2)}$.

Example

Researchers wish to know if two populations differ with respect to the mean value of total complement activity (CH50). The data consist of total serum complement activity determinations on 10 subjects with disease and 20 apparently normal subjects. The sample means and standard deviations are 62.6 and 33.8 for the subjects with disease and 47.2 and 10.1 for the normal subjects.

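$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$

$\bar{x}_1 = 62.6$, $s_1 = 33.8$, $n_1 = 10$; $\bar{x}_2 = 47.2$, $s_2 = 10.1$, $n_2 = 20$

$w_1 = (33.8)^2/10 = 114.244$ and $w_2 = (10.1)^2/20 = 5.1005$

$t_1 = t_{(9,\ 0.025)} = 2.2622$ and $t_2 = t_{(19,\ 0.025)} = 2.0930$

$$t' = \frac{(62.6 - 47.2) - 0}{\sqrt{\dfrac{(33.8)^2}{10} + \dfrac{(10.1)^2}{20}}} = 1.41$$

$$t'_{\alpha/2} = \frac{114.244(2.2622) + 5.1005(2.0930)}{114.244 + 5.1005} = 2.255$$

$-2.255 < 1.41 < 2.255$

Do not reject $H_0$.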
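A small Python sketch of the weighted critical value makes the roles of the two weights explicit. The function name `cochran_test` is an illustrative label, and the tabulated values t1 = 2.2622 and t2 = 2.0930 are taken from the lecture as inputs rather than computed.

```python
from math import sqrt


def cochran_test(mean1, sd1, n1, mean2, sd2, n2, t1, t2):
    """Return (w1, w2, t', critical t') for the unequal-variance test.

    t1 and t2 are tabulated t values for n1 - 1 and n2 - 1 degrees of freedom.
    """
    w1 = sd1**2 / n1
    w2 = sd2**2 / n2
    t_stat = (mean1 - mean2) / sqrt(w1 + w2)      # H0: mu1 - mu2 = 0
    t_crit = (w1 * t1 + w2 * t2) / (w1 + w2)      # weighted average of t1, t2
    return w1, w2, t_stat, t_crit


# CH50 example: sample 1 has n = 10, sample 2 has n = 20.
w1, w2, t_stat, t_crit = cochran_test(62.6, 33.8, 10, 47.2, 10.1, 20,
                                      t1=2.2622, t2=2.0930)

print(f"w1 = {w1:.3f}, w2 = {w2:.4f}, t' = {t_stat:.2f}, critical = {t_crit:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Do not reject H0")
```

Because the first sample has by far the larger weight, the critical value lies close to t1.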

MANN-WHITNEY U TEST

The sign test discussed in the preceding lecture does not make full use of all the information present in the two samples when the variable of interest is measured on at least an ordinal scale. Reducing an observation's information content to merely whether it falls above or below the common median wastes information. If a procedure is available that makes use of more of the information inherent in the data, that procedure should be used if possible.

One such nonparametric procedure that can be used instead of the sign test is the Mann-Whitney U test. It is a nonparametric alternative to the significance test for the difference between two independent population means. Since the test is based on the ranks of the observations, it utilizes more information than the sign test does.

The assumptions underlying the Mann-Whitney U Test are as follows:

1. The two samples, of size n and m, respectively, available for analysis have been independently and randomly drawn from their respective populations.

2. The measurement scale is at least ordinal.

3. If the populations differ at all, they differ only with respect to their medians.

The calculation of the test statistic U is a two-step procedure. We first determine the sum of the ranks, $R_1$, for the first sample. Then, using this sum of ranks, we calculate a U score for each sample:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 - U_1$$

The larger U score is the test statistic. If the sample sizes are smaller than 20, the critical U value is obtained from the U table.

If the sample sizes are greater than 20, we can use the standard normal, z approximation. This is possible since the distribution of U is approximately normal with a mean

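$$\frac{n_1 n_2}{2}$$

and a standard deviation

$$\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

The test statistic is then

$$z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$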

Example

A researcher designed an experiment to assess the effects of prolonged inhalation of cadmium oxide. Fifteen laboratory animals served as experimental subjects, while 10 similar animals served as controls. The variable of interest was hemoglobin level following the experiment. The results are shown in the table. We wish to know if we can conclude that prolonged inhalation of cadmium oxide reduces hemoglobin level.

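Exposed animals (x)   Rank    Unexposed animals (y)   Rank
14.4                  7       17.4                    24
14.2                  6       16.2                    16
13.8                  2       17.1                    23
16.5                  18      17.5                    25
14.1                  4.5     15.0                    8.5
16.6                  19      16.0                    15
15.9                  14      16.9                    22
15.6                  12      15.0                    8.5
14.1                  4.5     16.3                    17
15.3                  10.5    16.8                    21
15.7                  13
16.7                  20
13.7                  1
15.3                  10.5
14.0                  3

The ranks of the exposed animals sum to $R_1 = 145$.

$H_0: M_x = M_y \qquad H_a: M_x < M_y$

Since $n_1$ and $n_2 < 20$:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 15 \cdot 10 + \frac{15(15 + 1)}{2} - 145 = 125$$

$$U_2 = n_1 n_2 - U_1 = 150 - 125 = 25$$

$U = 125$

From the table, the critical value is 45. Since 125 > 45, reject $H_0$. The same conclusion follows from the smaller score, $U_2 = 25$, which lies below the tabulated value of 45.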
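The ranking and the two U scores can be verified with a short Python sketch that assigns average ranks to ties. The helper name `midranks` is illustrative and not from the lecture. The last lines also evaluate the normal-approximation formula, purely to illustrate it, since the lecture reserves that approximation for sample sizes greater than 20.

```python
from math import sqrt


def midranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


exposed = [14.4, 14.2, 13.8, 16.5, 14.1, 16.6, 15.9, 15.6,
           14.1, 15.3, 15.7, 16.7, 13.7, 15.3, 14.0]
unexposed = [17.4, 16.2, 17.1, 17.5, 15.0, 16.0, 16.9, 15.0, 16.3, 16.8]

n1, n2 = len(exposed), len(unexposed)
ranks = midranks(exposed + unexposed)      # rank the pooled sample
r1 = sum(ranks[:n1])                       # rank sum of the first (exposed) sample

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 - u1

# Normal approximation, shown for illustration only (sample sizes here are below 20).
z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

print(f"R1 = {r1}, U1 = {u1}, U2 = {u2}, z = {z:.2f}")
```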

The Difference Between Two Population Proportions

The most frequent test employed relative to the difference between two population proportions is that their difference is zero. It is possible, however, to test that the difference is equal to some other value.

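$$z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sigma_{p_1 - p_2}}$$

$$\sigma_{p_1 - p_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}$$

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Here $p_1$ and $p_2$ are the sample proportions, $P_1$ and $P_2$ the population proportions, and $\bar{p}$ the pooled sample proportion.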

Example

In a study designed to compare a new treatment for migraine headache with the standard treatment, 78 of 100 subjects who received the standard treatment responded favorably. Of the 100 subjects who received the new treatment, 90 responded favorably. Do these data provide sufficient evidence to indicate that the new treatment is more effective than the standard?

$H_0: P_2 - P_1 = 0 \qquad H_a: P_2 - P_1 > 0$

$p_1 = 78/100 = 0.78$

$p_2 = 90/100 = 0.90$

$$\bar{p} = \frac{78 + 90}{100 + 100} = 0.84$$

$$z = \frac{(0.90 - 0.78) - 0}{\sqrt{\dfrac{(0.84)(0.16)}{100} + \dfrac{(0.84)(0.16)}{100}}} = 2.32$$

(Without intermediate rounding, $z \approx 2.31$; the decision is the same.)

Since $z = 2.32 > z_{0.05} = 1.645$, reject $H_0$.

The new treatment is more effective than the standard.
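The migraine example can be checked with a brief Python sketch. The function name `two_proportion_z` is illustrative, and the one-sided critical value is computed with the standard library's `NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled proportion and z statistic for H0: P2 - P1 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) / n1 + p_pooled * (1 - p_pooled) / n2)
    return p_pooled, (p2 - p1) / se


# Standard treatment: 78 of 100 responded; new treatment: 90 of 100 responded.
p_pooled, z = two_proportion_z(78, 100, 90, 100)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided (upper-tail) critical value

print(f"pooled p = {p_pooled:.2f}, z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```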

2x2 Chi Square Test

We can use the chi-square test to compare frequencies or proportions in two or more groups. The classification of a set of entities according to two criteria can be shown by a table in which the r rows represent the various levels of one criterion of classification and the c columns represent the various levels of the second criterion. Such a table is generally called a contingency table.

We will be interested in testing the null hypothesis that, in the population, the two criteria of classification are independent, against the alternative that they are associated.

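$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad E_{ij} = \frac{O_{i.}\, O_{.j}}{N}$$

df = (r-1)(c-1) = 1

                     First criterion
Second criterion     +       -       Total
1                    O11     O12     O1.
2                    O21     O22     O2.
Total                O.1     O.2     N

Each expected frequency Eij should be greater than or equal to 5.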

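Is squint more common among children with a positive family history? Is there an association between squint and family history of squint?

Observed frequencies:

                     Squint
Family History       +       -       Total
+                    20      30      50
-                    15      55      70
Total                35      85      120

Expected frequencies:

                     Squint
Family History       +       -
+                    14.58   35.42
-                    20.42   49.58

$\chi^2 = 4.869$

$\chi^2_{(1,\ 0.025)} = 5.024 > 4.869$. Do not reject $H_0$.

At this significance level, the data do not provide sufficient evidence of a relation between squint and family history.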
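The expected frequencies and the chi-square statistic for this table can be reproduced with a few lines of Python. The function name `chi_square_2x2` is illustrative; the sketch computes the statistic only and leaves the critical value to a table lookup, as in the lecture.

```python
def chi_square_2x2(table):
    """Return (expected frequencies, chi-square statistic) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row total i) * (column total j) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return expected, chi2


# Rows: family history (+, -); columns: squint (+, -).
observed = [[20, 30],
            [15, 55]]
expected, chi2 = chi_square_2x2(observed)

smallest_expected = min(min(row) for row in expected)
print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, smallest expected frequency = {smallest_expected:.2f}")
```

Checking the smallest expected frequency first shows whether the requirement of at least 5 per cell is met before the statistic is interpreted.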

If any expected frequency is less than 5, an alternative procedure called Fisher's Exact Test should be performed.

Let $O_{21} = a$. $P(O_{21} \le a)$ should be calculated.

If $P(O_{21} \le a) < \alpha$, then $H_0$ is rejected in favor of $H_1$.

                     First criterion
Second criterion     +       -       Total
1                    O11     O12     O1.
2                    O21     O22     O2.
Total                O.1     O.2     N

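Observed table:

                     Squint
Family History       +       -       Total
+                    5       6       11
-                    1       10      11
Total                6       16      22

$$P(O_{21} = 1) = \frac{11!\ 11!\ 16!\ 6!}{22!\ 10!\ 1!\ 6!\ 5!} = 0.068$$

More extreme table with the same totals:

                     Squint
Family History       +       -       Total
+                    6       5       11
-                    0       11      11
Total                6       16      22

$$P(O_{21} = 0) = \frac{11!\ 11!\ 16!\ 6!}{22!\ 11!\ 0!\ 6!\ 5!} = 0.006$$

$P(O_{21} \le 1) = 0.068 + 0.006 = 0.074$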
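The two table probabilities and their sum can be checked with a short Python sketch. The function names are illustrative; the sketch relies on the fact that the factorial expression for a table with fixed margins equals a ratio of binomial coefficients, which `math.comb` evaluates exactly.

```python
from math import comb


def prob_table(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] given fixed row and column totals.

    Equal to (a+b)! (c+d)! (a+c)! (b+d)! / (N! a! b! c! d!).
    """
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_lower_tail(a, b, c, d):
    """P(O21 <= c): sum over tables with the same margins and O21 = c, c-1, ..., 0."""
    total = 0.0
    while c >= 0 and b >= 0:
        total += prob_table(a, b, c, d)
        # Move one unit out of O21 while keeping every margin unchanged.
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


p_observed = prob_table(5, 6, 1, 10)     # observed table
p_extreme = prob_table(6, 5, 0, 11)      # more extreme table
p_tail = fisher_lower_tail(5, 6, 1, 10)  # P(O21 <= 1)

print(f"{p_observed:.3f} + {p_extreme:.3f} = {p_tail:.3f}")
```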