Comparison of Means

7/29/2019 Comparison of Means


T distribution

Student t-test

Steps in hypothesis testing concerning $\mu$

Steps, with examples:

1. Set up the hypotheses and select the level of significance, e.g. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$, with $\alpha = 0.05$.

2. Select the appropriate test statistic, e.g. $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.

3. Generate the decision rule, e.g. reject $H_0$ if $Z \ge Z_{1-\alpha}$; do not reject $H_0$ if $Z < Z_{1-\alpha}$.

4. Compute the test value.

5. Draw a conclusion about $H_0$ by comparing the test value (4) to the decision rule (3).

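Test of hypothesis concerning $\mu$

Assumptions: normal distribution or large sample ($n \ge 30$); simple random samples.

Case 1: $\sigma$ known: $\bar{x} \pm z\,(\sigma/\sqrt{n})$

Case 2: $\sigma$ unknown and $n \ge 30$: $\bar{x} \pm z\,(s/\sqrt{n})$

Case 3: $\sigma$ unknown and $n < 30$: $\bar{x} \pm t\,(s/\sqrt{n})$, df = $n-1$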

Estimation with two samples is concerned with estimating $(\mu_1 - \mu_2)$, the difference in means between groups/populations.

Tests of hypotheses in the two-sample case are also concerned with the difference in the means.

E.g. $H_0: \mu_1 - \mu_2 = 0$ (no difference in means) vs $H_1: \mu_1 - \mu_2 \ne 0$ (means are different)

Or $H_1: \mu_1 > \mu_2$ (the mean of population 1 is larger than the mean of population 2)

Or $H_1: \mu_1 < \mu_2$ (the mean of population 1 is smaller than the mean of population 2)

General format

$\text{Test value} = \dfrac{\text{observed} - \text{expected}}{\text{standard error}}$

$\bar{X}_1 - \bar{X}_2$ is the observed difference, and $\mu_1 - \mu_2$ is the expected difference, which is 0 when the null hypothesis is $\mu_1 = \mu_2$, since that is equivalent to $\mu_1 - \mu_2 = 0$.

The standard error of the difference is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

If $\sigma_1^2$ and $\sigma_2^2$ are not known, the researcher can use the variances $s_1^2$ and $s_2^2$ obtained from the respective samples, provided each sample size is 30 or more. The formula then is:

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

where the numerator is the difference in means and the denominator is the standard error of the difference in means, provided $n_1 \ge 30$ and $n_2 \ge 30$.

In a comparison between two means, the same basic steps of hypothesis testing for Z are followed.

When comparing two means by using the t-test, the researcher must decide whether the two samples are independent or dependent.

Two assumptions for testing the difference between two means of independent samples:

The samples must be independent of each other.

The populations from which the samples were drawn must be normally distributed.

Student t-test

1. Testing the difference between two means: independent large samples

2. Testing the difference between two means: independent small samples

3. Testing the difference between two means: small dependent samples

Testing between two means of independent large samples

General formula

Test statistic:

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

Confidence interval:

$(\bar{X}_1 - \bar{X}_2) \pm Z_{1-\alpha/2}\,(se)$

Example 1.

A survey found that the average hotel room rate in Zaria is N88.42 and the average room rate in Funtua is N80.61. Assume that the data were obtained from two samples of 50 hotels each and that the standard deviations were N5.62 and N4.83, respectively. At $\alpha = 0.05$, can it be concluded that there is a significant difference in the rates?

Solution

Step 1. State the hypotheses

$H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \ne \mu_2$ (claim)

Step 2. Find the critical value

At $\alpha = 0.05$ (two-tailed), $z = \pm 1.96$

Step 3. Compute the test value

$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$; substituting:

$z = \dfrac{88.42 - 80.61}{\sqrt{5.62^2/50 + 4.83^2/50}} = \dfrac{7.81}{\sqrt{31.5844/50 + 23.3289/50}} = \dfrac{7.81}{\sqrt{0.6317 + 0.4666}} = \dfrac{7.81}{\sqrt{1.0983}} = \dfrac{7.81}{1.048}$ (note: 1.048 is the standard error)

$z = 7.45$

Step 4. Make the decision. Reject the null hypothesis ($H_0$) if $|z_{calc}| > z_{tab}$. Since $7.45 > 1.96$, $H_0$ is rejected.

Step 5. Summarize the result

There is enough evidence to support the claim that the means are not equal. Hence there is a significant difference in the rates.
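The five steps of Example 1 can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `two_sample_z` is hypothetical, and the critical value is obtained from the standard normal distribution in Python's `statistics` module rather than from a printed table.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return (z, standard_error) for H0: mu1 - mu2 = 0 with large samples."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    return z, se


alpha = 0.05
# Summary statistics from Example 1 (Zaria vs Funtua hotel room rates).
z, se = two_sample_z(88.42, 5.62, 50, 80.61, 4.83, 50)
# Two-tailed critical value, about 1.96 for alpha = 0.05.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit
print(f"se = {se:.3f}, z = {z:.2f}, critical = {z_crit:.2f}, reject H0: {reject_h0}")
```

Because the test is two-tailed, the sketch compares the absolute value of z with the critical value.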

Fixing of confidence limits

$(\bar{X}_1 - \bar{X}_2) \pm Z_{1-\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$

$(88.42 - 80.61) \pm 1.96\sqrt{5.62^2/50 + 4.83^2/50}$

$7.81 \pm 1.96\sqrt{31.5844/50 + 23.3289/50}$

$7.81 \pm 1.96\sqrt{0.6317 + 0.4666}$

$7.81 \pm 1.96\sqrt{1.0983}$

$7.81 \pm 1.96(1.048)$

$7.81 \pm 2.054$, therefore

$7.81 - 2.054 = 5.756$ and $7.81 + 2.054 = 9.864$

CI = (5.756, 9.864)
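The interval arithmetic can be reproduced with a short script. This is a sketch under the same large-sample assumptions as the example; `z_interval` is a hypothetical helper name, and the z multiplier is computed from the standard normal distribution instead of being read from a table.

```python
import math
from statistics import NormalDist


def z_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean1 - mean2
    margin = z_crit * se
    return diff - margin, diff + margin


# Hotel room rate data from Example 1.
lower, upper = z_interval(88.42, 5.62, 50, 80.61, 4.83, 50)
contains_zero = lower <= 0 <= upper
print(f"95% CI = ({lower:.3f}, {upper:.3f}), contains 0: {contains_zero}")
```

An interval that excludes 0 leads to the same decision as the two-tailed z-test at the matching significance level.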


Using the confidence interval to test hypotheses

State the hypotheses

$H_0: \mu_1 - \mu_2 = 0$

$H_1: \mu_1 - \mu_2 \ne 0$

Make a decision. If the CI does not contain 0, reject the null hypothesis.

The CI (5.756, 9.864) does not contain 0, therefore $H_0$ is rejected.

Summary. There is enough evidence that the means are not the same. There is a significant difference in the mean rates.

Suppose the mean cholesterol level of males aged 50 is 241. An investigator wishes to examine whether cholesterol levels are significantly reduced by modifying diets only slightly. A random sample of 12 patients agree to participate in the study and follow the modified diet for 3 months. After 3 months, their cholesterol levels are measured and summary statistics are produced on the n = 12 subjects. The mean cholesterol level is 235 with a standard deviation of 12.5. Based on the data, is there statistical evidence that the modified diet reduces cholesterol?

1. Set up hypotheses

$H_0: \mu = 241$

$H_1: \mu < 241$

2. Select the appropriate test statistic

$t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ for $n < 30$

3. Decision rule

Reject $H_0$ if $t \le -1.796$ (df = 11, $\alpha = 0.05$)

Do not reject $H_0$ if $t > -1.796$

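4. Test value

$t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Substituting the values in the formula above:

$t = \dfrac{235 - 241}{12.5/\sqrt{12}} = \dfrac{-6}{12.5/3.464} = \dfrac{-6}{3.61} = -1.66$

5. Conclusion

Since $-1.66 > -1.796$, do not reject $H_0$: at $\alpha = 0.05$ the data do not show a significant reduction in mean cholesterol.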
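The cholesterol example can be verified with a few lines of code. This is a minimal sketch; `one_sample_t` is a hypothetical helper, and the critical value -1.796 is the tabulated lower-tail value for df = 11 quoted in the decision rule, hard-coded here because the Python standard library has no t distribution.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for H0: mu = mu0 when sigma is unknown."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Summary statistics from the cholesterol example.
t = one_sample_t(235, 241, 12.5, 12)
t_crit = -1.796  # tabulated value, df = 11, alpha = 0.05, lower tail
reject_h0 = t <= t_crit
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

The test is one-sided (the alternative is a reduction), so only the lower tail is used.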

Example 4

An investigation is undertaken to examine the mean times to relief from headache pain under two entirely different treatments: medication vs relaxation treatment. Patients suffering from chronic headaches are enrolled in a study and randomly assigned to one of the two treatments under investigation. Patients are instructed either to take the assigned medication or to perform the relaxation exercises at the onset of their next headache. They are also instructed to record the time in minutes until the headache pain is resolved.

Fifteen subjects are assigned to the medication treatment and report a mean time to relief of 33.8 minutes with a variance of 2.85. A second random sample of 15 subjects is assigned to the relaxation treatment and reports a mean time to relief of 22.4 minutes with a variance of 3.07.

The data layout is shown below.

Patients with chronic headaches

Randomize

Medication: $n_1 = 15$, $\bar{X}_1 = 33.8$ minutes, $S_1^2 = 2.85$

Relaxation treatment: $n_2 = 15$, $\bar{X}_2 = 22.4$ minutes, $S_2^2 = 3.07$

Are these sample means statistically significantly different? Run an appropriate test to assess whether there is a significant difference in the mean time to relief under the two different treatments, using a 5% level of significance.


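Formula

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{1/n_1 + 1/n_2}}$

where $S_p$ = pooled standard deviation

$S_p = \sqrt{\dfrac{\left(\sum X_1^2 - (\sum X_1)^2/n_1\right) + \left(\sum X_2^2 - (\sum X_2)^2/n_2\right)}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$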

Substituting in the formula:

$t = \dfrac{33.8 - 22.4}{1.72\sqrt{1/15 + 1/15}} = \dfrac{11.4}{0.628} = 18.15$

$t = 18.15 > 2.048$ ($t_{0.05}$, df = $n_1 + n_2 - 2 = 28$)

Reject $H_0$ because there is significant evidence of a difference in the mean relief time between medication and relaxation therapy.
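The pooled-variance computation is easy to get wrong by hand, so a short check may help. This sketch assumes the standard pooled formula, in which each sample variance is weighted by its degrees of freedom; `pooled_t` is a hypothetical name, and the critical value 2.048 is the tabulated two-tailed value for df = 28, hard-coded as an assumption because the standard library has no t distribution.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, pooled_sd) for two independent small samples, equal variances assumed."""
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, sp


# Medication vs relaxation summary statistics from Example 4.
t, sp = pooled_t(33.8, 2.85, 15, 22.4, 3.07, 15)
df = 15 + 15 - 2
t_crit = 2.048  # tabulated two-tailed value, df = 28, alpha = 0.05
reject_h0 = abs(t) > t_crit
print(f"Sp = {sp:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

With equal sample sizes the pooled variance reduces to the simple average of the two variances, which is why $S_p = \sqrt{(2.85 + 3.07)/2} \approx 1.72$.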

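Two dependent populations

Attributes: samples are matched or paired, n (number of pairs) $\ge 30$
Test statistic: $Z = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$
Confidence interval: $\bar{d} \pm Z_{1-\alpha/2}\,S_d/\sqrt{n}$

Attributes: samples are matched or paired, n (number of pairs) $< 30$
Test statistic: $t = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$, df = $n-1$
Confidence interval: $\bar{d} \pm t_{1-\alpha/2}\,S_d/\sqrt{n}$, df = $n-1$

where $\bar{d}$ and $S_d$ are the mean and standard deviation of the difference scores.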

Example

A nutrition expert is examining a weight loss programme to evaluate its effectiveness. Ten subjects were randomly selected for the investigation. Each subject's initial weight is recorded, they follow the programme for six weeks, and they are weighed again. The data are given below:

Subject   Initial weight   Final weight
1         180              165
2         142              138
3         126              128
4         138              136
5         175              170
6         205              197
7         116              115
8         142              128
9         157              144
10        136              130

Subject   Initial weight   Final weight   Difference (d)   Difference squared (d^2)
1         180              165            15               225
2         142              138            4                16
3         126              128            -2               4
4         138              136            2                4
5         175              170            5                25
6         205              197            8                64
7         116              115            1                1
8         142              128            14               196
9         157              144            13               169
10        136              130            6                36

$\sum d = 66$, $\sum d^2 = 740$


$\bar{d} = \sum d / n = 66/10 = 6.6$

$S_d^2 = \dfrac{\sum d^2 - (\sum d)^2/n}{n-1} = \dfrac{740 - (66)^2/10}{9} = 33.82$

$S_d = \sqrt{33.82} = 5.82$

Test the hypothesis

1. Set up hypotheses

$H_0: \mu_d = 0$

$H_1: \mu_d \ne 0$

2. Select the appropriate statistic

$t = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$

$t = \dfrac{6.6 - 0}{5.82/\sqrt{10}} = 3.59$

df = $n - 1 = 10 - 1 = 9$

$t_{calc} = 3.59$, $t_{tab}$ (0.05, df = 9) = 2.262

$t_{calc} > t_{tab}$: reject $H_0$

We have significant evidence ($\alpha = 0.05$) to show that there is a mean weight loss following the six-week programme.
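The paired analysis can be reproduced directly from the raw weights. This is a minimal sketch using only the standard library; the variable names are illustrative, and the critical value 2.262 is the tabulated two-tailed value for df = 9 quoted above.

```python
import math
from statistics import mean, stdev

# Weights of the ten subjects before and after the six-week programme.
initial = [180, 142, 126, 138, 175, 205, 116, 142, 157, 136]
final = [165, 138, 128, 136, 170, 197, 115, 128, 144, 130]

# Difference scores: initial minus final, so positive values mean weight loss.
d = [i - f for i, f in zip(initial, final)]
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation (divides by n - 1)
t = (d_bar - 0) / (s_d / math.sqrt(len(d)))

t_crit = 2.262  # tabulated two-tailed value, df = 9, alpha = 0.05
reject_h0 = abs(t) > t_crit
print(f"d_bar = {d_bar}, S_d = {s_d:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Working from the difference scores turns the two dependent samples into a single one-sample t-test on d.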

Fixing of confidence interval

Recall

$\bar{d} \pm t_{1-\alpha/2}\,S_d/\sqrt{n}$

$\bar{d} = 6.6$, $t_{1-\alpha/2} = 2.262$, $S_d = 5.82$, $\sqrt{10} = 3.162278$

$6.6 \pm 2.262 \times 5.82/3.162278$

$6.6 \pm 2.262 \times 1.840$

$6.6 \pm 4.16$

$6.6 + 4.16 = 10.76$

$6.6 - 4.16 = 2.44$

The CI (2.44, 10.76) does not contain 0, so reject $H_0$: we have significant evidence ($\alpha = 0.05$) to show that the programme has an effect on mean weight loss after six weeks, in agreement with the t-test.
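The margin of error is the critical value times the standard error $S_d/\sqrt{n}$, and slipping a decimal place in the division changes the conclusion, so a quick numerical check is worthwhile. This is a sketch; `paired_t_interval` is a hypothetical helper, and the inputs are the rounded summary values $\bar{d} = 6.6$, $S_d = 5.82$ and the tabulated $t = 2.262$ for df = 9.

```python
import math


def paired_t_interval(d_bar, s_d, n, t_crit):
    """Confidence interval for the mean difference of paired samples."""
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


lower, upper = paired_t_interval(6.6, 5.82, 10, 2.262)
contains_zero = lower <= 0 <= upper
print(f"95% CI = ({lower:.2f}, {upper:.2f}), contains 0: {contains_zero}")
```

Since the interval lies entirely above 0, it supports the same decision as the paired t-test.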


Chi-square table

Chi-Square Analysis

Goodness of fit test

Test of independence

Test of heterogeneity

Used for tests of hypotheses on multi-variable data in one-sample and two-or-more-sample applications.

In these tests, the test statistic follows a chi-square distribution ($\chi^2$).

Goodness of fit test

Test statistic

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

where O = observed frequency, E = expected frequency

E.g. volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are drug-related, 25% are sex-related, 25% are stress-related and 10% concern educational issues.

For this investigation, each call is classified into one category based on the primary issue raised by the caller. To test the hypothesis, the following data are collected from 120 randomly selected calls placed to the teen hotline. Based on the data, is the assumption regarding the distribution appropriate?

Topical issue      Drugs   Sex   Stress   Education
Number of calls    52      38    21       9


1. Set up the hypotheses

$H_0: p_1 = 0.40,\ p_2 = 0.25,\ p_3 = 0.25,\ p_4 = 0.10$

$H_1$: $H_0$ is false

2. Select the appropriate statistic

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

3. Select the level of significance

$\alpha = 0.05$; here we determine df = k - 1 = 4 - 1 = 3, where k is the number of categories

$\chi^2 = 7.815$ from the table at df = 3 and significance level 0.05

4. Decision rule

Reject $H_0$ if $\chi^2 \ge 7.815$

Do not reject $H_0$ if $\chi^2 < 7.815$

5. Compute the test statistic

Organized computations of the test statistic

Topical issue               Drugs             Sex               Stress            Education         Total
O (observed frequency)      52                38                21                9                 120
E (expected frequency)      120(0.40) = 48    120(0.25) = 30    120(0.25) = 30    120(0.10) = 12    120
(O - E)                     4                 8                 -9                -3                0
(O - E)^2/E                 (4)^2/48 = 0.33   (8)^2/30 = 2.13   (-9)^2/30 = 2.70  (-3)^2/12 = 0.75  5.917

The test statistic $\chi^2 = 5.917$


Conclusions

Do not reject $H_0$ since $5.917 < 7.815$.

We do not have significant evidence ($\alpha = 0.05$) to show that the distribution of topical issues in the calls placed to the teen hotline is not as assumed (40% drug-related, 25% sex-related, 25% stress-related and 10% education-related).
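The goodness of fit computation can be sketched in a few lines. The function name `chi_square_gof` is hypothetical; the observed counts and assumed proportions come from the hotline example, and the critical value 7.815 is the tabulated value for df = 3 quoted in the decision rule.

```python
def chi_square_gof(observed, proportions):
    """Return (chi2, expected) for a goodness of fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Calls by topic: drugs, sex, stress, education.
observed = [52, 38, 21, 9]
proportions = [0.40, 0.25, 0.25, 0.10]

chi2, expected = chi_square_gof(observed, proportions)
df = len(observed) - 1
chi2_crit = 7.815  # tabulated value, df = 3, alpha = 0.05
reject_h0 = chi2 >= chi2_crit
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Summing the unrounded terms gives about 5.917; adding the four terms after rounding each to two decimals gives 5.91.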

Test of Independence

This considers applications involving two or more samples or two categorical variables. Our interest is to evaluate whether these two categorical variables are related (dependent/associated) or unrelated (independent/not associated). The following example illustrates the use of the $\chi^2$ test of independence.

Example.

The following data were collected in a multi-site study of medical effectiveness in type II diabetes. Three sites were involved in the study: a health maintenance organization (HMO), a university teaching hospital (UTH), and an independent practice association (IPA). Type II patients were enrolled in the study from each site and monitored over a three-year period. The data below show the treatment regimens of patients by site.

Treatment Regimens

Site     Diet & exercise   Oral hypoglycemics   Insulin   Total
HMO      294               827                  579       1700
UTH      132               288                  352       772
IPA      189               516                  404       1109
Total    615               1631                 1335      3581


The table above is a 3 x 3 cross-tabulation table, or contingency table.

Both site and treatment regimen are categorical variables.

Site is called the row variable and treatment regimen is called the column variable.

The number of rows in the table is denoted R and the number of columns in the table is denoted C.

In this table, R = 3 and C = 3.

The row and column totals are shown on the right and bottom of the table, respectively.


The 9 combinations of site and treatment regimen are called the cells of the table.

E.g. patients in the HMO treated by diet and exercise denote one cell of the table, patients in the HMO treated by oral hypoglycemics denote another cell, etc.

We wish to use the data to test the hypothesis that the two variables (site and treatment regimen) are independent (i.e. no difference in treatment regimen across sites).

The hypotheses are written as follows.


1. Set up the hypotheses

$H_0$: Site and treatment regimen are independent (no relationship between site and treatment regimen)

$H_1$: $H_0$ is false (site and treatment regimen are related)

2. Select the significance level ($\alpha = 0.05$)

3. Select the appropriate statistic

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$


4. Decision rule

To select the appropriate critical value, we first determine df = (R-1)(C-1) = (3-1)(3-1) = (2)(2) = 4

From the table, $\chi^2 = 9.49$

Reject $H_0$ if $\chi^2_{calc} \ge 9.49$ (tab); do not reject $H_0$ if $\chi^2_{calc} < 9.49$ (tab)

5. Compute the test statistic


To compute the test statistic

Note that the observed values are displayed in the cells.

Let us compute the expected values and put them in parentheses in each cell.

The expected value for each cell is the product of the row total and the column total for that cell, divided by the total number of patients in the investigation.

E.g. expected frequency of HMO and diet/exercise = (1700 x 615)/3581
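The rule "row total times column total, divided by the grand total" can be applied to every cell at once. This sketch uses a hypothetical helper `expected_counts`; it takes the UTH oral hypoglycemics count as 288, the value used in the slides' own test-statistic computation and the one consistent with the UTH row total of 772.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: HMO, UTH, IPA. Columns: diet & exercise, oral hypoglycemics, insulin.
observed = [
    [294, 827, 579],
    [132, 288, 352],  # 288 is an assumption, see the note above
    [189, 516, 404],
]

expected = expected_counts(observed)
for site, row in zip(["HMO", "UTH", "IPA"], expected):
    print(site, [round(e, 1) for e in row])
```

The expected counts in each row and column add up to the same marginal totals as the observed counts, which is a useful check on the arithmetic.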

Treatment Regimens

Observed frequencies, with expected frequencies, (row total x column total)/3581, in parentheses:

Site     Diet & exercise   Oral hypoglycemics   Insulin        Total
HMO      294 (291.96)      827 (774.3)          579 (633.8)    1700
UTH      132 (132.6)       288 (351.6)          352 (287.8)    772
IPA      189 (190.5)       516 (505.1)          404 (413.4)    1109
Total    615               1631                 1335           3581

Note: The marginal totals of observed = marginal totals of expected


Using the observed and expected frequencies, we compute the test statistic

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

$= \dfrac{(294 - 291.96)^2}{291.96} + \dfrac{(827 - 774.3)^2}{774.3} + \dfrac{(579 - 633.8)^2}{633.8}$

$+ \dfrac{(132 - 132.6)^2}{132.6} + \dfrac{(288 - 351.6)^2}{351.6} + \dfrac{(352 - 287.8)^2}{287.8}$

$+ \dfrac{(189 - 190.5)^2}{190.5} + \dfrac{(516 - 505.1)^2}{505.1} + \dfrac{(404 - 413.4)^2}{413.4}$


$\chi^2 = 0.014 + 3.590 + 4.732 + 0.003 + 11.509 + 14.320 + 0.011 + 0.235 + 0.215 = 34.629$

Conclusion. Reject $H_0$ since $34.629 > 9.49$. We have significant evidence ($\alpha = 0.05$) to show that site and treatment regimen are not independent (i.e. they are related).
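The whole test of independence can be run end to end as a check on the hand computation. This is a minimal sketch with a hypothetical function name, `chi_square_independence`; it takes the UTH oral hypoglycemics count as 288 (the value used in the test-statistic computation) and hard-codes the tabulated critical value 9.49 for df = 4.

```python
def chi_square_independence(table):
    """Return (chi2, df) for an R x C contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: HMO, UTH, IPA. Columns: diet & exercise, oral hypoglycemics, insulin.
observed = [
    [294, 827, 579],
    [132, 288, 352],  # 288 is an assumption, see the note above
    [189, 516, 404],
]

chi2, df = chi_square_independence(observed)
chi2_crit = 9.49  # tabulated value, df = 4, alpha = 0.05
reject_h0 = chi2 >= chi2_crit
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The two UTH cells for oral hypoglycemics and insulin contribute most of the statistic, which suggests where the treatment pattern differs most from what independence would predict.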
