Comparison of Means
-
Upload
suleiman-dauda -
7/29/2019 Comparison of Means
T distribution
Student t-test
Steps in hypothesis testing concerning $\mu$
1. Set up the hypotheses and select the level of significance. Example: $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$, with $\alpha = 0.05$.
2. Select the appropriate test statistic. Example: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.
3. Generate the decision rule. Example: reject $H_0$ if $Z \ge Z_{1-\alpha}$; do not reject $H_0$ if $Z < Z_{1-\alpha}$.
4. Compute the test value.
5. Draw a conclusion about $H_0$ by comparing the test value (step 4) with the decision rule (step 3).
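Test of hypothesis concerning $\mu$
Assumptions: normal distribution or large sample ($n \ge 30$); simple random samples.
Case 1: $\sigma$ known: $\bar{x} \pm z\,(\sigma/\sqrt{n})$
Case 2: $\sigma$ unknown and $n \ge 30$: $\bar{x} \pm z\,(s/\sqrt{n})$
Case 3: $\sigma$ unknown and $n < 30$: $\bar{x} \pm t\,(s/\sqrt{n})$, df = $n - 1$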
Estimation with two samples is concerned with estimating $\mu_1 - \mu_2$, the difference in means between groups or populations.
Tests of hypotheses in the two-sample case are also concerned with the difference in the means.
E.g. $H_0: \mu_1 - \mu_2 = 0$ (no difference in means) vs $H_1: \mu_1 - \mu_2 \ne 0$ (means are different),
or $H_1: \mu_1 > \mu_2$ (the mean of population 1 is larger than the mean of population 2),
or $H_1: \mu_1 < \mu_2$ (the mean of population 1 is smaller than the mean of population 2).
General format
$\text{Test value} = \dfrac{\text{observed value} - \text{expected value}}{\text{standard error}}$
$\bar{X}_1 - \bar{X}_2$ is the observed difference, and $\mu_1 - \mu_2$ is the expected difference, which is 0 when the null hypothesis is $\mu_1 = \mu_2$, since that is equivalent to $\mu_1 - \mu_2 = 0$.
The standard error of the difference is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.
If $\sigma_1^2$ and $\sigma_2^2$ are not known, the researcher can use the variances $s_1^2$ and $s_2^2$ obtained from the respective samples, provided both sample sizes are 30 or more. The formula is then:
$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
where the numerator is the difference in means and the denominator is the standard error of the difference in means, provided $n_1 \ge 30$ and $n_2 \ge 30$.
In a comparison between two means, the same basic steps of hypothesis testing as for Z are followed.
When comparing two means using the t-test, the researcher must decide whether the two samples are independent or dependent.
Two assumptions for testing the difference between two means:
1. The samples must be independent of each other.
2. The populations from which the samples were drawn must be normally distributed.
Student t-test
1. Testing the difference between two means: independent large samples
2. Testing the difference between two means: independent small samples
3. Testing the difference between two means: small dependent samples
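Testing the difference between two means of independent large samples
General formula
Test statistic: $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
Confidence interval: $(\bar{X}_1 - \bar{X}_2) \pm Z_{1-\alpha/2}\,(\mathrm{se})$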
Example 1.
A survey found that the average hotel room rate in Zaria is N88.42 and the average room rate in Funtua is N80.61. Assume that the data were obtained from two samples of 50 hotels each and that the standard deviations were N5.62 and N4.83, respectively. At $\alpha = 0.05$, can it be concluded that there is a significant difference in the rates?
Solution
Step 1. State the hypotheses.
$H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \ne \mu_2$ (claim)
Step 2. Find the critical value.
$Z = \pm 1.96$
Step 3. Compute the test value.
$z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$; substituting:
$z = \dfrac{88.42 - 80.61}{\sqrt{5.62^2/50 + 4.83^2/50}}$
$= \dfrac{7.81}{\sqrt{31.5844/50 + 23.3289/50}}$
$= \dfrac{7.81}{\sqrt{0.6317 + 0.4666}}$
$= \dfrac{7.81}{\sqrt{1.0983}}$
$= \dfrac{7.81}{1.048}$ (note that 1.048 is the standard error)
$z = 7.45$
Step 4. Make the decision. If $|z_{calc}| > z_{tab}$, reject the null hypothesis $H_0$. Here $7.45 > 1.96$, so $H_0$ is rejected.
Step 5. Summarize the result.
There is enough evidence to support the claim that the means are not equal. Hence there is a significant difference in the rates.
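The arithmetic of Example 1 can be checked with a few lines of Python. This is only a sketch: the function name `two_sample_z` is made up for illustration, and the critical value 1.96 is taken from the slide rather than computed.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z statistic for H0: mu1 - mu2 = 0 (hypothetical helper)."""
    # Standard error of the difference in means
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se, se

# Hotel room rates: Zaria vs Funtua, 50 hotels each
z, se = two_sample_z(88.42, 5.62, 50, 80.61, 4.83, 50)

critical = 1.96                 # two-sided critical value at alpha = 0.05 (from the slide)
reject_h0 = abs(z) > critical   # two-sided decision rule

print(f"se = {se:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```

The unrounded standard error is about 1.048 and z is about 7.45, so the null hypothesis of equal mean rates is rejected.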
Fixing of confidence limits
$(\bar{X}_1 - \bar{X}_2) \pm Z_{1-\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$
$(88.42 - 80.61) \pm 1.96\sqrt{5.62^2/50 + 4.83^2/50}$
$7.81 \pm 1.96\sqrt{31.5844/50 + 23.3289/50}$
$7.81 \pm 1.96\sqrt{0.6317 + 0.4666}$
$7.81 \pm 1.96\sqrt{1.0983}$
$7.81 \pm 1.96\,(1.0480)$
$7.81 \pm 2.0540$, therefore
$7.81 - 2.0540 = 5.7560$ and $7.81 + 2.0540 = 9.8640$
CI = (5.7560, 9.8640)
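As a rough check of the interval, the same quantities can be computed without intermediate rounding. The helper name `diff_ci` and its default `z_crit=1.96` are illustrative assumptions, not part of the slides.

```python
import math

def diff_ci(mean1, sd1, n1, mean2, sd2, n2, z_crit=1.96):
    """Large-sample confidence interval for mu1 - mu2 (hypothetical helper)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    diff = mean1 - mean2                        # observed difference in means
    margin = z_crit * se                        # half-width of the interval
    return diff - margin, diff + margin

lower, upper = diff_ci(88.42, 5.62, 50, 80.61, 4.83, 50)
contains_zero = lower <= 0 <= upper

print(f"95% CI = ({lower:.3f}, {upper:.3f}), contains 0: {contains_zero}")
```

Carrying full precision gives limits of roughly 5.756 and 9.864, and the interval excludes 0.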
Using the confidence interval to test hypotheses
State the hypotheses:
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \ne 0$
Make a decision. If the CI does not contain 0, reject the null hypothesis.
The CI (5.7560, 9.8640) does not contain 0, therefore $H_0$ is rejected.
Summary. There is enough evidence that the means are not the same; there is a significant difference in mean rates.
Suppose the mean cholesterol level of males aged 50 is 241. An investigator wishes to examine whether cholesterol levels are significantly reduced by modifying diets only slightly. A random sample of 12 patients agreed to participate in the study and followed the modified diet for 3 months. After 3 months, their cholesterol levels were measured and summary statistics were produced for the n = 12 subjects. The mean cholesterol level is 235 with a standard deviation of 12.5. Based on the data, is there statistical evidence that the modified diet reduces cholesterol?
1. Set up the hypotheses
$H_0: \mu = 241$
$H_1: \mu < 241$
2. Select the appropriate test statistic
$t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$, for $n < 30$
3. Decision rule
Reject $H_0$ if $t \le -1.796$ (df = 11, $\alpha = 0.05$)
Do not reject $H_0$ if $t > -1.796$
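4. Test value
$t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Substituting the values in the formula above:
$t = \dfrac{235 - 241}{12.5/\sqrt{12}} = \dfrac{-6}{12.5/3.464} = \dfrac{-6}{3.608} = -1.66$
5. Conclusion
Since $t = -1.66 > -1.796$, do not reject $H_0$: at $\alpha = 0.05$ the data do not provide significant evidence that the modified diet reduces cholesterol.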
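A minimal sketch of the one-sample t computation for the cholesterol example follows. The function name `one_sample_t` is hypothetical, and the critical value -1.796 (df = 11, one-sided, 0.05) is copied from the decision rule rather than looked up in code.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for H0: mu = mu0 using the sample standard deviation."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

# Cholesterol example: n = 12, mean 235, sd 12.5, hypothesized mean 241
t = one_sample_t(235, 241, 12.5, 12)

t_crit = -1.796            # lower-tail critical value, df = 11, alpha = 0.05 (from the slide)
reject_h0 = t <= t_crit    # lower-tailed test: small t supports H1: mu < 241

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

Here t is about -1.66, which does not fall below -1.796, so the null hypothesis is not rejected.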
Example 4
An investigation is undertaken to examine the mean times to relief from headache pain under two entirely different treatments: medication vs relaxation treatment. Patients suffering from chronic headaches are enrolled in a study and randomly assigned to one of the two treatments under investigation. Patients are instructed to either take the assigned medication or perform the relaxation exercises at the onset of their next headache. They are also instructed to record the time in minutes until the headache pain is resolved.
Fifteen subjects are assigned to the medication treatment and report a mean time to relief of 33.8 minutes with a variance of 2.85 (minutes squared). A second random sample of 15 subjects is assigned to the relaxation treatment and reports a mean time to relief of 22.4 minutes with a variance of 3.07 (minutes squared).
The data layout is shown below.
Patients with chronic headaches are randomized to one of two groups:
Medication: $n_1 = 15$, $\bar{X}_1 = 33.8$ minutes, $S_1^2 = 2.85$
Relaxation treatment: $n_2 = 15$, $\bar{X}_2 = 22.4$ minutes, $S_2^2 = 3.07$
Are these sample means statistically significantly different? Run an appropriate test to assess whether there is a significant difference in the mean time to relief under the two different treatments, using a 5% level of significance.
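Formula
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{1/n_1 + 1/n_2}}$
where $S_p$ is the pooled standard deviation:
$S_p = \sqrt{\dfrac{\left(\sum X_1^2 - (\sum X_1)^2/n_1\right) + \left(\sum X_2^2 - (\sum X_2)^2/n_2\right)}{(n_1 - 1) + (n_2 - 1)}}$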
Substituting in the formula, with $S_p = \sqrt{\dfrac{14(2.85) + 14(3.07)}{28}} = \sqrt{2.96} = 1.72$:
$t = \dfrac{33.8 - 22.4}{1.72\sqrt{1/15 + 1/15}} = \dfrac{11.4}{0.63} = 18.10$
$t = 18.10 > 2.048$ ($t_{0.05}$, df = $n_1 + n_2 - 2 = 28$)
Reject $H_0$: there is significant evidence of a difference in the mean relief time between medication and relaxation therapy.
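The pooled t statistic for the headache example can be sketched as below. Since only summary statistics are given, the pooled variance is computed from the two sample variances, which is equivalent to pooling the sums of squares. The name `pooled_t` is an illustrative assumption.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic with a pooled standard deviation (hypothetical helper)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, sp, df

# Medication vs relaxation: 15 subjects each
t, sp, df = pooled_t(33.8, 2.85, 15, 22.4, 3.07, 15)

print(f"Sp = {sp:.2f}, df = {df}, t = {t:.2f}")
```

Without rounding the standard error to 0.63, t comes out near 18.15 rather than 18.10; either value is far beyond the critical value for df = 28.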
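Two dependent populations
Attributes: samples are matched or paired, n (number of pairs) $\ge 30$
Test statistic: $Z = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$
Confidence interval: $\bar{d} \pm Z_{1-\alpha/2}\,S_d/\sqrt{n}$
Attributes: samples are matched or paired, n (number of pairs) $< 30$
Test statistic: $t = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$, df = $n - 1$
Confidence interval: $\bar{d} \pm t_{1-\alpha/2}\,S_d/\sqrt{n}$, df = $n - 1$
where $\bar{d}$ and $S_d$ are the mean and standard deviation of the difference scores.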
Example
A nutrition expert is examining a weight loss programme to evaluate its effectiveness. Ten subjects were randomly selected for the investigation. Each subject's initial weight is recorded, the subjects follow the programme for six weeks, and they are weighed again. The data are given below:
Subject   Initial weight   Final weight
1         180              165
2         142              138
3         126              128
4         138              136
5         175              170
6         205              197
7         116              115
8         142              128
9         157              144
10        136              130
Subject   Initial weight   Final weight   Difference (d)   Difference squared (d^2)
1         180              165            15               225
2         142              138            4                16
3         126              128            -2               4
4         138              136            2                4
5         175              170            5                25
6         205              197            8                64
7         116              115            1                1
8         142              128            14               196
9         157              144            13               169
10        136              130            6                36
$\sum d = 66$, $\sum d^2 = 740$
$\bar{d} = \sum d / n = 66/10 = 6.6$
$S_d^2 = \dfrac{\sum d^2 - (\sum d)^2/n}{n - 1} = \dfrac{740 - (66)^2/10}{9} = 33.82$
$S_d = \sqrt{33.82} = 5.82$
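The summary quantities for the difference scores can be reproduced from the raw weights. The sketch below uses the computational formula shown above and cross-checks it against `statistics.stdev` from the Python standard library; the variable names are illustrative.

```python
import math
import statistics

initial = [180, 142, 126, 138, 175, 205, 116, 142, 157, 136]
final = [165, 138, 128, 136, 170, 197, 115, 128, 144, 130]

# Difference scores: initial weight minus final weight
d = [i - f for i, f in zip(initial, final)]
n = len(d)

sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

d_bar = sum_d / n
var_d = (sum_d2 - sum_d**2 / n) / (n - 1)   # computational formula for the sample variance
sd_d = math.sqrt(var_d)

# The library routine should agree with the hand formula
assert math.isclose(sd_d, statistics.stdev(d))

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}, mean = {d_bar}, variance = {var_d:.2f}, sd = {sd_d:.2f}")
```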
Test the hypothesis
1. Set up the hypotheses
$H_0: \mu_d = 0$
$H_1: \mu_d \ne 0$
2. Select the appropriate statistic
$t = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$
$t = \dfrac{6.6 - 0}{5.82/\sqrt{10}} = 3.59$
df = $n - 1 = 10 - 1 = 9$
$t_{calc} = 3.59$, $t_{tab}$ (0.05, df = 9) = 2.262
$t_{calc} > t_{tab}$: reject $H_0$.
We have significant evidence, at the 5% level, that there is a mean weight loss following the six-week programme.
Fixing of confidence interval
Recall $\bar{d} \pm t_{1-\alpha/2}\,S_d/\sqrt{n}$, with $\bar{d} = 6.6$, $t_{1-\alpha/2} = 2.262$, $S_d = 5.82$, $\sqrt{10} = 3.162278$:
$6.6 \pm 2.262 \times 5.82/3.162278$
$6.6 \pm 2.262 \times 1.840$
$6.6 \pm 4.16$
$6.6 + 4.16 = 10.76$
$6.6 - 4.16 = 2.44$
The 95% confidence interval (2.44, 10.76) does not contain 0, so $H_0$ is rejected: at the 5% level there is significant evidence of a mean weight loss after six weeks, in agreement with the t-test.
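For the paired example, the test statistic and the confidence interval both rest on the standard error $S_d/\sqrt{n}$ (a division, not a multiplication). A short sketch, assuming the tabulated critical value 2.262 for df = 9:

```python
import math
import statistics

initial = [180, 142, 126, 138, 175, 205, 116, 142, 157, 136]
final = [165, 138, 128, 136, 170, 197, 115, 128, 144, 130]
d = [i - f for i, f in zip(initial, final)]   # difference scores

n = len(d)
d_bar = statistics.mean(d)
sd_d = statistics.stdev(d)        # sample standard deviation of the differences
se = sd_d / math.sqrt(n)          # standard error of the mean difference

t = (d_bar - 0) / se              # H0: mean difference is 0
t_crit = 2.262                    # two-sided, alpha = 0.05, df = 9 (from the slide)

margin = t_crit * se
ci = (d_bar - margin, d_bar + margin)

print(f"t = {t:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0:", abs(t) > t_crit, "| CI excludes 0:", not (ci[0] <= 0 <= ci[1]))
```

The test and the interval necessarily agree: a two-sided t-test at level 0.05 rejects exactly when the 95% interval excludes 0.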
Chi-square table
Chi-Square Analysis
1. Goodness of fit test
2. Test of independence
3. Test of heterogeneity
Used to test hypotheses about multi-variable data in one-sample and two-or-more-sample applications.
In all of these tests, the test statistic follows the chi-square distribution ($\chi^2$).
Goodness of fit test
Test statistic
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where O = observed frequency and E = expected frequency.
E.g. Volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are drug-related, 25% are sex-related, 25% are stress-related and 10% concern educational issues.
For this investigation, each call is classified into one category based on the primary issue raised by the caller. To test the hypothesis, the following data are collected from 120 randomly selected calls placed to the teen hotline. Based on the data, is the assumption regarding the distribution appropriate?
Topical issue     Drugs   Sex   Stress   Education
Number of calls   52      38    21       9
1. Set up the hypotheses
$H_0: p_1 = 0.40,\ p_2 = 0.25,\ p_3 = 0.25,\ p_4 = 0.10$
$H_1$: $H_0$ is false
2. Select the appropriate statistic
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
3. Select the level of significance
$\alpha = 0.05$; here we determine df = number of categories - 1 = 4 - 1 = 3
$\chi^2 = 7.815$ from the table at df = 3 and significance level 0.05
4. Decision rule
Reject $H_0$ if $\chi^2 \ge 7.815$
Do not reject $H_0$ if $\chi^2 < 7.815$
5. Compute the test statistic
Organized computations of the test statistic
Topical issue            Drugs              Sex                Stress              Education          Total
O (observed frequency)   52                 38                 21                  9                  120
E (expected frequency)   120(0.40) = 48     120(0.25) = 30     120(0.25) = 30      120(0.10) = 12     120
(O - E)                  4                  8                  -9                  -3                 0
(O - E)^2/E              (4)^2/48 = 0.333   (8)^2/30 = 2.133   (-9)^2/30 = 2.700   (-3)^2/12 = 0.750  5.917
The test statistic is $\chi^2 = 5.917$.
Conclusions
Do not reject $H_0$ since $5.917 < 7.815$.
We do not have significant evidence, at $\alpha = 0.05$, to show that the distribution of topical issues in the calls placed to the teen hotline is not as assumed (40% drug-related, 25% sex-related, 25% stress-related and 10% education-related).
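The goodness-of-fit computation for the hotline example can be sketched as follows. The dictionary names are illustrative, and the critical value 7.815 (df = 3, 0.05) is taken from the chi-square table rather than computed.

```python
observed = {"Drugs": 52, "Sex": 38, "Stress": 21, "Education": 9}
hypothesized = {"Drugs": 0.40, "Sex": 0.25, "Stress": 0.25, "Education": 0.10}

n = sum(observed.values())                                   # 120 calls in total
expected = {k: n * p for k, p in hypothesized.items()}       # E = n * p for each category

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1
critical = 7.815            # from the chi-square table, df = 3, alpha = 0.05
reject_h0 = chi2 >= critical

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic is about 5.917, below 7.815, so the assumed distribution of call topics is not rejected.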
Test of Independence
This considers applications involving two or more samples or two categorical variables. Our interest is to evaluate whether these two categorical variables are related (dependent/associated) or unrelated (independent/not associated). The following example illustrates the use of the $\chi^2$ test of independence.
Example.
The following data were collected in a multi-site study of medical effectiveness in type II diabetes. Three sites were involved in the study: a health maintenance organization (HMO), a university teaching hospital (UTH), and an independent practice association (IPA). Type II diabetes patients were enrolled in the study from each site and monitored over a three-year period. The data below show the treatment regimens of patients by site.
Treatment Regimens
Site    Diet & exercise   Oral hypoglycemics   Insulin   Total
HMO     294               827                  579       1700
UTH     132               288                  352       772
IPA     189               516                  404       1109
Total   615               1631                 1335      3581
The table above is a 3 x 3 cross-tabulation table, or contingency table.
Both site and treatment regimen are categorical variables.
Site is called the row variable and treatment regimen is called the column variable.
The number of rows in the table is denoted R and the number of columns in the table is denoted C.
In this table, R = 3 and C = 3.
The row and column totals are shown on the right and bottom of the table, respectively.
The 9 combinations of site and treatment regimen are called the cells of the table.
E.g. patients in the HMO treated by diet and exercise form one cell of the table, patients in the HMO treated by oral hypoglycemics form another cell, etc.
We wish to use the data to test the hypothesis that the two variables (site and treatment regimen) are independent (i.e. there is no difference in treatment regimen across sites).
The hypotheses are written as follows.
1. Set up the hypotheses
$H_0$: Site and treatment regimen are independent (no relationship between site and treatment regimen)
$H_1$: $H_0$ is false (site and treatment regimen are related)
2. Select the significance level ($\alpha = 0.05$)
3. Select the appropriate statistic
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
4. Decision rule
To select the appropriate critical value, we first determine df = (R - 1)(C - 1) = (3 - 1)(3 - 1) = (2)(2) = 4.
From the table, $\chi^2 = 9.49$.
Reject $H_0$ if $\chi^2_{calc} \ge 9.49$; do not reject $H_0$ if $\chi^2_{calc} < 9.49$.
5. Compute the test statistic
To compute the test statistic
Note that the observed values are displayed in the cells.
Let us compute the expected values and put them in parentheses in each cell.
The expected value for each cell is computed as the product of the row total and the column total for that cell, divided by the total number of patients in the investigation.
E.g. the expected frequency for HMO and diet & exercise = (1700 x 615)/3581 = 292.0.
Treatment Regimens
Observed frequencies, with expected frequencies = (row total x column total)/3581 in parentheses.
Site    Diet & exercise   Oral hypoglycemics   Insulin         Total
HMO     294 (292.0)       827 (774.3)          579 (633.8)     1700
UTH     132 (132.6)       288 (351.6)          352 (287.8)     772
IPA     189 (190.5)       516 (505.1)          404 (413.4)     1109
Total   615               1631                 1335            3581
Note: The marginal totals of the observed frequencies equal the marginal totals of the expected frequencies.
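The expected frequencies can be generated directly from the marginal totals. This sketch takes the UTH oral-hypoglycemics count as 288, which is an assumption: it is the value that agrees with the UTH row total of 772, the grand total of 3581, and the term (288 - 351.6) used in the test-statistic computation.

```python
# Observed counts: rows = HMO, UTH, IPA; columns = diet & exercise, oral hypoglycemics, insulin
# (UTH oral-hypoglycemics count assumed to be 288, consistent with the row total 772)
observed = [
    [294, 827, 579],
    [132, 288, 352],
    [189, 516, 404],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for site, row in zip(["HMO", "UTH", "IPA"], expected):
    print(site, [round(e, 1) for e in row])
```

Each row of expected counts sums to the observed row total, which is a quick way to spot arithmetic slips in a hand-built table.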
Using the observed and expected frequencies, we compute the test statistic
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
$= \dfrac{(294 - 292.0)^2}{292.0} + \dfrac{(827 - 774.3)^2}{774.3} + \dfrac{(579 - 633.8)^2}{633.8}$
$+ \dfrac{(132 - 132.6)^2}{132.6} + \dfrac{(288 - 351.6)^2}{351.6} + \dfrac{(352 - 287.8)^2}{287.8}$
$+ \dfrac{(189 - 190.5)^2}{190.5} + \dfrac{(516 - 505.1)^2}{505.1} + \dfrac{(404 - 413.4)^2}{413.4}$
Evaluating each term with the unrounded expected frequencies:
$\chi^2 = 0.014 + 3.590 + 4.732 + 0.003 + 11.509 + 14.320 + 0.011 + 0.235 + 0.215$
$= 34.629$
Conclusion. Reject $H_0$ since $34.629 > 9.49$. We have significant evidence ($\alpha = 0.05$) to show that site and treatment regimen are not independent (i.e. they are related).
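The whole test of independence can be condensed into one small function. This is a sketch, not a library routine: `chi_square_independence` is a hypothetical name, the UTH oral-hypoglycemics count is assumed to be 288 (the value consistent with the row total 772 and the computation above), and 9.49 is the tabulated critical value for df = 4.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total   # expected count under H0
            chi2 += (obs - exp) ** 2 / exp

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: HMO, UTH, IPA; columns: diet & exercise, oral hypoglycemics, insulin
observed = [
    [294, 827, 579],
    [132, 288, 352],
    [189, 516, 404],
]

chi2, df = chi_square_independence(observed)
critical = 9.49                      # chi-square table, df = 4, alpha = 0.05
reject_h0 = chi2 >= critical

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic comes out near 34.63 with df = 4, well above 9.49, matching the conclusion that site and treatment regimen are related.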