Copyright (c) Bani K. Mallick 1
STAT 651
Lecture 9

Topics in Lecture #9
Comparing two population means
Output: detailed look
The t-test

Book Sections Covered in Lecture #9
Chapter 6.2

Relevant SPSS Tutorials
Transformations of Data
2-sample t-test
Paired t-test

Lecture 8 Review: Comparing Two Populations
There are two populations
Take a sample from each population
The sample sizes need not be the same
Population 1: $n_1$
Population 2: $n_2$

Lecture 8 Review: Comparing Two Populations
Each sample will have a sample standard deviation
Population 1: $s_1$
Population 2: $s_2$

Lecture 8 Review: Comparing Two Populations
Each sample will have a sample mean
Population 1: $\bar{X}_1$
Population 2: $\bar{X}_2$
Those are the statistics. What are the parameters?

Lecture 8 Review: Comparing Two Populations
Each population will have a population standard deviation
Population 1: $\sigma_1$
Population 2: $\sigma_2$

Lecture 8 Review: Comparing Two Populations
Each population will have a population mean
Population 1: $\mu_1$
Population 2: $\mu_2$

Lecture 8 Review: Comparing Two Populations
How do we compare the population means $\mu_1$ and $\mu_2$?
The usual way is to take their difference: $\mu_1 - \mu_2$
If the population means are equal, what is their difference?

Lecture 8 Review: Comparing Two Populations
The usual way is to take their difference: $\mu_1 - \mu_2$
If the population means are equal, their difference = 0
Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis

NHANES Comparison
Group Statistics: Log(Saturated Fat)

Health Status   N    Mean     Std. Deviation   Std. Error Mean
Healthy         60   2.9905   .6173            7.969E-02
Cancer          59   2.6969   .6423            8.362E-02
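
The "Std. Error Mean" column can be checked by hand, since the standard error of a sample mean is s / sqrt(n). A minimal sketch, using only the N and Std. Deviation values printed in the table (the dictionary layout and function name are illustrative):

```python
import math

# Summary statistics copied from the Group Statistics table.
groups = {
    "Healthy": {"n": 60, "mean": 2.9905, "sd": 0.6173},
    "Cancer": {"n": 59, "mean": 2.6969, "sd": 0.6423},
}


def std_error_of_mean(sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


se = {name: std_error_of_mean(g["sd"], g["n"]) for name, g in groups.items()}
for name, value in se.items():
    print(f"{name}: SE of mean = {value:.5f}")
```

The two values should agree with 7.969E-02 and 8.362E-02 up to rounding.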

NHANES Comparison: what the output looks like
Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means
                                                                                        95% Confidence Interval of the Difference
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower       Upper
Equal variances assumed       2.543   117       .012              .2937             .1155                   6.497E-02   .5223
Equal variances not assumed   2.542   116.627   .012              .2937             .1155                   6.488E-02   .5224

NHANES Comparison: the variable
In the Independent Samples Test output, the row label names the variable being compared: Log(Saturated Fat)

NHANES Comparison: the method
The two rows of the output correspond to the two methods: "Equal variances assumed" and "Equal variances not assumed"
If you think the variances are wildly different, try a transformation
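
The slides do not say how SPSS computes the "Equal variances not assumed" row. A common choice is the Welch-Satterthwaite approximation, and the sketch below assumes that this is what is being used; it starts from the Group Statistics values and compares the result with the printed t = 2.542 and df = 116.627. Variable names are illustrative.

```python
import math

# Summary statistics from the Group Statistics table.
n1, s1 = 60, 0.6173  # Healthy
n2, s2 = 59, 0.6423  # Cancer
mean_diff = 2.9905 - 2.6969

# Estimated variance of each sample mean.
v1 = s1**2 / n1
v2 = s2**2 / n2

# Unpooled standard error of the difference in sample means.
se_unpooled = math.sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom (assumed method).
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_unpooled = mean_diff / se_unpooled

print(f"SE = {se_unpooled:.4f}, df = {welch_df:.3f}, t = {t_unpooled:.3f}")
```

Here the two sample standard deviations are so close that the two rows of the output are nearly identical.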

NHANES Comparison: the p-value
Sig. (2-tailed) = .012 in both rows of the Independent Samples Test output

NHANES Comparison: the difference in sample means
Mean Difference = .2937

NHANES Comparison: the standard error of the difference in sample means
Std. Error Difference = .1155

NHANES Comparison: the 95% confidence interval
95% Confidence Interval of the Difference, equal variances assumed: Lower = 6.497E-02 (that is, 0.0650), Upper = .5223
Equal variances not assumed: Lower = 6.488E-02, Upper = .5224

NHANES Comparison
The "Mean Difference" is 0.2937. Since the healthy group had the higher sample mean, this is
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223 (the SPSS lower limit is 6.497E-02)
What is this a CI for? The difference in population mean log(saturated fat) intake between healthy controls and cancer cases:
$\mu_{\text{Healthy}} - \mu_{\text{Cancer}}$

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
The null hypothesis of interest is that the population means are equal, i.e.,
$\mu_{\text{Healthy}} - \mu_{\text{Cancer}} = 0$

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.05 or p > 0.05?

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
[Figure: number line with the hypothesized value 0 lying to the left of the confidence interval, which runs from 0.0650 to 0.5223]

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.05 or p > 0.05?
Answer: p < 0.05 since the 95% CI does not cover zero.
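
The reasoning on this slide (a 95% CI that excludes the hypothesized value corresponds to p < 0.05) can be written as a one-line check. A minimal sketch, taking the interval endpoints as printed in the SPSS output (6.497E-02 and .5223); the function name is illustrative:

```python
def rejects_null(ci_lower, ci_upper, hypothesized=0.0):
    """Return True when the hypothesized value lies outside the interval.

    For a (1 - alpha) confidence interval, this matches rejecting the
    two-sided null hypothesis at level alpha.
    """
    return not (ci_lower <= hypothesized <= ci_upper)


# 95% CI for mu(Healthy) - mu(Cancer), equal variances assumed.
ci = (0.06497, 0.5223)
decision = rejects_null(*ci)
print("Reject equality of means at alpha = 0.05:", decision)
```

An interval that straddles zero, such as (-0.1, 0.3), would give False, meaning p > 0.05.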

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
Is the p-value p < 0.01 or p > 0.01?
Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012, so p > 0.01. (see next slide)

NHANES Comparison: the 95% confidence interval
Independent Samples Test, equal variances assumed: Sig. (2-tailed) = .012
95% Confidence Interval of the Difference: Lower = 6.497E-02 (that is, 0.0650), Upper = .5223

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
What do we conclude from this confidence interval?

NHANES Comparison
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.0650 to 0.5223
What do we conclude from this confidence interval?
The population mean log(saturated fat) intake is greater in the healthy group by between 0.0650 and 0.5223, with 95% confidence (exponentiate the endpoints to express this on the original saturated fat scale, where a difference of logs becomes a ratio)
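
To see what exponentiating does here, note that a difference of logs turns into a ratio on the original scale. A small sketch, assuming the log is the natural log (the slides do not state the base) and using the SPSS endpoints 6.497E-02 and .5223:

```python
import math

# 95% CI for mu(Healthy) - mu(Cancer) on the log(saturated fat) scale.
lower, upper = 0.06497, 0.5223

# Assumption: natural log. exp(difference of logs) = ratio on the raw scale.
ratio_lower = math.exp(lower)
ratio_upper = math.exp(upper)

print(f"Healthy / Cancer ratio: {ratio_lower:.3f} to {ratio_upper:.3f}")
```

Under that assumption, the interval says typical saturated fat intake in the healthy group is roughly 7% to 69% higher than in the cancer group, in the sense of a ratio of geometric means.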

Comparing Two Population Means: the Formulas
The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$
The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$
The aim: CI for $\mu_1 - \mu_2$

Comparing Two Populations
Does it matter which one you call population 1 and which one you call population 2?
Not at all. The key is to interpret the difference properly.

Comparing Two Populations
The aim: CI for $\mu_1 - \mu_2$
This is the difference in population means
The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$
This is a random variable: it has sample-to-sample variability

Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
"Population" mean from repeated sampling is $\mu_1 - \mu_2$
The s.d. from repeated sampling is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
The s.d. from repeated sampling is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
You need reasonably large samples from BOTH populations
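
"Repeated sampling" can be made concrete with a small simulation. The sketch below draws many pairs of samples from two hypothetical normal populations (the means, standard deviations and sample sizes are invented for illustration, not taken from NHANES) and compares the observed mean and s.d. of the difference in sample means with the formulas above.

```python
import math
import random
import statistics


def simulate_difference(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Mean and s.d. of (xbar1 - xbar2) over `reps` repeated samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Hypothetical populations, chosen only for illustration.
mu1, sigma1, n1 = 3.0, 0.6, 20
mu2, sigma2, n2 = 2.7, 0.9, 30

theory_mean = mu1 - mu2
theory_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
sim_mean, sim_sd = simulate_difference(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000)

print(f"mean: theory {theory_mean:.3f}, simulated {sim_mean:.3f}")
print(f"s.d.: theory {theory_sd:.3f}, simulated {sim_sd:.3f}")
```

The simulated values should land close to the theoretical ones, with the gap shrinking as the number of repetitions grows.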

Comparing Two Populations
If you can reasonably believe that the population sd's are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
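
As a worked instance of the pooled standard deviation, the sketch below plugs in the NHANES group statistics (n = 60, s = .6173 for Healthy; n = 59, s = .6423 for Cancer). The function name is illustrative.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of a common standard deviation."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


s_p = pooled_sd(60, 0.6173, 59, 0.6423)
print(f"s_p = {s_p:.4f}")
```

The result falls between the two sample standard deviations, as a weighted combination of the two variances should.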

Comparing Two Populations
The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value
$s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
The number of degrees of freedom is
$n_1 + n_2 - 2$
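
The "Std. Error Difference" (.1155) and df (117) in the SPSS output can be recovered from these two formulas. A minimal sketch using the NHANES group statistics; helper names are illustrative.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of a common standard deviation."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_standard_error(n1, s1, n2, s2):
    """Standard error of xbar1 - xbar2 under the equal variance assumption."""
    return pooled_sd(n1, s1, n2, s2) * math.sqrt(1 / n1 + 1 / n2)


n1, s1 = 60, 0.6173  # Healthy
n2, s2 = 59, 0.6423  # Cancer

se_diff = pooled_standard_error(n1, s1, n2, s2)
df = n1 + n2 - 2
print(f"SE of difference = {se_diff:.4f}, df = {df}")
```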

Comparing Two Populations
A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is
$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Note how the sample sizes determine the CI length
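
Putting the pieces together gives the interval reported by SPSS. The sketch below uses the NHANES group statistics and takes the critical value as approximately 1.98, the rounded t_.025(117) quoted later in the lecture, so the endpoints match the output only to about three decimals. Function and variable names are illustrative.

```python
import math


def pooled_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 under the equal variance assumption.

    t_crit is the critical value t_{alpha/2}(n1 + n2 - 2), supplied by the
    caller (for example from a t table).
    """
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    half_width = t_crit * s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Healthy vs Cancer, log(saturated fat); 1.98 is a rounded table value.
lower, upper = pooled_ci(2.9905, 0.6173, 60, 2.6969, 0.6423, 59, t_crit=1.98)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

The half-width is the only place the sample sizes enter, through both the degrees of freedom and the factor sqrt(1/n1 + 1/n2).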

Comparing Two Populations
Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100, and recall the CI
$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
$\sqrt{1/n_1 + 1/n_2} \approx 1$ if $n_1 = 1$, $n_2 = 99$
$\sqrt{1/n_1 + 1/n_2} = 0.20$ if $n_1 = 50$, $n_2 = 50$
Thus, in the former case, your CI would be about 5 times longer!
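
The effect of unbalanced sample sizes is easy to tabulate. This short sketch evaluates the factor sqrt(1/n1 + 1/n2) for several ways of splitting a total of 100 observations; the particular splits other than 1/99 and 50/50 are chosen only for illustration.

```python
import math


def size_factor(n1, n2):
    """The factor sqrt(1/n1 + 1/n2) that scales the CI half-width."""
    return math.sqrt(1 / n1 + 1 / n2)


total = 100
for n1 in (1, 5, 10, 25, 50):
    n2 = total - n1
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: factor = {size_factor(n1, n2):.3f}")

ratio = size_factor(1, 99) / size_factor(50, 50)
print(f"1/99 split vs 50/50 split: {ratio:.2f} times longer")
```

The factor is smallest at the balanced split and grows quickly only when one group becomes very small.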

Comparing Two Populations
The CI can of course be used to test hypotheses
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
This is the same as
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
So we just need to check whether 0 is in the interval, just as we have done

Comparing Two Populations: The t-test
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
There is something called a t-test, which tells you whether 0 is in the CI.
It does not tell you where the means lie, however, so it is of limited use. P-values tell you the same thing.

Comparing Two Populations: The t-test
The t-statistic is defined by
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
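
As a check on the definition, the t-statistic can be computed directly from the NHANES group statistics and compared with the t = 2.543 in the SPSS output. A minimal sketch with an illustrative function name; small differences in the last digit come from the rounded inputs.

```python
import math


def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t-statistic under the equal variance assumption."""
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se_diff


# Healthy (population 1) vs Cancer (population 2), log(saturated fat).
t_stat = pooled_t_statistic(2.9905, 0.6173, 60, 2.6969, 0.6423, 59)
print(f"t = {t_stat:.3f}")
```

Swapping which group is called population 1 flips the sign of t but not its absolute value.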

Comparing Two Populations: The t-test
You reject equality of means if
$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$
In this case, is $p < \alpha$ or is $p > \alpha$?

Comparing Two Populations: The t-test
You reject equality of means if
$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$
In this case, $p < \alpha$
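
The link between |t| and the p-value can be checked numerically. The sketch below is one way to get a two-sided p-value using only the standard library: it integrates the Student t density with Simpson's rule. This is an illustrative approach, not how SPSS computes its "Sig. (2-tailed)" column; the function names are made up for the example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t, df, steps=2000):
    """P(|T| > |t|), integrating the density on [0, |t|] by Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1.0 - 2.0 * area_0_to_t


alpha = 0.05
p = two_sided_p_value(2.543, 117)
print(f"p = {p:.4f}; p < alpha: {p < alpha}")
```

With t = 2.543 and 117 degrees of freedom the p-value should round to the .012 shown in the SPSS output, and a t-statistic sitting at the critical value of about 1.98 gives p close to 0.05.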

NHANES Comparison: the t-test
Independent Samples Test for Log(Saturated Fat), equal variances assumed: t = 2.543, df = 117, Sig. (2-tailed) = .012
$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$
$t = 2.543 > t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha = 0.05$

Comparing Two Populations
SPSS Demonstrations: bluebonnets and Framingham Heart Disease and Blood Pressure, as time permits