Copyright (c) Bani K. Mallick. STAT 651 Lecture 9.

  • date post

    19-Dec-2015

Page 1: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Copyright (c) Bani K. Mallick 1

STAT 651

Lecture 9

Page 2: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Topics in Lecture #9

Comparing two population means

Output: detailed look

The t-test

Page 3: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Book Sections Covered in Lecture #9

Chapter 6.2

Page 4: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Relevant SPSS Tutorials

Transformations of Data

2-sample t-test

Paired t-test

Page 5: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

There are two populations

Take a sample from each population

The sample sizes need not be the same

Population 1: $n_1$

Population 2: $n_2$

Page 6: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

Each sample will have a sample standard deviation

Population 1: $s_1$

Population 2: $s_2$

Page 7: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

Each sample will have a sample mean

Population 1: $\bar{X}_1$

Population 2: $\bar{X}_2$

Those are the statistics. What are the parameters?

Page 8: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

Each population will have a population standard deviation

Population 1: $\sigma_1$

Population 2: $\sigma_2$

Page 9: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

Each population will have a population mean

Population 1: $\mu_1$

Population 2: $\mu_2$

Page 10: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

How do we compare the population means $\mu_1$ and $\mu_2$?

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, what is their difference?

Page 11: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Lecture 8 Review: Comparing Two Populations

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, their difference = 0

Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis that the population means are equal

Page 12: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Group Statistics: Log(Saturated Fat)

Health Status  N   Mean    Std. Deviation  Std. Error Mean
Healthy        60  2.9905  .6173           7.969E-02
Cancer         59  2.6969  .6423           8.362E-02
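The "Std. Error Mean" column can be reproduced from the other columns. The sketch below assumes the usual definition, standard deviation divided by the square root of the sample size; the dictionary layout and function name are illustrative and not part of the SPSS output.

```python
import math

# Summary statistics copied from the Group Statistics table.
groups = {
    "Healthy": {"n": 60, "mean": 2.9905, "sd": 0.6173},
    "Cancer": {"n": 59, "mean": 2.6969, "sd": 0.6423},
}


def std_error_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


for name, g in groups.items():
    print(name, round(std_error_mean(g["sd"], g["n"]), 5))

# Difference in sample means, Healthy minus Cancer.
mean_difference = groups["Healthy"]["mean"] - groups["Cancer"]["mean"]
print("difference in sample means:", round(mean_difference, 4))
```

The difference computed from the rounded table means can differ in the last digit from the value SPSS reports, since SPSS works from the unrounded data.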

Page 13: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: what the output looks like

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 14: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the variable

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 15: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: The method. If you think the variances are wildly different, try a transformation

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 16: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the p-value

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 17: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the difference in sample means

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 18: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the standard error of difference in sample means

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 19: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the 95% confidence interval

Independent Samples Test

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  0.0650        .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 20: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

The “Mean Difference” is 0.2937. Since the healthy cases had a higher mean, this is

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What is this a CI for? The difference in population mean log(saturated fat) intake between cancer cases and healthy controls:

$\mu$(Healthy) – $\mu$(Cancer)

Page 21: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

The null hypothesis of interest is that the population means are equal, i.e.,

$\mu$(Healthy) – $\mu$(Cancer) = 0

Page 22: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?

Page 23: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

[Figure: number line marking the hypothesized value 0 and the confidence interval from 0.0650 to 0.5223; the hypothesized value lies outside the interval.]

Page 24: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?

Answer: p < 0.05 since the 95% CI does not cover zero.

Page 25: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.01 or p > 0.01?

Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012, so p > 0.01 (see next slide).

Page 26: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the 95% confidence interval

Independent Samples Test

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  0.0650        .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

Page 27: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

Page 28: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

The population mean log(saturated fat) intake is greater in the Healthy cases by between 0.0650 and 0.5223 (exponentiate to get in terms of grams of saturated fat), with 95% confidence

Page 29: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Population Means: the Formulas

The data: $\bar{X}_1$, $s_1$, $n_1$ and $\bar{X}_2$, $s_2$, $n_2$

The populations: $\mu_1$, $\sigma_1$ and $\mu_2$, $\sigma_2$

The aim: CI for $\mu_1 - \mu_2$

Page 30: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

Does it matter which one you call population 1 and which one you call population 2?

Not at all. The key is to interpret the difference properly.

Page 31: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

The aim: CI for $\mu_1 - \mu_2$

This is the difference in population means

The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$

This is a random variable: it has sample to sample variability

Page 32: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

Difference of sample means: $\bar{X}_1 - \bar{X}_2$

“Population” mean from repeated sampling is $\mu_1 - \mu_2$

The s.d. from repeated sampling is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Page 33: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

Difference of sample means: $\bar{X}_1 - \bar{X}_2$

The s.d. from repeated sampling is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

You need reasonably large samples from BOTH populations
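As a rough check, the repeated-sampling s.d. can be estimated by plugging the sample standard deviations in for the population ones. The sketch below does that for the NHANES summary statistics. The degrees-of-freedom function uses the standard Welch-Satterthwaite approximation, which these slides do not state; it is included on the assumption that it is what the "Equal variances not assumed" row of the SPSS output reports. Function names are illustrative.

```python
import math


def se_unpooled(s1, n1, s2, n2):
    """Estimated s.d. of the difference in sample means,
    using sample variances in place of population variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom
    (an assumption here; not given in the slides)."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# NHANES summary statistics: Healthy (n=60), Cancer (n=59).
se = se_unpooled(0.6173, 60, 0.6423, 59)
df = welch_df(0.6173, 60, 0.6423, 59)
print(round(se, 4), round(df, 1))
```

Both values can be compared with the Std. Error Difference and df entries in the second row of the Independent Samples Test table.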

Page 34: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Page 35: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value

$$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

The number of degrees of freedom is $n_1 + n_2 - 2$
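A minimal sketch of the pooled calculation under the equal variance assumption, applied to the NHANES summary statistics (Healthy: n = 60, s = 0.6173; Cancer: n = 59, s = 0.6423). The function names are illustrative.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of the common standard deviation, s_p."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


def se_pooled(s1, n1, s2, n2):
    """Standard error of the difference in sample means: s_p * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


n1, s1 = 60, 0.6173  # Healthy
n2, s2 = 59, 0.6423  # Cancer

s_p = pooled_sd(s1, n1, s2, n2)
se = se_pooled(s1, n1, s2, n2)
df = n1 + n2 - 2

print(round(s_p, 4), round(se, 4), df)
```

The standard error and degrees of freedom can be compared with the "Equal variances assumed" row of the SPSS output (.1155 and 117).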

Page 36: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is

$$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Note how the sample sizes determine the CI length
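The interval can be reproduced from the SPSS summary values. This sketch takes the mean difference (.2937) and standard error (.1155) from the output, and uses the rounded critical value $t_{.025}(117) \approx 1.98$ quoted later in the lecture rather than computing it; the function name is illustrative.

```python
def two_sample_ci(mean_diff, std_error, t_crit):
    """CI for the difference in population means: mean_diff +/- t_crit * std_error."""
    margin = t_crit * std_error
    return mean_diff - margin, mean_diff + margin


# Values from the "Equal variances assumed" row; 1.98 is a rounded critical value.
lower, upper = two_sample_ci(0.2937, 0.1155, 1.98)
print(round(lower, 4), round(upper, 4))
```

Because the critical value is rounded, the endpoints agree with the SPSS interval only to about three decimal places.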

Page 37: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx 1$ if $n_1 = 1$, $n_2 = 99$

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.20$ if $n_1 = 50$, $n_2 = 50$

Thus, in the former case, your CI would be 5 times longer!

$$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
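The effect of the allocation can be checked directly. This sketch evaluates the factor $\sqrt{1/n_1 + 1/n_2}$ for the two splits of 100 observations mentioned above, holding $s_p$ and the critical value fixed; the function name is illustrative.

```python
import math


def allocation_factor(n1, n2):
    """The sample-size factor sqrt(1/n1 + 1/n2) in the CI half-width."""
    return math.sqrt(1 / n1 + 1 / n2)


unbalanced = allocation_factor(1, 99)
balanced = allocation_factor(50, 50)
ratio = unbalanced / balanced

print(round(unbalanced, 3), round(balanced, 3), round(ratio, 2))
```

Both splits have the same degrees of freedom (98), so the ratio of the two factors is also the ratio of the CI lengths for a given $s_p$.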

Page 38: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

The CI can of course be used to test hypotheses

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

This is the same as

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

So we just need to check whether 0 is in the interval, just as we have done
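The decision rule amounts to a single comparison. A minimal sketch, with an illustrative function name, applied to the interval endpoints from the SPSS output (6.497E-02 and .5223):

```python
def reject_equal_means(lower, upper, null_difference=0.0):
    """Reject H0 when the hypothesized difference lies outside the CI."""
    return not (lower <= null_difference <= upper)


# 95% CI for the NHANES difference, from the SPSS output.
print(reject_equal_means(0.06497, 0.5223))

# A hypothetical interval that covers 0, for contrast.
print(reject_equal_means(-0.10, 0.30))
```

A 95% interval gives the decision at the 0.05 level only; a different level needs a different interval.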

Page 39: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations: The t-test

There is something called a t-test, which gives you the information as to whether 0 is in the CI.

It does not tell you where the means lie, however, so it is of limited use. P-values tell you the same thing.

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

Page 40: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations: The t-test

The t-statistic is defined by

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Page 41: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations: The t-test

You reject equality of means if

$$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$$

In this case, is $p < \alpha$ or is $p > \alpha$?

Page 42: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations: The t-test

You reject equality of means if

$$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$$

In this case, $p < \alpha$

Page 43: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

NHANES Comparison: the t-test

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means (95% Confidence Interval of the Difference):

                             t      df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      2.543  117      .012             .2937            .1155                  6.497E-02     .5223
Equal variances not assumed  2.542  116.627  .012             .2937            .1155                  6.488E-02     .5224

$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$

$t = 2.543 > t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha = 0.05$
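The same comparison as a short sketch. The t-statistic is formed from the Mean Difference and Std. Error Difference reported by SPSS, and the critical value 1.98 is the rounded $t_{.025}(117)$ quoted above; the function name is illustrative.

```python
def t_statistic(mean_diff, std_error):
    """Two-sample t-statistic: difference in sample means over its standard error."""
    return mean_diff / std_error


t = t_statistic(0.2937, 0.1155)  # values from the SPSS output
t_crit = 1.98                    # rounded t_.025(117), as quoted in the lecture
reject = abs(t) > t_crit

print(round(t, 3), reject)
```

Since the inputs are rounded to four decimals, the computed statistic matches the SPSS value of 2.543 only to about three decimal places.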

Page 44: Copyright (c) Bani K. Mallick1 STAT 651 Lecture 9.

Comparing Two Populations

SPSS Demonstrations: bluebonnets and Framingham Heart Disease and Blood Pressure, as time permits