Copyright ©2011 Brooks/Cole, Cengage Learning Testing Hypotheses about Means Chapter 13.


Testing Hypotheses about Means

Chapter 13

Hypothesis testing about:

• a population mean

• the difference between means of two populations

Three Cautions:

1. Inference is only valid if the sample is representative of the population for the question of interest.

2. Hypotheses and conclusions apply to the larger population(s) represented by the sample(s).

3. If the distribution of a quantitative variable is highly skewed, consider analyzing the median rather than the mean – called nonparametric methods (Topic 2 on CD).

13.1 Introduction to Hypothesis Tests for Means

Steps in Any Hypothesis Test

1. Determine the null and alternative hypotheses.

2. Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic.

3. Assuming the null hypothesis is true, find the p-value.

4. Decide whether or not the result is statistically significant based on the p-value.

5. Report the conclusion in the context of the situation.

13.2: Testing Hypotheses about One Mean

Step 1: Determine null and alternative hypotheses

1. H0: μ = μ0 versus Ha: μ ≠ μ0 (two-sided)

2. H0: μ = μ0 versus Ha: μ < μ0 (one-sided)

3. H0: μ = μ0 versus Ha: μ > μ0 (one-sided)

Remember a p-value is computed assuming H0 is true, and μ0 is the value used for that computation.

Step 2: Verify Necessary Data Conditions …

Situation 1: Population of measurements of interest is approximately normal, and a random sample of any size is measured. In practice, use the method if the shape is not notably skewed and there are no extreme outliers.

Situation 2: Population of measurements of interest is not approximately normal, but a large random sample (n ≥ 30) is measured. If there are extreme outliers or extreme skewness, it is better to have a larger sample.

Continuing Step 2: The Test Statistic

The t-statistic is a standardized score for measuring the difference between the sample mean and the null hypothesis value of the population mean:

$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

This t-statistic has (approx) a t-distribution with df = n – 1.

Step 3: Assuming H0 true, Find the p-value

• For Ha less than, the p-value is the area below t, even if t is positive.

• For Ha greater than, the p-value is the area above t, even if t is negative.

• For Ha two-sided, the p-value is 2 × area above |t|.
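The three p-value rules for Step 3 can be sketched in Python. This is a minimal illustration using only the standard library: the tail area of the t-distribution is approximated with Simpson's rule, and the function names (`t_pdf`, `area_above`, `p_value`) and the `alternative` labels are assumptions made for this sketch. In practice the p-value would come from statistical software or Table A.3.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def area_above(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t|
    return 0.5 - central if t > 0 else 0.5 + central

def p_value(t, df, alternative):
    """alternative is 'less', 'greater', or 'two-sided' (labels chosen for this sketch)."""
    if alternative == "less":
        return 1.0 - area_above(t, df)       # area below t
    if alternative == "greater":
        return area_above(t, df)             # area above t
    if alternative == "two-sided":
        return 2.0 * area_above(abs(t), df)  # 2 x area above |t|
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Hypothetical illustration: a positive t with a 'less than' alternative
# still uses the area below t, so the p-value is larger than 0.5.
print(p_value(1.2, 10, "less"))
```

The numerical integration is only a stand-in for a table lookup. It shows that the choice of tail depends on the alternative hypothesis, not on the sign of t.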

Steps 4 and 5: Decide Whether or Not the Result is Statistically Significant based on the p-value and Report the Conclusion in the Context of the Situation

These two steps remain the same for all of the hypothesis tests considered in this discussion.

Choose a level of significance α, and reject H0 if the p-value is less than (or equal to) α.

Otherwise, conclude that there is not enough evidence to support the alternative hypothesis.
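The decision rule for Step 4 is simple enough to write as a short Python sketch. The function names and the wording of the returned messages are assumptions for illustration, and the two p-values passed in are arbitrary inputs chosen to show both outcomes.

```python
def is_significant(p_value, alpha=0.05):
    """Step 4: reject H0 when the p-value is less than or equal to alpha."""
    return p_value <= alpha

def report(p_value, alpha=0.05):
    """Step 5 still needs a conclusion in context; this only states the decision."""
    if is_significant(p_value, alpha):
        return "Reject H0: the result is statistically significant."
    return "Do not reject H0: not enough evidence to support Ha."

print(report(0.003))
print(report(0.11))
```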

Example 13.1 Normal Body Temperature

What is normal body temperature? Is it actually less than 98.6 degrees Fahrenheit (on average)?

Step 1: State the null and alternative hypotheses

H0: μ = 98.6

Ha: μ < 98.6

where μ = mean body temperature in human population.

Example 13.1 Normal Body Temp (cont)

Step 2: Verify data conditions …

Data: random sample of n = 16 normal body temps

98.4, 98.6, 98.8, 98.8, 98.0, 97.9, 98.5, 97.6, 98.4, 98.3, 98.9, 98.1, 97.3, 97.8, 98.4, 97.4

Boxplot shows no outliers nor strong skewness.

Sample mean of 98.2 is close to sample median of 98.35.

Example 13.1 Normal Body Temp (cont)

Step 2: … Summarizing data with a test statistic

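Key elements:

Sample statistic: x̄ = 98.200 (under “Mean”)

Standard error (under “SE Mean”):

$$\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.497}{\sqrt{16}} = 0.124$$

Test statistic (under “T”):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{98.2 - 98.6}{0.124} = -3.22$$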
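As a check on the Step 2 arithmetic, the key elements can be recomputed from the 16 temperatures listed for Example 13.1. This is a minimal Python sketch using the standard `statistics` module. The variable names are assumptions, and the slide's own numbers come from software output rather than from this code.

```python
import math
import statistics

# The 16 body temperatures from Example 13.1
temps = [98.4, 98.6, 98.8, 98.8, 98.0, 97.9, 98.5, 97.6,
         98.4, 98.3, 98.9, 98.1, 97.3, 97.8, 98.4, 97.4]
mu0 = 98.6  # null value from H0

n = len(temps)
x_bar = statistics.mean(temps)   # sample mean
s = statistics.stdev(temps)      # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)            # standard error of the mean
t = (x_bar - mu0) / se           # t-statistic
df = n - 1                       # degrees of freedom

print(f"mean={x_bar:.3f}  s={s:.3f}  se={se:.3f}  t={t:.2f}  df={df}")
```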

Example 13.1 Normal Body Temp (cont)

Step 3: Find the p-value

From output: p-value = 0.003

From Table A.3: p-value is less than 0.004.

Example 13.1 Normal Body Temp (cont)

Step 4: Decide whether or not the result is statistically significant based on the p-value

Using α = 0.05 as the level of significance criterion, the results are statistically significant because 0.003, the p-value of the test, is less than 0.05. In other words, we can reject the null hypothesis.

Step 5: Report the Conclusion

We can conclude, based on these data, that the mean temperature in the human population is actually less than 98.6 degrees.

Rejection Region Approach (Optional)

Replaces Steps 3 and 4 with:

Substitute Step 3: Find the critical value and rejection region for the test.

Substitute Step 4: If the test statistic is in the rejection region, conclude that the result is statistically significant and reject the null hypothesis. Otherwise, do not reject the null hypothesis.

Note: Rejection region method and p-value method will always arrive at the same conclusion about statistical significance.

Rejection Region Approach

Summary (use row of Table A.2 corresponding to df)

For Example 13.1 Normal Body Temperature:

Alternative was one-sided to the left, df = 15, and α = 0.05. Critical value from Table A.2 is –1.75. Rejection region is t ≤ –1.75. The test statistic was –3.22, so the null hypothesis is rejected. Same conclusion is reached.
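The rejection region check can be sketched as a small Python function. The critical value is taken from the table, as in the example, rather than computed. The function name and the `alternative` labels are assumptions for illustration, and for a two-sided test the sketch assumes the supplied critical value already corresponds to α/2 in each tail.

```python
def in_rejection_region(t, critical_value, alternative):
    """Return True when the test statistic falls in the rejection region.

    critical_value is the table value (its sign is ignored);
    alternative is 'less', 'greater', or 'two-sided'.
    """
    c = abs(critical_value)
    if alternative == "less":
        return t <= -c
    if alternative == "greater":
        return t >= c
    if alternative == "two-sided":
        return abs(t) >= c
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Example 13.1: one-sided to the left, critical value -1.75, t = -3.22
print(in_rejection_region(-3.22, -1.75, "less"))
```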

13.4: Testing Hypotheses about Difference between Two Means

Lesson 1: the General (Unpooled) Case

Step 1: Determine null and alternative hypotheses

H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0 or Ha: μ1 – μ2 < 0 or Ha: μ1 – μ2 > 0

Watch how Population 1 and 2 are defined.

Step 2: Verify data conditions and compute the test statistic.

Both n’s are large or no extreme outliers or skewness in either sample. Samples are independent. The t-test statistic is:

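$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$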

Steps 3, 4 and 5: Similar to t-test for one mean.

Example 13.4 Effect of Stare on Driving

Question: Does stare speed up crossing times?

Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, then timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.

Step 1: State the null and alternative hypotheses

H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 > 0

where 1 = no-stare population and 2 = stare population.

Example 13.3 Effect of Stare (cont)

Data: n1 = 14 no stare and n2 = 13 stare responses

Step 2: Verify data conditions …

No outliers nor extreme skewness for either group.

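Example 13.3 Effect of Stare (cont)

Step 2: … Summarizing data with a test statistic

Sample statistic: x̄1 – x̄2 = 6.63 – 5.59 = 1.04 seconds

Standard error:

$$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1.36^2}{14} + \frac{0.822^2}{13}} = 0.43$$

Test statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1.04 - 0}{0.43} = 2.41$$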
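The unpooled standard error and t-statistic for the stare example can be recomputed from the summary values (means 6.63 and 5.59, standard deviations 1.36 and 0.822, sample sizes 14 and 13). The following Python sketch uses a hypothetical helper name, `unpooled_t`. Because it starts from rounded summaries, its t comes out near 2.42 rather than exactly matching the reported 2.41.

```python
import math

def unpooled_t(mean1, s1, n1, mean2, s2, n2, null_value=0.0):
    """Unpooled (general case) standard error and t-statistic for two independent samples."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((mean1 - mean2) - null_value) / se
    return se, t

# Group 1 = no stare, group 2 = stare
se, t = unpooled_t(6.63, 1.36, 14, 5.59, 0.822, 13)
print(f"se={se:.2f}  t={t:.2f}")
```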

Example 13.3 Effect of Stare (cont)

Steps 3, 4 and 5: Determine the p-value and make a conclusion in context.

The p-value is determined using a t-distribution with df = 21 (df using Welch approximation formula) and finding the area to the right of t = 2.41. Table A.3 p-value is between 0.009 and 0.015.

The p-value = 0.013, so we reject the null hypothesis; the results are “statistically significant”.

We can conclude that if all drivers were stared at, the mean crossing time at an intersection would be faster than under normal conditions.
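The slides quote df = 21 from the Welch approximation without showing the formula. A minimal Python sketch of the standard Welch–Satterthwaite expression, applied to the stare example's summary values (s1 = 1.36, n1 = 14, s2 = 0.822, n2 = 13), is given below. The function name is an assumption, and the reported df of 21 appears to be the computed value rounded down.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for the unpooled t-test."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = welch_df(1.36, 14, 0.822, 13)
print(f"Welch df = {df:.1f}, rounded down = {int(df)}")
```

The approximate df always lies between the smaller of n1 – 1 and n2 – 1 and the pooled value n1 + n2 – 2.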

Lesson 2: Pooled Two-Sample t-Test

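Based on assumption that the two populations have equal population standard deviations: σ1 = σ2

$$\text{Pooled standard deviation: } s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

$$\text{Pooled s.e.}(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).

$$t = \frac{\text{sample mean} - \text{null value}}{\text{pooled standard error}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$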

Guidelines for Using Pooled t-Test

• If sample sizes are equal, pooled and unpooled standard errors are equal and so t-statistic is same. If sample standard deviations are similar, assumption of common population variance is reasonable and pooled procedure can be used.

• If sample sizes are very different, pooled test can be quite misleading unless sample standard deviations are similar. If sample sizes are very different and the smaller standard deviation accompanies the larger sample size, we do not recommend using the pooled procedure.

• If sample sizes are very different, standard deviations are similar, and larger sample size produced the larger standard deviation, pooled t-test is acceptable and will be conservative.

Example 13.7 Male and Female Sleep Times

Q: Is there a difference between how long female and male students slept the previous night?

Data: The 83 female and 65 male responses from students in an intro stat class.

The null and alternative hypotheses are:

H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0

where 1 = female population and 2 = male population.

Note: Sample sizes similar, sample standard deviations similar. Use of pooled procedure is warranted.

Example 13.5 Male and Female Sleep Times

Two-sample T for sleep [without “Assume Equal Variance” option]

Sex      N   Mean  StDev  SE Mean
Female  83   7.02   1.75     0.19
Male    65   6.55   1.68     0.21

95% CI for mu(f) – mu(m): (-0.10, 1.02)
T-Test mu(f) = mu(m) (vs not =): T-Value = 1.62  P = 0.11  DF = 140

Two-sample T for sleep [with “Assume Equal Variance” option]

Sex      N   Mean  StDev  SE Mean
Female  83   7.02   1.75     0.19
Male    65   6.55   1.68     0.21

95% CI for mu(f) – mu(m): (-0.10, 1.03)
T-Test mu(f) = mu(m) (vs not =): T-Value = 1.62  P = 0.11  DF = 146
Both use Pooled StDev = 1.72
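The pooled quantities in this output can be reproduced from the summary rows (83 females with mean 7.02 and standard deviation 1.75, 65 males with mean 6.55 and standard deviation 1.68). The Python sketch below uses assumed names and works from the rounded summaries, so its t-statistics land near 1.65 rather than the 1.62 shown, presumably because the software used the unrounded data.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation under the equal-variance assumption."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

n_f, mean_f, s_f = 83, 7.02, 1.75   # female summary row
n_m, mean_m, s_m = 65, 6.55, 1.68   # male summary row

sp = pooled_sd(s_f, n_f, s_m, n_m)
pooled_se = sp * math.sqrt(1 / n_f + 1 / n_m)
unpooled_se = math.sqrt(s_f**2 / n_f + s_m**2 / n_m)
pooled_df = n_f + n_m - 2

t_pooled = (mean_f - mean_m) / pooled_se
t_unpooled = (mean_f - mean_m) / unpooled_se

print(f"pooled sd={sp:.2f}  pooled df={pooled_df}")
print(f"pooled se={pooled_se:.3f}  unpooled se={unpooled_se:.3f}")
print(f"t (pooled)={t_pooled:.2f}  t (unpooled)={t_unpooled:.2f}")
```

With similar sample standard deviations, the two standard errors are nearly identical, which is why both versions of the output report the same t-value.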

13.5 Relationship Between Tests and Confidence Intervals

For two-sided tests (for one or two means): H0: parameter = null value and Ha: parameter ≠ null value

• If the null value is covered by a (1 – α)100% confidence interval, the null hypothesis is not rejected and the test is not statistically significant at level α.

• If the null value is not covered by a (1 – α)100% confidence interval, the null hypothesis is rejected and the test is statistically significant at level α.

Note: 95% confidence interval ⇔ 5% significance level; 99% confidence interval ⇔ 1% significance level
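The two-sided rule amounts to checking whether the null value lies inside the interval. Here is a minimal Python sketch with an assumed function name, applied to the unpooled 95% confidence interval from the sleep example, (-0.10, 1.02).

```python
def rejects_two_sided(ci_low, ci_high, null_value=0.0):
    """True when the null value is NOT covered by the interval.

    A (1 - alpha)100% interval corresponds to a two-sided test at level alpha.
    """
    covered = ci_low <= null_value <= ci_high
    return not covered

# Sleep example: 95% CI for mu(f) - mu(m) is (-0.10, 1.02), null value 0
print(rejects_two_sided(-0.10, 1.02))
```

The interval covers 0, so the test is not significant at the 5% level, which agrees with the reported p-value of 0.11.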

Confidence Intervals and One-Sided Tests

When testing the hypotheses: H0: parameter = null value versus a one-sided alternative, compare the null value to a (1 – 2α)100% confidence interval:

• If the null value is covered by the interval, the test is not statistically significant at level α.

• For the alternative Ha: parameter > null value, the test is statistically significant at level α if the entire interval falls above the null value.

• For the alternative Ha: parameter < null value, the test is statistically significant at level α if the entire interval falls below the null value.

Example 13.9 Ear Infections and Xylitol

95% CI for p1 – p2 is 0.020 to 0.226

Reject H0: p1 – p2 = 0 and accept Ha: p1 – p2 > 0 with α = 0.025, because the entire confidence interval falls above the null value of 0.

Note that the p-value for the test was 0.01, which is less than 0.025.
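The one-sided version of the comparison can be sketched the same way. In the Python sketch below the function names and `alternative` labels are assumptions. `one_sided_alpha` expresses that a (1 – 2α)100% interval corresponds to a one-sided level α, so a 95% interval corresponds to α = 0.025.

```python
def one_sided_alpha(confidence_level):
    """Significance level of the one-sided test matched to a confidence interval."""
    return (1.0 - confidence_level) / 2.0

def significant_one_sided(ci_low, ci_high, null_value, alternative):
    """alternative is 'greater' or 'less' (labels chosen for this sketch)."""
    if alternative == "greater":
        return ci_low > null_value    # entire interval above the null value
    if alternative == "less":
        return ci_high < null_value   # entire interval below the null value
    raise ValueError("alternative must be 'greater' or 'less'")

# Example 13.9: 95% CI for p1 - p2 is (0.020, 0.226), Ha: p1 - p2 > 0
print(one_sided_alpha(0.95))
print(significant_one_sided(0.020, 0.226, 0.0, "greater"))
```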

13.6 Choosing an Appropriate Inference Procedure

• Confidence Interval or Hypothesis Test?

Is the main purpose to estimate the numerical value of a parameter, or to make a “maybe not/maybe yes” conclusion about a specific hypothesized value for a parameter?

• Determining the Appropriate Parameter

Is the response variable categorical or quantitative? Is there one sample or two? If two, independent or paired?

13.7 Effect Size

Effect size is a measure of how much the truth differs from chance or from a control condition.

Effect size for a single mean:

$$d = \frac{\mu_1 - \mu_0}{\sigma}$$

Effect size for comparing two means:

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

Estimating Effect Size

Estimated effect size for a single mean:

$$\hat{d} = \frac{\bar{x} - \mu_0}{s}$$

Estimated effect size for comparing two means:

$$\hat{d} = \frac{\bar{x}_1 - \bar{x}_2}{s}$$

Relationship:

Test statistic = Size of effect × Size of study
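For a single mean, the relationship can be checked numerically: the one-sample t-statistic equals the estimated effect size multiplied by √n. The Python sketch below uses the summary values reported for Example 13.1 (sample mean 98.2, s = 0.497, n = 16, null value 98.6). The function name is an assumption, and treating √n as the "size of study" factor applies to the one-mean case only.

```python
import math

def estimated_effect_size(x_bar, mu0, s):
    """Estimated effect size for a single mean: (sample mean - null value) / s."""
    return (x_bar - mu0) / s

x_bar, mu0, s, n = 98.2, 98.6, 0.497, 16

d_hat = estimated_effect_size(x_bar, mu0, s)
t_from_effect = d_hat * math.sqrt(n)        # size of effect x size of study
t_direct = (x_bar - mu0) / (s / math.sqrt(n))

print(f"d_hat={d_hat:.3f}  t={t_from_effect:.2f}")
```

The same estimated effect size with a larger sample gives a larger test statistic, which is one reason very large studies can make small effects statistically significant.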

13.8 Evaluating Significance in Research Reports

1. Is the p-value reported? If know p-value, can make own decision, based on severity of Type 1 error and p-value.

2. If the word “significant” is used, determine whether it is used in the everyday sense or in the statistical sense only. “Statistically significant” just means that a null hypothesis has been rejected; there is no guarantee the result has real-world importance.

3. If you read “no difference” or “no relationship” has been found, determine whether sample size was small. Test may have had very low power because not enough data were collected to be able to make a firm conclusion.

4. Think carefully about conclusions based on extremely large samples. If very large sample size, even weak relationship or small difference can be statistically significant.

5. If possible, determine what confidence interval should accompany a hypothesis test. Intervals provide information about magnitude of effect as well as information about margin of error in sample estimate.

6. Determine how many hypothesis tests were conducted in study. Sometimes researchers perform multitude of tests, but only few achieve statistical significance. If all null hypotheses true, then ~1 in 20 tests will achieve statistical significance just by chance at the .05 level of significance.
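The "1 in 20" point in item 6 can be made concrete with a short Python sketch. It computes the expected number of false positives when every null hypothesis is true, and the chance of at least one, under the added assumption that the tests are independent. The slides do not state that assumption, and the function names are illustrative.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of significant results when every null hypothesis is true."""
    return num_tests * alpha

def prob_at_least_one(num_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests

for k in (1, 20, 100):
    print(k, expected_false_positives(k), round(prob_at_least_one(k), 3))
```

With 20 such tests at the .05 level, about one is expected to be significant by chance alone.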