STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.


Page 1: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence

Page 2: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Course Review

- Project proposals are due Nov. 19th; email your TA.
- Exam 2 is on Nov. 26th; practice tests have already been posted.
- Exams are cumulative: about 20% of each future exam will cover earlier material.
- Email your TA to join the study group!

Page 4: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Key Equations:

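For the 2-proportion z significance test:

$$ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$

with pooling:

$$ \hat{p} = \frac{X_1 + X_2}{n_1 + n_2} $$

For the 2-proportion z interval:

$$ (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} $$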

Page 5: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Key Equations:

For Paired t-tests:

$$ t = \frac{\bar{x}_{Diff}}{s_{Diff} / \sqrt{n_{Diff}}}, \qquad df = n_{Diff} - 1 $$

In SPSS: Analyze → Compare Means → Paired-Samples T Test

Page 6: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Key Equations:

For Chi-square test of Independence:

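To calculate the expected counts for the contingency table:

$$ \text{expected} = \frac{\text{row total} \times \text{column total}}{n} $$

To calculate the test statistic:

$$ \chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} $$

df = (#rows – 1) × (#cols – 1).

In order for this χ² test to be valid, all the expected cell counts must be ≥ 5.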

Page 7: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #1

2) In 2008, the Red Sox and Yankees starters’ batting averages were:

                   C      1B      2B      SS      3B      LF      CF      RF      DH    mean  St. Dev.
NYY            0.236   0.294   0.271   0.300   0.282   0.298   0.259   0.281   0.247   0.272    0.0227
Boston         0.222   0.312   0.322   0.283   0.274   0.296   0.275   0.280   0.264   0.282    0.0301
Difference     0.014  -0.018  -0.051   0.017   0.008   0.002  -0.016   0.001  -0.017  -0.010    0.0225

a) Perform a 2-sample t-test for these data. What is your conclusion?

b) Perform a paired t-test for these data. Do your results in parts a) and b) agree? Why or why not?

Page 8: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #1


a) Perform a 2-sample t-test for these data. What is your conclusion?

2-sample t significance test
H0: μBOS - μNYY = 0
HA: μBOS - μNYY ≠ 0

Since p > 0.05, we cannot reject the null hypothesis that there is no difference between the mean batting averages of the Red Sox and Yankees starters. We do not have evidence that the mean batting averages are significantly different.
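The test output for part a) is not reproduced in this transcript. As a rough check, the 2-sample t statistic can be recomputed from the table. This is a minimal sketch that assumes the unpooled standard error and the conservative df = min(n1, n2) - 1 rule, neither of which is confirmed by the slides. The function name `two_sample_t` and the critical value 2.306 (two-sided 5% level, df = 8, from a standard t table) are additions for illustration. Values recomputed from the rounded table entries may differ slightly from the printed mean and St. Dev. columns.

```python
from math import sqrt
from statistics import mean, stdev

# Batting averages by position (C, 1B, 2B, SS, 3B, LF, CF, RF, DH), from the table.
nyy = [0.236, 0.294, 0.271, 0.300, 0.282, 0.298, 0.259, 0.281, 0.247]
bos = [0.222, 0.312, 0.322, 0.283, 0.274, 0.296, 0.275, 0.280, 0.264]


def two_sample_t(x, y):
    """Unpooled 2-sample t statistic for H0: mu_x - mu_y = 0 (hypothetical helper)."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se


t_stat = two_sample_t(bos, nyy)

# Assumption: conservative degrees of freedom, min(n1, n2) - 1.
df_conservative = min(len(bos), len(nyy)) - 1

# Two-sided 5% critical value for df = 8, taken from a standard t table.
T_CRIT_DF8 = 2.306

reject_h0 = abs(t_stat) > T_CRIT_DF8
print(f"t = {t_stat:.3f}, df = {df_conservative}, reject H0: {reject_h0}")
```

Comparing |t| with the critical value is equivalent to comparing the two-sided p-value with 0.05, so the decision matches the one stated above.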

Page 9: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #1


b) Perform a paired t-test for these data. Do your results in parts a) and b) agree? Why or why not?

Paired t-test
H0: μDiff = 0
HA: μDiff ≠ 0

Since p > 0.05, we cannot reject the null hypothesis. We do not have evidence that the mean batting averages are significantly different.

The two tests agree here, but that is not always the case. The paired t-test analyzes the position-by-position differences, so it removes the variation between positions; when the paired values are strongly related, it can detect a difference that the 2-sample t-test misses. A paired t-test also requires a natural pairing: if n is different for the 2 groups, then a paired t-test cannot be performed in this manner.
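The paired t statistic from the Key Equations slide can be recomputed from the Difference row in the same spirit. This is a sketch, not the SPSS output: the variable names are illustrative, and the critical value 2.306 (two-sided 5% level, df = 8) is taken from a standard t table. Values recomputed from the rounded differences may differ slightly from the printed summary columns.

```python
from math import sqrt
from statistics import mean, stdev

# Batting averages by position (C, 1B, 2B, SS, 3B, LF, CF, RF, DH), from the table.
nyy = [0.236, 0.294, 0.271, 0.300, 0.282, 0.298, 0.259, 0.281, 0.247]
bos = [0.222, 0.312, 0.322, 0.283, 0.274, 0.296, 0.275, 0.280, 0.264]

# Differences are NYY - Boston, matching the table's Difference row.
# Rounding to 3 decimals only removes floating-point noise.
diffs = [round(n - b, 3) for n, b in zip(nyy, bos)]

n_diff = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n_diff))  # t = xbar_Diff / (s_Diff / sqrt(n_Diff))
df = n_diff - 1                                         # df = n_Diff - 1

# Two-sided 5% critical value for df = 8, taken from a standard t table.
T_CRIT_DF8 = 2.306

reject_h0 = abs(t_paired) > T_CRIT_DF8
print(f"t = {t_paired:.3f}, df = {df}, reject H0: {reject_h0}")
```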

Page 10: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #2

A study was conducted to determine if football helmets with newer anti-concussion technology (Riddell's Revolution helmet) actually led to a lower rate of concussions in high school football players compared to standard helmets. In an observational study in western Pennsylvania, 62 of 1173 Revolution helmet wearers suffered a concussion, while 74 of 968 standard helmet wearers suffered a concussion. Does this study provide evidence of a difference in the risk of suffering a concussion between the two types of helmets?

http://journals.lww.com/neurosurgery/Abstract/2006/02000/Examining_Concussion_Rates_and_Return_to_Play_in.9.aspx

Page 11: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #2

There are 2 mathematically equivalent ways of doing this problem, so this is a situation where the answers should agree.

Here is the first way:

2-proportion z significance test
H0: p1 - p2 = 0. The proportion of players suffering a concussion is the same for the two types of helmets.
HA: p1 - p2 ≠ 0. The proportion of players suffering a concussion is not the same for the two types of helmets.

Since p < 0.05, we can reject the null hypothesis. We have sufficient evidence to suggest that there is a difference in the risk of suffering a concussion between the two types of helmets.
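The pooled 2-proportion z test for the helmet data can be sketched as follows. The function name `two_proportion_z` is hypothetical, and the two-sided p-value comes from the standard normal distribution via `math.erfc` rather than from a table or from SPSS.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled 2-proportion z statistic and two-sided p-value (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # p-hat with pooling
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))                      # equals 2 * P(Z > |z|)
    return z, p_value


# Group 1: Revolution helmets; group 2: standard helmets.
z, p_value = two_proportion_z(62, 1173, 74, 968)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```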

Page 12: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #2

Here is the second way:

Chi-square test for independence

H0: The risk of suffering a concussion is independent of the helmet type.
HA: The risk of suffering a concussion is not independent of the helmet type.

This χ² statistic has df = (2 – 1) × (2 – 1) = 1. Since the p-value < 0.05, we reject the null hypothesis. We have enough evidence to suggest that the risk of suffering a concussion is associated with the helmet type.
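The equivalence of the two approaches can be checked numerically: for a 2 × 2 table, the χ² statistic (without a continuity correction) equals the square of the pooled z statistic. The sketch below assumes that uncorrected form; the function name is illustrative, and the p-value formula used here is valid for df = 1 only.

```python
from math import erfc, sqrt

# Rows: Revolution, standard. Columns: concussion, no concussion.
observed = [[62, 1173 - 62], [74, 968 - 74]]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells (hypothetical helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = chi_square_statistic(observed)
p_value = erfc(sqrt(chi2 / 2))  # chi-square upper-tail area, df = 1 only

# Pooled 2-proportion z statistic for the same data, for comparison.
pooled = (62 + 74) / (1173 + 968)
z = (62 / 1173 - 74 / 968) / sqrt(pooled * (1 - pooled) * (1 / 1173 + 1 / 968))

print(f"chi-square = {chi2:.3f}, z^2 = {z ** 2:.3f}, p-value = {p_value:.4f}")
```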

Page 13: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #3

Below you will find a contingency table for the breakdown of gender within each class year in this semester’s Stat 104 class along with the χ² test output.

           |     Gender
      year |     F      M |  Total
-----------+--------------+-------
  Freshman |    71    117 |    188
    Junior |    18     12 |     30
    Senior |    10     11 |     21
 Sophomore |    59     50 |    109
-----------+--------------+-------
     Total |   158    190 |    348

a) What are the hypotheses for this χ² test in this situation?

b) What is the expected number of female Seniors?

Page 14: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #3

a) What are the hypotheses for this χ² test in this situation?

H0: The gender breakdown is independent of class year in this semester’s Stat 104 class.
HA: The gender breakdown is not independent of class year in this semester’s Stat 104 class.

b) What is the expected number of female Seniors?

(Row total × Column total)/n = (21 × 158)/348 ≈ 9.53
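The same (row total × column total)/n rule gives every expected count in the table, which also makes it easy to check the "all expected counts ≥ 5" condition. A minimal sketch, with illustrative variable names:

```python
# Observed counts from the contingency table: (F, M) for each class year.
observed = {
    "Freshman": (71, 117),
    "Junior": (18, 12),
    "Senior": (10, 11),
    "Sophomore": (59, 50),
}

row_totals = {year: sum(counts) for year, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]  # [F total, M total]
n = sum(row_totals.values())

# Expected count under independence: (row total * column total) / n.
expected = {
    year: tuple(row_totals[year] * c / n for c in col_totals)
    for year in observed
}

expected_female_seniors = expected["Senior"][0]
all_cells_at_least_5 = all(e >= 5 for cells in expected.values() for e in cells)

print(f"Expected female Seniors: {expected_female_seniors:.2f}")
print(f"All expected counts >= 5: {all_cells_at_least_5}")
```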

Page 15: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #3

c) How many degrees of freedom are in this test?

d) SPSS reports the chi-squared test statistic to be 10.39 for this table. What is the approximate p-value for this test?

e) What is your conclusion?

Page 16: STAT E100 Section Week 11 – Hypothesis testing, Paired t-test, Chi-square test for independence.

Sample Question #3

c) How many degrees of freedom are in this test?

df = (4 – 1)(2 – 1) = 3

d) SPSS reports the chi-squared test statistic to be 10.39 for this table. What is the approximate p-value for this test?

0.01 < p < 0.02

e) What is your conclusion?

Since p < 0.05, we reject the null hypothesis. There is evidence to suggest that the gender breakdown is not independent of class year in this semester’s Stat 104 class.
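The reported statistic and the p-value range can be checked from the table. This sketch is not the SPSS output: the function names are hypothetical, and in place of a χ² table lookup it uses a closed-form expression for the upper-tail area that holds for df = 3 only.

```python
from math import erfc, exp, pi, sqrt

# Rows: Freshman, Junior, Senior, Sophomore. Columns: F, M.
observed = [[71, 117], [18, 12], [10, 11], [59, 50]]


def chi_square_test(table):
    """Return (chi-square statistic, df) for a two-way table (hypothetical helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_df3(x):
    """Upper-tail area of the chi-square distribution, valid for df = 3 only."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


chi2, df = chi_square_test(observed)
p_value = chi2_upper_tail_df3(chi2)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```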
