Dan Piett STAT 211-019 West Virginia University Lecture 12.


Dan Piett, STAT 211-019

West Virginia University

Lecture 12


Last Week
Hypothesis Tests on a difference in means
Hypothesis Tests on a difference in proportions
The 2-sided alternative


Overview
Chi-Squared Goodness of Fit Test
Chi-Squared Test of Independence


Section 12.1

Chi-Squared Goodness of Fit Test


Multinomial Data
Previously we have looked at data coming from a binomial distribution
    2 Outcomes (Success, Failure)
    Example: Flipping a coin (Heads, Tails)
Suppose we are interested in data with more than 2 outcomes
    Example: Rolling a die
    6 Outcomes (1, 2, 3, 4, 5, 6)
We obtain multinomial data from a multinomial experiment


Multinomial Experiments
Multinomial Experiments follow these properties
1. Fixed number of trials, n
2. Each trial results in exactly one of K possible outcomes
3. Probability pi is the probability of getting outcome i on a single trial
    p1 + p2 + p3 + … + pK = 1
4. Trials are independent


Finding Expected Frequencies
Remembering back to the binomial distribution
    Expected Value = n*p
For our multinomial distribution we will have K expected counts
    Each Expected Count: Ei = n*pi
Example: Rolling a fair 6-sided die 600 times (pi = 1/6)

Outcome           1     2     3     4     5     6
Probability       1/6   1/6   1/6   1/6   1/6   1/6
Expected Counts   100   100   100   100   100   100
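As a minimal sketch of the Ei = n*pi calculation, the following Python snippet reproduces the expected counts for the fair-die example. The function name expected_counts is illustrative and not part of the lecture.

```python
from fractions import Fraction

def expected_counts(n, probs):
    """Return the expected count E_i = n * p_i for each outcome."""
    return [n * p for p in probs]

# Fair 6-sided die rolled 600 times: every p_i is 1/6.
probs = [Fraction(1, 6)] * 6
assert sum(probs) == 1  # multinomial probabilities must sum to 1

counts = expected_counts(600, probs)
print([int(e) for e in counts])
```

Exact fractions are used here only so that 600 * 1/6 comes out as a whole number without floating-point rounding.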


Observed Frequencies
When we do our multinomial experiment, we will not always get exactly our expected counts.
Example: We expected 100 4’s on our dice experiment. Suppose we only get 85.
    85 is our Observed Frequency, Oi
Our Observed Frequencies (Counts) are our actual data
Suppose on our 600 dice throws, these are our observed counts

Outcome           1     2     3     4     5     6
Expected Counts   100   100   100   100   100   100
Observed Counts   97    113   102   85    109   94


Chi-Squared Goodness of Fit Test
So the question to be asked when looking at a table like this is “are our observed counts far enough from our expected counts to determine that the expected counts are wrong?”
This is what the Chi-Squared Goodness of Fit Test attempts to answer.
Note that our test will follow the 7 step procedure

Outcome           1     2     3     4     5     6
Expected Counts   100   100   100   100   100   100
Observed Counts   97    113   102   85    109   94


Chi-Squared Goodness of Fit Test
1. H0: p1 = #1, p2 = #2, …, pK = #K
2. HA: At least one pi ≠ #i
3. Alpha is .05 if not specified
4. Test Statistic: χ² = Σ (Oi - Ei)² / Ei, summed over all K outcomes
5. P-value will come from the Chi-Squared Table with df = K-1
    P-value = P(Chi-Squared with K-1 df > Test Statistic)
    There is only 1 alternative hypothesis
6. Our decision rule will be to reject H0 if p-value < alpha
7. We have (do not have) enough evidence at the .05 level to conclude that at least one of our probabilities is incorrect.

We require that our expected counts at each cell are at least 5 and that our sample is independent and random.
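The steps of the goodness of fit test can be sketched in Python for the 600 dice throws shown earlier. This is an illustration rather than the lecture's own worked solution: the function name is hypothetical, and the decision is made by comparing the statistic against 11.070, the standard table value for df = 5 at alpha = .05, which is equivalent to checking whether the p-value is below .05.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [97, 113, 102, 85, 109, 94]   # dice counts from the earlier slide
n = sum(observed)                         # 600 rolls
expected = [n / 6] * 6                    # H0: fair die, every p_i = 1/6

# The test requires every expected count to be at least 5.
assert all(e >= 5 for e in expected)

statistic = chi_squared_statistic(observed, expected)
df = len(observed) - 1                    # K - 1 = 5

# Assumption: 11.070 is the chi-squared table value for df = 5, alpha = .05.
CRITICAL_VALUE = 11.070
reject_h0 = statistic > CRITICAL_VALUE

print(f"chi-squared = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

With these counts the squared deviations sum to 524, so the statistic is 524/100 = 5.24. That is below the table value, so there is not enough evidence at the .05 level to conclude the die is unfair.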


Example:
For Fall 2013, 99 STAT 211 students were given a choice of 3 section times (A, B, C) to take the final exam. The data that follow show the number of students who selected each section. Do the data indicate that the students exhibit a preference, or that all sections are equally likely to be chosen? Use alpha = .05 (Hint: If all 3 are equally likely, all pi’s will be 1/3)

Observed Counts:
    A – 40
    B – 30
    C – 29
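One way to check an answer to this example is the short Python sketch below. It is not the lecture's solution. It relies on the fact that, for df = 2 specifically, the chi-squared upper-tail probability has the closed form exp(-x/2), so the p-value can be computed without a table.

```python
import math

observed = {"A": 40, "B": 30, "C": 29}
n = sum(observed.values())                # 99 students
expected = n / 3                          # H0: p_A = p_B = p_C = 1/3, so each E_i = 33

statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                    # K - 1 = 2

# Closed form valid for df = 2 only: P(chi-squared > x) = exp(-x / 2).
p_value = math.exp(-statistic / 2)

alpha = 0.05
print(f"chi-squared = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The statistic is 74/33, and the resulting p-value is well above .05, so the data do not show a preference among the three sections.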


Section 12.2

Chi-Squared Test for Independence


Association of Categorical Variables
Thus far, all of our confidence intervals and hypothesis tests have been done on numeric variables.
We will now shift our attention to categorical variables
    Ex: Eye Color, Class Rank
The question we wish to answer is, “is there an association between two categorical variables?”
    Ex: Is there an association between Eye Color and Hair Color?
We will use a Chi-Squared Test to answer this question, but first we need to discuss contingency tables.


Contingency Tables (Observed)
We can organize categorical data in a contingency table, with r rows and c columns. This is known as an r x c (r by c) contingency table. Note that the contingency table contains observed counts.
Example: Some Possible Values for Hair Color vs. Eye Color

Hair x Eye   Brown   Blue   Green
Black        110     20     8
Brown        65      22     9
Blonde       33      75     12


Contingency Tables (Expected)
Much like the goodness of fit test, we will need to calculate our expected counts.
The formula for the expected counts is
    Expected Count = (Row Total × Column Total) / Grand Total
So for the previous example (expected counts in parentheses), the Black hair / Brown eye cell is 138 × 208 / 354 = 81.1
We now have Observed and Expected counts, so we can do a Chi-Squared Test for independence

Hair x Eye   Brown        Blue        Green      Total
Black        110 (81.1)   20 (45.6)   8 (11.3)   138
Brown        65 (??)      22 (??)     9 (??)     96
Blonde       33 (??)      75 (??)     12 (??)    120
Total        208          117         29         354
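The expected counts for a contingency table come from the row totals, column totals, and grand total. The Python sketch below applies that standard calculation (row total × column total / grand total) to every cell of the table above; the function name expected_table is illustrative only.

```python
def expected_table(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: Black, Brown, Blonde hair. Columns: Brown, Blue, Green eyes.
observed = [
    [110, 20, 8],
    [65, 22, 9],
    [33, 75, 12],
]

expected = expected_table(observed)
for hair, row in zip(["Black", "Brown", "Blonde"], expected):
    print(hair, [round(e, 1) for e in row])
```

The first printed row should match the 81.1, 45.6, and 11.3 shown in the table; the remaining rows fill in the cells marked (??).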


Chi-Squared Test for Independence
1. H0: Variable 1 and Variable 2 are independent
2. HA: Variable 1 and Variable 2 are not independent (dependent)
3. Alpha is .05 if not specified
4. Test Statistic: χ² = Σ (O - E)² / E, summed over all r × c cells
5. P-value will come from the Chi-Squared Table with df = (r-1)(c-1)
    P-value = P(Chi-Squared with (r-1)(c-1) df > Test Statistic)
    There is only 1 alternative hypothesis
6. Our decision rule will be to reject H0 if p-value < alpha
7. We have (do not have) enough evidence at the .05 level to conclude that the variables are dependent.

We require that our expected counts at each cell are at least 5 and that our sample is independent and random.
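The procedure above can be sketched in Python for the hair color by eye color table (with 110 in the Black hair / Brown eye cell, as in the table with totals). The function name is hypothetical, and the decision uses 9.488, the standard table value for df = 4 at alpha = .05, in place of a p-value lookup.

```python
def independence_test_statistic(observed):
    """Return (chi-squared statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            if e < 5:
                raise ValueError("expected count below 5; test conditions not met")
            statistic += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

hair_by_eye = [[110, 20, 8], [65, 22, 9], [33, 75, 12]]
statistic, df = independence_test_statistic(hair_by_eye)

# Assumption: 9.488 is the chi-squared table value for df = 4, alpha = .05.
CRITICAL_VALUE = 9.488
print(f"chi-squared = {statistic:.2f}, df = {df}, reject H0: {statistic > CRITICAL_VALUE}")
```

For this table the statistic is far above the table value, so these illustrative counts would lead to rejecting independence of hair color and eye color.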


Example
Does “test failure” reduce academic aspirations and thereby contribute to a decision to drop out of school? A random sample of 283 students is surveyed from schools with low graduation rates. The contingency table below reports the responses to the question “Do tests required for graduation discourage students from staying in school?” Does there appear to be a relationship between the schools’ location and the students’ responses?

Response x School   Urban   Suburban   Rural
Yes                 57      27         47
No                  23      16         12
Unsure              45      25         31
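A hedged sketch for checking this example in Python follows; it is not the lecture's solution. Because this 3 x 3 table has df = 4, the p-value can be computed exactly with the closed form exp(-x/2) * (1 + x/2), which holds for df = 4 only.

```python
import math

# Rows: Yes, No, Unsure. Columns: Urban, Suburban, Rural.
observed = [
    [57, 27, 47],
    [23, 16, 12],
    [45, 25, 31],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                       # 283 students

statistic = 0.0
for r_total, row in zip(row_totals, observed):
    for c_total, o in zip(col_totals, row):
        e = r_total * c_total / n         # expected count under independence
        statistic += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (3-1)(3-1) = 4

# Closed form valid for df = 4 only: P(chi-squared > x) = exp(-x/2) * (1 + x/2).
half = statistic / 2
p_value = math.exp(-half) * (1 + half)

print(f"chi-squared = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The p-value here is well above .05, so these data do not give enough evidence at the .05 level to conclude that school location and student response are dependent.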