Chapter 11 The Chi-Square Test of Association/Independence


Transcript of Chapter 11 The Chi-Square Test of Association/Independence

Page 1: Chapter 11 The Chi-Square Test of Association/Independence

Chapter 11
The Chi-Square Test of Association/Independence

Target Goal: I can perform a chi-square test for association/independence to determine whether there is convincing evidence of an association between two categorical variables.

11.2b

h.w: pg. 728: 49, 51, 53 - 58

Page 2: Chapter 11 The Chi-Square Test of Association/Independence

The chi-square test can also be used to show evidence that there is a relationship between two categorical variables.

Use this if you have independent SRSs from several populations where one variable is categorical and the other is the sample number.

Or, if you have a single SRS with each individual classified according to two categorical variables.

Or, if you have an entire population with each individual classified according to two categorical variables.

Page 3: Chapter 11 The Chi-Square Test of Association/Independence

Ex: Smoking and SES

An example that classifies observations from a single population in two ways: by smoking habits and SES.

In a study of heart disease in male federal employees, researchers classified 356 volunteer subjects according to their socioeconomic status (SES) and their smoking status.

Page 4: Chapter 11 The Chi-Square Test of Association/Independence

Observed Counts for smoking and SES

                 SES
Smoking    High   Middle   Low   Total
Current      51       22    43     116
Former       92       21    28     141
Never        68        9    22      99
Total       211       52    93     356

This is a 3x3 table with added margin totals.

Even though this example is different from comparing several proportions, we can still apply the chi-square test. The null hypothesis is that the row and column variables are not related to each other.

Page 5: Chapter 11 The Chi-Square Test of Association/Independence

The Chi-Square Test of Association/Independence

Use the chi-square test of association/independence to test the null hypothesis,

H0: there is no relationship between two categorical variables

when you have a two-way table from a single SRS, with each individual classified according to both of two categorical variables.

Page 6: Chapter 11 The Chi-Square Test of Association/Independence

SES cont.

SES is the explanatory variable; therefore, we need to compare the column percents that give the conditional distribution of smoking within each SES category.

Page 7: Chapter 11 The Chi-Square Test of Association/Independence

Calculate Column Percents:

51/211 = 0.242, so about 24.2% of the high-SES group are current smokers.

Fill in the rest of the table.

Page 8: Chapter 11 The Chi-Square Test of Association/Independence

Column percents for Smoking and SES

                 SES
Smoking    High    Middle    Low
Current    24.2     42.3     46.2
Former     43.6     40.4     30.1
Never      32.2     17.3     23.7
Total     100.0    100.0    100.0
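The column percents can be reproduced from the observed counts with a short script. This is a minimal sketch in Python; the variable names are illustrative and not part of the original slides.

```python
# Observed counts from the smoking and SES table
# (rows: smoking status, columns: SES in the order High, Middle, Low).
observed = {
    "Current": [51, 22, 43],
    "Former": [92, 21, 28],
    "Never": [68, 9, 22],
}
ses_levels = ["High", "Middle", "Low"]

# Column totals: the number of subjects in each SES category.
column_totals = [
    sum(row[j] for row in observed.values()) for j in range(len(ses_levels))
]

# Column percents: each count divided by its column total, times 100,
# rounded to one decimal place as in the table.
column_percents = {
    status: [round(100 * count / total, 1) for count, total in zip(row, column_totals)]
    for status, row in observed.items()
}

for status, percents in column_percents.items():
    print(status, percents)
```

Each column of percents is the conditional distribution of smoking status within one SES category, which is why each column sums to about 100.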

What do the column percents suggest?

Page 9: Chapter 11 The Chi-Square Test of Association/Independence

There is a negative association between smoking and SES.

The lower the SES, the more likely a subject is to smoke.

Page 10: Chapter 11 The Chi-Square Test of Association/Independence

Computing Expected Cell Counts

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

For example, for current smokers in the high-SES group:

$$\frac{116 \times 211}{356} = 68.75$$

Page 11: Chapter 11 The Chi-Square Test of Association/Independence

Expected Count for Smoking and SES

                  SES
Smoking    High     Middle    Low      Total
Current     68.75   16.94     30.30    115.99
Former      83.57   20.60     36.83    141.00
Never       58.68   14.46     25.86     99.00
Total      211      52        92.99    355.99
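The expected counts above come from applying the row total × column total / table total rule to every cell. A minimal Python sketch of that calculation follows; the variable names are illustrative.

```python
# Observed counts (rows: Current, Former, Never; columns: High, Middle, Low SES).
observed = [
    [51, 22, 43],  # Current
    [92, 21, 28],  # Former
    [68, 9, 22],   # Never
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
table_total = sum(row_totals)

# Expected count for each cell: row total * column total / table total.
expected = [
    [row_total * column_total / table_total for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])
```

Unrounded expected counts add up exactly to the margin totals. The 115.99, 92.99, and 355.99 in the table are artifacts of rounding each cell to two decimals before summing.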

Page 12: Chapter 11 The Chi-Square Test of Association/Independence

Chi-square Test for Chi-square Test for Association/IndependenceAssociation/Independence

Step 1: State - We want to perform a test of

H0: There is no association between smoking and SES.

Ha: There is an association between smoking and SES.

Page 13: Chapter 11 The Chi-Square Test of Association/Independence

Step 2: Plan

If conditions are met, we should carry out a chi-square test of association/independence.

Random: The subjects were volunteers, so we may not be able to generalize our results.

Large Sample Size: To use chi-square we must check all expected counts. We did this: all expected counts are ≥ 1 and no more than 20% are < 5 (the smallest expected count is 14.46).

Page 14: Chapter 11 The Chi-Square Test of Association/Independence

Independence: Because we are sampling without replacement, we need to check the 10% condition. It is safe to assume that the total number of male federal employees is at least 10(356) = 3560.

Thus, knowing the values of both variables for one person gives us no meaningful information about the variables for another person. So, individual observations are independent.
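The Large Sample Size check from Step 2 and the 10% arithmetic can be sketched in Python as follows. The function names `expected_counts` and `large_sample_condition` are hypothetical helpers written for this illustration, not part of any library.

```python
def expected_counts(observed):
    """Expected counts under no association: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]


def large_sample_condition(expected):
    """True when all expected counts are >= 1 and no more than 20% are < 5."""
    cells = [value for row in expected for value in row]
    all_at_least_one = all(value >= 1 for value in cells)
    share_below_five = sum(value < 5 for value in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20


# Observed counts (rows: Current, Former, Never; columns: High, Middle, Low SES).
observed = [[51, 22, 43], [92, 21, 28], [68, 9, 22]]
expected = expected_counts(observed)
condition_met = large_sample_condition(expected)

# 10% condition: the population should be at least 10 times the sample size.
sample_size = sum(sum(row) for row in observed)
minimum_population = 10 * sample_size

print(condition_met, minimum_population)
```

The Random condition cannot be checked numerically; it depends on how the subjects were selected.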

Page 15: Chapter 11 The Chi-Square Test of Association/Independence

Step 3: Carry out the inference procedure.

The test statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is an observed count and E is the corresponding expected count.

Calculate by hand with df = (r-1)(c-1) = (3-1)(3-1) = 4.

Or with calculator, need to enter observed counts into matrix table A.

Note: the calculator will calculate the expected counts for you when you execute the χ² test.
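The by-hand calculation of the test statistic can be sketched in Python. This is an illustration rather than the calculator's method, and the function names are hypothetical. To stay within the standard library, the P-value helper uses the closed form for the upper tail of a chi-square distribution, which holds only when df is even (df = 4 here); that restriction is an assumption of this sketch.

```python
import math


def chi_square_statistic(observed):
    """Return (statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / table_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability; closed form valid only for even df (assumption)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive, even df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Observed counts (rows: Current, Former, Never; columns: High, Middle, Low SES).
observed = [[51, 22, 43], [92, 21, 28], [68, 9, 22]]
statistic, df = chi_square_statistic(observed)
p_value = chi_square_p_value_even_df(statistic, df)
print(round(statistic, 2), df, round(p_value, 5))
```

For these counts the statistic comes out near 18.5 with df = 4, and the P-value should agree with the calculator's χ² test to rounding. For tables with odd df, a statistics library or the calculator is the safer route.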

Page 16: Chapter 11 The Chi-Square Test of Association/Independence

Note: if doing by hand, you could write a calculator program to compute the expected counts; otherwise you must compute them by hand.

Enter observed values in matrix A, then STAT:TESTS: χ²-Test. The calculator enters expected values in matrix B.

P-value = .00098

Note: the association does not mean that SES causes smoking behavior.

Page 17: Chapter 11 The Chi-Square Test of Association/Independence

Step 4: Conclude - Interpret the results in context.

With a p-value this low, we reject the null hypothesis at the alpha = .01 level and conclude that there is strong evidence of an association between smoking and SES in the population of male federal employees.

Page 18: Chapter 11 The Chi-Square Test of Association/Independence

Computer Output

Page 19: Chapter 11 The Chi-Square Test of Association/Independence

Follow-up Analysis

Start by examining which cells in the two-way table show large deviations between the observed and expected counts. Then look at the individual components to see which terms contribute most to the chi-square statistic.

Minitab output for the wine and music study displays the individual components that contribute to the chi-square statistic.
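The wine and music counts are not reproduced in this transcript, so the sketch below illustrates the same follow-up idea on the smoking and SES counts from earlier. It computes each cell's component, (O − E)² / E, and sorts the cells by contribution. The names are illustrative.

```python
smoking_levels = ["Current", "Former", "Never"]
ses_levels = ["High", "Middle", "Low"]

# Observed counts (rows: smoking status, columns: SES).
observed = [[51, 22, 43], [92, 21, 28], [68, 9, 22]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
table_total = sum(row_totals)

# One component per cell: (observed - expected)^2 / expected.
components = {}
for i, smoking in enumerate(smoking_levels):
    for j, ses in enumerate(ses_levels):
        expected = row_totals[i] * column_totals[j] / table_total
        components[(smoking, ses)] = (observed[i][j] - expected) ** 2 / expected

total = sum(components.values())

# Cells sorted from largest to smallest contribution.
ranked = sorted(components.items(), key=lambda item: item[1], reverse=True)
for cell, value in ranked:
    print(cell, round(value, 2), f"{100 * value / total:.0f}%")
```

For the smoking data, the two largest components belong to current smokers in the low-SES and high-SES groups, which matches the pattern seen in the column percents.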

Page 20: Chapter 11 The Chi-Square Test of Association/Independence

Follow-up Analysis

Looking at the output, we see that just two of the nine components that make up the chi-square statistic contribute about 14 (almost 77%) of the total χ² = 18.28.

We are led to a specific conclusion: sales of Italian wine are strongly affected by Italian and French music.