Chi-square test or c 2 test
description
Transcript of Chi-square test or c 2 test
![Page 1: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/1.jpg)
![Page 2: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/2.jpg)
What if we are interested in seeing if my “crazycrazy” dice are considered “fair”?
What can I do?
![Page 3: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/3.jpg)
Used to test the countscounts of categorical data
ThreeThree types◦Goodness of fit (univariate)◦Independence (bivariate)◦Homogeneity (univariate with two samples)
![Page 4: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/4.jpg)
df=3
df=5
df=10
![Page 5: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/5.jpg)
Different df have different curves
Skewed rightAs df increases, curve shifts toward right & becomes more like a normal curvenormal curve
![Page 6: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/6.jpg)
SRS SRS – reasonably random sample Have countscounts of categorical data & we expect each category to happen at least once
Sample sizeSample size – to insure that the sample size is large enough we should expect at least five in each category.
***Be sure to list expected counts!!
Combine these together:
All expected counts are at
least 5.
![Page 7: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/7.jpg)
exp
expobs 22
![Page 8: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/8.jpg)
Uses univariate dataWant to see how well the observed counts “fit” what we expect the counts to be
Use 22cdf functioncdf function on the calculator to find p-valuesp-values
Based on df –Based on df –
df = number of df = number of categoriescategories - 1 - 1
![Page 9: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/9.jpg)
H0: the observed counts equal the expected countsHa: the observed counts are not
equal to the expected counts
Be sure to write in context!
![Page 10: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/10.jpg)
Let’s test our dice!Let’s test our dice!
![Page 11: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/11.jpg)
Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others?
Aries 23 Libra 18 Leo20
Taurus 20 Scorpio 21 Virgo 19
Gemini 18 Sagittarius19 Aquarius24
Cancer 23 Capricorn 22 Pisces29
How many would you expect in each sign if there were no difference between them?
How many degrees of freedom?
I would expect CEOs to be equally born under all signs.
So 256/12 = 21.333333Since there are 12 signs –
df = 12 – 1 = 11
![Page 12: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/12.jpg)
Assumptions:
•Have a random sample of CEO’s
•All expected counts are greater than 5. (I expect 21.33 CEO’s to be born in each sign.)
H0: The number of CEO’s born under each sign is the same.
Ha: The number of CEO’s born under each sign is the different.
P-value = 2cdf(5.094, 10^99, 11) = .9265 = .05
Since p-value > , I fail to reject H0. There is not
sufficient evidence to suggest that the CEOs are born under some signs more often than others.
094.5
3.21
3.2129...
3.21
3.2120
3.21
3.2123222
2
![Page 13: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/13.jpg)
A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g or hazelnuts, and 446 g of peanuts. You wonder whether you mix is significantly different from what the company advertises?
Why is the chi-square goodness-of-fit test NOT appropriate here?
What might you do instead of weighing the nuts in order to use chi-square?
Because we do NOT have countscounts
of the type of nuts.We could countcount the
number of each type of nut and then perform a
2 test.
![Page 14: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/14.jpg)
Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short) A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively. What are the expected counts? df?
Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page)
Expected counts:Y & N = 56.25Y & S = 18.75E & N = 18.75E & S = 6.25We expect 9/16 of the
100 flies to have yellow and normal
wings. (Y & N)
Since there are 4 categories,
df = 4 – 1 = 3
![Page 15: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/15.jpg)
Assumptions:
•Have a random sample of fruit flies
•All expected counts are greater than 5. Expected counts:Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
H0: The distribution of fruit flies is the same as the theoretical model.
Ha: The distribution of fruit flies is not the same as the theoretical model.
P-value = 2cdf(5.671, 10^99, 3) = .129 = .05
Since p-value > , I fail to reject H0. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.
671.5
25.625.610
...75.18
75.182025.56
25.5659 2222
![Page 16: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/16.jpg)
Used with categorical, bivariate data from ONE sample
Used to see if the two categorical variables are associated (dependent) or not associated (independent)
![Page 17: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/17.jpg)
![Page 18: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/18.jpg)
H0: two variables are independent
Ha: two variables are dependent
Be sure to write in context!
![Page 19: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/19.jpg)
A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.
![Page 20: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/20.jpg)
North South Total
Cut A 150
Cut B 275
Cut C 75
Total 300 200 500
90 60
165
110
45 30
![Page 21: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/21.jpg)
Assuming H0 is true,
totaltable
alcolumn tot totalrow counts expected
![Page 22: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/22.jpg)
)1c)(1(r df
Or cover up one row & one column & count the number of cells remaining!
![Page 23: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/23.jpg)
Now suppose that in the actual sample of 500 consumers the observed numbers were as follows:
Is there sufficient evidence to suggest that geographic regions and beef preference are not independent? (Is there a difference between the expected and observed counts?)
North South Total
Cut A 100 50 150
Cut B 150 125 275
Cut C 50 25 75
Total 300 200 500
![Page 24: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/24.jpg)
Assumptions:
•Have a random sample of people
•All expected counts are greater than 5.
H0: geographic region and beef preference are
independent Ha: geographic region and beef
preference are dependent
P-value = .0226 df = 2 = .05
Since p-value < , I reject H0. There is sufficient evidence to suggest that geographic region and beef preference are dependent.
576.7...
606050
9090100 22
2
Expected Counts:
N S
A 90 60
B 165 110
C 45 30
![Page 25: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/25.jpg)
Used with a single single categoricalcategorical variable from two (or more) two (or more) independent samplesindependent samples
Used to see if the two populations are the same (homogeneous)
![Page 26: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/26.jpg)
Assumptions & formula remain the same!
Expected counts & df are found the same way as test for independence.
OnlyOnly change is the hypotheses!
![Page 27: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/27.jpg)
H0: the two (or more) distributions are the sameHa: the distributions are different
Be sure to write in context!
![Page 28: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/28.jpg)
The following data is on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)
![Page 29: Chi-square test or c 2 test](https://reader036.fdocuments.in/reader036/viewer/2022062518/5681449c550346895db147fa/html5/thumbnails/29.jpg)
Assumptions:
•Have 2 random sample of students
•All expected counts are greater than 5.
H0: drinking behavior is the same for female & male
students Ha: drinking behavior is not the same for female & male students
P-value = .000 df = 3 = .05
Since p-value < , I reject H0. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.
53.96...
4.1674.167186
6.1586.158140 22
2
Expected Counts:
M F
0 158.6 167.4
L 554.0 585.0
M 230.1 243.0
H 38.4 40.6