Lecture 07 Category Shaoqi Rao Rev
![Page 1: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/1.jpg)
1
Chapter 6
Chi-Square Test for Categorical Variable
Shaoqi Rao, PhD
2009.11.9
Slides adapted from Dr. Zhang Jinxin’s
![Page 2: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/2.jpg)
2
6.1 Basic logic of the χ² test
Given a set of observed frequencies
A1, A2, A3, …
we test whether the data follow a certain theory.
If the theory is true, it implies a set of theoretical frequencies:
T1, T2, T3, …
Compare A1, A2, A3, … with T1, T2, T3, …
If they are quite different, the theory might not be true;
otherwise, the theory is acceptable.
![Page 3: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/3.jpg)
3
6.1.1 Chi-square distribution
$$\chi^2_P = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \sim \chi^2 \text{ distribution}$$
The statistic measures the agreement between the observed frequencies f_i and the expected frequencies e_i.
DF = k − 1 − (number of parameters estimated to obtain the e_i)
For a contingency table,
DF = (# rows − 1)(# columns − 1)
![Page 4: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/4.jpg)
4
χ² distribution
[Figure: χ² density plot; horizontal axis from 0 to 10, vertical axis from 0.0 to 0.3]
![Page 5: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/5.jpg)
5
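6.1.2 χ² Test for Goodness of Fit (Large Sample)
Table 1 Frequency distribution and goodness of fit based on 136 measurements of the phantom ( 体模 )

| Interval | A | Φ(X1) | Φ(X2) | P(X) | T = n·P(X) | (A−T)²/T |
|---|---|---|---|---|---|---|
| 1.228- | 2 | 0.00069 | 0.00466 | 0.00397 | 0.5405 | 3.94143 |
| 1.234- | 2 | 0.00466 | 0.02275 | 0.01809 | 2.4601 | 0.08605 |
| 1.240- | 7 | 0.02275 | 0.08076 | 0.05801 | 7.8889 | 0.10016 |
| 1.246- | 17 | 0.08076 | 0.21186 | 0.13110 | 17.8294 | 0.03859 |
| 1.252- | 25 | 0.21186 | 0.42074 | 0.20888 | 28.4083 | 0.40892 |
| 1.258- | 37 | 0.42074 | 0.65542 | 0.23468 | 31.9167 | 0.80961 |
| 1.264- | 25 | 0.65542 | 0.84134 | 0.18592 | 25.2855 | 0.00322 |
| 1.270- | 16 | 0.84134 | 0.94520 | 0.10386 | 14.1244 | 0.24906 |
| 1.276- | 4 | 0.94520 | 0.98610 | 0.04090 | 5.5618 | 0.43858 |
| 1.282- | 1 | 0.98610 | 0.99744 | 0.01135 | 1.5434 | 0.19130 |
| Total | - | - | - | - | - | 6.26692 |

Φ(X1) and Φ(X2) are the standard normal cumulative probabilities at the lower and upper limits of each interval. For the interval 1.240-:
$$Z_1 = \frac{1.240 - 1.26}{0.01} = -2.00, \qquad Z_2 = \frac{1.246 - 1.26}{0.01} = -1.40$$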
![Page 6: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/6.jpg)
6
1. Setting up hypotheses
H0: the population follows N(1.26, 0.01²)
H1: the population doesn't follow N(1.26, 0.01²)
α = 0.05
2. Calculation of the statistic:
$$\chi^2 = \sum \frac{(A - T)^2}{T} = 6.27$$
3. P-value: ν = k − 1 − 2 = 10 − 1 − 2 = 7
$$\chi^2_{0.5,7} = 6.35, \qquad \chi^2_{0.05,7} = 14.07, \qquad \chi^2 < \chi^2_{0.5,7} < \chi^2_{0.05,7}, \qquad P > 0.5$$
4. Conclusion: At significance level α = 0.05, H0 is not rejected. The measurements are consistent with the normal distribution.
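The goodness-of-fit calculation in Table 1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming ten intervals of width 0.006 starting at 1.228 (as the interval labels suggest) and taking the critical value 14.07 from the slide rather than computing it; the helper name `normal_cdf` is illustrative only.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Observed frequencies A from Table 1 (assumed: ten intervals of width 0.006 from 1.228).
observed = [2, 2, 7, 17, 25, 37, 25, 16, 4, 1]
width = 0.006
lower_limits = [1.228 + width * i for i in range(len(observed))]
mean, sd = 1.26, 0.01           # parameters of the hypothesised normal distribution
n = sum(observed)               # total number of measurements

chi2 = 0.0
for a, lower in zip(observed, lower_limits):
    p = normal_cdf(lower + width, mean, sd) - normal_cdf(lower, mean, sd)  # P(X)
    t = n * p                                                              # T = n * P(X)
    chi2 += (a - t) ** 2 / t

df = len(observed) - 1 - 2      # two parameters (mean, sd) are specified from the data
critical_0_05 = 14.07           # chi-square critical value for df = 7, taken from the slide
reject_h0 = chi2 > critical_0_05

print(f"n = {n}, chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Because the table rounds Φ to five decimals, the script's statistic may differ from 6.26692 in the last digits.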
![Page 7: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/7.jpg)
7
6.2 Comparison between Two Independent Sample Proportions
In Chapter 4, the Z test can only be used for comparing π with a given π0 (one sample) or comparing π1 with π2 (two samples).
If we need to compare more than two samples, the chi-square test is widely used.
![Page 8: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/8.jpg)
8
Example 6.1 In a clinical survey, 215 patients with pulmonary heart disease ( 肺心病 ) in a hospital were enrolled, of whom 164 had taken digitalis ( 洋地黄 ) and 51 had not. Each of them received an ECG examination. The results are listed in Table 6.2.
![Page 9: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/9.jpg)
9
Table 6.2 Data of patients with pulmonary heart disease and arrhythmia (心律失常)

| Group | ECG: Arrhythmia | ECG: Normal | Total | Arrhythmia rate (%) |
|---|---|---|---|---|
| With digitalis | 81 | 83 | 164 | 49.39 |
| Without digitalis | 19 | 32 | 51 | 37.25 |
| Total | 100 | 115 | 215 | 46.51 |
![Page 10: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/10.jpg)
10
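Table 6.2 Data of patients with pulmonary heart disease and arrhythmia (心律失常), with theoretical frequencies in parentheses

| Group | ECG: Arrhythmia | ECG: Normal | Total | Arrhythmia rate (%) |
|---|---|---|---|---|
| With digitalis | 81 (76.28) | 83 (87.72) | 164 | 49.39 |
| Without digitalis | 19 (23.72) | 32 (27.28) | 51 | 37.25 |
| Total | 100 | 115 | 215 | 46.51 |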
![Page 11: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/11.jpg)
11
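$$\chi^2_P = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
$$\chi^2_P = \frac{(81-76.28)^2}{76.28} + \frac{(83-87.72)^2}{87.72} + \frac{(19-23.72)^2}{23.72} + \frac{(32-27.28)^2}{27.28} = 2.3028$$
Equivalently, for a 2×2 table:
$$\chi^2_P = \frac{(f_{11}f_{22} - f_{12}f_{21})^2\, n}{n_{r1}\, n_{r2}\, n_{c1}\, n_{c2}}$$
$$\chi^2_P = \frac{(81 \times 32 - 83 \times 19)^2 \times 215}{164 \times 51 \times 100 \times 115} = 2.3028$$
ν = 1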
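As a check on the two forms of the 2×2 statistic, the following Python sketch computes both from the counts of Table 6.2. The function name `chi_square_2x2` is illustrative, not taken from the slides.

```python
def chi_square_2x2(f11, f12, f21, f22):
    """Return (general, shortcut): Pearson chi-square for a 2x2 table computed
    from expected frequencies and from the 2x2 shortcut formula."""
    observed = [[f11, f12], [f21, f22]]
    row_totals = [f11 + f12, f21 + f22]
    col_totals = [f11 + f21, f12 + f22]
    n = sum(row_totals)

    # General form: sum of (f - e)^2 / e with e = row total * column total / n.
    general = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            general += (observed[i][j] - e) ** 2 / e

    # Shortcut form for a 2x2 table.
    shortcut = (f11 * f22 - f12 * f21) ** 2 * n / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
    return general, shortcut

# Table 6.2: with digitalis (81, 83), without digitalis (19, 32).
general, shortcut = chi_square_2x2(81, 83, 19, 32)
print(f"general = {general:.4f}, shortcut = {shortcut:.4f}")
```

Both values agree to floating-point precision, since the two formulas are algebraically identical for a 2×2 table.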
![Page 12: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/12.jpg)
12
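χ² test and Z test
According to (4.25),
$$Z = \frac{P_1 - P_2}{\sqrt{P_0 (1 - P_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (4.25)$$
$$Z = \frac{81/164 - 19/51}{\sqrt{(100/215)(115/215)(1/164 + 1/51)}} = 1.5175$$
$$Z^2 = 2.3028 = \chi^2_P$$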
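The equivalence between the two-sample Z test and the χ² test for a 2×2 table can be verified numerically. The sketch below uses the counts of Table 6.2; `two_sample_z` is an illustrative name, and the pooled proportion P0 is computed as total events over total sample size.

```python
import math

def two_sample_z(x1, n1, x2, n2):
    """Z statistic for comparing two independent proportions with a pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 81 of 164 patients with digitalis and 19 of 51 without had arrhythmia.
z = two_sample_z(81, 164, 19, 51)
print(f"Z = {z:.4f}, Z^2 = {z * z:.4f}")
```

Squaring Z gives the same value as the Pearson χ² with ν = 1 computed for this table.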
![Page 13: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/13.jpg)
13
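Correction for continuity
When n ≥ 40 and some cell has 1 ≤ e_ij < 5, use
$$\chi^2_P = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(|f_{ij} - e_{ij}| - 0.5)^2}{e_{ij}}$$
or, equivalently,
$$\chi^2_P = \frac{(|f_{11}f_{22} - f_{12}f_{21}| - n/2)^2\, n}{n_{r1}\, n_{r2}\, n_{c1}\, n_{c2}}$$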
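A short Python sketch of the continuity-corrected shortcut formula follows. It is applied to the counts of Table 6.14 (Example 6.9) only because SPSS reports a continuity-corrected value for that table, which gives something to compare against; with n = 12 the slides recommend Fisher's exact test for that example. The function name and the guard against a negative corrected difference are assumptions of this sketch.

```python
def yates_chi_square(f11, f12, f21, f22):
    """Continuity-corrected chi-square for a 2x2 table (shortcut form)."""
    r1, r2 = f11 + f12, f21 + f22
    c1, c2 = f11 + f21, f12 + f22
    n = r1 + r2
    # Guard (an assumption of this sketch): never let the corrected difference go below 0.
    diff = max(abs(f11 * f22 - f12 * f21) - n / 2, 0.0)
    return diff ** 2 * n / (r1 * r2 * c1 * c2)

# Table 6.14: new treatment (6, 1), control (1, 4).
corrected = yates_chi_square(6, 1, 1, 4)
print(f"corrected chi2 = {corrected:.3f}")
```

The result can be compared with the "Continuity Correction" row of the SPSS output for Example 6.9.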
![Page 14: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/14.jpg)
14
Fisher's exact test
When n < 40 or some e_ij < 1, the χ² test is not appropriate. Instead, an exact P-value is computed and used to draw the conclusion.
This is easily done in SPSS.
![Page 15: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/15.jpg)
15
Example 6.9
Table 6.14 The results of treatment of embolic angitis (栓塞性脉管炎) patients

| Groups | Recovery | No recovery | Total |
|---|---|---|---|
| New treatment | 6 (a) | 1 (b) | 7 (nr1) |
| Control | 1 (c) | 4 (d) | 5 (nr2) |
| Total | 7 (nc1) | 5 (nc2) | 12 (n) |
![Page 16: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/16.jpg)
16
Statistical description
group * result Crosstabulation

| group | | recovery | no recovery | Total |
|---|---|---|---|---|
| new treatment | Count | 6 | 1 | 7 |
| | Expected Count | 4.1 | 2.9 | 7.0 |
| | % within group | 85.7% | 14.3% | 100.0% |
| control | Count | 1 | 4 | 5 |
| | Expected Count | 2.9 | 2.1 | 5.0 |
| | % within group | 20.0% | 80.0% | 100.0% |
| Total | Count | 7 | 5 | 12 |
| | Expected Count | 7.0 | 5.0 | 12.0 |
| | % within group | 58.3% | 41.7% | 100.0% |
![Page 17: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/17.jpg)
17
Statistical inference
Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 5.182 (b) | 1 | .023 | | |
| Continuity Correction (a) | 2.831 | 1 | .092 | | |
| Likelihood Ratio | 5.555 | 1 | .018 | | |
| Fisher's Exact Test | | | | .072 | .045 |
| Linear-by-Linear Association | 4.750 | 1 | .029 | | |
| N of Valid Cases | 12 | | | | |

a. Computed only for a 2x2 table.
b. 4 cells (100.0%) have expected count less than 5. The minimum expected count is 2.08.
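The exact P-values that SPSS reports for Table 6.14 can be reproduced from the hypergeometric distribution. The sketch below is one common way to define the two tails (one-sided: tables at least as extreme in the observed direction; two-sided: all tables no more probable than the observed one); whether SPSS uses exactly this two-sided rule is an assumption here, and `fisher_exact_2x2` is an illustrative name.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (one_sided, two_sided) exact P-values for the 2x2 table [[a, b], [c, d]].

    one_sided: P(first cell >= a) with all margins fixed.
    two_sided: sum of probabilities of all tables no more probable than the observed one.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the first cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lowest, highest = max(0, c1 - r2), min(r1, c1)
    p_observed = prob(a)
    one_sided = sum(prob(x) for x in range(a, highest + 1))
    two_sided = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)   # small tolerance for float ties
    )
    return one_sided, two_sided

# Table 6.14: new treatment (6 recovered, 1 not), control (1 recovered, 4 not).
one_sided, two_sided = fisher_exact_2x2(6, 1, 1, 4)
print(f"one-sided P = {one_sided:.3f}, two-sided P = {two_sided:.3f}")
```

With these counts the two-sided exact P-value exceeds 0.05 while the uncorrected Pearson test gives P = .023, which illustrates why the χ² approximation is not trusted for such a small table.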
![Page 18: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/18.jpg)
18
6.3 The χ² Tests for Binary Variable under a Paired Design
Example 6.2 There are 260 serum ( 血清 ) samples. Each sample is divided into two parts, which are tested by two different immunological methods for rheumatoid factor ( 类风湿因子 ). The results are listed in Table 6.4. The question is whether the results of the two methods are independent.
![Page 19: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/19.jpg)
19
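Test for independence between two binary variables
Example 6.2
Table 6.4 The results of two immunological tests

| A \ B | + | - | Total |
|---|---|---|---|
| + | 172 | 8 | 180 |
| - | 12 | 68 | 80 |
| Total | 184 | 76 | 260 |

Proportion of B+ among A+: 172/180 = 95.6%; among A−: 12/80 = 15%.
$$\chi^2_P = \frac{(|f_{11}f_{22} - f_{12}f_{21}| - n/2)^2\, n}{n_{r1}\, n_{r2}\, n_{c1}\, n_{c2}}$$
The reported value χ² = 173.74 is obtained without the n/2 term (all expected frequencies here exceed 5); with it, the statistic is 169.87.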
![Page 20: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/20.jpg)
20
6.3.2 Comparison between two sample proportions
McNemar test
$$\chi^2 = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}}$$
![Page 21: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/21.jpg)
21
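H0: π1 = π2, H1: π1 ≠ π2, α = 0.05
When H0 is true,
$$T_1 = T_2 = \frac{b + c}{2}$$
For a large sample (b + c > 40),
$$\chi^2 = \frac{\left(b - \frac{b+c}{2}\right)^2}{\frac{b+c}{2}} + \frac{\left(c - \frac{b+c}{2}\right)^2}{\frac{b+c}{2}} = \frac{(b - c)^2}{b + c}$$
If χ² > χ²_{0.05}, then reject H0.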
![Page 22: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/22.jpg)
22
The Probability Expressions

| Trt A \ Trt B | + | - | Total |
|---|---|---|---|
| + | π11 (a) | π12 (b) | πr1 |
| - | π21 (c) | π22 (d) | πr2 |
| Total | πc1 | πc2 | 1.0 |

H0: πc1 = πr1, H1: πc1 ≠ πr1
Since πc1 = π11 + π21 and πr1 = π11 + π12,
the test becomes: H0: π12 = π21, H1: π12 ≠ π21
![Page 23: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/23.jpg)
23
Correction to McNemar test (f12 + f21 < 40)
$$\chi^2 = \frac{(|f_{12} - f_{21}| - 1)^2}{f_{12} + f_{21}}$$
$$\chi^2 = \frac{(|8 - 12| - 1)^2}{8 + 12} = 0.45$$
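The McNemar statistic, with and without the correction, depends only on the two discordant counts. The sketch below applies it to Table 6.4 (f12 = 8, f21 = 12); the function name, the automatic choice of the correction when f12 + f21 < 40, and the guard against a negative corrected difference are assumptions of this sketch.

```python
def mcnemar_chi_square(f12, f21, corrected=None):
    """McNemar chi-square from the discordant counts f12 and f21.

    If corrected is None, the continuity correction is applied when f12 + f21 < 40.
    """
    total = f12 + f21
    if corrected is None:
        corrected = total < 40
    diff = abs(f12 - f21)
    if corrected:
        diff = max(diff - 1, 0)     # guard so the corrected difference is never negative
    return diff ** 2 / total

# Table 6.4: 8 samples are A+ / B-, 12 samples are A- / B+.
chi2_corrected = mcnemar_chi_square(8, 12)
chi2_uncorrected = mcnemar_chi_square(8, 12, corrected=False)
critical_0_05 = 3.84                # standard chi-square critical value for 1 degree of freedom
print(f"corrected = {chi2_corrected:.2f}, uncorrected = {chi2_uncorrected:.2f}, "
      f"reject H0: {chi2_corrected > critical_0_05}")
```

Only the 20 discordant pairs enter the statistic; the 240 concordant pairs carry no information about a difference between the two methods.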
![Page 24: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/24.jpg)
24
6.4 The χ² Test for R×C Contingency Table
Table 6.6 Blood types of patients suffering from different diseases

| Disease status | Blood type A | Blood type B | Blood type O | Total |
|---|---|---|---|---|
| Digestive ulcer | 679 | 134 | 983 | 1796 |
| Stomach cancer | 416 | 84 | 383 | 883 |
| Control | 2625 | 570 | 2892 | 6087 |
| Total | 3720 | 788 | 4258 | 8766 |
![Page 25: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/25.jpg)
25
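The statistic for hypothesis test
$$\chi^2_P = n\left(\sum_{i}\sum_{j} \frac{f_{ij}^2}{n_{ri}\, n_{cj}} - 1\right)$$
$$\chi^2_P = 8766 \times \left(\frac{679^2}{1796 \times 3720} + \cdots + \frac{2892^2}{6087 \times 4258} - 1\right) = 40.543$$
$$\nu = (R - 1)(C - 1) = 4, \qquad \chi^2_{0.05,4} = 9.488$$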
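The R×C statistic can be computed directly from the counts of Table 6.6. This is a minimal sketch using the shortcut form n(ΣΣ f²/(n_r n_c) − 1); the function name is illustrative and the critical value 9.488 is taken from the slide.

```python
def chi_square_rxc(table):
    """Pearson chi-square and degrees of freedom for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(
        f * f / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    )
    chi2 = n * (s - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Table 6.6: rows are digestive ulcer, stomach cancer, control; columns are blood types A, B, O.
blood_type_table = [
    [679, 134, 983],
    [416, 84, 383],
    [2625, 570, 2892],
]
chi2, df = chi_square_rxc(blood_type_table)
critical_0_05 = 9.488               # chi-square critical value for df = 4, from the slide
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > critical_0_05}")
```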
![Page 26: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/26.jpg)
26
6.4.2 Multiple comparison for R×C Table

| Group | + | - |
|---|---|---|
| I | … | … |
| II | … | … |
| III | … | … |
| IV | … | … |
| V | … | … |
| VI (control) | … | … |
![Page 27: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/27.jpg)
27
6.4.3 Measurement of association for R×C table
Table 6.11 Blood type of 1043 patients

| ABO system \ MN system | M | N | MN | Total |
|---|---|---|---|---|
| O | 85 | 100 | 150 | 335 |
| A | 56 | 78 | 120 | 254 |
| B | 98 | 132 | 170 | 400 |
| AB | 23 | 25 | 6 | 54 |
| Total | 262 | 335 | 446 | 1043 |
![Page 28: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/28.jpg)
28
Pearson contingency coefficient
$$r_P = \sqrt{\frac{\chi^2_P}{n + \chi^2_P}}$$
$$r_P = \sqrt{\frac{25.925}{1043 + 25.925}} = 0.156$$
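The contingency coefficient for Table 6.11 can be reproduced in two steps: first the Pearson χ² for the 4×3 table, then the square-root formula. The sketch below assumes the χ² value on the slide was computed from Table 6.11 with the usual expected frequencies; the function names are illustrative.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square for an R x C table, from expected frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (f - e) ** 2 / e
    return chi2, n

def contingency_coefficient(chi2, n):
    """Pearson contingency coefficient sqrt(chi2 / (n + chi2))."""
    return math.sqrt(chi2 / (n + chi2))

# Table 6.11: rows are ABO types O, A, B, AB; columns are MN types M, N, MN.
abo_mn_table = [
    [85, 100, 150],
    [56, 78, 120],
    [98, 132, 170],
    [23, 25, 6],
]
chi2, n = pearson_chi_square(abo_mn_table)
r_p = contingency_coefficient(chi2, n)
print(f"chi2 = {chi2:.3f}, n = {n}, r_P = {r_p:.3f}")
```

A value this close to 0 indicates a weak association between the two blood-group systems, even though the χ² statistic itself is large because of the sample size.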
![Page 29: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/29.jpg)
29
Pre-requisite for the χ² test
By experience:
The theoretical frequencies should be greater than 5 in more than 4/5 of the cells;
The theoretical frequency in every cell should be greater than 1.
Otherwise, we need to use Fisher's exact test.
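The rule of thumb above is easy to check before running a test. The sketch below computes the theoretical frequencies of a table and applies both conditions; reading "more than 4/5 of the cells" as "at least 80%" is an assumption of this sketch, as are the function names.

```python
def expected_counts(table):
    """Theoretical frequencies e_ij = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_is_appropriate(table):
    """Check the rule of thumb: e > 5 in at least 80% of cells and e > 1 in every cell."""
    cells = [e for row in expected_counts(table) for e in row]
    share_above_5 = sum(1 for e in cells if e > 5) / len(cells)
    # Assumption: "more than 4/5 of the cells" is read as "at least 80%".
    return share_above_5 >= 0.8 and min(cells) > 1

digitalis_table = [[81, 83], [19, 32]]      # Table 6.2, n = 215
treatment_table = [[6, 1], [1, 4]]          # Table 6.14, n = 12

print(chi_square_is_appropriate(digitalis_table))
print(chi_square_is_appropriate(treatment_table))
```

For Table 6.14 every theoretical frequency is below 5, which is why Example 6.9 is analysed with Fisher's exact test.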
![Page 30: Lecture 07 Category Shaoqi Rao Rev](https://reader034.fdocuments.in/reader034/viewer/2022051515/5538eab3550346f53d8b48b8/html5/thumbnails/30.jpg)
30