Post on 23-Dec-2015
The 2x2 table,The 2x2 table,RxCxK contingency tables,RxCxK contingency tables,
and pair-matched dataand pair-matched data
July 27, 2004July 27, 2004
Introduction to the 2x2 TableIntroduction to the 2x2 Table
Exposure (E) No Exposure (~E)
Disease (D) a b a+b = P(D)
No Disease (~D) c d c+d = P(~D)
a+c = P(E) b+d = P(~E)
Marginal probability of disease
Marginal probability of exposure
Cohort StudiesCohort Studies
Target population
Exposed
Not Exposed
Disease-free cohort
Disease
Disease-free
Disease
Disease-free
TIME
Exposure (E) No Exposure (~E)
Disease (D) a b
No Disease (~D) c d
a+c b+d
)/()/(
)~/(
)/(
dbbcaa
EDP
EDPRR
risk to the exposed
risk to the unexposed
The Risk Ratio, or Relative Risk (RR)
400 400
1100 2600
0.23000/4001500/400 RR
Hypothetical DataHypothetical Data
Normal BP
Congestive Heart Failure
No CHF
1500 3000
High Systolic BP
Case-Control StudiesCase-Control Studies
Sample on disease status and ask retrospectively about exposures (for rare diseases) Marginal probabilities of exposure for cases and
controls are valid.
• Doesn’t require knowledge of the absolute risks of disease
• For rare diseases, can approximate relative risk
Target population
Exposed in past
Not exposed
Exposed
Not Exposed
Case-Control StudiesCase-Control Studies
Disease
(Cases)
No Disease
(Controls)
bc
adOR
db
ca
Exposure (E) No Exposure (~E)
Disease (D) a = P (D& E) b = P(D& ~E)
No Disease (~D) c = P (~D&E) d = P (~D&~E)
The Odds Ratio (OR)
The Odds RatioThe Odds Ratio
RR
OR
EDPEDP
EDPEDP
EDPEDP
EDPEDP
EDPEDP
DEPDEP
DEPDEP
)~/()/(
)~/(~)~/(
)/(~)/(
)~&(~)&(~
)~&()&(
)~/(~)~/(
)/(~)/(
When disease is rare: P(~D) 1
“The Rare Disease Assumption”
Via Bayes’ Rule
1
1
0 0.35 0.7 1.05 1.4 1.75 2.1 2.45 2.8 3.15 3.5 0
1
2
3
4
5
6
P e r c e n t
Simulated Odds Ratio
Properties of the OR (simulation)
Properties of the lnOR
-1.05 -0.75 -0.45 -0.15 0.15 0.45 0.75 1.05 1.35 1.65 1.95 0
2
4
6
8
10
P e r c e n t
lnOR
Standard deviation =
dcba
1111
Standard deviation =
Hypothetical DataHypothetical Data
0.8)10)(6(
)24)(20(OR
25.8) - (2.47(8.0)e ,(8.0)e CI %95 24
1
10
1
6
1
20
196.1
24
1
10
1
6
1
20
196.1
Smoker Non-smoker
Lung Cancer 20 10
No lung cancer 6 24
30
30
Note that the size of the smallest 2x2 cell determines the magnitude of the variance
Example: Cell phones and brain Example: Cell phones and brain tumors (cross-sectional data)tumors (cross-sectional data)
22.10156.
019.
91
)982)(.018(.
352
)982)(.018(.
)033.014(.
018.453
8;
)1)(()1)((
0)ˆˆ(
033.91
3;014.
352
5
21
21
//
Z
p
n
pp
n
pp
ppZ
pp nophonetumorcellphonetumor
Brain tumor No brain tumor
Own a cell phone
5 347 352
Don’t own a cell phone
3 88 91
8 435 453
Same data, but use Chi-square testSame data, but use Chi-square testor Fischer’s exactor Fischer’s exact
48.122.1:note
48.17.345
345.7)-(347
3.89
88)-(89.3
7.1
1.7)-(3
3.6
6.3)-(8
df 11111
d cellin 89.3 b; cellin 345.7
c; cellin 1.7 6.3;453*.014 a cellin Expected
014.777.*018.
777.453
352;018.
453
8
22
2222
12
Z
NS
*))*(C-(R-
xpp
pp
cellphonetumor
cellphonetumor
Brain tumor No brain tumor
Own 5 347 352
Don’t own 3 88 91
8 435 453
Same data, but use Odds RatioSame data, but use Odds Ratio
Brain tumor No brain tumor
Own a cell phone
5 347 352
Don’t own a cell phone
3 88 91
8 435 453
05.;16.174.
86.
88
1
3
1
347
1
5
1
)423ln(.
1111
0-lnORZ
423.347*3
88*5OR
p
dcba
Adding a Third Dimension to Adding a Third Dimension to the the RxCRxC picture picture
Exposure Disease?
Mediator
Confounder
Effect modifier
ConfoundingConfounding
A confounding variable is associated with the exposure and it affects the outcome, but it is not an intermediate link in the chain of causation between exposure and outcome.
Examples of ConfoundingExamples of Confounding
Poor nutrition?
Menstrual irregularity
Low weight
Menstrual irregularity?
Low bone strength
Late menarche
Controlling for confounders in Controlling for confounders in medical studiesmedical studies
1. Confounders can be controlled for in the design phase of a study (randomization or restriction or matching).
2. Confounders can be controlled for in the analysis phase of a study (stratification or multivariate regression).
Analytical identification of Analytical identification of confounders through confounders through
stratification stratification
Mantel-Haenszel Procedure:Mantel-Haenszel Procedure:Non-regression technique used to Non-regression technique used to
identify confounders and to control identify confounders and to control for confounding in the for confounding in the statistical statistical
analysisanalysis phase rather than the phase rather than the designdesign phase of a study.phase of a study.
Stratification: “Series of 2x2 Stratification: “Series of 2x2 tables”tables”
Idea: Take a 2x2 table and break it into a series of smaller 2x2 tables (one table at each of J levels of the confounder yields J tables).
Example: in testing for an association between lung cancer and alcohol drinking (yes/no), separate smokers and non-smokers.
“It is more informative to estimate the strength of association than simply to test a hypothesis about it.”
“When the association seems stable across partial tables, we can estimate an assumed common value of the k true odds ratios.”
Controlling for confounding by Controlling for confounding by stratificationstratification
Example: Gender Bias at Berkeley?(From: Sex Bias in Graduate Admissions: Data from Berkeley, Science 187: 398-403; 1975.)
Crude RR = (1276/1835)/(1486/2681) =1.25
(1.20 – 1.32)
Denied
Admitted
1835 2681
Female Male
1276 1486
559 1195
Program AProgram A
Stratum 1 = only those who applied to program A
Stratum-specific RR = .90 (.87-.94)
Denied
Admitted
108 825
Female Male
19 314
89 511
Program BProgram B
Stratum 2 = only those who applied to program B
Stratum-specific RR = .99 (.96-1.03)
Denied
Admitted
25 560
Female Male
8 208
17 352
Program CProgram C
Stratum 3 = only those who applied to program C
Stratum-specific RR = 1.08 (.91-1.30)
Denied
Admitted
593 325
Female Male
391 205
202 120
Program DProgram D
Stratum 4 = only those who applied to program D
Stratum-specific RR = 1.02 (.89-1.18)
Denied
Admitted
375 407
Female Male
248 265
127 142
Program EProgram E
Stratum 5 = only those who applied to program E
Stratum-specific RR = .88 (.67-1.17)
Denied
Admitted
393 191
Female Male
289 147
104 44
Program FProgram F
Stratum 6 = only those who applied to program F
Stratum-specific RR = 1.09 (.84-1.42)
Denied
Admitted
341 373
Female Male
321 347
20 26
SummarySummary
Crude RR = 1.25 (1.20 – 1.32)
Stratum specific RR’s:.90 (.87-.94)
.99 (.96-1.03)1.08 (.91-1.30) 1.02 (.89-1.18).88 (.67-1.17)
1.09 (.84-1.42)
Maentel-Haenszel Summary RR: .97
Cochran-Mantel-Haenszel Test is NS. Gender and denial of admissions are conditionally independent given program.The apparent association (RR=1.25) was due to confounding.
The Mantel-Haenszel The Mantel-Haenszel Summary Risk RatioSummary Risk Ratio
k
i i
iii
k
i i
iii
T
bacT
dca
1
1
)(
)(
Disease
Not Disease
Exposure Not Exposed
a c
b d
k strata
E.g., for Berkeley…E.g., for Berkeley…
97.
714)341(347
584)393(147
782)375(265
918)593(205
585)25(208
933)108(314
714)373(321
584)191(289
782)407(248
918)325(391
585)560(8
933)825(19
The Mantel-Haenszel The Mantel-Haenszel Summary Odds RatioSummary Odds Ratio
Exposed
Not Exposed
Case Control
a b
c d
k
i i
ii
k
i i
ii
T
cbT
da
1
1
Country
OR = 1.32 Spouse smokes
Spouse does not smoke
137 363
71 249US
Spouse smokes
Spouse does not smoke
19 38
5 16
Great
Britain
OR = 1.6
Spouse smokes
Spouse does not smoke
Lung Cancer Control
73 188
21 82Japan OR = 1.52
Source: Blot and Fraumeni, J. Nat. Cancer Inst., 77: 993-1000 (1986).
Example
Summary ORSummary OR
38.1
820363*71
785*38
36421*188
820137*249
7816*19
36482*73
Not Surprising!
MH assumptionsMH assumptions
OR or RR doesn’t vary across strata. (Homogeneity!)
If exposure/disease association does vary for different subgroups, then the summary OR or RR is not appropriate…
advantages and limitationsadvantages and limitationsadvantages…• Mantel-Haenszel summary statistic is easy to interpret
and calculate• Gives you a hands-on feel for the datadisadvantages…• Requires categorical confounders or continuous
confounders that have been divided into intervals • Cumbersome if more than a single confounder
To control for 1 and/or continuous confounders, a multivariate technique (such as logistic regression) is preferable.
Analysis of matched dataAnalysis of matched data
Pair Matching: Why match?Pair Matching: Why match?Pairing can control for extraneous sources
of variability and increase the power of a statistical test.
Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.
ExampleExample Johnson and Johnson (NEJM 287: 1122-1125,
1972) selected 85 Hodgkin’s patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient’s…they presented the data as….
Hodgkin’s
Sib control
Tonsillectomy None
41 44
33 52
From John A. Rice, “Mathematical Statistics and Data Analysis.
OR=1.47; chi-square=1.53 (NS)
ExampleExample But several letters to the editor pointed out that
those investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired…better to analyze data like this:
From John A. Rice, “Mathematical Statistics and Data Analysis.
OR=2.14; chi-square=2.91 (p=.09)
Tonsillectomy
None
Tonsillectomy None
37 7
15 26
Case
Control
Pair MatchingPair Matching
Match each MI case to an MI control based on age and gender.
Ask about history of diabetes to find out if diabetes increases your risk for MI.
Pair MatchingPair Matching
Diabetes
No diabetes
25 119
Diabetes No Diabetes
9 37
16 82
46
98
144
MI cases
MI controls
Each pair is it’s own “age-Each pair is it’s own “age-gender” stratumgender” stratum
Diabetes
No diabetes
Case (MI) Control
1 1
0 0
Example: Concordant for
exposure (cell “a” from before)
Diabetes
No diabetes
Case (MI) Control
1 1
0 0
Diabetes
No diabetes
Case (MI) Control
1 0
0 1
x 9
x 37
Diabetes
No diabetes
Case (MI) Control
0 1
1 0
Diabetes
No diabetes
Case (MI) Control
0 0
1 1
x 16
x 82
Mantel-Haenszel for pair-Mantel-Haenszel for pair-matched datamatched data
We want to know the relationship between diabetes and MI controlling for age and gender.
Mantel-Haenszel methods apply.
RECALL: The Mantel-Haenszel RECALL: The Mantel-Haenszel Summary Odds RatioSummary Odds Ratio
Exposed
Not Exposed
Case Control
a b
c d
k
i i
ii
k
i i
ii
T
cbT
da
1
1
Diabetes
No diabetes
Case (MI) Control
1 1
0 0
Diabetes
No diabetes
Case (MI) Control
1 0
0 1
ad/T = 0
bc/T=0
ad/T=1/2
bc/T=0
Diabetes
No diabetes
Case (MI) Control
0 1
1 0
Diabetes
No diabetes
Case (MI) Control
0 0
1 1
ad/T=0
bc/T=1/2
ad/T=0
bc/T=0
16
37
21
*16
21
37
2
2144
1
144
1
x
cb
da
OR
i
ii
i
ii
MH
Mantel-Haenszel Summary ORMantel-Haenszel Summary OR
Diabetes
No diabetes
25 119
Diabetes No Diabetes
9 37
16 82
46
98
144
MI cases
MI controls
OR estimate comes only from discordant pairs!!
OR= 37/16 = 2.31
Makes Sense!
McNemar’s TestMcNemar’s Test
Diabetes
No diabetes
25 119
Diabetes No Diabetes
9 37
16 82
46
98
144
MI cases
MI controls
OR estimate comes only from discordant pairs!
The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control. If more discordant pairs “favor” the case, this indicates OR>1.
Diabetes
No diabetes
25 119
Diabetes No Diabetes
9 37
16 82
46
98
144
MI cases
MI controls
P(“favors” case/discordant pair) =
53
37
1637
37ˆ
cb
bp
Diabetes
No diabetes
25 119
Diabetes No Diabetes
9 37
16 82
46
98
144
MI cases
MI controls
odds(“favors” case/discordant pair) =
16
37
c
bOR
Diabetes
No diabetes
Diabetes No Diabetes
9 37
16 82
MI casesMI controls
McNemar’s TestMcNemar’s Test
...)5(.)5(.39
53)5(.)5(.
38
53)5(.)5(.
37
53 143915381637
valuep
01.;88.264.3
5.10
)5)(.5(.53
)2
53(37
pZ
Null hypothesis: P(“favors” case / discordant pair) = .5(note: equivalent to OR=1.0 or cell b=cell c)
By normal approximation to binomial:
McNemar’s Test: generallyMcNemar’s Test: generally
cb
cb
cb
cb
cb
cbb
Z
4
22)5)(.5)(.(
)2
(
By normal approximation to binomial:
Equivalently:
cb
cb
cb
cb
2
221
)()(
exp
No exp
exp No exp
a b
c d
casescontrols
From: “Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study” BMJ 312: 91-94; Jan 1996.
Example: Salmonella Example: Salmonella Outbreak in France, 1996Outbreak in France, 1996
Epidemic CurveEpidemic Curve
Matched Case Control StudyMatched Case Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for: age group (< 1, 1-4, 5-14, 15-34, 35-44,
45-54, 55-64, or >= 65 years) gender city of residence
ResultsResults
In 2x2 table form: any goat’s In 2x2 table form: any goat’s cheesecheese
Goat’s cheese
None
29 30
Goat’ cheese None
23 23
6 7
46
13
59
Cases
Controls
8.36
23
c
bOR
In 2x2 table form: Brand B In 2x2 table form: Brand B Goat’s cheeseGoat’s cheese
Goat’s cheese B
None
10 49
Goat’ cheese B None
8 24
2 25
32
27
59
Cases
Controls
0.122
24
c
bOR