Analysis of matched data HRP 261 02/02/04 Chapter 9 Agresti – read sections 9.1 and 9.2.

Post on 01-Jan-2016


Analysis of matched data

HRP 261 02/02/04

Chapter 9 Agresti – read sections 9.1 and 9.2

Pair Matching: Why match?

Pairing can control for extraneous sources of variability and increase the power of a statistical test.

Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.

Example

Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin’s patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient’s. They presented the data as:

               Tonsillectomy   None
Hodgkin’s           41          44
Sib control         33          52

From John A. Rice, “Mathematical Statistics and Data Analysis.”

OR=1.47; chi-square=1.53 (NS)

Example

But several letters to the editor pointed out that those investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to analyze the data like this:

                               Control (sib)
                            None   Tonsillectomy
Case   None                  37          7
       Tonsillectomy         15         26

The margins match the unpaired table: 15 + 26 = 41 cases and 7 + 26 = 33 sib controls had a tonsillectomy.

From John A. Rice, “Mathematical Statistics and Data Analysis.”

OR = 15/7 = 2.14; chi-square = (15-7)^2/(15+7) = 2.91 (p=.09)
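To see how much the pairing matters, both analyses of the tonsillectomy data can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, the unpaired chi-square is assumed to be the uncorrected Pearson statistic (which reproduces the quoted 1.53), and the paired chi-square is assumed to be McNemar's statistic on the 15 and 7 discordant pairs.

```python
def unpaired_or_and_chi2(a, b, c, d):
    """Odds ratio and uncorrected Pearson chi-square for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2


def paired_or_and_chi2(case_only, control_only):
    """Odds ratio and McNemar chi-square from the two discordant-pair counts."""
    odds_ratio = case_only / control_only
    chi2 = (case_only - control_only) ** 2 / (case_only + control_only)
    return odds_ratio, chi2


# Unpaired table: Hodgkin's (41 tonsillectomy, 44 none), sib controls (33, 52)
unpaired = unpaired_or_and_chi2(41, 44, 33, 52)
# Paired table: 15 pairs where only the patient had a tonsillectomy,
# 7 pairs where only the sibling did
paired = paired_or_and_chi2(15, 7)

print(f"unpaired: OR={unpaired[0]:.2f}, chi-square={unpaired[1]:.2f}")
print(f"paired:   OR={paired[0]:.2f}, chi-square={paired[1]:.2f}")
```

The concordant pairs (37 and 26) never enter the paired calculation, which is why the two analyses can disagree.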

Pair Matching: Agresti example

Match each MI case to an MI control based on age and gender.

Ask about history of diabetes to find out if diabetes increases your risk for MI.

Pair Matching: Agresti example

Which cells are informative?

Just the discordant cells are informative!

                                 MI controls
                          Diabetes   No diabetes   Total
MI cases   Diabetes           9          37          46
           No diabetes       16          82          98
           Total             25         119         144

Pair Matching


OR estimate comes only from discordant pairs!

The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs “favor” the case, this indicates OR>1.

P(“favors” case | discordant pair) =

$$\frac{P(E \mid D)\,P(\sim E \mid \sim D)}{P(E \mid D)\,P(\sim E \mid \sim D) + P(\sim E \mid D)\,P(E \mid \sim D)}$$

$P(E \mid D)\,P(\sim E \mid \sim D)$ = the probability of observing a case-control pair with only the case exposed

$P(\sim E \mid D)\,P(E \mid \sim D)$ = the probability of observing a case-control pair with only the control exposed

P(“favors” case | discordant pair) =

$$\hat{p} = \frac{b}{b+c} = \frac{37}{37+16} = \frac{37}{53}$$

odds(“favors” case | discordant pair) =

$$\frac{b}{c} = \frac{37}{16} = OR$$


OR estimate comes only from discordant pairs!!

OR= 37/16 = 2.31

Makes Sense!

McNemar’s Test

Null hypothesis: P(“favors” case | discordant pair) = .5 (note: equivalent to OR=1.0 or cell b=cell c)

With b = 37 and c = 16 (53 discordant pairs):

$$p\text{-value} = \binom{53}{37}(.5)^{37}(.5)^{16} + \binom{53}{38}(.5)^{38}(.5)^{15} + \binom{53}{39}(.5)^{39}(.5)^{14} + \dots$$

By normal approximation to binomial:

$$Z = \frac{37 - \frac{53}{2}}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88; \quad p < .01$$
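The exact binomial tail and its normal approximation for the 53 discordant diabetes/MI pairs can be checked directly. This is a sketch using only the Python standard library; the variable names are illustrative, the exact tail is computed one-sided (as the sum on the slide is written), and the normal p-value is taken two-sided, which is an assumption about how the slide's p-value was meant.

```python
from math import comb, erfc, sqrt

b, c = 37, 16          # discordant pairs: case-only exposed, control-only exposed
n = b + c              # 53 discordant pairs

# Exact one-sided binomial tail: P(X >= 37) for X ~ Binomial(53, 0.5)
p_exact_one_sided = sum(comb(n, k) for k in range(b, n + 1)) * 0.5 ** n

# Normal approximation to the binomial
z = (b - n / 2) / sqrt(n * 0.5 * 0.5)
p_normal_two_sided = erfc(abs(z) / sqrt(2))   # two-sided tail of a standard normal

print(f"exact one-sided p = {p_exact_one_sided:.4f}")
print(f"Z = {z:.2f}, two-sided normal p = {p_normal_two_sided:.4f}")
```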

McNemar’s Test: generally

                     controls
                  exp    No exp
cases   exp        a       b
        No exp     c       d

By normal approximation to binomial:

$$Z = \frac{b - \frac{b+c}{2}}{\sqrt{(b+c)(.5)(.5)}} = \frac{\frac{b-c}{2}}{\sqrt{\frac{b+c}{4}}} = \frac{b-c}{\sqrt{b+c}}$$

Equivalently:

$$\chi^2_1 = \frac{(b-c)^2}{b+c}$$

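95% CI for difference in dependent proportions

Here $\hat{p}_{E \mid D}$ = 46/144 = .32 is the proportion of cases with diabetes and $\hat{p}_{E \mid \sim D}$ = 25/144 = .17 is the proportion of controls with diabetes.

$$Var(\hat{p}_1 - \hat{p}_2) = Var(\hat{p}_1) + Var(\hat{p}_2) - 2\,Cov(\hat{p}_1, \hat{p}_2)$$

$$Var(\hat{p}_{E \mid D} - \hat{p}_{E \mid \sim D}) = \frac{p_{E \mid D}(1 - p_{E \mid D})}{n} + \frac{p_{E \mid \sim D}(1 - p_{E \mid \sim D})}{n} - 2\,Cov(\hat{p}_{E \mid D}, \hat{p}_{E \mid \sim D})$$

$$= \frac{(.32)(.68) + (.17)(.83) - 2(.06 \times .57 - .26 \times .11)}{144} = .0024$$

95% CI: $.32 - .17 = .15 \pm 1.96\sqrt{.0024} = (.05, .24)$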
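The variance and confidence interval for the difference in dependent proportions can be recomputed from the four cell counts of the diabetes/MI table. This is a minimal sketch: the variable names are illustrative, and the covariance is assumed to take the form (p11*p22 - p12*p21)/n, which is what the numeric line on the slide (.06*.57 - .26*.11) implies.

```python
from math import sqrt

# Paired 2x2 table: rows = MI cases, columns = matched controls
a, b, c, d = 9, 37, 16, 82
n = a + b + c + d                      # 144 pairs

p_cases = (a + b) / n                  # proportion of cases with diabetes
p_controls = (a + c) / n               # proportion of controls with diabetes

# Covariance of the two sample proportions for paired data
cov = ((a / n) * (d / n) - (b / n) * (c / n)) / n
var_diff = (p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / n - 2 * cov

diff = p_cases - p_controls
half_width = 1.96 * sqrt(var_diff)
ci = (diff - half_width, diff + half_width)
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Carrying full precision instead of the rounded .32 and .17 gives the same interval to two decimals.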

Each pair is its own “age-gender” stratum

Example: Concordant for exposure (cell “a” from before)

               Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

               Case (MI)   Control
Diabetes           1          1
No diabetes        0          0
x 9

               Case (MI)   Control
Diabetes           1          0
No diabetes        0          1
x 37

               Case (MI)   Control
Diabetes           0          1
No diabetes        1          0
x 16

               Case (MI)   Control
Diabetes           0          0
No diabetes        1          1
x 82

Mantel-Haenszel for pair-matched data

We want to know the relationship between diabetes and MI controlling for age and gender.

Mantel-Haenszel methods apply.

RECALL: The Mantel-Haenszel Summary Odds Ratio

               Case   Control
Exposed          a       b
Not Exposed      c       d

$$OR_{MH} = \frac{\sum_{i=1}^{k} \frac{a_i d_i}{T_i}}{\sum_{i=1}^{k} \frac{b_i c_i}{T_i}}$$

Mantel-Haenszel Summary OR

               Case (MI)   Control
Diabetes           1          1
No diabetes        0          0
ad/T = 0; bc/T = 0

               Case (MI)   Control
Diabetes           1          0
No diabetes        0          1
ad/T = 1/2; bc/T = 0

               Case (MI)   Control
Diabetes           0          1
No diabetes        1          0
ad/T = 0; bc/T = 1/2

               Case (MI)   Control
Diabetes           0          0
No diabetes        1          1
ad/T = 0; bc/T = 0

$$OR_{MH} = \frac{\sum_{i=1}^{144} \frac{a_i d_i}{2}}{\sum_{i=1}^{144} \frac{b_i c_i}{2}} = \frac{37 \times \frac{1}{2}}{16 \times \frac{1}{2}} = \frac{37}{16}$$
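Treating every matched pair as its own stratum, the Mantel-Haenszel sum can be evaluated literally over all 144 two-person tables. This is a sketch with an illustrative function name; each stratum is written as (a, b, c, d) in the Exposed/Not Exposed by Case/Control layout recalled above.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR; each stratum is (a, b, c, d) with T = a + b + c + d."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# One stratum per matched pair:
# a = case exposed, b = control exposed, c = case unexposed, d = control unexposed
strata = (
    [(1, 1, 0, 0)] * 9      # both have diabetes
    + [(1, 0, 0, 1)] * 37   # only the case has diabetes
    + [(0, 1, 1, 0)] * 16   # only the control has diabetes
    + [(0, 0, 1, 1)] * 82   # neither has diabetes
)

or_mh = mantel_haenszel_or(strata)
print(f"{len(strata)} strata, OR_MH = {or_mh:.2f}")
```

The concordant strata add zero to both sums, so the result is the same ratio of discordant pairs, 37/16.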

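Mantel-Haenszel Test Statistic (same as McNemar’s)

recall:

$$E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_k}; \quad Var(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_k^2\,(n_k - 1)}$$

Concordant cells contribute 0.

discordant cells:

$$E(n_{11k}) = \frac{(1)(1)}{2} = \frac{1}{2}; \quad Var(n_{11k}) = \frac{(1)(1)(1)(1)}{2^2(2-1)} = \frac{1}{4}$$

$$\chi^2_{CMH} = \frac{\left[\sum_k \left(n_{11k} - E(n_{11k})\right)\right]^2}{\sum_k Var(n_{11k})} = \frac{\left[\sum_{\text{case disc. cells}} .5 \; - \sum_{\text{con. disc. cells}} .5\right]^2}{\sum_{\text{disc. cells}} .25}$$

$$= \frac{[.5(b) - .5(c)]^2}{(.25)(b+c)} = \frac{.25\,(b-c)^2}{.25\,(b+c)} = \frac{(b-c)^2}{b+c}$$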

Example: Salmonella Outbreak in France, 1993

From: “Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study” BMJ 312: 91-94; Jan 1996.

Epidemic Curve [figure]

Matched Case Control Study

Case = Salmonella gastroenteritis.

Community controls (1:1) matched for:

• age group (< 1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or >= 65 years)
• gender
• city of residence

Results

In 2x2 table form: any goat’s cheese

                                Controls
                        Goat’s cheese   None   Total
Cases   Goat’s cheese        23          23      46
        None                  6           7      13
        Total                29          30      59

$$OR = \frac{b}{c} = \frac{23}{6} = 3.8$$

In 2x2 table form: Brand B Goat’s cheese

                                  Controls
                          Goat’s cheese B   None   Total
Cases   Goat’s cheese B          8           24      32
        None                     2           25      27
        Total                   10           49      59

$$OR = \frac{b}{c} = \frac{24}{2} = 12.0$$

            Case   Control
Brand B       1       1
None          0       0
x 8

            Case   Control
Brand B       1       0
None          0       1
x 24

            Case   Control
Brand B       0       1
None          1       0
x 2

            Case   Control
Brand B       0       0
None          1       1
x 25

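8 concordant exposed:

$$E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_k} = \frac{2 \times 1}{2} = 1$$

$$\text{Observed} - E(n_{11k}) = n_{11k} - E(n_{11k}) = 1 - 1 = 0$$

$$Var(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_k^2\,(n_k - 1)} = \frac{2 \times 0 \times 1 \times 1}{4(2-1)} = 0$$

25 concordant unexposed:

$$E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_k} = \frac{0 \times 1}{2} = 0$$

$$\text{Observed} - E(n_{11k}) = n_{11k} - E(n_{11k}) = 0 - 0 = 0$$

$$Var(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_k^2\,(n_k - 1)} = \frac{0 \times 2 \times 1 \times 1}{4(2-1)} = 0$$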

Summary: 8 concordant-exposed pairs (=strata) contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).

Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).

Summary: 2 discordant “control-exposed” pairs contribute -.5 each to the numerator (observed-expected= -.5) and .25 each to the denominator (variance= .25).

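24 discordant cells favor case:

$$E(n_{11k}) = \frac{(1)(1)}{2} = \frac{1}{2}$$

$$\text{Observed} - E(n_{11k}) = n_{11k} - E(n_{11k}) = 1 - .5 = .5$$

$$Var(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_k^2\,(n_k - 1)} = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}$$

2 discordant cells favor control:

$$E(n_{11k}) = \frac{(1)(1)}{2} = \frac{1}{2}$$

$$\text{Observed} - E(n_{11k}) = n_{11k} - E(n_{11k}) = 0 - .5 = -.5$$

$$Var(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_k^2\,(n_k - 1)} = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}$$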

Summary: 24 discordant “case-exposed” pairs contribute +.5 each to the numerator (observed-expected= +.5) and .25 each to the denominator (variance= .25).

$$\chi^2_{CMH} = \frac{[8(0) + 25(0) + 24(.5) - 2(.5)]^2}{0 + 0 + 24(.25) + 2(.25)} = \frac{22^2(.25)}{26(.25)} = \frac{22^2}{26} = \frac{(24-2)^2}{26} = \frac{(b-c)^2}{b+c}$$
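The stratum-by-stratum bookkeeping for the Brand B goat's cheese data can be replayed in code to confirm that the CMH statistic collapses to McNemar's. This is a sketch with an illustrative function name; each stratum is (a, b, c, d) with rows exposed/unexposed and columns case/control, and the expected value and variance follow the hypergeometric formulas recalled earlier.

```python
def cmh_chi2(strata):
    """CMH chi-square from strata given as (a, b, c, d) = 2x2 cells read row by row."""
    observed_minus_expected = 0.0
    variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed_minus_expected += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return observed_minus_expected ** 2 / variance


# Brand B goat's cheese: rows = exposed / unexposed, columns = case / control
strata = (
    [(1, 1, 0, 0)] * 8      # concordant exposed
    + [(1, 0, 0, 1)] * 24   # only the case exposed
    + [(0, 1, 1, 0)] * 2    # only the control exposed
    + [(0, 0, 1, 1)] * 25   # concordant unexposed
)

chi2 = cmh_chi2(strata)
b, c = 24, 2
print(f"CMH chi-square = {chi2:.2f}; McNemar (b-c)^2/(b+c) = {(b - c) ** 2 / (b + c):.2f}")
```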

M:1 matched studies

One-to-one pair matching provides the most cost-effective design when cases and controls are equally scarce.

But when cases are the limiting factor, as with rare diseases, statistical power may be increased by selecting more than 1 control matched to each case.

But with diminishing returns…

M:1 matched studies

2:1 matched study of colorectal cancer.

Background: Carcinoembryonic antigen (CEA) is the classical tumor marker for colorectal cancer. This study investigated whether the plasma levels of carcinoembryonic antigen and/or CA 242 were elevated BEFORE clinical diagnosis of colorectal cancer.

From: Palmqvist R et al. Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study. Diseases of the Colon & Rectum. 46(11):1538-1544, November 2003.

M:1 matched studies

Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study

Study design: A so-called “nested case-control study.”

Idea: Study subjects who were members of an ongoing prospective cohort study in Sweden had given blood at baseline, when they had no disease. Years later, blood can be thawed and tested for the presence of prediagnostic antigens.

Key innovation: The cohort is large, the disease is rare, and it’s too costly to test everyone’s blood; so only test stored blood of cases and matched controls from the cohort.

M:1 matched studies

Two cancer-free controls were randomly selected for each case from the corresponding cohort at the time of diagnosis of the matched case.

Matched for:

• gender
• age at recruitment (±12 months)
• date of blood sampling (±2 months)
• fasting time (<4 hours, 4–8 hours, >8 hours)

2:1 matching:

• stratum = matching group
• 3 subjects per stratum
• 6 possible 2x2 tables…

Case exposed; 1 control unexposed

          Case (CRC)   Controls
CEA +         1           1
CEA -         0           1

Everyone exposed; non-informative

          Case (CRC)   Controls
CEA +         1           2
CEA -         0           0

Case exposed; both controls unexposed

          Case (CRC)   Controls
CEA +         1           0
CEA -         0           2

Case unexposed; 1 control exposed

          Case (CRC)   Controls
CEA +         0           1
CEA -         1           1

Case unexposed; both controls exposed

          Case (CRC)   Controls
CEA +         0           2
CEA -         1           0

Everyone unexposed; non-informative

          Case (CRC)   Controls
CEA +         0           0
CEA -         1           2

          Case (CRC)   Controls
CEA +         1           1
CEA -         0           1
x 2

          Case (CRC)   Controls
CEA +         1           2
CEA -         0           0
x 0

          Case (CRC)   Controls
CEA +         1           0
CEA -         0           2
x 12

          Case (CRC)   Controls
CEA +         0           1
CEA -         1           1
x 1

          Case (CRC)   Controls
CEA +         0           2
CEA -         1           0
x 0

          Case (CRC)   Controls
CEA +         0           0
CEA -         1           2
x 102

Represents all possible discordant tables (either 2 or 1 total exposed)

2 Tables with 2 exposed

          Case (CRC)   Controls
CEA +         1           1
CEA -         0           1

          Case (CRC)   Controls
CEA +         0           2
CEA -         1           0

13 Tables with 1 exposed

          Case (CRC)   Controls
CEA +         1           0
CEA -         0           2

          Case (CRC)   Controls
CEA +         0           1
CEA -         1           1

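2 Tables with 2 exposed

First table (case exposed; 1 control exposed):

          Case (CRC)   Controls
CEA +         1           1
CEA -         0           1

Second table (case unexposed; both controls exposed):

          Case (CRC)   Controls
CEA +         0           2
CEA -         1           0

$$P(\text{first table}) = p_{E \mid D} \binom{2}{1} p_{E \mid \sim D}\,(1 - p_{E \mid \sim D})$$

$$P(\text{second table}) = (1 - p_{E \mid D}) \binom{2}{2} p_{E \mid \sim D}^{2}\,(1 - p_{E \mid \sim D})^{0}$$

$$P(\text{case exposed} \mid 2 \text{ total exposed}) = \frac{2\,p_{E \mid D}\,p_{E \mid \sim D}\,(1 - p_{E \mid \sim D})}{2\,p_{E \mid D}\,p_{E \mid \sim D}\,(1 - p_{E \mid \sim D}) + (1 - p_{E \mid D})\,p_{E \mid \sim D}^{2}}$$

$$= \frac{2\,p_{E \mid D}\,p_{\sim E \mid \sim D}}{2\,p_{E \mid D}\,p_{\sim E \mid \sim D} + p_{\sim E \mid D}\,p_{E \mid \sim D}} = \frac{2\,\dfrac{p_{E \mid D}\,p_{\sim E \mid \sim D}}{p_{\sim E \mid D}\,p_{E \mid \sim D}}}{2\,\dfrac{p_{E \mid D}\,p_{\sim E \mid \sim D}}{p_{\sim E \mid D}\,p_{E \mid \sim D}} + 1} = \frac{2\,OR}{2\,OR + 1}$$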

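13 Tables with 1 exposed

First table (case unexposed; 1 control exposed):

          Case (CRC)   Controls
CEA +         0           1
CEA -         1           1

Second table (case exposed; both controls unexposed):

          Case (CRC)   Controls
CEA +         1           0
CEA -         0           2

$$P(\text{first table}) = (1 - p_{E \mid D}) \binom{2}{1} p_{E \mid \sim D}\,(1 - p_{E \mid \sim D})$$

$$P(\text{second table}) = p_{E \mid D} \binom{2}{0} p_{E \mid \sim D}^{0}\,(1 - p_{E \mid \sim D})^{2}$$

$$P(\text{case exposed} \mid 1 \text{ total exposed}) = \frac{p_{E \mid D}\,(1 - p_{E \mid \sim D})^{2}}{p_{E \mid D}\,(1 - p_{E \mid \sim D})^{2} + 2\,(1 - p_{E \mid D})\,p_{E \mid \sim D}\,(1 - p_{E \mid \sim D})}$$

$$= \frac{p_{E \mid D}\,p_{\sim E \mid \sim D}}{p_{E \mid D}\,p_{\sim E \mid \sim D} + 2\,p_{\sim E \mid D}\,p_{E \mid \sim D}} = \frac{\dfrac{p_{E \mid D}\,p_{\sim E \mid \sim D}}{p_{\sim E \mid D}\,p_{E \mid \sim D}}}{\dfrac{p_{E \mid D}\,p_{\sim E \mid \sim D}}{p_{\sim E \mid D}\,p_{E \mid \sim D}} + 2} = \frac{OR}{OR + 2}$$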

Summary

• P(case exposed | 2 total exposed) = 2OR/(2OR+1)
• P(case unexposed | 2 total exposed) = 1 - 2OR/(2OR+1)
• P(case exposed | 1 total exposed) = OR/(OR+2)
• P(case unexposed | 1 total exposed) = 1 - OR/(OR+2)

Therefore, we can make a likelihood equation for our data that is a function of the OR, and use MLE to solve for OR.
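The two conditional probabilities in the summary can be sanity-checked numerically by building them from binomial pieces and comparing with the closed forms in OR. This is a sketch only: the exposure probabilities 0.30 and 0.10 are hypothetical values chosen for illustration, not estimates from the CEA study.

```python
from math import isclose

# Hypothetical exposure probabilities, chosen only for illustration
p_case = 0.30      # P(E | D): probability that a case is exposed
p_ctrl = 0.10      # P(E | ~D): probability that a control is exposed

odds_ratio = (p_case * (1 - p_ctrl)) / ((1 - p_case) * p_ctrl)

# Strata with 2 of 3 subjects exposed
case_exp_2 = p_case * 2 * p_ctrl * (1 - p_ctrl)      # case + one of two controls
case_unexp_2 = (1 - p_case) * p_ctrl ** 2            # both controls, not the case
prob_2 = case_exp_2 / (case_exp_2 + case_unexp_2)

# Strata with 1 of 3 subjects exposed
case_exp_1 = p_case * (1 - p_ctrl) ** 2                    # only the case
case_unexp_1 = (1 - p_case) * 2 * p_ctrl * (1 - p_ctrl)    # only one control
prob_1 = case_exp_1 / (case_exp_1 + case_unexp_1)

print(isclose(prob_2, 2 * odds_ratio / (2 * odds_ratio + 1)))
print(isclose(prob_1, odds_ratio / (odds_ratio + 2)))
```

Any other pair of probabilities strictly between 0 and 1 should give the same agreement, since the nuisance probabilities cancel and only OR remains.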

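Applying to example data

$$P(data \mid OR) = \left(\frac{2OR}{2OR+1}\right)^{2}\left(1 - \frac{2OR}{2OR+1}\right)^{0}\left(\frac{OR}{OR+2}\right)^{12}\left(1 - \frac{OR}{OR+2}\right)^{1}$$

$$= \left(\frac{2OR}{2OR+1}\right)^{2}\left(\frac{1}{2OR+1}\right)^{0}\left(\frac{OR}{OR+2}\right)^{12}\left(\frac{2}{OR+2}\right)^{1}$$

A little complicated to solve further…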
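Although the likelihood is awkward to maximize by hand, a numerical root-finder handles it easily. The sketch below differentiates the log of the likelihood above and finds where the derivative (the score) is zero by bisection. The function and parameter names are illustrative: `n2_case`/`n2_ctrl` count strata with 2 total exposed in which the case is or is not exposed, and `n1_case`/`n1_ctrl` do the same for strata with 1 total exposed. It assumes the score changes sign exactly once on the search interval, which holds for these counts.

```python
def score(odds_ratio, n2_case, n2_ctrl, n1_case, n1_ctrl):
    """Derivative of the conditional log-likelihood with respect to OR."""
    return (
        n2_case * (1 / odds_ratio - 2 / (2 * odds_ratio + 1))
        - n2_ctrl * 2 / (2 * odds_ratio + 1)
        + n1_case * (1 / odds_ratio - 1 / (odds_ratio + 2))
        - n1_ctrl / (odds_ratio + 2)
    )


def conditional_mle(n2_case, n2_ctrl, n1_case, n1_ctrl, lo=1e-6, hi=1e6):
    """Bisection on the score; assumes it is positive at lo and negative at hi."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if score(mid, n2_case, n2_ctrl, n1_case, n1_ctrl) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Exponents of the likelihood above: 2, 0, 12, 1
or_hat = conditional_mle(n2_case=2, n2_ctrl=0, n1_case=12, n1_ctrl=1)
print(f"conditional MLE of OR = {or_hat:.2f}")
```

For these exponents the score equation reduces to 2·OR² − 49·OR − 28 = 0, whose positive root is about 25.06, so the bisection result can be checked against the quadratic formula.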

Applying to example data

BD give a simpler, robust estimate of OR for 2:1 matching:

$$OR = \frac{1(\#\text{ where 2 total exposed \& case exposed}) + 2(\#\text{ where 1 total exposed \& case exposed})}{2(\#\text{ where 2 total exposed \& 2 controls exposed}) + 1(\#\text{ where 1 total exposed \& control exposed})}$$

$$= \frac{1(2) + 2(12)}{2(0) + 1(1)} = 26.0$$
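The robust estimate is just a weighted ratio of the four discordant stratum counts, so it can be written as a one-line function. This is a sketch; the function and parameter names are illustrative, with the same meaning of `n2_case`, `n2_ctrl`, `n1_case`, and `n1_ctrl` as the four counts in the formula above.

```python
def robust_or_2_to_1(n2_case, n2_ctrl, n1_case, n1_ctrl):
    """Weighted ratio of discordant 2:1 strata, following the formula above."""
    return (1 * n2_case + 2 * n1_case) / (2 * n2_ctrl + 1 * n1_ctrl)


# 2 strata: 2 exposed incl. the case; 0 strata: both controls exposed only;
# 12 strata: only the case exposed; 1 stratum: only one control exposed
or_robust = robust_or_2_to_1(n2_case=2, n2_ctrl=0, n1_case=12, n1_ctrl=1)
print(f"robust OR = {or_robust:.1f}")
```

The weights 1 and 2 are what the Mantel-Haenszel terms ad/T and bc/T reduce to when every stratum has T = 3 subjects, after cancelling the common factor 1/3.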