Comparing Margins of Multivariate Binary Data


Page 1: Comparing Margins of Multivariate Binary Data

Comparing Margins of Multivariate Binary Data

Bernhard Klingenberg
Assoc. Prof. of Statistics

Williams College, MA

www.williams.edu/~bklingen

Page 2: Comparing Margins of Multivariate Binary Data

Outline

Challenges:
• Associations of various degrees among binary variables
• Simultaneous inference
• Sparse and/or unbalanced data; test statistics with discrete support
• Asymptotic theory questionable

Setup:
• Two independent groups
• Response: vector of k correlated binary variables (multivariate binary)

Goal:
• Inference about the k margins:
  marginal risk differences
  marginal risk ratios

Page 3: Comparing Margins of Multivariate Binary Data

Motivating Examples

From drug safety or animal toxicity/carcinogenicity studies

Source: http://us.gsk.com/products/assets/us_advair.pdf

Page 4: Comparing Margins of Multivariate Binary Data

Source: http://www.pfizer.com/files/products/uspi_lipitor.pdf

Page 5: Comparing Margins of Multivariate Binary Data

Example: Adverse events (AEs) from a vaccine trial (flu shot)

> head(Y1)  # ACTIVE Treatment n1=1971
  ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
   2        1    1       1          1       1       1      1
   4        0    1       1          0       0       1      0
   5        1    0       0          0       0       0      0
   6        1    1       1          1       1       1      1
   7        0    0       0          0       0       1      0
   9        1    0       1          1       1       1      1
> head(Y2)  # PLACEBO Treatment n2=1554
  ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
   1        0    0       0          0       0       0      0
   3        0    0       0          0       0       0      0
   8        0    0       0          0       1       0      0
  10        0    0       0          0       0       0      0
  11        0    0       0          0       0       0      0
  15        0    0       1          0       0       1      0

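Page 6: Comparing Margins of Multivariate Binary Data

Notation and Setup

k-dimensional response vectors:
  Group 1: $\mathbf{Y}_1 = (Y_{11}, \ldots, Y_{1k})$
  Group 2: $\mathbf{Y}_2 = (Y_{21}, \ldots, Y_{2k})$

Random sample in each group:
  Group 1: $\mathbf{Y}_{11}, \ldots, \mathbf{Y}_{1n_1}$
  Group 2: $\mathbf{Y}_{21}, \ldots, \mathbf{Y}_{2n_2}$

Joint distribution in each group depends on $2^k - 1$ parameters:
  Group 1: $\Pr(Y_{11} = a_1, \ldots, Y_{1k} = a_k)$
  Group 2: $\Pr(Y_{21} = a_1, \ldots, Y_{2k} = a_k)$
  with $a_j \in \{0, 1\}$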
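The parameter count explains why attention shifts from the joint distribution to the margins. A minimal sketch in R, assuming k = 7 binary adverse events as in the vaccine example; the variable names are illustrative and not from the slides:

```r
k <- 7                                       # number of binary adverse events (vaccine example)
patterns   <- expand.grid(rep(list(0:1), k)) # every possible response vector (a_1, ..., a_k)
n_patterns <- nrow(patterns)                 # 2^k cells in the joint distribution
n_joint    <- n_patterns - 1                 # free parameters, since cell probabilities sum to 1
n_margins  <- k                              # one marginal probability per adverse event
c(patterns = n_patterns, joint = n_joint, margins = n_margins)
```

Per group, the full joint distribution has 127 free parameters here, while the margins involve only 7.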

Page 7: Comparing Margins of Multivariate Binary Data

Comparing Margins

Usually only interested in the k margins:
  Group 1: $\Pr(Y_{1j} = 1)$
  Group 2: $\Pr(Y_{2j} = 1)$
  for all $j = 1, \ldots, k$

With just two (k = 2) adverse events:

[Two 2×2 tables, one for Group 1 and one for Group 2, cross-classifying Headache (No/Yes) by Pain (No/Yes); cell entries not recovered]

Page 8: Comparing Margins of Multivariate Binary Data

Comparing Margins

Differences in marginal incidence rates between Group 1 (Treatment) and Group 2 (Control)

                      Group1  Group2    Diff
HEADACHE              0.2603  0.2407  0.0196
INJECTION SITE PAIN   0.6088  0.1384  0.4705
MYALGIA               0.2588  0.1088  0.1500
ARTHRALGIA            0.0893  0.0579  0.0314
MALAISE               0.2085  0.1332  0.0753
FATIGUE               0.2476  0.2098  0.0378
CHILLS                0.0928  0.0463  0.0465
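Marginal incidence rates are simply column means of the 0/1 response matrices. A minimal sketch in R, using only the six rows per group printed by head() in the vaccine example; because this is a small subset and not the full trial data, the numbers will not match the table:

```r
aes <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS")

# First six subjects of each group as printed by head(); NOT the full trial data
Y1 <- matrix(c(1, 1, 1, 1, 1, 1, 1,
               0, 1, 1, 0, 0, 1, 0,
               1, 0, 0, 0, 0, 0, 0,
               1, 1, 1, 1, 1, 1, 1,
               0, 0, 0, 0, 0, 1, 0,
               1, 0, 1, 1, 1, 1, 1),
             ncol = 7, byrow = TRUE, dimnames = list(NULL, aes))
Y2 <- matrix(c(0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0,
               0, 0, 1, 0, 0, 1, 0),
             ncol = 7, byrow = TRUE, dimnames = list(NULL, aes))

p1 <- colMeans(Y1)   # marginal incidence rates, Group 1
p2 <- colMeans(Y2)   # marginal incidence rates, Group 2
margins <- round(cbind(Group1 = p1, Group2 = p2, Diff = p1 - p2), 4)
margins
```

With the full data matrices in place of the six-row subsets, the same three lines would produce a table of the kind shown above.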

Page 9: Comparing Margins of Multivariate Binary Data

Family of Tests

j-th null hypothesis: [formula not recovered]

Unrestricted and restricted MLEs: [formulas not recovered]

Page 10: Comparing Margins of Multivariate Binary Data

Comparing Margins

Estimates of marginal incidence rates and test statistics comparing Group 1 (Treatment) and Group 2 (Control)

            p-hat1 p-hat2 p-check p-tilde   Wald  Local Global
HEADACHE     0.260  0.241   0.252   0.260   1.34   1.33   1.32
PAIN         0.609  0.138   0.401   0.405  33.47  28.29  28.26
MYALGIA      0.259  0.109   0.193   0.210  11.87  11.21  10.85
ARTHRALGIA   0.089  0.058   0.076   0.082   3.59   3.50   3.37
MALAISE      0.209  0.133   0.175   0.196   5.99   5.84   5.60
FATIGUE      0.248  0.210   0.231   0.244   2.66   2.64   2.59
CHILLS       0.093  0.046   0.072   0.085   5.51   5.29   4.93
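The Wald and Local columns appear consistent with the usual unpooled and pooled two-sample z statistics. That reading is inferred from the printed numbers and is not stated on the slides, so the sketch below is an assumption; it uses the four-decimal incidence rates and the group sizes n1 = 1971 and n2 = 1554 reported earlier. The Global column (based on p-tilde) is not reproduced because its definition is not recoverable from the transcript.

```r
aes <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS")
n1 <- 1971; n2 <- 1554
p1 <- c(0.2603, 0.6088, 0.2588, 0.0893, 0.2085, 0.2476, 0.0928)  # Group 1 rates
p2 <- c(0.2407, 0.1384, 0.1088, 0.0579, 0.1332, 0.2098, 0.0463)  # Group 2 rates
d  <- p1 - p2

# Unpooled standard error (assumed to correspond to the "Wald" column)
wald <- d / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Pooled estimate for each margin separately (assumed to correspond to "p-check")
p_check <- (n1 * p1 + n2 * p2) / (n1 + n2)

# Pooled standard error (assumed to correspond to the "Local" column)
local <- d / sqrt(p_check * (1 - p_check) * (1 / n1 + 1 / n2))

stats <- cbind(p_check = round(p_check, 3), Wald = round(wald, 2), Local = round(local, 2))
rownames(stats) <- aes
stats
```

Because the inputs are rounded to four decimals, small discrepancies in the last printed digit are possible.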

Page 11: Comparing Margins of Multivariate Binary Data

Asymptotic Test

Note: Asymptotically, [symbol not recovered] is multivariate normal, with covariance matrix determined by [formula not recovered]

Page 12: Comparing Margins of Multivariate Binary Data

Asymptotic Test

Correlation Matrix:

> round(cov2cor(Sigma),2)
     d1   d2   d3   d4   d5   d6   d7
d1 1.00 0.04 0.29 0.26 0.38 0.41 0.27
d2      1.00 0.18 0.09 0.08 0.10 0.01
d3           1.00 0.46 0.35 0.36 0.30
d4                1.00 0.33 0.33 0.32
d5                     1.00 0.51 0.44
d6                          1.00 0.37
d7                               1.00
> qmvnorm(0.95, tail="both.tails", corr=cov2cor(Sigma))$quantile
[1] 2.656222

Page 13: Comparing Margins of Multivariate Binary Data

Asymptotic Test

Correlation Matrix:

> round(cov2cor(Sigma),2)
     d1   d2   d3   d4   d5   d6   d7
d1 1.00 0.06 0.33 0.28 0.41 0.41 0.29
d2      1.00 0.28 0.11 0.15 0.12 0.09
d3           1.00 0.46 0.41 0.36 0.35
d4                1.00 0.32 0.34 0.28
d5                     1.00 0.50 0.47
d6                          1.00 0.37
d7                               1.00
> qmvnorm(0.95, tail="both.tails", corr=cov2cor(Sigma))$quantile
[1] 2.653783

Page 14: Comparing Margins of Multivariate Binary Data

Permutation Approach

When testing [hypothesis not recovered], can use a permutation approach.

This assumes the distributions are exchangeable (i.e., identical), a much stronger assumption than the one made under the null.

Need two extra conditions:
i. A sequence of all 0's is as or more likely to occur under Group 2 (Control)
ii. A sequence of all 1's is as or more likely to occur under Group 1 (Treatment)
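The permutation idea can be sketched in R by shuffling group labels across whole response vectors, which keeps the association among the binary variables intact. Everything below is an illustrative assumption: the data are simulated toy data (not the trial data), the helper `z_stats` is hypothetical, and the pooled z statistic with a max-type reference distribution is one plausible choice; the slides do not spell out the exact statistic.

```r
set.seed(1)

# Hypothetical toy data (NOT the trial data): k = 3 binary AEs per subject
n1 <- 60; n2 <- 50; k <- 3
Y1 <- matrix(rbinom(n1 * k, 1, 0.35), ncol = k)
Y2 <- matrix(rbinom(n2 * k, 1, 0.25), ncol = k)

# Pooled-variance z statistic for each of the k margins, given group labels g
z_stats <- function(Y, g) {
  m1 <- sum(g == 1); m2 <- sum(g == 2)
  q1 <- colMeans(Y[g == 1, , drop = FALSE])
  q2 <- colMeans(Y[g == 2, , drop = FALSE])
  q  <- colMeans(Y)                      # pooled rate; unchanged by relabeling
  (q1 - q2) / sqrt(q * (1 - q) * (1 / m1 + 1 / m2))
}

Y   <- rbind(Y1, Y2)
g   <- rep(1:2, c(n1, n2))
obs <- z_stats(Y, g)                     # observed statistics

# Permute labels over whole rows and record the maximum absolute statistic
B <- 2000
max_perm <- replicate(B, max(abs(z_stats(Y, sample(g)))))

c_perm <- quantile(max_perm, 0.95, names = FALSE)              # common critical value
adj_p  <- sapply(abs(obs), function(t) mean(max_perm >= t))    # multiplicity-adjusted p-values
c_perm
adj_p
```

Using the maximum over all margins within each permutation is what makes the resulting critical value and p-values simultaneous over the family of tests.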

Page 15: Comparing Margins of Multivariate Binary Data

Permutation vs. Asymptotic

Permutation vs. asymptotic distribution of [statistic not recovered]

Critical values (α = 0.05):
c_perm = 2.655
c_asympt = 2.654
c_Bonf = 2.690

[Figure: permutation distribution and asymptotic distribution]
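The Bonferroni and asymptotic critical values can be approximated in base R. The sketch below assumes the first correlation matrix printed under "Asymptotic Test" (rounded to two decimals) and replaces the qmvnorm() call from the mvtnorm package with a plain Monte Carlo simulation of the maximum absolute coordinate, so the simulated value only approximates the printed quantile.

```r
set.seed(2)

# Upper triangle of the first printed correlation matrix (rounded values)
U <- matrix(c(1, .04, .29, .26, .38, .41, .27,
              0,   1, .18, .09, .08, .10, .01,
              0,   0,   1, .46, .35, .36, .30,
              0,   0,   0,   1, .33, .33, .32,
              0,   0,   0,   0,   1, .51, .44,
              0,   0,   0,   0,   0,   1, .37,
              0,   0,   0,   0,   0,   0,   1),
            nrow = 7, byrow = TRUE)
R <- U + t(U) - diag(7)                  # symmetric correlation matrix

# Bonferroni: two-sided alpha = 0.05 split evenly over 7 comparisons
c_bonf <- qnorm(1 - 0.05 / (2 * 7))

# Monte Carlo: draw correlated normals and take the 95% quantile of max |Z_j|
n_sim   <- 1e5
Z       <- matrix(rnorm(n_sim * 7), ncol = 7) %*% chol(R)
max_abs <- apply(abs(Z), 1, max)
c_mc    <- quantile(max_abs, 0.95, names = FALSE)

round(c(Bonferroni = c_bonf, MonteCarlo = c_mc), 3)
```

Accounting for the positive correlations among the margins gives a slightly smaller critical value than Bonferroni, which ignores them.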

Page 16: Comparing Margins of Multivariate Binary Data

Family of Tests

Results: Raw and Adjusted P-values

                            asymptotic          exact
             Diff  Global   raw.P   adj.P   raw.P   adj.P
HEADACHE    0.020    1.32  0.1876  0.7061  0.1830  0.7013
PAIN        0.471   28.25  0.0000  0.0000  0.0000  0.0000
MYALGIA     0.150   10.85  0.0000  0.0000  0.0000  0.0000
ARTHRALGIA  0.031    3.37  0.0007  0.0051  0.0005  0.0032
MALAISE     0.075    5.60  0.0000  0.0000  0.0000  0.0000
FATIGUE     0.038    2.59  0.0094  0.0589  0.0082  0.0516
CHILLS      0.047    4.93  0.0000  0.0000  0.0000  0.0000

Page 17: Comparing Margins of Multivariate Binary Data

Simultaneous Confidence Intervals

Invert family of tests:

Confidence region: [formula not recovered]

Simplifies to simultaneous confidence intervals if [condition not recovered]

Page 18: Comparing Margins of Multivariate Binary Data

Simultaneous Confidence Intervals

Results: Inverting Score test

              diff       LB      UB
HEADACHE    0.0196  -0.0196  0.0583
PAIN        0.4705   0.4323  0.5069
MYALGIA     0.1500   0.1162  0.1835
ARTHRALGIA  0.0314   0.0078  0.0547
MALAISE     0.0753   0.0416  0.1086
FATIGUE     0.0378  -0.0002  0.0752
CHILLS      0.0465   0.0239  0.0692

Page 19: Comparing Margins of Multivariate Binary Data

Simultaneous Confidence Intervals

• We used (and recommend) the score statistic
• Could use the Wald statistic instead
• This is equivalent to fitting a marginal model via GEE: [model formula not recovered]
• [Estimator not recovered] is asymptotically multivariate normal, with (sandwich) covariance matrix (same as before)
• Use distribution of [statistic not recovered] for multiplicity adjustment

Page 20: Comparing Margins of Multivariate Binary Data

Simultaneous Confidence Intervals

Results: GEE approach (= inverting Wald test)

              diff       LB      UB
HEADACHE    0.0196  -0.0194  0.0586
PAIN        0.4705   0.4331  0.5078
MYALGIA     0.1500   0.1164  0.1836
ARTHRALGIA  0.0314   0.0082  0.0546
MALAISE     0.0753   0.0419  0.1087
FATIGUE     0.0378   0.0001  0.0755
CHILLS      0.0465   0.0241  0.0689
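The Wald-type intervals above can be approximately reproduced from the reported marginal rates. The sketch below rests on two assumptions not confirmed by the slides: that the standard error is the usual unpooled one, and that the critical value is the 2.656222 quantile printed under "Asymptotic Test". Inputs are the four-decimal rates and the group sizes n1 = 1971 and n2 = 1554.

```r
aes <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA", "MALAISE", "FATIGUE", "CHILLS")
n1 <- 1971; n2 <- 1554
p1 <- c(0.2603, 0.6088, 0.2588, 0.0893, 0.2085, 0.2476, 0.0928)  # Group 1 rates
p2 <- c(0.2407, 0.1384, 0.1088, 0.0579, 0.1332, 0.2098, 0.0463)  # Group 2 rates

crit <- 2.656222   # assumed: the qmvnorm() quantile printed on the asymptotic-test slide

d  <- p1 - p2                                          # marginal risk differences
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # unpooled (Wald) standard errors

# Same multiplicity-adjusted critical value for every margin
ci <- cbind(diff = d, LB = d - crit * se, UB = d + crit * se)
rownames(ci) <- aes
round(ci, 4)
```

With these inputs the bounds agree with the table to about three decimals; the remaining differences are attributable to rounding of the input rates.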

Page 21: Comparing Margins of Multivariate Binary Data