Controlling False Positive Rate Due to Multiple Analyses

Unstratified vs. Stratified Logrank Test

Peiling Yang, Gang Chen, George Y.H. Chi

DBI/OB/OPaSS/CDER/FDA

The views expressed in this talk are those of the authors and may not necessarily represent those of the Food and Drug Administration.

Motivation: Example of Drug X

Primary endpoint: Survival

Hypothesis: overall constant H.R. ≤ 1 vs. > 1

Primary analysis: unstratified logrank

Results:

Logrank test    Observed statistic    P-value (1-sided)
Unstratified    1.762                 0.039
Stratified      2.228                 0.013

Q: Is this finding statistically significant?

Issues to Explore

• Implication of these tests/analyses.

• Eligibility of efficacy claim based on these tests/analyses.

• Practicability of multiple testing/analyses.

Outline

• Notations / Settings

• Introduction to logrank test

– Unstratified, stratified

• Comparisons

– Hypotheses, test statistic, test procedure, inference

• Practicability of hypotheses testing

• Multiple testing/analyses

• Example of Drug X

• Summary

Settings / Notations

• 2 arms (control: j = 1; experimental: j = 2).

• K strata: k = 1, …, K.

• Patients randomized within strata.

• $t_1 < t_2 < \dots < t_D$: distinct death times.

• $d_{ijk}$: number of deaths, and $Y_{ijk}$: number of patients at risk, at death time $t_i$ in the $j$th arm and $k$th stratum.

Settings / Notations

Number of deaths and number of patients at risk at time $t_i$ (a dot marks an index that has been summed over):

In stratum $k$: $d_{i.k} = \sum_{j=1}^{2} d_{ijk}$, $\quad Y_{i.k} = \sum_{j=1}^{2} Y_{ijk}$

In arm $j$: $d_{ij.} = \sum_{k=1}^{K} d_{ijk}$, $\quad Y_{ij.} = \sum_{k=1}^{K} Y_{ijk}$

Total: $d_{i..} = \sum_{j=1}^{2} d_{ij.}$, $\quad Y_{i..} = \sum_{j=1}^{2} Y_{ij.}$

Settings / Notations

• Hazard ratio (ctrl./exper.): constant

– Across strata: $c$

– Within stratum: $c_k$

• Non-informative censoring

Introduction: Unstratified Logrank

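$$H_0^u: c \le 1 \quad \text{vs.} \quad H_1^u: c > 1$$

Test statistic:

$$W_u = \frac{d_{.1.} - E_u[d_{.1.}]}{\sqrt{\mathrm{Var}_u[d_{.1.}]}},$$

where

$$E_u[d_{.1.}] = \sum_i Y_{i1.}\,\frac{d_{i..}}{Y_{i..}}, \qquad \mathrm{Var}_u[d_{.1.}] = \sum_i \frac{Y_{i1.}}{Y_{i..}}\,\frac{Y_{i2.}}{Y_{i..}}\,\frac{Y_{i..} - d_{i..}}{Y_{i..} - 1}\,d_{i..}.$$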
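As a minimal sketch of how the unstratified statistic could be computed, assume the counts have already been pooled over strata and tabulated per distinct death time. The function name, argument layout, and toy counts below are hypothetical illustrations, not part of the talk.

```python
from math import sqrt


def unstratified_logrank(d1, d2, y1, y2):
    """Return W_u from per-death-time counts pooled over strata.

    d1[i], d2[i]: deaths at time t_i in the control (j=1) and experimental (j=2) arm
    y1[i], y2[i]: patients at risk at time t_i in each arm
    """
    observed = expected = variance = 0.0
    for d1i, d2i, y1i, y2i in zip(d1, d2, y1, y2):
        d = d1i + d2i          # d_{i..}
        y = y1i + y2i          # Y_{i..}
        observed += d1i        # contributes to d_{.1.}
        expected += y1i * d / y
        if y > 1:              # the variance term is undefined when only one patient is at risk
            variance += (y1i / y) * (y2i / y) * ((y - d) / (y - 1)) * d
    return (observed - expected) / sqrt(variance)


# Hypothetical toy data: three death times, two arms.
w_u = unstratified_logrank(d1=[1, 1, 0], d2=[0, 0, 1], y1=[5, 4, 3], y2=[5, 5, 5])
print(round(w_u, 4))
```

Because the hazard ratio is defined as control over experimental, an excess of control-arm deaths over expectation gives a positive statistic, which matches a one-sided rejection region in the upper tail.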

Introduction: Unstratified Logrank

• $W_u \sim N(0,1)$ under the least favorable parameter configuration ($c = 1$) in $H_0^u$.

• Reject $H_0^u$ if $W_u > z_\alpha$.

• Type I error rate is controlled at level $\alpha$.

Introduction: Stratified Logrank

$$H_0^s: c_k \le 1 \ \text{for all } k \quad \text{vs.} \quad H_1^s: c_k > 1 \ \text{for at least one } k.$$

Test statistic:

$$W_s = \frac{d_{.1.} - E_s[d_{.1.}]}{\sqrt{\mathrm{Var}_s[d_{.1.}]}},$$

where

$$E_s[d_{.1.}] = \sum_k \sum_i Y_{i1k}\,\frac{d_{i.k}}{Y_{i.k}}, \qquad \mathrm{Var}_s[d_{.1.}] = \sum_k \sum_i \frac{Y_{i1k}}{Y_{i.k}}\,\frac{Y_{i2k}}{Y_{i.k}}\,\frac{Y_{i.k} - d_{i.k}}{Y_{i.k} - 1}\,d_{i.k}.$$
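The stratified statistic uses the same building blocks, but the expectation and variance are accumulated within each stratum before being summed. A rough sketch follows, assuming each stratum is supplied as a tuple of per-death-time count lists; this data layout and the function names are assumptions made for illustration.

```python
from math import sqrt


def logrank_terms(d1, d2, y1, y2):
    """Return (observed, expected, variance) of control-arm deaths for one stratum."""
    observed = expected = variance = 0.0
    for d1i, d2i, y1i, y2i in zip(d1, d2, y1, y2):
        d = d1i + d2i          # d_{i.k}
        y = y1i + y2i          # Y_{i.k}
        observed += d1i
        expected += y1i * d / y
        if y > 1:              # skip the undefined term when one patient is at risk
            variance += (y1i / y) * (y2i / y) * ((y - d) / (y - 1)) * d
    return observed, expected, variance


def stratified_logrank(strata):
    """Return W_s, summing observed, expected, and variance terms over strata."""
    obs = exp = var = 0.0
    for d1, d2, y1, y2 in strata:
        o, e, v = logrank_terms(d1, d2, y1, y2)
        obs, exp, var = obs + o, exp + e, var + v
    return (obs - exp) / sqrt(var)


# Hypothetical toy data: two strata with identical counts.
stratum = ([1, 1, 0], [0, 0, 1], [5, 4, 3], [5, 5, 5])
print(round(stratified_logrank([stratum, stratum]), 4))
```

With a single stratum the function reduces to the unstratified statistic, which is a convenient sanity check.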

Introduction: Stratified Logrank

• $W_s \sim N(0,1)$ under the least favorable parameter configuration ($c_k = 1$ for all $k$) in $H_0^s$.

• Reject $H_0^s$ if $W_s > z_\alpha$.

• Type I error rate is controlled at level $\alpha$.

Comparison of Hypotheses

• Different hypotheses formulations:

– Unstratified: $H_0^u: c \le 1$ vs. $H_1^u: c > 1$.

– Stratified: $H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$.

Comparison of Test Statistics

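• $\mathrm{Corr}(W_u, W_s) = 1$ because both are based on the same r.v. $d_{.1.}$.

• $W_s = a W_u + b$, where

$$a = \sqrt{\frac{\mathrm{Var}_u[d_{.1.}]}{\mathrm{Var}_s[d_{.1.}]}} \quad \text{and} \quad b = \frac{E_u[d_{.1.}] - E_s[d_{.1.}]}{\sqrt{\mathrm{Var}_s[d_{.1.}]}}.$$

• $W_u \sim N(0, 1) \;\Rightarrow\; W_s \sim N(b, a^2)$.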
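The linear relation can be checked directly from the two definitions, since both statistics standardize the same observed count $d_{.1.}$. In the short derivation below, $E_u$, $E_s$, $V_u$, $V_s$ are shorthand (introduced here for brevity, not taken from the slides) for the unstratified and stratified expectations and variances of $d_{.1.}$:

$$
\begin{aligned}
W_s &= \frac{d_{.1.} - E_s}{\sqrt{V_s}}
     = \frac{(d_{.1.} - E_u) + (E_u - E_s)}{\sqrt{V_s}} \\
    &= \sqrt{\frac{V_u}{V_s}} \cdot \frac{d_{.1.} - E_u}{\sqrt{V_u}} + \frac{E_u - E_s}{\sqrt{V_s}}
     = a\,W_u + b .
\end{aligned}
$$

Since $a > 0$, $W_s$ is an increasing linear function of $W_u$, which is why the correlation equals 1.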

Comparison of Test Procedure

To test $H_0^u: c \le 1$ vs. $H_1^u: c > 1$:

– Use $W_u$ and reject $H_0^u$ if $W_u > z_\alpha$.

– If $W_s$ is used, the adjusted critical value $a z_\alpha + b$ is required for a valid level-$\alpha$ test.

Comparison of Test Procedure

To test $H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$:

– Use $W_s$ and reject $H_0^s$ if $W_s > z_\alpha$.

– If $W_u$ is used, the adjusted critical value $(z_\alpha - b)/a$ is required for a valid level-$\alpha$ test.
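Both adjusted critical values, the one for this test and the one for the unstratified test, follow from the relation $W_s = a W_u + b$ with $a > 0$. A brief worked check, treating $a$ and $b$ as fixed constants (an assumption of this sketch):

$$
\begin{aligned}
\text{Under } c = 1:\quad & W_u \sim N(0,1), \quad
P(W_s > a z_\alpha + b) = P(a W_u + b > a z_\alpha + b) = P(W_u > z_\alpha) = \alpha, \\
\text{Under } c_k = 1 \ \forall k:\quad & W_s \sim N(0,1), \quad
P\!\left(W_u > \frac{z_\alpha - b}{a}\right) = P(a W_u + b > z_\alpha) = P(W_s > z_\alpha) = \alpha .
\end{aligned}
$$

In each case the adjustment simply maps the rejection region of the "right" statistic onto the scale of the other one.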

Comparison of Inference

• Rejection of $H_0^u$:

– Infer overall positive treatment effect in entire population.

• Rejection of $H_0^s$:

– Can only infer positive treatment effect in "at least one stratum".

– Further testing is required to identify those strata before making a claim, and the error rate for identifying wrong strata also needs to be controlled.

Practicability of Hypotheses Testing

• Unstratified hypotheses are tested when the goal is to infer an overall positive treatment effect in the entire population.

• Stratified hypotheses are tested when the goal is to infer a positive treatment effect in certain strata.

• Testing both unstratified & stratified hypotheses is acceptable when it is unclear whether the treatment is effective in the entire population or only in certain strata (but both nulls need to be prespecified in the protocol).

Multiple Testing/Analyses

• Multiple testing of unstratified (use $W_u$) & stratified (use $W_s$) hypotheses.

• Error to control: strong familywise error (SFE), including the following:

– When $c \le 1$ & all $c_k \le 1$: falsely infer $c > 1$ or some $c_k$'s $> 1$.

– When $c \le 1$ & some $c_k$'s $> 1$: falsely infer $c > 1$ or wrong $c_k$'s $> 1$.

Note: the parameter space of "all $c_k \le 1$ but $c > 1$" is impossible.

Multiple Testing/Analyses

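[Diagram: the parameter space is divided into the regions "$c \le 1$ & all $c_k \le 1$", "$c \le 1$ & at least one $c_k > 1$", and "$c > 1$ & at least one $c_k > 1$", plus an "impossible space". Diagram labels: "FE", "Which $c_k > 1$?", "Nested FE".]

Property of SFE: FE nested in another FE.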

Example -- Drug X

Hypotheses: $H_0^u: c \le 1$ vs. $H_1^u: c > 1$

Logrank test          Observed statistic    P-value (1-sided)
Unstratified $W_u$    1.762                 0.039
Stratified $W_s$      2.228                 0.013 (for $H_0^s$)

• $W_s = a W_u + b$, where $a = 1.039$, $b = 0.409$.

• Critical value using $W_s$ should be adjusted to $a z_\alpha + b$.

• False positive error rate using $W_s$ w/o adjustment = 0.066;

– Inflation = 0.066 - 0.025 = 0.041.

• Ans.: This finding is not statistically significant.
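The Drug X numbers can be roughly reproduced with the standard normal distribution. The sketch below uses only the values printed on the slide and assumes a one-sided level of 0.025, which is implied by the inflation arithmetic; variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()        # standard normal, mean 0 and sd 1

a, b = 1.039, 0.409              # rounded constants reported on the slide
w_u, w_s = 1.762, 2.228          # observed unstratified / stratified statistics
alpha = 0.025                    # assumed one-sided level (implied by 0.066 - 0.025)

z_alpha = std_normal.inv_cdf(1 - alpha)

# Critical value for W_s that keeps the test of H_0^u at level alpha.
adjusted_cv = a * z_alpha + b

# Type I error for H_0^u if W_s is compared with the unadjusted z_alpha:
# P(W_s > z_alpha) = P(W_u > (z_alpha - b) / a) when W_u ~ N(0, 1).
unadjusted_fpr = 1 - std_normal.cdf((z_alpha - b) / a)

print(f"z_alpha        = {z_alpha:.3f}")
print(f"adjusted c.v.  = {adjusted_cv:.3f}")
print(f"unadjusted FPR = {unadjusted_fpr:.3f}")
print("W_u significant:", w_u > z_alpha)
print("W_s significant after adjustment:", w_s > adjusted_cv)
```

With these rounded inputs the unadjusted false positive rate lands close to the reported 0.066; a small difference is expected because $a$ and $b$ are rounded on the slide. Neither statistic exceeds its valid critical value, which agrees with the slide's conclusion.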

Figure 1: False positive rate vs. desired level (w/o adjustment)

Summary

• Hypotheses (unstratified or stratified or both)

– should reflect what is desired to claim.

– need to be prespecified in protocol.

• If the stratified null is rejected, further testing is required to identify in which strata the treatment effect is positive.

• Strong familywise error rate needs to be controlled regardless of single or multiple testing.