
Controlling the Experimentwise Type I Error Rate When Survival Analyses Are Planned for Subsets of the Sample.

Greg Yothers, MA
National Surgical Adjuvant Breast and Bowel Project (NSABP)
University of Pittsburgh, Department of Statistics

Joint Work With John Bryant, PhD
Director of the NSABP
University of Pittsburgh, Departments of Statistics and Biostatistics


• This work concerns the design and analysis of clinical trials to compare treatment to control where we wish to test the primary hypothesis on several subgroups in addition to the global test.

• Unless steps are taken to control for multiple comparisons, the type I error rate will be inflated in this situation.

• Controlling for multiple comparisons generally leads to a loss of power so that subgroup analyses are often avoided. However, subgroup analyses often serve a legitimate scientific purpose, and should not be entirely avoided.


• To address this problem, we propose a method whereby a pre-specified experimentwise alpha is “spent” or allocated among the global (stratified) test and the constituent subset (stratum-level) tests.

• We find the method to be efficient in terms of experimentwise power when the treatment effect in each stratum is in the same direction and the treatment effects do not differ too greatly between strata.

• The procedure can be used to make the design of a clinical trial robust against the presence of a treatment-by-strata interaction when a significant interaction is not anticipated.

Outline

• Motivating example: NSABP Protocol B-29.

• Define experimentwise Type I error rate.

• Common methods of dealing with subgroup testing: how do they control the Type I error rate?

• Multiple testing approach: perform all tests at reduced nominal levels of significance so that the experimentwise Type I error rate is controlled.

• Exploration of how to spend alpha on the individual tests to achieve ‘good’ operating characteristics for the overall experiment.

NSABP B-29 Schema

T1 or T2 or T3; pN0; M0; ER-positive

Decision to use chemotherapy*

• No chemotherapy (stratification: age, pathologic tumor size)
    Group 1: Tamoxifen
    Group 2: Tamoxifen + Octreotide

• Chemotherapy (stratification: age, pathologic tumor size)
    Group 3: AC + Tamoxifen
    Group 4: AC + Tamoxifen + Octreotide

* The decision to use AC chemotherapy must be made prior to randomization.

Design Considerations

• H0: Relative Risk = 1.

• Power of 0.80 to detect a relative risk of 0.75, using a two-sided 0.05-level stratified log-rank test.

• Power requirements and assumptions about rates of accrual dictate the following:

i) Accrual of 3,000 patients over 5 years with 3 years additional follow-up.

ii) Final analysis following the 400th event.
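As a rough check on these design numbers, the sketch below uses one common approximation for the log-rank test (Schoenfeld's formula with 1:1 allocation, so that the variance of the log-rank statistic is about one quarter of the number of events). This is an assumption on our part, not necessarily the calculation used for B-29, and the function names are illustrative only.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def events_required(rr, alpha=0.05, power=0.80):
    """Approximate number of events for a two-sided level-alpha log-rank test.

    ASSUMPTION: Schoenfeld's approximation with equal allocation to both arms.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / log(rr) ** 2


def logrank_power(rr, events, alpha=0.05):
    """Approximate two-sided power, taking V = events / 4 (an assumption)."""
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    mu = log(rr) * sqrt(events / 4)  # mean of the standardized statistic
    # P(Z < -c) + P(Z > c) for Z ~ N(mu, 1)
    return STD_NORMAL.cdf(-c - mu) + STD_NORMAL.cdf(-c + mu)


needed = events_required(0.75)         # somewhat fewer than 400 events
power_at_400 = logrank_power(0.75, 400)  # close to the 0.82 quoted later
```

Under these assumptions, analysing at the 400th event leaves a small margin over the nominal 0.80 power requirement.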


• Physicians involved in the design of the trial thought the effect of Octreotide would be unlikely to materially interact with chemotherapy status.

• In planning the trial it was felt to be important to provide for individual tests for the effect of Octreotide in the presence of chemotherapy as well as in its absence.

• It was considered unacceptable to treat these subgroup analyses as post-hoc, or exploratory, so it was necessary to design an analysis plan that controlled the experimentwise error rate.

Definition

Experimentwise Type I Error Rate

The probability of finding a significant difference between treatment and control on either the overall stratified test or any of the stratum-specific tests given that no difference exists.

Common approaches to controlling experimentwise Type I error rate

• Unprotected Subgroup Tests – Perform the overall stratified test at level $\alpha$; follow up with stratum-specific level-$\alpha$ tests.

• Protected Subgroup Tests – Perform the overall stratified test at level $\alpha$; follow up with stratum-specific level-$\alpha$ tests only if treatment-by-strata interaction is significant.

• Protected Subgroup Tests – Test for treatment-by-strata interaction at level $\alpha$. If interaction is significant, test for treatment effect individually in each stratum at level $\alpha$. If interaction is not significant, test for overall treatment effect at level $\alpha$.

Equivalence of Protection Schemes

• The two alternatives for protecting the stratum-specific tests are actually quite similar in operating characteristics, since if both the interaction test and the overall stratified test are significant, it is almost certain that at least one stratum-level test will also be significant.

• It can be shown that this is true with probability one in the case of k = 2 strata.

[Figure: Experimentwise Level of Significance. Y-axis: experimentwise level of significance (about 0.0775 to 0.1175). X-axis: a = "The proportion of events in stratum 1" (0 to 0.5). Curves: Unprotected, Protected.]

Range of experimentwise type I error rate for protected and unprotected schemes. All tests performed at $\alpha$ = .05.

Number of Strata    Unprotected Tests    Protected Tests
 2                  .098-.115            .080-.098
 3                  .143-.161            .090-.098
 4                  .185-.204            .094-.098
 5                  .226-.245            .095-.098
 7                  .302-.319            .096-.098
10                  .401-.416            .097-.098
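The unprotected column can be checked approximately by simulation. The sketch below is a minimal Monte Carlo under the null hypothesis, assuming the stratum statistics are independent standard normal and that the overall statistic is their weighted sum with weights equal to the square roots of the proportions of variance (events) in each stratum. The function name, seed, and number of simulations are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def unprotected_error(weights, alpha=0.05, n_sim=100_000, seed=1):
    """Monte Carlo estimate of the experimentwise type I error rate when the
    overall test and every stratum test are all performed at level alpha.

    weights: proportions a_1..a_k of the total variance in each stratum
             (assumed positive and summing to 1).
    """
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1 - alpha / 2)
    roots = [sqrt(a) for a in weights]
    rejections = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in weights]      # stratum statistics
        z0 = sum(r * zi for r, zi in zip(roots, z))     # overall statistic
        if abs(z0) > c or any(abs(zi) > c for zi in z):
            rejections += 1
    return rejections / n_sim


two_equal = unprotected_error([0.5, 0.5])        # compare with the k = 2 row
two_lopsided = unprotected_error([0.02, 0.98])   # nearly all events in one stratum
```

With equal strata the estimate should fall near the upper end of the tabulated range for two strata, and with very unequal strata near the lower end, up to simulation error.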


Multiple testing approach

• We now consider a multiple testing approach where one performs an overall test for treatment effect based on the stratified log-rank statistic followed by tests within each stratum.

• All tests are carried out at reduced levels of significance so that the experimentwise level of significance is maintained at a specified rate.

Notation (k = 2 strata)

• $L_1$ and $L_2$ - the log-rank statistics from the individual stratum tests.

• $V_1$ and $V_2$ - the variances of the log-rank statistics $L_1$ and $L_2$.

• $L_0 = L_1 + L_2$ - the stratified log-rank statistic.

• Then, since $L_1$ and $L_2$ are independent, $V_0 = V_1 + V_2$.

• $\alpha_0$ - the nominal level of significance of the test based on $L_0$.

• $c_0$ - the corresponding critical value from the standard normal distribution.

• $\alpha_1$ and $\alpha_2$ - the nominal levels of significance of the tests based on $L_1$ and $L_2$.

• $c_1$ and $c_2$ - the corresponding critical values from the standard normal distribution.

• Now, $Z_i = L_i / \sqrt{V_i}$, $i = 0, 1, 2$, represents the standardized log-rank statistics.

$$L_0 = L_1 + L_2 \;\Rightarrow\; Z_0\sqrt{V_0} = Z_1\sqrt{V_1} + Z_2\sqrt{V_2}$$

$$\Rightarrow\; Z_0 = Z_1\sqrt{a} + Z_2\sqrt{1-a}, \quad \text{where } a = \frac{V_1}{V_1 + V_2},\; 0 \le a \le 1.$$
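A short worked consequence of this decomposition, added here as our own reading rather than something stated on the slide: writing $Z_0 = Z_1\sqrt{a} + Z_2\sqrt{1-a}$ with $a = V_1/(V_1+V_2)$, and treating $Z_1$ and $Z_2$ as independent standard normal under the null hypothesis, the overall statistic is itself standard normal and is positively correlated with each stratum statistic.

```latex
\begin{aligned}
\operatorname{Var}(Z_0) &= a\,\operatorname{Var}(Z_1) + (1-a)\,\operatorname{Var}(Z_2) = a + (1-a) = 1,\\
\operatorname{Cov}(Z_0, Z_1) &= \sqrt{a}\,\operatorname{Var}(Z_1) = \sqrt{a},\\
\operatorname{Cov}(Z_0, Z_2) &= \sqrt{1-a}\,\operatorname{Var}(Z_2) = \sqrt{1-a}.
\end{aligned}
```

Because the three tests are correlated in this way, the experimentwise error rate is smaller than the sum of the nominal levels, which is what leaves room to spend alpha on the stratum tests at modest cost.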

Let $RR_i$ represent the relative risk in the $i$th stratum.

$H_0: RR_i = 1,\ i = 1, 2$ denotes the null hypothesis, and $RR_1 = RR_1^{\mathrm{Alt}}$ and $RR_2 = RR_2^{\mathrm{Alt}}$ a specific alternative hypothesis. Then, when the alternative hypothesis is true, the test statistic

$$Z_i = \frac{L_i}{\sqrt{V_i}}$$

is asymptotically distributed as $N\!\left(\ln(RR_i^{\mathrm{Alt}})\sqrt{V_i},\ 1\right)$.

Let $\mu_i = \ln(RR_i^{\mathrm{Alt}})\sqrt{V_i}$; then

$$W_i = Z_i - \mu_i = \frac{L_i}{\sqrt{V_i}} - \ln(RR_i^{\mathrm{Alt}})\sqrt{V_i}$$

is distributed as standard normal under the alternative hypothesis.

Definition

Experimentwise Power

The probability of detecting at least one significant difference during the multiple testing procedure given the true RR in each stratum. When the true RR in each stratum is 1, we refer to the power as the Type I error rate.

The experimentwise power against a specific alternative hypothesis can be written as:

$$\mathrm{Power}(\alpha_0, \alpha_1, \alpha_2, a, \mu_1, \mu_2) = 1 - \iint_{A} \phi(z_1 - \mu_1)\,\phi(z_2 - \mu_2)\,dz_1\,dz_2$$

where $\phi$ denotes the standard normal density, and the integral is taken over the acceptance region defined by:

$$A = \left\{ (z_1, z_2) : |z_1| \le c_1,\ |z_2| \le c_2,\ |z_0| \le c_0 \right\}, \quad z_0 = z_1\sqrt{a} + z_2\sqrt{1-a}.$$

Using the simplified region of integration, we can rewrite the power as follows:

$$\mathrm{Power}(\alpha_0, \alpha_1, \alpha_2, a, \mu_1, \mu_2) = 1 - \int_{-c_1}^{c_1} \left[ \Phi\!\left( \max\!\left\{ -c_2,\ \min\!\left\{ c_2,\ \frac{c_0 - z_1\sqrt{a}}{\sqrt{1-a}} \right\} \right\} - \mu_2 \right) - \Phi\!\left( \min\!\left\{ c_2,\ \max\!\left\{ -c_2,\ \frac{-c_0 - z_1\sqrt{a}}{\sqrt{1-a}} \right\} \right\} - \mu_2 \right) \right] \phi(z_1 - \mu_1)\,dz_1$$

where $\Phi$ is the CDF of the standard normal distribution.
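The one-dimensional integral above is straightforward to evaluate numerically. The following is a minimal sketch, not the authors' implementation: it integrates over the first stratum statistic with Simpson's rule and handles the second stratum through the normal CDF. The function names are illustrative, and the example scenario assumes that the variance in each stratum is roughly one quarter of the number of events in that stratum.

```python
from math import inf, log, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def crit(alpha):
    """Two-sided critical value; alpha <= 0 means the test is never performed."""
    return inf if alpha <= 0 else N01.inv_cdf(1 - alpha / 2)


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def power_two_strata(alpha0, alpha1, alpha2, a, mu1=0.0, mu2=0.0, n=2000):
    """Experimentwise power for k = 2 strata, assuming 0 < a < 1.

    z2 is integrated out analytically; z1 is handled by Simpson's rule
    with n (even) subintervals.
    """
    c0, c1, c2 = crit(alpha0), crit(alpha1), crit(alpha2)
    ra, rb = sqrt(a), sqrt(1 - a)

    def accept_density(z1):
        upper = clamp((c0 - ra * z1) / rb, -c2, c2)
        lower = clamp((-c0 - ra * z1) / rb, -c2, c2)
        return N01.pdf(z1 - mu1) * (N01.cdf(upper - mu2) - N01.cdf(lower - mu2))

    # Cap the range at mu1 +/- 8 so that alpha1 = 0 (c1 infinite) still works.
    lo, hi = max(-c1, mu1 - 8), min(c1, mu1 + 8)
    h = (hi - lo) / n
    total = accept_density(lo) + accept_density(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * accept_density(lo + j * h)
    return 1 - total * h / 3


# Example: 200 events per stratum and a 25% reduction in each stratum.
# ASSUMPTION: V_i is approximated by (events in stratum i) / 4.
mu = log(0.75) * sqrt(200 / 4)
size = power_two_strata(0.04, 0.0099, 0.0099, 0.5)              # null case
overall = power_two_strata(0.04, 0.0, 0.0, 0.5, mu, mu)         # overall test only
experimentwise = power_two_strata(0.04, 0.0099, 0.0099, 0.5, mu, mu)
```

Setting both means to zero gives the experimentwise type I error rate, so the same function can be used to check that a proposed set of nominal levels holds the intended overall level.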

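These results generalize to k strata as follows:

$$Z_0 = \sum_{i=1}^{k} Z_i\sqrt{a_i}, \quad \text{where } a_i = \frac{V_i}{\sum_{j=1}^{k} V_j},\; 0 \le a_i \le 1,\; \sum_{j=1}^{k} a_j = 1.$$

$$\mathrm{Power}(\alpha_0, \ldots, \alpha_k, a_1, \ldots, a_k, \mu_1, \ldots, \mu_k) = 1 - \int_{-c_k}^{c_k} \cdots \int_{-c_2}^{c_2} \left[ \Phi\!\left( \max\!\left\{ -c_1,\ \min\!\left\{ c_1,\ \frac{c_0 - \sum_{i=2}^{k} z_i\sqrt{a_i}}{\sqrt{a_1}} \right\} \right\} - \mu_1 \right) - \Phi\!\left( \min\!\left\{ c_1,\ \max\!\left\{ -c_1,\ \frac{-c_0 - \sum_{i=2}^{k} z_i\sqrt{a_i}}{\sqrt{a_1}} \right\} \right\} - \mu_1 \right) \right] \phi(z_2 - \mu_2) \cdots \phi(z_k - \mu_k)\,dz_2 \cdots dz_k$$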


• The multiple integral in the previous equation can be difficult to evaluate when the number of strata goes beyond about 3 or 4.

• Fortunately there is a recursive representation of the power function that facilitates computation when there are many strata.

Given $\alpha_0, \ldots, \alpha_k$, $a_1, \ldots, a_k$, $\mu_1, \ldots, \mu_k$, define

$$\psi_r(z) = \Pr\!\left\{ |Z_j| \le c_j,\ j = 1, \ldots, r;\ \sum_{i=1}^{r} Z_i\sqrt{a_i} \le z \right\},$$

then

$$\psi_1(z) = \Phi\!\left( \max\!\left\{ -c_1,\ \min\!\left\{ c_1,\ \frac{z}{\sqrt{a_1}} \right\} \right\} - \mu_1 \right) - \Phi(-c_1 - \mu_1),$$

and

$$\psi_r(z) = \int_{-c_r}^{c_r} \phi(u - \mu_r)\,\psi_{r-1}\!\left( z - u\sqrt{a_r} \right) du.$$

For k strata,

$$\mathrm{Power}(\alpha_0, \ldots, \alpha_k, a_1, \ldots, a_k, \mu_1, \ldots, \mu_k) = 1 - \psi_k(c_0) + \psi_k(-c_0).$$

An S-Plus function implementing the recursive method of calculating power is available.
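The S-Plus function itself is not reproduced in these slides. As an illustration only, a recursion of this kind can be sketched in Python as below. Here psi(r, z, ...) is our name for the probability that the first r stratum statistics all fall inside their acceptance limits and their weighted sum does not exceed z. This naive version nests Simpson's rule directly, so its cost grows quickly with the number of strata; a practical implementation would presumably tabulate each level on a grid of z values and interpolate.

```python
from math import inf, sqrt
from statistics import NormalDist

N01 = NormalDist()


def crit(alpha):
    """Two-sided critical value; alpha <= 0 means the test is never performed."""
    return inf if alpha <= 0 else N01.inv_cdf(1 - alpha / 2)


def simpson(f, lo, hi, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3


def psi(r, z, c, a, mu, n):
    """Pr{|Z_j| <= c_j for j = 1..r, and sum_{i<=r} sqrt(a_i) Z_i <= z}.

    c, a, mu are indexed from 1 (c[0] holds c_0; a[0] and mu[0] are unused).
    """
    if r == 1:
        upper = max(-c[1], min(c[1], z / sqrt(a[1])))
        return N01.cdf(upper - mu[1]) - N01.cdf(-c[1] - mu[1])
    # Cap the range at mu_r +/- 8 so an infinite critical value still works.
    lo, hi = max(-c[r], mu[r] - 8), min(c[r], mu[r] + 8)
    return simpson(
        lambda u: N01.pdf(u - mu[r]) * psi(r - 1, z - u * sqrt(a[r]), c, a, mu, n),
        lo, hi, n)


def power_recursive(alphas, a, mu, n=200):
    """alphas = [alpha_0, ..., alpha_k]; a and mu are lists of length k."""
    k = len(a)
    c = [crit(al) for al in alphas]
    a1, mu1 = [None] + list(a), [None] + list(mu)
    return 1 - psi(k, c[0], c, a1, mu1, n) + psi(k, -c[0], c, a1, mu1, n)


# Null-case checks: k = 2 with a split of alpha, and k = 3 with every test at 0.05.
size_k2 = power_recursive([0.04, 0.0099, 0.0099], [0.5, 0.5], [0.0, 0.0])
size_k3 = power_recursive([0.05] * 4, [1 / 3] * 3, [0.0] * 3)
```

With all means set to zero the result is the experimentwise type I error rate, which gives a convenient way to compare the sketch against the tabulated ranges for the unprotected scheme.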

How should we spend alpha?

• The question arises as to how the type I error rates should be divided between the overall and the stratum-specific tests, or rather, how much alpha should be spent on the stratum-specific tests.

• For k = 2 strata and $\alpha_{\mathrm{exper}}$ = 0.05, the table and figure which follow show a variety of combinations of the nominal size of the overall test ($\alpha_0$) and the nominal size of the within-stratum tests ($\alpha_1$ & $\alpha_2$).

• For simplicity, we only consider the case where $\alpha_1 = \alpha_2$. The possibilities form a continuum from (.05, 0) (no stratum-specific tests) to (0, .0253) (no overall test).

• Given $\alpha_{\mathrm{exper}}$, $\alpha_0$, and the constraint $\alpha_1 = \alpha_2$, the common value of $\alpha_1$ & $\alpha_2$ is a function of a (the proportion of events in the first stratum); however, the effect of varying a is weak.
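One way to find the common stratum-level size for a given size of the overall test is a simple root search on the experimentwise error rate. The sketch below is an illustration under our own choices (bisection, Simpson's rule, the function names), not the procedure used to build the table that follows. It covers k = 2 strata with equal stratum-level sizes and assumes 0 < a < 1.

```python
from math import inf, sqrt
from statistics import NormalDist

N01 = NormalDist()


def experimentwise_alpha(alpha0, alpha1, a, n=1000):
    """Null experimentwise error rate for k = 2 strata with alpha1 = alpha2 > 0."""
    c0 = inf if alpha0 <= 0 else N01.inv_cdf(1 - alpha0 / 2)
    c1 = N01.inv_cdf(1 - alpha1 / 2)
    ra, rb = sqrt(a), sqrt(1 - a)

    def accept_density(z1):
        # Limits for z2 implied by the overall test, clamped to [-c1, c1].
        hi = max(-c1, min(c1, (c0 - ra * z1) / rb))
        lo = max(-c1, min(c1, (-c0 - ra * z1) / rb))
        return N01.pdf(z1) * (N01.cdf(hi) - N01.cdf(lo))

    h = 2 * c1 / n
    total = accept_density(-c1) + accept_density(c1)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * accept_density(-c1 + j * h)
    return 1 - total * h / 3


def stratum_alpha(alpha0, a, alpha_exper=0.05, tol=1e-7):
    """Common alpha1 = alpha2 that brings the experimentwise error to alpha_exper."""
    lo, hi = 1e-9, alpha_exper  # the error rate increases with alpha1 on this bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if experimentwise_alpha(alpha0, mid, a) < alpha_exper:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


spend_one_percent = stratum_alpha(0.04, 0.5)   # overall test at 0.04, equal strata
no_overall_test = stratum_alpha(0.0, 0.5)      # all alpha on the stratum tests
```

When no alpha is spent on the overall test the two stratum tests are independent, so the answer reduces to 1 - sqrt(0.95), about .0253, which matches the end point of the continuum described above.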

Possible $\alpha$-spending schemes for k = 2 strata, $\alpha_{\mathrm{exper}}$ = .05, and $\alpha_1 = \alpha_2$.

                 $\alpha_1 = \alpha_2$
$\alpha_0$       a = 0.50    a = 0.25    a = 0.10
.050             .0000       .0000       .0000
.045             .0060       .0060       .0057
.040             .0099       .0104       .0108
.030             .0161       .0168       .0183
.020             .0207       .0214       .0229
.010             .0240       .0244       .0250
.000             .0253       .0253       .0253

[Figure: Possible $\alpha$-spending schemes for k = 2 strata, $\alpha_{\mathrm{exper}}$ = .05, and $\alpha_1 = \alpha_2$. Y-axis: Size of Stratum Level Tests (0 to 0.025). X-axis: Size of Overall Test (0 to 0.05). Curves: a = .5, a = .25, a = .1.]

[Figure: Experimentwise Power ($\alpha_1 = \alpha_2$, a = .5) & Power of Overall Stratified Test. Y-axis: Power (0.6 to 0.85). X-axis: Size of Overall Test (0.01 to 0.05). Curves: Experimentwise Power, Power of Overall Test, Baseline Overall Power.]

Now we see how power is affected when there is no difference between strata and some of our alpha is spent on stratum-specific tests. This figure shows the case where there is an overall 25% reduction in event rate, no treatment-stratum interaction, and there are 200 events in each of the k = 2 strata. The overall stratified log-rank test at the .05 level has power 0.82. Using the multiple testing procedure with $\alpha_0 = 0.04$ and $\alpha_1 = \alpha_2 = 0.0099$ yields a power of 0.79 for the overall test and an experimentwise power of 0.80! Thus, spending 1% of total alpha (setting $\alpha_0 = 0.04$) leads to a very small loss of power even in the case of no interaction. Setting $\alpha_0$ less than about 0.03, on the other hand, leads to a rather substantial loss in power.

[Figure: Experimentwise Power ($\alpha_1 = \alpha_2$) & Power of Overall Test. Y-axis: Power (0.6 to 0.95). X-axis: Size of Overall Test (0.01 to 0.05). Curves: Experimentwise Power, Power of Overall Test, Baseline Overall Power.]

Next, we see how power is affected in the presence of treatment-strata interaction when some of our alpha is spent on stratum-specific tests. The figure shows the case of a 41% reduction in event rate in one stratum and a 9% reduction in the other stratum. We again assume that there are 400 total events and that the number of events on the control arm of each stratum is equal. When the strata are pooled we have a 25% reduction in event rate. The overall stratified log-rank test at the .05 level has power 0.829. Reducing $\alpha_0$ to 0.04 reduces the power of the overall test to 0.804, but increases the experimentwise power to 0.906! Spending any more than 1% of alpha on subgroup tests does not materially increase the experimentwise power even in the presence of this very substantial interaction.

[Figure: Experimentwise Power ($\alpha_1 = \alpha_2$) for various pairs of reduction in event rates. Y-axis: Experimentwise Power (0.7 to 1). X-axis: Size of Overall Test (0.01 to 0.05). Curves: (50, 0), (40, 10), (35, 15), (25, 25), Baseline Overall Power.]

Next, we see how varying the magnitude of interaction affects experimentwise power. The figure shows a variety of pairs of reduction in event rates in the two strata such that when the strata are pooled we have a 25% reduction in event rate. We again assume that there are 400 total events and that the number of events on the control arm of each stratum is equal. The overall stratified log-rank test at the .05 level has power approximately 0.82. Reducing $\alpha_0$ to 0.04 dramatically increases the experimentwise power in the presence of interaction. Spending any more than 1% of alpha on subgroup tests does not materially increase the experimentwise power even in the presence of substantial interaction. Note that for small interaction (the pair (35, 15)), the power is maximized near $\alpha_0$ = 0.04.

[Figure: Experimentwise Power ($\alpha_1 = \alpha_2$) for various pairs of reduction in event rates. Y-axis: Experimentwise Power (0.25 to 0.85). X-axis: Size of Overall Test (0.01 to 0.05). Curves: (50, 0), (40, 10), (35, 15), (25, 25), Baseline Overall Power.]

Until now we have only considered what happens when the numbers of patients in the two strata are roughly equal. We next consider the case where most of the patients are assigned to the stratum with the smaller treatment effect. The figure shows a variety of pairs of reduction in event rates for comparison with the previous figure. We again assume that there are 400 total events, but now the numbers of events on the control arms of the two strata are not equal. The number of events on the control arm of stratum 2 is three times the number on the control arm of stratum 1. We see that the power suffers when most of the patients are in the stratum with a small reduction in event rate.

[Figure: Experimentwise Power ($\alpha_1 = \alpha_2$) for various pairs of reduction in event rates. Y-axis: Experimentwise Power (0 to 0.5). X-axis: Size of Overall Test (0.01 to 0.05). Curves: (25, -25), (20, -20), (12, -12), (0, 0).]

Now, we consider the case of no overall treatment effect but varying degrees of treatment-strata interaction. The figure shows a variety of pairs of reduction in event rates in the two strata such that when the strata are pooled we have no reduction in event rate. We again assume that there are 400 total events and that the number of events on the control arm of each stratum is equal. Reducing $\alpha_0$ increases the experimentwise power in the presence of interaction. When there is no overall effect and the treatment is beneficial in one stratum and detrimental in the other, the multiple testing approach is not very powerful.

[Figure: Average Experimentwise Power ($\alpha_1 = \alpha_2$) for Various Allocations of Events to the Control Arms of the Two Strata. Y-axis: Experimentwise Power (0.74 to 0.86). X-axis: Size of Overall Test (0.01 to 0.05). Curves: (1:1), (1:2), (1:3), (1:5), (1:9), Baseline Overall Power.]

The final figure shows average experimentwise power for various allocations of events to the control arms (rates of accrual) of the two strata. In each case, there is a 25% reduction in event rate when the strata are pooled. We place a prior probability distribution on the difference in percent reduction in event rate. The prior is normal with mean zero and standard deviation such that there is a 5% probability of qualitative interaction (treatment is beneficial in one stratum and detrimental in the other). The figure shows the expected power given the prior distribution.

We see that the multiple testing procedure is most effective when the numbers of events on the control arms of the two strata are not too far out of balance.

Conclusion

• The alpha spending approach described here is very efficient and effective when the treatment effect is in the same direction in each stratum and there may or may not be small to moderate differences in the size of the effect between strata.

• The method is also sensitive to the balance of allocation of patients (events) to the two strata. When the sizes of the stratum level tests are equal, the approach seems to be quite effective when the balance is no worse than about 3 to 1. We suggest spending more alpha on the stratum with the most patients (events) when the number of patients is out of balance.

• Spending between ½ and 1 percent of alpha (setting $\alpha_0$ equal to .045 to .04) would seem to be a prudent choice for k = 2 strata and the range of circumstances explored in this paper when substantial interaction is thought to be unlikely a priori.

• When there is no overall effect but there may be offsetting effects between the strata, the alpha spending approach is not very powerful. Designing the trial for a test for interaction would be much more effective in this situation. If one were to use the multiple testing procedure in this situation, most of the alpha should be spent on the within strata tests.

• In the design of NSABP protocol B-29, we expected little or no interaction and nearly equal accrual to the two stratum levels. Given our design assumptions in B-29, we spent about ½ % of alpha on stratum-level tests and set the sizes of the stratum-level tests equal. If we had anticipated unequal accrual to the strata or significant interaction, we likely would have altered our choices. Our choice of alpha spending ($\alpha_0 = 0.045$, $\alpha_1 = \alpha_2 = 0.006$) proved to preserve power in the presence of mild perturbations of design assumptions.


• The tools described in this paper can be adapted to the design of other potential trials. Given prior beliefs regarding the likelihood of significant treatment-strata interaction, balance of accrual to the stratum levels, and other factors, one can explore the sensitivity of power to design assumptions and parameters much as we have in the latter part of this paper.