

Allocating Recycled Significance Levels in Group Sequential Procedures for Multiple Endpoints

Dong Xi, Northwestern University

Joint work with Ajit C. Tamhane, Northwestern University

(Thanks to Ekkehard Glimm)

IWSM (July 2013)



Outline

Problem

GSHv and GSHf Procedures

GSP(r) Procedure

Methods for Constructing GSP(r)

Performance Comparisons for a Single Hypothesis

Multiple Hypotheses

Diabetes Trial Example

Concluding Remarks



Problem

• Test hypotheses H1, . . . , Hn concerning n ≥ 2 endpoints using group sequential procedures (GSPs).

• Strong control of the familywise error rate (FWER):

FWER = P{Reject at least one true Hi} ≤ α.

• Previous work: Follmann, Proschan & Geller (1994), Tang & Geller (1999) and others.

• Problem: How to incorporate recycling in a GSP?



Recycling

• More powerful multiple test procedures (MTPs) can be constructed by recycling significance levels from rejected hypotheses to unrejected hypotheses.

• Bretz et al. (2009) and Burman et al. (2009) proposed graphical approaches to construct MTPs with recycling based on weighted Bonferroni tests.

• Graphical representation of the Holm procedure for two hypotheses:

[Figure: Initial graph: H1 and H2, each with level 0.025, joined by edges H1 → H2 and H2 → H1 of weight 1. Graph after H1 rejected: H2 alone, with level 0.05.]
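To make the recycling idea concrete, here is a minimal Python sketch of this two-hypothesis Holm graph as an ordinary fixed-sample test on p-values (not a GSP). The function name `holm_two_hypotheses` is our own illustration: each hypothesis starts at level α/2, and a rejection passes its whole level to the other hypothesis along the edge of weight 1.

```python
def holm_two_hypotheses(p1, p2, alpha=0.05):
    """Graphical Holm procedure for H1, H2 (fixed-sample sketch, not sequential).

    Each hypothesis starts at level alpha/2 (0.025 for alpha = 0.05).
    Rejecting one hypothesis recycles its level to the other (edge weight 1),
    so the remaining hypothesis is then tested at the full alpha.
    Returns the set of rejected hypotheses, e.g. {"H1"}.
    """
    levels = {"H1": alpha / 2, "H2": alpha / 2}
    pvals = {"H1": p1, "H2": p2}
    rejected = set()
    changed = True
    while changed:                       # repeat until no further rejection is possible
        changed = False
        for h in list(levels):
            if h not in rejected and pvals[h] <= levels[h]:
                rejected.add(h)
                other = "H2" if h == "H1" else "H1"
                levels[other] += levels[h]   # recycle the level along the weight-1 edge
                levels[h] = 0.0
                changed = True
    return rejected
```

For example, with p-values 0.01 and 0.04, H1 is rejected at 0.025, H2 then inherits level 0.05, and both are rejected.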



GSPs with Recycling

• Maurer & Bretz (2013) and Ye et al. (2013) studied the problem of constructing GSPs with recycling. We build on these two papers.

• For GSPs, a new problem arises: how should the recycled significance level be allocated to the stages of the GSP for the unrejected hypothesis?

• Ye et al. (2013) proposed two procedures: Group Sequential Holm Variable (GSHv) and Group Sequential Holm Fixed (GSHf).

• Maurer & Bretz (2013) implicitly used the GSHv procedure.



GSHv and GSHf Procedures

• GSHv allocates the recycled significance level to all stages of the GSP for the unrejected hypothesis.

• GSHf allocates the recycled significance level only to the final stage of the GSP for the unrejected hypothesis.

• If recycling occurs at stage s > 1, GSHv wastes the portion of the recycled significance level allocated to stages 1, . . . , s − 1, since those stages cannot be revisited.

• GSHf does not waste any recycled significance level, but the trial has to continue to the final stage to benefit from recycling.

• In general, neither GSHv nor GSHf minimizes the expected sample size E(N).



GSP(r) Procedure

• Consider m-stage GSPs to test H1 and H2 (m − 1 interim analyses and a final analysis).

• Assume Bonferroni split of α: α1 and α2 s.t. α1 + α2 = α.

• Fix a common r (1 ≤ r ≤ m) for GSPs for both H1 and H2.

• Assume that H1 is rejected at Stage s before H2 is rejected.

• GSP(r) allocates α1 to stages r, r + 1, . . . , m of the GSP for H2.

• GSP(1) = GSHv, GSP(m) = GSHf.

• We call r the planned change point and s the recycling point.



GSP(r) Procedure

• The change in the GSP boundary for H2 due to recycling of α1 cannot take place before the rth or the sth stage, whichever occurs later.

• Let u = max(r, s), the effective change point.

• If s > r, then the portion of α1 allocated to stages r, r + 1, . . . , s − 1 is wasted.

• If s < r, then the full α1 is utilized, but not until the rth stage.

• Ideally, we would like to set r = s, but such an adaptive GSP does not always control the FWER.
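The bookkeeping behind u = max(r, s) can be written down directly. A minimal sketch (the function name `effective_change_point` is hypothetical); it only identifies stages and does not compute how much of α1 each stage receives, which depends on the boundary construction.

```python
def effective_change_point(r, s, m):
    """Stages of H2's GSP affected when H1's level is recycled at stage s under GSP(r).

    Returns (u, wasted_stages):
      u = max(r, s) is the first stage at which H2's boundary can actually change;
      wasted_stages = r, ..., s-1 (non-empty only if s > r) are stages that were
      planned to receive part of alpha_1 but have already passed.
    """
    if not (1 <= r <= m and 1 <= s <= m):
        raise ValueError("need 1 <= r <= m and 1 <= s <= m")
    u = max(r, s)
    wasted = list(range(r, s))   # empty when s <= r
    return u, wasted
```

With r = 1 this mirrors GSHv (recycling at s = 2 wastes the stage-1 share), and with r = m it mirrors GSHf (nothing is wasted, but nothing changes before stage m).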



Adaptive GSP(s) Procedure

• The adaptive GSP(s) procedure controls the FWER when the test statistics for H1 and H2 are independent (more generally, for n ≥ 2 hypotheses).

• However, max FWER > α for ρ > 0, where ρ is the correlation coefficient between the test statistics for H1 and H2, and the max is taken over δ1, the noncentrality parameter of H1, assuming H1 is false and H2 is true.

[Figure: FWER of the adaptive GSP(s) procedure plotted against δ1 (from 0 to 10), with one curve for each ρ = 0, 0.1, . . . , 1.]



Example

• α1 = α2 = 0.025, m = 3.

• Initial boundary for both H1 and H2: Pocock boundary at level 0.025:

(c1, c2, c3) = (2.289, 2.289, 2.289).

• Modified Pocock boundaries for H2 if H1 is rejected and its 0.025 level is recycled to H2:

GSP(1): (1.992, 1.992, 1.992)
GSP(2): (2.289, 1.890, 1.890)
GSP(3): (2.289, 2.289, 1.737)

• If s = 2, then the effective boundary for GSP(1) is (2.289, 1.992, 1.992).



Problem Setting

• Consider testing a single hypothesis H0 initially at level γ using an m-stage GSP.

• At some random stage s (1 ≤ s ≤ m), recycling takes place and the total level for testing H0 is raised to γ′ > γ; e.g., γ = 0.025, γ′ = 0.05.

• Let (Z1, . . . , Zm) be the test statistics, which have an m-variate normal distribution such that, under H0,

$$E(Z_i) = 0, \quad \operatorname{var}(Z_i) = 1, \quad \operatorname{corr}(Z_i, Z_j) = \sqrt{i/j} \quad \text{for } 1 \le i < j \le m.$$

• Let (c1(γ), . . . , cm(γ)) denote the initial γ-level boundary.
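One way to see where the correlation √(i/j) comes from is to simulate the statistics. This sketch assumes equal sample sizes per stage (which that formula implies): a partial sum S_k (our notation) gains an independent N(0, 1) increment at each stage, and Z_k = S_k/√k. The helper names are hypothetical.

```python
import math
import random

def simulate_z(m, reps, seed=1):
    """Simulate (Z_1, ..., Z_m) under H0 for a GSP with equal stage sizes.

    Each stage adds an independent N(0, 1) increment to the partial sum S_k,
    and Z_k = S_k / sqrt(k); this induces corr(Z_i, Z_j) = sqrt(i / j).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        s, z = 0.0, []
        for k in range(1, m + 1):
            s += rng.gauss(0.0, 1.0)
            z.append(s / math.sqrt(k))
        draws.append(z)
    return draws

def sample_corr(xs, ys):
    """Pearson sample correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

With m = 3 and a few thousand replications, the empirical corr(Z1, Z3) should be close to √(1/3) ≈ 0.577.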



Boundary Method for Constructing GSP(r)

• The modified γ′-level boundary (c1(γ), . . . , cr−1(γ), c∗r(γ′), . . . , c∗m(γ′)) is obtained by solving the equation

$$1 - \gamma' = P\{Z_1 \le c_1(\gamma), \ldots, Z_{r-1} \le c_{r-1}(\gamma), Z_r \le c^*_r(\gamma'), \ldots, Z_m \le c^*_m(\gamma')\}$$

such that c∗k(γ′) ≤ ck(γ) for k = r, . . . , m.

• One may choose the same form for c∗k(γ′) as for the initial boundary; e.g., if the initial boundary is Pocock, then set c∗r(γ′) = · · · = c∗m(γ′).

• This method was used in the previous example.
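The boundary method can be sketched numerically. The code below is a minimal illustration, not the authors' software: it assumes equal stage sizes (so corr(Zi, Zj) = √(i/j)), evaluates the multivariate normal probability by recursive numerical integration of the partial-sum densities on a Simpson grid (a standard technique chosen here for illustration), and finds a common Pocock-type constant for stages r, . . . , m by bisection. Names such as `prob_no_crossing` and `modified_pocock` are hypothetical.

```python
import math

def _phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _simpson(a, b, h=0.05):
    """Grid on [a, b] with an even number of intervals, plus Simpson weights."""
    n = max(2, math.ceil((b - a) / h))
    n += n % 2
    step = (b - a) / n
    xs = [a + i * step for i in range(n + 1)]
    ws = [step / 3 * (1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]
    return xs, ws

def prob_no_crossing(bounds, lower=-10.0):
    """P(Z_1 <= c_1, ..., Z_m <= c_m) under H0 with equal stage sizes.

    Works with partial sums S_k = sqrt(k) * Z_k, whose increments are iid N(0, 1);
    `lower` truncates the (negligible) far left tail of each S_k.
    """
    xs, ws = _simpson(lower, bounds[0])
    dens = [_phi(x) for x in xs]                      # density of S_1 on {Z_1 <= c_1}
    for k in range(2, len(bounds) + 1):
        new_xs, new_ws = _simpson(lower, bounds[k - 1] * math.sqrt(k))
        # Convolve with the N(0, 1) increment: f_k(s) = int f_{k-1}(u) phi(s - u) du
        dens = [sum(w * d * _phi(s - u) for u, w, d in zip(xs, ws, dens)) for s in new_xs]
        xs, ws = new_xs, new_ws
    return sum(w * d for w, d in zip(ws, dens))

def modified_pocock(level, fixed, m, lo=0.5, hi=5.0, iters=30):
    """Common constant c for stages len(fixed)+1..m so that P(some crossing) = level,
    keeping the boundary values in `fixed` (stages 1..r-1) unchanged."""
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if 1.0 - prob_no_crossing(list(fixed) + [c] * (m - len(fixed))) > level:
            lo = c        # too much type I error: raise the boundary
        else:
            hi = c
    return 0.5 * (lo + hi)
```

For instance, `modified_pocock(0.025, [], 3)` should come out close to the Pocock value 2.289 of the previous example, and `modified_pocock(0.05, [2.289], 3)` close to the GSP(2) constant 1.890. The grid step `h` trades speed against accuracy.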



Error Spending Function Method for Constructing GSP(r)

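• The error spending function, introduced by Lan & DeMets (1983), provides a flexible method for constructing GSPs.

• Let ε(γ, t) be the initial error spending function, which is increasing (↑) in t ∈ [0, 1] with ε(γ, 0) = 0 and ε(γ, 1) = γ.

• Approximate error spending functions for Pocock (POC) and O’Brien-Fleming (OBF):

$$\varepsilon_{\mathrm{POC}}(\gamma, t) = \gamma \ln[1 + (e - 1)t], \qquad \varepsilon_{\mathrm{OBF}}(\gamma, t) = 2\Phi\left(-z_{\gamma/2}/\sqrt{t}\right),$$

where Φ is the standard normal CDF and $z_{\gamma/2} = \Phi^{-1}(1 - \gamma/2)$.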

• Let 0 = t0 < t1 < · · · < tm = 1 be the information times.
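The two approximate spending functions are easy to code with Python's standard library (`statistics.NormalDist` supplies Φ and its inverse). A minimal sketch; the helper `spent_levels`, which turns a spending function into per-stage increments at given information times, is our own addition.

```python
import math
from statistics import NormalDist

_N = NormalDist()   # standard normal

def eps_poc(gamma, t):
    """Pocock-type spending function: gamma * ln(1 + (e - 1) t)."""
    return gamma * math.log(1.0 + (math.e - 1.0) * t)

def eps_obf(gamma, t):
    """O'Brien-Fleming-type spending function: 2 * Phi(-z_{gamma/2} / sqrt(t))."""
    if t <= 0.0:
        return 0.0
    z = _N.inv_cdf(1.0 - gamma / 2.0)   # upper gamma/2 quantile
    return 2.0 * _N.cdf(-z / math.sqrt(t))

def spent_levels(eps, gamma, times):
    """Increments eps(gamma, t_k) - eps(gamma, t_{k-1}) for information times t_1 < ... < t_m = 1."""
    prev, out = 0.0, []
    for t in times:
        cur = eps(gamma, t)
        out.append(cur - prev)
        prev = cur
    return out
```

Both functions return γ at t = 1, and with three equally spaced looks `eps_poc(0.025, 1/3)` is about 0.0113, the first-stage level that appears in the example below. The OBF version spends very little early and most of γ at the last look.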




• The modified error spending function ε∗(γ′, r, t) can be expressed as

$$\varepsilon^*(\gamma', r, t) = \begin{cases} \varepsilon(\gamma, t), & 0 \le t \le t_{r-1}, \\ \varepsilon(\gamma, t_{r-1}) + f(\gamma', r, t), & t_{r-1} < t \le 1, \end{cases}$$

where f(γ′, r, t) is ↑ in t with $f(\gamma', r, t_{r-1}) = 0$ and $f(\gamma', r, 1) = \gamma' - \varepsilon(\gamma, t_{r-1})$, so that ε∗(γ′, r, 1) = γ′.

• An error spending function other than the original ε(γ, t) can also be used, subject to a monotonicity condition (discussed below).




• A good choice for f(γ′, r, t):

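$$f(\gamma', r, t) = \varepsilon(\gamma^*, t) - \varepsilon(\gamma^*, t_{r-1}),$$

where γ∗ satisfies

$$\gamma^* - \varepsilon(\gamma^*, t_{r-1}) = \gamma' - \varepsilon(\gamma, t_{r-1}).$$

• Check: $f(\gamma', r, t_{r-1}) = 0$ and

$$f(\gamma', r, 1) = \varepsilon(\gamma^*, 1) - \varepsilon(\gamma^*, t_{r-1}) = \gamma^* - \varepsilon(\gamma^*, t_{r-1}) = \gamma' - \varepsilon(\gamma, t_{r-1}).$$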
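The equation for γ∗ in general calls for a one-dimensional numerical solve. A minimal sketch assuming the POC spending function and bisection on [γ′, 1]; for POC, γ∗ could also be written down directly because ε is linear in γ. The names `solve_gamma_star` and `eps_star` are hypothetical, and `times` lists t0 = 0, t1, . . . , tm = 1.

```python
import math

def eps_poc(gamma, t):
    """Pocock-type spending function gamma * ln(1 + (e - 1) t)."""
    return gamma * math.log(1.0 + (math.e - 1.0) * t)

def solve_gamma_star(eps, gamma, gamma_new, t_prev, tol=1e-12):
    """Bisection for gamma* in gamma* - eps(gamma*, t_prev) = gamma_new - eps(gamma, t_prev).

    Assumes x - eps(x, t_prev) is increasing on [gamma_new, 1] and reaches the
    target there (true for eps_poc).
    """
    target = gamma_new - eps(gamma, t_prev)
    lo, hi = gamma_new, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - eps(mid, t_prev) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eps_star(eps, gamma, gamma_new, r, times, t):
    """Modified spending function eps*(gamma', r, t) with f = eps(gamma*, t) - eps(gamma*, t_{r-1})."""
    t_prev = times[r - 1]
    if t <= t_prev:
        return eps(gamma, t)
    g_star = solve_gamma_star(eps, gamma, gamma_new, t_prev)
    return eps(gamma, t_prev) + eps(g_star, t) - eps(g_star, t_prev)
```

With m = 3, r = 2, γ = 0.025 and γ′ = 0.05, this should reproduce the values in the POC example of this talk: γ∗ ≈ 0.0707 and ε∗ ≈ 0.0113, 0.0333, 0.05 at t = 1/3, 2/3, 1.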




[Figure: Error spending functions for GSP(1), GSP(2) and GSP(3) using the POC boundary (m = 3, γ = 0.025, γ′ = 0.05). Horizontal axis: information fraction (0, 1/3, 2/3, 1); vertical axis: error spent, with the levels γ = 0.025 and γ′ = 0.05 marked; one curve for each of r = 1, 2, 3.]




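• How do we calculate the boundary from the modified error spending function?

• Calculate the “spent” levels:

$$\alpha^*_k(\gamma') = \varepsilon^*(\gamma', r, t_k) - \varepsilon^*(\gamma', r, t_{k-1}), \quad 1 \le k \le m.$$

Note that $\sum_{k=1}^{m} \alpha^*_k(\gamma') = \gamma'$.

• Then solve for the c∗k(γ′) recursively from the following set of equations for 1 ≤ k ≤ m, with probabilities computed under H0:

$$\alpha^*_k(\gamma') = P\left\{\bigcap_{j=1}^{k-1} \left[Z_j \le c^*_j(\gamma')\right] \cap \left[Z_k > c^*_k(\gamma')\right]\right\}.$$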




• c∗k(γ′) = ck(γ) for k < r and c∗k(γ′) < ck(γ) for k ≥ r.

• Maurer & Bretz (2013) showed that to ensure consonance, and hence a stepwise shortcut, we need monotonicity: c∗k(γ′) ↓ as γ′ ↑, which requires α∗k(γ′) ↑ as γ′ ↑. This is a condition on both ε(γ, t) and ε∗(γ′, r, t).

• Both POC and OBF boundaries satisfy this monotonicity condition.



Example

• For the POC error spending function with m = 3, r = 2, γ = 0.025, γ′ = 0.05, we can calculate γ∗ = 0.0707. So for t > 1/3,

ε∗(0.05, 2, t) = ε(0.025, 1/3) + ε(0.0707, t) − ε(0.0707, 1/3).

• This gives ε∗(0.05, 2, 1/3) = 0.0113, ε∗(0.05, 2, 2/3) = 0.0333 and ε∗(0.05, 2, 1) = 0.05. Hence the spent levels are

α∗2(0.05) = 0.0333 − 0.0113 = 0.0220,

α∗3(0.05) = 0.05 − 0.0333 = 0.0167.

• c∗2(0.05) and c∗3(0.05) can be determined recursively from α∗2(0.05) and α∗3(0.05) as c∗2(0.05) = 1.925 and c∗3(0.05) = 1.865. Note that they are not equal.

• Using the boundary method, c∗2(0.05) = c∗3(0.05) = 1.890.
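The recursion for c∗k(γ′) can be sketched with recursive numerical integration. This is a minimal illustration under assumptions we add: equal stage sizes, a Simpson grid for the partial-sum densities, and bisection at each stage. At stage k, the exit probability reduces to a one-dimensional integral of the stage-(k − 1) continuation density against the normal tail. The function name `boundaries_from_spent` is hypothetical. Feeding in the rounded spent levels of this example (0.0113, 0.0220, 0.0167) should give an unchanged first-stage bound followed by values close to c∗2(0.05) = 1.925 and c∗3(0.05) = 1.865.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def _phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _simpson(a, b, h=0.05):
    """Grid on [a, b] with an even number of intervals, plus Simpson weights."""
    n = max(2, math.ceil((b - a) / h))
    n += n % 2
    step = (b - a) / n
    xs = [a + i * step for i in range(n + 1)]
    ws = [step / 3 * (1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]
    return xs, ws

def boundaries_from_spent(spent, lower=-10.0, iters=40):
    """Solve spent[k-1] = P(Z_j <= c_j for j < k, Z_k > c_k) under H0, k = 1..m.

    Assumes equal stage sizes: S_k = sqrt(k) * Z_k has iid N(0, 1) increments.
    """
    bounds = [_N.inv_cdf(1.0 - spent[0])]           # stage 1: univariate normal quantile
    xs, ws = _simpson(lower, bounds[0])
    dens = [_phi(x) for x in xs]                     # density of S_1 on {Z_1 <= c_1}
    for k in range(2, len(spent) + 1):
        root_k = math.sqrt(k)

        def exit_prob(c):
            # P(no earlier crossing, S_k > c sqrt(k)) = int f_{k-1}(u) P(N(0,1) > c sqrt(k) - u) du
            return sum(w * d * (1.0 - _N.cdf(c * root_k - u)) for u, w, d in zip(xs, ws, dens))

        lo, hi = 0.0, 8.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if exit_prob(mid) > spent[k - 1]:
                lo = mid                              # too much exit probability: raise c_k
            else:
                hi = mid
        c_k = 0.5 * (lo + hi)
        bounds.append(c_k)
        # Density of S_k on the continuation region {Z_k <= c_k}, for the next stage.
        new_xs, new_ws = _simpson(lower, c_k * root_k)
        dens = [sum(w * d * _phi(s - u) for u, w, d in zip(xs, ws, dens)) for s in new_xs]
        xs, ws = new_xs, new_ws
    return bounds
```

Unlike the boundary method, nothing forces the stage-2 and stage-3 values to coincide; they follow from how ε∗ distributes the spent levels.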



Expected Sample Size Comparisons

• Consider testing H0 : θ = 0 vs. H1 : θ > 0 using an m-stage GSP(r) with γ = 0.025 and γ′ = 0.05.

• For a fixed total sample size M = mn, where n is the sample size per stage (assuming a common sample size), power increases with r for each s, so maximum power is attained with GSP(m).

• However, E(N) is also higher for GSP(m), since it stops late, often at the last stage. So we fix the power and find the r that minimizes E(N).

• Power requirement: the power of GSP(r) equals 1 − β when θ = δ > 0.

• Determine M to guarantee the required power, and then calculate

$$E(N) = n \sum_{k=1}^{m-1} k \, P(\text{GSP stops and rejects } H_0 \text{ at Stage } k \mid \theta = \delta) + M \times P(\text{GSP stops at Stage } m \mid \theta = \delta).$$
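Given the stage-wise stopping probabilities under θ = δ (in practice obtained by numerical integration or simulation of the GSP(r) boundaries), the E(N) formula is plain arithmetic. A minimal sketch; the function name and the probabilities in the usage note are purely illustrative, not values from this talk.

```python
def expected_sample_size(n, stop_probs):
    """E(N) for an m-stage GSP with n subjects per stage, M = m * n.

    stop_probs[k - 1] = P(GSP stops and rejects H0 at Stage k | theta = delta), k = 1..m-1.
    With no futility stopping, the trial reaches Stage m (sample size M)
    with the remaining probability 1 - sum(stop_probs).
    """
    m = len(stop_probs) + 1
    if any(p < 0 for p in stop_probs) or sum(stop_probs) > 1:
        raise ValueError("stopping probabilities must be non-negative and sum to at most 1")
    early = n * sum(k * p for k, p in enumerate(stop_probs, start=1))
    return early + m * n * (1.0 - sum(stop_probs))
```

For example, with n = 10 per stage, m = 3 and hypothetical early-stopping probabilities 0.2 and 0.3, `expected_sample_size(10, [0.2, 0.3])` evaluates 10 × (1 × 0.2 + 2 × 0.3) + 30 × 0.5 = 23 (up to floating-point rounding).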


Expected Sample Size Comparisons

Expected sample sizes (expressed as percentages of the fixed sample size) for GSP(r) conditional on s (m = 4, γ = 0.025, γ′ = 0.05, power 1 − β = 0.80 at δ = 1).

Initial boundary   r   s = 1   s = 2   s = 3   s = 4
OBF                1   81.42   81.74   85.15   90.13
OBF                2   81.71   81.71   85.12   90.10
OBF                3   84.17   84.17   84.17   89.33
OBF                4   86.13   86.13   86.13   86.13
POC                1   79.30   83.42   88.04   92.73
POC                2   78.55   78.55   84.09   89.84
POC                3   79.25   79.25   79.25   86.16
POC                4   80.43   80.43   80.43   80.43



Multiple Hypotheses

• Use the graphical approach and the algorithm of Bretz et al. (2009) for updating weights on hypotheses, significance levels and transition parameters.

• Calculate the modified error spending function and the corresponding modified boundary for each rejection at each stage.

• Allows multiple rejections at each stage.




Diabetes Trial Example

• Maurer & Bretz (2013) used a 3-stage GSP(1) with equal sample sizes. We will use GSP(2).

• Primary endpoint: HbA1c; secondary endpoint: body weight.

• Low dose and high dose vs. placebo.

• Gatekeeping restriction: within each dose, test the secondary endpoint only if the primary endpoint is significant.

• Overall α = 0.025. Initial significance levels: α1 = 0.0125, α2 = 0.0125, α3 = 0, α4 = 0.




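[Figure: Initial graph. Primary hypotheses H1 and H2 (weight 1/2 each); secondary hypotheses H3 and H4 (weight 0). Edges: H1 → H2 (1/2), H1 → H3 (1/2), H2 → H1 (1/2), H2 → H4 (1/2), H3 → H2 (1), H4 → H1 (1).]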




• Use the O’Brien-Fleming (OBF) boundary for all hypotheses.

• Stage 1 test statistics:

Z11 = 2.50, Z21 = 2.12, Z31 = 2.61, Z41 = 1.13.

The OBF boundary for H1 and H2 is (3.935, 2.782, 2.272). Since Z11 = 2.50 and Z21 = 2.12 are both below 3.935, neither H1 nor H2 can be rejected.

• Stage 2 test statistics:

Z12 = 3.04, Z22 = 2.63, Z32 = 2.86, Z42 = 1.55.

Since Z12 = 3.04 > 2.782, reject H1, but not H2 (Z22 = 2.63 < 2.782).




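[Figure: Graph after H1 is rejected: H2 (weight 3/4), H3 (1/4), H4 (0). Edges: H2 → H3 (1/3), H2 → H4 (2/3), H3 → H2 (1), H4 → H2 (1/2), H4 → H3 (1/2).]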




• New significance levels: α2 = 0.01875, α3 = 0.00625, α4 = 0.

• The modified OBF boundary using GSP(2): (3.935, 2.591, 2.118) for H2 and (∞, 3.085, 2.519) for H3.

• Since Z22 = 2.63 > 2.591, reject H2.

• New graph

[Figure: Graph after H1 and H2 are rejected: H3 and H4, each with weight 1/2, connected by edges in both directions.]




• New significance levels: α3 = 0.0125, α4 = 0.0125.

• The modified OBF boundary for H3 and H4 using GSP(2): (∞, 2.780, 2.272).

• Since Z32 = 2.86 > 2.780, reject H3.

• New graph

[Figure: Graph after H1, H2 and H3 are rejected: H4 alone, with weight 1.]




• New significance level: α4 = 0.025.

• The modified OBF boundary for H4 using GSP(2): (∞, 2.452, 2.003).

• Since Z42 = 1.55 < 2.452, H4 cannot be rejected.

• At this point, the trial may proceed to Stage 3, or the data monitoring committee (DMC) may decide to terminate the trial.
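The weight and level updates in this example follow the graph-update step of Bretz et al. (2009). A minimal sketch with exact fractions; `reject` is a hypothetical helper name, and the initial edge layout (H1 → H2 and H1 → H3 with weight 1/2, H2 → H1 and H2 → H4 with 1/2, H3 → H2 and H4 → H1 with 1) is our reading of the initial graph, chosen because it reproduces the updated levels reported above. The boundaries and stage-wise tests are not repeated here.

```python
from fractions import Fraction as Fr

def reject(weights, G, j):
    """Remove H_j from the graph (weight-update step of Bretz et al., 2009).

    weights: dict hypothesis -> local weight (its level is weight * alpha)
    G: dict (l, k) -> transition weight g_lk; absent keys mean 0
    Returns the updated (weights, G) for the remaining hypotheses.
    """
    def g(a, b):
        return G.get((a, b), Fr(0))

    rest = [h for h in weights if h != j]
    # Pass H_j's weight along its outgoing edges.
    new_w = {l: weights[l] + weights[j] * g(j, l) for l in rest}
    # Reroute paths that went through H_j.
    new_G = {}
    for l in rest:
        for k in rest:
            if l == k:
                continue
            denom = 1 - g(l, j) * g(j, l)
            val = (g(l, k) + g(l, j) * g(j, k)) / denom if denom > 0 else Fr(0)
            if val:
                new_G[(l, k)] = val
    return new_w, new_G

alpha = Fr(25, 1000)
half = Fr(1, 2)
weights = {1: half, 2: half, 3: Fr(0), 4: Fr(0)}
G = {(1, 2): half, (1, 3): half, (2, 1): half, (2, 4): half,
     (3, 2): Fr(1), (4, 1): Fr(1)}

for j in (1, 2, 3):            # rejection order in the example
    weights, G = reject(weights, G, j)
    print(f"after H{j}:", {f"H{h}": float(w * alpha) for h, w in weights.items()})
```

The printed levels should match the slides: α2 = 0.01875, α3 = 0.00625, α4 = 0 after H1; α3 = α4 = 0.0125 after H2; and α4 = 0.025 after H3.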



Concluding Remarks

• For a given power requirement, E(N) is generally minimized for some r between 1 and m.

• This value of r can be determined by simulation.

• The goal may be other than minimizing E(N). For example, the EMEA guideline states: “Often it may not be acceptable to stop a trial very early, despite convincing efficacy results, because insufficient data on safety, or on secondary endpoints may be available,” so a larger r may be chosen.

• Ongoing work: incorporate futility boundaries.



References I

Bretz, F., Maurer, W., Brannath, W. and Posch, M. (2009). A graphical approach to sequentially rejective multiple test procedures. Statistics in Medicine, 28, 586–604.

Burman, C.F., Sonesson, C. and Guilbaud, O. (2009). A recycling framework for the construction of Bonferroni-based multiple tests. Statistics in Medicine, 28, 739–761.

European Medicines Agency (EMA). (2007). Reflection paper on methodological issues in confirmatory clinical trials with flexible design and analysis plan. London, UK: EMA.

Geller, N.L., Proschan, M.A. and Follmann, D.A. (1995). Group sequential monitoring of multi-armed clinical trials. Drug Information Journal, 29, 705–713.

Lan, K.K.G. and DeMets, D.L. (1983). Discrete sequential boundaries for clinical trials. Biometrika, 70, 659–663.



References II

Maurer, W. and Bretz, F. (2013). Multiple testing in group sequential trials using graphical approaches. Statistics in Biopharmaceutical Research, published online.

O’Brien, P.C. and Fleming, T.R. (1979). A multiple testing procedure for clinical trials. Biometrics, 35, 549–556.

Pocock, S.J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64, 191–199.

Tang, D.I. and Geller, N.L. (1999). Closed testing procedures for group sequential clinical trials with multiple endpoints. Biometrics, 55, 1188–1192.

Ye, Y., Li, A., Liu, L. and Yao, B. (2013). A group sequential Holm procedure with multiple primary endpoints. Statistics in Medicine, 32, 1112–1124.
