The role of the upper sample size limit in two-stage bioequivalence designs

International Journal of Pharmaceutics 456 (2013) 87–94

Vangelis Karalis*
Laboratory of Biopharmaceutics and Pharmacokinetics, Faculty of Pharmacy, National and Kapodistrian University of Athens, 15771 Athens, Greece

Article history: Received 18 June 2013; received in revised form 6 August 2013; accepted 8 August 2013; available online 14 August 2013

Keywords: Bioequivalence; Two-stage designs; Adaptive designs; Pharmacokinetic simulations

Abstract

Two-stage designs (TSDs) are currently recommended by the regulatory authorities for bioequivalence (BE) assessment. The TSDs presented so far rely on an assumed geometric mean ratio (GMR) value of the BE metric in stage I in order to avoid inflation of the type I error. In contrast, this work proposes a more realistic TSD in which sample size re-estimation relies not only on the variability observed in stage I, but also on the observed GMR. In this case, an upper sample size limit (UL) is introduced in order to prevent inflation of the type I error. The aim of this study is to unveil the impact of UL on two TSD bioequivalence approaches which are based entirely on the interim results. Monte Carlo simulations were used to investigate several scenarios of UL levels, within-subject variability, starting number of subjects, and GMR. With a UL in place, no inflation of the type I error was observed. As UL values increase, the probability (%) of declaring BE becomes higher. The starting sample size and the variability of the study affect the type I error. Higher UL levels result in larger total sample sizes, an effect that is more pronounced for highly variable drugs.

© 2013 Elsevier B.V. All rights reserved.

Abbreviations: ANOVA, analysis of variance; BE, bioequivalence; CI, confidence interval; CVw, coefficient of variation of the within-subject variability of the bioequivalence metric; GMR, geometric mean ratio (test/reference) of the bioequivalence metric; N, total number of subjects participating in the study; N1, starting sample size; N2, additional number of subjects recruited at the second stage; TSD, two-stage design; TSD-1, first type of two-stage design used in this study; TSD-2, second type of two-stage design used in this study; UL, upper sample size limit; α, type I error of the nominal statistical hypothesis.

* Correspondence address: Laboratory of Biopharmaceutics and Pharmacokinetics, Faculty of Pharmacy, National and Kapodistrian University of Athens, University Campus, 15771 Athens, Greece. Tel.: +30 2107274267; fax: +30 2107274027.
E-mail address: [email protected]

0378-5173/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.ijpharm.2013.08.013

1. Introduction

Classically, assessment of bioequivalence (BE) relies on the concept of average BE. In this case, two medicinal products are considered bioequivalent when the estimated 90% confidence interval (90% CI) for the ratio (test/reference) of the mean pharmacokinetic metrics, obtained from the difference of the ln-transformed means, lies within the predefined acceptance limits, usually 0.80–1.25 (EMA, 2010; FDA, 2001, 2003). BE studies are in essence clinical trials, and therefore their design obeys the same general principles as clinical studies. In this context, sample size estimation is of crucial importance in BE studies. This implies that one should have prior knowledge of: (a) the expected difference in the mean values of the pharmacokinetic metric (e.g., AUC, Cmax) between the test and reference formulations and (b) the within-subject variability of the active moiety. Wrong estimates of the variability and/or of the difference in the pharmacokinetic parameters may lead to BE studies which are under- or over-powered. Neither situation is desirable: low statistical power results in an inability to prove the alternative statistical hypothesis (i.e., bioequivalence), whereas over-powered studies lead to increased study costs and unnecessary exposure of humans to drugs.

Adaptive design methods can be used instead of typical single-stage studies in order to address the above-mentioned problems. Adaptive methods allow modifications to the trial and/or statistical procedures of ongoing clinical trials. The concept of adaptive design dates back to the 1970s, when adaptive randomization and a class of designs for sequential clinical trials were introduced. Since then, several types of adaptive designs have been proposed in the literature, such as group sequential, sample size re-estimation, drop-the-loser, response-adaptive randomization, adaptive dose escalation, adaptive hypotheses, and seamless designs (Gallo et al., 2006; Dragalin, 2006; Chow and Chang, 2008). Although adaptive designs have brought many advantages to clinical research, some difficulties also exist (Emerson and Fleming, 2010; Mehta and Pocock, 2011). Two-stage design (TSD) approaches are based on the principle that if BE cannot be demonstrated in the first stage of the study, the sponsor can enroll more volunteers in the second stage (Pong and Chow, 2011). At stage II, sample size re-estimation takes place based on the interim results of the first stage. Despite their advantages, such adaptive designs may raise problems. For example, many adaptations of the study may lead to a significantly different trial. In addition, as the number of interim analyses increases, there is an increased risk of inflation of the type I error (i.e., the significance level α).

Recently, two-stage designs have been allowed in BE studies. Several regulatory authorities worldwide recommend the use of TSDs for BE assessment. The European Medicines Agency (EMA) 2010 guideline states that TSD methods can be used as alternatives to the standard single-stage or replicate designs (EMA, 2010, 2013). In the same vein, the US Food and Drug Administration (FDA) allows the application of two-stage group-sequential design approaches (FDA, 2012). Other authorities, such as the World Health Organization, Health Canada, and the Japanese Pharmaceuticals and Medical Devices Evaluation Agency, also recommend the use of add-on designs in BE assessment (WHO, 2006; Health Canada, 2012; NIPHJ, 2012).

Several articles have recently appeared in the literature focusing on the properties of TSDs for BE assessment. Potvin et al. and Montague et al. published two articles which examined the performance of four two-stage methods: a naïve TSD approach and three modifications of adaptive methods assuming a pre-defined geometric mean ratio (GMR) value for the pharmacokinetic parameter (Potvin et al., 2008; Montague et al., 2012). In these articles, the authors did not use the actual GMR observed in stage I, but instead considered a fixed prior value of either 0.90 or 0.95. Very recently, Fuglsang presented a study focusing on two-stage bioequivalence designs with increased power and controlled type I error (Fuglsang, 2013).

At the same time, our group published a study where sample size re-estimation is not based on a prior GMR estimate, but on the actual GMR observed in stage I (Karalis and Macheras, 2013). This situation is more realistic, since the GMR actually observed in stage I, in addition to the observed within-subject variability, is used for sample size re-estimation. In the past, Cui et al. noted that increasing the sample size based on an interim estimate of the treatment difference can lead to a substantial inflation of the type I error (Cui et al., 1999). To deal with this issue, our method also included a pre-defined upper limit (UL) on the total sample size (N), namely, on the sum of the number of subjects in stage I (N1) and those enrolled after sample size re-estimation in stage II (N2).

The aim of this study is to elaborate on the necessity and the role of a pre-defined upper sample size limit in two-stage clinical designs which are based entirely on interim study results. Two TSD approaches are used in order to examine: (a) the impact of the upper sample size limit on the percentage of BE acceptance, (b) the relationship between UL and type I error, and (c) the effect of UL on the magnitude of the total sample size used in stages I and II. Conclusions about the role of UL and its appropriate setting in BE studies are derived. Monte Carlo simulations were used to investigate several different scenarios that may be encountered in practice.

2. Materials and methods

2.1. Two-stage designs

Two TSD methods were assessed in this study (Fig. 1). Both originated from the basic idea of the TSD approaches described by Potvin et al. (2008) and Montague et al. (2012). However, our TSD methods were appropriately modified in the following points:

(a) Sample size re-estimation is based on the actual GMR estimated at stage I rather than an assumed population GMR of 0.90 or 0.95.
(b) A pre-defined upper limit on the total sample size is used.
(c) An initial GMR criterion is used; the TSD is followed only if the GMR lies within the 0.80–1.25 region.


Each TSD was split into three segments: A, B1, and B2. Stage I of the study includes segments A and B1, whereas segment B2 refers to stage II of the study (Fig. 1). Each stage of the TSD methods consists of a two-sequence, two-period (2 × 2) crossover design. Sample size re-estimation always takes place in stage II and is based on the observed GMR and the coefficient of variation of the within-subject variability (CVw) of the active moiety calculated in segment A. The initial step of each TSD approach is GMR estimation from the data of segment A. When the point estimate of the GMR of the bioequivalence metric under study falls outside the region 0.80–1.25, the TSD stops and BE failure is declared. If the point estimate of the GMR lies within the 0.80–1.25 interval, BE assessment with each TSD method (TSD-1 or TSD-2) continues as follows:

2.1.1. TSD-1

The statistical power of the study is estimated assuming α = 5% and using the observed GMR and CVw values. If the estimated power is higher than or equal to 80%, BE assessment is made at α = 5% (Fig. 1A), and the algorithm stops regardless of the BE outcome (pass or fail). When the statistical power is lower than 80%, evaluation proceeds to segment B, where BE is first assessed at α = 2.80%. Two alternatives are possible here: either the algorithm stops if BE is declared, or it continues to segment B2 if BE was not shown. In B2, sample size re-estimation takes place setting α = 2.80% and using the CVw and GMR observed in stage I. The final step is the assessment of BE using all data from stages I and II at α = 2.80% (Potvin et al., 2008; Montague et al., 2012).
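The TSD-1 branching can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the author's MATLAB implementation: `power_fn`, `be_fn` and `n2_fn` are hypothetical callables standing in for the power calculation, the BE test on stage I ("I") or pooled ("I+II") data, and the sample size re-estimation. The initial GMR criterion and the UL check of Section 2.1.3 are folded in, and at least two stage II subjects are assumed.

```python
from typing import Callable

# Hypothetical callback types (assumptions, not the study's MATLAB code)
PowerFn = Callable[[float], float]   # alpha -> power from stage I GMR and CVw
BeFn = Callable[[float, str], bool]  # (alpha, "I" or "I+II") -> BE declared?
N2Fn = Callable[[float], int]        # alpha -> re-estimated stage II size


def tsd1(gmr_stage1: float, n1: int, ul: int,
         power_fn: PowerFn, be_fn: BeFn, n2_fn: N2Fn) -> str:
    """Sketch of the TSD-1 decision flow (segments A, B1, B2)."""
    # Initial GMR criterion: stop with BE failure outside 0.80-1.25
    if not 0.80 <= gmr_stage1 <= 1.25:
        return "fail: initial GMR criterion"
    # Segment A: power at alpha = 5%; if >= 80%, test BE at 5% and stop
    if power_fn(0.05) >= 0.80:
        return "pass" if be_fn(0.05, "I") else "fail"
    # Segment B1: BE test on stage I data at alpha = 2.80%
    if be_fn(0.028, "I"):
        return "pass"
    # Segment B2: re-estimate N2 (at least 2 subjects) and enforce N1 + N2 <= UL
    n2 = max(2, n2_fn(0.028))
    if n1 + n2 > ul:
        return "fail: N exceeds UL"
    # Final BE test on pooled stage I + II data at alpha = 2.80%
    return "pass" if be_fn(0.028, "I+II") else "fail"
```

Passing stub callables is enough to trace each branch; in a simulation they would be driven by the stage I GMR and CVw.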

2.1.2. TSD-2

The TSD-2 algorithm is quite different from TSD-1. Here, after the initial GMR criterion, the process continues with BE assessment at α = 2.94% (Fig. 1B). If BE is met, the algorithm stops (segment A). When BE is not declared, the statistical power of the study is estimated using the observed GMR and CVw values from the previous step and setting α = 2.94%. If the so-derived power is higher than or equal to 80%, the procedure stops without BE being declared (segment B1). However, when the statistical power is lower than 80%, the algorithm continues to segment B2, where sample size re-estimation takes place. The latter is based on the actual values of GMR and CVw found in stage I and α = 2.94%. The final step of the algorithm is the assessment of BE at α = 2.94% using all data from stages I and II (Potvin et al., 2008; Montague et al., 2012).
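The TSD-2 flow might be sketched in Python as follows, with α fixed at 2.94% throughout. As a hedged illustration (not the study's code), `power_fn`, `be_fn` and `n2_fn` are hypothetical callables for the power calculation, the BE test on stage I ("I") or pooled ("I+II") data, and the sample size re-estimation; the minimum of two stage II subjects and the UL check follow Section 2.1.3.

```python
from typing import Callable

# Hypothetical callback types (assumptions, not the study's MATLAB code)
PowerFn = Callable[[float], float]   # alpha -> power from stage I GMR and CVw
BeFn = Callable[[float, str], bool]  # (alpha, "I" or "I+II") -> BE declared?
N2Fn = Callable[[float], int]        # alpha -> re-estimated stage II size

ALPHA_TSD2 = 0.0294  # adjusted significance level used at every step of TSD-2


def tsd2(gmr_stage1: float, n1: int, ul: int,
         power_fn: PowerFn, be_fn: BeFn, n2_fn: N2Fn) -> str:
    """Sketch of the TSD-2 decision flow (segments A, B1, B2)."""
    # Initial GMR criterion: stop with BE failure outside 0.80-1.25
    if not 0.80 <= gmr_stage1 <= 1.25:
        return "fail: initial GMR criterion"
    # Segment A: BE test on stage I data
    if be_fn(ALPHA_TSD2, "I"):
        return "pass"
    # Segment B1: power already >= 80% but BE not shown, so stop without BE
    if power_fn(ALPHA_TSD2) >= 0.80:
        return "fail"
    # Segment B2: re-estimate N2 (at least 2 subjects) and enforce N1 + N2 <= UL
    n2 = max(2, n2_fn(ALPHA_TSD2))
    if n1 + n2 > ul:
        return "fail: N exceeds UL"
    # Final BE test on pooled stage I + II data
    return "pass" if be_fn(ALPHA_TSD2, "I+II") else "fail"
```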

2.1.3. Important points

According to the algorithms used (TSD-1 or TSD-2), statistical power calculation and sample size re-estimation are based on the observed CVw and the GMR estimated in stage I rather than on an assumed population GMR of 0.90 or 0.95. In addition, it should be highlighted that in this study an upper limit on the total sample size (the sum of stages I and II) was set. Several UL values were considered in this analysis (see Section 2.3 for more details). The minimum number of subjects recruited at stage II is two. Therefore, since the total number of subjects from both stages must satisfy N = N1 + N2 ≤ UL, the number of additional subjects estimated at stage II can range from 2 to UL − N1 (EMA, 2013).

In both algorithms (TSD-1 and TSD-2), the condition N ≤ UL was checked after N2 had been estimated at stage II. If the required N2 resulted in a value of N exceeding UL, the algorithm stopped and BE failure was declared. Otherwise, the algorithm continued to the assessment of BE using data from both stages. It should also be mentioned that if N1 is set equal to or higher than UL, BE assessment cannot proceed to stage II; in other words, the TSD reduces to a simple one-stage design.
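The UL rule can be written down directly. The helper names below are hypothetical; the sketch only encodes the two constraints stated in the text, namely that at least two subjects are recruited at stage II and that N = N1 + N2 must not exceed UL.

```python
from typing import Optional

MIN_N2 = 2  # minimum number of subjects recruited at stage II (from the text)


def admissible_n2(n1: int, ul: int) -> range:
    """Stage II sizes allowed when N = N1 + N2 <= UL (empty if none)."""
    return range(MIN_N2, ul - n1 + 1)


def capped_n2(n1: int, n2_required: int, ul: int) -> Optional[int]:
    """Return N2 if N1 + N2 <= UL, otherwise None (BE failure is declared)."""
    n2 = max(MIN_N2, n2_required)
    return n2 if n1 + n2 <= ul else None
```

For UL = 62 and N1 = 60 the admissible range contains only N2 = 2, the boundary case used for the type I error tables; for N1 = UL = 72 it is empty, so the design reduces to a single stage.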
The initial GMR criterion was adopted in order to mimic realistic conditions. If the estimated point GMR of the bioequivalence metric is outside the acceptance limits, nobody would continue the study hoping to satisfy the BE limits after the second stage. Plausibly, the sponsor would stop the study and try to reformulate the product.

Finally, no futility criterion regarding the probability of meeting the BE criteria was used in any of the TSD methods tested.

Fig. 1. Schematic representation of the two-stage designs (TSDs) used in this study. Key: α, the significance level; GMR, the geometric mean ratio (test/reference) of the bioequivalence metric under study; N1, the starting sample size; N2, the additional number of subjects estimated at stage II; UL, upper sample size limit of the study.

2.2. Bioequivalence assessment – statistics

Bioequivalence was based on the concept of average BE (EMA, 2010; FDA, 2003). The two drug products were declared bioequivalent if the 90% CI for the difference in the ln-domain of the bioequivalence metric, back-transformed to the ratio scale, was within the 0.80–1.25 limits (Chen et al., 2001; EMA, 2010; FDA, 2001, 2003; Schuirmann, 1987). A general linear model (ANOVA) was applied to the ln-transformed values of the PK metric. For stage I, the ANOVA effects were Sequence, Period, Treatment, and Subject(Sequence) (EMA, 2010). For the stage II analysis (pooled data from both stages), the effects included in the ANOVA were Sequence, Treatment, Stage, Period(Stage), and Subject(Sequence × Stage) (EMA, 2010; Potvin et al., 2008; Montague et al., 2012; Karalis and Macheras, 2013). In all cases, the effects were considered 'fixed' (EMA, 2010, 2013). The residual variability from the ANOVA corresponds to the within-subject variability, since a 2 × 2 crossover design was used.
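A compact way to write the two ANOVA models described above is given below, in our own (assumed) notation: i indexes sequence, j period, k subject within sequence, g stage, and d(·) the treatment (test or reference) given in that sequence and period; all effects are fixed, as in the text. The residual variance σ_w² is linked to CVw through the standard log-normal relation.

```latex
\begin{aligned}
\text{Stage I:}\quad & \ln y_{ijk} = \mu + \mathrm{Seq}_i + \mathrm{Per}_j + \mathrm{Trt}_{d(i,j)} + \mathrm{Subj}_{k(i)} + \varepsilon_{ijk} \\
\text{Stages I+II:}\quad & \ln y_{gijk} = \mu + \mathrm{Seq}_i + \mathrm{Trt}_{d(g,i,j)} + \mathrm{Stage}_g + \mathrm{Per}_{j(g)} + \mathrm{Subj}_{k(g,i)} + \varepsilon_{gijk} \\
& \varepsilon \sim N\!\left(0, \sigma_w^2\right), \qquad \sigma_w^2 = \ln\!\left(1 + \mathrm{CV}_w^2\right)
\end{aligned}
```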

A detailed description of the design matrix construction was provided in previous work, so only a brief description of the TSD is given here (Karalis and Macheras, 2013). Simulated values for a single pharmacokinetic parameter were generated assuming a log-normal distribution. These values were assigned to the two treatments of the study (test or reference), which were then divided into the two sequences and periods of stage I. The entire procedure was performed in a way that ensured randomness and balance with respect to sequence and period. Similarly, data for stage II were generated ensuring balance with respect to sequence, period, and treatment. Sample size re-estimation in stage II was accomplished through an automated iterative algorithm (Julious, 2004; Karalis and Macheras, 2013).
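The iterative re-estimation step can be illustrated with a self-contained Python sketch; it is not the MATLAB code used in the study. Several assumptions are ours: the power of the two one-sided tests is approximated with the shifted central-t approximation (exact methods use the noncentral t distribution or Owen's Q), a balanced 2 × 2 crossover is assumed, and the search simply increases the total N in steps of two until the target power is reached. The Student-t CDF is computed from the regularized incomplete beta function so that only the standard library is needed.

```python
import math

_FPMIN = 1e-300


def _nz(v: float) -> float:
    # Keep Lentz's continued fraction away from division by zero
    return v if abs(v) >= _FPMIN else _FPMIN


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-14:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_cdf(t: float, df: float) -> float:
    """Student-t CDF via P(|T| > t) = I_{df/(df+t^2)}(df/2, 1/2)."""
    tail = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t > 0 else tail


def t_ppf(p: float, df: float) -> float:
    """Student-t quantile by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_tost_2x2(gmr: float, cv: float, n: int, alpha: float = 0.05,
                   lower: float = 0.80, upper: float = 1.25) -> float:
    """Approximate TOST power, balanced 2x2 crossover, N subjects in total."""
    df = n - 2
    sigma_w = math.sqrt(math.log1p(cv * cv))  # sigma_w^2 = ln(1 + CVw^2)
    se = sigma_w * math.sqrt(2.0 / n)         # SE of the ln-scale T - R difference
    t_crit = t_ppf(1.0 - alpha, df)
    delta = math.log(gmr)
    power = (t_cdf((math.log(upper) - delta) / se - t_crit, df)
             + t_cdf((delta - math.log(lower)) / se - t_crit, df) - 1.0)
    return max(power, 0.0)


def sample_size_2x2(gmr: float, cv: float, alpha: float = 0.05,
                    target: float = 0.80, n_max: int = 1000):
    """Smallest even total N (>= 4) whose approximate power reaches the target."""
    for n in range(4, n_max + 1, 2):
        power = power_tost_2x2(gmr, cv, n, alpha)
        if power >= target:
            return n, power
    return None, None
```

In a TSD, the required stage II size would then be taken as N2 = N − N1 (at least 2), with the adjusted α (2.80% or 2.94%) passed as `alpha`; this mapping is a hypothetical illustration of how the search could be used.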

Finally, using either data from stage I or the combined data from stages I and II, BE was assessed by applying ANOVA to the entire design matrix. The within-subject variability was used to construct the 100(1 − 2α)% confidence interval around the GMR (test/reference) of the PK parameter. It should be noted that, although the sample sizes N1 and N2 might be different, BE assessment was based on Pocock's method (Pocock et al., 1977).
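The confidence interval used for the BE decision can be written as follows, assuming complete data. Here δ̂ is the estimated test minus reference difference of least-squares means on the ln scale and SE its standard error from the ANOVA models described in Section 2.2; the residual degrees of freedom ν are our derivation for those models, and the balanced single-stage expression for SE (with m1 and m2 subjects per sequence, notation ours) is given as an example. BE is declared when the whole interval lies within 0.80–1.25; α = 5%, 2.94% and 2.80% give 90%, 94.12% and 94.40% intervals, respectively.

```latex
\begin{aligned}
\mathrm{CI}_{100(1-2\alpha)\%} &= \left[\exp\!\left(\hat\delta - t_{1-\alpha,\nu}\,\widehat{\mathrm{SE}}\right),\ \exp\!\left(\hat\delta + t_{1-\alpha,\nu}\,\widehat{\mathrm{SE}}\right)\right] \\
\nu &= N_1 - 2 \ \text{(stage I)}, \qquad \nu = N_1 + N_2 - 3 \ \text{(stages I+II)} \\
\widehat{\mathrm{SE}} &= \sqrt{\frac{\hat\sigma_w^2}{2}\left(\frac{1}{m_1} + \frac{1}{m_2}\right)} \quad \text{(single } 2\times 2 \text{ stage)}
\end{aligned}
```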

2.3. Monte Carlo simulations

For the estimation of type I error rates, three levels (20%, 40%, and 60%) of the theoretical CVw of the initial population were considered in the simulations. In addition, four starting sample sizes were considered: 12, 24, 36, and 60. Four different upper sample size limits were examined: UL = 62, UL = 100, UL = 150, and UL = 500. Under each condition, namely, each combination of TSD, CVw, N1, and UL, 1,000,000 studies were generated.

Monte Carlo simulations were also used to study the impact of the upper sample size limit on the percentage of BE acceptance of each TSD method and on the total sample size used. In this case, UL values equal to 62, 72, and 150 were considered. The starting sample sizes were 24, 48, and 72, while two levels of CVw were examined: 20% and 40%. The theoretical GMR value was gradually changed from 1.00 to 1.25 in steps of 0.025. Under each condition, 100,000 studies were simulated.

The entire programming work was implemented in MATLAB® (The MathWorks, Inc.). All functions were validated prior to their use, and the Monte Carlo simulation approach was in accordance with other published studies and our previous works (Tothfalusi et al., 2001; Tothfalusi and Endrenyi, 2003; Karalis et al., 2004; Karalis and Macheras, 2013).

Fig. 2. Percentage of bioequivalence (BE) studies accepted versus GMR for the TSD-1 approach. Two upper sample size (UL) limiting values were used: UL = 62 and UL = 150. The coefficient of variation of the within-subject variability (CVw) was equal to 20% (left column) and 40% (right column) for the initial population. Two different starting sample sizes, N1, were assumed: 24 and 48.
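Because each type I error rate in Section 2.3 is a proportion estimated from a finite number of simulated studies, its Monte Carlo precision can be gauged with the binomial standard error. The short Python sketch below is our own illustration (the function name is hypothetical), evaluated at the nominal 5% level for the 100,000 and 1,000,000 studies used per condition.

```python
import math


def mc_standard_error(p: float, n_sims: int) -> float:
    """Binomial standard error of a proportion estimated from n_sims simulated studies."""
    return math.sqrt(p * (1.0 - p) / n_sims)


for n_sims in (100_000, 1_000_000):
    se = mc_standard_error(0.05, n_sims)
    print(f"{n_sims:>9,d} studies: SE at a 5% rate = {100 * se:.3f} percentage points")
```

With 1,000,000 studies the standard error at a 5% rate is about 0.022 percentage points (about 0.069 with 100,000), so even the largest value reported for TSD-2 in Table 2 (4.90%) lies more than four standard errors below 5%.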

3. Results and discussion

Tables 1 and 2 list the type I error values for the TSD-1 and TSD-2 approaches analyzed in this study. Three levels of within-subject variability (20%, 40%, and 60%) and four starting sample sizes (12, 24, 36, and 60) were used. The upper sample size limits were set to 62, 100, 150, and 500. UL = 62 was selected to reflect the boundary condition where only two additional subjects can be recruited during sample size re-estimation when N1 = 60.

In all cases the estimated values are lower than 5% and there-fore no inflation of type I error beyond 5% becomes apparent. As theupper limit value of sample size increases, from 62 to 500, there isa consistent increase of type I error rates. However, in none of the

cases the limit of 5% was exceeded. The TSD-1 method presents amore conservative approach, than TSD-2 method, since lower per-centages are observed (Tables 1 and 2). The reason for the stricterperformance of TSD-1 method, than TSD-2 can be ascribed to the
Page 5: The role of the upper sample size limit in two-stage bioequivalence designs

V. Karalis / International Journal of Pharmaceutics 456 (2013) 87– 94 91

Fig. 3. Percentage of bioequivalence (BE) studies accepted versus GMR for the TSD-2 approach. Two upper sample size (UL) limiting values were used: UL = 62 and UL = 150.The coefficient of variation of the within-subject variability (CVw) was equal to 20% (left column) and 40% (right column) for the initial population. Two different startingsample sizes, N1, were assumed: 24 and 48.

Table 1Type I error rate for the TSD-1 approach for upper sample size limiting values equalto 62, 100, 150, and 500. Three levels of within-subject variability (CVw) are listed:20%, 40%, and 60%. Starting sample size (N1) is assumed to be equal to 12, 24, 36,and 60. Under each condition 1,000,000 studies were simulated.

CVw N1 Upper sample size limit

62 100 150 500

20 12 3.80 4.09 4.27 4.6624 3.09 3.58 3.89 4.4736 2.81 3.18 3.56 4.2960 2.81 2.81 3.03 4.07

40 12 1.01 1.46 1.75 2.2524 1.82 2.48 2.79 3.4136 2.61 3.03 3.44 4.1660 2.81 2.81 3.03 4.07

60 12 0.08 0.28 0.55 1.2024 0.09 0.35 0.78 1.6436 0.29 0.56 1.10 2.0660 2.09 2.09 2.39 3.56

Table 2Type I error rate for the TSD-2 approach in case of upper sample size limiting valuesequal to 62, 100, 150, and 500. Three levels of within-subject variability (CVw) arelisted: 20%, 40%, and 60%. Starting sample size (N1) is assumed to be equal to 12, 24,36, and 60. Under each condition 1,000,000 studies were simulated.

CVw N1 Upper sample size limit

62 100 150 500

20 12 3.95 4.25 4.46 4.9024 3.23 3.70 4.06 4.6736 2.95 3.34 3.71 4.4760 2.95 2.95 3.18 4.21

40 12 1.08 1.55 1.86 2.3724 1.97 2.60 2.99 3.6236 2.77 3.22 3.59 4.3860 2.95 2.95 3.18 4.21

60 12 0.09 0.30 0.60 1.2724 0.10 0.39 0.86 1.7436 0.35 0.61 1.24 2.2260 2.26 2.26 2.57 3.75

Page 6: The role of the upper sample size limit in two-stage bioequivalence designs

9 l of Pharmaceutics 456 (2013) 87– 94

fa

s(rHTlTyHewcpabtoiii2

oa(oIs

teitft(Tt

oiitvtstboildv

to1%d

tIh

Fig. 4. The joint effect of the upper sample size limit (UL) and starting sample size(N1). Percentage of bioequivalence (BE) studies accepted versus GMR for the TSD-2 approach for the following conditions: UL = 72 with N1 = 72 versus UL = 150 withN1 = 48. In the first case, setting N1 to equal to the UL implies that no sample size re-estimation at stage II can take place. The coefficient of variation of the within-subject

2 V. Karalis / International Journa

act that TSD-1 utilizes a lower value of the significance level of BEssessment: 2.80% vs. 2.94% for TSD-1 and TSD-2, respectively.

Cui et al. have stated that inflation of type I error may occur whenample size increases are based on the results of interim analysesCui et al., 1999). In our case, both TSD-1 and TSD-2 approachesely on the CVw and GMR values observed in stage I of the study.owever, no inflation of type I error was noticed (Tables 1 and 2).he underlying reason refers to the use of an upper sample sizeimit. The lower the value of the UL, the more conservative eachSD approach becomes. A value of UL = 500 was used in these anal-ses which is a rather large sample size value in case of BE studies.owever, even in this case no inflation of type I error was appar-nt. If no UL value was set or an extremely high (but unrealistic) ULas assumed (e.g., 10,000), then possibly inflation of type I error

ould have been observed. It is also worth mentioned that the pro-osed TSDs consisted of only one interim analysis; if more interimnalyses were used, then a further increase of type I error wouldecome apparent. In any case, the use of a predefined UL controlshe increase of type I error. In case of realistic conditions and whenne refers to two-stage designs, which are only allowed in BE stud-es, the inclusion of a UL leads to desired results. Obviously, no ULs necessary when sample size re-estimation is not based on thenterim results (Cui et al., 1999; Potvin et al., 2008; Montague et al.,012)

An also interesting task is trying unveiling the role of thether factors in the rate of type I error. As within-subject vari-bility increases, lower values of type I error rates are observedTables 1 and 2). This effect of variability has a significant impactn the results. For example, in case of TSD-2 when CV = 20% the type

error value declines from 3.95% to 0.09% with a starting sampleize equal to 12 and UL = 62.

In the same context, the impact of starting sample size onhe % type I error rate was assessed. In this case, two differ-nt performances were noticed. When within-subject variabilitys low or medium (e.g. CVw = 20%), then increases of N1 leado lower values of type I error rate. However, as N1 rises thenor high or very high variabilities (e.g., CVw = 40% or CVw = 60%),he type I error values show an increase in most of the casesTables 1 and 2). This behavior was apparent for both TSD-1 andSD-2 methods and it was observed for all UL values examined inhis study.

The impact of the upper sample size limit on the percentagef bioequivalence acceptance of each TSD method was also exam-ned for a variety of GMR values of the bioequivalence metric. Fig. 2llustrates the % BE acceptance of the TSD-1 approach as a func-ion of the theoretical GMR (test/reference). The results for two ULalues are depicted: UL = 62 and UL = 150. The coefficient of varia-ion is equal to 20% and 40%, while two levels of starting sampleize are used: N1 = 24 and N1 = 48. Visual inspection of Fig. 2 revealshat as UL values increase, then the % probability of declaring BEecomes higher. The discrepancy in the performance between a ULf 62 and 150 is more pronounced when a low starting sample sizes used (Fig. 2A and B). Also, the increase of variability results inower % BE acceptances for either UL = 150 or UL = 62. However, theecrease in % BE acceptance is much more influenced for low ULalues.

Fig. 3 illustrates the % BE acceptance of TSD-2 versus the GMR ofhe study for the same conditions as those quoted in Fig. 2. In casef TSD-2 almost identical results to those quoted above for TSD-

were observed. The only difference relies on the slightly higher BE acceptances for TSD-2; however, this issue cannot be easilyetected by visual comparison of Figs. 2 and 3.

[Figure caption fragment:] … variability (CVw) for the initial population was 20%.

In order to further verify the role of the upper sample size limit in two-stage BE designs, the situation depicted in Fig. 4 is examined. In this case, the TSD-2 design is applied in two different ways: a high N1 (72) with an equal UL (72) is contrasted against a low N1 (48) with a high UL (150). Setting N1 equal to UL in the first case implies that no sample size re-estimation at stage II can take place. In essence, this is a hypothetical scenario, since no one would design a TSD in which stage II can never be reached. However, this scenario suits the purposes of this investigation, because it allows UL and N1 to be compared concomitantly. Fig. 4 clearly reveals that almost the same % BE acceptances occur in both cases, even though quite different values of N1 are employed. If a higher UL had been assumed when N1 = 48, the TSD approach would have been more permissive.
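The Fig. 4 comparison hinges on a simple cap. The sketch below assumes, as a simplification, that the stage II total is the re-estimated sample size truncated at UL; the exact TSD-2 re-estimation rule is not restated here. Under that assumption, N1 = UL leaves no room for stage II whatever the interim data suggest. The function name `stage2_size` and the `n_required` values are hypothetical.

```python
def stage2_size(n_required: int, n1: int, ul: int) -> int:
    """Subjects added at stage II under a hypothetical cap rule.

    n_required: total sample size suggested by the stage I re-estimation
    n1:         subjects already studied in stage I
    ul:         upper sample size limit for the whole study
    """
    n_total = min(n_required, ul)    # the study total can never exceed UL
    return max(n_total - n1, 0)      # nothing is added if stage I is already large enough


if __name__ == "__main__":
    # Fig. 4 settings: N1 = UL = 72 versus N1 = 48 with UL = 150
    for n_required in (40, 90, 200):
        print(n_required, stage2_size(n_required, 72, 72), stage2_size(n_required, 48, 150))
```

With N1 = UL = 72, the function returns 0 for every `n_required`. With N1 = 48 and UL = 150, stage II can add up to 102 subjects.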

The upper sample size limit also affects how often BE is assessed at stage II, where sample size re-estimation takes place. N1 is only the starting sample size at stage I; if BE assessment is transferred to stage II, the total sample size naturally increases. The more often BE is evaluated at stage II, the larger the total sample size, N. To assess the impact of UL on the magnitude of N, Fig. 5 was constructed; it depicts the mean total sample size as a function of the theoretical GMR of the stage I study. Two UL values were assumed, 62 and 150, while CVw was 20% or 40%. The starting sample size took the values 24 and 48.
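The link between stage II visits and the mean total sample size can be written as an expectation. The sketch below assumes that the stage II total is a re-estimated size truncated at UL. The symbol N_re for that re-estimated size is introduced here only for illustration; the exact re-estimation formulas of TSD-1 and TSD-2 are not restated in this section.

```latex
\begin{aligned}
N &= N_1 + N_2, \qquad
N_2 = \begin{cases}
  0, & \text{BE assessed at stage I only},\\
  \max\{\min(N_{\mathrm{re}}, \mathrm{UL}) - N_1,\; 0\}, & \text{BE assessment transferred to stage II},
\end{cases}\\
\mathbb{E}[N] &= N_1 + \Pr(\text{stage II})\,\mathbb{E}[N_2 \mid \text{stage II}] \;\le\; \max(N_1, \mathrm{UL}).
\end{aligned}
```

Under this assumption, for N1 = 48 the mean total sample size can rise by at most 14 subjects when UL = 62. In contrast, UL = 150 allows a rise of up to 102 subjects.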

A first look at Fig. 5 reveals an expected result: as UL increases, more subjects are recruited during stage II. Naturally, if UL were set to a higher value (e.g. 500), a greater number of subjects could be recruited in stage II. In any case, this effect is more pronounced for highly variable drugs (Fig. 5B and D). With the low UL (i.e. UL = 62), the total sample size rises only slightly when N1 = 24, with 2–3 additional subjects recruited. When N1 = 48, no rise at all can be observed with a UL of 62. However, many more subjects can be recruited when UL is set to 150. In this case, two different patterns were observed, depending on the assumed within-subject variability. If CVw is 20%, the estimate of N shows a maximum for GMR values in the region of 1.15–1.20 and declines as GMR becomes either higher or lower. In contrast, when CVw = 40%, the mean total sample size is highest at GMR values close to unity and then declines toward the value of N1 as GMR increases.
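Why the UL effect is stronger for highly variable drugs can be illustrated by the total sample size a re-estimation would request. The sketch below computes the smallest even total N reaching 80% power for a balanced 2×2 crossover at α = 0.05. The method is an assumption: it uses a normal approximation to TOST power with a t critical value rather than an exact power calculation. It also assumes a floor of 12 subjects. The helper names are hypothetical. A TSD using an adjusted α would request even more subjects.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
LN_LIMIT = math.log(1.25)   # BE range 0.80-1.25 on the log scale


def t_quantile(p, df):
    """Student t quantile from the Cornish-Fisher expansion around the normal quantile."""
    x = STD_NORMAL.inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def approx_power(n, cv, gmr, alpha=0.05):
    """Normal approximation to TOST power for a balanced 2x2 crossover with n subjects."""
    mse = math.log(1 + cv**2)            # within-subject variance of log data
    se = math.sqrt(2 * mse / n)
    t = t_quantile(1 - alpha, n - 2)
    delta = abs(math.log(gmr))
    p = (STD_NORMAL.cdf((LN_LIMIT - delta) / se - t)
         + STD_NORMAL.cdf((LN_LIMIT + delta) / se - t) - 1)
    return max(p, 0.0)


def required_n(cv, gmr, target=0.80, alpha=0.05, n_min=12, n_max=10_000):
    """Smallest even total sample size whose approximate power reaches the target."""
    if abs(math.log(gmr)) >= LN_LIMIT:
        raise ValueError("GMR must lie strictly inside 0.80-1.25")
    n = n_min
    while approx_power(n, cv, gmr, alpha) < target:
        n += 2                           # balanced sequences
        if n > n_max:
            raise ValueError("target power not reachable below n_max")
    return n


if __name__ == "__main__":
    for cv in (0.20, 0.40):
        n = required_n(cv, 0.95)
        print(f"CVw={cv:.0%}: required N={n}, N allowed by UL=62: {min(n, 62)}")
```

At GMR = 0.95 the sketch returns 20 subjects for CVw = 20% and 66 for CVw = 40%. The low-variability requirement is already below N1 = 48, whereas the high-variability requirement exceeds UL = 62 and is therefore truncated.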

Results similar to those depicted in Fig. 5 were also observed when the mean total sample size of the TSD-2 approach was plotted versus the theoretical GMR of the stage I study (data not shown).


Fig. 5. Mean total sample size (N) as a function of the theoretical GMR of the stage I study. The upper sample size limit (UL) was set to either 62 or 150. The coefficient of variation of the within-subject variability (CVw) for the initial population was equal to 20% (left column) or 40% (right column). Two levels of starting sample size, N1, were assumed: 24 and 48.

4. Conclusions

The purpose of this study was to unveil the role of setting a predefined sample size limit in two-stage clinical designs and, in particular, in BE studies. Two different TSD methods were investigated under different conditions of within-subject variability, upper sample size limits, and starting numbers of subjects. The major findings of this study are the following:

(a) No inflation of type I error is observed in the simulated studies.

(b) As the UL value increases, the type I error rate also rises.

(c) The starting sample size and the variability of the study affect the level of type I error.

(d) As UL values increase, the % probability of declaring BE becomes higher.

(e) Increases in the UL levels lead to a higher mean total sample size of the TSD. This effect is more pronounced in the case of highly variable drugs.

Two-stage clinical designs that are entirely based on interim results are more realistic. The TSD methods analyzed in this study rely not only on the variability of stage I, but also on the observed GMR difference. This study shows that setting an upper sample size limit is necessary to avoid inflation of type I error. Finally, typical examples are presented for the choice of a suitable UL.

Acknowledgement

I am grateful to Professor Panos Macheras for his perceptive comments during the preparation of this manuscript.

References

Chen, M., Shah, V., Patnaik, R., Adams, W., Hussain, A., Conner, D., Mehta, M., Malinowski, H., Lazor, J., Huang, S., Hare, D., Lesko, L., Sporn, D., Williams, R., 2001. Bioavailability and bioequivalence: an FDA regulatory overview. Pharm. Res. 18, 1645–1650.

Chow, S., Chang, M., 2008. Adaptive design methods in clinical trials – a review. Orphanet J. Rare Dis. 3, 1–13.

Cui, L., Hung, J., Wang, S., 1999. Modification of sample size in group sequential clinical trials. Biometrics 55, 853–857.

Dragalin, V., 2006. Adaptive designs: terminology and classification. Drug Inform. J. 40, 425–435.

EMA (European Medicines Agency), 2010. Committee for Medicinal Products for Human Use, CHMP. Guideline on the Investigation of Bioequivalence, London.

EMA (European Medicines Agency), 2013. Committee for Medicinal Products for Human Use, CHMP. Questions & Answers: Positions on Specific Questions Addressed to the Pharmacokinetics Working Party (EMA/618604/2008 Rev. 7).

Emerson, S., Fleming, T., 2010. Adaptive methods: telling the rest of the story. J. Biopharm. Stat. 20, 1150–1165.

FDA (Food and Drug Administration), 2001. Center for Drug Evaluation and Research (CDER), Statistical Approaches to Establishing Bioequivalence, Rockville, MD.

FDA (Food and Drug Administration), 2003. Center for Drug Evaluation and Research (CDER), Bioavailability and Bioequivalence Studies for Orally Administered Drug Products. General Considerations, Rockville, MD.

FDA (Food and Drug Administration), 2012. Draft Guidance on Dexamethasone–Tobramycin (Rev. June 2012).

Fuglsang, A., 2013. Sequential bioequivalence trial designs with increased power and controlled type I error rates. AAPS J. (accessed 17.06.13; epub ahead of print).

Gallo, P., Chuang-Stein, C., Dragalin, V., Gaydos, B., Krams, M., Pinheiro, J., 2006. Adaptive designs in clinical drug development – an Executive Summary of the PhRMA Working Group. J. Biopharm. Stat. 16, 275–283.

Health Canada, 2012. Ministry of Health, Health Products and Food Branch. Guidance Document: Conduct and Analysis of Comparative Bioavailability Studies.

Julious, S., 2004. Sample sizes for clinical trials with normal data. Stat. Med. 23, 1921–1986.

Karalis, V., Macheras, P., 2013. An insight into the properties of a two-stage design in bioequivalence studies. Pharm. Res. 30, 1824–1835.

Karalis, V., Symillides, M., Macheras, P., 2004. Novel scaled average bioequivalence limits based on GMR and variability considerations. Pharm. Res. 21, 1933–1942.

Mehta, C., Pocock, S., 2011. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat. Med. 30, 3267–3284.

Montague, T., Potvin, D., DiLiberti, C., Hauck, W., Parr, A., Schuirmann, D., 2012. Additional results for ‘Sequential design approaches for bioequivalence studies with crossover designs’. Pharm. Stat. 11, 8–13.

NIPHJ (National Institute of Public Health of Japan), 2012. Division of Drugs. Guideline for Bioequivalence Studies of Generic Products.

Pocock, S., 1977. Group sequential methods in the design and analysis of clinical trials. Biometrika 64, 191–199.

Pong, A., Chow, S., 2011. Handbook of Adaptive Designs in Pharmaceutical and Clinical Development. CRC Press/Taylor and Francis Group, Boca Raton, FL.

Potvin, D., DiLiberti, C., Hauck, W., Parr, A., Schuirmann, D., Smith, R., 2008. Sequential design approaches for bioequivalence studies with crossover designs. Pharm. Stat. 7, 245–262.

Schuirmann, D., 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinet. Biopharm. 15, 657–680.

Tothfalusi, L., Endrenyi, L., 2003. Limits for the scaled average bioequivalence of highly variable drugs and drug products. Pharm. Res. 20, 382–389.

Tothfalusi, L., Endrenyi, L., Midha, K., Rawson, M., Hubbard, J., 2001. Evaluation of the bioequivalence of highly-variable drugs and drug products. Pharm. Res. 18, 728–733.

WHO (World Health Organization), 2006. Expert Committee on Specifications for Pharmaceutical Preparations. 40th Report, Annex 7, Regulatory Guidance on Interchangeability for Multisource (Generic) Pharmaceutical Products. WHO Technical Report 937, pp. 347–390, Geneva.