D. Gillen, FDA Repro, Jan 23-24 1
Statistical Issues in Contraceptive Trials
Daniel L. Gillen, PhD
Department of Statistics
University of California, Irvine
FDA Reproductive Drugs Advisory Committee Meeting, Jan 23-24
Minimum requirements of a clinical trial
Appropriate target population
Use of appropriate comparison groups
Use of appropriate outcome measure
Ability to maintain statistical criteria for evidence: controlling type I and II errors in the frequentist setting
Outline
Outcome measures: Pearl Index vs. life-table methods
Comparison populations: historical vs. active control trials
Defining statistical evidence: testing for superiority vs. non-inferiority
Outcome Measures: Pearl Index vs. Life Table Methods
The Pearl Index
The Pearl Index (number of pregnancies per 100 woman-years) is a common measure used to summarize contraceptive effectiveness
However, a drawback of the Pearl Index is that in most situations it depends on the duration of follow-up and must be interpreted accordingly
Such dependence occurs because the baseline risk of pregnancy within the study sample changes as time marches forward
Ex: Sensitivity of Pearl Index to duration of follow-up
Suppose our study population consists of two groups
“Low risk” group (90% of population): constant risk of pregnancy; 1-year probability of pregnancy is 5%
“High risk” group (10% of population): constant risk of pregnancy; 1-year probability of pregnancy is 50%
Ex (cont’d): One-year Pearl Index
Now consider the Pearl Index calculated over the first year
Expected number of pregnancies (in a cohort of 5000 women): 5000*(0.90*0.05 + 0.10*0.50) = 475
Expected person-years at risk with censoring for pregnancy (each pregnancy counted as half a year of exposure): 4525*1 + 475*0.5 = 4762.5
Pearl Index: (475 / 4762.5)*100 = 9.97 pregnancies per 100 per year
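The one-year calculation can be reproduced with a few lines of Python. This is only a sketch: the function name `pearl_index` is hypothetical, and the half-year exposure credited to each woman who becomes pregnant is the simplifying assumption implied by the `475*.5` term.

```python
def pearl_index(pregnancies: float, woman_years: float) -> float:
    """Pregnancies per 100 woman-years of exposure."""
    if woman_years <= 0:
        raise ValueError("woman_years must be positive")
    return 100.0 * pregnancies / woman_years


cohort = 5000
# 90% of women have a 5% one-year risk, 10% have a 50% one-year risk.
expected_pregnancies = cohort * (0.90 * 0.05 + 0.10 * 0.50)
not_pregnant = cohort - expected_pregnancies
# Assumption: a woman who becomes pregnant contributes half a year on average.
woman_years = not_pregnant * 1.0 + expected_pregnancies * 0.5

one_year_pi = pearl_index(expected_pregnancies, woman_years)
print(round(one_year_pi, 2))
```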
Ex (cont’d): Two-year Pearl Index
For the Pearl Index calculated over 2 years, we need to consider the impact of censoring the “high risk” group at pregnancy
By the end of one year
Number left in low risk group: 5000*0.90*(1-0.05) = 4275
Number left in high risk group: 5000*0.10*(1-0.50) = 250
Share of high risk women at one year: 250/4275 = 5.8% relative to the low risk group, or 250/4525 = 5.5% of all 4525 women remaining (down from 10% at baseline); the 5.8% figure is the one carried into the calculations that follow
Ex (cont’d): Two-year Pearl Index
Now consider the Pearl Index calculated between years 1 and 2
Expected number of pregnancies occurring between 1 and 2 years of follow-up: 4525*(0.942*0.05 + 0.058*0.50) = 344.4
Expected person-years at risk between year 1 and year 2: 4180.6*1 + 344.4*0.5 = 4352.8 person-years
Pearl Index calculated between years 1 and 2: (344.4 / 4352.8)*100 = 7.92 pregnancies per 100 per year
Ex (cont’d): Two-year Pearl Index
Now consider the Pearl Index calculated over 2 years
Expected number of pregnancies observed over 2 years: 475 + 344.4 = 819.4
Expected person-years at risk over 2 years: 4762.5 + 4352.8 = 9115.3 person-years
Pearl Index calculated over 2 years: (819.4 / 9115.3)*100 = 8.99 pregnancies per 100 per year
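The year-by-year bookkeeping can be sketched in Python. This is a minimal illustration rather than the author's code: the function name `yearly_pearl_indices` is hypothetical, pregnancies are assumed to occur mid-year on average, and the counts remaining in each group (4275 and 250) are tracked exactly. With exact counts the high risk share at one year is 250/4525, so the sketch gives roughly 7.8 for years 1 to 2 and 8.9 over two years, slightly below the figures above, which weight the high risk group at 5.8%. A homogeneous cohort with a 9.5% annual probability is included to show that its Pearl Index does not drift.

```python
def yearly_pearl_indices(groups, years):
    """groups: list of (women at start, annual pregnancy probability).

    Returns (per-year Pearl Index, cumulative Pearl Index) as two lists.
    """
    at_risk = [n for n, _ in groups]
    probs = [p for _, p in groups]
    per_year, cumulative = [], []
    total_events, total_woman_years = 0.0, 0.0
    for _ in range(years):
        events = [n * p for n, p in zip(at_risk, probs)]
        year_events = sum(events)
        # Assumption: each pregnancy removes half a year of exposure.
        year_woman_years = sum(at_risk) - 0.5 * year_events
        per_year.append(100 * year_events / year_woman_years)
        total_events += year_events
        total_woman_years += year_woman_years
        cumulative.append(100 * total_events / total_woman_years)
        # Women who became pregnant leave the risk set.
        at_risk = [n - e for n, e in zip(at_risk, events)]
    return per_year, cumulative


mixed_per_year, mixed_cumulative = yearly_pearl_indices(
    [(4500, 0.05), (500, 0.50)], years=2
)
flat_per_year, flat_cumulative = yearly_pearl_indices([(5000, 0.095)], years=2)

print([round(x, 2) for x in mixed_per_year], round(mixed_cumulative[-1], 2))
print([round(x, 2) for x in flat_per_year], round(flat_cumulative[-1], 2))
```

The mixed cohort's index falls from year 1 to year 2 because the high risk women are depleted faster, while the homogeneous cohort returns the same value in both years.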
When is the Pearl Index independent of study support?
The Pearl Index will change with the length of follow-up unless:
1. The rate of pregnancies is homogeneous across all possible subgroups
2. This rate remains constant with time
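A short calculation shows why these two conditions suffice. As assumptions not stated on the slides, suppose subgroup $g$ makes up a fraction $\pi_g$ of the cohort and has a constant pregnancy hazard $\lambda_g$, and that every woman is followed for up to $\tau$ years and censored at pregnancy. In a large sample the Pearl Index is then approximately the ratio of expected pregnancies to expected woman-years:

```latex
\mathrm{PI}(\tau)
  \approx 100 \times
    \frac{\sum_g \pi_g \left(1 - e^{-\lambda_g \tau}\right)}
         {\sum_g \pi_g \left(1 - e^{-\lambda_g \tau}\right) / \lambda_g}
```

With a single common hazard $\lambda$ the numerator and denominator share the factor $1 - e^{-\lambda \tau}$, so the ratio is $100\lambda$ for every $\tau$. With unequal $\lambda_g$ the ratio is a weighted average of the group rates whose weights shift toward the low risk groups as $\tau$ grows.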
When is the Pearl Index independent of study support?
In the previous example, it should be noted that even if we allow participants with failures to re-enter the risk set, the Pearl Index will still depend upon time
This is because a failure results in less at-risk time; thus the “high risk” group will contribute a proportionately smaller share of total years of follow-up as the maximal duration of follow-up increases
A further issue in quantifying the Pearl Index…
Most confidence intervals for the Pearl Index assume a Poisson distribution
This distribution is defined as having variance equal to the mean (or rate)
However, count or rate data are typically better characterized as stemming from an overdispersed Poisson distribution
That is, the true variance in the rate that we observe is more than we assume from the Poisson distribution
Overdispersion in Poisson rates typically arises from heterogeneity of patient populations
Computation of confidence intervals for the Pearl Index
Consider our previous example with a “low risk” and a “high risk” group
Low risk group (90% of population): constant risk of pregnancy; 1-year probability of pregnancy is 5%
High risk group (10% of population): constant risk of pregnancy; 1-year probability of pregnancy is 50%
Computation of confidence intervals for the Pearl Index
We previously calculated the (true) 1 year Pearl Index to be 9.97 pregnancies per 100 per year
Suppose that in reality, we observed 457 pregnancies over 1 year with a total of 4763 years of followup, resulting in a Pearl Index of 9.60 per 100 per year
Assuming a Poisson distribution the corresponding 95% confidence interval for the 1 year Pearl Index would be (8.73, 10.51)
Computation of confidence intervals for the Pearl Index
However, because the Pearl Index is really composed of a mixture of Poisson distributions (from the high and low risk groups), the true variance is actually 19.2% larger than assumed by the usual (single) Poisson model
This means that we have underestimated the variance, i.e., our confidence interval is shorter than it should be!
In this case, a 95% confidence interval accounting for the heterogeneity of groups is (8.63, 10.55).
This is approximately 8% wider than the previous interval
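The effect of overdispersion on the interval can be sketched as follows. This is an illustration under stated assumptions, not the method used for the slides: it uses a simple normal (Wald) approximation on the rate scale and multiplies the Poisson variance by a dispersion factor (1.192 here, taken from the 19.2% figure above). The function name `pearl_index_ci` and its parameters are hypothetical. Because the slides do not say which interval method was used, the limits agree with the reported intervals only approximately.

```python
from math import sqrt
from statistics import NormalDist


def pearl_index_ci(pregnancies, woman_years, dispersion=1.0, level=0.95):
    """Wald interval for the Pearl Index (per 100 woman-years).

    dispersion = 1.0 is the plain Poisson assumption (variance = mean);
    dispersion > 1.0 inflates the variance to allow for heterogeneity.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = 100 * pregnancies / woman_years
    se = 100 * sqrt(dispersion * pregnancies) / woman_years
    return rate - z * se, rate + z * se


poisson_ci = pearl_index_ci(457, 4763)
inflated_ci = pearl_index_ci(457, 4763, dispersion=1.192)
width_ratio = (inflated_ci[1] - inflated_ci[0]) / (poisson_ci[1] - poisson_ci[0])

print([round(x, 2) for x in poisson_ci])
print([round(x, 2) for x in inflated_ci])
print(round(width_ratio, 3))
```

Under this approximation the width grows by the square root of the dispersion factor, so a 19.2% larger variance widens the interval by about 9%.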
How to deal with the changing composition of the risk set?
We illustrated one way in our example
Consider the probability of failure at specific time points by using conditional probability
For example, if T is the time of failure we can compute the probability of failure within two years as
Pr[T<2] = 1-Pr[T>2] = 1 - Pr[T>2|T>1]Pr[T>1] = 1-(1-0.0792)*(1-0.0997) = 0.171
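The same product of conditional probabilities extends to any number of intervals. A minimal sketch, with the hypothetical function name `cumulative_failure` and the two yearly values from the example used as the interval failure probabilities:

```python
from math import prod


def cumulative_failure(interval_failure_probs):
    """Probability of failure by the end of the last interval.

    Each entry is the conditional probability of failing in that interval,
    given no failure before it.
    """
    return 1.0 - prod(1.0 - q for q in interval_failure_probs)


two_year_failure = cumulative_failure([0.0997, 0.0792])
print(round(two_year_failure, 3))
```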
How to deal with the changing composition of the risk set?
This is called a life-table estimate
In the setting of contraceptive failure, these conditional probabilities are typically computed monthly to more accurately reflect the changing risk set (see, e.g., Potter, 1966)
When the life-table estimate is evaluated at all (distinct) failure times, this is called a Kaplan-Meier estimate.
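A bare-bones Kaplan-Meier calculation is sketched below using only the standard library. The data are made up for illustration, and the function name `kaplan_meier` is hypothetical; in practice an established survival analysis package would be used.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times[i]  : follow-up time for woman i (e.g. months)
    events[i] : 1 if follow-up ended in pregnancy, 0 if censored
    """
    failure_times = sorted({t for t, e in zip(times, events) if e})
    survival, curve = 1.0, []
    for t in failure_times:
        # Risk set: everyone still under observation just before t.
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: months of follow-up and pregnancy indicators.
months = [3, 5, 5, 8, 12, 12]
pregnant = [1, 1, 0, 0, 1, 0]
curve = kaplan_meier(months, pregnant)
failure_by_12_months = 1.0 - curve[-1][1]
print(curve, round(failure_by_12_months, 3))
```

The quantity reported in contraceptive trials is usually the cumulative failure probability, one minus the survival estimate at the time of interest.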
Are there any benefits to using the Pearl Index?
Clearly, the Pearl Index has been in wide use
The reasons for this are
Ease of interpretation, although the Kaplan-Meier estimator also has a clinically relevant interpretation (probability of failure over T years of use)
For historically controlled trials, there is a great deal of data summarized in terms of the Pearl Index
This will, of course, change as the popularity of Kaplan-Meier estimates grows in the field
Can we incorporate changing treatment regimens?
Patients may discontinue use or use additional contraceptives for some intervals of time
Technically, the Kaplan-Meier estimator could incorporate such left and right censoring.
However, it is not clear when patients should re-enter the risk set
Can we incorporate changing treatment regimens?
For example, consider the case where a participant uses back-up contraception during the interval (t1, t2).
This individual could be considered at risk for the interval (0, t1) then re-entered into the risk set at time t2.
However, by doing this we are implicitly making the assumption that this person’s hazard (or risk of pregnancy) at time t2 is the same as that of all others who have been at risk throughout (0, t2)
This is not a reasonable assumption to me and I would advise against it
Can we incorporate changing treatment regimens?
Another option for incorporating changing treatment regimens would come from post-hoc analyses
Stratified Kaplan-Meier estimates: number of strata could become large
Time-dependent covariates: e.g., consider a proportional hazards framework
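One common way to write such a model is shown below. It is offered only as an illustration of the idea; the slides do not specify a model, and the covariate definition is an assumption made here.

```latex
h_i(t) = h_0(t)\, \exp\{\beta\, x_i(t)\},
\qquad
x_i(t) =
\begin{cases}
1 & \text{if woman } i \text{ is using back-up contraception at time } t \\
0 & \text{otherwise}
\end{cases}
```

Here $h_0(t)$ is the baseline hazard of pregnancy under the study method alone, and $e^{\beta}$ is the hazard ratio associated with periods of back-up use. The woman stays in the risk set throughout, and only her covariate value changes.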
Regardless of the measure, what defines a failure and who is at risk?
For all new interventions we must consider:
Safety: Are there adverse effects that clearly outweigh any potential benefit?
Efficacy: Can the intervention reduce the probability of unintended pregnancy in a beneficial way?
Effectiveness: Would adoption of the intervention as a standard reduce the probability of unintended pregnancy in the population?
One difference between evaluation of efficacy and effectiveness is in what defines a failure and who should be included in the risk set
In a clinical trial setting we can truly only evaluate efficacy because of possible selection bias of patients entering contraceptive trials
However, even in the clinical trial setting it is useful to evaluate
Intervention failure rates during actual use (including inconsistent or incorrect use)
Intervention failure rates during perfect use (see, e.g., Trussell, Contraception, 2004)
To assess true method efficacy, counting only “method failures” during perfect use, we must include only perfect-use exposure in the risk set
We also need to consider whether those who are lost to follow-up should be considered at risk all the way up to the time of drop-out
One reasonable approach is to censor patients three months prior to the time at which they become lost to follow-up (Trussell, SIM, 1991)
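The censoring rule just described amounts to a small adjustment of each woman's exposure time. A minimal sketch, in which the function name, the argument names, and the use of months as the time unit are all assumptions made for illustration:

```python
def exposure_months(last_contact_month, lost_to_follow_up, lookback_months=3):
    """Months of exposure to count toward the risk set.

    Women lost to follow-up are censored `lookback_months` before the time
    they were lost; everyone else contributes their full follow-up.
    """
    if lost_to_follow_up:
        return max(0, last_contact_month - lookback_months)
    return last_contact_month


print(exposure_months(10, lost_to_follow_up=True))   # censored at month 7
print(exposure_months(10, lost_to_follow_up=False))  # full 10 months
```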
Historical vs. Active Control Trials
Historical control trials vs. active control trials
In the past many methods have been assessed via a historical control trial
E.g., a criterion such as a Pearl Index of 1.5 (or, more recently, 2) or less has been used as an efficacy criterion
Such criteria stem from the experience of historical controls
However, biases resulting from historical control studies can be numerous, particularly when study samples are not comparable with respect to baseline risk, evaluative measure of outcome, or duration of study.
Criteria for superiority in historical control trials
As noted, past studies have considered point estimates of the (one year) Pearl Index of less than 1.5 or 2 unintended pregnancies per 100 per year
However, we must also acknowledge uncertainty of these estimates
EMEA requires sufficient sample size to guarantee the width of the 95% CI for the Pearl Index to be no larger than 1
Better (in my opinion) to require that upper bound of CI is less than the chosen threshold
In either case, if the Pearl Index is used the previous notes on computation of the CI need to be considered
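The two criteria can be compared with a short sketch. It assumes a Wald interval with an optional dispersion factor, reads "width" as the upper limit minus the lower limit, and uses made-up trial counts; the function name `evaluate_criteria` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def evaluate_criteria(pregnancies, woman_years, threshold,
                      max_width=1.0, dispersion=1.0, level=0.95):
    """Check a CI-width criterion and an upper-bound criterion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = 100 * pregnancies / woman_years
    se = 100 * sqrt(dispersion * pregnancies) / woman_years
    lower, upper = max(0.0, rate - z * se), rate + z * se
    return {
        "pearl_index": rate,
        "ci": (lower, upper),
        "width_ok": (upper - lower) <= max_width,
        "upper_below_threshold": upper < threshold,
    }


# Hypothetical trials, both with 2000 woman-years of exposure.
trial_a = evaluate_criteria(20, 2000, threshold=2.0)
trial_b = evaluate_criteria(30, 2000, threshold=2.0)
print(trial_a)
print(trial_b)
```

In the second hypothetical trial the point estimate (1.5) is below the threshold of 2, yet the upper confidence limit is not, which is the situation the upper-bound criterion is meant to catch.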
Historical control trials vs. active control trials
Because it is impossible to guarantee comparability between historical controls and current study samples, it is almost always advantageous to employ randomization when ethically feasible
Given the wide use of standard contraceptives, it is not feasible to consider a placebo controlled trial
However, one can (and should) consider the use of an active control when comparable interventions are in use
An active control also allows for comparison of the entire survival curve (logrank test or proportional hazards model?)
Superiority vs. Non-Inferiority in Active Control Trials
Superiority vs. non-inferiority in active control trials
Statistical criteria for evidence in a superiority trial
Evidence to rule out equality of effect as measured by the chosen parameter (e.g., Pearl Index, 1-year survival estimate, or a hazard ratio)
Example: Contrast may be difference in 1-year failure rates as measured by the Kaplan-Meier estimator, $KM_{Tx}(1) - KM_{AC}(1)$ (Tx: new treatment, AC: active control)
Test: $H_0: KM_{Tx}(1) - KM_{AC}(1) \ge 0$ vs. $H_1: KM_{Tx}(1) - KM_{AC}(1) < 0$
Rejection of null hypothesis corresponds to upper bound of CI for $KM_{Tx}(1) - KM_{AC}(1)$ being less than 0
Superiority vs. non-inferiority in active control trials
Statistical criteria for evidence in a non-inferiority trial
Evidence to rule out some margin of efficacy less than the active control
Example: Contrast may be difference in 1-year failure rates as measured by the Kaplan-Meier estimator, $KM_{Tx}(1) - KM_{AC}(1)$ (Tx: new treatment, AC: active control)
Test: $H_0: KM_{Tx}(1) - KM_{AC}(1) \ge \delta$ vs. $H_1: KM_{Tx}(1) - KM_{AC}(1) < \delta$ for some margin $\delta > 0$
Rejection of null hypothesis corresponds to upper bound of CI for $KM_{Tx}(1) - KM_{AC}(1)$ being less than $\delta$
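Both decision rules reduce to comparing the upper confidence limit for the difference in 1-year failure rates with a cut-off: 0 for superiority, the chosen margin for non-inferiority. The sketch below assumes independent randomized arms, a two-sided 95% normal-approximation interval, and made-up estimates, standard errors, and margin; all names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_bound_of_difference(fail_tx, se_tx, fail_ac, se_ac, level=0.95):
    """Upper confidence limit for KM_Tx(1) - KM_AC(1).

    Assumes independent arms, so the variances of the two estimates add.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = fail_tx - fail_ac
    return diff + z * sqrt(se_tx ** 2 + se_ac ** 2)


def rejects_null(upper_bound, margin=0.0):
    """margin = 0 tests superiority; margin > 0 tests non-inferiority."""
    return upper_bound < margin


# Hypothetical 1-year failure estimates and standard errors.
upper = upper_bound_of_difference(0.020, 0.004, 0.022, 0.004)
superior = rejects_null(upper)                    # compare with 0
non_inferior = rejects_null(upper, margin=0.015)  # hypothetical margin

print(round(upper, 4), superior, non_inferior)
```

With these made-up numbers the new treatment has the lower point estimate but cannot be declared superior, while non-inferiority within a margin of 1.5 percentage points can be concluded.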
Superiority vs. non-inferiority in active control trials
When is it reasonable to consider non-inferiority instead of superiority?
ICH E-10 Guidelines
Active control treatment must truly be active in the study population
If active control is truly active in the study population:
Can a margin to define non-inferiority be established?
If active control is standard of care, is new treatment also superior on secondary endpoints?
Superiority vs. non-inferiority in active control trials
Issues in setting the non-inferiority “margin”?
What measure compares distributions?
Is the treatment effect random?
How much of a decrease in effect is acceptable?
How to account for variability in the estimate(s) from historical trials?
Superiority vs. non-inferiority in active control trials
Precedence for setting the non-inferiority “margin”
Is the treatment effect random? Ideally use meta-analysis of multiple trials. Careful! Do trials have the same duration of follow-up?
How much of a decrease in effect is acceptable? 10%, 20%, 50% of active control effect?
How to account for variability in the estimate(s) from historical trials? Use worst case from historical 95% CI? Explicitly account for variability in historical trial
Summary
Need to define appropriate target population, comparison group, outcome measure, and maintain statistical criteria for evidence
Pearl Index is (usually) implicitly dependent on the length of follow-up, whereas Kaplan-Meier (life table) estimates make this dependence explicit
In either case, we need to obtain correct inference (CIs), and the definition of the risk set must correspond to the definition of failure
When ethically and logistically possible, active controls should be used
If historical controls are used, uncertainty should be accounted for in defining superiority criteria