When is Small Beautiful?
Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://brb.nci.nih.gov
For Demonstrating a Large Treatment Effect
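$\hat\delta \sim N(\delta, \tau^2)$
$\delta$ = true treatment effect; average difference in outcome between treatment groups
$\delta/\tau = z_{1-\alpha} + z_{1-\beta}$
For normal endpoint with inter-patient variance $\sigma^2$ in outcome
$\tau^2 = 2\sigma^2/n$, so $n = 2\left[\dfrac{z_{1-\alpha} + z_{1-\beta}}{\delta/\sigma}\right]^2$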
When the Size of the Treatment Effect is Large Relative to Inter-patient Variability
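$\hat\delta \sim N(\delta, \tau^2)$, $\delta/\tau = z_{1-\alpha} + z_{1-\beta}$
For binomial endpoint with $\delta = p_1 - p_2$
$\tau^2 = (p_1 q_1 + p_2 q_2)/n$, so $n = \left[\dfrac{z_{1-\alpha} + z_{1-\beta}}{p_1 - p_2}\right]^2 (p_1 q_1 + p_2 q_2)$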
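$\hat\delta \sim N(\delta, \tau^2)$, $\delta/\tau = z_{1-\alpha} + z_{1-\beta}$
For survival endpoint with $\delta = \log HR$
$\tau^2 = 4/\text{events}$, so $\text{events} = 4\left[\dfrac{z_{1-\alpha} + z_{1-\beta}}{\delta}\right]^2$
• $\alpha = .05$ (two-sided): $z = 1.96$
• $\beta = .10$: $z_{1-\beta} = 1.28$
• HR = 0.67, $\delta = \log(.67) = -.40$, events = 263
• HR = 0.5, $\delta = \log(.5) = -.69$, events = 88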
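The event counts for the survival endpoint can be checked with a few lines of Python. This is a minimal sketch, assuming the relation events = 4[(z_alpha + z_beta)/log HR]^2, a two-sided alpha of .05 and 90% power; the function name `events_required` is hypothetical and not from the slides.

```python
import math
from statistics import NormalDist


def events_required(hazard_ratio, alpha_two_sided=0.05, power=0.90):
    """Total events needed to detect a given hazard ratio.

    Assumes events = 4 * ((z_alpha + z_beta) / log(HR))**2, rounded up.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha_two_sided / 2)  # about 1.96
    z_beta = std_normal.inv_cdf(power)                     # about 1.28
    delta = math.log(hazard_ratio)                         # log hazard ratio
    return math.ceil(4 * ((z_alpha + z_beta) / delta) ** 2)


print(events_required(0.67))  # 263
print(events_required(0.5))   # 88
```

Halving the hazard rather than reducing it by a third cuts the required number of events by about two thirds, which is the sense in which a large effect permits a small trial.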
Clinical Trials Show Small Treatment Effects Because
(choose one)
1. Treatments are minimally effective uniformly across patients
2. Ineffectiveness of treatments for most patients dilutes average effects
Using phase II data, develop predictor of response to new drug
[Diagram: Develop Predictor of Response to New Drug. Patient predicted responsive: randomized to new drug or control. Patient predicted non-responsive: off study.]
Evaluating the Efficiency of Targeting Clinical Trials to Best Candidates
• Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006
• Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
• reprints and interactive sample size calculations at http://linus.nci.nih.gov
• Relative efficiency of targeted design depends on
  – proportion of patients test positive
  – effectiveness of new drug (compared to control) for test negative patients
• When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
• The targeted design may require fewer or more screened patients than the standard design
Comparison of Targeted to Untargeted Design
Simon R, Development and Validation of Biomarker Classifiers for Treatment Selection, JSPI

Treatment Hazard Ratio    Number of Events      Number of Events for Traditional Design,
for Marker Positive       for Targeted Design   by Percent of Patients Marker Positive
Patients                                        20%      33%      50%
0.5                       74                    2040     720      316
Trastuzumab (Herceptin)
• Metastatic breast cancer
• 234 randomized patients per arm
• 90% power for 13.5% improvement in 1-year survival over 67% baseline at 2-sided .05 level
• If benefit were limited to the 25% assay + patients, overall improvement in survival would have been 3.375%
  – 4025 patients/arm would have been required
Small is Beautiful
• When treatment effect can be measured with precision on individual patients
  – Little placebo effect
  – Comparative treatment effect not of interest
Small is Beautiful
• When there is substantial prior information about the effect of the treatment compared to control
Frequentist Meta-Analysis of Two Trials of the Same Treatment
$\Pr[\hat\delta_1/\tau_1 > z_{1-\alpha_1} \text{ and } \hat\delta_2/\tau_2 > z_{1-\alpha_2} \mid \delta_1 = \delta_2 = 0]$
• Random effects meta-analysis tests whether the hypothetical distribution F from which $\delta_1$ and $\delta_2$ are drawn has mean zero.
• With only two trials, random effects meta-analysis has no information on the variance of F, and so no meaningful combined inference is possible
Principles of Bayesian Analysis
• Evidence from data for a hypothesis should be based upon the likelihood of the actual data given the hypothesis, not upon the probability of data “as extreme”
• Evidence from data for a hypothesis should be modulated by the prior probability of the hypothesis
$\Pr[\delta = 0 \mid D] = \dfrac{\Pr[D \mid \delta = 0]\,\Pr[\delta = 0]}{\int \Pr[D \mid \delta = x]\,\Pr[\delta = x]\,dx}$
Bayes Theorem
Specifying Prior Distributions
• Non-informative
• Elicit opinion
• Skeptical/optimistic
• Past data
• Community consensus
Frequentist Methods are in many cases equivalent to Bayesian Methods Based on “Non-informative” Prior Distributions
“Non-informative” Prior Distributions are Sometimes Extreme and Unrealistic
Fallacies about Bayesian Methods
• Require smaller sample sizes
• Require less planning
• Are preferable for most problems in clinical trials
• Have been limited in application primarily by computing problems
Facts About Bayesian Methods
• Require careful selection of prior distributions
• Are valuable for some problems in clinical trials
Simple Bayesian Model
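$\delta$ = treatment effect
$\hat\delta$ = mle
$\hat\delta \mid \delta \sim N(\delta, \tau^2)$
$\Pr[\delta = 0] = 1 - \Pr[\delta \ne 0]$; given $\delta \ne 0$, $\delta$ has prior density $f(\delta)$
$\Pr[\delta = 0 \mid \hat\delta]^{-1} = 1 + \dfrac{\Pr[\delta \ne 0]}{\Pr[\delta = 0]} \cdot \dfrac{\int \phi\big((\hat\delta - \delta)/\tau\big)\, f(\delta)\, d\delta}{\phi(\hat\delta/\tau)}$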
Bayesian Analysis May Be More Conservative Than Frequentist Analysis
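• Two hypotheses $\delta = 0$ and $\delta = \delta_1$
• Trial data $\hat\delta/\tau$
$\Pr[\delta = 0 \mid \hat\delta/\tau]^{-1} = 1 + \dfrac{\Pr[\hat\delta/\tau \mid \delta = \delta_1]\,\Pr[\delta = \delta_1]}{\Pr[\hat\delta/\tau \mid \delta = 0]\,\Pr[\delta = 0]}$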
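• If trial is designed for power $1-\beta$ and results are just significant at level $\alpha$ then
$\Pr[\delta = 0 \mid \hat\delta/\tau = z_{1-\alpha}]^{-1} = 1 + \dfrac{\phi(z_{1-\beta})\,\Pr[\delta = \delta_1]}{\phi(z_{1-\alpha})\,\Pr[\delta = 0]}$
(Simon, Statistical Science 15:103-105, 2000)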
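For $\alpha = .025$, $\beta = .10$: $\phi(z_{1-\beta})/\phi(z_{1-\alpha}) = .1758/.0584 \approx 3$
$\Pr[\delta = 0 \mid \hat\delta/\tau = 1.96]^{-1} = 1 + 3\,\dfrac{\Pr[\delta = \delta_1]}{\Pr[\delta = 0]}$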
Bayesian Posterior Probability of Null Hypothesis When Trial Results are Just Significant

Prior Probability $\delta = 0$    Posterior Probability $\delta = 0$
0.75                              0.5
0.5                               0.25
0.25                              0.1
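The table entries can be reproduced numerically. The sketch below assumes the two-hypothesis setting (no effect versus the effect the trial was powered for), one-sided alpha = .025, beta = .10, and a result that is exactly just significant. In that case the likelihood ratio is phi(z_{1-beta})/phi(z_{1-alpha}), about 3. The function name `posterior_null_two_point` is hypothetical.

```python
from statistics import NormalDist


def posterior_null_two_point(prior_null, alpha=0.025, beta=0.10):
    """Posterior Pr[delta = 0] when the z-statistic equals z_{1-alpha}.

    Two-point prior: delta is 0 with probability prior_null, otherwise it
    equals the design alternative delta_1 = tau * (z_{1-alpha} + z_{1-beta}).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    # Likelihood of the observed z under delta_1 relative to delta = 0.
    likelihood_ratio = std_normal.pdf(z_beta) / std_normal.pdf(z_alpha)
    prior_odds_alternative = (1 - prior_null) / prior_null
    return 1 / (1 + likelihood_ratio * prior_odds_alternative)


for prior in (0.75, 0.5, 0.25):
    print(prior, round(posterior_null_two_point(prior), 2))
```

A result significant at the .025 level still leaves a 25% posterior probability of no effect when the prior probability was 50%, which is why this analysis can be more conservative than the frequentist one.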
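$\Pr[\delta = 0 \mid \hat\delta/\tau]^{-1} = 1 + \dfrac{\Pr[\delta \ne 0]}{\Pr[\delta = 0]} \cdot \dfrac{\phi(0)}{1 - \Phi(\hat\delta/\tau)}$
for flat $f(\delta)$.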
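$\Pr[\delta = 0 \mid \hat\delta/\tau = z_{1-\alpha}]^{-1} = 1 + \dfrac{\Pr[\delta \ne 0]}{\Pr[\delta = 0]} \cdot \dfrac{\phi(0)}{\alpha} = 1 + \dfrac{\Pr[\delta \ne 0]}{\Pr[\delta = 0]} \cdot \dfrac{.399}{.025} \approx 1 + 16\,\dfrac{\Pr[\delta \ne 0]}{\Pr[\delta = 0]}$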
Bayesian Posterior Probability of Null Hypothesis When Trial Results are Just Significant (flat prior under alternative)

Prior Probability $\delta = 0$    Posterior Probability $\delta = 0$
0.75                              0.158
0.5                               0.059
0.25                              0.020
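The flat-prior table is consistent with multiplying the prior odds of a non-null effect by a factor of .399/.025, about 16. The sketch below takes that factor, phi(0)/alpha with alpha = .025, as an assumption read off the slide's arithmetic rather than a derivation. The function name `posterior_null_flat` is hypothetical.

```python
from statistics import NormalDist


def posterior_null_flat(prior_null, alpha=0.025):
    """Posterior Pr[delta = 0] for a just-significant result, flat prior case.

    Assumes the inverse posterior is 1 + (phi(0) / alpha) * prior odds of a
    non-null effect, i.e. a factor of about .399 / .025 = 16.
    """
    factor = NormalDist().pdf(0.0) / alpha
    prior_odds_alternative = (1 - prior_null) / prior_null
    return 1 / (1 + factor * prior_odds_alternative)


for prior in (0.75, 0.5, 0.25):
    print(prior, round(posterior_null_flat(prior), 3))
```

Under this reading, the same just-significant result is far more persuasive than under the two-point prior, which illustrates how strongly the conclusion depends on the prior placed on the alternative.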
Small May Be Beautiful For
• Randomized phase II study comparing a new regimen to control
  – Objective to obtain unbiased estimate; better than using historical control
  – Phase II endpoint may provide more events than phase III endpoint and therefore a smaller trial
  – Phase II endpoint may permit more sensitive estimate of treatment effect but not be a suitable phase III endpoint
• Partial surrogate endpoint
– Phase II study can be sized based on inflated
Randomized Phase II Design Comparing Vaccine Regimen to Control
• $\alpha$ = 0.10 type 1 error rate
• Endpoint PFS
• Detect large treatment effect
• E.g. Power 0.8 for detecting 40% reduction in 12 month median time to recurrence with $\alpha$ = 0.10 requires 44 patients per arm with all patients followed to progression
• Two vaccine regimens can share one control group in a 3 arm randomized trial
Small May Be Beautiful
• When the objective is to select the most promising regimen from a set of candidates
  – May or may not contain control arm
  – Null hypothesis is never tested
  – All candidate regimens should be equal with regard to endpoints other than the one used as the basis for selection
Randomized Phase II Multiple-Arm Designs Using Immunological Response
• Randomized selection design to select most promising regimen for further evaluation. 90% probability of selecting best regimen if its mean response is at least $\delta$ standard deviations above the next best regimen
Number of Patients Per Arm for Randomized Selection Design
PCS = 90%

Number of treatment arms    $\delta$ = 0.5    $\delta$ = 0.75    $\delta$ = 1.0
2                           13                6                  4
3                           21                9                  6
4                           24                11                 6
5                           27                13                 7
6                           30                14                 8
7                           31                14                 8
8                           35                15                 9
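The probability of correct selection (PCS) behind a table like this can be approximated numerically. The sketch below assumes normally distributed responses with a common standard deviation, a best arm whose mean is delta standard deviations above all the others (which are taken as equal), and selection of the arm with the largest sample mean. Under those assumptions PCS is the integral of Phi(x + delta*sqrt(n))^(K-1) * phi(x) over x. The function name and the trapezoid-rule settings are illustrative only, and small differences from the tabulated sample sizes are to be expected from rounding.

```python
import math
from statistics import NormalDist


def prob_correct_selection(num_arms, n_per_arm, delta, grid_points=4001):
    """Approximate PCS for a normal-endpoint selection design.

    Integrates Phi(x + delta * sqrt(n))**(K - 1) * phi(x) over [-8, 8]
    with the trapezoid rule.
    """
    std_normal = NormalDist()
    shift = delta * math.sqrt(n_per_arm)
    lower, upper = -8.0, 8.0
    step = (upper - lower) / (grid_points - 1)
    total = 0.0
    for i in range(grid_points):
        x = lower + i * step
        weight = 0.5 if i in (0, grid_points - 1) else 1.0
        total += weight * std_normal.cdf(x + shift) ** (num_arms - 1) * std_normal.pdf(x)
    return total * step


# PCS for a few (arms, patients per arm) pairs at delta = 0.5.
for arms, n in ((2, 13), (3, 21), (4, 24)):
    print(arms, n, round(prob_correct_selection(arms, n, 0.5), 3))
```

For two arms the integral reduces to Phi(delta * sqrt(n/2)), which gives a quick hand check of the first row.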
Patients per Arm for 2-Arm Randomized Selection Design to Assure Correct Selection When True Response Probabilities Differ by 10%

Response Probability    85% Probability of     90% Probability of
of Inferior Rx          Correct Selection      Correct Selection
5%                      20                     29
10%                     28                     42
20%                     41                     62
40%                     54                     82
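For a binary endpoint, the probability of correct selection with two arms can be computed exactly from binomial probabilities. The sketch below assumes that the arm with more observed responses is selected and that ties are broken by a fair coin flip; the slides do not state the tie rule, so that part is an assumption, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_correct_selection_binary(n, p_inferior, difference=0.10):
    """Pr[superior arm is selected] with n patients per arm.

    Ties in the observed number of responses are split evenly (assumption).
    """
    p_superior = p_inferior + difference
    prob = 0.0
    inferior_below = 0.0  # running Pr[inferior arm has fewer than k responses]
    for k in range(n + 1):
        inferior_at_k = binom_pmf(k, n, p_inferior)
        prob += binom_pmf(k, n, p_superior) * (inferior_below + 0.5 * inferior_at_k)
        inferior_below += inferior_at_k
    return prob


print(round(prob_correct_selection_binary(20, 0.05), 3))
print(round(prob_correct_selection_binary(29, 0.05), 3))
```

With a 5% response rate on the inferior arm, 20 and 29 patients per arm give values close to the 85% and 90% targets in the first row of the table.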
Randomized Selection Design With Binary Endpoint
• K treatment arms
• n patients per arm
• Select arm with highest observed response rate
• $p_i$ = true response probability for i'th arm
• $p_i = p_{good}$ with probability $\gamma$, otherwise $p_{bad}$
• With N total patients, determine K and n to maximize probability of finding a good rx
Probability of Selecting a Good Treatment When $p_{bad}$ = 0.1, $p_{good}$ = 0.5 and $\gamma$ = 0.1
n K Probability
5 20 0.626
10 10 0.590
15 7 0.511
20 5 0.414
25 4 0.344
Probability of Selecting a Good Treatment When $p_{bad}$ = 0.1, $p_{good}$ = 0.3 and $\gamma$ = 0.1
n K Probability
5 20 0.319
10 10 0.375
15 7 0.383
20 5 0.341
25 4 0.309
Probability of Selecting a Good Treatment When $p_{bad}$ = 0.1, $p_{good}$ = 0.3 and $\gamma$ = 0.25
n K Probability
5 20 0.615
10 10 0.708
15 7 0.717
20 5 0.673
25 4 0.642
Probability of Selecting a Good Treatment When $p_{bad}$ = 0.02, $p_{good}$ = 0.15 and $\gamma$ = 0.15
n K Probability
5 20 0.45
10 10 0.52
15 7 0.52
20 5 0.47
25 4 0.44
Probability of Selecting a Good Treatment When $p_{bad}$ = 0.02, $p_{good}$ = 0.10 and $\gamma$ = 0.15
n K Probability
5 20 0.29
10 10 0.37
15 7 0.39
20 5 0.37
25 4 0.36
Small May Be Beautiful
• When the objective is to effectively treat the largest number of patients when the population of patients is small and several good candidate treatments are available
• N patients in horizon
• 2 treatments
• Perform RCT with n pts per rx
• Select treatment with best observed response rate and use that treatment for the remaining N-2n patients
  – Binary endpoint with unknown response probabilities $p_1$ and $p_2$
Approximate Total Number of Responses
$T(n) = n(p_1 + p_2) + (N - 2n)\,[\,p_1 \Pr(\text{select } 1) + p_2\{1 - \Pr(\text{select } 1)\}\,]$
$\Pr[\text{select } 1] = \Phi\{(p_1 - p_2)\sqrt{2n}\}$
[Figure: N = 1000, $p_1$ = 0.6, $p_2$ = 0.4]
[Figure: N = 200, $p_1$ = 0.6, $p_2$ = 0.4]
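The trade-off between learning (a larger trial) and treating (more patients on the selected arm) can be explored numerically for the two horizons shown. This is a sketch under stated assumptions: the expected total is the responses during the trial plus the responses among the remaining N - 2n patients, and Pr(select 1) uses a normal approximation with variance (p1 q1 + p2 q2)/n, which may differ from the approximation used for the original figures. The function names are hypothetical.

```python
from statistics import NormalDist


def expected_responses(n, horizon, p1, p2):
    """Approximate expected responses among all `horizon` patients.

    n patients per arm are randomized; the arm with the better observed
    response rate is then given to the remaining horizon - 2n patients.
    """
    q1, q2 = 1 - p1, 1 - p2
    # Normal approximation to Pr[arm 1 has the higher observed rate].
    z = (p1 - p2) * n**0.5 / (p1 * q1 + p2 * q2) ** 0.5
    pr_select_1 = NormalDist().cdf(z)
    during_trial = n * (p1 + p2)
    after_trial = (horizon - 2 * n) * (p1 * pr_select_1 + p2 * (1 - pr_select_1))
    return during_trial + after_trial


def best_trial_size(horizon, p1, p2):
    """Per-arm trial size n that maximizes the approximate expected total."""
    candidates = range(1, horizon // 2 + 1)
    return max(candidates, key=lambda n: expected_responses(n, horizon, p1, p2))


for horizon in (1000, 200):
    n_best = best_trial_size(horizon, 0.6, 0.4)
    print(horizon, n_best, round(expected_responses(n_best, horizon, 0.6, 0.4), 1))
```

Under these assumptions the best per-arm size is a small fraction of the horizon and shrinks as the horizon shrinks, which is the point of the slide: with few patients available, a small trial treats the most patients effectively.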
Conclusions
• In clinical trial sizing, small is often not beautiful. A small trial is often uninformative, duplicative and yields misleading results.
• In some cases however, small is appropriate and valuable.
• Having clear objectives is essential to properly sizing a clinical trial.