When is Small Beautiful?

Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://brb.nci.nih.gov

For Demonstrating a Large Treatment Effect

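$\hat\delta \sim N(\delta, \tau^2)$

$\delta$ = true treatment effect; average difference in outcome between treatment groups

For power $1-\beta$ at significance level $\alpha$:

$$\tau^2 = \frac{\delta^2}{(z_{1-\alpha} + z_{1-\beta})^2}$$

For normal endpoint with inter-patient variance $\sigma^2$ in outcome and $n$ patients per treatment group, $\tau^2 = 2\sigma^2/n$, so

$$n = \frac{2\,(z_{1-\alpha} + z_{1-\beta})^2}{(\delta/\sigma)^2}$$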
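The normal-endpoint formula, $n = 2(z_{1-\alpha}+z_{1-\beta})^2/(\delta/\sigma)^2$ per group, can be sketched in a few lines to show how quickly the required size falls as the standardized effect grows. The function name and the default of a two-sided 5% level with 90% power are illustrative assumptions, not values taken from this slide.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(effect_in_sd, alpha=0.05, power=0.90, two_sided=True):
    """Patients per group for a normal endpoint; effect_in_sd is delta/sigma."""
    z = NormalDist().inv_cdf
    # Assumption: a two-sided test uses the 1 - alpha/2 quantile.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_in_sd ** 2)


for effect in (0.25, 0.5, 1.0):
    print(effect, patients_per_group(effect))
```

Halving the standardized effect roughly quadruples the number of patients, which is why a large effect relative to inter-patient variability permits a small trial.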

When the Size of the Treatment Effect is Large Relative to Inter-patient Variability

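For binomial endpoint with $\delta = p_1 - p_2$:

$\hat\delta \sim N(\delta, \tau^2)$, and power $1-\beta$ at significance level $\alpha$ requires

$$\tau^2 = \frac{\delta^2}{(z_{1-\alpha} + z_{1-\beta})^2}$$

With $n$ patients per group and $q_i = 1 - p_i$, $\tau^2 = (p_1 q_1 + p_2 q_2)/n$, so

$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2\,(p_1 q_1 + p_2 q_2)}{(p_1 - p_2)^2}$$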

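For survival endpoint with $\delta = \log \mathrm{HR}$:

$\hat\delta \sim N(\delta, \tau^2)$, and power $1-\beta$ at significance level $\alpha$ requires

$$\tau^2 = \frac{\delta^2}{(z_{1-\alpha} + z_{1-\beta})^2}$$

$\tau^2 = 4/\text{events}$, so

$$\text{events} = \frac{4\,(z_{1-\alpha} + z_{1-\beta})^2}{\delta^2}$$

• $\alpha$ = .05 (two-sided), critical value z = 1.96
• $\beta$ = .10, $z_{1-\beta}$ = 1.28
• HR = 0.67, $|\delta|$ = |log(.67)| = .40, Events = 263
• HR = 0.5, $|\delta|$ = |log(.5)| = .69, Events = 88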
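As a check on the event counts, here is a minimal sketch of events = 4(z + z)²/(log HR)², assuming a two-sided 5% level (critical value 1.96) and 90% power as on the slide. The function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha_two_sided=0.05, power=0.90):
    """Total events needed to detect the given hazard ratio."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha_two_sided / 2) + z(power)
    # log(HR) is negative for a beneficial treatment; only its square matters.
    return ceil(4 * z_sum ** 2 / log(hazard_ratio) ** 2)


print(events_required(0.67))  # 263
print(events_required(0.5))   # 88
```

Moving from HR = 0.67 to HR = 0.5 cuts the required events by about two thirds.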

Clinical Trials Show Small Treatment Effects Because

(choose one)

1. Treatments are minimally effective uniformly across patients

2. Ineffectiveness of treatments for most patients dilutes average effects

Develop Predictor of Response to New Drug

[Diagram: Using phase II data, develop predictor of response to new drug. Patient predicted responsive: New Drug vs. Control. Patient predicted non-responsive: Off Study.]

Evaluating the Efficiency of Targeting Clinical Trials to Best Candidates

• Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006.

• Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

• reprints and interactive sample size calculations at http://linus.nci.nih.gov

• Relative efficiency of targeted design depends on
– proportion of patients test positive
– effectiveness of new drug (compared to control) for test negative patients
• When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
• The targeted design may require fewer or more screened patients than the standard design
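A rough sketch of why these points hold. It assumes a normal endpoint with equal variance, a perfectly accurate test, and an average effect in the unselected population equal to the prevalence-weighted mix of the effects in test positive and test negative patients. The function names are hypothetical, and this is a simplification rather than the exact calculation in the papers cited above.

```python
def randomized_ratio(prevalence, effect_pos, effect_neg):
    """Approximate (untargeted n) / (targeted n) for randomized patients."""
    # The untargeted trial sees the diluted, prevalence-weighted effect.
    diluted_effect = prevalence * effect_pos + (1 - prevalence) * effect_neg
    # Sample size scales with 1 / effect^2.
    return (effect_pos / diluted_effect) ** 2


def screened_ratio(prevalence, effect_pos, effect_neg):
    """Approximate (untargeted n) / (patients screened by the targeted design)."""
    # The targeted design screens about n / prevalence patients to randomize n.
    return prevalence * randomized_ratio(prevalence, effect_pos, effect_neg)


# No benefit in test negatives, 25% test positive:
print(randomized_ratio(0.25, 1.0, 0.0), screened_ratio(0.25, 1.0, 0.0))
# Half the benefit in test negatives, 50% test positive:
print(randomized_ratio(0.5, 1.0, 0.5), screened_ratio(0.5, 1.0, 0.5))
```

In the first case the targeted design randomizes far fewer patients and also screens fewer. In the second the screened ratio drops below 1, so the targeted design must screen more patients than the standard design would enroll.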

Comparison of Targeted to Untargeted Design

Treatment hazard ratio for marker positive patients: 0.5

Design                                          Number of Events
Targeted                                        74
Traditional, 20% of patients marker positive    2040
Traditional, 33% of patients marker positive    720
Traditional, 50% of patients marker positive    316

Simon R, Development and Validation of Biomarker Classifiers for Treatment Selection, JSPI

Trastuzumab (Herceptin)

• Metastatic breast cancer
• 234 randomized patients per arm
• 90% power for 13.5% improvement in 1-year survival over 67% baseline at 2-sided .05 level
• If benefit were limited to the 25% assay + patients, overall improvement in survival would have been 3.375%
– 4025 patients/arm would have been required
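The dilution arithmetic can be sketched with the simple two-proportion formula n = (z + z)²(p₁q₁ + p₂q₂)/(p₁ − p₂)². This is an assumption about method: the 234 and 4025 quoted above were evidently computed with a somewhat different formula, so the sketch reproduces the order of magnitude, not the exact figures. The function and variable names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha_two_sided=0.05, power=0.90):
    """Patients per arm for comparing two proportions (normal approximation)."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha_two_sided / 2) + z(power)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z_sum ** 2 * variance_sum / (p_treated - p_control) ** 2)


baseline = 0.67
benefit_in_positives = 0.135
fraction_positive = 0.25

# Benefit averaged over all patients when only assay + patients respond.
diluted_benefit = fraction_positive * benefit_in_positives  # 3.375 percentage points

n_full_benefit = patients_per_arm(baseline, baseline + benefit_in_positives)
n_diluted = patients_per_arm(baseline, baseline + diluted_benefit)
print(diluted_benefit, n_full_benefit, n_diluted)
```

Under these assumptions the result is roughly 220 patients per arm for the full benefit and roughly 4,000 per arm for the diluted one, the same order-of-magnitude inflation the slide reports.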

Small is Beautiful

• When treatment effect can be measured with precision on individual patients
– Little placebo effect
– Comparative treatment effect not of interest

Small is Beautiful

• When there is substantial prior information about the effect of the treatment compared to control

Frequentist Meta-Analysis of Two Trials of the Same Treatment

$$\Pr[\,\hat\delta_1/\tau_1 > z_{1-\alpha_1} \text{ and } \hat\delta_2/\tau_2 > z_{1-\alpha_2} \mid \delta_1 = \delta_2 = 0\,]$$

• Random effects meta-analysis tests whether hypothetical distribution F from which $\delta_1$ and $\delta_2$ are drawn has mean zero.

• With only two trials, random effects meta-analysis does not have any information on variance of F and so no meaningful combined inference is possible

Principles of Bayesian Analysis

• Evidence from data for a hypothesis should be based upon the likelihood of the actual data given the hypothesis, not upon the probability of data “as extreme”

• Evidence from data for a hypothesis should be modulated by the prior probability of the hypothesis

Bayes Theorem

$$\Pr[\delta = 0 \mid D] = \frac{\Pr[D \mid \delta = 0]\,\Pr[\delta = 0]}{\int \Pr[D \mid \delta = x]\,\Pr[\delta = x]\,dx}$$

Specifying Prior Distributions

• Non-informative
• Elicit opinion
• Skeptical/optimistic
• Past data
• Community consensus

Frequentist Methods are in many cases equivalent to Bayesian Methods Based on “Non-informative” Prior Distributions

“Non-informative” Prior Distributions are Sometimes Extreme and Unrealistic

Fallacies about Bayesian Methods

• Require smaller sample sizes
• Require less planning
• Are preferable for most problems in clinical trials
• Have been limited in application primarily by computing problems

Facts About Bayesian Methods

• Require careful selection of prior distributions

• Are valuable for some problems in clinical trials

Simple Bayesian Model

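$\delta$ = treatment effect
$\hat\delta$ = mle
$\hat\delta \mid \delta \sim N(\delta, \tau^2)$
$\Pr[\delta = 0] = 1 - \Pr[\delta \neq 0]$; given $\delta \neq 0$, $\delta$ has prior density $f(\delta)$

$$\Pr[\delta = 0 \mid \hat\delta] = \left[1 + \frac{\Pr[\delta \neq 0]}{\Pr[\delta = 0]}\,\frac{\int \phi\big((\hat\delta - \delta)/\tau\big)\,f(\delta)\,d\delta}{\phi(\hat\delta/\tau)}\right]^{-1}$$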

Bayesian Analysis May Be More Conservative Than Frequentist Analysis

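• Two hypotheses $\delta = 0$ and $\delta = \delta_1$

• Trial data $\hat\delta/\tau$

$$\Pr[\delta = 0 \mid \hat\delta/\tau] = \left[1 + \frac{\Pr[\hat\delta/\tau \mid \delta = \delta_1]\,\Pr[\delta = \delta_1]}{\Pr[\hat\delta/\tau \mid \delta = 0]\,\Pr[\delta = 0]}\right]^{-1}$$

• If trial is designed for power $1-\beta$ at $\delta_1$ and results are just significant at level $\alpha$ then

$$\Pr[\delta = 0 \mid \hat\delta/\tau = z_{1-\alpha}] = \left[1 + \frac{\phi(z_{1-\beta})\,\Pr[\delta = \delta_1]}{\phi(z_{1-\alpha})\,\Pr[\delta = 0]}\right]^{-1}$$

(Simon, Statistical Science 15:103-105, 2000)

For $\alpha$ = .025, $\beta$ = .10: $\phi(z_{1-\beta})/\phi(z_{1-\alpha}) = .1758/.0584 \approx 3$

$$\Pr[\delta = 0 \mid \hat\delta/\tau = 1.96] = \left[1 + 3\,\frac{\Pr[\delta = \delta_1]}{\Pr[\delta = 0]}\right]^{-1}$$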

Bayesian Posterior Probability of Null Hypothesis When Trial Results are Just Significant

Prior Probability $\delta = 0$    Posterior Probability $\delta = 0$
0.75                      0.5
0.5                       0.25
0.25                      0.1

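For flat $f(\delta)$:

$$\Pr[\delta = 0 \mid \hat\delta/\tau] = \left[1 + \frac{\phi(0)}{1 - \Phi(\hat\delta/\tau)}\,\frac{\Pr[\delta \neq 0]}{\Pr[\delta = 0]}\right]^{-1}$$

When results are just significant at level $\alpha$ = .025:

$$\Pr[\delta = 0 \mid \hat\delta/\tau = z_{1-\alpha}] = \left[1 + \frac{\phi(0)}{\alpha}\,\frac{\Pr[\delta \neq 0]}{\Pr[\delta = 0]}\right]^{-1} = \left[1 + \frac{.399}{.025}\,\frac{\Pr[\delta \neq 0]}{\Pr[\delta = 0]}\right]^{-1} \approx \left[1 + 16\,\frac{\Pr[\delta \neq 0]}{\Pr[\delta = 0]}\right]^{-1}$$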

Bayesian Posterior Probability of Null Hypothesis When Trial Results are Just Significant (flat prior under alternative)

Prior Probability $\delta = 0$    Posterior Probability $\delta = 0$
0.75                      0.158
0.5                       0.059
0.25                      0.020
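Both "just significant" tables reduce to one line of arithmetic: posterior = 1 / (1 + ratio × prior odds against the null). The sketch below assumes $\alpha$ = .025 and $\beta$ = .10 as on the slides. It uses the ratio $\phi(z_{1-\beta})/\phi(z_{1-\alpha}) \approx 3$ for the point alternative, and for the flat prior it takes the factor .399/.025 ≈ 16 from the slide as given. The function and variable names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def posterior_null(prior_null, ratio_alt_to_null):
    """Posterior probability of delta = 0 given the evidence ratio for the alternative."""
    prior_odds_alt = (1 - prior_null) / prior_null
    return 1 / (1 + ratio_alt_to_null * prior_odds_alt)


alpha, beta = 0.025, 0.10
z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
z_beta = STD_NORMAL.inv_cdf(1 - beta)

# Point alternative at the design effect: about 3.
ratio_point = STD_NORMAL.pdf(z_beta) / STD_NORMAL.pdf(z_alpha)
# Flat prior under the alternative, factor as quoted on the slide: about 16.
ratio_flat = STD_NORMAL.pdf(0) / alpha

for prior in (0.75, 0.5, 0.25):
    print(prior,
          round(posterior_null(prior, ratio_point), 2),
          round(posterior_null(prior, ratio_flat), 3))
```

With a point alternative, a just-significant result still leaves the null with substantial posterior probability unless the prior already favored the alternative.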

Small May Be Beautiful For

• Randomized phase II study comparing a new regimen to control
– Objective to obtain unbiased estimate; better than using historical control
– Phase II endpoint may provide more events than phase III endpoint and therefore a smaller trial
– Phase II endpoint may permit more sensitive estimate of treatment effect but not be a suitable phase III endpoint

• Partial surrogate endpoint

– Phase II study can be sized based on inflated

Randomized Phase II Design Comparing Vaccine Regimen to Control

• $\alpha$ = 0.10 type 1 error rate
• Endpoint PFS
• Detect large treatment effect
• E.g. Power 0.8 for detecting 40% reduction in 12 month median time to recurrence with $\alpha$ = 0.10 requires 44 patients per arm with all patients followed to progression
• Two vaccine regimens can share one control group in a 3 arm randomized trial

Small May Be Beautiful

• When the objective is to select the most promising regimen from a set of candidates
– May or may not contain control arm
– Null hypothesis is never tested
– All candidate regimens should be equal with regard to endpoints other than the one used as the basis for selection

Randomized Phase II Multiple-Arm Designs Using Immunological Response

• Randomized selection design to select most promising regimen for further evaluation. 90% probability of selecting best regimen if its mean response is at least $\delta$ standard deviations above the next best regimen

Number of Patients Per Arm for Randomized Selection Design (PCS = 90%)

Number of treatment arms    $\delta$ = 0.5    $\delta$ = 0.75    $\delta$ = 1.0
2                           13         6           4
3                           21         9           6
4                           24         11          6
5                           27         13          7
6                           30         14          8
7                           31         14          8
8                           35         15          9
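The probability of correct selection (PCS) behind a table like this can be sketched numerically. The assumptions, none of which are stated on the slide: a normally distributed response with common standard deviation, the best arm exactly $\delta$ standard deviations above all the others (the least favorable case), and selection of the arm with the highest sample mean. Under those assumptions PCS = ∫ φ(x) Φ(x + δ√n)^(K−1) dx. The function name and integration settings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_correct_selection(n_per_arm, n_arms, delta_sd, step=0.01, half_width=8.0):
    """PCS when the best arm's mean is delta_sd standard deviations above the rest."""
    shift = delta_sd * sqrt(n_per_arm)
    steps = int(round(2 * half_width / step))
    total = 0.0
    # Trapezoidal rule over the standardized sample mean of the best arm.
    for i in range(steps + 1):
        x = -half_width + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * STD_NORMAL.pdf(x) * STD_NORMAL.cdf(x + shift) ** (n_arms - 1)
    return total * step


print(round(prob_correct_selection(13, 2, 0.5), 3))
print(round(prob_correct_selection(21, 3, 0.5), 3))
```

For the tabulated sizes the computed PCS should land close to 90%. Small differences from the table are expected from rounding and from whatever exact method produced it.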

Patients per Arm for 2-arm Randomized Selection Design to Assure Correct Selection When True Response Probabilities Differ by 10%

Response Probability of Inferior Rx    85% Probability of Correct Selection    90% Probability of Correct Selection
5%                                     20                                      29
10%                                    28                                      42
20%                                    41                                      62
40%                                    54                                      82
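For a binary endpoint the probability of correct selection can be computed exactly from the two binomial distributions. The sketch assumes that ties in the observed response counts are broken by a fair coin flip, which the slide does not state. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k responses among n patients with response probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_correct_selection_binary(n_per_arm, p_inferior, p_superior):
    """P(superior arm is selected) in a 2-arm selection design, ties split 50/50."""
    pcs = 0.0
    for x_sup in range(n_per_arm + 1):
        p_sup = binom_pmf(x_sup, n_per_arm, p_superior)
        for x_inf in range(x_sup + 1):
            p_inf = binom_pmf(x_inf, n_per_arm, p_inferior)
            # A strict win counts fully; a tie counts half.
            pcs += p_sup * p_inf * (0.5 if x_inf == x_sup else 1.0)
    return pcs


print(round(prob_correct_selection_binary(20, 0.05, 0.15), 3))
print(round(prob_correct_selection_binary(29, 0.05, 0.15), 3))
```

More patients are needed as the inferior response probability approaches 50%, because the binomial variance p(1 − p) is largest there.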

Randomized Selection Design With Binary Endpoint

• K treatment arms
• n patients per arm
• Select arm with highest observed response rate
• $p_i$ = true response probability for i’th arm
• $p_i = p_{good}$ with probability $\gamma$, otherwise $p_{bad}$
• With N total patients, determine K and n to maximize probability of finding a good rx

Probability of Selecting a Good Treatment When $p_{bad}$ = 0.1, $p_{good}$ = 0.5 and $\gamma$ = 0.1

n K Probability

5 20 0.626

10 10 0.590

15 7 0.511

20 5 0.414

25 4 0.344


n K Probability

5 20 0.319

10 10 0.375

15 7 0.383

20 5 0.341

25 4 0.309


n K Probability

5 20 0.615

10 10 0.708

15 7 0.717

20 5 0.673

25 4 0.642


n K Probability

5 20 0.45

10 10 0.52

15 7 0.52

20 5 0.47

25 4 0.44


n K Probability

5 20 0.29

10 10 0.37

15 7 0.39

20 5 0.37

25 4 0.36

Small May Be Beautiful

• When the objective is to effectively treat the largest number of patients when the population of patients is small and several good candidate treatments are available

• N patients in horizon
• 2 treatments
• Perform RCT with n pts per rx
• Select treatment with best observed response rate and use that treatment for the remaining N-2n patients
– Binary endpoint with unknown response probabilities p1 and p2

Approximate Total Number of Responses

$$T(n) = (p_1 + p_2)\,n + (N - 2n)\,[\,p_1 \Pr(\text{select 1}) + p_2\{1 - \Pr(\text{select 1})\}\,]$$

$$\Pr[\text{select 1}] \approx \Phi\{(p_1 - p_2)\sqrt{2n}\}$$

N=1000, p1=0.6, p2 =0.4

N=200, p1=0.6, p2 =0.4
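The trade-off between learning and treating can be explored with a short sketch of T(n) = (p₁ + p₂)n + (N − 2n)[p₁ Pr(select 1) + p₂{1 − Pr(select 1)}]. One assumption is labeled here and in the code: Pr(select 1) is approximated by Φ{(p₁ − p₂)√(2n)}, which bounds each p(1 − p) by 1/4. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expected_responses(n_per_arm, horizon, p1, p2):
    """Approximate total responses among `horizon` patients with n per arm in the RCT."""
    # Assumption: normal approximation with p(1 - p) bounded by 1/4.
    pr_select_1 = NormalDist().cdf((p1 - p2) * sqrt(2 * n_per_arm))
    during_trial = n_per_arm * (p1 + p2)
    after_trial = (horizon - 2 * n_per_arm) * (p1 * pr_select_1 + p2 * (1 - pr_select_1))
    return during_trial + after_trial


def best_trial_size(horizon, p1, p2):
    """Per-arm trial size that maximizes the approximate total number of responses."""
    candidates = range(1, horizon // 2 + 1)
    return max(candidates, key=lambda n: expected_responses(n, horizon, p1, p2))


for horizon in (200, 1000):
    n_best = best_trial_size(horizon, 0.6, 0.4)
    print(horizon, n_best, round(expected_responses(n_best, horizon, 0.6, 0.4), 1))
```

Under this approximation the best trial size is a small fraction of the patient horizon, and it grows much more slowly than the horizon itself.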

Conclusions

• In clinical trial sizing, small is often not beautiful. Small trials are often uninformative and duplicative, and they yield misleading results.

• In some cases however, small is appropriate and valuable.

• Having clear objectives is essential to properly sizing a clinical trial.