Statistical Issues in the Design, Conduct and Analysis of Large Safety Studies

Michael Gaffney, PhD
Pfizer Inc

Outline

General description of PRECISION

Three issues in planning PRECISION

Issues and design modifications during the conduct of PRECISION

Potential issues in the analysis of PRECISION

EAGLES Study

Personal Perspective/Points of Interest

PRECISION

2005 FDA Advisory Committee meeting on CV risk of COX-2 NSAIDs

FDA mandated study - Commitment by Pfizer to FDA

Funded by Pfizer

Independent Executive Committee: Principal Investigator and Study Chair, Steven Nissen, MD, Cleveland Clinic

Independent Data Monitoring Committee: Chair, Thomas R. Fleming, PhD, University of Washington

Experts in Cardiology, Rheumatology and Gastroenterology on both committees

PRECISION Design

Primary Objective: To assess the relative cardiovascular effects of celecoxib, ibuprofen and naproxen in the treatment of osteoarthritis and rheumatoid arthritis

Primary Endpoint: Time to first occurrence of the composite cardiovascular endpoint of CV death, non-fatal MI or non-fatal stroke (APTC)

Secondary Endpoint: APTC + hospitalization for unstable angina, hospitalization for TIA, revascularization

Non-inferiority Study

Randomization to one of the three treatment options stratified according to:
- Treatment Center
- OA-RA indications
- Aspirin use at baseline

Original PRECISION Assumptions

APTC rate per year: 0.020

Non-Inferiority Margin (NIM): hazard ratio (HR) = 1.333

Off-Treatment rate: Cumulative 40% over 3 years

18 month minimum follow-up

36 month maximum follow-up time

Power = 0.90

Conclude non-inferiority if one-sided 97.5% upper confidence limit excludes HR = 1.333

Number of APTC events: 762
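The event count above can be approximately reproduced with the usual Schoenfeld-type approximation for a two-arm comparison of hazards. The sketch below is an illustration only, not the study's actual sample-size calculation. It assumes a true HR of 1, 1:1 allocation within each pairwise comparison, a NIM of 1.333 taken as 4/3, and three equally sized arms, so that a pairwise comparison holds 2/3 of all events. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def pairwise_events(nim, power, one_sided_alpha=0.025):
    """Approximate events needed in a 1:1 two-arm non-inferiority comparison.

    Uses d = 4 * (z_alpha + z_beta)^2 / ln(NIM)^2, assuming the true HR is 1.
    """
    z_alpha = NormalDist().inv_cdf(1 - one_sided_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(nim) ** 2


def total_events_three_arms(nim, power):
    """Assumption: three equal arms, so one pair holds 2/3 of all events."""
    return 1.5 * pairwise_events(nim, power)


d_pair = pairwise_events(4 / 3, 0.90)
d_total = total_events_three_arms(4 / 3, 0.90)
print(math.ceil(d_pair), math.ceil(d_total))
```

Under these assumptions the approximation gives roughly 508 events per pairwise comparison and roughly 762 in total, in line with the figure on the slide.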

Design Issue 1: Considerations for a Composite Endpoint

Increase in CV events was observed in the APTC endpoints

Relationship of hospitalized UA, TIA and revascularization with APTC endpoints

Is noise being added to the composite endpoint by inclusion, or

is there informative censoring of the composite endpoint by exclusion?

Accuracy of the adjudicated diagnosis

Interpretation of the results when the less severe endpoints dominate the composite

(The broader composite will always lead to a smaller, shorter trial but that should never be the reason for choosing it)

PRECISION leadership team, in consultation with FDA, determined that the APTC composite endpoint was the proper primary endpoint

Effects on Study Size of NIM and ITT/MITT Events

ITT Events - All APTC events over the 3-year observation time

MITT Events - All APTC events on randomized treatment + 30 days post-treatment

Design  NIM   Analysis  Events  Sample Size
1       1.33  MITT      762     20,000
        1.30  ITT       925
2       1.40  MITT      556     14,700
        1.36  ITT       680
3       1.37  MITT      626     16,500
        1.33  ITT       762
4       1.45  MITT      455     12,000
        1.40  ITT       556

NIM and ITT/MITT events are strong determinants of study size and consequently the time to clinical knowledge of the study results.

Design Issue 2: Purpose/Determination of NIM

What is the purpose of NIM in a safety study?

An NIM, when it can be determined by a strong scientific/clinical method:

- provides important scientific context for the design, conduct and analysis of safety trials and serves the purpose of ruling out an unacceptable increase in risk.

- serves the same role in a safety study as the value 1 (no difference) does in an efficacy study, i.e., it sets up the null and alternative hypotheses.

- can serve as an objective regulatory criterion that the study results have or have not ruled out an unacceptable increase in risk.

For the above reasons it is essential to establish a rigorous and defensible NIM

The PRECISION NIM of 1.333 was established on a strong clinical/scientific basis. The NIM was established by considering a potential benefit of COX-2 on serious GI events in conjunction with a clinically acceptable excess risk based on the expected APTC event rate in the NSAID control group.

Design Issue 3: ITT and MITT Analyses

Statistical, clinical/scientific and practical points to consider

Statistical

Potential informative censoring regarding estimation of HR (MITT)
- Event rate is dependent on censoring mechanism
- Censoring mechanism (either time, type or degree) differs between the treatment groups

Decreasing the HR towards one (ITT)
- Inclusion of events from patients no longer receiving randomized treatment may attenuate the HR and increase the chance of concluding non-inferiority.
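The attenuation in the ITT analysis can be illustrated with a deliberately crude rate calculation. This is a sketch under strong assumptions that are not taken from the slides: a constant baseline event rate, a comparator arm whose rate is unaffected by stopping treatment, and an HR of exactly 1 for any follow-up time spent off randomized treatment. The function name and the example fractions are hypothetical.

```python
def itt_rate_ratio(on_treatment_hr, fraction_on_treatment):
    """Crude ITT rate ratio when off-treatment follow-up carries HR = 1.

    The test arm's average rate is a person-time weighted mix of the
    on-treatment rate (baseline * HR) and the off-treatment rate (baseline).
    """
    return 1.0 + fraction_on_treatment * (on_treatment_hr - 1.0)


# Same on-treatment HR, decreasing share of follow-up spent on treatment.
for fraction in (1.0, 0.8, 0.6):
    print(fraction, round(itt_rate_ratio(1.333, fraction), 3))
```

The more follow-up time is spent off treatment, the closer the ITT ratio moves to 1, even though the on-treatment effect is unchanged.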

ITT and MITT Analyses

Clinical/Scientific

The MITT analysis (under the assumption of non-informative censoring) assesses the on-treatment effects of each of the study treatments (direct exposure effect over a variable, censored time).

The ITT analysis (with no assumptions) assesses the effects of treatment strategies beginning with each of the study treatments (“real-world” effects over a fixed observation time).

Practical

PRECISION will have to show non-inferiority for both the MITT and ITT analyses because each analysis alone has weaknesses.

PRECISION Approach

All three results must occur in order to conclude non-inferiority

ITT - Upper limit of the one-sided 97.5% confidence interval for the HR < 1.33

MITT - Upper limit of the one-sided 97.5% confidence interval for the HR < 1.33

Point estimate of the HR does not exceed 1.12

Thus, with respect to these 3 important design issues, composite endpoint, NIM, and analytical approach to MITT/ITT events, the PRECISION trial was rigorously and properly designed to address the study’s objective.
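The three-part criterion can be sketched as a simple decision function. This is illustrative only: the inputs are hypothetical estimates, the Wald-type confidence limit is an assumed stand-in for whatever the study's Cox model reports, and applying the 1.12 point-estimate cap to both analyses is an assumption, since the slide does not say which estimate it refers to.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # multiplier for a one-sided 97.5% limit


def upper_limit(hr, se_log_hr):
    """Upper one-sided 97.5% confidence limit for a hazard ratio."""
    return math.exp(math.log(hr) + Z * se_log_hr)


def non_inferior(itt, mitt, nim=1.33, max_hr=1.12):
    """itt and mitt are (estimated HR, SE of log HR) pairs.

    Assumption: the point-estimate cap is applied to both analyses.
    """
    return all(
        upper_limit(hr, se) < nim and hr <= max_hr
        for hr, se in (itt, mitt)
    )


# Hypothetical inputs, not study results.
print(non_inferior((1.05, 0.09), (1.08, 0.10)))
print(non_inferior((1.15, 0.05), (1.08, 0.10)))
```

In the second call the ITT confidence limit is below the margin, yet the criterion fails because the point estimate exceeds the cap.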

Conduct of PRECISION

Rigorous, ongoing monitoring of the conduct of PRECISION, with particular attention given to:

- APTC event rate
- Achieving real-world adherence to the randomized regimens
- Retention in the study

The APTC event rate, pooled across the three treatment regimens, was meaningfully lower than expected. This occurrence along with the pooled rate of adherence and the pooled rate of retention led to recognition by the study leadership that refinements to the design of PRECISION were necessary.

PRECISION Modifications

All modifications resulted from interaction among DMC, Study Chair, Sponsor and FDA

Due to the higher than expected off-treatment rate:

- ITT observation time was decreased from 36 months to 30 months

- MITT observation time was increased from 36 months to 42 months

PRECISION Modifications

Reduce power from 0.90 to 0.80

- Number of required APTC events decreases from 762 to 580

- Chance of not concluding non-inferiority when there is no difference increases from 10% to 20%

- The point estimate that rules out the 1.333 margin is reduced from 1.12 to 1.092

Increase the NIM for the MITT analysis to 1.40

- Number of required APTC events decreases from 580 to 420

- No change in Power

- The point estimate that rules out the 1.40 margin is 1.107
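The point estimates quoted above can be approximately recovered from the event counts. The sketch below is an assumption-laden illustration rather than the study's own computation: it takes the standard error of the log HR as 2 divided by the square root of the pairwise event count, assumes three equal arms so that a pair holds 2/3 of the total events, and treats 1.333 as 4/3. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def critical_point_estimate(nim, total_events, one_sided_alpha=0.025):
    """Largest estimated HR whose upper confidence limit still equals the NIM."""
    pairwise_events = total_events * 2 / 3  # assumption: three equal arms
    se_log_hr = 2 / math.sqrt(pairwise_events)
    z = NormalDist().inv_cdf(1 - one_sided_alpha)
    return nim * math.exp(-z * se_log_hr)


for nim, events in ((4 / 3, 762), (4 / 3, 580), (1.40, 420)):
    print(events, round(critical_point_estimate(nim, events), 3))
```

Under these assumptions the three thresholds come out close to the 1.12, 1.092 and 1.107 shown on the slides; small differences in the third decimal are to be expected from rounding.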

Rationale for the MITT margin of 1.4

First, in the ITT analysis, in order to rule out the 1.333 NIM for a pairwise comparison, the estimated HR must be < 1.092.

Second, given the concern that the estimated HR in the ITT analysis could be attenuated toward unity by follow-up that occurs well after discontinuation of randomized intervention, it would be reassuring if the estimated HR from the MITT analysis were also ≈ 1.092, suggesting that the ITT analysis did not achieve this target merely because of that attenuation.

Third, in the MITT analysis, achieving a point estimate < 1.107 rules out the margin if and only if the NIM for that analysis is 1.40.

An added benefit is that 580 ITT events and 420 MITT events are expected to occur at approximately the same time.

Analytical Issues in PRECISION

Primary Analysis

Cox proportional hazards model with region, diagnosis (OA/RA) and baseline ASA usage as covariates in the model

When randomization is stratified, how should these variables be used in the Cox model: as covariates or as stratification variables?

Analysis and interpretation of treatment HR by strata used in the randomization
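The two options can be written out in standard textbook form. The notation here is an assumption introduced for illustration and does not come from the slides: Z is the treatment indicator (a vector of indicators when three treatments are compared), x holds the stratification factors (region, diagnosis, baseline aspirin use), and s indexes the strata formed by those factors.

```latex
% Covariate adjustment: one shared baseline hazard; the stratification
% factors enter the linear predictor with their own coefficients.
h(t \mid Z, x) = h_0(t)\,\exp\!\left(\beta Z + \gamma^{\top} x\right)

% Stratified model: a separate, unspecified baseline hazard per stratum s;
% the treatment effect \beta is common to all strata.
h_s(t \mid Z) = h_{0s}(t)\,\exp\!\left(\beta Z\right)
```

The covariate form assumes proportional hazards across the levels of each factor, while the stratified form does not but yields no estimate of the factor effects themselves.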

Analytical Issues in PRECISION

Understanding ITT and MITT Analyses

If the two analyses are consistent with each other, or if ITT HR < MITT HR, interpretation may be straightforward, but:

Off-Treatment Issue: Amount of censoring, the type of censoring, the time of censoring and the characteristics of patients censored need to be explored for potential informative censoring in the MITT analysis.

Non-retention Issue: Amount, time and characteristics of subjects dropping out of the study need to be explored for potential informative censoring due to non-retention in the ITT analysis.

In a study with non-negligible off-treatment and non-retention rates and with different MITT and ITT observation times this may be a complicated issue in analysis and interpretation.

Analytical Issues in PRECISION

Cross-ins to study NSAIDs, any NSAID

Cross-ins likely reduce sensitivity to distinguish CV risk among randomized treatment groups

More of an issue with ITT analysis but will also affect MITT

Sensitivity analyses based on subgroups or censoring have weaknesses

Time-dependent Analyses?

Deconstructing the Primary Composite Endpoint

Fatal, Non-Fatal MI

Fatal, Non-Fatal Stroke

Summary

PRECISION was rigorously designed and monitored with rigorous performance standards.

Enrollment in PRECISION was terminated at 24,333 randomized subjects, projected to provide the targeted number of APTC events by the end of this year.

The study was challenging from all aspects and medical/statistical challenges remain in the analysis and interpretation of results.

The information from this randomized study of 24,000 subjects, with a strong design and conducted in a rigorous manner, will dwarf the current safety information regarding the three study drugs.

Safety Context of EAGLES Trial

EAGLES Study – Chantix (varenicline) Smoking Cessation Safety Study

Signal for serious neuropsychiatric adverse events came from spontaneous post-marketing reports to FDA

FDA determined that varenicline was associated with serious neuropsychiatric adverse events including suicidal ideation, suicidal behavior, changes in behavior, agitation, depressed mood, and worsening of preexisting psychiatric illness.

Led to a Boxed Warning in the Chantix label regarding these serious neuropsychiatric adverse events

Led to EAGLES, a post-marketing commitment by Pfizer to FDA and to EMA

EAGLES Design

Randomized Design: placebo, varenicline, bupropion and nicotine patch

Study Objective: To characterize the neuropsychiatric safety profiles of varenicline, bupropion, nicotine patch and placebo

Composite primary endpoint of Serious/Severe Neuropsychiatric (NPS) AEs occurring over the 12 week treatment period

Design and protocol were approved by FDA and EMA

External Data Monitoring Committee

Determination of NIM by a strong scientific/clinical method was not possible

EAGLES Design

Guidance: “The study should be sufficiently powered to adequately assess clinically significant neuropsychiatric adverse events with each treatment”

Estimation study - not designed to test a specific hypothesis

Sample size of 8,000 (2,000 per treatment group) was determined by agreement with FDA on a pre-specified width of the 95% CI in estimating NPS AE rate differences

A sample size of 2,000 per treatment group yields a 95% confidence interval about the risk difference of ± 1.59%.
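The link between group size and interval width can be sketched with a Wald-type interval for a difference of two proportions. The slides do not state the event rate that was assumed, so the rate used below is a hypothetical value chosen only because it gives a half-width near the quoted ± 1.59%; it should not be read as the study's planning assumption. The function name is also hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference_half_width(rate, n_per_group, level=0.95):
    """Half-width of a Wald CI for a risk difference, same rate in both groups."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * math.sqrt(2 * rate * (1 - rate) / n_per_group)


assumed_rate = 0.0708  # hypothetical; not given on the slide
half_width = risk_difference_half_width(assumed_rate, 2000)
print(round(100 * half_width, 2))  # half-width in percentage points
```

Because the half-width shrinks only with the square root of the group size, halving the interval width would require roughly four times as many subjects per group.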

EAGLES Approach Applied to HR

In studies where an NIM is not able to be determined by a strong clinical/scientific method, the 95% CI for the HR is a way to size the study and to assess the cost benefit of relative trial size.

Example of total number of planned events and 95% CL for the HR

Events  95% CL
508     (HR/1.19, 1.19 × HR)
388     (HR/1.22, 1.22 × HR)
280     (HR/1.26, 1.26 × HR)

The use of the 95% CI to determine study size in a non-NIM study does not necessarily lead to smaller trials than NIM-based trials.
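The multiplicative factors in the example above can be approximately reproduced from the event counts alone. This is a sketch, assuming each count refers to a single 1:1 two-arm comparison and that the standard error of the log HR is about 2 divided by the square root of the number of events. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def cl_factor(events, level=0.95):
    """Factor f such that the confidence limits are (HR / f, HR * f)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(z * 2 / math.sqrt(events))


for events in (508, 388, 280):
    print(events, round(cl_factor(events), 2))
```

Under these assumptions the factors round to 1.19, 1.22 and 1.26, matching the example, and they show how slowly precision improves as events are added.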

Summary

EAGLES is a rigorously designed safety study in which an NIM was not possible.

When EAGLES is completed the point estimates of the risk difference and the 95% confidence interval and absolute levels of risk will contribute to clinical knowledge.

Because of the rigor of the design of EAGLES, FDA anticipates adding the results of EAGLES to the label and using the results to evaluate the boxed warning for Chantix generated by the spontaneous reports.

Personal Perspective/Points of Interest

An NIM, when it can be determined by a strong scientific/clinical method, provides important scientific context for the design, conduct and analysis of safety trials and serves the purpose of ruling out an unacceptable increase in risk.

When an NIM cannot be established by a strong scientific/clinical method an arbitrary one should not be used. As shown by EAGLES, a safety study can be sized by the 95% CI. This approach calls for as rigorous a trial as does one with an NIM.

A hypothesis-rejecting, all-or-nothing interpretive approach to the results of a safety study based on an NIM should be discouraged. While the NIM does give the pre-specified level of unacceptable risk, a binary interpretation distracts from the clinical information from the study, such as point estimates, confidence intervals and absolute levels of risk.

Medical/scientific considerations should inform the on-treatment risk period (e.g., treatment period + 30 days after stopping) for the MITT analysis. Without indications of informative censoring (which may nevertheless be hidden), the MITT events over the on-treatment risk period may better estimate risk than ITT events, given the expectation of HR = 1 after the risk period.

All subjects must be followed to the end of the study in order to conduct the ITT analysis. Only the ITT analysis preserves the integrity of randomization which is the statistical basis of inference and the assessment of causality.