SECONDARY ANALYSES IN CLINICAL TRIALS
2013 CTN Web Seminar Series
Produced by: NIDA CTN CCC Training Office
"This training has been funded in whole or in part with Federal funds from the National Institute on Drug Abuse, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN271201000024C."
Presented by:
George Bigelow, PhD
Daniel J. Feaster, PhD
Abigail G. Matthews, PhD
December 6, 2013
Objectives
• Review the statistical issues with analyzing and interpreting secondary analyses, and demonstrate the multiple testing burden
• Explain the importance of secondary outcome and analysis identification during protocol development
• Discuss reporting and interpretation of secondary analyses, including the perspective of the CTN Publications Committee
Outline
• Introduction and motivation
• Statistician’s perspective
• Multiplicity
• Implementing secondary analyses
• Summary
• Discussion
PUBLICATIONS COMMITTEE PERSPECTIVE ON SECONDARY ANALYSES: OPPORTUNITIES & CAUTIONS
Opportunities
• CTN encourages multiple publications
• We want to learn as much as possible
• Large-N, diverse, multi-site studies
• Broad study teams with diverse interests
• Extensive investment in assessments
• Repeated assessments over time
• Assessment commonalities across studies
Cautions
• CTN studies typically yield multiple publications
• Question: Are we over-analyzing the data?
• Discussions in Publications and Executive Committees
• Consequence: This webinar
Cautions (cont.)
• Multiple testing incurs risk of false conclusions
• Proper planning can reduce this risk
• Acknowledgement of limitations is essential
Example of Multiple Publications
• CTN006/007: MIEDAR – Abstinence-Contingent Incentives
• Report of primary outcome
– Do contingent incentives reduce stimulant use?
Example of Multiple Publications
CTN006/007: MIEDAR – Abstinence-Contingent Incentives
Reports of secondary outcomes: Do contingent incentives affect…
…HIV risk behavior?
…gambling?
…cost or cost effectiveness?
…methamphetamine use?
…staff attitudes?
Example of Multiple Publications
CTN006/007: MIEDAR – Abstinence-Contingent Incentives
Reports of moderator variable associations: Are incentive effects related to…
…gender, race, ethnicity?
…treatment history?
…criminal justice involvement?
…urinalysis result at intake?
…gambling history?
A Caution About Demographic Subgroup Differences
Be cautious about interpreting subgroup differences as inherent characteristics of those groups or of the individuals within them.
Demographic subgroup differences are very likely the result of some correlated confounding variable; they likely reflect differences in life experiences and opportunities, and in the contexts in which drugs are encountered.
Example of Multiple Publications
CTN006/007: MIEDAR – Abstinence-Contingent Incentives
Reports of associations unrelated to the study intervention:
What symptoms are related to dependence on various drugs?
Types of Analyses
• Intervention Effects on Primary Outcome
• Intervention Effects on Secondary Outcomes
• Analysis of Moderators or Mediation
• Associations
Common Errors
• Mistaking correlations for causes
• Mis-describing the study methods
• Overlooking explanatory confounding variables
• Failing to acknowledge limitations
Correlation is Not Causation
Avoid language that implies causality when reporting associations
Examples
“effect on” “effect of”
“impact on” “impact of”
“consequently” etc.
Describe Methods Accurately
• Understand and describe the original study accurately
• Explain the origins and methods of the secondary analysis
• The idea should precede looking through the data
• Describe types and numbers of analyses performed
Consider Confounding Factors
• One report examined the relationship between study pay and the proportion of participants present at the final assessment
• Proposed implausible conclusion that greater pay led to less retention
• Failed to note that pay amount was related to study duration and difficulty
Proceed with Caution
Many of the factors that must be considered in conducting and reporting secondary analyses are the same as those important for careful and thoughtful reporting of primary analyses.
However, secondary analyses can also involve some special statistical considerations, as the following speakers will discuss.
STATISTICIAN’S PERSPECTIVE AND THE ISSUE OF MULTIPLICITY
What do we mean by primary and secondary outcomes and analyses?
• The primary outcome is the main outcome variable for the study
– The research hypothesis is based on this measure
– Used to power the study and to determine the statistical significance of any treatment effect
– The analytic method must be specified a priori, at a minimum in the Statistical Analysis Plan (SAP)
What do we mean by primary and secondary outcomes and analyses?
• Secondary analyses are any other analyses, e.g.:
– Sensitivity analysis of the primary outcome measure with respect to missing data
– Subgroup analyses by age, race, gender, ethnicity, disease severity, etc.
• Secondary outcomes are any other outcome measures
Why secondary analyses?
• Publish or perish!!!
• The primary outcome measure data may turn out to be unreliable
– e.g., using the Timeline Followback (TLFB) but finding a high rate of discordance between self-report and urine drug screens (UDS)
• Analytic issues with the pre-specified primary analysis
– e.g., the proposed distribution of the primary outcome variable does not hold and alternative methods should be used
Why secondary analyses? (cont.)
• Possibly poor power for the primary analysis if the assumptions used in the sample size calculation are not appropriate
• Sensitivity analyses of the primary outcome with respect to missing data: key for addiction research
• Subgroup analyses
– Race, ethnicity, gender: required by NIH
– Baseline severity of disease (Nunes et al., 2011)
SPECIFY A PRIORI IN PROTOCOL OR SAP AS MUCH AS POSSIBLE!
Why secondary outcomes?
• Publish or perish!!!
• In addiction research we always focus on abstinence, but is that enough?
– Improved overall quality of life
– Engaging in less risky sexual behaviors
– Less illegal activity such as theft or prostitution
– The CTN TEAM Task Force recommends that at least one secondary outcome measure be related to “functioning, satisfaction, or quality of life”
Why secondary outcomes? (cont.)
• Again, what if there are unanticipated issues with the primary outcome?
• Cannot “hang your hat” on only one outcome measure
SPECIFY A PRIORI IN PROTOCOL OR SAP AS MUCH AS POSSIBLE!
Example of Utility of Secondary Analyses
• CTN: Women with Trauma and Addictions
• Primary outcome results (Hien et al., 2009):
– Trauma symptom severity: NS
– Abstinence: NS
• Secondary analyses:
– Women with baseline eating disorders had significantly less improvement in PTSD severity and abstinence
– Seeking Safety (SS) significantly reduces unprotected sex in high-risk women over time
– Racial/ethnic matching with the therapist is associated with SS effectiveness
– Examples of other positive findings: retention, sleep disorders, intimate partner violence
Source: Denise Hien, CPDD 2013 presentation
Words of Warning
• Too many post hoc analyses open one to accusations of data dredging
• Secondary analyses/outcomes cannot be used to evaluate the trial as a whole (only the primary outcome can)
• If there is a substantial number of pre-specified secondary outcomes and analyses, consider adjusting for multiple comparisons
• Appropriate interpretation of results is key
– Hypothesis generating
Cautionary Tale
Scott Harkonen, MD
• Convicted of wire fraud: “willfully overstating in a press release the evidence for benefit of a drug his company made”
• Primary outcome p-value = 0.08
• Asked his statisticians to identify a subgroup with significance
• Patients with mild to moderate disease severity: p = 0.004
• Press release acknowledged the negative finding from the primary outcome analysis but maintained that the drug was associated with increased survival
• Post: “…everyone agrees there weren’t any factual errors in the four-page document. The numbers were right; it’s the interpretation of them that was deemed criminal.”
• During the appeal, the court said: “Statements are fraudulent if ‘misleading or deceptive’ and need not be literally ‘false’.”
Why such controversy?
Multiplicity
• Type I error is preserved (usually at 5%) for the primary outcome(s)
• If multiple secondary analyses are performed, the overall Type I error will be higher
• If enough analyses are performed, there will almost certainly be at least one spurious association
• Adjustment is not required for secondary analyses/outcomes, but interpretation must be cautious and presentation of results forthright and transparent
Illustration of Multiplicity
• Generate 10 outcome variables independently from a normal distribution with mean = 0 and variance = 1 for 300 participants
• Calculate the Spearman correlation coefficient for each pairwise combination
• Test whether each correlation coefficient ≠ 0
• Type I error estimated as the number of tests that are statistically significant divided by the number of tests (45)
Illustration of Multiplicity (cont’d)
Spearman R (p-value) for each pair of variables; * p < 0.05
     V1             V2             V3             V4             V5             V6             V7             V8             V9
V2   0.03 (0.661)
V3   -0.03 (0.541)  -0.05 (0.436)
V4   0.04 (0.448)   0.03 (0.548)   0.10 (0.082)
V5   0.16 (0.006)*  -0.12 (0.033)* 0.05 (0.427)   -0.04 (0.494)
V6   0.07 (0.231)   0.06 (0.310)   0.07 (0.197)   0.08 (0.149)   0.06 (0.282)
V7   -0.04 (0.481)  0.02 (0.793)   -0.03 (0.605)  -0.14 (0.013)* -0.03 (0.550)  0.05 (0.433)
V8   <0.01 (0.960)  0.10 (0.083)   -0.06 (0.322)  0.15 (0.012)*  -0.04 (0.501)  0.05 (0.397)   -0.11 (0.055)
V9   0.01 (0.875)   0.01 (0.877)   -0.08 (0.149)  0.03 (0.598)   0.02 (0.772)   -0.14 (0.013)* -0.07 (0.227)  0.03 (0.614)
V10  -0.08 (0.186)  0.07 (0.222)   0.07 (0.197)   0.05 (0.396)   -0.16 (0.005)* -0.06 (0.273)  -0.09 (0.116)  -0.03 (0.623)  0.07 (0.232)
Type I Error Rate = 6/45 = 13.3%
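The illustration above can be re-created with a short simulation. The sketch below is a hypothetical reconstruction, not the presenters' code: it draws 10 independent standard normal variables for 300 participants, computes Spearman's rho for all 45 pairs, and (an assumption made to avoid external libraries) uses the large-sample approximation z = rho·√(n − 1) for the two-sided p-value rather than an exact test. A single run will not necessarily reproduce 6/45; repeating the experiment shows the per-test false-positive rate settling near 5%, while roughly 9 in 10 replicates contain at least one "significant" pair, in line with 1 − 0.95^45 ≈ 0.90.

```python
import itertools
import math
import random


def rank(values):
    """1-based ranks; continuous normal draws have (almost surely) no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for r, idx in enumerate(order, start=1):
        ranks[idx] = float(r)
    return ranks


def pearson(x, y):
    """Pearson correlation; applied to ranks it equals Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_p(rho, n):
    """Two-sided p-value via the large-sample approximation z = rho * sqrt(n - 1)."""
    return math.erfc(abs(rho) * math.sqrt(n - 1) / math.sqrt(2))


def simulate_once(rng, n_vars=10, n_subjects=300, alpha=0.05):
    """Count nominally significant pairwise tests among independent N(0, 1) variables."""
    ranked = [rank([rng.gauss(0.0, 1.0) for _ in range(n_subjects)])
              for _ in range(n_vars)]
    pairs = list(itertools.combinations(range(n_vars), 2))  # 45 pairs for 10 variables
    n_sig = sum(spearman_p(pearson(ranked[i], ranked[j]), n_subjects) < alpha
                for i, j in pairs)
    return n_sig, len(pairs)


if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed, only for reproducibility
    n_sig, n_tests = simulate_once(rng)
    print(f"One replicate: {n_sig}/{n_tests} = {n_sig / n_tests:.1%} significant")

    reps = 100  # repeat the whole experiment to see the long-run behaviour
    results = [simulate_once(rng) for _ in range(reps)]
    per_test = sum(s for s, _ in results) / sum(t for _, t in results)
    any_sig = sum(s > 0 for s, _ in results) / reps
    print(f"Per-test false-positive rate: {per_test:.3f}")
    print(f"Share of replicates with at least one 'significant' pair: {any_sig:.2f}")
```

Every "significant" correlation here is spurious by construction, which is the point of the slide: the count of hits grows with the number of tests, not with any real signal.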
Implications
• Avoid post hoc analyses
• Pre-specify as much as possible (protocol or SAP) →
→ Avoid data dredging criticism
→ Can even adjust Type I error rate for number of secondary analyses performed (rare)
• Interpret secondary results keeping in mind the inflated Type I error rate
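The slides note that the Type I error rate can be adjusted for the number of secondary analyses, without naming a method. As one illustration, the sketch below implements two standard family-wise corrections, Bonferroni and Holm's step-down procedure (our choice of methods, not necessarily what the presenters had in mind), and applies them to the 45 p-values from the multiplicity illustration. The function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure; never rejects fewer hypotheses than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        # The (step + 1)-th smallest p-value is compared with alpha / (m - step).
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject


# p-values of the 45 pairwise Spearman tests in the illustration,
# listed as V1 vs V2..V10, then V2 vs V3..V10, and so on.
ILLUSTRATION_P = [
    0.661, 0.541, 0.448, 0.006, 0.231, 0.481, 0.960, 0.875, 0.186,
    0.436, 0.548, 0.033, 0.310, 0.793, 0.083, 0.877, 0.222,
    0.082, 0.427, 0.197, 0.605, 0.322, 0.149, 0.197,
    0.494, 0.149, 0.013, 0.012, 0.598, 0.396,
    0.282, 0.550, 0.501, 0.772, 0.005,
    0.433, 0.397, 0.013, 0.273,
    0.055, 0.227, 0.116,
    0.614, 0.623,
    0.232,
]

if __name__ == "__main__":
    print("Unadjusted p < 0.05:", sum(p < 0.05 for p in ILLUSTRATION_P))
    print("Bonferroni rejections:", sum(bonferroni(ILLUSTRATION_P)))
    print("Holm rejections:", sum(holm(ILLUSTRATION_P)))
```

With all 45 tests in the family, none of the six nominally significant correlations survives either correction: the first threshold is 0.05/45 ≈ 0.0011, and the smallest p-value is 0.005.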
Responsible Analysis and Reporting
• Focus should always be on the primary outcome
• Of secondary analyses, focus should be on those that were pre-specified
• Requires careful planning with statement of hypotheses in protocol/SAP (SAP should be finalized before data lock)
• Report in a manuscript the number of pre-specified analyses performed and the number reported
Responsible Analysis and Reporting (cont’d)
• Present estimates of treatment differences and confidence intervals (CIs): “plausible range of treatment differences consistent with trial results”
• Interpretation needs to be viewed as exploratory rather than confirmatory
• Frame results in context of supporting or contradictory data from other studies
Responsible Analysis and Reporting (cont’d)
• For post hoc analyses:
– Acknowledge that analyses were not specified a priori (data driven)
– Describe why analyses are important and the relevance of the research question
– Report number of post hoc analyses performed and the number reported
– Significance should be viewed as descriptive and not used for inference or decision making
– Can be used to justify future research
Examples
If the primary outcome is not statistically significant but some pre-specified secondary analyses are:
While the primary outcome did not demonstrate statistically significant evidence of a treatment effect, some secondary analyses suggested that the treatment may be effective. Therefore, future research is warranted.
If the primary outcome is statistically significant but no secondary analyses are:
The primary outcome was statistically significant indicating that treatment is effective in this study population. Despite the fact that numerous secondary analyses did not yield statistical significance, there is sufficient evidence to justify future research of this intervention.
Questions…
IMPLEMENTATION OF SECONDARY ANALYSES
The Design Stage
Multiple Types of Secondary Analyses
• Secondary hypotheses: utilize the design
• Mediation studies: use data collected post-randomization
• Association studies: normally don’t use the design
– Example: Predictors of HIV Testing (CTN0032)
– Because these studies do not use the study design, they are observational!
Multiple Types of Secondary Analyses (cont.)
• Subgroup or Moderator Analyses
– Risk reduction counseling impact on HIV testing by modality of substance use treatment (CTN0032)
– Differential Treatment Effects by Race/Ethnicity and/or Gender
– We do not randomize to subgroups, so these analyses are observational!
Why consideration at design stage?
• Appropriate measures
• Sample size considerations
• For secondary analyses that do NOT use the design:
– Causal interpretation is difficult
– Statistical models can help
• These models are subject to assumptions, such as no unmeasured confounders
• This implies a need to think about and measure confounders for any secondary analysis that is not just a test of difference by randomized treatment group
Secondary Outcomes
• Simplest type of “secondary” analysis
• Other outcomes on which we expect the intervention to have an impact
• Analysis strategy frequently very similar to the primary outcome analysis
• Need to consider the multiple testing issue!
– If we enumerate and measure 20 secondary outcomes, the probability of at least one Type I error is .64 (if we use α = .05 for each test and the tests are independent)
– Placing each secondary outcome in a separate paper does NOT change this fact
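The .64 figure follows from the family-wise error rate (FWER) for m independent tests, each run at level α: FWER = 1 − (1 − α)^m. The sketch below computes it and, as an added illustration not taken from the slides, inverts it to get the Šidák per-test level that would hold the family-wise rate at α. Both functions (hypothetical names) assume independent tests; with correlated outcomes the true rate is usually somewhat lower.

```python
def familywise_error(m, alpha=0.05):
    """P(at least one false positive) across m independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def sidak_alpha(m, family_alpha=0.05):
    """Per-test level keeping the family-wise error at family_alpha (independent tests)."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / m)


if __name__ == "__main__":
    for m in (1, 5, 10, 20, 45):
        print(f"{m:>2} tests at alpha = .05: FWER = {familywise_error(m):.2f}")
    print(f"Sidak per-test alpha for 20 tests: {sidak_alpha(20):.4f}")
```

For m = 20 this gives 1 − 0.95^20 ≈ 0.64, the value on the slide. Spreading the 20 outcomes across separate papers leaves m, and therefore the FWER, unchanged.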
Mediation Analyses
• Another analysis frequently included in protocols
• Because pieces of the model are determined after randomization, it is difficult to make a strong causal interpretation
[Figure: mediation path diagram linking Treatment Assignment, Attendance in Treatment, and Count of Drug Use Days]
But we do not randomize attendance, so even if attendance is observed for everyone:
[Figure: the same path diagram, with CONFOUNDERS affecting both Attendance in Treatment and Count of Drug Use Days]
We cannot rule out confounders of both Attendance and Drug Use and, therefore, cannot make strong causal statements about the Attendance → Drug Use pathway without strong assumptions.
Possible Assumptions
No Unmeasured Confounders
• If we measure all potential confounds, then can make causal statements SUBJECT to the assumption that we have measured ALL the confounds
• May want to measure potentially confounding factors
• Example: Propensity Score Analysis
Instrumental Variables
• If we can find exogenous factors, Z, that are correlated with Attendance, X
• Also, Z does not directly affect Count of Drug Use Days
• Can use Z as instruments to identify the causal impact of attendance on Count of Drug Use Days
• Use the instrumental variable in place of the endogenous variable
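To see why unmeasured confounders matter and how an instrument helps, the hypothetical simulation below builds the confounding into the data on purpose. U is an unmeasured confounder that raises both attendance X and drug-use days Y, Z is a hypothetical exogenous factor that shifts attendance but has no direct path to Y, and the true effect of attendance is set to −0.5. All coefficients are arbitrary assumptions. Under these settings the naive regression of Y on X is pulled upward by U (to about +0.19 in large samples, the wrong sign), while the single-instrument IV estimate cov(Z, Y) / cov(Z, X), which is what two-stage least squares reduces to with one instrument, lands near −0.5.

```python
import random


def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20_000, true_effect=-0.5, seed=46):
    """Hypothetical data: U confounds attendance X and drug-use days Y; Z is an instrument.

    All coefficients are illustrative assumptions, not estimates from any CTN trial.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                    # unmeasured confounder (hypothetical)
        zi = 1.0 if rng.random() < 0.5 else 0.0    # exogenous factor: affects X, not Y directly
        xi = 0.8 * zi + 1.0 * u + rng.gauss(0.0, 1.0)          # attendance
        yi = true_effect * xi + 1.5 * u + rng.gauss(0.0, 1.0)  # drug-use days
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def naive_slope(x, y):
    """OLS slope of Y on X; biased here because U drives both X and Y."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimator, cov(Z, Y) / cov(Z, X); equals 2SLS here."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    z, x, y = simulate()
    print(f"Naive regression slope: {naive_slope(x, y):+.2f}")
    print(f"IV estimate:            {iv_slope(z, x, y):+.2f}  (true effect -0.50)")
```

The key assumption is the exclusion restriction, that Z affects drug use only through attendance. It is built into this simulation, but in a real trial it cannot be verified from the data alone.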
Association Studies
• Like mediator models, association studies look at the impact of variables that the experimenter has not controlled on some chosen outcome
→ Do not make causal claims (unless you include a confounder analysis or instrumental variable(s))!
• Frequently these studies will look at numerous predictors
– Type I error
– Must be very clear and honest about the way the analyses were done
Potential Solution for Type I Error
• Machine Learning approaches
– Used in data mining
– Allows an exhaustive search for the best predictive model of the outcome
– Like testing all covariates, sometimes can over-fit
• Cross-validation
– Simplest approach is a split sample: explore on one sample, then replicate on a second sample (the second sample is a “test” of results on the first sample)
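The split-sample idea can be sketched as follows. This is a hypothetical example, not a CTN procedure: 30 candidate predictors that are pure noise are screened for correlation with an outcome in a random half of the participants, and only the "hits" are re-tested in the held-out half. P-values use a large-sample normal approximation, an assumption made to keep the code dependency-free. Because nothing is truly related to the outcome, a predictor that passes the first screen has only about a 5% chance of passing again, so most exploratory hits should fail to replicate.

```python
import math
import random


def corr_p(x, y):
    """Pearson r with a two-sided large-sample p-value (z = r * sqrt(n - 1))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, math.erfc(abs(r) * math.sqrt(n - 1) / math.sqrt(2))


def split_sample(n, rng):
    """Randomly partition participant indices into exploration and confirmation halves."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[: n // 2], idx[n // 2 :]


def explore_then_confirm(predictors, outcome, rng, alpha=0.05):
    """Screen every predictor on one half; re-test only the hits on the held-out half."""
    explore, confirm = split_sample(len(outcome), rng)

    def p_value(values, rows):
        return corr_p([values[i] for i in rows], [outcome[i] for i in rows])[1]

    hits = [name for name, v in predictors.items() if p_value(v, explore) < alpha]
    replicated = [name for name in hits if p_value(predictors[name], confirm) < alpha]
    return hits, replicated


if __name__ == "__main__":
    rng = random.Random(48)
    n, k = 400, 30  # hypothetical: 400 participants, 30 pure-noise predictors
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]
    predictors = {f"x{j}": [rng.gauss(0.0, 1.0) for _ in range(n)] for j in range(k)}
    hits, replicated = explore_then_confirm(predictors, outcome, rng)
    print("Significant in exploration half:", hits)
    print("Still significant in confirmation half:", replicated)
```

Splitting halves the sample available for each step, so a genuine but modest association can also fail to replicate; the confirmation half guards against false positives at the cost of power.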
Subgroup or Moderator Analyses: Should Determine Which Subgroups Are of Interest at Design Stage
• Many of us have interests in:
– Racial and ethnic groups
– Gender
• But other subgroups may be of interest
– Drug of choice
– Severity of individual’s problem
– Age
– Socioeconomic status
– Site differences (in levels of outcome and/or treatment effects)
– PTSD/No PTSD
• Important to define groups a priori (necessary to consider at the design phase)
Many (if not most) subgroups are not randomized.
• This means these are observational models that cannot support causal statements
– Should assess for confounds
– Should be careful not to over-interpret
– Even if we assess for confounds, we cannot rule out unobserved confounds
– Race/ethnic differences are examples where differences are largely NOT causal (race/ethnicity is correlated with the true causal agent)
Example
• If results show an interaction, and it appears that treatment works best in high severity
• We cannot be sure that high severity “caused” treatment success (or that treatment will work in high severity), because we have not randomized to high severity
[Figure: Randomized Group, Baseline Drug Use Severity, and Randomized Group X Baseline Drug Use Severity predicting Count of Drug Use Days, with a Genetic factor as a possible unmeasured confounder]
Suggestions
• Pre-specify at the design stage the subgroups of interest
• Plan to assess known confounds related to the subgroups
• Minimize the number of subgroups examined (Type I error issue)
• Use tests of interaction within all participants, rather than testing treatment effects within each subgroup!!!
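The advice to test the interaction, rather than testing the treatment effect separately in each subgroup, can be illustrated with a 2 × 2 (treatment by subgroup) layout. In the hypothetical sketch below, the interaction estimate is the difference between the two within-subgroup treatment effects, which equals the treatment × subgroup coefficient of a saturated linear model fitted to all participants. Its standard error uses separate variances for the four cells (an assumption; a pooled-variance regression would differ slightly), and p-values use a normal approximation. The simulated data give both subgroups the same true effect but make subgroup B much smaller, the situation in which "significant in A, not in B" can appear by chance and the interaction test is the appropriate check.

```python
import math
import random
import statistics


def effect(treated, control):
    """Treatment-minus-control mean difference and its standard error (unpooled variances)."""
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return est, se


def two_sided_p(est, se):
    """Two-sided p-value from a normal approximation."""
    return math.erfc(abs(est / se) / math.sqrt(2))


def interaction(groups):
    """Difference in treatment effects, subgroup A minus subgroup B.

    `groups` maps (arm, subgroup) -> outcomes, arm in {"T", "C"}, subgroup in {"A", "B"}.
    """
    est_a, se_a = effect(groups[("T", "A")], groups[("C", "A")])
    est_b, se_b = effect(groups[("T", "B")], groups[("C", "B")])
    return est_a - est_b, math.sqrt(se_a ** 2 + se_b ** 2)


if __name__ == "__main__":
    rng = random.Random(51)
    sizes = {"A": 200, "B": 40}  # hypothetical per-arm sizes: B is a small subgroup
    true_effect = 0.3            # same true effect (in SD units) in both subgroups
    groups = {}
    for sub, n in sizes.items():
        groups[("C", sub)] = [rng.gauss(0.0, 1.0) for _ in range(n)]
        groups[("T", sub)] = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    for sub in sizes:
        est, se = effect(groups[("T", sub)], groups[("C", sub)])
        print(f"Subgroup {sub}: effect = {est:.2f}, p = {two_sided_p(est, se):.3f}")
    est, se = interaction(groups)
    print(f"Treatment x subgroup interaction: {est:.2f}, p = {two_sided_p(est, se):.3f}")
```

A non-significant within-subgroup test in the small subgroup says little on its own; only the interaction estimate and its confidence interval address whether the effects actually differ.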
Sources: Wang, et al. 2007. Statistics in medicine: Reporting of subgroup analyses in clinical trials. NEJM, 357(21). Lagakos, S. 2006. The challenge of subgroup analysis: Reporting without distorting. NEJM, 354(16).
Defining Subgroups
• Easiest to work with subgroups that are inherently categorical (race/ethnicity, gender, primary drug of use, site)
• Subgroup membership is ambiguous (and potentially manipulated) if defined on continuous measures (age or income, etc.); it is better to include a continuous interaction
• Focusing on categorical subgroups
To analyze subgroups, you must recruit subgroups.
• Examine sites and particular clinics for subgroup composition—choose sites accordingly
• Should focus on a few subgroups (or correct for multiple testing in analysis)
• May have sites that are predominantly a single minority (Puerto Rico in BSFT was 100% Hispanic)
NOTE: This may create difficulty in identifying subgroup effects separately from site effects in studies with small number of sites
Must Protect Integrity of the Overall Study Design
• If subgroup is associated with primary outcome measure, consider stratified randomization (by subgroup to ensure balance across conditions)
• Must decide whether to incorporate subgroups into primary hypothesis testing
• If so, how to incorporate them
• A fully stratified primary analysis is like running duplicate trials in each subgroup and would require a large overall sample
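Stratified randomization is commonly implemented with permuted blocks within each stratum. The sketch below is a minimal, hypothetical version (the function name, the block size of 4, and the arm labels are assumptions): each stratum keeps its own shuffled block, so the arms stay balanced within every subgroup as enrollment proceeds, and with two arms the within-stratum imbalance never exceeds half a block.

```python
import random


def stratified_block_randomization(strata, block_size=4, arms=("T", "C"), seed=None):
    """Assign arms within each stratum using randomly permuted blocks.

    `strata` lists each participant's stratum (e.g., subgroup) in enrollment order.
    """
    if block_size % len(arms):
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    remaining = {}  # stratum -> unused assignments from its current block
    assignments = []
    for stratum in strata:
        if not remaining.get(stratum):
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            remaining[stratum] = block
        assignments.append(remaining[stratum].pop())
    return assignments


if __name__ == "__main__":
    enrollment = ["A", "B", "B", "A", "B"] * 4  # hypothetical subgroup labels
    arms = stratified_block_randomization(enrollment, seed=54)
    for s in sorted(set(enrollment)):
        got = [a for a, st in zip(arms, enrollment) if st == s]
        print(s, "T:", got.count("T"), "C:", got.count("C"))
```

A fixed, small block size can let site staff predict upcoming assignments, so trials often vary the block size at random; that refinement is omitted here.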
How much should subgroups be incorporated into primary analysis?
• Depends on what subgroup membership affects; the trick is specifying this a priori
• Look at 4 possibilities (generally assuming that randomization works) for subgroup effects:
– Initial levels differ by subgroup
– Initial levels and rates of change differ by subgroup
– Initial levels and rates of change differ by subgroup, and randomization fails within at least one subgroup
– Initial levels and rates of change differ by subgroup and intervention status [i.e., Subgroup X Treatment Interaction]
Only Initial Levels Differ
• Include a 1 df control for subgroup membership
– Reduces residual variance
– Increases power
• Little cost to include: a minor increase in model complexity, 1 df (per additional group) if wrong
Initial Level and Rates of Change
• 2 df per group (one for intercept and one for change, if linear)
– Reduces residual variance
– Increases power
• Relatively low cost in complexity and lost df (unless there are many higher-order polynomials in change)
Initial Level & Rates of Change and Randomization Failure in 1 Group
• We assume randomization will work—and participants should look similar on average at baseline
• But as you examine more subgroups you increase the chance that one subgroup will have an imbalance (here also differences in slopes)
Initial Levels and Rates of Change Differ by Group and Intervention
• Need separate intercepts and slopes for each group BY each intervention condition (an interaction of treatment by subgroup)
• Costly model (in the sense of statistical power and sample size)
– Complicated, many df (wasted if we guess wrong)
– Basically a fully stratified model; should consider powering within subgroup
So, what should you do for your a priori Statistical Analysis Plan?
• Mean differences by subgroup: add a 1 df control for subgroup membership (low cost if wrong)
• Different trajectories by subgroup (but equal intervention effects)
– With existing evidence: plan to include subgroup-specific rates of change
– No prior evidence:
• Could include an examination of trajectories blind to condition
• May be better to test the control group for differences in trajectories by subgroup
So, what should you do for your a priori Statistical Analysis Plan? (cont.)
• Different intervention effects: full stratification with a test for intervention interaction effects (large trial)
– Note that testing for interaction does not “solve” the problem
– Must plan for the possibility that the interaction is significant; you would then want power to show an effect within a subgroup
How many do you need in a subgroup if you want to explore?
• If subgroups are not the main emphasis of the trial, it is difficult to power subgroup effects
– Emphasize effect sizes and the minimal sample needed to get stable estimates
– Nevertheless, it is useful to have a feel for power
• We will look at power implications
– For mean (initial) level differences across subgroups
– For rate of change differences across subgroups
[Figure: N per group to have 80% power for a simple mean difference at a point in time. Y-axis: standardized difference (0.1 to 0.9); x-axis: sample size per group (0 to 500). Source: PASS 2005; assumes normality and a 2-group comparison.]
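The curve in the figure can be approximated with the usual normal-theory formula for a two-sided, two-sample comparison of means, n per group = 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized difference. The sketch below uses that approximation; it is not the PASS computation behind the slide, and exact t-based methods such as PASS give slightly larger values (for example, 64 rather than 63 per group for d = 0.5). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    where d is the standardized mean difference.
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


if __name__ == "__main__":
    for d in (0.1, 0.3, 0.5, 0.7, 0.9):  # standardized differences on the figure's axis
        print(f"d = {d}: about {n_per_group(d)} per group")
```

Because n scales with 1/d², halving the effect size roughly quadruples the required sample, which is why small subgroup effects are so expensive to detect.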
Power for Differences in Slopes
• Assumes 4 assessment times
• Lose 15% at T2, 5% more at T3 & T4
• Compound symmetry in Errors=.20
• Random effects in intercept and slope terms
• Effect size is the standardized mean difference at the LAST time (assumes largest effect at last observation)
• Uses RMASS program from Don Hedeker
– Hedeker, Gibbons, & Waternaux, 1999, JEBS
– Available at http://tigger.uic.edu/~hedeker/
[Figure: Sample size per subgroup for 80% power, growth curve showing the mean difference at the last of 4 times. Y-axis: effect size (0 to 0.8); x-axis: n per group (0 to 350); curves for a linear and a quadratic growth curve. Assumes a 2-group comparison.]
Note About Power Statements
• These are for 2-group comparisons
• If you want to find power for testing a difference in intervention effects across 2 (or more) groups (an interaction effect), you would need to simulate if planning on a growth curve framework with missing data
• As a rough overestimate, 4 times the per-group n should over-power the interaction effect
• However, if you want power within subgroups, 4 times the per-group n is the proper sample size
Potential Difficulties with Subgroup Analyses
• Disentangling subgroup effects from site effects
• Interpretation of subgroup findings
– Are measurement instruments equally valid and reliable across subgroups? (Need at least 100 per group even for a small invariance analysis)
Potential Difficulties with Subgroup Analyses (cont.)
• Interpretation of subgroup findings (cont.)
– What does group membership mean? A proxy for what?
• What person-specific variables do we need to understand subgroup differences?
• What contextual variables do we need?
– Local, neighborhood and regional factors (zip codes of individuals and clinics?)
– Treatment context and other site-level factors
Summary of Issues with Secondary Analysis
• Analyses that use treatment assignment as the only predictor*:
– Have a strong causal interpretation
– Are useful to characterize the full impact of an intervention, since substance use treatment has multiple targets
– But too many secondary outcomes may cause a Type I error, unless a multiple testing procedure is followed
– Need to be clear about the number of secondary outcomes considered
* Could also include other factors measured at baseline as control variables
Summary of Issues with Secondary Analysis (cont.)
• All other secondary analyses (mediation, associational, and subgroup):
– Cannot be interpreted as causal without strong assumptions
– Need to assess for confounding variables
– Be careful not to over-interpret: unobserved confounding cannot be ruled out
– Are also susceptible to Type I errors
• Limit the number of analyses and/or incorporate a multiple testing strategy, and be clear in presenting what you have done
Suggestions
• All manuscripts should have a pre-specified analysis plan
• Any findings that come from exploratory analyses need to be clearly designated in the manuscript
• It may be useful to cite the CTN dissemination library for the particular trial (so readers can find the other outcomes that have been published on the same data)
WRAPPING IT UP!
Highlights
Summary
• Secondary analyses are important and necessary components of clinical trial research
• Secondary outcomes and analyses should be considered and identified during protocol and/or SAP development
• Interpret results appropriately: exploratory, confirmatory, descriptive, not causal
• Always report whether analysis/outcome was pre-specified or post hoc and the number performed
Summary (cont.)
• CTN encourages multiple analyses & publications
• Describe methods clearly and accurately
• It is essential to acknowledge the limitations
• Beware of:
– Excessive testing
– Data dredging
– Misinterpreting correlations as causes
– Confounding variables
Q&A – Questions / Comments
Alternatively, questions can be directed to the presenter by sending an email to [email protected].
References
• Hien, et al. 2009. J Consult Clin Psychol, 77(4), 607–619.
• Lagakos, S. 2006. The challenge of subgroup analysis: Reporting without distorting. NEJM, 354(16).
• Nunes, et al. 2011. American Journal of Drug and Alcohol Abuse, 37, 446–452.
• Wang, et al. 2007. Statistics in medicine: Reporting of subgroup analyses in clinical trials. NEJM, 357(21).
Survey Reminder
The NIDA CCC encourages all to complete the survey issued to participants directly following this webinar session, as this is the primary collective tool for rating your experience with this and other webinars, and for communicating the interests and needs of CTN members and associates.
See you all in 2014!
A copy of this presentation will be available electronically after this session.
http://ctndisseminationlibrary.org
THANK YOU FOR YOUR PARTICIPATION