Beyond subgroup analysis: improving the clinical interpretation of treatment effects in stroke...

Journal of Neuroscience Methods 143 (2005) 209–216

Beyond subgroup analysis: improving the clinical interpretation of treatment effects in stroke research

Mei Lu a,*, Patrick D. Lyden b,c, Thomas G. Brott d, Scott Hamilton e, Joseph P. Broderick f, James C. Grotta g

a Department of Biostatistics and Research Epidemiology, Henry Ford Health System, One Ford Place, 3E Detroit, MI 48202, USA
b Stroke Center, University of California, San Diego, CA, USA
c San Diego Veteran’s Affairs Medical Center, Research Division, San Diego, CA, USA
d Department of Neurology, Mayo Clinic, Jacksonville, Charlottesville, FL, USA
e Stanford University School of Medicine, Department of Neurology, CA, USA
f Department of Neurology, University of Cincinnati, OH, USA
g Department of Neurology, University of Texas Medical Center, TX, USA

* Corresponding author. Tel.: +1 313 874 6413; fax: +1 313 874 6730. E-mail address: [email protected] (M. Lu).

Received 30 July 2004; received in revised form 7 October 2004; accepted 8 October 2004

0165-0270/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jneumeth.2004.10.002

Abstract

In large clinical trials designed to determine efficacy of an experimental treatment, patients are enrolled with presence or absence of various risk factors, such as diabetes or history of atrial fibrillation. A treatment-by-risk factor interaction indicates that the treatment effect may depend on the presence or absence of the risk factor. It is important to identify such an interaction, since a treatment may fail or cause adverse events in the presence of the risk. Although statistical methods exist to identify such interactions, they are underutilized in clinical stroke research. This paper reviews the notion of treatment-by-risk factor interaction and identifies two types of interaction, quantitative and qualitative, using a graphical technique and statistical testing. We illustrate how to avoid drawing erroneous conclusions regarding the treatment effect on subgroups when failing to detect an interaction, and provide rigorous tools to estimate the treatment effect on subgroups when an interaction is observed. Applications are presented using the data collected from the NINDS t-PA stroke studies. In stroke clinical trials, a treatment-by-risk factor interaction must be considered if the data permit. The graphical approach provides a heuristic illustration of interactions. Qualitative interactions are more important than quantitative interactions for the therapeutic conclusion. Results of the NINDS t-PA stroke studies confirmed our previous conclusions on the t-PA treatment benefit within the 3-h therapeutic window. No subgroup of patients would lead a physician to withhold the t-PA treatment. © 2004 Elsevier B.V. All rights reserved.

Keywords: Cerebral ischemia; Clinical trials; Subgroup analysis; Treatment-by-risk factor interaction; Qualitative or quantitative interaction

1. Introduction

A treatment effect that varies between subgroups of patients with and without a risk (e.g., history of diabetes) implies a treatment-by-risk factor interaction. Thus, the interaction can cause a treatment failure or adverse reaction when the presence of the risk overturns the treatment benefit. To study the treatment effect on subgroups, the most efficient statistical approach is to test the treatment-by-risk factor interaction rather than to conduct a subgroup analysis. This concept has general acceptance in statistics, but acceptance has been slower among clinical investigators conducting clinical trials. Pocock et al. (2002) studied 50 randomized clinical trials with sample sizes over 50 per group that were published in major medical journals (British Medical Journal (BMJ), the Journal of the American Medical Association (JAMA), the Lancet and The New England Journal of Medicine (NEJM)) during July–September of 1997. Of the 50 clinical trials, 70% conducted subgroup analyses, but only 30% performed testing for treatment-by-risk factor interactions. Two cardiology trials (Frasure-Smith et al., 1997; Kostis et al., 1997) drew conclusions based on subgroup analyses without testing for the treatment-by-risk interaction; however, in fact, no interactions were identified in either trial (p-values > 0.20). Pocock et al. (2002) concluded that there were continuing and significant concerns regarding overuse and over-interpretation of subgroup analyses, and underuse of appropriate statistical tests for assessing the treatment-by-risk factor interaction.

After we (National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group, 1995) reported the primary results of the t-PA treatment benefit on 3-month recovery for stroke patients treated within 3 h of stroke onset, the Study Group focused on studying various modifiers of the treatment effect. Marler et al. (2000) concluded that patients with early t-PA treatment had an increased chance of a 3-month favorable outcome compared to patients who were treated later, after we observed a significant interaction between treatment and the time from stroke onset to treatment (OTT). This result has been confirmed by Hacke et al. (2004) on combining individual patient data from six randomized placebo-controlled t-PA stroke trials, including the two NINDS t-PA stroke trials. The NINDS t-PA Stroke Study Group (2000) later examined the 3-month brain computed tomography (CT) lesion size, and demonstrated a marginal t-PA treatment benefit on lesion reduction (p-value = 0.059). However, we then explored the treatment effect on 3-month CT lesion size in relationship to baseline risk factors and observed a treatment-by-age interaction: there was greater lesion reduction in younger patients with the treatment of t-PA.

Assessment and interpretation of the treatment-by-risk factor interaction is a challenge in which clinical interpretation for decision-making is weighted more heavily than statistical considerations. In this paper, we seek to provide rigorous statistical methods to answer questions for clinical practice. For instance: How could we better test, understand and estimate the magnitude of the treatment effect on subgroups? Would the interaction change the treatment effect from one direction (e.g., positive) to the other direction (negative), and would this affect the therapeutic conclusion of the trial? Would a subgroup analysis show unique benefit for a particular subgroup? Moreover, if no treatment-by-risk interaction is detected, is it safe to conclude that there is no such interaction? Those questions were not fully addressed in our previous publication.

In this paper, we review the notion of treatment-by-risk factor interaction and identify two types of interaction, quantitative and qualitative, using a graphical technique and statistical testing. We illustrate how to avoid drawing an erroneous conclusion about the treatment effect on subgroups when failing to detect an interaction, and provide rigorous tools to estimate the treatment effect on subgroups when a qualitative interaction is observed. Applications are presented using the data collected from the NINDS t-PA stroke studies.

2. Testing, understanding and estimating the treatment effect on subgroups

In stroke trials, treatment responses are usually described as simplified outcome scales assessed post-treatment. Covariates that might influence treatment effects are protean and include baseline risk factors, demographic factors, any treatment variables (e.g., dose and the time of the treatment), and rarely, genomic differences in the study population (e.g., fast metabolizers).

To test the null hypothesis H0, that the treatment effect on the response is not associated with a risk factor, we need to test the treatment-by-risk factor interaction using a generalized regression model, described as:

$$f(Y) = b_0 + b_1 X + b_2 Z + b_3 XZ \tag{1}$$

where Y is the response; X is the risk factor (diabetes, as an example, with value 1 for presence and 0 for absence); Z is the treatment indicator, with value 1 for the treated group and 0 for the untreated/placebo group; and f(·) is a link function. If Y is a binary response, as an example, Eq. (1) becomes a logistic regression, where f(·) = ln(p/(1 − p)) and p is the proportion of responses, from which the odds ratio for the treatment effect is derived. When Y is a continuous outcome, Eq. (1) becomes the ordinary linear regression, where f(Y) = Y. Here b0 is the coefficient for the intercept, and b1, b2 and b3 are the coefficients for X, Z and the interaction XZ (e.g., the treatment-by-diabetes interaction).

Testing H0 is equivalent to testing b3 = 0. We can detect an interaction if p-value < 0.10. Choosing the critical value 0.10 instead of the usual 0.05 for interaction was recommended by the FDA in 1985 (Fleiss and Joseph, 1986) to be conservative, and this criterion has been used in many clinical studies, becoming a standard for testing the interaction. In addition, Eq. (1) can be rewritten as:

$$f(Y) = b_0 + b_1 X + (b_2 + b_3 X) Z \tag{2}$$

As Eq. (2) shows, if an interaction is observed (b3 ≠ 0), the treatment effect, b2 + b3X, depends on the value of X. In contrast, the absence of a treatment-by-risk factor interaction (b3 = 0) indicates that the treatment effect, b2, is robust regardless of whether patients have the risk factor or not. We could estimate the magnitude of the treatment effects in different risk groups (e.g., diabetes and non-diabetes groups) using the following formulas when b3 ≠ 0.

For a binary outcome, we would calculate the odds ratio as OR = exp(b2 + b3X), and the 95% confidence limits (CLs) as:

$$\left( \mathrm{OR}\,\exp\left\{-1.96\left[\mathrm{var}(b_2) + \mathrm{var}(b_3)X^2 + 2\,\mathrm{cov}(b_2, b_3)X\right]^{1/2}\right\},\ \mathrm{OR}\,\exp\left\{1.96\left[\mathrm{var}(b_2) + \mathrm{var}(b_3)X^2 + 2\,\mathrm{cov}(b_2, b_3)X\right]^{1/2}\right\} \right) \tag{3}$$

where X is a fixed value of the covariate.
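As a minimal sketch of how Eq. (3) could be evaluated once a logistic model of the form in Eq. (1) has been fitted, the Python snippet below computes the odds ratio and its 95% confidence limits at a chosen covariate value. The function name `treatment_odds_ratio` and every numeric input are hypothetical illustrations and are not estimates from the NINDS data; the coefficient and (co)variance estimates are assumed to come from whatever software fitted the model.

```python
import math


def treatment_odds_ratio(b2, b3, var_b2, var_b3, cov_b2_b3, x, z_crit=1.96):
    """Odds ratio (treated vs. placebo) and confidence limits at covariate value x.

    Follows Eq. (3): the log odds ratio is b2 + b3*x and its variance is
    var(b2) + var(b3)*x^2 + 2*cov(b2, b3)*x.
    """
    log_or = b2 + b3 * x
    variance = var_b2 + var_b3 * x ** 2 + 2.0 * cov_b2_b3 * x
    half_width = z_crit * math.sqrt(variance)
    return (
        math.exp(log_or),
        math.exp(log_or - half_width),
        math.exp(log_or + half_width),
    )


# Hypothetical estimates, for illustration only (not taken from any trial).
est = dict(b2=0.80, b3=-0.50, var_b2=0.04, var_b3=0.09, cov_b2_b3=-0.03)

for x in (0, 1):  # e.g., risk factor absent (0) and present (1)
    odds_ratio, lower, upper = treatment_odds_ratio(x=x, **est)
    print(f"X={x}: OR={odds_ratio:.2f}, 95% CLs=({lower:.2f}, {upper:.2f})")
```

With a binary risk factor only X = 0 and X = 1 are meaningful; for a continuous covariate the same function can be called over a grid of values.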


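For a continuous response, the estimated response is Ŷ = (b0 + b1X) + (b2 + b3X)Z, with 95% CLs:

$$\left( \hat{Y} - 1.96\,\mathrm{var}(\hat{Y})^{1/2},\ \hat{Y} + 1.96\,\mathrm{var}(\hat{Y})^{1/2} \right) \tag{4}$$

where the variance of Ŷ can be calculated using the following formula (Z² = Z because Z takes only the values 0 and 1):

$$\begin{aligned}
\mathrm{var}(\hat{Y}) ={}& \mathrm{var}(b_0) + \mathrm{var}(b_1)X^2 + \mathrm{var}(b_2)Z + \mathrm{var}(b_3)X^2 Z \\
&+ 2\,\mathrm{cov}(b_0, b_1)X + 2\,\mathrm{cov}(b_0, b_2)Z \\
&+ 2\,\mathrm{cov}(b_0, b_3)XZ + 2\,\mathrm{cov}(b_1, b_2)XZ \\
&+ 2\,\mathrm{cov}(b_1, b_3)X^2 Z + 2\,\mathrm{cov}(b_2, b_3)XZ
\end{aligned}$$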
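The term-by-term variance above is the quadratic form d′Σd, where d = (1, X, Z, XZ) is the design vector and Σ is the covariance matrix of (b0, b1, b2, b3). The sketch below, under that reading, computes Ŷ and the limits in Eq. (4) and checks the expanded formula against the quadratic form. The coefficients and the covariance matrix are hypothetical values chosen only so that the matrix is symmetric and positive definite.

```python
import math


def predicted_mean_with_cl(coef, cov, x, z, z_crit=1.96):
    """Estimated response, its variance and confidence limits as in Eq. (4).

    coef = [b0, b1, b2, b3]; cov = 4x4 covariance matrix of the coefficients.
    """
    design = [1.0, x, z, x * z]
    y_hat = sum(c * d for c, d in zip(coef, design))
    var_y = sum(design[i] * cov[i][j] * design[j]
                for i in range(4) for j in range(4))
    half_width = z_crit * math.sqrt(var_y)
    return y_hat, var_y, (y_hat - half_width, y_hat + half_width)


def expanded_variance(cov, x, z):
    """The same variance written term by term, using Z^2 = Z for binary Z."""
    return (cov[0][0] + cov[1][1] * x ** 2 + cov[2][2] * z + cov[3][3] * x ** 2 * z
            + 2 * cov[0][1] * x + 2 * cov[0][2] * z
            + 2 * cov[0][3] * x * z + 2 * cov[1][2] * x * z
            + 2 * cov[1][3] * x ** 2 * z + 2 * cov[2][3] * x * z)


# Hypothetical coefficients and covariance matrix (illustration only).
coef = [10.0, 1.5, -4.0, 0.8]
cov = [
    [0.40, -0.05, -0.10, 0.04],
    [-0.05, 0.20, 0.03, -0.06],
    [-0.10, 0.03, 0.50, -0.08],
    [0.04, -0.06, -0.08, 0.30],
]

for z in (0, 1):  # placebo and treated groups
    y_hat, var_y, (lower, upper) = predicted_mean_with_cl(coef, cov, x=2.0, z=z)
    print(f"Z={z}: Y_hat={y_hat:.2f}, 95% CLs=({lower:.2f}, {upper:.2f})")
```

Working from the full covariance matrix avoids transcription slips among the ten terms, and the two functions agree only when Z is 0 or 1.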

It is clear that the regression model can be extended to a covariate X that is a continuous variable. The regression model in Eq. (1) is the only way, and an efficient way, to test the treatment effect on subgroups. There is no justification for estimating the treatment effect by risk group using Eq. (3) or Eq. (4), or for conducting a subgroup analysis, when H0 (b3 = 0) is true. More discussion follows in Section 5.

3. Magnitude of the treatment effect on subgroups

3.1. A graphic approach

Peto (1982) first suggested a classification of the treatment-by-risk factor (covariate) interaction as either quantitative or qualitative. For a quantitative interaction, the magnitude of the responses varies according to the presence or absence of the patients' risk, but the effects do not differ in direction. A qualitative interaction, on the other hand, involves a change in direction of the treatment effect; for example, a treatment may benefit the young, but actually harm the old. Fig. 1A and B illustrate the two types of interaction, following Aiken and West (1991). In Fig. 1A, the treated group is superior to placebo on all of the values of covariate X. The difference (or the effect of treatment) is larger for large values of X than for small ones, indicating a quantitative interaction. In Fig. 1B, placebo has better responses on small values of X, but the treatment has better responses on large values of X; so this is a qualitative interaction. Fig. 1C illustrates no interaction: a consistent treatment benefit for patients with any risk level (any value) of X.

Fig. 1. Two types of treatment–covariate interactions: the treatment effect for a continuous response variable (Y): (A) a possible quantitative interaction; (B) a possible qualitative interaction; (C) no interaction.

The graphic approach proposed by Aiken and West (1991) for a continuous response can be extended to a binary response (e.g., stroke recovery), with the treatment effect estimated as the odds ratio for the treated group versus placebo using Eq. (3). A positive treatment effect is observed if the odds ratio and its 95% confidence limits (CLs) are greater than 1.0. The treatment-by-risk factor interaction has one of three patterns illustrated in Fig. 2A–C. In a definitely quantitative interaction (Fig. 2A), the odds ratio and its lower 95% confidence limit are greater than 1 at every value of X, although the treatment effect declines as X increases. In a possible quantitative interaction (Fig. 2B), despite an overall treatment effect, the odds ratio and its lower 95% confidence limit are greater than 1.0 for most values of the covariate X, and the treatment effect changes direction at X0. In contrast, in a possible qualitative interaction (Fig. 2C), the treatment effect changes in direction at the level of X0: the odds ratio is greater than 1.0 for some values of the covariate, but either becomes 1.0 or actually crosses over the 1.0 line and becomes significantly less than 1.0. Such a qualitative interaction would indicate the potential for benefit or harm due to the treatment, depending on the value of the covariate.

For a continuous response (e.g., the total lesion size or transformed lesion size for ischemic strokes detected on imaging studies), a significant treatment effect is detected when the 95% confidence limits (also called the confidence interval) for the mean in each group do not overlap. The plots are similar to Fig. 1A and B except that four more lines would be drawn for the 95% confidence limits, one pair of lines for the treated group and another pair for the placebo group, using the formula in Eq. (4).

In summary, the graphic approach provides a heuristic illustration of quantitative interaction, qualitative interaction and no interaction. Any graphic indication of a qualitative interaction should be further tested using a proper statistical method.


Fig. 2. Two types of treatment–covariate interactions for binary responses: the odds ratio of the treatment vs. placebo depending on the covariate X as a continuous variable: (A) a quantitative interaction; (B) a possible quantitative interaction; (C) a possible qualitative interaction. The odds is defined as the proportion of the response divided by the proportion of no-response in a treatment or placebo group.
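The curves in Fig. 2 can be tabulated numerically before plotting. The sketch below evaluates the odds ratio and its 95% confidence limits from Eq. (3) over a grid of covariate values and assigns a rough label in the spirit of Fig. 2A–C. The labelling rule is a simplifying assumption of this sketch rather than the paper's formal definition, the function names are hypothetical, and the estimates are invented for illustration.

```python
import math


def or_curve(b2, b3, var_b2, var_b3, cov_b2_b3, x_values, z_crit=1.96):
    """Rows of (x, OR, lower CL, upper CL) computed from Eq. (3)."""
    rows = []
    for x in x_values:
        log_or = b2 + b3 * x
        se = math.sqrt(var_b2 + var_b3 * x * x + 2.0 * cov_b2_b3 * x)
        rows.append((x,
                     math.exp(log_or),
                     math.exp(log_or - z_crit * se),
                     math.exp(log_or + z_crit * se)))
    return rows


def describe_pattern(rows):
    """Rough reading of the curve (an assumption of this sketch)."""
    if all(lower > 1.0 for _, _, lower, _ in rows):
        return "quantitative: lower CL above 1 at every grid value"
    if all(odds_ratio > 1.0 for _, odds_ratio, _, _ in rows):
        return "possible quantitative: OR above 1 everywhere, lower CL dips below 1"
    return "possible qualitative: OR crosses 1 within the grid"


def crossover_point(b2, b3):
    """Covariate value X0 at which the treatment effect b2 + b3*X is zero."""
    return -b2 / b3


# Hypothetical estimates (illustration only).
est = dict(b2=1.0, b3=-0.08, var_b2=0.09, var_b3=0.0016, cov_b2_b3=-0.010)

for x, odds_ratio, lower, upper in or_curve(x_values=range(0, 11, 2), **est):
    print(f"X={x:2d}  OR={odds_ratio:.2f}  95% CLs=({lower:.2f}, {upper:.2f})")
print(describe_pattern(or_curve(x_values=range(0, 11), **est)))
print("X0 =", crossover_point(est["b2"], est["b3"]))
```

Whether a crossover matters clinically depends on whether X0 lies inside the range of covariate values actually observed, which is why the grid should match the enrolled population.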

3.2. Test statistics for a qualitative interaction

In theory, the concepts of no interaction, quantitative interaction, non-qualitative interaction and qualitative interaction were proposed by Gail and Simon (1985) and are illustrated in Fig. 3 for two subgroups of patients (e.g., X = 0 or 1 for absence or presence of diabetes). Let δ1 and δ2 be the treatment effects in subgroups 0 and 1, respectively. δ1 and δ2 could be the log odds ratios for a binary outcome, or the differences in means for a continuous outcome. The origin, δ1 = δ2 = 0 (odds ratios of 1), indicates no treatment effect in either subgroup. The line δ1 = δ2 represents no treatment–covariate interaction, with a positive treatment effect if δ1 and δ2 are both positive (e.g., odds ratios > 1) or a negative treatment effect if δ1 and δ2 are both negative (e.g., odds ratios < 1). Any point falling into quadrant II (δ1 > 0 and δ2 < 0) or IV (δ1 < 0 and δ2 > 0) indicates a qualitative interaction. The qualitative interaction is generally important for a clinical trial. Detecting a statistically significant qualitative interaction will substantiate the results.

Gail and Simon (1985) proposed the likelihood ratio test to test for qualitative interaction. In their algorithm, the likelihood ratio test statistic can be expressed as min(Q+, Q−), where Q+ ≡ Σ (Di²/σi²) I(Di > 0), and I(Di > 0) = 1 if Di > 0 and 0 otherwise. Q− is defined likewise for Di < 0. Di and σi are the estimated treatment effect and its standard deviation in the ith subgroup, such as the log (odds ratio) for a binary outcome or the difference in means for a continuous outcome. We can detect a qualitative treatment–covariate interaction if min(Q+, Q−) > c, where c is the critical value for the likelihood ratio test listed in Table A1 of Appendix 1. The qualitative interaction arises when the direction of the true treatment differences varies among subgroups.

Piantadosi (1993) proposed a new range test for the qualitative interaction that simplifies Gail and Simon's likelihood ratio test. He stated that the range test is more efficient than the likelihood ratio test when the new treatment is harmful only in a few subgroups, whereas the likelihood ratio test has greater power when the new treatment is harmful in several subgroups. Using the range test, we detect a qualitative interaction if both max{Di/σi, where I(Di > 0) = 1} > C′2α and min{Di/σi, where I(Di < 0) = 1} < −C′2α, where critical values of C′2α are listed in Table A2 of Appendix 1.

Shuster and Eys (1983) presented a methodology for dividing patients into three mutually exclusive regions (subgroups) based on a continuous covariate X: the treatment effect is positive; the treatment effect is uncertain; and the treatment effect is negative. In their approach, the dividing point of X can be calculated as D0 = −b2/b3, the value of X at which the treatment effect b2 + b3X in Eq. (2) equals zero. We can estimate D0 and its 95% confidence limits. If the upper confidence limit of D0, which is equivalent to the lower bound of the negative treatment effect region, falls within the range of X collected, this indicates a qualitative interaction; otherwise it would be a quantitative interaction.
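The two test statistics described above can be sketched in a few lines of Python. Because the critical values in Tables A1 and A2 are not reproduced here, the likelihood ratio test below instead uses the closed-form p-value expression commonly attributed to Gail and Simon (1985), a binomial mixture of chi-square tail probabilities; that substitution, the function names, and the example data are assumptions of this sketch. The range test takes its critical value as an argument, to be read from the published table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    total, term = 0.0, 1.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total)


def gail_simon(effects, std_errors):
    """Return Q+, Q-, min(Q+, Q-) and a p-value for qualitative interaction."""
    z2 = [(d, (d / s) ** 2) for d, s in zip(effects, std_errors)]
    q_plus = sum(v for d, v in z2 if d > 0)
    q_minus = sum(v for d, v in z2 if d < 0)
    stat = min(q_plus, q_minus)
    if stat == 0.0:  # all effects share one sign: no evidence of a crossover
        return q_plus, q_minus, stat, 1.0
    k = len(effects)
    p = sum(math.comb(k - 1, h) * 0.5 ** (k - 1) * chi2_sf(stat, h)
            for h in range(1, k))
    return q_plus, q_minus, stat, p


def range_test(effects, std_errors, critical_value):
    """Range test: True if the largest and smallest Di/σi both exceed the bound.

    critical_value must be taken from the published table (not supplied here).
    """
    z = [d / s for d, s in zip(effects, std_errors)]
    return max(z) > critical_value and min(z) < -critical_value


# Hypothetical subgroup effects and standard errors (illustration only).
D = [-12.0, -8.0, -5.0, -1.0, 3.0]
SE = [5.0, 4.0, 4.0, 5.0, 6.0]
q_plus, q_minus, stat, p_value = gail_simon(D, SE)
print(f"Q+={q_plus:.2f}, Q-={q_minus:.2f}, min={stat:.2f}, p={p_value:.3f}")
```

For two subgroups the p-value reduces to half the one-degree-of-freedom chi-square tail probability of min(Q+, Q−), which gives a quick sanity check on any implementation.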


Fig. 3. The space of treatment effects for two subgroups of patients.

We later applied those test statistics for the qualitativeinteraction to study the t-PA treatment effect on subgroupsusing the NINDS t-PA stroke data.

4. Therapeutic conclusion

If a treatment-by-covariate interaction is observed after the overall treatment effect is detected for the primary hypothesis, how can we draw a conclusion regarding the treatment effect?

Risk factors (covariates) are of central importance to clinicians because a treatment can be recommended only after careful examination of the patient's medical history and present medical condition. In clinical trials, the randomization process is supposed to render different treatment groups comparable with respect to all of the covariates, whether known or unknown, based on the randomization theory described by Piantadosi (1997). Therefore, the statistical test for the treatment effect that disregards those covariates, except the stratified variables or the variables that are known to be unbalanced between groups, is always a valid statistical test, because of the nature of randomized treatment allocation. However, according to Buyse (1989), this test may not be the best statistical test for the treatment effect. The adjusted estimates of the treatment effect, which consider a covariate, might be closer to the true treatment effect if there was a substantial imbalance in spite of the randomization. In contrast, Beach and Meier (1989) claimed that the adjustment seldom results in any benefit at all because it does not increase the precision of the estimate of the treatment effect.

Nevertheless, in a trial that is positive based on the overall treatment effect, a treatment-by-covariate interaction, especially a qualitative interaction, would raise considerable caution about the overall treatment effect. Given that clinical trials are often designed to test the overall treatment effect under the assumption of randomization, the test for the overall treatment effect should be reported as the primary result. Any treatment-by-covariate interactions must be disclosed, and caution must be raised about applying the treatment until the interaction is fully understood and validated. Subgroup results should rarely be reported as the primary results of a clinical trial if the trial is not designed for a subgroup study.

5. Treatment effect on subgroups and a subgroup analysis

In general, to control the type I error, a subgroup analysis should be considered only if a treatment-by-covariate interaction is detected (p-value for interaction < 0.10). Situations in which a particular subgroup does not benefit at all from the treatment, or conversely in which the benefit is extremely large, are certainly important to discover. A naïve yet traditional approach is to conduct a subgroup analysis. However, in some clinical research reported by Frasure-Smith et al. (1997) and Kostis et al. (1997), subgroup analysis was conducted without first testing the interaction, which significantly increased the possibility of a type I error.

Unfortunately, a treatment that seems highly effective in one subgroup and not at all in others, when there is no treatment-by-covariate interaction at the 0.10 level, can arise quite easily just by chance alone. Peto (1982) conducted a simulation study and showed that when patients were divided into two subgroups of similar size, if the overall treatment effect is significant (e.g., p-value < 0.05), there is a one-third chance that the treatment effect will be large and significant in one subgroup (p-value < 0.05) and negligible in the other (p-value > 0.50). Thus, looking at the treatment effect in an individual subgroup is a sure way to be misled by the play of chance. On the other hand, when the overall treatment effect is not significant but shows a trend (for example, p-value < 0.32) and five subgroups are analyzed separately for a treatment effect, we would have a 21% chance of detecting a significant treatment effect (p-value < 0.05) in at least one subgroup. Therefore, the traditional subgroup analysis without rigorous analysis of the treatment-by-covariate interaction is quite likely to lead to spurious and erroneous conclusions.
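A related back-of-the-envelope calculation shows why separate subgroup tests inflate the type I error. It is not Peto's simulation: it assumes no true treatment effect in any subgroup and statistically independent subgroup tests, so its result is not expected to reproduce the figures quoted above. The function name is a hypothetical choice for this sketch.

```python
def chance_of_any_false_positive(alpha, n_subgroups):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_subgroups


for k in (1, 2, 5, 10):
    print(f"{k:2d} subgroup(s): {chance_of_any_false_positive(0.05, k):.3f}")
```

Under these assumptions, five subgroups tested at the 0.05 level give roughly a 23% chance of at least one spuriously significant result, which is the motivation for testing the single interaction term in Eq. (1) first.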

If the treatment-by-risk factor interaction is detected at the critical value of 0.10, we can estimate the treatment effect on subgroups from the interaction model using Eq. (3) or (4) to increase precision. We can also conduct a subgroup analysis when the sample size is large.

6. Statistical power

If we fail to detect the treatment-by-risk factor interaction,i antt lacko e ing e-s ticalt riatei op-u hep iates( ulti-v ts ofN

7N

7i

okew 95),s to de-t s andt -PAt d asn ea-s okeS elineN nsett h

the odds ratio of treatment-by-time interaction as 0.993 andconfidence limits as 0.984–1.001, reported in the previouspublication. The treatment effect varied on OTT was pre-sented inFig. 2 of the previous publication (Marler et al.,2000) using the proposed method in Section2. Furthermore,the t-PA treatment is significantly superior for all patientswho are treated within 148 min from stroke onset, there wasborderline treatment benefits for patients treated afterwardsand treatment effect would change directions at the treatmentadministered at 203 min after stroke onset (the lower boundof negative treatment effect region), which is far beyond theapproved 3-h time limit for using rt-PA in stroke patients.The likelihood ratio (Gail and Simon, 1985) and ranges tests(Piantadosi, 1993) for the qualitative interaction were con-ducted and no qualitative interaction was identified in eithertests. The results confirmed a quantitative treatment-by-OTTinteraction.

7.2. The age-treatment interaction on 3-month lesionvolume obtained at 3-month CT scans

In a study of treatment effect on 3-month lesion size, wefirst report a marginal treatment effect (p-value = 0.059) basedon the intention-to-treat algorithm that imputes a lesion sizefor patients who died or missed the follow-up (The NINDSt cia-t vol-u inter-a linec nter-a effecto t theo ge in-t -t usingf ndr ctionb 70 to< of aq rew t-PAt

8

thep ut-c d andr upso h le-s 00;M pec-u her-a fit orh ere is

t does not always indicate a lack of the clinical importreatment-by-risk factor interaction. It could be due to af power for testing such interaction, which would be trueneral for any clinical trial, especially, if a trial is not digned to detect such interaction. The power of a statisest is the probability to detect such treatment-by-covanteraction, when, in fact, the interaction exists in this plation.Cohen (1988)developed a formula to calculate tower, which takes into account the number of covare.g., the treatment variable and/or risk factors) in a mariable setting, which was applied to our previous resulINDS Stroke Trials (1997).

. Applications: study of qualitative interactions theINDS t-PA stroke trials

.1. The t-PA treatment-by-the time of stroke onsetnteraction on 3-month favorable outcome

After significant t-PA benefit at 3 months after acute stras reported by the NINDS t-PA Stroke Study Group (19ubsequent studies by the Study Group were conductedermine the association between the baseline covariatehe 3-month favorable outcome with and without the treatment. The 3-month favorable outcome was defineo or a minimal deficit at 3 months after stroke and was mured from four clinical assessments (NINDS t-PA Strtudy Group, 1995). After adjusting for unbalanced basIHSS, we discovered a treatment-by-time of stroke o

o treatment (OTT) interaction withp-value = 0.09 and wit

-PA Stroke Study Group 2000). In addition, the assoions among baseline covariates, treatment, and lesionme were tested. The results showed a treatment–agection (p-value = 0.01) after adjusting for the other baseovariates, including presumptive stroke category, an iction between baseline NIHSS and the edema, massr hyperdense artery sign on the baseline CT. Withouther baseline variable adjustments, the treatment-by-a

eraction remained.Fig. 2 in our previous publication illusrated the treatment effect at each age level, calculatedormula in Eq.(4). Moreover, the likelihood ratio test aange test were performed to test the qualitative interaased on five age categories <50, 50 to <60, 60 to <70,80, and 80 and over. The results showed no evidenceualitative interaction (p-values > 0.20), indicating that theas no threshold of age or age category to withhold the

reatment.

. Discussion and conclusion

After the NINDS t-PA Stroke Study Group reportedrimary results of t-PA benefit on 3-month favorable oome, a series of exploratory analyses were conducteeported to explore treatment benefit in various risk gron the 3-month favorable outcome and on the 3-montion size (The NINDS t-PA Stroke Study Group, 1997, 20arler et al., 2000). Thus, have been risen concerns and slations of a subgroup that the treatment t-PA with 3-h tpeutic window of stroke onset may not have a benearm the patient, although the results concluded that th

Page 7: Beyond subgroup analysis: improving the clinical interpretation of treatment effects in stroke research

M. Lu et al. / Journal of Neuroscience Methods 143 (2005) 209–216 215

no evidence that would lead physician to withhold the treat-ment of t-PA. In this paper, we further studied the magnitudeof the t-PA treatment effect on subgroups (treatment-by-riskfactor interaction) and tested the qualitative interactions forthe treatment-by-OTT and the treatment-by-age interactionson the previous publication. Using the proposed statisticalmethods, we demonstrated that t-PA treatment influences the3-month favorable outcome, depending on the time of strokeonset to the time of the study treatment; odds ratio for thefavorable outcome was decreased as the time of stroke on-set to time of the treatment increased; however, there wasno crossover effect. In other words, the t-PA treatment ef-fect is beneficial throughout the 3-h treatment window. Thet-PA treatment marginally reduced the 3-month lesion size,compared to the placebo. There is a potential lack of t-PAtreatment effect on 3-month lesion size reduction for pa-tients being older; however, the fact of treatment effect chang-ing in direction based on age is not significant (no qualita-tive interaction was detected). Based on NINDS t-PA strokedata, we concluded a consistent and significant t-PA bene-fit on 3-month favorable outcome, and marginal benefit on3-month lesion reduction, compared to the placebo treatedgroups. Results confirmed our previous conclusions on thetreatment t-PA benefit. No threshold of a covariate or a sub-group of patients would lead a physician to withhold the t-PAt

We emphasize that the regression model including treatment, risk factor and the treatment-by-risk factor interaction, described in Eq. (1), is the only efficient way to study the treatment effect on subgroups. When the interaction is detected at the 0.10 level, the magnitude of the treatment effect can be estimated using Eq. (3) for a binary outcome, with estimation of the odds ratio, or Eq. (4) for a continuous outcome, with estimation of the mean and its 95% confidence limits at each risk level. A subgroup analysis can also estimate the treatment effect in each subgroup; however, this analysis is less efficient when the sample size is small and is not recommended.

In addition, estimating the magnitude of the treatment effect at each risk level, or any subgroup analysis, is appropriate only when there is a significant treatment-by-risk factor interaction; otherwise the analysis should be considered exploratory. We have demonstrated with simulation results that a significant treatment benefit could easily be identified in a subgroup by chance alone, even when there is no overall treatment effect. Drawing a trial's primary result from a subgroup analysis, without reporting an overall treatment effect or the treatment-by-risk factor interaction, is inappropriate and increases the type I error.

We extend the statistical method given by Aiken and West (1991) to include binary outcome variables and illustrate the treatment-by-covariate interaction in a quantitative way when the interaction is detected. Visualizing the relationship between the treatment and the response across the values of the covariate, or by the presence or absence of the covariate, provides the first intuition about the interaction. The graphical approach also describes the magnitude of the treatment effect depending on the level of the risk factor, thereby distinguishing between quantitative and qualitative interactions. Any treatment-by-covariate interaction in Eq. (1) should be studied further by testing for either a qualitative or a quantitative interaction. A qualitative interaction indicates that the treatment effect on subgroups changes in direction from positive to negative. For therapeutic decisions, it is very important to identify subgroups with a negative response, if they in fact exist.

We discussed that in any clinical trial, the randomization process tends to make the treatment groups comparable with respect to all covariates, whether known or unknown, within strata. The statistical test comparing treatment groups without regard to covariates other than the stratification variables is always a valid test. We should always report the overall treatment effect as the primary result of the clinical trial if the trial is designed to study the overall treatment effect.

Failure to detect a significant interaction may be due to a lack of power for testing the interaction. If there is a strong belief that the risk factor influences the treatment effect, but the treatment-by-risk factor interaction is not significant, a power calculation becomes necessary to guide clinical interpretation and conclusions. An observed power of over 80 or 90% supports the conclusion that there is no such interaction; otherwise, the nonsignificant interaction should be further validated or confirmed in another study with a similar study population. If a risk factor is expected to modify the treatment effect on an outcome, it should be specified and considered in the trial design, for example by stratifying patients on the presence or absence of the risk factor and by ensuring that the sample size provides sufficient power to detect the treatment-by-risk factor interaction, assuming a quantitative interaction.
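As a rough illustration of the power calculation discussed above, the sketch below approximates the power of a two-sided z-test for the interaction contrast in a balanced 2x2 layout (treatment by presence or absence of a risk factor) with a continuous outcome. It assumes equal cell sizes, a known common standard deviation, and a normal approximation; the function name `interaction_power` and the example inputs are hypothetical, and the 0.10 default simply mirrors the level used in this paper for detecting an interaction.

```python
from statistics import NormalDist

def interaction_power(delta, sigma, n_per_cell, alpha=0.10):
    """Approximate power of a two-sided z-test for a 2x2 interaction.

    delta      : true difference between the treatment effects in the two
                 risk groups (the interaction contrast)
    sigma      : common standard deviation of the outcome (assumed known)
    n_per_cell : patients in each of the four treatment-by-risk cells
    """
    nd = NormalDist()
    # The contrast combines four independent cell means, each with
    # variance sigma^2 / n, so its variance is 4 * sigma^2 / n.
    se = sigma * (4.0 / n_per_cell) ** 0.5
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / se
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Hypothetical scenario: the treatment effect differs by half a standard
# deviation between patients with and without the risk factor.
for n in (50, 100, 150, 200):
    print(f"n per cell = {n:3d}: power = {interaction_power(0.5, 1.0, n):.2f}")
```

Because the interaction contrast has twice the standard error of a single treatment comparison of the same total size, a trial powered only for the overall effect will usually have much less power for the interaction.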

These and other analyses provide a foundation for presenting a novel, widely applicable method for assessing treatment effects under conditions of potentially modifying risk factors. The methodology we present, which is an adaptation of standard statistical methods to the unique case of stroke clinical trials, could be widely implemented. These methods provide a powerful, rigorous alternative to crude “subgroup analysis”.
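The risk of crude subgroup analysis noted earlier in this discussion, namely that a significant benefit can appear in some subgroup by chance alone, can be illustrated with a small simulation. The sketch below is not the simulation reported in this paper. It assumes a binary outcome with the same favorable-outcome probability in both arms (no true treatment effect), equally sized independent subgroups, and a pooled two-proportion z-test in each subgroup; all names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation in outcome, nothing to test
    se = (pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)) ** 0.5
    z = (x1 / n1 - x2 / n2) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def simulate(n_trials=400, n_subgroups=8, n_per_arm=40,
             p_good=0.35, alpha=0.05, seed=1):
    """Return (rate of significant overall tests,
               rate of trials with at least one significant subgroup)."""
    rng = random.Random(seed)
    overall_hits = 0
    subgroup_hits = 0
    for _trial in range(n_trials):
        total_t = total_p = 0
        any_significant = False
        for _group in range(n_subgroups):
            # Both arms share the same success probability: no true effect.
            x_t = sum(rng.random() < p_good for _ in range(n_per_arm))
            x_p = sum(rng.random() < p_good for _ in range(n_per_arm))
            total_t += x_t
            total_p += x_p
            if two_prop_p(x_t, n_per_arm, x_p, n_per_arm) < alpha:
                any_significant = True
        n_arm = n_subgroups * n_per_arm
        overall_hits += two_prop_p(total_t, n_arm, total_p, n_arm) < alpha
        subgroup_hits += any_significant
    return overall_hits / n_trials, subgroup_hits / n_trials

overall_rate, subgroup_rate = simulate()
print(f"overall test significant:          {overall_rate:.2f}")
print(f"at least one subgroup significant: {subgroup_rate:.2f}")
```

Under these assumptions the overall test is significant at roughly the nominal rate, while the chance of finding at least one "significant" subgroup is several times larger, since eight independent tests at the 0.05 level give about 1 − 0.95^8 ≈ 0.34.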

Acknowledgments

The authors thank Lula Adams for editing. This work was supported in part by NIH contracts # N01-NS-02382, N01-NS-02374, N01-NS-02377, N01-NS-02381, N01-NS-02379, N01-NS-02373, N01-NS-02376, N01-NS-02378 and N01-NS-02380, and Grant P01-NS23393 from the National Institute of Neurological Disorders and Stroke, Bethesda, MD.

Appendix A

See Tables A1 and A2.


Table A1
Critical values {c} for the likelihood ratio test min(Q+, Q−) > c

Number of groups    Significance level
                    0.20     0.10     0.05     0.001

 2                  0.71     1.64     2.71     9.55
 3                  1.73     2.95     4.23    11.76
 4                  2.59     4.01     5.43    13.47
 5                  3.39     4.96     6.50    14.95
 6                  4.14     5.84     7.48    16.32
 7                  4.86     6.67     8.41    17.58
 8                  5.56     7.48     9.29    18.78
 9                  6.75     8.26    10.51    19.93
10                  6.69     9.02    10.99    21.03
12                  8.24    10.50    12.66    23.13
14                  9.53    11.93    14.15    25.17
16                 10.79    13.33    15.66    27.10
18                 12.03    14.70    17.13    28.96
20                 13.26    16.04    18.57    30.79
25                 16.28    19.34    22.09    35.15
30                 19.25    22.55    25.50    39.33
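The critical values in Table A1 can be related to a p-value formula. The sketch below assumes the expression given by Gail and Simon (1985) for the likelihood ratio test of qualitative interaction: with k subgroups, the p-value for an observed Q = min(Q+, Q−) is the sum over v = 1, ..., k − 1 of P(chi-square with v degrees of freedom > Q) weighted by the binomial probability C(k − 1, v)/2^(k − 1). Here Q+ and Q− are taken as the sums of squared standardized subgroup effects over the positive and the negative estimates, respectively. The function names and the example effect estimates are hypothetical.

```python
from math import comb, erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return exp(-h) * total
    # Odd df: start from the 1-df tail and add half-integer gamma terms.
    total = erfc(sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += exp(-h) * h ** (j - 0.5) / gamma(j + 0.5)
    return total

def gail_simon_pvalue(q, k):
    """P(min(Q+, Q-) > q) under the null of no qualitative interaction."""
    weight = 0.5 ** (k - 1)
    return weight * sum(comb(k - 1, v) * chi2_sf(q, v) for v in range(1, k))

def gail_simon_test(effects, std_errors):
    """Return (Q, p-value) from subgroup effect estimates and their SEs."""
    pairs = list(zip(effects, std_errors))
    q_plus = sum((d / s) ** 2 for d, s in pairs if d > 0)
    q_minus = sum((d / s) ** 2 for d, s in pairs if d < 0)
    q = min(q_plus, q_minus)
    return q, gail_simon_pvalue(q, len(pairs))

# Tabulated critical values should give p-values close to the column level.
print(round(gail_simon_pvalue(2.71, 2), 3), round(gail_simon_pvalue(4.23, 3), 3))

# Hypothetical subgroup effects (e.g. log odds ratios) and standard errors.
q_obs, p_obs = gail_simon_test([0.9, 0.6, -0.2], [0.3, 0.3, 0.4])
print(f"Q = {q_obs:.2f}, p = {p_obs:.2f}")
```

In the hypothetical example only one subgroup has a small negative estimate, so Q is small and there is no evidence of a qualitative (crossover) interaction.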

Table A2
Critical values C′_{2α} for the two-sided range test

Number of groups    Significance level
                    0.20     0.10     0.05     0.001

 2                  0.84     1.64     1.96     3.09
 3                  1.25     1.95     2.24     3.29
 4                  1.46     2.12     2.39     3.40
 5                  1.60     2.23     2.49     3.48
 6                  1.71     2.32     2.57     3.54
 7                  1.79     2.39     2.63     3.59
 8                  1.86     2.44     2.69     3.63
 9                  1.92     2.49     2.73     3.66
10                  1.97     2.53     2.77     3.69
12                  2.05     2.60     2.83     3.74
14                  2.12     2.66     2.89     3.78
16                  2.18     2.71     2.93     3.82
18                  2.23     2.75     2.97     3.85
20                  2.27     2.78     3.00     3.88
25                  2.36     2.86     3.07     3.93
30                  2.42     2.92     3.13     3.98

References

Aiken LS, West SG. Multiple regression: testing and interpreting interactions. First ed. Thousand Oaks, CA: Sage Publications; 1991.

Beach M, Meier P. Choosing covariates in the analysis of clinical trials. Controlled Clin Trials 1989;10:161S–75S.

Buyse ME. Analysis of clinical trial outcomes: some comments on subgroup analysis. Controlled Clin Trials 1989;10:187S–94S.

Cohen J. Statistical power analysis for the behavioral sciences. Second ed. New Jersey: Lawrence Erlbaum Associates Inc.; 1988.

Fleiss JL. Analysis of data from multiclinic trials. Controlled Clin Trials 1986;7:267–75.

Frasure-Smith N, Lesperance F, Prince RH, Verrier P, Garber RA, Juneau M, et al. Randomized trial of home-based psychosocial nursing intervention for patients recovering from myocardial infarction. Lancet 1997;350:473–9.

Gail M, Simon R. Testing for qualitative interactions between treatment effect and patient subsets. Biometrics 1985;41:361–72.

Hacke W, Donnan G, Fieschi C, Kaste M, von Kummer R, Broderick JP, et al. Association of outcome with early stroke treatment: pooled analysis of ATLANTIS, ECASS, and NINDS rt-PA stroke trials. Lancet 2004;363:768–74.

Kostis JB, Davis BR, Cutler J, Grimm Jr RH, Berge KG, Cohen JD, et al. Prevention of heart failure by antihypertensive drug treatment in older persons with isolated systolic hypertension. SHEP Cooperative Research Group. JAMA 1997;278:212–6.

Marler JR, Tilley BC, Lu M, Brott TG, Lyden PC, Grotta JC, et al. Early stroke treatment associated with better outcome: the NINDS rt-PA stroke study. Neurology 2000;55:1649–55.

National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med 1995;333:1581–1587.

Peto DP. Statistical aspects of cancer trials. First ed. London: Chapman and Hall; 1982.

Piantadosi S. A comparison of the power of two tests for qualitative interaction. Stat Med 1993;12:1239–48.

Piantadosi S. Clinical Trials. A methodologic perspective. New York: John Wiley & Sons Inc; 1997.

Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 2002;21:2917–30.

Shuster J, Eys JV. Interaction between prognostic factors and treatment. Controlled Clin Trials 1983;4:209–14.

The NINDS t-PA Stroke Study Group. Generalized efficacy of t-PA for acute stroke. Subgroup analysis of the NINDS t-PA stroke trial. Stroke 1997;28(11):2119–2125.

The NINDS t-PA Stroke Study Group. Effect of intravenous recombinant tissue plasminogen activator on ischemic stroke lesion size measured by computed tomography. Stroke 2000;31:2912–2919.