Clinical Trials

16
http://ctj.sagepub.com Clinical Trials DOI: 10.1177/1740774507083434 2007; 4; 499 Clin Trials D.P. MacKinnon, C.M. Lockwood, C.H. Brown, W. Wang and J.M. Hoffman The intermediate endpoint effect in logistic and probit regression http://ctj.sagepub.com/cgi/content/abstract/4/5/499 The online version of this article can be found at: Published by: http://www.sagepublications.com On behalf of: The Society for Clinical Trials can be found at: Clinical Trials Additional services and information for http://ctj.sagepub.com/cgi/alerts Email Alerts: http://ctj.sagepub.com/subscriptions Subscriptions: http://www.sagepub.com/journalsReprints.nav Reprints: http://www.sagepub.com/journalsPermissions.nav Permissions: http://ctj.sagepub.com/cgi/content/refs/4/5/499 SAGE Journals Online and HighWire Press platforms): (this article cites 25 articles hosted on the Citations © 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.com Downloaded from

Transcript of Clinical Trials

http://ctj.sagepub.com

Clinical Trials

DOI: 10.1177/1740774507083434 2007; 4; 499 Clin Trials

D.P. MacKinnon, C.M. Lockwood, C.H. Brown, W. Wang and J.M. Hoffman The intermediate endpoint effect in logistic and probit regression

http://ctj.sagepub.com/cgi/content/abstract/4/5/499 The online version of this article can be found at:

Published by:

http://www.sagepublications.com

On behalf of:

The Society for Clinical Trials

can be found at:Clinical Trials Additional services and information for

http://ctj.sagepub.com/cgi/alerts Email Alerts:

http://ctj.sagepub.com/subscriptions Subscriptions:

http://www.sagepub.com/journalsReprints.navReprints:

http://www.sagepub.com/journalsPermissions.navPermissions:

http://ctj.sagepub.com/cgi/content/refs/4/5/499SAGE Journals Online and HighWire Press platforms):

(this article cites 25 articles hosted on the Citations

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

Clinical Trials 2007; 4: 499–513ARTICLE

The intermediate endpoint effect in logistic andprobit regression

DP MacKinnona, CM Lockwooda, CH Brownb, W Wangc and JM Hoffmand

Background An intermediate endpoint is hypothesized to be in the middle of thecausal sequence relating an independent variable to a dependent variable.The intermediate variable is also called a surrogate or mediating variable and thecorresponding effect is called the mediated, surrogate endpoint, or intermediateendpoint effect. Clinical studies are often designed to change an intermediate orsurrogate endpoint and through this intermediate change influence the ultimateendpoint. In many intermediate endpoint clinical studies the dependent variable isbinary, and logistic or probit regression is used.Purpose The purpose of this study is to describe a limitation of a widely usedapproach to assessing intermediate endpoint effects and to propose an alternativemethod, based on products of coefficients, that yields more accurate results.Methods The intermediate endpoint model for a binary outcome is described for atrue binary outcome and for a dichotomization of a latent continuous outcome.Plots of true values and a simulation study are used to evaluate the differentmethods.Results Distorted estimates of the intermediate endpoint effect and incorrectconclusions can result from the application of widely used methods to assess theintermediate endpoint effect. The same problem occurs for the proportion of aneffect explained by an intermediate endpoint, which has been suggested as a usefulmeasure for identifying intermediate endpoints. A solution to this problem is givenbased on the relationship between latent variable modeling and logistic or probitregression.Limitations More complicated intermediate variable models are not addressed inthe study, although the methods described in the article can be extended to thesemore complicated models.Conclusions Researchers are encouraged to use an intermediate endpoint methodbased on the product of regression coefficients. A common method based ondifference in coefficient methods can lead to distorted conclusions regarding theintermediate effect. Clinical Trials 2007; 4: 499–513. http://ctj.sagepub.com

Introduction

The purpose of this article is to describe whystandard statistical approaches for examining theintermediate endpoint or mediated effect usingprobit and logistic regression can be inaccurate and

how to correct them. The focus of this article is thecase in which the dependent variable is binary suchas dead/alive, presence/absence of disease, or use/nonuse of drugs. This article examines methodol-ogy for modeling as to how an active interventioncondition, compared to control, affects this

Author for correspondence: David P. MacKinnon, Ph.D., Department of Psychology, Arizona State University, Tempe,AZ 85287-1104, USA. E-mail: [email protected]

aDepartment of Psychology, Arizona State University, Tempe, Arizona, USA, bDepartment of Epidemiology andBiostatistics, University of South Florida and Center for Health Statistics, University of Illinois at Chicago, USA,cDepartment of Epidemiology and Biostatistics, University of South Florida, USA, dDepartment of RehabilitationMedicine, University of Washington, Seattle, Washington, USA

� Society for Clinical Trials 2007SAGE Publications, Los Angeles, London, New Delhi and Singapore 10.1177/1740774507083434

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

outcome through a continuous intermediate ormediating variable. Logistic or probit regression aretraditionally used to examine this mediation effect.Such models are ubiquitous in clinical trials wherethe effect of a dichotomous treatment variable on adiagnosis or other dichotomous outcome ismediated by an intermediate variable.

The notion of an intervening or mediatingvariable is well established in clinical studies of abinary disease outcome [1,2] where the mediatedeffect is termed the surrogate or intermediateendpoint effect. Surrogate endpoints are a subsetof mediator variables where the theoretical andstatistical requirements for a variable to be asurrogate endpoint are more stringent than thosefor mediators in general. Surrogates are expectedto explain the full impact on a distal outcome; amediator may explain part of this effect. Oneimportant use of an intermediate variable is insituations where the ultimate outcome is difficultor costly to obtain. In both prevention andtreatment trials, the extended length of time for adisease or death to occur and the low incidencerates often require exorbitantly large sample sizes tostudy correlates of the disease. In this situation,researchers advocate the use of surrogate or inter-mediate endpoints [3,4]. Surrogate endpoints aremore frequent or more proximate to the preventionstrategy and are therefore easier to study. Examplesof surrogate endpoints are serum–cholesterol levelsfor the ultimate outcome of coronary heart disease[5] , measures of immune system response for theultimate outcome of death in HIV infected indivi-duals [6], and presence of polyps for the ultimateoutcome of colon cancer [7]. The use of surrogateendpoints rests on the mediation assumption thatthe independent variable causes the surrogate end-point, which, in turn, causes the ultimate outcome[8,9]. As a result, decomposing the effects by whicha surrogate explains impact on a distal outcome iscritically important because the surrogate endpointis considered to be causally related to the outcome.The importance of surrogate endpoints wasrecently summarized by Begg and Leung ([8],p. 27), Above all else, we believe that the issue ofwhen and how to use surrogate endpoints isprobably the pre-eminent contemporary problemin clinical trials methodology, so it merits muchextensive scrutiny.

Prentice ([9] p. 432) defined a surrogate endpointas a response variable for which a test of the nullhypothesis of no relationship to the treatmentgroups under comparison is also a valid test ofthe corresponding null hypothesis based on thetrue endpoint. Freedman et al. [10] proposed theproportion of the treatment effect explained by anintermediate variable to test this surrogate end-point effect. A value of 100% indicates that the

surrogate endpoint explains all of the relationbetween the treatment and the dependent variable,therefore satisfying Prentice’s definition. If theproportion mediated is <100%, it indicates thatsome of the relation may not be explained by thisintermediate variable and other causal mechanismsmay be neglected [11]. The proportion measurereflects the size of the surrogate endpoint (i.e.,mediated) effect as well as the amount of thetreatment effect explained by the surrogate end-point. The proportion mediated has not beenaccepted without criticism [12], namely that accu-rate identification of surrogate endpoints requiresmeasurement of the ultimate outcome [8], that theproportion measure can be negative or >1 [13], andthat values of the proportion mediated are oftenvery small. Furthermore, other research has shownthat the proportion mediated tends to be unstableunless the sample size is large or the effect size isvery large [14,15]. In response to the limitations ofthe proportion measure, Buyse and colleaguesproposed two criteria for a surrogate endpoint(see [16,17] for more specific information on thesecriteria). The first criteria is that the total effectrelating the treatment to the dependent variabledivided by the effect of the treatment on themediator or surrogate variable, is close to 1. Thelogic here is that an intermediate endpoint shouldbe affected by a treatment to the same magnitudeas the treatment affects the dependent variable. Asecond criterion requires that the relation betweenthe mediator and the dependent variable is alsosubstantial. Both the proportion mediated and themeasures proposed by Buyse and colleagues areimportant because they combine a measure ofeffect size along with a measure of the mediatedeffect. These Buyse et al. criteria do not include anestimator of the intermediate endpoint effect sothey are not addressed in this article. Otherapproaches to identification of surrogate endpointsfocus on the importance of meta analytical combi-nation of results from individual studies [18,19]and more detailed approaches to causal inferencefrom one study [20]. The present article focuses onestimators of the intermediate endpoint effect froma single study.

In addition to replacing costly outcome mea-sures, intermediate variables can also be used toelucidate how an intervention leads to changes inthe outcome variable. A common example is in thefield of prevention research where programs aredesigned to change intermediate variables thatprevent negative outcomes. An example of testingmediation with binary outcomes was describedby Foshee and colleagues [21] in which aprogram to prevent adolescent dating violencewas hypothesized to work by changing norms,decreasing gender stereotyping, improving conflict

500 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

management, changing beliefs about the need forhelp, increasing awareness of services for victimsand perpetrators, and increasing help-seeking beha-vior. The researchers compared the values of thelogistic regression coefficient of treatment predict-ing the binary dating violence outcome with andwithout controlling for the proposed mediators.The authors considered evidence of mediation tooccur when the difference between the two valueswas 20% of the value of the unadjusted coefficient(i.e., 20% of the program effect was explained bythe mediators). Given this criterion, results indi-cated that the treatment effect on sexual violencewas mediated by changes in norms, gender stereo-typing, and awareness of victim services.

As described by several researchers, the examina-tion of mediation in the evaluation of preventionprograms and clinical trials is important for at leasttwo major reasons [22–24]. First, most trials arebased on changing mediating variables hypothe-sized to be related to the outcome measure. It isimportant to assess whether the program success-fully changed the mediating variables it wasdesigned to change. If the program was not ableto change the mediating variable, then it would notbe surprising that effects on outcomes were notobserved. Second, the extent to which change inone or more mediating variables account forobserved program effects sheds light on the theore-tical basis of the program. This information haspractical implications as well, because ineffectiveintervention components may be dropped if theydo not contribute to intervention success orintervention components may be added if theoriginal intervention only works through somebut not all hypothesized mediators. Statisticalmethods for assessing mediation for continuousoutcomes have been described [25,26], as well asmethods for specific statistical models such ascovariance structure models [27] and multilevelmodels [28]. Issues related to mediation analysisfor categorical outcomes have not received muchresearch attention.

Two general approaches have been offered toquantify mediation [23,24,29]. In the situationwhere the dependent and mediating variables arecontinuous and normal-based models are used,one way to quantify mediation is to compare theregression coefficient of the outcome on theindependent variable both without adjustment forthe mediator (�z in Equation (5)) and with adjust-ment for the mediator (�z in Equation (1)). If theunadjusted independent variable coefficient �z isnonzero and the adjusted coefficient �z is zerowhen the mediator is included in the model, theeffect of the independent variable is entirelymediated by the mediating variable. Standarderrors for the mediated effect are available [26]

and are generally accurate [15]. This is called thedifference in coefficients method.

A second general method, based on path analysisand more commonly used in the social sciencesfor continuous outcomes, treats mediation as theproduct of two regression coefficients, the regres-sion of the mediator on the independent variable(�z in Equation (2)) and the partial regressioncoefficient of the outcome on the mediator,adjusted for the independent variable (�m inEquation (1)) [30]. This method is called theproduct of coefficients method [26].

For standard least squares regression modelswithout missing data, the difference in coefficientsand the product of coefficients estimators areidentical [15]. For binary response variables, thetwo estimators are not identical and can differdramatically, as shown below. The most widelyused estimator of the standard error of thismediated effect was derived by Sobel [27] usingthe multivariate delta method under the assump-tion of normality (see MacKinnon et al. [26] forexamples of other standard errors).

The proportion mediated measure, 1� �z=�z(from Equations below), was suggested byFreedman et al. [10]; we note that this measure isnot necessarily between 0 and 1. Other possibledefinitions of proportion mediated measures are�m�z=�z and �m�z=ð�z þ �m�zÞ (from Equationsdescribed below). In the continuous dependentvariable case with no missing data, all threeproportion measures for the mediated effect areequal. Because logistic and probit regression arenonlinear models, the point estimates of these twoquantities are not equal. In this article we showsome of the estimates have low bias while othersrequire standardization procedures in order to beappropriate.

The mediated effect along with its standard errorcan be used to create confidence limits for themediated effect [15]. For dichotomous outcomes,Freedman et al. [10] proposed a variance estimatorof the difference in coefficients method. The resultsof this article suggest a preference for the product ofcoefficients method.

Complications with the estimation of theintermediate variable effect with a binarydependent variable

When the dependent variable is binary and logisticregression is used, the difference in coefficients andproduct of coefficients methods for calculating themediated effect are not equal as MacKinnon andDwyer [24] demonstrated with a simulation study.This article will examine estimation methods formediation analysis with binary outcomes, describe

Intermediate end point effect 501

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

a problem that arises with the estimation, andpropose a solution. The solution is demonstratedwith analytical methods and a simulation studythat investigates both small sample properties androbustness to distributional assumptions. Finally,we present a case study of a randomized preventivetrial to prevent later cigarette use in adolescence.

Estimation of the mediated effect withbinary outcomes

Let Z be a binary variable representing interventionstatus (1 for active intervention and 0 for control),and let the binary Y be the major outcome variable(diagnosis, death, or improvement). Define X to bep-dimensional covariates measured at baseline. LetM be the hypothesized surrogate variable. In mosttrials, M is measured with a continuous variable,rather than a dichotomous one, since the formerprovides much greater statistical power. For bothpractical and statistical reasons, we consider thecase of a continuous M variable, with normal ornon-normal errors conditional on interventioncondition.

As in the case of a continuous dependentvariable, the mediated effect can be calculated intwo ways when the dependent variable is binary[24]. To calculate the mediated effect both methodsuse information from two of the three equationslisted below. In this model we assume no treatmentby baseline interaction or moderation.

First, Equation (1) shows a logistic regressionmodel where Y depends on M as well as on baselinecovariates X and intervention condition Z1

logit PrfY ¼ 1jX ¼ x,M ¼ m,Z ¼ zg

¼ �0 þ c0xxþ �mmþ �zz, z ¼ 1, 0 ð1Þ

where the � coefficients refer to intercept (c0),covariates (c0x), and intervention (cz) after adjust-ment for the mediator. For probit regression, asimilar expression exists, with logit replaced by theinverse distribution function for a standard normal.

Second, a linear relationship for the relationbetween Z and M is assumed.

M ¼ �0 þ �0xxþ �zzþ "m ð2Þ

Eð"mjx; zÞ ¼ 0; z ¼ 0;1 ð3Þ

The � coefficients refer to the prediction of M(ignoring Y completely). Like the previous model,this model assumes no baseline by interventioninteraction. Also, most analytic models impose adistributional assumption on the error, such asnormality with a homogeneous variance. Thisdistributional assumption is more restrictive thanthe conditional mean specifications given inEquations (2) and (3). To specify this exactdistribution, fMjXZðmjx; zÞ refers to the conditionaldensity of M given covariates X ¼ x and interven-tion condition Z ¼ z.

The subscript i is used to indicate subject levelcovariates (Xi), intervention assignment (Zi), med-iator (Mi), and distal outcome (Yi), for subjectsi ¼ 1, . . . ,n. It is assumed that each mediator isindependent of all other mediator outcomes andis given by Equations (2) and (3) conditional on allcovariates and intervention assignments. Also,given all covariates, intervention assignments,and mediators, the distal outcomes Y1, . . . ,Yn areall independent and follow the distribution speci-fied in Equation (1).

Under the above assumptions, as sample sizeincreases, asymptotically unbiased estimates of �’sand � 0s can be estimated through standard leastsquares and logistic regression respectively. Further,the marginal relationship between Z and Y, condi-tional on X but unconditional on M, is given by

PrfY ¼ 1jX ¼ x,Z ¼ zg ¼

Z 1

�1

� ð1þ e��0�c0xx�cmm��zzÞ�1fMjXZðmjx, zÞdm,

z ¼ 1, 0:

ð4Þ

Equation (4) shows the true relation between Z andY, conditioning on multiple covariates X but notdepending on M. This relation is called themarginal relation between Z and Y in this article.Without knowing the explicit form of fMjXZ, theabove integral cannot be evaluated explicitly, andregardless of the form of this error distribution,there is no simple form for this integral. Instead,researchers often introduce a standard logit modelfor this marginal relationship between Y and X andZ. In particular, consider the following model:

logit PrfY ¼ 1jX ¼ x,Z ¼ zg

¼ �0 þ b0xxþ �zz, z ¼ 1, 0 ð5Þ

1In the social science literature four symbols are often used to express the mediation equations. Our notation in thisarticle allows other covariates and is more in line with notation used in standard logistic regression modeling. Thisfootnote shows their comparability. The symbol a, the regression of the mediator on the intervention condition, is givenby �z in this paper’s notation. The symbol b, the partial regression of the outcome on the mediator adjusted for theintervention condition, is given by �m in our notation. The symbol c for the regression of the outcome on theintervention condition without adjusting for the mediator is given by �z in our notation. Finally, c0, the partial regressionof the outcome on the intervention condition adjusted for the mediator, is given by �z.

502 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

Equation (4) reduces to the logit form given inEquation (5) only if there are no covariates [31].When there are no covariates, the binary nature ofZ allows specification of these parameters in astandard logit model. Even in this case of nocovariates, however, there is no analytic way toobtain the exact value of �z from the � and �coefficients; the integral needs to be evaluatednumerically or approximated.

Quantifying mediation using the difference ofcoefficients method

There are two equivalent methods for examiningthe degree of mediation, difference in coefficientsand product of coefficients methods. As will beshown below, these two methods are not the samefor binary outcome models. A natural method forquantifying mediation is to compare the coeffi-cients for the marginal relationship between Z andY, to that adjusting for M. No difference in thesecoefficients implies that knowledge of M plays norole in predicting how intervention affects thedistal outcome. This method has been proposedfor dichotomous outcomes as well, but applied tothe approximate marginal model Equation (5)rather than the exact marginal model given byEquation (4). Thus this straightforward (but as wewill see inaccurate) difference in coefficientsmethod is based on the parameter difference,�z � �z. It is estimated by fitting two separatelogistic regression models, one using covariatesand intervention status as predictors, the otherincluding the mediator as well. We know of noapplied research examples that use the difference incoefficients method with the correct marginalrelationship, Equation (4).

Quantifying mediation using the product ofcoefficients method

The second method for assessing mediation incontinuous outcome models relies on a pathdiagram approach. Mediation depends on theproduct of the relation between intervention andmediator, and the relation between mediator anddistal outcome, adjusted for intervention. A directapplication of this method to the regression modelof M on Z and the logistic regression model of Y onboth M and Z leads to examination of �z�m.

Mediation using latent variables

Since there is equivalence between the difference incoefficient and product of coefficient methods with

linear, continuous models, it is useful to embed theproblem within a continuous model. Investigationof the nonequivalence of the methods clarifies howthe two approaches differ statistically and concep-tually. The equivalence of the two methods isdemonstrated using an alternative representationof these logistic regression models in terms of anunderlying latent variable, an approach that goesback at least to Finney [32].

Equation (1) can be expressed equivalently byintroducing the unobserved latent variable Y* thatis linearly related to Z and M as well as covariates X.

Y� ¼ �0 þ c0xxþ �mmþ �zzþ "y, z ¼ 1, 0: ð6Þ

In this model we let "y represent the residualvariability having a standard logistic distribution,that is,

Prf"y<ug ¼eu

1þ eu: ð7Þ

We also assume that this error in predicting Y* isindependent of the error predicting M from Zin Equation (2). The dichotomous, observable Yis derived from Y* through the relation Y¼1 ifand only if Y* > 0. This definition of Y in termsof Y* and the logit error distribution togetherproduce a model that is identical to that ofEquation (1).

In Equation (6), we can substitute the expressionfor M in terms of Z that is explicit in Equation (2),obtaining

Y� ¼ �0 þ c0xxþ �mð�0 þ a0xxþ �zzþ "mÞ þ �zzþ "y

¼ �0 þ �m�0 þ ðcx þ �maxÞ0xþ ð8Þ

ð�z þ �m�zÞzþ ð9Þ

�m"m þ "y ð10Þ

The next to last line above shows that the meandifference on Y* for active intervention versuscontrol when not adjusting for M is ð�z þ �m�zÞ.The mean difference on Y* when adjusting for M is�z. Thus the product of coefficients, �m�z isidentical to the difference in coefficients on theY* scale. This underlying equivalence of the twomethods when applied to the mean of Y*holds regardless of the distribution of "m as longas these errors have zero conditional means as inEquation (3).

Estimation and variance of mediated effects withbinary outcomes

This section shows that the estimates based onthe product of coefficients method produce

Intermediate end point effect 503

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

asymptotically unbiased estimates of the mediatedeffect while those derived from the difference incoefficients method must be scaled properly inorder to reduce zero-order bias. We also show thatvariance estimates from the General EstimatingEquation (GEE, see [33]) approach can be used toform confidence intervals and testing. Appendixshows the consistency of the product of coefficientsmethod for logistic regression when YjXMZ is givenby Model 1 and the conditional mean of MjXZ islinear as in Equations (2) and (3). Appendix A alsoshows the asymptotic independence of � and �z.

Standardizing mediated effects based on thedifference of coefficients method

In this section, it is shown that the ordinaryestimates obtained from a logistic regression of Yon X and Z based on Equation (5) are not consistentfor �z þ �m�z. We start from the true latent variablerepresentation in Equations (8)–(10). Then

PrfY ¼ 1jX ¼ x;Z ¼ zg

¼ Prf�m"m þ "y>� ð�0 þ �m�0Þ � ðcx þ �maxÞ0x

� ð�z þ �m�zÞzg ð11Þ

Unless �m ¼ 0, this probability function will nothave a logistic form. More importantly, because thevariance of �m"m þ "y is larger than that of astandard logistic error, all of the logistic regressionestimates of the intercept and slopes for X and zwill be biased. It is possible to make adjustments forthis higher variance, noting that a standardlogistic regression error has varianceVarð"yÞ ¼ �2=3 [35] while the marginal variance isVarð�m"m þ "yÞ ¼ �2

mVarð"mÞ þ �2=3. Thus if we firstperform logistic regression of Y on X and z, thenmultiply all the logistic regression estimates,including �z by this factor,

�logit:standardz ¼ �z

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi3ð�2

mdVarð"mÞ þ �2=3Þ=�2

q; ð12Þ

the coefficient will at least be scaled properly. In thetables below we refer to this as the standardizedlogistic solution and investigate how this simpleadjustment compares with the unstandardizedsolutions. Similarly, the proportion mediated canthen be estimated by 1� �z=�

logit:standardz .

For probit regression, the residual variance isfixed to one while Varð�m"m þ "yÞ ¼ �2

mVarð"mÞ þ 1.Thus the standardized probit solution is to adjustthe routine probit regression estimate of �z by

�probit:standardz ¼ �z

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi1þ �2m

dVarð"mÞq: ð13Þ

When using the standardized estimate of �z, thestandard error of the estimate needs to be scaled aswell. Bootstrap resampling can also be used toobtain the standard error.

Simulation study

A simulation study was conducted to demonstratethe discrepancy between �z � �z and �m�z and toinvestigate the performance of different estimatorsfor the proportion mediated. The mean andvariance of different estimators of the mediatedeffect and the proportion mediated were investi-gated as a function of sample size and size ofrelations among variables. In addition, the simula-tion allows the exploration of robustness to boththe logistic assumption in Equation (4) and to thenormality assumption for the distribution of MjZ.

The Z variable was always dichotomous withequal numbers in each group, and the mediator Mwas either normally distributed, distributed as at-distribution with three degrees of freedom, orhaving large skewness in the error; these weregenerated using a chi-square distribution on threedegrees of freedom. This latter situation was chosento mimic the primary departure from normalitythat we observed in our actual data example below.For all of these cases we still apply least squares sothat we can examine the impact of misspecifyingthe error distribution for MjZ.

The outcome variable Y*jMZ was generated tohave either an underlying logistic or a normal errordistribution. Thus with the assumption of a logiterror, YjMX follows Equation (1) with a similarprobit model for normal errors. Estimates werecompared for both logistic and probit regressionwith underlying probit errors so that we can assessthe impact of misspecifying the conditional modelfor YjMX. Simulated sample sizes were 50, 100, 200,500, 1000, and 5000. Parameter values for �z; �m,and �z were chosen to correspond to four differenteffect sizes ([36], p. 412–414) of small (2% of thevariance), medium (13% of the variance), large(26% of the variance), and very large (40% of thevariance). These parameter values correspond to0.14, 0.39, 0.59, and 1, respectively in a populationmediation model with continuous variables. Foreach of these 64 combinations of parameter valuesand all six sample sizes used in the simulation,there were 500 replications. In addition, for everycombination of parameter values we computed onereplication for a sample size of 20 000, using this asthe true expected value for infinite sample size.

The accuracy of the point estimates were com-pared to the parameter values generating the latentY* data. Also, by comparing the means across

504 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

different estimators for the largest sample size(20 000), we can examine the variation in popula-tion averages of the different parameterizations.Analysis of Variance was used to estimate effects ofparameter value and sample size in order tosummarize results. To conserve space, details ofthe analysis of variance are not presented.

Graphical comparison of the difference incoefficient and product of coefficient methods

We begin with a graphical comparison of theproduct of coefficients with both standardizedand unstandardized difference in coefficient meth-ods, all derived under a correctly specified model.For logistic regression, a comparison of the unstan-dardized �z � �z, the standardized �

logit:standardz � �z,

and the product of coefficients �m�z estimates ofmediation effects is shown in Figure 1 for the casewhere MjZ is normally distributed. In this figure,the large sample expected values, obtained through500 simulations of the three estimators of themediated effect with n ¼ 20000, are plotted as afunction of increasing values of the �m coefficient,averaging across values for the relation between theindependent variable and the mediator (�z). As therelation between the mediator and dependentvariable gets larger, the size of the expectedmediated effect should increase linearly becausethis relation is one of the components of themediated effect. As shown in Figure 1, withincreasing values of the �m parameter, all threeestimators increase at different rates. While themean of the estimated product of coefficientsmeasure, �m�z, has the highest rate, the differencein coefficients measure, �z � �z, tends to flatten out

at larger values of �m. Also shown in this figure, thestandardized measure for �

logit:standardz � �z is very

close to the �m�z values. This result demonstratesthe distortions possible when using the unstandar-dized difference in coefficients method. A similarpattern in the three estimates exists for probitregression and for other combinations of logit andprobit error and logistic and probit regression.

Figures 2 and 3 demonstrate problems with the�z � �z method of estimating mediation. Figure 2shows the value of �z � cz as function of cz for thecase where the true model holds �z at zero and

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Med

iate

d ef

fect

0.14 0.39 0.59 1gm

gmaz

bz − gz

bzlogit.standard − gz

^

^ ^

^ ^

^

Figure 1 Average values of mediated effect estimates: logistic

regression

−0.08

−0.06

−0.04

−0.02

0.00

Med

iate

d ef

fect

0.14 0.39 0.59 1

gm

gz az

bz − gz

^ ^

^ ^

Figure 2 Estimated mediated effects as function of �m when

�z is held at 0: logistic regression

−0.05

0.00

0.05

0.10

Med

iate

d ef

fect

0.14 0.39 0.59 1gm

gz = 0.14gz = 0.39gz = 0.59gz = 1

Figure 3 Estimated mediated effect �z � �z as function of �mand �z with �z¼0.14: logistic regression

Intermediate end point effect 505

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

allows �z to vary. The true mediated effect is zerofor all conditions in the plot and the �z�m estimatorof the mediated effect correctly has a value of zerofor all values while the �z � �z estimate departsfrom zero and becomes more negative as �zincreases. Figure 3 shows similar information forthe two estimators with �z held at 0.14 and �mvaried so that the mediated effect should increaseas �z is increased. The value of �z � �z varies asfunction of the value of �m and for �z equal to 1,�z � �z increases slightly and then declines eventhough the true mediated effect is increasing.

Numerical comparison of mediation measuresunder the correct and incorrect models forthe dependent variable

The first two tables are based on logistic and probitmodels with each fit using the appropriate model.

The third table fits a logistic model to datagenerated from a probit model. For Tables 1–6below, we provide three rows for comparison. Thefirst two rows compare the replication averages topopulation quantities whereas the last row exam-ines variability due to replications. The first row,labeled Mean (�n), provides a mean value of theestimates for each sample size (n), averaged over the64 combinations of parameter values. Since all ofthe parameter values chosen are positive, a meanvalue that is smaller than those used to generate thedata, labeled in the table as �, indicates attenua-tion. Changes across this row reflect sample sizebias. The second row, labeled Standard Deviation(�n) , measures the average standard deviation ofthe replication-averaged means over the 64 differ-ent parameter value simulations at that sample size.These values can be compared to the true standarddeviation across the 64 sets of parameter values (�).The last row, labeled Average Squared Distance,

Table 1 Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under a correct

logistic and correct normality assumption

Sample size

50 100 200 500 1000 5000

�m�z �¼0.281, �¼0.258

Mean (�n) 0.314 0.295 0.289 0.285 0.283 0.281Standard deviation (�n) 0.439 0.342 0.298 0.272 0.265 0.258

Average squared distance 0.111 0.045 0.020 0.007 0.004 0.001

�z � �zMean (�n) 0.214 0.213 0.214 0.214 0.213 0.213Standard deviation (�n) 0.356 0.270 0.234 0.213 0.206 0.201

Average squared distance 0.080 0.031 0.014 0.005 0.003 0.001

�logit:standardz � �z

Mean (�n) 0.289 0.271 0.266 0.262 0.260 0.259

Standard deviation (�n) 0.420 0.320 0.277 0.253 0.245 0.239

Average squared distance 0.105 0.040 0.018 0.007 0.003 0.001

Table 2 Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under a correctprobit and correct normality assumption

Sample size

50 100 200 500 1000 5000

�m�z �¼0.281, �¼0.258Mean (�n) 0.321 0.295 0.289 0.285 0.283 0.281

Standard deviation (�n) 0.428 0.323 0.288 0.269 0.263 0.257

Average squared distance 0.096 0.031 0.014 0.005 0.003 0.001

�z � �zMean (�n) 0.149 0.156 0.159 0.161 0.160 0.160

Standard deviation (�n) 0.301 0.222 0.194 0.180 0.174 0.170

Average squared distance 0.057 0.019 0.008 0.003 0.002 0.000

�probit:standardz � �z

Mean (�n) 0.326 0.297 0.290 0.285 0.283 0.281

Standard deviation (�n) 0.445 0.326 0.290 0.269 0.263 0.257

Average squared distance 0.108 0.033 0.015 0.005 0.003 0.001

506 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

measures the average squared distance of theestimate from the expected value of the estimate.The expected value of the estimate was obtainedthrough 500 simulations with n ¼ 20000.

Table 1 compares the three estimators ofmediated effect when the underlying data areboth generated and modeled with a logistic dis-tribution. The distribution of MjZ is generated fromthe normal and modeled as normal. For theproduct of coefficients method which is listedfirst, the first row shows there is very little overallbias for large samples (0.281) and very modest bias

for small samples (0.314), compared to the truepopulation mean (0.281). In terms of variability ofthese estimates across the sets of parameters, thesecond row shows substantial variation at samplesize of 50 (�n ¼ 0:439 relative to � ¼ 0:258), butminimal difference in these estimates with samplesizes of 500 or above. The last row indicates that thereplication variances for the product of coefficientsmethod decrease smoothly to zero.

The unadjusted difference in coefficients methodexhibits several unfavorable behaviors. The first rowshows it produces systematic attenuation, with

Table 3 Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under an incorrect

logistic and correct normality assumption

Sample size

50 100 200 500 1000 5000

�m�z �¼0.475, �¼0.442

Mean (�n) 0.548 0.501 0.490 0.482 0.478 0.475Standard deviation (�n) 0.741 0.560 0.499 0.464 0.454 0.444

Average squared distance 0.288 0.095 0.042 0.016 0.008 0.002

�z � �zMean (�n) 0.218 0.229 0.234 0.236 0.235 0.235Standard deviation (�n) 0.512 0.369 0.320 0.294 0.285 0.278

Average squared distance 0.168 0.055 0.024 0.009 0.004 0.001

�probit:standardz � �z

Mean (�n) 0.483 0.435 0.424 0.416 0.413 0.410

Standard deviation (�n) 0.707 0.508 0.448 0.414 0.403 0.394

Average squared distance 0.284 0.082 0.037 0.013 0.007 0.001

Table 4 Comparison of the product of coefficients and difference in coefficients estimators for the proportion mediated estimatorsunder a correct logistic assumption and correct normal assumption

Sample size

50 100 200 500 1000 5000

�m�z=ð�z þ �m�zÞ �¼0.343, �¼0.227Mean (�n) �2.4�1011 3.8�1011 1.3�1010 0.358 0.280 0.349

Standard deviation (�n) 1.7�1013 4.4�1013 2.3�1012 13.508 13.144 0.247

Average squared distance 3.0�1026 1.9�1027 5.2�1024 182.396 172.716 0.009

1� �z=�zMean (�n) 0.243 0.311 0.412 0.340 0.320 0.304

Standard deviation (�n) 3.862 6.659 8.494 4.905 1.142 0.251

Average squared distance 14.872 44.256 72.060 23.981 1.243 0.010

�m�z=�zMean (�n) 0.360 0.407 0.501 0.427 0.403 0.386

Standard deviation (�n) 3.845 6.682 8.428 5.084 1.145 0.284

Average squared distance 14.716 44.532 70.926 25.743 1.231 0.0111� �z=�

logit:standardz

Mean (�n) 0.288 0.335 0.435 0.369 0.350 0.335

Standard deviation (�n) 3.414 6.107 7.776 4.584 1.116 0.248

Average squared distance 11.618 37.214 60.387 20.934 1.185 0.010�m�z=�

logit:standardz

Mean (�n) 0.314 0.360 0.460 0.395 0.375 0.360

Standard deviation (�n) 3.403 6.115 7.708 4.715 1.114 0.258

Average squared distance 11.535 37.314 59.324 22.151 1.175 0.010

Intermediate end point effect 507

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

�5000 ¼ 0:213 compared to m ¼ 0:281. Also, thesemeans are nearly constant over the different samplesizes. Variability across these 64 replications in theaverage difference in coefficients estimates shows asimilar pattern of decrease with sample size com-pared to the product of coefficients method.

Standardizing the difference in coefficientsunder a logistic model does substantially reducethe attenuation, as indicated by the first row in thethird part of this table. Also, �n is quite appropriategiven the 8% attenuation that still occurs for asample size of 5000.

Table 2 shows the same comparisons of mediatedeffect when the underlying data are both generatedand modeled with a probit distribution. Again, thedistribution of MjZ is taken from the normal andmodeled as such. As shown in the three parts inTable 2, the �m�z and �z � �z estimators are notequal but standardization makes �z � �z very closeto �m�z.

The product of coefficients method behavesjust as well under a probit model as it does undera logit model. There is very little bias (row 1), andvariability in these estimates behaves quite well forlarge and small sample sizes (row 2).

Examination of the unadjusted difference incoefficients method in Table 1, leads to a verydifferent conclusion. First, there is large attenua-tion of this method (�5000 ¼ 0:160 compared tom ¼ 0:281), even more than under a logistic model.Also, compared to the product of coefficients, thereis larger variance as a function of sample size(�50=�5000 ¼ 1:76) for this difference in coefficientsmethod. However, standardization succeeds verywell, making each estimae of �

logit:standardz � �z nearly

identical to those of �m�z.The sensitivity of these estimates to the assump-

tion of a logistic distribution is investigated inthis section. Table 3 shows the behavior of thedifferent estimators for the mediated effect when

Table 5 Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under an incorrect

logistic and incorrect normality assumption (probit and t(3))

Sample size

50 100 200 500 1000 5000

�m�z �¼0.474, �¼0.440

Mean (�n) 0.555 0.510 0.492 0.480 0.476 0.475Standard deviation (�n) 0.762 0.579 0.506 0.466 0.451 0.443

Average squared distance 0.330 0.116 0.049 0.018 0.009 0.002

�z � �zMean (�n) 0.317 0.314 0.310 0.307 0.306 0.306Standard deviation (�n) 0.532 0.398 0.352 0.326 0.317 0.311

Average squared distance 0.168 0.051 0.022 0.008 0.004 0.001

�logit:standardz � �z

Mean (�n) 0.597 0.538 0.514 0.500 0.496 0.495

Standard deviation (�n) 0.828 0.618 0.541 0.497 0.483 0.475

Average squared distance 0.386 0.124 0.052 0.019 0.010 0.002

Table 6 Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under an incorrectlogistic and incorrect normality assumption (probit and �2(2))

Sample size

50 100 200 500 1000 5000

�m�z �¼0.484, �¼0.447Mean (�n) 0.588 0.528 0.508 0.492 0.487 0.484

Standard deviation (�n) 0.928 0.610 0.529 0.475 0.459 0.449

Average squared distance 0.582 0.139 0.059 0.021 0.010 0.002

�z � �zMean (�n) 0.418 0.386 0.377 0.368 0.365 0.364

Standard deviation (�n) 0.630 0.460 0.411 0.377 0.368 0.361

Average squared distance 0.215 0.062 0.027 0.010 0.005 0.001

�logit:standardz � �z

Mean (�n) 0.891 0.674 0.624 0.596 0.587 0.582

Standard deviation (�n) 2.110 0.885 0.692 0.613 0.590 0.577

Average squared distance 3.780 0.337 0.097 0.031 0.014 0.003

508 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

the underlying data are generated with a probitdistribution but analyzed with logistic regression.We note that these two distributional assumptionsare difficult to distinguish in practice unless largeamounts of data are available. Thus methods thatbehave differently under these logistic and probitassumptions are not robust against this fairlyinnocuous change in models. Again, the distribu-tion ofMjZ is taken from the normal andmodeled assuch.

Unlike the two previous tables where definitionsof mediation could be computed directly from theparameters themselves, this table’s true mediatedeffects had to be defined numerically based onfinding a best-fitting linear logistic model appliedto data generated by a linear probit model. Wefound m ¼ 0:475 and � ¼ 0:442.

As shown in Table 3, the three methods producequite different results, with the difference incoefficients having the smallest values, the productof coefficients method having the largest values,and the standardized method having a mean 13%smaller than the first. For large sample sizes, theproduct of coefficients methods agrees very closelywith the mean over the 64 sets of parameter values(�5000 ¼ 0:475). We do, however, observe in row 1of this table a relatively large bias of order

ffiffiffiffiffiffiffiffin�1

p.

This overestimation in small samples under anincorrect logit assumption is somewhat larger thanthe overestimation when the true probit model isused. In contrast, the unadjusted difference incoefficients method shows severe attenuation atlarge samples, even more than it did under acorrectly modeled probit. The unadjusted differ-ence in coefficients method produces a modestnegative

ffiffiffiffiffiffiffiffin�1

pbias making attenuation larger in

smaller sample sizes.

Simulation for the proportion mediated measures

We next compared five different estimators ofthe proportion mediated, using the product ofcoefficients and unstandardized and standardizedestimates of difference in coefficients. Note thatthe first estimator is based on coeffients thatneed no standardization; because the second andthird estimators use �z in the denominator, weanticipate that they will contain bias. The fourthand fifth estimators use the standardized �z inthe denominators and because of this we expecta priori that they will do nearly as well as thefirst. For Table 4 the dependent variable isgenerated from a logit and modeled using logisticregression, with the mediator having a normaldistribution and modeled as such. The true meanof the 64 simulations for the proportionmediated is 0.343 and � ¼ 0:227. Both of these

agree well with the �5000 and �5000 for the firstestimator, �z�m=ð�z�m þ �zÞ. However, this estima-tor performs extremely poorly for samples of 500or smaller since at these sample sizes thevariability in the estimates is extreme. This iscaused by the denominator occasionally beingnear zero. For the second and third estimators,which are both based on unstandardized �z, bothare severely biased in large samples and aretherefore not recommended. The last two esti-mators, which are based on denominators withstandardized �z both have modest bias in largesamples (�5000 ¼ 0:360 and 0.335, respectivelyversus a true value of 0.343) and agree wellacross all 64 simulations, as indicated by the factthat �5000 is only modestly larger than � ¼ 0:227.Somewhat surprisingly, both of these estimatorsperform better than the first in small samples,although there is still substantial variabilityacross replications.

The results for probit regression under a trueprobit model mirror these results for a correctlogistic regression and therefore are not presentedin tabular form but only described. Again, the largesample fits using �z�m=ð�z�m þ �zÞ are the best,followed by the last two estimators that usestandardized �z. The two nonstandardized modelshave very large bias in large samples. Also as in theprevious case with logistic regression modeling atrue logit model, there is extremely high instabilityof the first estimator for sample sizes of 500 orbelow. We conclude that the measure of theproportion mediated is inherently difficult toestimate in small to moderate sized samples.

We now turn to estimation of the proportionmediated when the data are generated under aprobit but analyzed incorrectly with logisticregression. As the results are similar to the twocorrectly analyzed logistic and probit regressionsjust described, no table is shown. Again, largesample bias for �z�m=ð�z�m þ �zÞ is negligible, butthis estimator is still subject to extreme variancein small to moderate samples. The last twoestimators with standardized �z in the denomi-nators perform much better in small samples andshow only a small amount of bias with a samplesize of 5 000. However, both of these estimatorsthat rely on unstandardized �z show substantialbias in large samples and cannot berecommended.

Simulation results when both conditionaldistributions are incorrectly specified

To investigate how sensitive results are to theunderlying distributional assumptions, we inves-tigated the behavior of the three estimators of

Intermediate end point effect 509

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

mediation effect in two situations where theconditional distributions of Y and M were bothmisspecified. Again, for both of these simulationswe assumed a true probit model but used logisticregression to model how Y depends on M and Z.In the first simulation described below, weassumed that MjZ was normal, but the trueerror distribution for the mediator was given at-distribution with three degrees of freedom. Thatis, M ¼ �zðZ � 3=2Þ þ T3=

ffiffiffi3

p; z ¼ 1;2 with the

scale factor chosen so that the variance of theerror term is one. This symmetric error distribu-tion with large but finite variance is often usedin studies of robustness to a small proportion oflarge errors. The second simulation was generatedto resemble the data in our example describedbelow. These data had substantial skewness(�1 � 2). In our simulation, the mediator wasgenerated as M ¼ �zðZ � 3=2Þ þ ð�2

2 � Eð�22ÞÞ=2 so

that the skewness in the data were exactly 2.The errors were weighted so that the standardizedregression coefficients for the mediator wereagain small (0.14), medium (0.39), and large(0.59) to match the earlier simulations.

In Table 5 , where the t-distribution is incorrectlymodeled as normal, the product of coefficientsmethod performs quite well. In fact, the simulationresults for this estimator are nearly identical to theones obtained in Table 3 where conditional nor-mality for M was correctly assumed. In fact, themeans and variances are almost identical for theproduct of coefficients method (0.475 versus 0.475for �5000 and 0.443 versus 0.444 for �5000). There isalso very good agreement between the product ofcoefficients method and the standardized differ-ence in coefficients method across all sample sizes,much better than that in Table 3. Both of thesemethods show positive bias at smaller sample sizes.Just as before, the unstandardized difference incoefficients method shows substantial bias in largesamples and little change in bias as a function ofsample size.

In Table 6, with an asymmetric error distribu-tion, we anticipate some instability of theseestimators because they are computed using twowrong distributional assumptions. Only the

product of coefficients method provides verylittle overall bias. In particular, �5000 ¼ 0:484,which agrees very well with value of�5000 ¼ 0:475 from Table 3 and 0.475 fromTable 5, the two previous models with the sameincorrectly specified logistic model. There issomewhat more variation in these estimateswith a asymmetric error distribution than in theprevious two cases (�5000 ¼ 0:577 compared to0.444 when normality is correctly assumed).These differences, however, are quite minimalcompared to the instability of both the unstan-dardized and standardized difference in coeffi-cients methods for the normality, long-tailed,and asymmetric error distributions. In particular,�5000 varies from 0.235 to 0.364 for the unstan-dardized difference in coefficients and from 0.410to 0.582 even with standardization.

Case study with intentions as amediator for cigarette use

Mediation for a binary dependent variable isillustrated here with a data set from a school-based drug prevention program. The MidwesternPrevention Project (MPP) was a longitudinalschool- and community-based drug preventionprogram. The intervention program was introducedto 6th and 7th grade students in Kansas City andconsisted of school and parent programs, healthpolicy changes and mass media coverage (for fulldetails see Pentz et al. [37]). Schools were randomlyassigned to treatment or control conditions. In thisexample we posit that behaviors are preceded bybehavioral intentions, and test whether intentionto use cigarettes is a mediator of the program effecton cigarette use. The data set consists of 864students with complete data on three variables: Ztreatment condition; M intention to use tobacco inthe following 2-month period, measured approxi-mately 6 months after baseline; and Y report oftobacco use at follow-up in the prior one monthperiod. Table 7 contains the cross-tabulations ofthese three variables, as well as the coding schemes.

Table 7 Example data set from Midwestern Prevention Project (N¼864)

Mediator Treatment group (1) Control group (0)

Intention to use cigarettes Cigarette use (1) No use (0) Cigarette use (1) No use (0)

Yes (4) 19 9 20 9

Probably (3) 11 15 20 14

Don’t think so (2) 11 43 13 36No (1) 32 353 30 229

510 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

The mediated effect was estimated usingEquation (11) and least squares regression of M onZ. When calculated as �z�m , the mediated effectwas �0.171 ( �2

�z�m¼ :065, CI¼�0.298, �0.044).

When calculated as �z � �z the mediated effect was -0.129 (��z��z ¼ 0:089, CI¼�0.303, 0.046), a discre-pancy of 25% and a formal difference in whether asignificant mediation was found. This discrepancyis also present in the various proportion mediatedmeasures, with 1� �z=�z ¼ 0:254, �z�m=�z ¼ 0:338,and �z�m=�z þ �z�m ¼ 0:312. Although the propor-tion mediated estimates in this example are all>0.20, the proportion mediated estimates using the�z�m estimate are 20 to 30% greater than the size ofthe estimate using �z and �z.

Equation (12) was used to standardize the �zlogistic regression estimate. After standardization,the difference in coefficient mediated effect and thetwo proportion measures involving proportionswere recalculated. The standardized estimate ofthe difference in coefficients estimate was �0.197,much closer to the product of coefficients estimate.The standardized proportion measures are1� �z=�z ¼ 0:343 and �z�m=�z ¼ 0:297. Theseresults are also consistent with the simulationstudy; standardization brings the estimates muchcloser together. In this example, the use ofthe unstandardized estimates would lead to theconclusion that the mediated effect was smallerthan it truly was.

Discussion

The purpose of this article was to examine twodifferent approaches to estimating the intermediateendpoint or mediated effect when the dependentvariable is binary and logistic or probit regression isused. Unlike the linear model situation, thedifference in coefficients and product of coeffi-cients estimators can lead to substantially differentestimates and inferential conclusions. We recom-mend using the product of coefficients methods forthe following reasons. It generally has less bias thanthe difference in coefficients method, it reflects thepopulation mediated effect, and in our simulationsit was quite robust against departures from thelogistic or probit assumption as well as the normal-ity assumption for the distribution of the mediator.This estimator behaved well under both a sym-metric error distribution with heavy tails as well asan asymmetric error distribution. Another advan-tage of this method is that it is the easiest tocompute.

We do not recommend the use of the differencein coefficients method. The unstandardized proce-dure can indicate no change in the mediated effect

when it is actually increasing. This effect is due tothe difference in scales of the logistic regressioncoefficients of Y on Z adjusting or not adjusting forM as these coefficients come from separate logisticregression equations. There may be situationswhere the mediated effect is defined in terms of adifference in coefficients so the method may beappropriate. The same problem also affects theproportion mediated measures based on this differ-ence in coefficients method. Standardizing coeffi-cients prior to calculating these quantities doesimprove the performance of the difference incoefficients method.

None of the five estimators for the proportionmediated effect were fully satisfactory. The methodbased on the product of coefficients performed wellin samples over 500 both in terms of bias androbustness to the probit assumption, but thisestimator showed a high degree of variability insmall samples. The two estimators based on theunstandardized difference in coefficients methodshowed marked bias even in large samples. Bystandardizing the denominator for these twoestimators, there was a marked decrease in biascompared to the unstandardized version, and theresults were somewhat more stable across differentsample sizes. We recommend not using any ofthese estimators for samples below a few hundredand in large samples recommend estimating theproportion mediated based on the product ofcoefficients method.

Although the results discussed in this article arefor the binary X case, the results were very similarwhen the study was conducted with a continuousX. The only important difference in the results wasrelated to the �z�m=�z þ �z�m proportion mediatedestimator, which performs much better when Z iscontinuous.

A limitation of the current study is that is doesnot discuss multiple mediator models. More com-plicated models would have the same issues withmediation estimation as described in this article.Software for the latent variable approach is avail-able for more complicated models [38]. Similarly, itwas assumed that the form of the true mediationmodel was known and no moderation existed.With actual data, the model is not known andissues related to the true relations among Z, M, andY may be more difficult to disentangle.

Acknowledgements

This paper is dedicaed to James H. Dwyer. We thankour colleagues in the Prevention Science andMethodology Group (PSMG) as well as anonymousreviewers for many helpful comments on this

Intermediate end point effect 511

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

article. We thank Ghulam Warsi and Myeong sunYoon for assistance with this research andChristopher Winship for comments on the manu-script. We thank Ellen Laing and L. Terri Singer fortheir help in producing this manuscript. We alsothank Mary Ann Pentz for the use of data from theStudents Taught Awareness and Resistance study.Part of this work was presented at the 1991Psychometric Society Meeting. This research wassupported by U. S. Public Health Service GrantsR01-DA09757, R01-MH40859, R01-MH42968, andP30-MH068685.

References

1. Susser M. Causal thinking in the health sciences:concepts and strategies of epidemiology. OxfordUniversity Press, New York, 1973.

2. Susser M. What is a cause and how do we know one? Agrammar for pragmatic epidemiology. Am J Epidemiol1991; 133: 635–48.

3. Statistical and mathematical modeling of the AIDSepidemic. Papers from a conference. Baltimore,Maryland, 17–18 November, 1987. Stat Med 1989; 8:1–139.

4. Henney JE. Remarks by: Jane E, Henney, M.D.Commissioner of Food and Drugs. InternationalConference on Surrogate Endpoints and Biomarkers -National Institutes of Health, 1999. Available at http://www.fda.gov/oc/speeches/surrogates8.html Accessed 21November 2001.

5. Criqui MH, Cowan LD, Tyroler HA et al. Lipoproteins asmediators for the effects of alcohol consumption andcigarette smoking on cardiovascular mortality: resultsfrom the lipid research clinics follow-up study. Am JEpidemiol 1987; 126: 629–37.

6. Lin DY, Fleming TR, De Gruttola V. Estimating theproportion of treatment effect explained by a surrogatemarker. Stat Med 1997; 16: 1515–27.

7. Freedman LS, Schatzkin A. Sample size for studyingintermediate endpoints within intervention trials orobservational studies. Am J Epidemiol 1992; 136:1148–59.

8. Begg CB, Leung DHY.On the use of surrogate end pointsin randomized trials. J R Stat Soc A 2000; 163: 15–28.

9. Prentice RL. Surrogate endpoints in clinical trials:definition and operational criteria. Stat Med 1989; 8:431–40.

10. Freedman LS, Graubard BI, Schatzkin A. Statisticalvalidation of intermediate endpoints for chronic dis-eases. Stat Med 1992; 11: 167–78.

11. Fleming TR, DeMets DL. Surrogate end points in clinicaltrials: are we being misled? Ann Intern Med 1996; 125:605–13.

12. Molenberghs G, Buyse M, Geys H, Renard D,Burzykowski T, Alonso, A. Statistical challenges in theevaluation of surrogate endpoints in randomized trials.Controlled Clin Trials 2002; 23: 607–25.

13. Verbeke G, Molenberghs G. Linear mixed models forlongitudinal data. Springer, New York, 2000.

14. Freedman LS. Confidence intervals and statistical powerof the ‘‘Validation’’ ratio for surrogate or intermediateendpoints. J Stat Plan Infer 2001; 96: 143–53.

15. MacKinnon DP, Warsi G, Dwyer JH. A simulation studyof mediated effect measures. Multivar Behav Res 1995; 30:41–62.

16. Buyse M, Molenberghs G. Criteria for the validation ofsurrogate endpoints in randomized experiments.Biometrics 1998; 54: 1014–29.

17. Buyse M, Molenberghs G, Burzykowski T, Renard D,Geys H. The validation of surrogate endpoints inmeta-analyses of randomized experiments. Biostatistics2000; 1: 49–67.

18. Burzykowski T, Molenberghs G, Buyse M. Theevaluation of surrogate endpoints. Springer, New York,2005.

19. Gail MH, Pfeiffer R, van Houwelingen HC, Carroll RJ.On meta-analytic assessment of surrogate outcomes.Biostatistics 2000; 1, 231–46.

20. Frangakis CE, Rubin, DB. Principal stratification incausal inference. Biometrics 2002, 58, 21–29.

21. Foshee VA, Bauman KE, Arriaga XB et al. Anevaluation of Safe Dates, an adolescent datingviolence prevention program. Am J Public Health 1998;88: 45–50.

22. Botvin GJ. Preventing drug abuse in schools: Social andcompetence enhancement approaches targeting indivi-dual-level etiologic factors. Addict Behav 2000; 25:887–97.

23. Judd CM, Kenny DA. Process analysis: estimatingmediation in treatment evaluations. Evaluation Rev1981; 5: 602–19.

24. MacKinnon DP, Dwyer JH. Estimating mediatedeffects in prevention studies. Evaluation Rev 1993; 17:144–58.

25. Kenny DA, Kashy DA, Bolger N. Data analysis in socialpsychology. In Gilbert DT, Fiske ST, Lindzey G. Eds.The handbook of social psychology. McGraw-Hill, Boston,1998; 233–265.

26. MacKinnon DP, Lockwood CM, Hoffman JM,West SG,Sheets V. A comparison of methods to test mediationand other intervening variable effects. Psychol Methods2002; 7: 83–104.

27. Sobel ME. Asymptotic confidence intervals for indirecteffects in structural equation models. SociologicalMethodology 1982; 13: 290–312.

28. Krull JL, MacKinnon DP. Multilevel mediation model-ing in group-based intervention studies. Evaluation Rev1999; 23: 418–44.

29. Baron RM, Kenny DA. The moderator-mediator variabledistinction in social psychological research: conceptual,strategic, and statistical considerations. J Pers Soc Psych1986; 51: 1173-82.

30. Bollen KA. Structural equation with latent variables. Wiley,New York, 1989.

31. Hosmer DW, Lemeshow, S. Applied logistic regression.Wiley, New York, 2000.

32. Finney DJ. Probit analysis: statistical treatment of thesigmoid response curve. Cambridge University Press,Cambridge, England, 1964.

33. Zeger SL, Liang KY, Albert PS. Models for longitudinaldata: a generalized estimating equation approach.Biometrics, 1989; 45: 1049–60.

34. Draper MP, Smith H. Applied regression analysis,3rd edition. Wiley, New York, 1998.

35. McCullagh P, Neldes JA. Generalized linear models.London: Chapman Hall, 1989.

36. Cohen J. Statistical power analysis for the behavioralsciences. Erlbaum, Hillsdale, NJ, 1988.

37. Pentz MA, Dwyer JH, MacKinnon DP et al. A multi-community trial for primary prevention of adolescentdrug abuse: effects on drug use prevalence. JAMA 1989;261: 3259–66.

38. Muthen LK, Muthen BO. Mplus 4.1: User’s guide. LosAngeles: Author, 2007.

512 DP Mackinnon et al.

Clinical Trials 2007; 4: 499–513 http://ctj.sagepub.com

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from

Appendix: Mediated effects based onthe product of coefficients method

This section shows the consistency of the productof coefficients method for logistic regression whenYjXMZ is given by Model 1 and the conditionalmean of MjXZ is linear as in Equations (2) and (3).A formula for the variance of this product ofcoefficients estimator is then given. Define � asthe maximum likelihood solution for the logisticregression coefficients � ¼ ð�0; c

0x; �m; �zÞ in

Equation (1), the standard logistic regressionmodel of Y predicted by M as well as Z and X. Asthe sample size gets large, c are asymptoticallyunbiased estimates for c. Similarly for the regres-sion of M on Z adjusted for X in Equation (2), theleast squares estimates a are asymptoticallyunbiased estimates for a provided the conditionalmean is correctly specified as in Equation (3). Thusthe product of these estimates is consistent.

Further, the following sandwich-type varianceestimates are consistent as well. First define the ithobservation’s score function as S�i ¼@ logPrfY ¼ yijX ¼ xi;M ¼ mi;Z ¼ zig=@c evaluatedat c. The information matrix for a is definedas I � ¼ �

Pni¼1

@2

@�@� 0 logPrfY ¼ yijX ¼ xi;M ¼ mi;Z ¼

zig. Then

Avarð�Þ ¼ I �1�

Xni¼1

S�iS0�i

!I �1� ð14Þ

Turning to the regression of M on Z and X, iffMjXZ;aðmjx; zÞ is known, it would be used tomaximize the corresponding likelihood. Forexample, if this distribution is normal,define ðu;�; �2Þ as the normal density. Then theith observation’s score function, withparameters h ¼ ð�0; ax; �z; �

2m:xzÞ, is given by Si ¼

@ logðmi;�0 þ a0xxi þ �zzi; �2m:xzÞ=@h evaluated at the

least squares solution h. Similarly, the informationmatrix for this problem is given byI ¼ �

Pni¼1

@@h0

Si . Then

AvarðhÞ ¼ I �1

Xni¼1

SiS0i

!I �1 ð15Þ

and the diagonal element corresponding to �z,Avarð�zÞ is therefore a consistent estimate of thevariance of �z provided the mean of M is linear in Xand Z.

If the underlying distribution for YjXZ is notknown, then consistent estimates of a can still beobtained as long as the conditional model for themean is correctly specified. In this case, the leastsquares approach described above provides consis-tent estimates of both the regression parametersand their standard errors [34].

Finally, we show that these GEE estimators of� and �z are asymptotically independent. Using aTaylor expansion on the score function for �,

� � � ¼ I �1�

Xni¼1

S�i

!þOp n�1

� �ð16Þ

where Opðn�1Þ is a random term that converges to a

distribution with order n�1. Explicitly, the ithobservation’s contribution to the logistic scorefunction, S�i is

S�i ¼@

@�log PrfY ¼ yijX ¼ xi,M ¼ mi,Z ¼ zig

¼ ðyi � piÞð1,xi,mi, ziÞ0

ð17Þ

wherepi ¼ PrfY ¼ 1jX ¼ xi;M ¼ mi;Z ¼ zig ¼ EðY ¼ 1jX ¼

xi;M ¼ mi;Z ¼ ziÞ.Also

I � ¼ �Xni¼1

@2 log pi@�@� 0

¼Xni¼1

pið1� piÞð1,x0i,mi, ziÞ

0ð1,x0

i,mi, ziÞ ð18Þ

Note that like I� , �z depends only onT ¼ fðxi,mi, ziÞ, i ¼ 1, . . . ;ng and not on y. Then byconditioning first on T, we have

Acovð�; �zÞ ¼Eð���Þ�z

¼EMðEY ðð���Þ�zjTÞÞ

¼EðI �1� ð

Xni¼1

Eðyi�pijTÞð1;xi;mi;ziÞ0Þ�zÞ

¼0

This result can easily extended to any GEE typeof estimator of �, of which the maximum like-lihood from an exponential family distribution is aspecial case.

Intermediate end point effect 513

http://ctj.sagepub.com Clinical Trials 2007; 4: 499–513

© 2007 The Society for Clinical Trials. All rights reserved. Not for commercial use or unauthorized distribution. at UNIV WASHINGTON LIBRARIES on October 29, 2007 http://ctj.sagepub.comDownloaded from