Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal...

24
1 Causality on Longitudinal Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi, Perry Groot, Marieke HC van Rijn, Jan AJG van den Brand, Marianne Heins, Hans Knoop, Tom Heskes, the Alzheimer’s Disease Neuroimaging Initiative * , the MASTERPLAN Study Group, the OPTIMISTIC consortium ** Abstract—A typical problem in causal modeling is the instability of model structure learning, i.e., small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data, that is robust for finite samples based on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored along two objectives: the model fit and the model complexity. Since both objectives are often conflicting we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. Our more exploratory approach achieves at least comparable performance as, but often a significant improvement over state-of-the-art alternative approaches on a simulated data set with a known ground truth. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research. Index Terms—Longitudinal data, Causal modeling, Structural equation model, Stability selection, Multi-objective evolutionary algorithm, chronic fatigue syndrome, chronic kidney disease, Alzheimer’s disease 1 I NTRODUCTION Causal modeling, an essential problem in many disciplines [1], [2], [3], [4], [5], [6] attempts to model the mechanisms by which variables R. Rahmadi is with the Department of Informatics, Universitas Islam Indonesia and Institute for Computing and Information Sciences, Radboud University Nijmegen, the Netherlands. E-mail: [email protected] P. Groot and T. Heskes are with Institute for Computing and Information Sciences, Radboud University Nijmegen, the Nether- lands. M. van Rijn and J. van den Brand are with Department of Nephrology, Radboud University Medical Center, Nijmegen, The Netherlands. H. Knoop is with the Department of Medical Psychology, Aca- demic Medical Center, University of Amsterdam, The Nether- lands. ** The members of OPTIMISTIC consortium are described in the acknowledgment. relate and to understand the changes on the model if the mechanisms were manipulated [7]. In the medical domain, revealing causal relationships may lead to improvement of clin- ical practice, for example, the development of treatment and medication. Slowly but steadily, causal discovery methods find their way into the medical literature, providing novel in- * One of the data sets used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investiga- tors can be found at: http:// adni.loni.usc.edu/ wp-content/ uploads/ how to apply/ ADNI Acknowledgement List.pdf M. Heins is with the Netherlands Institute for Health Services Research, The Netherlands. arXiv:1605.06838v3 [stat.ML] 4 Apr 2017

Transcript of Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal...

Page 1: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

1

Causality on Longitudinal Data:Stable Specification Search in Constrained

Structural Equation ModelingRidho Rahmadi, Perry Groot, Marieke HC van Rijn, Jan AJG van den Brand,

Marianne Heins, Hans Knoop, Tom Heskes,the Alzheimer’s Disease Neuroimaging Initiative∗,

the MASTERPLAN Study Group, the OPTIMISTIC consortium∗∗

Abstract—A typical problem in causal modeling is the instability of model structure learning, i.e., small changes in finitedata can result in completely different optimal models. The present work introduces a novel causal modeling algorithmfor longitudinal data, that is robust for finite samples based on recent advances in stability selection using subsamplingand selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., theabsence of a particular causal relationship between two specific variables. We represent causal relationships usingstructural equation models. Models are scored along two objectives: the model fit and the model complexity. Since bothobjectives are often conflicting we apply a multi-objective evolutionary algorithm to search for Pareto optimal models.To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures(from the optimal models) that are both stable and parsimonious. These substructures can be visualized through acausal graph. Our more exploratory approach achieves at least comparable performance as, but often a significantimprovement over state-of-the-art alternative approaches on a simulated data set with a known ground truth. We alsopresent the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimerdisease, and chronic kidney disease. The findings obtained with our approach are generally in line with results frommore hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.

Index Terms—Longitudinal data, Causal modeling, Structural equation model, Stability selection, Multi-objectiveevolutionary algorithm, chronic fatigue syndrome, chronic kidney disease, Alzheimer’s disease

F

1 INTRODUCTION

Causal modeling, an essential problem in manydisciplines [1], [2], [3], [4], [5], [6] attemptsto model the mechanisms by which variables

• R. Rahmadi is with the Department of Informatics, UniversitasIslam Indonesia and Institute for Computing and InformationSciences, Radboud University Nijmegen, the Netherlands. E-mail:[email protected]

• P. Groot and T. Heskes are with Institute for Computing andInformation Sciences, Radboud University Nijmegen, the Nether-lands.

• M. van Rijn and J. van den Brand are with Department ofNephrology, Radboud University Medical Center, Nijmegen, TheNetherlands.

• H. Knoop is with the Department of Medical Psychology, Aca-demic Medical Center, University of Amsterdam, The Nether-lands.

• ∗∗The members of OPTIMISTIC consortium are described in theacknowledgment.

relate and to understand the changes on themodel if the mechanisms were manipulated[7]. In the medical domain, revealing causalrelationships may lead to improvement of clin-ical practice, for example, the development oftreatment and medication. Slowly but steadily,causal discovery methods find their way intothe medical literature, providing novel in-

• ∗One of the data sets used in preparation of this article wereobtained from the Alzheimer’s Disease Neuroimaging Initiative(ADNI) database (adni.loni.usc.edu). As such, the investigatorswithin the ADNI contributed to the design and implementationof ADNI and/or provided data but did not participate in analysisor writing of this report. A complete listing of ADNI investiga-tors can be found at: http://adni.loni.usc.edu/wp-content/uploads/how to apply/ADNI Acknowledgement List.pdf

• M. Heins is with the Netherlands Institute for Health ServicesResearch, The Netherlands.

arX

iv:1

605.

0683

8v3

[st

at.M

L]

4 A

pr 2

017

Page 2: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

2

sights through exploratory analyses [8], [9],[10]. Moreover, data in the medical domainis often collected through longitudinal studies.Unlike in a cross-sectional design, where allmeasurements are obtained at a single occa-sion, the data in a longitudinal design consistof repeated measurements on subjects throughtime. Longitudinal data make it possible tocapture change within subjects over time andthus gives some advantage to causal modelingin terms of providing more knowledge to estab-lish causal relationships [11]. As emphasizedin Fitmaurice et al., [12] there is much naturalheterogeneity among subjects in terms of howdiseases progress that can be explained by thelongitudinal study design. Another advantageis that in order to obtain a similar level ofstatistical power as in cross-sectional studies,fewer subjects in longitudinal studies are re-quired [13].

To date, a number of causal modelingmethods have been developed for longitu-dinal (or time series) data. Some of themethods are based on a Vector Autoregres-sive (VAR) and/or Structural Equation Model(SEM) framework which assumes a linear sys-tem and independent Gaussian noise [14], [15],[16], [17], [18]. Some other methods, inter-estingly, take advantage of nonlinearity [19],[20], [21], or non-Gaussian noise [20], [22], togain even more causal information. Most ofthe aforementioned methods conduct the es-timation of the causal structures in somewhatsimilar ways. For example [15], [16], [17], [20],[22] use the (partial correlations of the) VARresiduals to either test independence or as in-put to a causal search algorithm, e.g., LiNGAM(Linear Non-Gaussian Acyclic Model) [23], PC(“P” stands for Peter, and “C” for Clark, theauthors) [24]. In general these causal searchalgorithms are solely based on a single runof model learning which is notoriously insta-ble: small changes in finite data samples canlead to entirely different inferred structures.This implies that, some approaches might notbe robust enough to correctly estimate causalmodels from various data, especially when thedata set is noisy or has small sample size.

In the present paper, we introduce a robustcausal modeling algorithm for longitudinal

data that is designed to resolve the instabilityinherent to structure learning. We refer to ourmethod as S3L, a shorthand for Stable Specifi-cation Search for Longitudinal data. It extendsour previous method [25], here referred to asS3C, which is designed for cross-sectional data.S3L is a general framework which subsamplesthe original data into many subsets, and foreach subset S3L heuristically searches for Paretooptimal models using a multi-objective opti-mization approach. Among the optimal mod-els, S3L observes the so-called relevant causalstructures which represent both stable and par-simonious model structures. These steps con-stitute the structure estimation of S3L whichis fundamentally different from the aforemen-tioned approaches that mostly use a single runfor model estimation. For completeness, detailabout S3C/L is described in Section 2. More-over, in the default setting S3L assumes someunderlying contexts: iid samples for each timeslice (lag), linear system, additive independentGaussian noise, causal sufficiency (no latentvariables), stationary (time-invariant causal re-lationships), and fairly uniform time intervalsbetween time slices.

The main contributions of S3L are:• The causal structure estimation of S3L is

conducted through multi-objective opti-mization and stability selection [26] overoptimal models, to optimize both the sta-bility and the parsimony of the modelstructures.

• S3C/L is a general framework whichallows for other causal methods withall of their corresponding assumptions,e.g., nonlinearity, non-Gaussianity, to beplugged in as model representation andestimation. The multi-objective search andthe stability selection part are independentof any mentioned assumptions.

• In the default model representation, S3Ladopts the idea of the “rolling” modelfrom [27] to transform a longitudinal SEMmodel with an arbitrary number of timeslices into two parts: a baseline model anda transition model. The baseline modelcaptures the causal relationships at base-line observations, when subjects enter thestudy. The transition model consists of two

Page 3: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

3

time slices, which essentially represent thepossible causal relationships within andacross time slices. We also describe how toreshape the longitudinal data correspond-ingly, so as to match the transformed lon-gitudinal model which then can easily bescored using standard SEM software.

• We provide standardized causal effectswhich are computed from IDA estimates[28].

• We carry out experiments on three differ-ent real-world data of (a) patients withchronic fatigue syndrome (CFS), (b) pa-tients with Alzheimer disease (AD), and(c) patients with chronic kidney disease(CKD).

Some relevant methods, have attempted tomake use of common structures to infer causalmodels. Causal Stability Ranking (CStaR) [29],originally designed for gene expression data,tries to find stable rankings of genes (covari-ates) based on their total causal effect on aspecific phenotype (response), using a subsam-pling procedure similar to stability selectionand IDA to estimate causal effects. As CStaRonly focuses on relationships from all covari-ates to a single specific response, it seemsto be difficult to generalize it to other do-mains where any possible causal relationshipmay be of interest. Moreover, another ap-proach called Group Iterative Multiple ModelEstimation (GIMME), originally developed forfunctional Magnetic Resonance Imaging (fMRI)data and essentially an extension of extendedunified SEM (combination of VAR and SEM)[30], aims to combine the group-level causalstructures with the individual-level structures,resulting in a causal model for each individ-ual which contains common structures to thegroup. Such subject-specific estimation may befeasible given relatively long time series (as inresting state fMRI), but likely too challengingfor the typical longitudinal data in clinical stud-ies with a limited number of time slices persubject. Still in the domain of fMRI, there isa method called Independent Multiple-sampleGreedy Equivalence Search (IMaGES) [31]. Themethod is a modification of GES (describedin the following paragraph), and designed to

handle unexpected statistical dependencies incombined data. Since IMaGES was developedmainly for combining results of multiple datasets, we do not consider it further.

Having both the transformed longitudinalmodel and the reshaped data, we can run otheralternative approaches which are designed forcross-sectional data and conduct comprehen-sive comparisons. Here, for evaluation of S3L,we generate simulated data and compare withsome advanced constrained-based approachessuch as PC-stable [32], Conservative PC (CPC)[33], CPC-stable [32], [33], and PC-Max [34]. Allof these methods are extensions of the PC algo-rithm which in principle consists of two stages.The first stage uses conditional independencetests to obtain the skeleton (undirected edges)of the model, and the second stage orients theskeleton based on some rules, resulting in anessential graph or markov equivalence class model(described in Section 2.1, for more detail see[35]). We also compare with an advanced score-based algorithm called Fast Greedy Equiva-lent Search (FGES) [36]. It is an extension ofGES which in general starts with an empty(or sparse) model, and iteratively adds anedge (forward phase) which mostly increasesthe score until no more edge can be added.Then GES iteratively prunes an edge (back-ward phase) which does not decrease/improvethe score until no more edge can be excluded.

The rest of this paper is organized as fol-lows. All methods used in our approach arepresented in Section 2. The results and the cor-responding discussions are presented in Sec-tion 3. Finally, conclusions and future work arepresented in Section 4.

2 METHODS

2.1 Stable Specification Search for Cross-Sectional dataIn [25] we introduced our previous work, S3C,which searches over structures represented bySEMs. In SEMs, refining models to improvethe model quality is called specification search.Generally S3C adopts the concept of stabilityselection [26] in order to enhance the robust-ness of structure learning by considering awhole range of model complexities. Originally,

Page 4: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

4

in stability selection, this is realized by varyinga continuous regularization parameter. Here,we explicitly consider different discrete modelcomplexities. However, to find the optimalmodel structure for each model complexity isa hard optimization problem. Therefore, werephrase stability selection as a multi-objectiveoptimization problem, so that we can jointlyrun over the whole range of model complexi-ties and find the corresponding optimal struc-tures for each model complexity.

In more detail, S3C can be divided intotwo phases. The first phase is search, perform-ing exploratory search over Structural Equa-tion Models (SEMs) using a multi-objectiveevolutionary algorithm called Non-dominatedSorting Genetic Algorithm II (NSGA-II) [37].NSGA-II is an iterative procedure which adoptsthe idea of evolution. It starts with randommodels and in every generation (iteration), at-tempts to improve the quality of the models bymanipulating (refining) good models (parents)to make new models (offsprings). The qualityof the models is characterized by scoring thatis based on two conflicting objectives: model fitwith respect to the data and model complexity.The model manipulations are realized by usingtwo genetic operators: crossover that combinesthe structures of parents and mutation thatflips the structures of models. Moreover, thecomposition of model population in the nextgeneration is determined by selection strategy.One of the key features of NSGA-II is that inevery iteration, it sorts models based on theconcept of domination, yielding fronts or sets ofmodels such that models in front l dominatethose in front l + 1. The domination conceptstates that model m1 is said to dominate modelm2 if and only if model m1 is no worse than m2

in all objectives and the model m1 is strictlybetter than m2 in at least one objective. Thefirst front of the last generation is called thePareto optimal set, giving optimal models for thewhole range of model complexities. Details ofthe NSGA-II algorithm are described in Deb etal [37].

Based on the idea of stability selection [26],S3C subsamples N subsets from the data Dwith size b|D|/2c without replacement, and foreach subset, the search phase above is applied,

giving sets of Pareto optimal models. Afterthat, all Pareto optimal models are transformedinto their corresponding Markov equivalenceclasses which can be represented by CompletedPartially Directed Acyclic Graphs (CPDAGs) [35].Since all DAGs that are a member of the sameMarkov equivalence class represent the sameprobability distribution, they are indistinguish-able based on the observational data alone.In SEMs, these models are called covarianceequivalent [38] and return the same scores.From these CPDAGs we compute the edge andcausal path stability graphs (see Figure 7 foran example) by grouping them according tomodel complexity and computing their selec-tion probability, i.e., the number of occurrencesdivided by the total number of models for acertain level of model complexity. The edgestability considers any edge between a pair ofvariables (i.e., A → B, B → A, or A − B)and the causal path stability considers directedpath, e.g., A → B of any length. Stabilityselection is then performed by specifying twothresholds, πsel (boundary of selection proba-bility) and πbic (boundary of complexity). Forexample, setting πsel = 0.6 means that all causalrelationships with edge stability or causal pathstability greater than or equal to this thresholdare considered stable. The second threshold πbicis used to control overfitting. For every modelcomplexity j, we compute the Bayesian Infor-mation Criterion (BIC) score for each modelin j based on the data subset to which themodel is fitted. We then compute BICj theaverage of BIC scores in model complexity j.We set πbic to the minimum BICj . All causalrelationships with an edge stability or a causalpath stability that is smaller than or equal toπbic (e.g., πbic = 27 in Figure 7c) are consideredparsimonious. Hence, the causal relationshipsgreater than or equal to πsel and smaller thanor equal to πbic are considered both stable andparsimonious and called relevant from whichwe can derive a causal model. In addition,we call the region with which the relevantstructures intersect as relevant region.

The second phase concerns visualization,combining the stability graphs into a graphwith nodes and edges. This is done by addingthe relevant edges and orienting them using

Page 5: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

5

prior knowledge described in Section 2.2.2) andthe relevant causal paths. More specifically, wefirst connect the nodes following the relevantedges. Then we orient these edges based onthe prior knowledge. And finally, we orientthe rest of the edges following the relevantcausal paths. The resulting graph consists ofdirected edges which represent causal relation-ship and possibly with additional undirectededges which represent strong association butfor which the direction is unclear from thedata. Furthermore, following Meinshausen andBuhlmann [26], for each edge in the graphwe take the highest selection probability ithas across different model complexities in therelevant region of the edge stability graphas a measure of reliability and annotate thecorresponding edge with this reliability score.The reliability score indicates the confidenceof a particular relevant structure. The higherthe score, the more we can expect that therelevant structure is not falsely selected [26]. Inaddition each directed edge is annotated witha standardized causal effect estimate which isexplained in Section 2.2.3. The stability graphsare considered to be the main outcome of ourapproach where the visualization eases inter-pretation.

2.2 Stable specification search for longitu-dinal data

Stable Specification Search for Longitudinaldata (S3L) is an extension of S3C. In principle,as illustrated in Figure 1, S3L applies S3C ontransformed longitudinal models, called base-line and transition models (explained in Sec-tion 2.2.1). Furthermore, in order to see towhich extent a covariate would cause a re-sponse, S3L provides standardized total causaleffect estimates which are intrinsically com-puted from estimates from IDA [28] (describedin Section 2.2.3). In the following subsections,we first describe how we transform a longi-tudinal model and reshape the data accord-ingly, and then we discuss the implication ofallowing prior knowledge in our S3C structurelearning.

2.2.1 Longitudinal model and data reshaping

Based on the idea of a “rolling” network in [27]we transform a longitudinal SEM with an ar-bitrary number of time slices (e.g., Figure 2c)into two parts: a baseline model (Figure 2a) anda transition model (Figure 2b). In the originalpaper, the authors treat these models as prob-abilistic networks, here we treat them purelyas SEMs. The baseline model essentially rep-resents the causal relationships between vari-ables that may happen at the initial time slicet0, for instance, causal relationships that oc-cur before a medical treatment started. More-over, the baseline model may also representrelationships of the unobserved process beforet0 [27]. The transition model constitutes thecausal relationships between variables acrosstime slices ti−1 and ti, and between variableswithin time slice ti for i > 0, for example, causalrelationships that represent interactions duringa medical treatment. In S3L, the structure esti-mations will be conducted on the baseline andtransition model separately.

From the transition model we distinguishtwo kinds of causal relationships, namely intra-slice causal relationship (e.g., solid arcs inFigure 2b), and inter-slice causal relationship(e.g., dashed arcs in Figure 2b). The intra-slice causal relationship represents relation-ships within time slice ti. Accordingly the inter-slice causal relationship represents relation-ships between time slices ti−1 and ti. We assumethat the inter-slice causal relationships are in-dependent of t (stationary). We also assumethat the time intervals between time slicesare fairly uniform. In addition, the transitionmodel implies two more constraints (explainedin Section 2.2.2): there is no intra-slice causalrelationship allowed in time slice ti−1 and theinter-slice causal relationships always go for-ward in time, i.e., from time slice ti−1 to timeslice ti.

Moreover, in order to score the transformedmodels, we reshape the longitudinal data ac-cordingly. Figure 3 shows an illustration of thedata reshaping. Suppose we are given longitu-dinal data with s instances, p variables, and itime slices. We assume that the original datashape is in a form of a matrix D of size s × q,

Page 6: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

6

S3L

Reshaped Data

Baseline Data

Longitudinal Data

S3C

A C

B

Baseline Model

A A

B BTransition Model

α/β 

α/β 

C C

Fig. 1: Given a longitudinal data set, S3L uses the baseline observations to infer a baseline model,and reshapes the whole data set to infer a transition model. Both baseline and transition modelare annotated with a reliability score α and a standardized causal effect β.

𝑋1

𝑋2

𝑋3

𝑋4

𝑡0

(a)

𝑋1

𝑋2

𝑋3

𝑋4

𝑋1

𝑋2

𝑋3

𝑋4

𝑡𝑖−1 𝑡𝑖

(b)

𝑋1

𝑋2

𝑋3

𝑋4

𝑋1

𝑋2

𝑋3

𝑋4

𝑋1

𝑋2

𝑋3

𝑋4

𝑡0 𝑡1 𝑡2

(c)

Fig. 2: (a) The baseline model which is used to capture causal relationships at the initialtime slice, e.g., before medical treatment. (b) The transition model which is used to representcausal relationships within and between time slices, e.g., during medical treatment. (c) Thecorresponding “unrolled” longitudinal model.

with q = p × i. The reshaped data is then amatrix D′ of size s′ × q′, with s′ = s(i− 1) andq′ = 2p. Having such reshaped data allows usto use standard SEM software to compute thescores.

2.2.2 Constrained SEMIn practice, we are often given some priorknowledge about the data. The prior knowl-edge which may be, e.g., results of previousstudies, gives us some constraints in terms ofcausal relations. For example, in the case of, saydisease A, there exists some common knowledge

which tells us that symptom S does not causedisease A directly. In terms of a SEM specifica-tion, the prior knowledge can be translated intoa constrained SEM in which there is no directededge from variable S (denotes symptom S) tovariable A (denotes disease A); this still allowsfor directed edges from A to S or directedpaths (indirect relationships) from S to A, e.g.,a path S → ... → A with any variables inbetween. S3C, and hence S3L allow for suchprior knowledge to be included in the model.In S3L, this prior knowledge only applies to theintra-slice causal relationships.

Page 7: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

7

...

...

reshape

Fig. 3: D is a matrix representing the original data shape which consists of s instances, p variables,and i time slices. D′ is a matrix representing the corresponding reshaped data.

Model specifications should comply withany prior knowledge when performing spec-ification search and when measuring the edgeand causal path stability. Recall that in orderto measure the stability, all optimal models(DAGs) are converted into their correspond-ing equivalence class models (CPDAGs). Thismodel transformation, however, could result inCPDAGs that are inconsistent with the priorknowledge. For example, a constraint A 6→ Bmay be violated since arcs B → A in the DAGmay be converted into undirected (reversible)edges A−B in the CPDAG. In order to preserveconstraints, we therefore extended an efficientDAG-TO-CPDAG algorithm of Chickering [35],as described in Rahmadi et al [25]. Essentially,the motivation of our extension to Chickering’salgorithm is similar to that of Meek’s algorithm[39], that is, to obtain a CPDAG consistent withprior knowledge.

2.2.3 Estimating causal effects

We employ IDA [28] to estimate the total causaleffects of a covariate Xi on a response Y fromthe relevant structures. This method works asfollows. Given a CPDAG G = {G1, . . . , Gm}which contains m different DAGs in its equiv-alence class, IDA applies intervention calculus[38], [40] to each DAG Gj to obtain multisetsΘi = {θij}j∈1,...,m, i = 1, . . . , p, where p is the

number of covariates. θij specifies the possiblecausal effect of Xi on Y in graph Gj .

Causal effects can be computed using so-called intervention calculus [38], which aims todetermine the amount of change in a responsevariable Y when one would manipulate thecovariate Xi (and not the other variables). Notethat this notion differs from a regression-typeof association (see IDA paper for illustrativeexamples). Given a DAG Gj , the causal effectθij can be computed using the so-called back-door adjustment, which takes into account theassociations between Y , Xi and the parentspai(Gj) of Xi in Gj . Under the assumptionthat the distribution of the data is normaland the model is linear, causal effects can becomputed from a regression of Y on Xi andits parents. Specifically, we have Maathuis etal., [28] θij = βi|pai(Gj), where, for any setS ⊆ {X1, . . . , Xp, Y } \ {Xi},

βi|S =

{0, if Y ∈ Scoefficient of Xi in Y ∼ Xi + S, if Y 6∈ S,

and Y ∼ Xi + S is the linear regression of Yon Xi and S. Note that IDA estimates the totalcausal effect from a covariate and response,which considers all possible, either direct orindirect, causal paths from the covariate to theresponse.

Page 8: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

8

IDA works for continuous, normally dis-tributed variables and then only requirestheir observed covariance matrix as input tocompute the regression coefficients. FollowingDrasgow [41], we treat discrete variables as sur-rogate continuous variables, substituting thepolychoric correlation for the correlation be-tween two discrete variables and the polyserialcorrelation between a discrete and a continuousvariable.

Our fitting procedure does not yield a sin-gle CPDAG, but a whole set of CPDAGs torepresent the given data. We therefore extendIDA as follows. We gather Gπbic , the CPDAGsof all optimal models with complexity equal toπbic. For each CPDAG G ∈ Gπbic , we computethe possible causal effects Θ of each relevantcausal path using IDA. For example, for thecausal effect from X to Y , we obtain estimatesΘkX→Y , k = 1, . . . , N , where N is the number of

subsets. All causal effect estimations in ΘkX→Y

are then concatenated into a single multisetΘX→Y .

To represent the estimated causal effects fromX to Y , we compute the median ΘX→Y andiff X and Y are continuous variables, we stan-dardize the estimation using

ΘX→Y · σXσY

,

where σX and σY are the standard deviationsof the covariate and the response, respectively.Standardized causal effects allow us to mean-ingfully compare them.

3 RESULTS AND DISCUSSION

3.1 ImplementationWe implemented S3C and S3L as an R pack-age named stablespec. The package is pub-licly available at the Comprehensive R ArchiveNetwork (CRAN)1, so it can be installed di-rectly, e.g., from the R console by typinginstall.package("stablespec") or fromRStudio. We also included a package documen-tation as a brief tutorial on how to use thefunctions.

1. https://cran.r-project.org/web/packages/stablespec/index.html

3.2 Parameter settingsFor application to simulated data and real-world data, we subsampled 50 and 100 subsetsfrom the data with size b|D|/2c, respectively.We did not do comprehensive parameter tun-ing for NSGA-II, instead, we followed guide-lines provided in Grefenstette [42]. The param-eters for applications to both simulated andreal-world data were set as follows: the numberof iterations was 35, the number of modelsin the population was 150, the probability ofapplying crossover was 0.85, the probability ofapplying mutation to a model structure was0.07, and the selection strategy was binary tour-nament selection [43]. We score models usingthe chi-square χ2 and the model complexity. Theχ2 is considered the original fit index in SEMand measures how close the model-implied co-variance matrix is to the sample covariance ma-trix [44]. The model complexity represents howmany parameters (arcs) need to be estimatedin the model. The maximum model complexitywith p variables is given by p(p− 1)/2.

When using multi-objective optimization weminimize both the χ2 and model complexityobjectives. These two objectives are, however,conflicting with each other. For example, min-imizing the model complexity typically meanscompromising the data fit.

3.3 Application to simulated data3.3.1 Data GenerationWe generated data sets from a longitudinalmodel containing four continuous variablesand three time slices (depicted by Figure 4).We generated ten data sets for each of thesesample sizes: 400 and 2000, with random pa-rameterizations. All data sets are made publiclyavailable.2

3.3.2 Performance measureWe conducted comparisons between S3L withFGES, PC-stable, CPC, CPC-stable, and PC-Max in two different scenarios: with and with-out prior knowledge about part of the causaldirections. In the case of prior knowledge,we added that variable X1 at ti cannot cause

2. Available at https://tinyurl.com/smmr-rahmadi-dataset

Page 9: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

9

variables X2 and X3 at ti directly. This priorknowledge translates to constraints that thevarious methods can use to restrict their searchspace. In addition to both scenarios, we alsoadded longitudinal constraints to the modelsof FGES, PC-stable, CPC, CPC-stable, and PC-Max the same as those used in the transitionmodel of S3L, i.e., there is no intra-causal re-lationship from time ti−1 and the inter-slicecausal relationships always go forward in timeti−1 to ti.

The parameters of FGES, PC-stable, CPC,CPC-stable, and PC-Max used in this simula-tion are set following some existing examples[28], [45], [46]. For FGES, the penalty of BICscore is 2 and the vertex degree in the forwardsearch is not limited. For PC-stable, CPC, CPC-stable, and PC-Max, the significance level whentesting for conditional independence is 0.01,and the maximum size of the conditioning setsis infinity.

Moreover, as the true model is known, wemeasure the performance of all approaches bymeans of the Receiver Operating Characteristic(ROC) [47] for both edges and causal paths. Wecompute the True Positive Rate (TPR) and theFalse Positive Rate (FPR) based on the CPDAGof the true model. As for example, in the caseof edge stability, a true positive means thatan edge obtained by our method or the otherapproaches is present in the CPDAG of theground truth.

To compare the ROC curves of our method

𝑋𝑋1

𝑋𝑋2

𝑋𝑋3

𝑋𝑋4

𝑋𝑋1

𝑋𝑋2

𝑋𝑋3

𝑋𝑋4

𝑋𝑋1

𝑋𝑋2

𝑋𝑋3

𝑋𝑋4

𝑡𝑡0 𝑡𝑡1 𝑡𝑡2

Fig. 4: The longitudinal model with four vari-ables and three time slices, used to generatesimulated data.

and those of alternative approaches, we em-ployed three significance tests. The first twotests, as introduced in [48] and in [49] comparethe Area Under the Curve (AUC) of the ROCcurves by using the theory of U-statistics andbootstrap replicates, respectively. The third test[50] compares the actual ROC curves by eval-uating the absolute difference and generatingrank-based permutations to compute the sta-tistical significance. The null hypothesis is that(the AUC of) the ROC curves of our methodand those of alternative approaches are identi-cal.

Furthermore, we computed the ROC curvesusing two different schemes: averaging andindividual. Both schemes are applied to allmethods and to all data sets generated. In theaveraging scheme, the ROC curves are com-puted from the average edge and causal pathstability from different data sets, and then thestatistical significance tests are applied to theseROC curves. On the other hand, in the indi-vidual scheme the ROC curves are computedfrom the edge and causal path stability on eachdata set. We then applied individual statisticalsignificance tests on the ROC curves for eachdata set and used Fisher’s method [51], [52],to combine these test results into a single teststatistic.

The experimental designs (with and with-out prior knowledge) and the ROC schemes(averaging and individual) are aimed to showempirically and comprehensively how robustthe results are of each approach in variouspractical cases as well as against changes in thedata.

3.3.3 DiscussionWe first discuss the result of our experimentson the data set with sample size 400. Figure 5shows the ROC curves for the edge stability((a) and (c)) and the causal path stability ((b)and (d)) from the averaging scheme. Panels(a) and (b) represent the results without priorknowledge, while panels (c) and (d) representthe results with prior knowledge. Table 3 liststhe corresponding AUCs.

Tables 1 and 2 present the results of thesignificance tests for both the averaging andindividual schemes in the experiment with and

Page 10: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

10

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(a)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(b)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(c)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(d)

Fig. 5: Results from simulation data with sample size 400: ROC curves for (a) the edge stabilityand (b) the causal path stability (without prior knowledge), and (c) the edge path stability and(d) the causal path stability (with prior knowledge), for different values of πsel in the range of[0, 1]. Table 3 lists the corresponding AUCs.

without prior knowledge, respectively. In thecase without prior knowledge, generally theAUCs of the edge and the causal path stabilityof S3L are better (p-value ≤ 0.05, or even≤ 0.001, few of them are marginally signifi-cant, e.g., p-value ≤ 0.1) than those of otherapproaches according to both schemes, exceptthose of FGES for which generally there isno evidence of a difference (p-value > 0.1).In the case with prior knowledge, in generalthe results are similar to those of experimentwithout prior knowledge, but now the AUCof the causal path stability of S3L is better (p-value ≤ 0.05) than that of FGES. The ROC ofthe causal path stability of S3L is now alsobetter (p-value ≤ 0.05) than those of PC-stable,CPC, CPC-stable, and PC-Max according to

the individual scheme. This is an improvementover the experiment without prior knowledge.

Next we discuss the result of our experi-ments on the data set with sample size 2000.Figure 6 shows the ROC curves and Table 6lists the corresponding AUCs. Tables 4 and5 list the results of the significance tests forboth the averaging and individual schemes inthe experiment with and without prior knowl-edge, respectively. In the case without priorknowledge, generally the AUCs of the edgeand the causal path stability of S3L are betterthan (p-value ≤ 0.05) those of other approachesaccording to the individual scheme. Moreover,the ROCs of the edge and the causal pathstability of S3L are better than those of FGES(p-value ≤ 0.001) and CPC-stable (p-value ≤

Page 11: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

11

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(a)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(b)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(c)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC−stableCPCFGESS3LCPC−stablePC−Max

(d)

Fig. 6: Results from simulation data with sample size 2000: ROC curves for (a) the edge stabilityand (b) the causal path stability (without prior knowledge), and (c) the edge path stability and(d) the causal path stability (with prior knowledge), for different values of πsel in the range of[0, 1]. Tables 6 lists the corresponding AUCs.

0.1), respectively, according to the individualscheme. In the case with prior knowledge, theresults are pretty much similar to those of theexperiment without prior knowledge, but onlynow the p-value tends to become smaller, e.g.,(p-value ≤ 0.001).

To conclude, we see that in general S3Lattains at least comparable performance as, butoften a significant improvement over, alterna-tive approaches. This holds in particular forcausal directions and in the case of a smallsample size. The presence of prior knowledgeenhances the performance of the S3L.

3.4 Application to real-world dataHere the true model is unknown, so we canonly compare the results of S3L with those

reported in earlier studies and interpretationby medical experts. We set the thresholds toπsel = 0.6 and πbic to the model complexitywhere the minimum average of BIC scores isfound. By thresholding we get the relevantcausal relationships: those which occur in therelevant region. Details of the procedure aregiven in Section 2.1.

The model assumptions in the application toreal-world data follow from the assumptionsof S3L in the default setting. The assumptionsinclude iid samples on each time slice, linearsystem, independent gaussian noise, no latentvariables, stationary, and fairly uniform timeintervals between time slices.

Moreover, there is an important note relatedto the visualization of the stability graphs.

Page 12: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

12

TABL

E1:

Tabl

eof

p-va

lues

from

com

pari

sons

onda

tase

tw

ith

sam

ple

size

400

betw

een

S3L

and

alte

rnat

ive

appr

oach

esw

itho

utpr

ior

know

ledg

e.Th

enu

llhy

poth

esis

isth

at(t

heA

UC

of)

the

RO

Ccu

rves

ofS3

Lan

dth

ose

ofal

tern

ativ

eap

proa

ches

are

equi

vale

nt.F

orea

chsi

gnifi

canc

ete

st,w

eco

mpa

red

the

RO

Cof

the

edge

(Edg

e)an

dca

usal

path

(Cau

sal)

stab

ility

(see

Figu

re5a

and

5b)

onbo

thav

erag

ing

(Avg

.)an

din

divi

dual

(Ind

.)sc

hem

es.

FGES

PC-s

tabl

eC

PCC

PC-s

tabl

ePC

-Max

Sign

ifica

nce

test

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.D

eLon

g[4

8]Ed

ge0.315

0.909

0.021

<10−

50.025

<10−

50.

052

<10−

50.050

<10−

5

Cau

sal

0.451

0.109

0.06

9<

10−

50.825

<10−

50.012

<10−

50.126

<10−

5

Boot

stra

p[4

9]Ed

ge0.331

0.935

0.020

<10−

50.024

<10−

50.

051

<10−

50.049

<10−

5

Cau

sal

0.466

0.09

00.

063

<10−

50.830

<10−

50.010

<10−

50.121

<10−

5

Venk

atra

man

[50]

Edge

0.304

0.906

0.09

10.102

0.07

60.118

0.359

0.516

0.449

0.743

Cau

sal

0.332

0.197

0.831

0.225

0.845

0.365

0.569

0.512

0.584

0.131

TABL

E2:

Tabl

eof

p-va

lues

from

com

pari

sons

onda

tase

twit

hsa

mpl

esi

ze40

0be

twee

nS3

Lan

dal

tern

ativ

eap

proa

ches

wit

hpr

ior

know

ledg

e.Th

enu

llhy

poth

esis

isth

at(t

heA

UC

of)

the

RO

Ccu

rves

ofS3

Lan

dth

ose

ofal

tern

ativ

eap

proa

ches

are

equi

vale

nt.

For

each

sign

ifica

nce

test

,we

com

pare

dth

eR

OC

ofth

eed

ge(E

dge)

and

caus

alpa

th(C

ausa

l)st

abili

ty(s

eeFi

gure

5can

d5d

)on

both

aver

agin

g(A

vg.)

and

indi

vidu

al(I

nd.)

sche

mes

.

FGES

PC-s

tabl

eC

PCC

PC-s

tabl

ePC

-Max

Sign

ifica

nce

test

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.D

eLon

g[4

8]Ed

ge0.

090

0.146

0.08

6<

10−

30.

099

<10−

50.219

0.001

0.227

0.002

Cau

sal

0.264

0.003

0.06

1<

10−

50.035

<10−

50.022

<10−

50.031

<10−

5

Boot

stra

p[4

9]Ed

ge0.118

0.188

0.08

4<

10−

50.

099

<10−

50.208

<10−

30.223

0.001

Cau

sal

0.251

0.002

0.06

0<

10−

50.031

<10−

50.020

<10−

50.026

<10−

5

Venk

atra

man

[50]

Edge

0.430

0.598

0.05

60.680

0.103

0.543

0.680

0.998

0.707

0.998

Cau

sal

0.637

0.783

0.485

0.004

0.06

9<

10−

30.116

0.09

40.171

0.007

TABL

E3:

Tabl

eof

AU

Cs

for

the

edge

and

caus

alpa

thst

abili

tyfo

rea

chm

etho

d,fr

omsi

mul

atio

non

data

wit

hsa

mpl

esi

ze40

0,w

ith

(yes

)an

dw

itho

utpr

ior

know

ledg

e(n

o).

S3L

FGES

PC-s

tabl

eC

PCC

PC-s

tabl

ePC

-Max

AU

Cno

yes

noye

sno

yes

noye

sno

yes

noye

sed

ge0.80

0.74

0.83

0.81

0.63

0.60

0.63

0.62

0.63

0.65

0.63

0.65

caus

al0.92

0.96

0.90

0.93

0.84

0.90

0.92

0.89

0.78

0.84

0.85

0.85

Page 13: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

13

TABL

E4:

Tabl

eof

p-va

lues

from

com

pari

sons

onda

tase

tw

ith

sam

ple

size

2000

betw

een

S3L

and

alte

rnat

ive

appr

oach

esw

itho

utpr

ior

know

ledg

e.Th

enu

llhy

poth

esis

isth

at(t

heA

UC

of)

the

RO

Ccu

rves

ofS3

Lan

dth

ose

ofal

tern

ativ

eap

proa

ches

are

equi

vale

nt.F

orea

chsi

gnifi

canc

ete

st,w

eco

mpa

red

the

RO

Cof

the

edge

(Edg

e)an

dca

usal

path

(Cau

sal)

stab

ility

(see

Figu

re6a

and

6b)

onbo

thav

erag

ing

(Avg

.)an

din

divi

dual

(Ind

.)sc

hem

es.

FGES

PC-s

tabl

eC

PCC

PC-s

tabl

ePC

-Max

Sign

ifica

nce

test

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.D

eLon

g[4

8]Ed

ge1.000

0.09

90.223

0.010

0.320

0.014

0.07

10.001

0.118

0.009

Cau

sal

0.05

2<

10−

50.563

<10−

30.353

<10−

50.221

<10−

50.952

0.183

Boot

stra

p[4

9]Ed

ge1.000

0.103

0.222

0.007

0.321

0.010

0.07

70.003

0.103

0.006

Cau

sal

0.045

<10−

50.554

<10−

30.357

<10−

50.202

<10−

50.952

0.161

Venk

atra

man

[50]

Edge

0.480

0.963

0.187

0.801

0.212

0.872

0.06

90.900

0.100

0.972

Cau

sal

0.418

<10−

30.404

0.637

0.289

0.339

0.726

0.897

0.520

0.250

TABL

E5:

Tabl

eof

p-va

lues

from

com

pari

sons

onda

tase

twit

hsa

mpl

esi

ze20

00be

twee

nS3

Lan

dal

tern

ativ

eap

proa

ches

wit

hpr

ior

know

ledg

e.Th

enu

llhy

poth

esis

isth

at(t

heA

UC

of)

the

RO

Ccu

rves

ofS3

Lan

dth

ose

ofal

tern

ativ

eap

proa

ches

are

equi

vale

nt.

For

each

sign

ifica

nce

test

,we

com

pare

dth

eR

OC

ofth

eed

ge(E

dge)

and

caus

alpa

th(C

ausa

l)st

abili

ty(s

eeFi

gure

6can

d6d

)on

both

aver

agin

g(A

vg.)

and

indi

vidu

al(I

nd.)

sche

mes

.

FGES

PC-s

tabl

eC

PCC

PC-s

tabl

ePC

-Max

Sign

ifica

nce

test

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.A

vg.

Ind.

Avg

.In

d.D

eLon

g[4

8]Ed

ge0.296

0.978

0.413

<10−

30.348

0.005

0.147

<10−

30.122

<10−

3

Cau

sal

0.142

<10−

50.817

<10−

30.698

<10−

30.043

<10−

50.279

<10−

5

Boot

stra

p[4

9]Ed

ge0.295

0.983

0.412

<10−

30.344

0.002

0.146

<10−

50.125

<10−

5

Cau

sal

0.144

<10−

50.833

<10−

30.706

<10−

30.043

<10−

50.279

<10−

5

Venk

atra

man

[50]

Edge

0.761

0.862

0.119

0.207

0.210

0.290

0.146

0.08

20.140

0.257

Cau

sal

0.486

0.595

0.384

0.763

0.172

0.742

0.488

0.984

0.652

0.903

TABL

E6:

Tabl

eof

AU

Cs

for

the

edge

and

caus

alpa

thst

abili

tyfo

rea

chm

etho

d,fr

omsi

mul

atio

non

data

wit

hsa

mpl

esi

ze20

00,

wit

h(y

es)

and

wit

hout

prio

rkn

owle

dge

(no)

.

S3L

FGES

PC-s

tabl

eC

PCC

PC-s

tabl

ePC

-Max

AU

Cno

yes

noye

sno

yes

noye

sno

yes

noye

sed

ge0.77

0.73

0.77

0.78

0.67

0.67

0.69

0.65

0.62

0.61

0.65

0.60

caus

al0.92

0.93

0.85

0.90

0.90

0.94

0.88

0.93

0.87

0.87

0.92

0.91

Page 14: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

14

A DAG without edges will always be trans-formed into a CPDAG without edges. A fullyconnected DAG without prior knowledge willbe transformed into a CPDAG with only undi-rected edges. However if prior knowledge isadded, a fully connected DAG will be trans-formed into a CPDAG in which the edgescorresponding to the prior knowledge are di-rected. From these observations it follows thatin the edge stability graph all paths start witha selection probability of 0 and end up in aselection probability of 1. In the causal pathstability graph when no prior knowledge hasbeen added all paths start with a selectionprobability of 0 and end up in a selection prob-ability of 0. However, when prior knowledgeis added some of the paths may end up in aselection probability of 1 because of the addedconstraints.

3.4.1 Application to chronic fatigue syndromedata

Our first application to real-world data consid-ers a longitudinal data set of 183 patients withchronic fatigue syndrome (CFS) who receivedcognitive behavior therapy (CBT) [53]. Empiricalstudies have shown that CBT can significantlyreduce fatigue severity. In this study we focuson the causal relationships between cognitionsand behavior in the process of reducing sub-ject’s fatigue severity. We therefore include sixvariables namely fatigue severity, the sense ofcontrol over fatigue, focusing on the symptoms,the objective activity of the patient (oActivity),the subject’s perceived activity (pActivity), andthe physical functioning. The data set consistsof five time slices where the first and the fifthtime slices are the pre- and post-treatment ob-servations, respectively, and the second untilthe fourth time slices are observations duringthe treatment. The missing data is 8.7% andto impute the missing values, we used singleimputation with Expectation Maximization (EM)in SPSS [54]. As all of the variables have largescales, e.g., in the range between 0 to 155, wetreat them as continuous variables. We addedprior knowledge that the variable fatigue at t0and ti does not cause any of the other variablesdirectly. This is a common assumption made in

the analysis of CBT in order to investigate thecausal impact on fatigue severity [53], [55].

First we discuss the baseline model, whichonly considers the baseline causal relation-ships. The corresponding stability graphs canbe seen in Figures 7a and 7b. As mentionedbefore, πsel is set to 0.6 and from the searchphase of S3L we found that πbic = 6. Figures 7aand 7b show that three relevant edges and tworelevant causal paths were found. Followingthe visualization procedure (see visualizationphase in Section 2.1), we get a baseline modelin Figure 8a. The model shows that pActivity isa direct cause for fatigue severity. This followsfrom the prior assumption that we made andis consistent with earlier works [53], [55]. Thiscausal relationship suggests that a reductionof (perceived) activity, leads to an increase offatigue. In addition we found a strong rela-tionship between pActivity and oActivity whosedirection cannot be determined. This relation-ship is somewhat sensible as both variablesmeasuring patient’s activity. We also found aconnection between focusing and control, whichis not surprising as focusing on symptomsalso depends on patient’s sense of control overfatigue. One would expect that if a patient hasless control on the fatigue, the focus on thesymptom would increase.

Next we discuss the transition model, whichconsiders all causal relationships over timeslices. The corresponding stability graphs aredepicted in Figures 7c and 7d. We set πsel = 0.6and the search phase of S3L yielded πbic = 27.Figures 7c shows that nineteen relevant edgeswere found, consisting of eleven intra-slice(blue lines) and eight inter-slice relationshipsof which six are between the same variables(orange lines) and two are between differentvariables (black lines). Figure 7d shows thatthirty-five relevant causal paths were found,consisting of twelve intra-slice (blue lines) andtwenty-three inter-slice relationships of whichsix are between the same variables (orangelines) and seventeen are between different vari-ables (black lines). Applying the visualizationprocedure, we get the transition model in Fig-ure 8b. The model shows that all variableshave intra-slice causal relationships to fatigueseverity. These relationships are consistent with

Page 15: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

15

0.0

0.2

0.4

0.6

0.8

1.0

Model Complexity

Sel

ectio

n P

roba

bilit

y

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

πbic

πsel

pActivity — fatigue oActivity — pActivity focusing — control

(a)

0.0

0.2

0.4

0.6

0.8

1.0

Model Complexity

Sel

ectio

n P

roba

bilit

y

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

pActivity —> fatiguecontrol —> fatigue

πbic

πsel

(b)

Model complexity

Sel

ectio

n pr

obab

ility

0 2 4 6 8 10 13 16 19 22 25 28 31 34 37 40 43 46 49

00.

20.

40.

60.

81

intra−slice relationshipinter−slice relationship(same variables)inter−slice relationship(different variables)

πbic

πsel

(c)Model complexity

Sel

ectio

n pr

obab

ility

0 2 4 6 8 10 13 16 19 22 25 28 31 34 37 40 43 46 49

00.

20.

40.

60.

81

intra−slice relationshipinter−slice relationship(same variables)inter−slice relationship(different variables)

πbic

πsel

(d)

Fig. 7: The stability graphs of the baseline model in (a) and (b) and the transition model in (c)and (d) for chronic fatigue syndrome, with edge stability in (a) and (c), and causal path stabilityin (b) and (d). The relevant regions, above πsel and left of πbic, contain the relevant structures.

[53], [55], [56] which conclude that duringthe CBT an increase in sense of control overfatigue, physical functioning, and perceivedphysical activity, together with a decrease infocusing on symptoms lead to a lower level offatigue severity. Interestingly, the actual activityseems insufficient to reduce fatigue severity[53], however, how the patient perceives hisown activity does seem to help. Additionally,we also found that, with similar causal ef-fects, all variables (except pActivity and fatigue)also cause the change in fatigue indirectly viapActivity as an intermediate variable. This sug-gests that, as discussed in [53] an increasein perceived activity does seem important toexplain the change in fatigue. The variablesfocusing and functioning also appear to be in-direct causes of changes in the level of fatigue

severity.

3.4.2 Application to Alzheimer’s disease dataFor the second application to real-world data,we consider a longitudinal data set aboutAlzheimer’s Disease (AD), which is providedby the Alzheimer’s Disease Neuroimaging Ini-tiative (ADNI) [57], and can be accessed atadni.loni.usc.edu. The ADNI was launched in2003 as a public-private partnership, led byPrincipal Investigator Michael W. Weiner, MD.The primary goal of ADNI has been to testwhether serial magnetic resonance imaging(MRI), positron emission tomography (PET),other biological markers, and clinical and neu-ropsychological assessment can be combinedto measure the progression of mild cognitiveimpairment (MCI) and early AD. For up-to-dateinformation see www.adni-info.org.

Page 16: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

16

0.94

control(𝑡0)

pActivity(𝑡0)

oActivity(𝑡0)

functioning(𝑡0)

focusing(𝑡0)

fatigue(𝑡0)

0.96/0.39

0.65

(a)

control(𝑡𝑖−1)

pActivity(𝑡𝑖−1)

oActivity(𝑡𝑖−1)

functioning(𝑡𝑖−1)

focusing(𝑡𝑖−1)

fatigue(𝑡𝑖−1)

control(𝑡𝑖)

pActivity(𝑡𝑖)

oActivity(𝑡𝑖)

functioning(𝑡𝑖)

focusing(𝑡𝑖)

fatigue(𝑡𝑖)1/0.71

0.95/0.46

1/0.61

1/0.44

1/0.68

1/0.77

1/0.67

0.76/-0.42

1/0.33

0.75/-0.09

1/-0.40

0.61/0.02

1/0.26

0.91/0.23

1/-0.33

1/-0.40

1/-0.431/0.34

1/-0.52

(b)

Fig. 8: (a) The baseline model and (b) the transition model of chronic fatigue syndrome. Thedashed line represents a strong relation between two variables but the causal direction cannot bedetermined from the data. Each edge has a reliability score (the highest selection probability inthe relevant region of the edge stability graph) and a standardized total causal effect estimation.For example, the annotation “1/0.71” represents a reliability score of 1 and a standardized totalcausal effect of 0.71. Note that the standardized total causal effect represents not just the directcausal effect corresponding to the edge, but the total causal effect also including indirect effects

.

In the present paper we focus on patientswith MCI, an intermediate clinical stage in AD[58]. Following Haight et al., [59] we includeonly the variables: subject’s cognitive dysfunc-tion (ADAS-Cog), hippocampal volume (hip-pocampal vol), whole brain volume (brain vol),and brain glucose metabolism (brain glucose).The data set contains 179 subjects with fourcontinuous variables and six time slices. Thefirst time slice captures baseline observationsand the next time slices are for the follow-upobservations. The missing data is 22.9% andlike in the application to CFS, we imputed themissing values using single imputation withEM. We added prior knowledge that the vari-able ADAS-Cog at t0 and ti does not cause anyof the other variables directly. We performedthe search over 100 subsamples of the originaldata set.

First we discuss the baseline model whichonly considers the baseline causal relation-ships. The corresponding stability graphs areshown in Figures 9a and 9b. πsel is set to0.6 and the search phase of S3L found that

πbic = 4. Figures 9a and 9b show that four rele-vant edges and two relevant causal paths werefound. Following the visualization procedure,we obtain the baseline model in Figure 10a. Wefound that an increase in both brain glucosemetabolism and hippocampal volume causesreduction in subject’s cognitive dysfunction.These causal relations are consistent with find-ings in [59] which also concluded that bothbrain glucose and hippocampal vol were inde-pendently related to ADAS-Cog (in our model,it is represented by independent direct causalpaths). Additionally, strong relations betweenhippocampal volume and brain volume seemplausible as they both measure the volume ofthe brain (partly and entirely).

Next we discuss the transition model whichconsiders all causal relationships across timeslices. We set πsel = 0.6 and the search phaseof S3L yielded πbic = 12. The correspondingstability graphs can be seen in Figures 9c and9d. We found twelve relevant edges (see Fig-ure 9c), consisting of four intra-slice (blue lines)and eight inter-slice relationships of which four

Page 17: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

17

0.0

0.2

0.4

0.6

0.8

1.0

Model Complexity

Sel

ectio

n P

roba

bilit

y

0 1 2 3 4 5 6

πbic

πsel

hippocampal_vol — brain_volhippocampal_vol — brain_glucosehippocampal_vol — ADAS−Cogbrain_glucose — ADAS−Cog

(a)

0.0

0.2

0.4

0.6

0.8

1.0

Model Complexity

Sel

ectio

n P

roba

bilit

y

0 1 2 3 4 5 6

hippocampal_vol —> ADAS−Cogbrain_glucose —> ADAS−Cog

πbic

πsel

(b)

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

00.

20.

40.

60.

81

intra−slice relationship

inter−slice relationship(same variables)

inter−slice relationship(different variables)

πbic

πsel

(c)

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

00.

20.

40.

60.

81

intra−slice relationship

inter−slice relationship(same variables)

inter−slice relationship(different variables)

πbic

πsel

(d)

Fig. 9: The stability graphs of the baseline model in (a) and (b) and the transition model in (c)and (d) for Alzheimer’s disease, with edge stability in (a) and (c), and causal path stability in(b) and (d). The relevant regions, above πsel and left of πbic, contain the relevant structures.

are between the same variables (orange lines)and four are between different variables (blacklines). Moreover, we found seventeen relevantcausal paths (see Figure 9d), consisting of sixintra-slice (blue lines) and eleven inter-slicerelationships of which four are between thesame variables (orange lines) and seven arebetween different variables (black lines). Ap-plying the visualization procedure, we obtainthe transition model in Figure 10b. In addition,the direction of the edge from brain glucoseto brain vol follows because we do not allowcycles in our model. We found that there areindirect and direct causal relationships fromhippocampal vol and brain vol at both ti−1 and tito ADAS-Cog at ti. These particular causal rela-tionships support the hypothesis in [59] whichsays that any changes in both hippocampalvolume and brain volume will cause short-

term effects on a subject’s cognitive dysfunc-tion, both direct and indirect. In the originalpaper the authors suggested that the indirectcausal relationship is through brain glucose, butour analysis also discovers a potential indirecteffect through brain vol. Interestingly we foundthat a change in subject’s cognitive dysfunctionin a previous time slice ti−1 causes a reductionin brain volume in time slice ti.

3.4.3 Application to chronic kidney diseaseFor the third application to real-world data, weconsider a longitudinal data set about chronickidney disease (CKD), provided by the MAS-TERPLAN study group [60]. The MASTER-PLAN study was initiated in 2004 as a ran-domized, controlled trial studying the effectof intensified treatment with the aid of nursepractitioners on cardiovascular and kidney out-

Page 18: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

18

brain_glucose(𝑡0)

hippocampal_vol(𝑡0)

brain_vol(𝑡0)

ADAS-Cog(𝑡0)

0.90/-0.28

0.98/-0.25

0.90

1

(a)

brain_glucose(𝑡𝑖−1)

hippocampal_vol(𝑡𝑖−1)

brain_vol(𝑡𝑖−1)

ADAS-Cog(𝑡𝑖−1)

brain_glucose(𝑡𝑖)

hippocampal_vol(𝑡𝑖)

brain_vol(𝑡𝑖)

ADAS-Cog(𝑡𝑖)1/0.81

1/0.54

1/0.98

1/1

1/-0.08

0.85/00.67/0.540.75/-0.25

0.71/-0.84

0.95/-0.22 0.83/-1.21

0.62/-0.04

(b)

Fig. 10: (a) The baseline model and (b) the transition model of Alzheimer’s disease. The dashedline represents a strong relation between two variables but the causal direction cannot bedetermined from the data. Each edge has a reliability score (the highest selection probability inthe relevant region of the edge stability graph) and a standardized total causal effect estimation.For example, the annotation “1/0.81” represents a reliability score of 1 and a total standardizedcausal effect of 0.81. Note that the standardized total causal effect represents not just the directcausal effect corresponding to the edge, but the total causal effect also including indirect effects

.

come in CKD. This intensified treatment regi-men addressed eleven possible risk factors forthe progression of CKD simultaneously. Thestudy previously showed that this intensifiedtreatment resulted in fewer patients reachingend stage kidney disease compared to standardtreatment [60].

Here we focus on the potential causal me-diators for the protective effect incurred bythe intensified treatment with the aid of nursepractitioners. In other words, we aim to iden-tify which of the treatment targets contributedto the observed overall treatment effect. Inthe present analysis, we include only vari-ables of interest, being treatment status, eithernurse practitioner aided care or standard care,as allocated by the randomization procedure(treatment), estimated glomerular filtration rate(gfr)—a marker for overall kidney function,and a variable indicating informative censor-ing (inf cens). Informative censoring occurredwhen patients reached end stage kidney dis-ease requiring renal replacement therapy, suchas dialysis or a kidney transplantation, or whenthey died. Furthermore, we considered treat-ment targets that were previously hypothe-sized to contribute most to the overall treat-

ment effect: systolic blood pressure (sbp), LDL-cholesterol (ldl) and parathyroid hormone (pth)concentrations in blood, and protein excretionvia urine (pcr). In total, there are 497 sub-jects with seven variables (both continuous anddiscrete) over five time slices. The first timeslice contains the baseline observations takenbefore treatment, and the next time slices arethe follow-up observations during treatment.Particularly we set the variable treatment onlyat ti−1 as it remains the same over all timeslices, and the variable inf cens only at ti asit is a consequence of previous treatment. Wefurther added the prior knowledge that gfr at tidoes not directly cause any other variables, andthat there are no relations between any variableand inf cens within ti. Both gfr and inf cens areread-out for CKD progression and are within atime slice always the consequence and neverthe cause of another variable. However, werelax this prior knowledge at time slice t0 asit is a common assumption that without thetreatment, pth is a consequence of poor kidneyfunction. The missing data is 5.2% and a singleimputation with EM was conducted to imputethe missing values like in applications to CFSand ADNI data. We performed the search over

Page 19: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

19

100 subsamples of the original data set.First we discuss the baseline model, which

only considers the baseline causal relation-ships. Figures 11a and 11b depict the corre-sponding stability graphs. As in applicationsto CFS and ADNI data, πsel is set to 0.6 andbased on the search phase of S3L we foundthat πbic = 2. Figures 11a and 11b shows thattwo relevant edges were found. Applying thevisualization procedure, we get the baselinemodel in Figure 12a. We found that both pthand pcr were associated with kidney functionat baseline. The direction of these associationsremains unclear. From renal physiology, weknow that proteinuria may result in kidneydamage. However, kidney damage and pro-teinuria may be common consequences of hy-pertension at an earlier stage in the patient’shistory. The association between parathyroidhormone and GFR is unsurprising, as calciumand phosphate metabolism is disrupted in pa-tients with advanced kidney disease. However,elevated pth may in turn result in further kid-ney damage by increased vascular calcification.In other words, the associations seem plau-sible from a physiological point of view, butthe association may be in either direction. Inthe CKD example, a causal direction is almostimpossible to ascertain when only using cross-sectional data.

Next we discuss the transition model, whichtakes into account all causal relationshipsacross time slices. We set πsel = 0.6 and foundπbic = 23. Based on Figure 11c, we obtainedseventeen relevant edges, consisting of fourintra-slice (blue lines) and thirteen inter-slicerelationships of which five are between thesame variables (orange lines) and eight arebetween different variables (black lines). Basedon Figure 11d, we obtained twenty-six relevantcausal paths, consisting of five intra-slice (bluelines) and twenty-one inter-slice relationshipsof which five are between the same variables(orange lines) and sixteen are between differentvariables (black lines). Applying the visualiza-tion procedure, we get the transition model inFigure 12b. Most of the intra-slice and inter-slice causal relationships are very stable withselection probabilities close to 1. We foundinter-slice causal relationships from gfr, sbp,

pth, and pcr to inf cens. Furthermore, gfr, sbp,and pcr are well known determinants for CKDprogression. The causal relationship from pthto inf cens was somewhat surprising. However,pth is a marker for regulation of phosphatestores in the body and related to overall vascu-lar damage through vascular calcification, andmay thereby be related to mortality. Indeed,literature indicates that lowering pth in dialysispatients resulted in a reduction in mortality[61]. The same may hold true for patients whohave CKD and who do yet need dialysis treat-ment. Perhaps most surprising are the relationsbetween sbp and pcr and gfr, respectively. Fromrenal physiology we know that higher filtra-tion pressures due to higher blood pressurecauses the short term glomerular filtration rateto increase slightly [62]. Likewise, at higherfiltration pressure, more and larger proteins arepushed out of the blood stream and into thepro-urine and are ultimately excreted via theurine. In the long term, chronically elevated fil-tration pressures and elevated levels of proteinin the pro-urine cause kidney damage and ulti-mately even end stage kidney disease. Overall,the results are consistent with literature andphysiology [63].

4 CONCLUSION AND FUTURE WORK

Causal discovery from longitudinal data turnsout to be an important problem in many disci-plines. In the medical domain, revealing causalrelationships from a given data set may lead toimprovement of clinical practice, e.g., furtherdevelopment of treatment and medication. Inthe past decades, many causal discovery al-gorithms have been introduced. These causaldiscovery algorithms, however, have difficultydealing with the inherent instability in struc-ture estimation.

The present work introduces S3L, a noveldiscovery algorithm for longitudinal data thatis robust for finite samples, extending ourprevious method [25] on cross-sectional data.S3L adopts the concept of stability selectionto improve the robustness of structure learn-ing by taking into account a whole range ofmodel complexities. Since finding the optimalmodel structure for each model complexity

Page 20: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

20

0.0

0.2

0.4

0.6

0.8

1.0

Edge Stability

Model Complexity

Sel

ectio

n P

roba

bilit

y

0 1 2 3 4 5 6 7 8 9 10

πbic

πsel

pth — gfrpcr — gfr

(a)

0.0

0.2

0.4

0.6

0.8

1.0

Causal Path Stability

Model Complexity

Sel

ectio

n P

roba

bilit

y

0 1 2 3 4 5 6 7 8 9 10

πbic

πsel

(b)

Model complexity

Sel

ectio

n pr

obab

ility

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

00.

20.

40.

60.

81

intra−slice relationship

inter−slice relationship(same variables)

inter−slice relationship(different variables)

πbic

πsel

(c)Model complexity

Sel

ectio

n pr

obab

ility

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

00.

20.

40.

60.

81

intra−slice relationship

inter−slice relationship(same variables)

inter−slice relationship(different variables)

πbic

πsel

(d)

Fig. 11: The stability graphs of the baseline model in (a) and (b) and the transition model in (c)and (d) for chronic kidney disease, with edge stability in (a) and (c), and causal path stabilityin (b) and (d). The relevant regions, above πsel and left of πbic, contain the relevant structures.

is a hard optimization problem, we rephrasestability selection as a multi-objective optimiza-tion problem, so that we can jointly optimizeover the whole range of model complexitiesand find the corresponding optimal structures.Moreover, S3L is a general framework thatcan be combined with alternative approaches,without modifying their original assumptions,e.g., linearity, non-Gaussian noise, etc.

The comparison on the simulated data showsthat S3L achieves at least comparable perfor-mance as, but often a significant improvementover alternative approaches, mainly in obtain-ing the causal relations, and in the case ofsmall sample size. Moreover, the results ofexperiments on three real-world data sets arecorroborated by literature studies [53], [55],[56], [59], [61], [63], [64], [65], [66].

However, the current method considers only

longitudinal data with observed variables andcannot handle missing values (other thanthrough imputation as a preprocessing step).We also still assume that the time inter-vals between time slices is fairly uniformbetween subjects. Some existing approachescalled random-coefficient models, also termedmulti-level or hierarchical regression models [67],[68], are flexible to handle unequal intervalsbetween time slices within a subject and/oracross subjects. Future research will aim toaccount for these aforementioned issues.

ACKNOWLEDGMENTS

The authors wish to thank Thaddeus J. Haight,Falma Kemalasari, Joseph Ramsey and twoanonymous referees for their valuable discus-sions, comments, and suggestions.

Page 21: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

21

1

sbp(𝑡0)

ldl(𝑡0)

pcr(𝑡0)

pth(𝑡0)

gfr(𝑡0)

0.83

(a)

treatment(𝑡𝑖−1)

sbp(𝑡𝑖−1)

ldl(𝑡𝑖−1)

pcr(𝑡𝑖−1)

pth(𝑡𝑖−1)

gfr(𝑡𝑖−1)

inf_cens(𝑡𝑖)

sbp(𝑡𝑖)

ldl(𝑡𝑖)

pcr(𝑡𝑖)

pth(𝑡𝑖)

gfr(𝑡𝑖)1/0.88

0.98/-0.04

1/0.61

1/0.54

1/0.64

1/0.65

0.94/0.32

0.80/0.07

0.98/0.02

0.92/-0.33

0.83/0.14

0.82/-0.02

0.88/-0.13

1/-0.47

1/0.37

0.78

0.72/-0.03

(b)

Fig. 12: (a) The baseline model and (b) the transition model of chronic kidney disease. Thedashed line represents a strong relation between two variables but the causal direction cannotbe determined from the data. Each edge has a reliability score (the highest selection probability inthe relevant region of the edge stability graph) and a standardized total causal effect estimation.For example, the annotation “1/0.88” represents a reliability score of 1 and a standardized totalcausal effect of 0.88. Note that the standardized total causal effect represents not just the directcausal effect corresponding to the edge, but the total causal effect also including indirect effects

.

The OPTIMISTIC Consortium comprises:

Partner 1: Radboud University MedicalCentre, The Netherlands, Ms. ShaghayeghAbghari; Dr. Armaz Aschrafi; Mrs. SachaBouman; Ms. Yvonne Cornelissen; Dr. JeffreyGlennon; Dr. Perry Groot; Prof. Arend Heer-schap; Ms. Linda Heskamp; Prof. Tom Heskes;Ms. Katarzyna Kapusta; Mrs. Ellen Klerks; Dr.Hans Knoop; Mrs. Daphne Maas; Mr. KeesOkkersen; Dr. Geert Poelmans, Mr. Ridho Rah-madi; Prof. dr. Baziel van Engelen (Chief In-vestigator and Partner lead); Dr. Marlies vanNimwegen.Partner 2: University of Newcastle upon Tyne,UK, Dr. Grainne Gorman (Partner lead); Ms.Cecilia Jimenez Moreno; Prof. Hanns Lochm-ller; Prof. Mike Trenell; Ms. Sandra van Laar;Ms. Libby Wood.Partner 3: Ludwig- Maximilians-UniversittMnchen, Germany, Prof. dr. Benedikt Schoser(Partner lead); Dr. Stephan Wenninger; Dr. An-gela Schller.Partner 4: Assistance Publique-Hpitaux de

Paris, France, Mrs Rmie Auguston; Mr. LignierBaptiste; Dr. Caroline Barau; Prof. GuillaumeBassez (Partner lead); Mrs. Pascale Chevalier;Ms. Florence Couppey; Ms. Stphanie Delmas;Prof. Jean-Franois Deux; Mrs. Celine Dogan;Ms. Amira Hamadouche; Dr. Karolina Han-kiewicz; Mrs. Laure Lhermet; Ms. Lisa Minier;Mrs. Amandine Rialland; Mr. David Schmitz.Partner 5: University of Glasgow, UK, Prof.Darren G. Monckton (Partner lead); Dr. SarahA. Cumming; Ms. Berit Adam.Partner 6: The University of Dundee, UK,Prof. Peter Donnan (Partner lead); Mr. MichaelHannah; Dr. Fiona Hogarth; Dr. RobertaLittleford; Dr. Emma McKenzie; Dr. PetraRauchhaus; Ms. Erna Wilkie; Mrs. JenniferWilliamson.Partner 7: Catt-Sci LTD, UK, Prof. Mike Catt(Partner lead).Partner 8: concentris research managementgmbh, Germany, Mrs. Juliane Dittrich; Ms.Ameli Schwalber (Partner lead).Partner 9: The University of Aberdeen, UK,Prof. Shaun Treweek (Partner lead).

Page 22: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

22

The author(s) declared no potential conflictsof interest with respect to the research, author-ship, and/or publication of this article.

The author(s) disclosed receipt of the fol-lowing financial support for the research, au-thorship, and/or publication of this article:This work was supported, in part, by theDGHE of Indonesia as well as by the Euro-pean Community’s Seventh Framework Pro-gramme (FP7/2007-2013) under grant agree-ment n◦ 305697.

The collection and sharing of brain imagingdata used in one of the applications to real-world data was funded by the Alzheimer’s Dis-ease Neuroimaging Initiative (ADNI) (NationalInstitutes of Health Grant U01 AG024904)and DOD ADNI (Department of Defenseaward number W81XWH-12-2-0012). ADNI isfunded by the National Institute on Aging,the National Institute of Biomedical Imag-ing and Bioengineering, and through gen-erous contributions from the following: Ab-bVie, Alzheimer’s Association; Alzheimer’sDrug Discovery Foundation; Araclon Biotech;BioClinica, Inc.; Biogen; Bristol-Myers SquibbCompany; CereSpir, Inc.; Cogstate; Eisai Inc.;Elan Pharmaceuticals, Inc.; Eli Lilly and Com-pany; EuroImmun; F. Hoffmann-La Roche Ltdand its affiliated company Genentech, Inc.;Fujirebio; GE Healthcare; IXICO Ltd.; JanssenAlzheimer Immunotherapy Research & Devel-opment, LLC.; Johnson & Johnson Pharmaceu-tical Research & Development LLC.; Lumosity;Lundbeck; Merck & Co., Inc.; Meso Scale Di-agnostics, LLC.; NeuroRx Research; NeurotrackTechnologies; Novartis Pharmaceuticals Corpo-ration; Pfizer Inc.; Piramal Imaging; Servier;Takeda Pharmaceutical Company; and Transi-tion Therapeutics. The Canadian Institutes ofHealth Research is providing funds to sup-port ADNI clinical sites in Canada. Privatesector contributions are facilitated by the Foun-dation for the National Institutes of Health(www.fnih.org). The grantee organization is theNorthern California Institute for Research andEducation, and the study is coordinated bythe Alzheimer’s Disease Cooperative Study atthe University of California, San Diego. ADNIdata are disseminated by the Laboratory forNeuro Imaging at the University of Southern

California.

REFERENCES

[1] R. M. Daniel, M. G. Kenward, S. N. Cousens, and B. L.De Stavola, “Using causal diagrams to guide analysisin missing data problems,” Statistical methods in medicalresearch, vol. 21, no. 3, pp. 243–256, 2012.

[2] K. D. Hoover, “Causality in economics and econometrics,”The new Palgrave dictionary of economics, vol. 2, 2008.

[3] S. Abu-Bader and A. S. Abu-Qarn, “Government expendi-tures, military spending and economic growth: causalityevidence from egypt, israel, and syria,” Journal of PolicyModeling, vol. 25, no. 6, pp. 567–583, 2003.

[4] M. Taguri, J. Featherstone, and J. Cheng, “Causal me-diation analysis with multiple causally non-orderedmediators,” Statistical methods in medical research, p.0962280215615899, 2015.

[5] J. Pearl, “Causal inference from indirect experiments,”Artificial intelligence in medicine, vol. 7, no. 6, pp. 561–582,1995.

[6] J. Detilleux, J.-Y. Reginster, A. Chines, and O. Bruyere, “Abayesian path analysis to estimate causal effects of baze-doxifene acetate on incidence of vertebral fractures, eitherdirectly or through non-linear changes in bone massdensity,” Statistical methods in medical research, vol. 25,no. 1, pp. 400–412, 2016.

[7] P. Spirtes, “Introduction to causal inference,” The Journalof Machine Learning Research, vol. 11, pp. 1643–1662, 2010.

[8] S. la Bastide-van Gemert, R. P. Stolk, E. R. van denHeuvel, and V. Fidler, “Causal inference algorithms canbe useful in life course epidemiology,” Journal of clinicalepidemiology, vol. 67, no. 2, pp. 190–198, 2014.

[9] E. Sokolova, P. Groot, T. Claassen, and T. Heskes, “Causaldiscovery from databases with discrete and continuousvariables,” in Probabilistic Graphical Models. Springer,2014, pp. 442–457.

[10] G. F. Cooper, I. Bahar, M. J. Becich, P. V. Benos, J. Berg, J. U.Espino, C. Glymour, R. C. Jacobson, M. Kienholz, A. V.Lee et al., “The center for causal discovery of biomedicalknowledge from big data,” Journal of the American MedicalInformatics Association, p. ocv059, 2015.

[11] E. W. Frees, Longitudinal and panel data: analysis and appli-cations in the social sciences. Cambridge University Press,2004.

[12] G. M. Fitzmaurice, N. M. Laird, and J. H. Ware, Appliedlongitudinal analysis. John Wiley & Sons, 2012, vol. 998.

[13] D. Hedeker and R. D. Gibbons, Longitudinal data analysis.John Wiley & Sons, 2006, vol. 451.

[14] N. R. Swanson and C. W. Granger, “Impulse responsefunctions based on a causal approach to residual or-thogonalization in vector autoregressions,” Journal of theAmerican Statistical Association, vol. 92, no. 437, pp. 357–367, 1997.

[15] D. A. Bessler and S. Lee, “Money and prices: Us data1869–1914 (a study with directed graphs),” Empirical Eco-nomics, vol. 27, no. 3, pp. 427–446, 2002.

[16] S. Demiralp and K. D. Hoover, “Searching for the causalstructure of a vector autoregression,” Oxford Bulletin ofEconomics and statistics, vol. 65, no. s1, pp. 745–767, 2003.

[17] A. Moneta, “Graphical causal models and vars: an em-pirical assessment of the real business cycles hypothesis,”Empirical Economics, vol. 35, no. 2, pp. 275–300, 2008.

Page 23: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

23

[18] J. Kim, W. Zhu, L. Chang, P. M. Bentler, and T. Ernst, “Uni-fied structural equation modeling approach for the anal-ysis of multisubject, multivariate functional mri data,”Human Brain Mapping, vol. 28, no. 2, pp. 85–93, 2007.

[19] A. Moneta, N. Chlaß, D. Entner, and P. O. Hoyer, “Causalsearch in structural vector autoregressive models.” inNIPS Mini-Symposium on Causality in Time Series, 2011, pp.95–114.

[20] J. Peters, D. Janzing, and B. Scholkopf, “Causal inferenceon time series using restricted structural equation mod-els,” in Advances in Neural Information Processing Systems,2013, pp. 154–162.

[21] T. Chu and C. Glymour, “Search for additive nonlineartime series causal models,” Journal of Machine LearningResearch, vol. 9, no. May, pp. 967–991, 2008.

[22] A. Hyvarinen, S. Shimizu, and P. O. Hoyer, “Causalmodelling combining instantaneous and lagged effects: anidentifiable model based on non-gaussianity,” in Proceed-ings of the 25th international conference on Machine learning.ACM, 2008, pp. 424–431.

[23] S. Shimizu, P. O. Hoyer, A. Hyvarinen, and A. Kerminen,“A linear non-gaussian acyclic model for causal discov-ery,” Journal of Machine Learning Research, vol. 7, no. Oct,pp. 2003–2030, 2006.

[24] P. Spirtes, C. N. Glymour, and R. Scheines, Causation,prediction, and search. MIT press, 2000, vol. 81.

[25] R. Rahmadi, P. Groot, M. Heins, H. Knoop,and T. Heskes, “Causality on cross-sectional data:Stable specification search in constrained structuralequation modeling,” Applied Soft Computing, 2016.doi: http://dx.doi.org/10.1016/j.asoc.2016.10.003. [On-line]. Available: http://www.sciencedirect.com/science/article/pii/S1568494616305130

[26] N. Meinshausen and P. Buhlmann, “Stability selection,”Journal of the Royal Statistical Society: Series B (StatisticalMethodology), vol. 72, no. 4, pp. 417–473, 2010.

[27] N. Friedman, K. Murphy, and S. Russell, “Learning thestructure of dynamic probabilistic networks,” in Proceed-ings of the Fourteenth conference on Uncertainty in artificialintelligence. Morgan Kaufmann Publishers Inc., 1998, pp.139–147.

[28] M. H. Maathuis, M. Kalisch, P. Buhlmann et al., “Estimat-ing high-dimensional intervention effects from observa-tional data,” The Annals of Statistics, vol. 37, no. 6A, pp.3133–3164, 2009.

[29] D. J. Stekhoven, I. Moraes, G. Sveinbjornsson, L. Hennig,M. H. Maathuis, and P. Buhlmann, “Causal stability rank-ing,” Bioinformatics, vol. 28, no. 21, pp. 2819–2823, 2012.

[30] K. M. Gates, P. C. Molenaar, F. G. Hillary, andS. Slobounov, “Extended unified sem approach for mod-eling event-related fmri data,” NeuroImage, vol. 54, no. 2,pp. 1151–1158, 2011.

[31] J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko,R. A. Poldrack, and C. Glymour, “Six problems for causalinference from fmri,” neuroimage, vol. 49, no. 2, pp. 1545–1558, 2010.

[32] D. Colombo and M. H. Maathuis, “Order-independentconstraint-based causal structure learning.” The Journal ofMachine Learning Research, vol. 15, no. 1, pp. 3741–3782,2014.

[33] J. Ramsey, J. Zhang, and P. L. Spirtes, “Adjacency-faithfulness and conservative causal inference,” arXivpreprint arXiv:1206.6843, 2012.

[34] J. Ramsey, “Improving accuracy and scalability of the

pc algorithm by maximizing p-value,” arXiv preprintarXiv:1610.00378, 2016.

[35] D. M. Chickering, “Learning equivalence classes ofBayesian-network structures,” The Journal of MachineLearning Research, vol. 2, pp. 445–498, 2002.

[36] J. Ramsey, M. Glymour, R. Sanchez-Romero, andC. Glymour, “A million variables and more: the fastgreedy equivalence search algorithm for learning high-dimensional graphical causal models, with an applicationto functional magnetic resonance images,” InternationalJournal of Data Science and Analytics, pp. 1–9, 2017.

[37] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “Afast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6,no. 2, pp. 182–197, 2002.

[38] J. Pearl, Causality: models, reasoning and inference. Cam-bridge Univ Press, 2000.

[39] C. Meek, “Causal inference and causal explanation withbackground knowledge,” in Proceedings of the Eleventhconference on Uncertainty in artificial intelligence. MorganKaufmann Publishers Inc., 1995, pp. 403–410.

[40] J. Pearl, “Statistics and causal inference: A review,” Test,vol. 12, no. 2, pp. 281–345, 2003.

[41] F. Drasgow, “Polychoric and polyserial correlations,” En-cyclopedia of statistical sciences, 1988.

[42] J. J. Grefenstette, “Optimization of control parameters forgenetic algorithms,” Systems, Man and Cybernetics, IEEETransactions on, vol. 16, no. 1, pp. 122–128, 1986.

[43] B. L. Miller and D. E. Goldberg, “Genetic algorithms,tournament selection, and the effects of noise,” Complexsystems, vol. 9, no. 3, pp. 193–212, 1995.

[44] R. Kline, Principles and Practice of Structural Equation Mod-eling, ser. Methodology in the social sciences. GuilfordPress, 2011. ISBN 9781606238769

[45] M. Kalisch, M. Machler, D. Colombo, M. H. Maathuis,and P. Buhlmann, “Causal inference using graphicalmodels with the R package pcalg,” Journal of StatisticalSoftware, vol. 47, no. 11, pp. 1–26, 2012. [Online].Available: http://www.jstatsoft.org/v47/i11/

[46] C. Wongchokprasitti, rcausal: R-Causal Library, 2016, rpackage version 0.99.8.

[47] T. Fawcett, “ROC graphs: Notes and practical considera-tions for researchers,” Machine learning, vol. 31, pp. 1–38,2004.

[48] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson,“Comparing the areas under two or more correlatedreceiver operating characteristic curves: a nonparametricapproach,” Biometrics, pp. 837–845, 1988.

[49] X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C.Sanchez, and M. Muller, “pROC: an open-source packagefor R and S+ to analyze and compare ROC curves,” BMCbioinformatics, vol. 12, no. 1, p. 77, 2011.

[50] E. Venkatraman and C. B. Begg, “A distribution-freeprocedure for comparing receiver operating characteristiccurves from a paired experiment,” Biometrika, vol. 83,no. 4, pp. 835–848, 1996.

[51] R. A. Fisher, Statistical methods for research workers. Gen-esis Publishing Pvt Ltd, 1925.

[52] R. A. F. Frederick Mosteller, “Questions and answers,”The American Statistician, vol. 2, no. 5, pp. 30–31, 1948.[Online]. Available: http://www.jstor.org/stable/2681650

[53] M. J. Heins, H. Knoop, W. J. Burk, and G. Bleijenberg,“The process of cognitive behaviour therapy for chronicfatigue syndrome: Which changes in perpetuating cogni-tions and behaviour are related to a reduction in fatigue?”

Page 24: Causality on Longitudinal Data: Stable Specification Search ...fewer subjects in longitudinal studies are re-quired [13]. To date, a number of causal modeling methods have been developed

24

Journal of psychosomatic research, vol. 75, no. 3, pp. 235–241,2013.

[54] IBM SPSS Statistics for Windows, Version 24., IBM Corp.,Armonk, NY, 2016.

[55] J. Vercoulen, C. Swanink, J. Galama, J. Fennis, P. Jongen,O. Hommes, J. Van der Meer, and G. Bleijenberg, “Thepersistence of fatigue in chronic fatigue syndrome andmultiple sclerosis: development of a model,” Journal ofpsychosomatic research, vol. 45, no. 6, pp. 507–517, 1998.

[56] J. F. Wiborg, H. Knoop, L. E. Frank, and G. Bleijenberg,“Towards an evidence-based treatment model for cogni-tive behavioral interventions focusing on chronic fatiguesyndrome,” Journal of psychosomatic research, vol. 72, no. 5,pp. 399–404, 2012.

[57] M. W. Weiner, P. S. Aisen, C. R. Jack, W. J. Jagust,J. Q. Trojanowski, L. Shaw, A. J. Saykin, J. C. Morris,N. Cairns, L. A. Beckett et al., “The Alzheimer’s dis-ease neuroimaging initiative: progress report and futureplans,” Alzheimer’s & Dementia, vol. 6, no. 3, pp. 202–211,2010.

[58] R. C. Petersen, G. E. Smith, S. C. Waring, R. J. Ivnik, E. G.Tangalos, and E. Kokmen, “Mild cognitive impairment:clinical characterization and outcome,” Archives of neurol-ogy, vol. 56, no. 3, pp. 303–308, 1999.

[59] T. J. Haight, W. J. Jagust, and Alzheimer’s Disease Neu-roimaging Initiative, “Relative contributions of biomark-ers in Alzheimer’s disease,” Annals of epidemiology, vol. 22,no. 12, pp. 868–875, 2012.

[60] M. J. Peeters, A. D. van Zuilen, J. A. van den Brand,M. L. Bots, M. van Buren, M. A. ten Dam, K. A. Kaasjager,G. Ligtenberg, Y. W. Sijpkens, H. E. Sluiter et al., “Nursepractitioner care improves renal outcome in patients withCKD,” Journal of the American Society of Nephrology, vol. 25,no. 2, pp. 390–398, 2014.

[61] G. M. Chertow, G. A. Block, R. Correa-Rotter, T. B. Drueke,J. Floege, W. G. Goodman, C. A. Herzog, Y. Kubo, G. M.London, K. W. Mahaffey et al., “Effect of cinacalcet oncardiovascular disease in patients undergoing dialysis.”The New England journal of medicine, vol. 367, no. 26, pp.2482–2494, 2012.

[62] R. J. Johnson, J. Feehally, and J. Floege, Comprehensiveclinical nephrology. Elsevier Health Sciences, 2014.

[63] A. Levin, P. Stevens, R. Bilous, J. Coresh, A. De Fran-cisco, P. De Jong, K. Griffith, B. Hemmelgarn, K. Iseki,E. Lamb et al., “Kidney Disease: Improving Global Out-comes (KDIGO) CKD work group: Kdigo 2012 clinicalpractice guideline for the evaluation and management ofchronic kidney disease,” Kidney Int Suppl, vol. 3, no. 1, p.e150, 2013.

[64] W. Henneman, J. Sluimer, J. Barnes, W. Van Der Flier,I. Sluimer, N. Fox, P. Scheltens, H. Vrenken, andF. Barkhof, “Hippocampal atrophy rates in alzheimerdisease added value over whole brain volume measures,”Neurology, vol. 72, no. 11, pp. 999–1007, 2009.

[65] D. Mungas, B. Reed, W. Jagust, C. DeCarli, W. Mack,J. Kramer, M. Weiner, N. Schuff, and H. Chui, “VolumetricMRI predicts rate of cognitive decline related to AD andcerebrovascular disease,” Neurology, vol. 59, no. 6, pp.867–873, 2002.

[66] H. Rusinek, S. De Santi, D. Frid, W.-H. Tsui, C. Y. Tarshish,A. Convit, and M. J. de Leon, “Regional brain atrophyrate predicts future cognitive decline: 6-year longitudinalmr imaging study of normal aging 1,” Radiology, vol. 229,no. 3, pp. 691–696, 2003.

[67] S. W. Raudenbush and A. S. Bryk, Hierarchical linear

models: Applications and data analysis methods. Sage, 2002,vol. 1.

[68] I. G. Kreft and J. de Leeuw, Introducing multilevel modeling.Sage, 1998.