NeuroImage xxx (2010) xxx–xxx
Multivariate dynamical systems models for estimating causal interactions in fMRI
Srikanth Ryali a,⁎, Kaustubh Supekar b,c, Tianwen Chen a, Vinod Menon a,d,e,⁎
a Department of Psychiatry & Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
b Graduate Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA 94305, USA
c Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA 94305, USA
d Program in Neuroscience, Stanford University School of Medicine, Stanford, CA 94305, USA
e Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
⁎ Corresponding authors. Department of Psychiatry & Behavioral Sciences, 780 Welch Rd, Room 201, Stanford University School of Medicine, Stanford, CA 94305-5778, USA. Fax: +1 650 736 7200.
E-mail addresses: [email protected] (S. Ryali), [email protected] (V. Menon).
1053-8119/$ – see front matter © 2010 Published by Elsevier Inc. doi:10.1016/j.neuroimage.2010.09.052
Please cite this article as: Ryali, S., et al., Multivariate dynamical systems models for estimating causal interactions in fMRI, NeuroImage (2010), doi:10.1016/j.neuroimage.2010.09.052
Article history: Received 5 June 2010; Revised 15 September 2010; Accepted 21 September 2010; Available online xxxx

Keywords: Causality; Dynamical systems; Variational Bayes; Bilinear; Expectation maximization; Kalman smoother; Deconvolution

Abstract
Analysis of dynamical interactions between distributed brain areas is of fundamental importance for understanding cognitive information processing. However, estimating dynamic causal interactions between brain regions using functional magnetic resonance imaging (fMRI) poses several unique challenges. For one, fMRI measures Blood Oxygenation Level Dependent (BOLD) signals, rather than the underlying latent neuronal activity. Second, regional variations in the hemodynamic response function (HRF) can significantly influence the estimation of causal interactions between brain regions. Third, causal interactions between brain regions can change with experimental context over time. To overcome these problems, we developed a novel state-space Multivariate Dynamical Systems (MDS) model to estimate intrinsic and experimentally induced modulatory causal interactions between multiple brain regions. A probabilistic graphical framework is then used to estimate the parameters of MDS as applied to fMRI data. We show that MDS accurately takes into account regional variations in the HRF and estimates dynamic causal interactions at the level of latent signals. We develop and compare two estimation procedures using maximum likelihood estimation (MLE) and variational Bayesian (VB) approaches for inferring model parameters. Using extensive computer simulations, we demonstrate that, compared to Granger causal analysis (GCA), MDS exhibits superior performance for a wide range of signal to noise ratios (SNRs), sample lengths and network sizes. Our simulations also suggest that GCA fails to uncover causal interactions when there is a conflict between the directions of intrinsic and modulatory influences. Furthermore, we show that MDS estimation using VB methods is more robust and performs significantly better at low SNRs and shorter time series than MDS with MLE. Our study suggests that VB estimation of MDS provides a robust method for estimating and interpreting causal network interactions in fMRI data.
Introduction
Functional magnetic resonance imaging (fMRI) has emerged as a powerful tool for investigating human brain function and dysfunction. fMRI studies of brain function have primarily focused on identifying brain regions that are activated during performance of perceptual or cognitive tasks. There is growing consensus, however, that localization of activations provides a limited view of how the brain processes information and that it is important to understand functional interactions between brain regions that form part of a neurocognitive network involved in information processing (Bressler and Menon, 2010; Friston, 2009c; Fuster, 2006). Furthermore, evidence is now accumulating that the key to understanding the functions of any specific brain region lies in disentangling how its connectivity differs
from the pattern of connections of other functionally related brain areas (Passingham et al., 2002). A critical aspect of this effort is to better understand how causal interactions between specific brain areas and networks change dynamically with cognitive demands (Abler et al., 2006; Deshpande et al., 2008; Friston, 2009b; Goebel et al., 2003; Mechelli et al., 2003; Roebroeck et al., 2005; Sridharan et al., 2008). These and other related studies highlight the importance of dynamic causal interactions for understanding brain function at the systems level.
In recent years, several methods have been developed to estimate causal interactions in fMRI data (Deshpande et al., 2008; Friston et al., 2003; Goebel et al., 2003; Guo et al., 2008; Rajapakse and Zhou, 2007; Ramsey et al., 2009; Roebroeck et al., 2005; Seth, 2005; Smith et al., 2009; Valdes-Sosa et al., 2005). Of these, Granger causal analysis (GCA) (Roebroeck et al., 2005; Seth, 2005) and dynamic causal modeling (DCM) (Friston et al., 2003) are among the more commonly used approaches thus far. There is a growing debate about the relative merits and demerits of these approaches for estimating causal interactions using fMRI data (Friston, 2009a,b; Roebroeck et al., 2009). The main
Fig. 1. Probabilistic graphical model for multivariate dynamical system (MDS). All conditional interdependencies in MDS can be inferred from this model. The state variables s(t) are modeled as a linear dynamical system. The non-diagonal elements of matrices A and C represent the intrinsic and modulatory connection strengths, respectively. The diagonal elements of D represent the weight of the external stimulus at the i-th node. Q(m,m) is the state noise variance at the m-th node. Each element of A, C and D has precision α; each element of α follows a Gamma distribution with parameters c0 and d0. The prior for 1/Q(m,m) follows a Gamma distribution with parameters a0 and b0. y(t) represents the observed BOLD signal, the elements of B represent weights corresponding to the basis functions for the HRFs, and R(m,m) is the observation noise variance at the m-th node. Each element of B has precision α, each element of α follows a Gamma distribution with parameters c0 and d0, and the prior for 1/R(m,m) follows a Gamma distribution with parameters a0 and b0. Random variables are indicated as open circles and deterministic quantities as rectangles.
limitations of GCA highlighted by this debate are that: (1) GCA estimates causal interactions in the observed Blood-Oxygenation-Level-Dependent (BOLD) signals, rather than in the underlying neuronal responses; (2) GCA may not be able to accurately recover causal interactions because of regional variations in hemodynamic response; and (3) GCA does not take into account experimentally induced modulatory effects while estimating causal interactions (Friston, 2009a,b). The main limitations of DCM highlighted in this debate are that: (1) DCM is a confirmatory method wherein several causal models are tested and the model with the highest evidence is chosen; this is problematic when the number of regions under investigation is large, since the number of models to be tested increases exponentially with the number of regions; (2) conventional DCM uses a deterministic model to describe the dynamics of the latent neuronal signals, which may not be adequate to capture the dynamics of the underlying neuronal processes (a stochastic version was recently proposed (Daunizeau et al., 2009)); and (3) the assumptions used by DCM for deconvolution of the hemodynamic response have not yet been adequately verified (Roebroeck et al., 2009). Here, we develop a new method that incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.
We propose a novel multivariate dynamical systems (MDS) approach (Bishop, 2006) for modeling causal interactions in fMRI data. MDS is based on a state-space approach which can be used to overcome many of the aforementioned problems associated with estimating causal interactions in fMRI data. State-space models have been successfully used in engineering applications of control systems and machine learning (Bishop, 2006), but their use in neuroscience has been limited. Notable examples of state-space models include Hidden Markov models (HMM), which are widely used in speech recognition applications (Rabiner, 1989), and Kalman filters for object tracking (Koller and Friedman, 2009). Critically, state-space models can be represented as probabilistic graphical models (Koller and Friedman, 2009), which, as we show below (Fig. 1), greatly facilitates representation and inference for causal modeling of fMRI data.
Critically, MDS estimates causal interactions in the underlying latent signals, rather than in the observed BOLD-fMRI signals. In order to estimate causal interactions from the observed fMRI data, it is important to take into account variations in the hemodynamic response function (HRF) across different brain regions (David et al., 2008). MDS is a state-space model in which a "state equation" is used to model the unobserved states of the system and an "observation equation" is used to model the observed data as a function of the latent state signals (Fig. 1). The state equation is a vector autoregressive model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. The observation model produces BOLD-fMRI signals as a linear convolution of latent signals and basis functions spanning the space of variations in the HRF.
The latent signals and unknown parameters that characterize causal interactions between brain regions are estimated using two different approaches. In the first approach, we use expectation maximization (EM) to obtain maximum likelihood estimates (MLE) of the parameters and test the statistical significance of the estimated causal relationships between brain regions using a nonparametric approach. We refer to this approach as MDS-MLE. In the second approach, we use a Variational Bayes (VB) approach to approximate the posterior distribution of latent variables and parameters, which cannot be computed analytically using a fully Bayesian approach. We refer to this approach as MDS-VB. By representing MDS as a probabilistic graphical network (Fig. 1), we show that MDS-VB provides an elegant analytical solution for computing the posterior distributions and deriving causal connectivity estimates which are sparse and more readily interpretable.
We first describe our MDS model and discuss MLE and VB approaches for estimating intrinsic and modulatory causal interactions between multiple brain regions. We test the performance of MDS using computer-simulated data sets as a function of network size, fMRI time points and signal to noise ratio (SNR). We evaluate the performance of our MDS models with extensive computer simulations and examine several metrics, including sensitivity, false positive rate and accuracy, in terms of correctly identifying both intrinsic and modulatory causal interactions. Finally, we contrast our results with those obtained with GCA.
Methods
Notation: In the following sections, we represent matrices by upper-case letters and scalars and vectors by lower-case letters. Random matrices are represented by bold face letters, whereas random vectors and scalars are represented by bold face lower-case letters.
MDS Model
Consider the following state-space model to represent the multivariate fMRI time series:
s(t) = A s(t−1) + Σ_{j=1..J} v_j(t) C_j s(t−1) + D u(t) + w(t)   (1)

x_m(t) = [s_m(t) s_m(t−1) … s_m(t−L+1)]′   (2)

y_m(t) = b_m Φ x_m(t) + e_m(t)   (3)
In Eq. (1), s(t) is an M×1 vector of latent signals at time t in M regions, and A is an M×M connection matrix wherein A(m,n) denotes the strength of the intrinsic causal connection (which is independent of
external stimuli or task condition) from the n-th region to the m-th region. Cj is an M×M connection matrix induced by the modulatory input vj(t), and J is the number of modulatory inputs. The non-diagonal elements of Cj represent the coupling of brain regions in the presence of the modulatory input vj(t). Therefore, the latent signals s(t) in the M regions at time t are a bilinear function of the modulatory inputs vj(t) and the previous state s(t−1). D is an M×M diagonal matrix wherein D(i,i) denotes the strength of the external stimulus to the i-th region. u(t) is an M×1 binary vector whose elements represent the external stimuli to the regions under investigation. w(t) is an M×1 state noise vector whose distribution is assumed to be Gaussian with covariance matrix Q (w(t)∼N(0,Q)). Additionally, the state noise vectors at time instances 1, 2, …, T (w(1), w(2), …, w(T)) are assumed to be independent and identically distributed (iid). Eq. (1) represents the time evolution of the latent signals in M brain regions. More specifically, the latent signals at time t, s(t), are expressed as a linear combination of the latent signals at time t−1, the external stimulus at time t (u(t)), a bilinear combination of the modulatory inputs vj(t), j = 1, 2, …, J, and the previous state, plus state noise w(t). The latent dynamics modeled in Eq. (1) give rise to the observed fMRI time series represented by Eqs. (2) and (3).
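As a concrete illustration, the state equation (Eq. (1)) can be simulated directly. The sketch below is not the authors' implementation: the network size and connection weights (loosely based on the 2-node example of Fig. 3A), the stimulus timing and the noise levels are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

M, T, J = 2, 200, 1                        # regions, time points, modulatory inputs
A = np.array([[0.7, 0.0],                  # intrinsic connections; diagonal = autocorrelation
              [-0.3, 0.7]])
C = np.array([[[0.0, 0.0],                 # one modulatory matrix C_1: node 1 -> node 2
               [0.5, 0.0]]])               # shape (J, M, M)
D = np.diag([0.5, 0.0])                    # external stimulus drives node 1 only
Q = 0.1 * np.eye(M)                        # state noise covariance

u = (rng.random((T, M)) < 0.1).astype(float)   # hypothetical event-related stimulus
u[:, 1] = 0.0                                  # only node 1 is stimulated
v = np.zeros((T, J)); v[T // 2:, 0] = 1.0      # boxcar modulatory input

s = np.zeros((T, M))                       # latent signals s(t), Eq. (1)
for t in range(1, T):
    drift = A @ s[t - 1]
    for j in range(J):
        drift += v[t, j] * (C[j] @ s[t - 1])
    s[t] = drift + D @ u[t] + rng.multivariate_normal(np.zeros(M), Q)
```

The loop follows Eq. (1) term by term: intrinsic drift A s(t−1), bilinear modulatory terms v_j(t) C_j s(t−1), stimulus input D u(t) and Gaussian state noise.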
We model the fMRI time series in region m as a linear convolution of the HRF and the latent signal sm(t) in that region. To represent this linear convolution model as an inner product of two vectors, the past L values of sm(t) are stored as a vector: xm(t) in Eq. (2) is an L×1 vector containing the L past values of the latent signal at the m-th region.
In Eq. (3), ym(t) is the observed BOLD signal at time t in the m-th region. Φ is a p×L matrix whose rows contain the bases for the HRF. bm is a 1×p coefficient vector representing the weights for each basis function in explaining the observed BOLD signal ym(t). Therefore, the HRF in the m-th region is represented by the product bmΦ. The BOLD response in this region is obtained by convolving this HRF (bmΦ) with the L past values of the region's latent signal (xm(t)) and is represented mathematically by the vector inner product bmΦxm(t). Uncorrelated observation noise em(t) with zero mean and variance σm² is then added to generate the observed signal ym(t). em(t) is also assumed to be uncorrelated with w(τ) at all t and τ. Eq. (3) represents the linear convolution between the embedded latent signal xm(t) and the basis vectors for the HRF. Here, we use the canonical HRF and its time derivative as bases, as is common in most fMRI studies (Penny et al., 2005; Smith et al., 2009).
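The construction of the basis matrix Φ and the regional HRF bmΦ can be sketched as follows. The double-gamma parameterization of the canonical HRF and the basis weights b_m below are assumptions (the paper does not give exact HRF parameters); the structure, p = 2 bases of length L and the BOLD sample as the inner product bmΦxm(t), follows Eqs. (2) and (3).

```python
import numpy as np
from math import gamma

TR, L = 2.0, 16                                   # sampling interval (s), embedding length
t = np.arange(L) * TR

def canonical_hrf(t, p1=6.0, p2=16.0, ratio=6.0):
    """Double-gamma canonical HRF (a common parameterization, assumed here)."""
    h = (t**(p1 - 1) * np.exp(-t) / gamma(p1)
         - t**(p2 - 1) * np.exp(-t) / gamma(p2) / ratio)
    return h / np.abs(h).max()

h = canonical_hrf(t)
dh = np.gradient(h, TR)                           # temporal-derivative basis
Phi = np.vstack([h, dh])                          # p x L basis matrix (p = 2)

b_m = np.array([1.0, 0.3])                        # hypothetical basis weights for region m
hrf_m = b_m @ Phi                                 # regional HRF b_m * Phi (1 x L)

# One BOLD sample (Eq. 3): inner product with the L past latent values x_m(t)
x_m = np.random.default_rng(1).standard_normal(L)
y_m = float(hrf_m @ x_m)
```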
Eqs. (1)–(3) together represent a state-space model for estimating the causal interactions in latent signals based on observed multivariate fMRI time series. This model can be seen both as a multivariate extension of univariate time series models (Makni et al., 2008; Penny et al., 2005) and as an extension of GCA wherein a vector autoregressive model for latent, rather than BOLD-fMRI, signals is used to model the causal interactions among brain regions. Furthermore, our MDS model also takes into account variations in the HRF as well as the influences of modulatory and external stimuli in estimating causal interactions between the brain regions.
Estimating causal interactions between the M regions specified in the model is equivalent to estimating the unknown parameters A and Cj, j = 1, 2, …, J. In order to estimate A and the Cj's, the other unknown parameters D, Q, {bm, m = 1, …, M} and {σm², m = 1, …, M}, together with the latent signals {s(t), t = 1, …, T}, must be estimated from the observations {ym(t), m = 1, …, M; t = 1, …, T}, where T is the total number of time samples. We use the following MLE and VB methods for estimating the parameters of the MDS model.
Maximum Likelihood Estimation (MLE)
Estimation
Maximum likelihood estimates of the MDS model parameters and latent signals are obtained by maximizing the log-likelihood of the observed fMRI data. We use the EM algorithm to estimate the unknown parameters and latent variables of the model. The EM algorithm is an iterative method consisting of two steps, the E-step and the M-step. In the
E-step, the posterior distribution of the latent variables is computed given the current estimates of the parameters. In the M-step, given the current posterior distribution of the latent variables, the parameters of the model are estimated by maximizing the conditional expectation of the complete-data log-likelihood. The E and M steps are repeated until convergence. The log-likelihood of the data is guaranteed to increase or remain the same at every iteration of the E and M steps, and the EM algorithm asymptotically yields maximum likelihood estimates of the parameters. In the E-step, the posterior distributions are obtained using Kalman filtering and smoothing algorithms (Bishop, 2006). The detailed equations for the E and M steps are given in Appendix A. We refer to this MLE-based solution as MDS-MLE.
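To make the E/M alternation concrete, here is a minimal EM loop for a toy one-dimensional linear-Gaussian state-space model, not the full MDS model: a Kalman filter and RTS smoother form the E-step, and a closed-form update of the scalar transition coefficient forms the M-step. All model values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D model: s(t) = a*s(t-1) + w(t), y(t) = s(t) + e(t)
a_true, q, r, T = 0.8, 0.1, 0.2, 500
s = np.zeros(T)
for t in range(1, T):
    s[t] = a_true * s[t - 1] + rng.normal(0, np.sqrt(q))
y = s + rng.normal(0, np.sqrt(r), T)

a = 0.1                                            # initial guess for the transition coefficient
for _ in range(50):
    # E-step, forward pass: Kalman filter
    mu_f = np.zeros(T); P_f = np.zeros(T)
    mu_f[0], P_f[0] = y[0], r
    for t in range(1, T):
        mu_p, P_p = a * mu_f[t - 1], a * a * P_f[t - 1] + q   # predict
        K = P_p / (P_p + r)                                   # Kalman gain
        mu_f[t] = mu_p + K * (y[t] - mu_p)                    # update
        P_f[t] = (1 - K) * P_p
    # E-step, backward pass: RTS smoother (plus lag-one covariances)
    mu_s = mu_f.copy(); P_s = P_f.copy(); P_lag = np.zeros(T)
    for t in range(T - 2, -1, -1):
        P_p = a * a * P_f[t] + q
        G = a * P_f[t] / P_p                                  # smoother gain
        mu_s[t] = mu_f[t] + G * (mu_s[t + 1] - a * mu_f[t])
        P_s[t] = P_f[t] + G * G * (P_s[t + 1] - P_p)
        P_lag[t + 1] = G * P_s[t + 1]                         # Cov(s_{t+1}, s_t | Y)
    # M-step: closed-form update of a from smoothed sufficient statistics
    num = np.sum(mu_s[1:] * mu_s[:-1] + P_lag[1:])
    den = np.sum(mu_s[:-1] ** 2 + P_s[:-1])
    a = num / den
```

After a few dozen iterations the estimate of `a` settles near the generating value 0.8; the same E/M structure, with matrix-valued states and the convolutional observation model, underlies MDS-MLE.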
Inference
The statistical significance of the intrinsic (A(m,n)) and modulatory (Cj(m,n), j = 1, 2, …, J) causal connections estimated using the EM approach was tested using a bootstrap method. In this approach, the distribution of connection strengths under the null hypothesis that there are no connections between the regions was generated by estimating A and C from 100 surrogate data sets constructed from the observed data. A surrogate data set was obtained by applying a Fourier transform to the observed signal at the m-th region and then randomizing its phase response by adding a random phase shift at every frequency. The phase shifts were obtained by randomly sampling in the interval [0, 2π]. An inverse Fourier transform was then applied to generate one instance of surrogate data (Prichard and Theiler, 1994). Randomization of the phase response destroys the causal interactions between the brain regions while preserving their power spectra. The EM algorithm was then run on the surrogate data to obtain A and C under the null hypothesis. This procedure was repeated on 100 surrogate data sets to obtain the empirical null distributions for the elements of A and C. The statistical significance of each connection was then assessed using these distributions at a p value of 0.01, with Bonferroni correction to account for multiple comparisons.
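The phase-randomization step for generating one surrogate series can be sketched as below, following the Prichard and Theiler construction described above: Fourier magnitudes (and hence the power spectrum) are preserved while phases are drawn uniformly from [0, 2π]; keeping the DC and Nyquist bins real ensures the inverse transform yields a real signal.

```python
import numpy as np

def phase_randomize(x, rng):
    """One surrogate series: random Fourier phases, power spectrum preserved."""
    n = len(x)
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(X))
    phases[0] = 0.0                       # keep the DC bin real
    if n % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist bin real for even n
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=n)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)              # stand-in for one region's observed signal
surr = phase_randomize(x, rng)
# |rfft(x)| and |rfft(surr)| match bin-for-bin; temporal (causal) structure is destroyed
```

In the null procedure above, this is applied independently to each region's time series, and the model is re-estimated on each of the 100 surrogate data sets.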
Variational Bayes (VB)
Estimation of posterior distributions
In this approach, we use a VB framework to obtain the posterior distributions of the unknown parameters and latent variables. Let Θ = {A, C1, …, CJ, D, Q, R, B} represent the unknown parameters and S = {s(t), t = 1, 2, …, T} the latent variables of the model. Given the observations Y = {y(t), t = 1, 2, …, T} and the probabilistic model, the Bayesian approach aims to find the joint posterior p(S,Θ|Y). However, obtaining this posterior distribution in a fully Bayesian manner is analytically intractable for most models, including MDS. In the VB approach, we make an analytical approximation to p(S,Θ|Y). Let q(S,Θ|Y) be an arbitrary probability distribution; then the log of the marginal distribution of the observations Y can be written as (Bishop, 2006)
log P(Y) = L(q) + KL(q||p)   (4)

where

L(q) = ∫ dS dΘ q(S,Θ|Y) log [ p(Y,S,Θ) / q(S,Θ|Y) ]   (5)

KL(q||p) = −∫ dS dΘ q(S,Θ|Y) log [ p(S,Θ|Y) / q(S,Θ|Y) ]   (6)
KL(q||p) is the Kullback–Leibler divergence between q(S,Θ|Y) and p(S,Θ|Y). KL(q||p) ≥ 0, with equality if and only if q(S,Θ|Y) = p(S,Θ|Y). Therefore, L(q) serves as a lower bound on the log of the evidence (log P(Y)). The maximum of this lower bound occurs when the KL divergence is zero, for which the optimal choice of q(S,Θ|Y) is
p(S,Θ|Y). Since p(S,Θ|Y) is not tractable, certain assumptions on the form of q(S,Θ|Y) are made, and the optimal distribution is then found by maximizing the lower bound L(q). In this work, we assume that the posterior distribution q(S,Θ|Y) factorizes over S and Θ, i.e.,
q(S,Θ|Y) = qS(S|Y) qΘ(Θ|Y)   (7)
We note that no further assumptions are made on the functional forms of the distributions qS(S|Y) and qΘ(Θ|Y). These quantities are obtained by taking functional derivatives of L(q) with respect to qS(S|Y) and qΘ(Θ|Y). It can be shown that
log qS(S|Y) ∝ EΘ[log p(Y,S,Θ)]   (8)

log qΘ(Θ|Y) ∝ ES[log p(Y,S,Θ)]   (9)
Eqs. (8) and (9) are the VB-E and VB-M steps, respectively. Expectations are computed with respect to qΘ(Θ|Y) in Eq. (8) and with respect to qS(S|Y) in Eq. (9). In the VB-E step, the distribution of the latent signal s(t), for each t, is updated given the current distribution of the parameters Θ. For reasons described below, s(t) has a Gaussian distribution, so in this step updating the distribution amounts to updating the mean and variance of that Gaussian; estimating the means of s(t) at every t is thus equivalent to estimating the latent signals. In the VB-M step, the distributions of the model parameters Θ are updated given the updated distributions of the latent signals s(t). The VB-E and VB-M steps are repeated until convergence. Note that we do not make any further assumptions about the factorization of Θ and S; any additional conditional independencies in these sets are derived from the probabilistic graphical model of MDS shown in Fig. 1. The details of the derivation of the posterior probabilities using the graphical model are given in Appendix B. Fig. 2 shows a flow chart of the steps involved in both the MDS-VB and MDS-MLE methods.
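The VB alternation of Eqs. (8) and (9) can be illustrated on a textbook conjugate model rather than the full MDS model: a Gaussian with unknown mean and precision under a Gaussian-Gamma prior (the standard example in Bishop, 2006). Each step updates one factor of the factorized posterior using expectations under the other factor; the data, priors and iteration count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, 500)             # data: true mean 2.0, true precision 1.0
N, xbar = len(x), x.mean()

# Broad (near non-informative) conjugate hyperparameters
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_tau = 1.0                               # initial E[tau]
for _ in range(100):
    # Update q(mu) = N(mu_N, 1/lam_N) given current q(tau)  (analogue of VB-E)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N
    # Update q(tau) = Gamma(a_N, b_N) given current q(mu)    (analogue of VB-M)
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (np.sum(x**2) - 2 * E_mu * np.sum(x) + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_N / b_N
```

At convergence the posterior mean `mu_N` approaches the sample mean and `E_tau` approaches the inverse sample variance; in MDS-VB the same coordinate-ascent structure operates on the Gaussian latent signals and the Gamma-distributed precisions of Fig. 1.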
Choice of priors and inference
The Bayesian approach allows the specification of both informative and non-informative priors on the model parameters. The specification of these priors helps regularize the solution and avoids overfitting when the number of parameters to be estimated is large compared to the number of observations, as is generally the case when the number of brain regions to be modeled is large. Since we do not have a priori information on these parameters, we specify non-informative conjugate priors. Here, we briefly explain the notion of non-informative conjugate priors (Bishop, 2006). Let z be a Gaussian random variable with mean μ
Fig. 2. Flow chart showing the major steps in the implementation of MDS. Wiener deconvolution is used to obtain an initial estimate of the latent signals, and a least squares estimation procedure is used to find initial estimates of the model parameters. The estimates of the latent signals and model parameters are refined in the E and M steps, respectively. These steps are repeated until convergence. The significance of the model parameters is then assessed in the inference step.
and variance σ². If σ² → ∞, or equivalently the precision 1/σ² → 0, the distribution becomes flat and the random variable z can take any value between −∞ and ∞ with equal probability. We refer to such distributions as non-informative. Let x and y be two random variables with distributions p(x) and p(y), respectively. p(y) is said to be a conjugate prior for p(x) if the functional form of the posterior p(x|y) is the same as that of p(y). Specifying conjugate priors leads to elegant analytical solutions, and it also allows us to specify priors in such a way that we obtain sparse and interpretable solutions. For example, one can specify Gaussian priors on the elements of the connection matrices A and C. The prior on each element of A (or Cj) is
A_{i,j} ∼ N(0, 1/λ_{i,j})   (10)
where λi,j is the prior precision for Ai,j. Such a specification of priors enables automatic relevance determination (ARD) of the connections Ai,j between the regions (Tipping, 2001). During the learning of A and the λ's, a significant proportion of the λi,j's go towards infinity, and the posterior means of the corresponding connections Ai,j shrink towards the prior mean of zero. Elements of the matrix A that are not significant therefore become very small, and only the significant elements survive. Adopting this procedure automatically identifies the relevant entries of the matrix A, hence the name "automatic relevance determination". This is important because, unlike in the MLE approach, inference on the connection weights (A and the Cj's) is now straightforward. The details of the prior specification for the various parameters are given in Appendix B. We test the significance of parameters by thresholding the corresponding posterior probabilities at a p value of 0.01 with Bonferroni correction to account for multiple comparisons.
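The shrinkage mechanism can be sketched with ARD in a simple Bayesian linear regression, used here only as a stand-in for learning a row of the connection matrix; the design matrix, noise level and the simple precision-update rule are illustrative assumptions, not the MDS updates of Appendix B. Precisions of irrelevant weights grow large and their posterior means shrink towards zero, while relevant weights survive.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 6
X = rng.standard_normal((N, P))
w_true = np.array([1.5, -0.8, 0.0, 0.0, 0.0, 0.0])   # only two relevant weights
y = X @ w_true + 0.1 * rng.standard_normal(N)

beta = 1.0 / 0.01                         # assumed known noise precision (sigma = 0.1)
lam = np.ones(P)                          # one ARD precision per weight, as in Eq. (10)
for _ in range(50):
    # Posterior over weights under Gaussian prior N(0, diag(1/lam))
    Sigma = np.linalg.inv(beta * X.T @ X + np.diag(lam))
    m = beta * Sigma @ X.T @ y            # posterior mean of the weights
    # Precision update: lam_i -> 1 / E[w_i^2]; irrelevant precisions blow up
    lam = 1.0 / (m**2 + np.diag(Sigma))
```

After the loop, `m[0]` and `m[1]` remain near their generating values while `m[2:]` are driven towards zero and their precisions become very large, which is exactly the pruning behavior described above for the entries of A.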
Simulated data sets
Data sets with modulatory effects and external stimuli
We assess the performance of MDS using a number of computer-simulated data sets generated at various SNRs (10, 5 and 0 dB), for different numbers of brain regions or nodes (M = 2, 3 and 5) and for different numbers of time samples (T = 200, 300 and 500).
Fig. 3 shows the intrinsic and modulatory connectivity of three networks with 2, 3 and 5 nodes. For example, in the two-node network (Fig. 3A), node 1 receives an external input and there is an intrinsic causal connection from node 1 to node 2 with a weight of −0.3 (A(2,1) = −0.3). A modulatory input induces a connection from node 1 to node 2 with a weight of 0.5 (C(2,1) = 0.5), whose sign is opposite to that of the intrinsic connection. Similarly, in the five-node structure (Fig. 3C), node 1 receives the external input and has causal influences on nodes 2, 3 and 4 (matrix A). Nodes 4 and 5 have bidirectional influences. Modulatory inputs induce causal influences from node 1 to 2 and from node 3 to 2 (matrix C). Note that all three networks have intrinsic and modulatory connections from node 1 to 2 with weights −0.3 and 0.5, respectively. We simulated networks with these weights to explicitly test whether MDS could recover interactions that might be missed by GCA because of the opposing signs of the intrinsic and modulatory connection weights. We set the autocorrelations of the time series (diagonal elements of matrix A) to 0.7 (Ge et al., 2009; Roebroeck et al., 2005). Our simulations also modeled variations in HRFs across regions. Fig. 4 shows the simulated HRFs at each of the network nodes. The HRFs were constructed in such a way that the direction of hemodynamic delays is confounded with the direction of latent signal delay, making the task of recovering network parameters more challenging. For example, in the 5-node network, node 1 drives nodes 2, 3 and 4 at the level of latent signals, but the HRF at node 1 peaks
Fig. 3. Simulated models with intrinsic and experimentally induced modulatory connections for (A) 2-node, (B) 3-node and (C) 5-node networks. Intrinsic connections are shown as solid lines; modulatory connections are shown as broken lines and highlighted with connecting black circles. A(i,j) and C(i,j) are the weights of the intrinsic and modulatory connections, respectively, between nodes i and j. D(i,i) is the strength of the external stimulus to the i-th node.
later than those in nodes 2, 3 and 4. These HRFs were simulated using a linear combination of the canonical HRF and its temporal derivative. Analogous to most fMRI data acquisition paradigms, we assume that the sampling interval, also referred to as the repetition time
Fig. 4. Variable regional hemodynamic response functions used in the simulations for each node in the (A) 2-, (B) 3- and (C) 5-node networks shown in Fig. 3.
(TR), is 2 s. This corresponds to an embedding dimension of L = 16, which is also the length of the HRF. Note that the duration of the canonical HRF is approximately 32 s.
Fig. 5 shows the experimental and modulatory inputs applied to the nodes shown in Fig. 3. The external input was simulated to reflect an event-related fMRI design with stimuli occurring at random instances, with the constraint that the time difference between two consecutive events is at least 2 TRs (Fig. 5A). This input could also be a slow event-related or block design; in the MDS framework, there is no restriction on the nature of the experimental design. The modulatory input is assumed to be a boxcar function (Fig. 5B). The modulatory inputs indicate the time periods wherein the network configuration could change because of context-specific influences such as changes in attention, alertness and explicit experimental manipulations.
The simulated data sets were generated using the model described in Eqs. (1)–(3). The latent noise covariance was fixed at Q = 0.1 I_M, where I_M is the identity matrix of size M. The observed noise variance at the m-th region for a given SNR was computed as

σ_m² = Var(y_m) 10^(−0.1 SNR)   (11)
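Eq. (11) maps a target SNR in dB to an observation-noise variance. A minimal sketch (the time series used here is arbitrary):

```python
import numpy as np

def noise_variance(y_m, snr_db):
    """Observation-noise variance for a target SNR in dB (Eq. 11)."""
    return np.var(y_m) * 10 ** (-0.1 * snr_db)

rng = np.random.default_rng(0)
y_m = rng.standard_normal(1000)           # hypothetical noiseless BOLD time series
sigma2 = noise_variance(y_m, snr_db=5.0)
noisy = y_m + rng.normal(0, np.sqrt(sigma2), len(y_m))
# At 5 dB the signal variance is 10**0.5 (about 3.16x) the noise variance;
# at 0 dB signal and noise variances are equal.
```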
We assume that the canonical HRF and its temporal derivative span the space of HRFs. Therefore, they constitute the rows of Φ, which is a 2×16 matrix. The coefficients of the matrices A and C for each network structure are shown in Fig. 3.
We generated 25 data sets for each SNR, network structure andtime samples. The performance of the method was assessed using theperformance metrics described in the next section.
Data sets without modulatory effects and external stimuli
To examine the performance of the methods when there are no modulatory or external stimulus effects, we simulated 25 data sets for the 3-node network shown in Fig. 3B at 5 dB SNR. We set the weights to the same values as in the previous case, except that there are no causal interactions from modulatory inputs. The weights corresponding to external stimuli were all set to zero. The diagonal
Fig. 4. Hemodynamic response functions for each node in (A) 2, (B) 3 and (C) 5 node networks shown in Fig. 3.
Fig. 5. Onset and duration of event related experimental stimuli (A) and modulatory inputs (B) used in the simulations.
6 S. Ryali et al. / NeuroImage xxx (2010) xxx–xxx
elements (autocorrelations) in matrix A were set to 0.8, 0.7 and 0.6,respectively. These data sets were created to provide a moreappropriate, albeit less general, comparison of MDS with GCA.
Effects of fMRI down-sampling
Interactions between brain regions occur at finer time scales
compared to the sampling intervals of fMRI signals. fMRI signals are typically sampled at TR = 1 or 2 s, while neuronal processes occur at millisecond resolution. To investigate the effects of fMRI down-sampling on MDS performance, we adopted the approach described by Deshpande et al. (2009). We generated data sets with a 1 ms sampling interval at 0 dB SNR for the 2-node network shown in Fig. 3A. We obtained neuronal signals in node 1 and node 2 with a delay of dn milliseconds between them. In this case, node 1 drives node 2 under both intrinsic and modulatory conditions, with the weights shown in Fig. 3A. The autocorrelations in nodes 1 and 2 were set to 0.8 and 0.7, respectively. We then convolved the neuronal signal at node 1 with a canonical HRF, generated again at a 1 kHz sampling rate, and re-sampled it to a sampling interval of TR = 2 s to obtain the fMRI signal. In node 2, we convolved the neuronal signal with an HRF delayed by dh milliseconds with respect to the HRF in node 1, and again re-sampled to TR = 2 s to obtain the fMRI signal. We obtained simulated data sets at neuronal delays dn = {0, 200, 400, 600, 800, 1000} ms and HRF delays dh = {0, 500, 2500} ms (Deshpande et al., 2009). We also examined two cases for HRF delays: (1) the HRF delay is in the same direction as the neuronal delay, and (2) the HRF delay is in the opposite direction. The second case represents the scenario where the HRF confounds the causal interactions at the neuronal level. We generated 25 simulated data sets for each combination of dn and dh. Supplementary Table S1 summarizes the characteristics of each data set used for evaluating the performance of MDS.
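The down-sampling procedure can be sketched as follows. The random-walk stand-in for the neuronal signal and the simple gamma-shaped HRF are illustrative assumptions, not the exact generator used in the simulations:

```python
import numpy as np

def downsampled_bold(n_ms=10_000, dn=400, dh=500, TR_ms=2000, seed=0):
    """Sketch of the down-sampling simulation (after Deshpande et al., 2009).
    Node 2's 'neuronal' signal lags node 1's by dn ms; each is convolved
    with an HRF on a 1 ms grid, node 2's HRF shifted by dh ms (dh < 0
    makes node 2's HRF peak earlier, the confounding case), and both are
    then re-sampled every TR_ms. Signal and HRF shapes are assumptions."""
    rng = np.random.default_rng(seed)
    x1 = np.cumsum(rng.normal(size=n_ms)) * 0.01   # stand-in neuronal signal
    x2 = np.zeros(n_ms)
    x2[dn:] = x1[:n_ms - dn]                       # node 1 drives node 2 with lag dn
    t = np.arange(8000, dtype=float)               # truncated 8 s HRF at 1 ms
    h = (t / 1000.0) ** 5 * np.exp(-t / 1000.0)    # gamma shape, peak ~5 s
    h /= h.sum()
    h2 = np.concatenate([np.zeros(dh), h]) if dh >= 0 else h[-dh:]
    y1 = np.convolve(x1, h)[:n_ms][::TR_ms]        # re-sample at TR = 2 s
    y2 = np.convolve(x2, h2)[:n_ms][::TR_ms]
    return y1, y2
```

Shorter durations are used here than in the actual simulations, purely to keep the sketch fast.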
Performance metrics
The performance of MDS in discovering the intrinsic and modulatory causal interactions in simulated data sets was assessed using various
Please cite this article as: Ryali, S., et al., Multivariate dynamical systems models for estimating causal interactions in fMRI, NeuroImage (2010), doi:10.1016/j.neuroimage.2010.09.052
performance metrics such as sensitivity, false positive rate and accuracy in correctly identifying causal intrinsic and modulatory interactions, where:
$\text{sensitivity} = \dfrac{TP}{TP + FN}$  (12)

$\text{false positive rate} = \dfrac{FP}{TN + FP}$  (13)

$\text{accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$  (14)
where TP is the number of true positives, TN the number of true negatives, FN the number of false negatives and FP the number of false positives. These performance metrics were computed for each of the 25 data sets and then averaged to obtain the overall performance.
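For binary connection matrices, Eqs. (12)–(14) can be computed as in the following sketch; excluding the diagonal (self-connections) from the counts is an assumption about how connections are tallied:

```python
import numpy as np

def causal_metrics(true_adj, est_adj):
    """Sensitivity, false positive rate and accuracy (Eqs. (12)-(14))
    for binary M x M connection matrices, counting off-diagonal
    (between-node) connections only."""
    mask = ~np.eye(true_adj.shape[0], dtype=bool)   # ignore self-connections
    t = true_adj[mask].astype(bool)
    e = est_adj[mask].astype(bool)
    tp = np.sum(t & e)        # true positives
    tn = np.sum(~t & ~e)      # true negatives
    fp = np.sum(~t & e)       # false positives
    fn = np.sum(t & ~e)       # false negatives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, fpr, accuracy
```

The same function can be applied separately to the estimated intrinsic and modulatory networks.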
Results
Applying MDS – An example
We first illustrate the performance of MDS-MLE and MDS-VB by computing the estimated intrinsic and modulatory connections and the deconvolved (estimated) latent signals of a five-node network simulated at 5 dB SNR (Fig. 3C). The MDS approach, using either MLE or VB, simultaneously estimates both the latent signals and the unknown model parameters using the E and M steps, respectively. The left and right panels in Fig. 6, respectively, show the actual and estimated latent signals and the actual and estimated BOLD signals at the five nodes in the network using MDS-MLE and MDS-VB. The estimated BOLD signal $\hat{y}_m$ at the m-th node was computed as
$\hat{y}_m(t) = \hat{b}_m^{\top} \Phi\, \hat{x}_m(t)$  (15)
Table 1
Mean square error (MSE) between the actual and estimated neuronal signals, and between the actual and estimated BOLD signals, using MDS-MLE and MDS-VB at the five nodes of the network.

Nodes  Neuronal signals   BOLD signals
       MLE     VB         MLE     VB
1      0.024   0.023      0.027   0.027
2      0.024   0.024      0.015   0.014
3      0.019   0.019      0.025   0.024
4      0.017   0.017      0.021   0.02
5      0.018   0.017      0.02    0.02
where $\hat{b}_m$ are the estimated coefficients (using MLE or VB) corresponding to the basis functions spanning the subspace of HRFs, and $\hat{x}_m(t)$ is the estimated latent signal at the m-th node. As shown in this figure, both MDS-MLE and MDS-VB were able to recover the latent and BOLD signals at this SNR. Table 1 shows the mean square error (MSE) between the estimated and actual latent signals and between the estimated and actual BOLD-fMRI responses in each node using these two methods. The MSE in estimating these signals is very low for both methods. Fig. 7A and B, respectively, show the intrinsic and modulatory causal interactions estimated by MDS-MLE and MDS-VB in the simulated five-node network. MDS-VB correctly identified both intrinsic (solid lines) and modulatory connections (dotted lines) in this network, as shown in Fig. 7B. MDS-MLE also correctly recovered both the intrinsic and modulatory networks, but it introduced an additional false modulatory connection from node 3 to node 1, as shown in Fig. 7A.
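One way to read Eq. (15) is that $\hat{b}_m^{\top}\Phi$ forms the region-specific HRF, which is then applied to the embedded latent vector. The sketch below assumes the embedding stacks the current and L−1 past latent samples (consistent with the embedding dimension L = 16 above); the function name and embedding convention are illustrative:

```python
import numpy as np

def predicted_bold(b_m, Phi, x_m, L=16):
    """Eq. (15): y_hat_m(t) = b_m' Phi x_m(t). Here b_m @ Phi gives the
    region-specific HRF (length L), and x_m(t) is assumed to stack the
    current and L-1 past latent samples, so the product amounts to a
    causal convolution of the latent signal with that HRF."""
    h_m = b_m @ Phi                                  # region-specific HRF
    T = x_m.shape[0]
    y = np.zeros(T)
    for t in range(T):
        past = x_m[max(0, t - L + 1): t + 1][::-1]   # x(t), x(t-1), ...
        y[t] = h_m[:len(past)] @ past
    return y
```

With a trivial basis in which each row selects one lag, the prediction reduces to the latent signal itself or a one-sample delay of it, which makes the convolution reading easy to check.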
We next compare the performance of MDS with that of GCA using the same simulated data. This analysis was performed using the multivariate GCA toolbox developed by Seth (Seth, 2010). We applied GCA to the same data set to verify whether it can recover the causal connections (either intrinsic or modulatory). As shown in Fig. 7C, GCA missed both the intrinsic and modulatory interactions from node 1 to 2, but it was able to recover the modulatory interactions from node 3 to 4 in addition to other connections. However, unlike MDS, GCA cannot distinguish between intrinsic and modulatory interactions. GCA missed the connection from node 1 to 2 because these
Fig. 6. Left panel: actual and estimated latent signals at each of the nodes of the 5-node network shown in Fig. 3C. Right panel: estimated and actual BOLD-fMRI signals at each node. MDS, using both MLE and VB approaches, accurately recovered latent signals and predicted the fMRI signals based on the estimated model parameters and latent signals.
nodes have both intrinsic and modulatory interactions with opposing actions. Since GCA does not model these interactions separately, the net connection strength between these nodes is not significant. On the other hand, MDS models these interactions explicitly and is therefore able to recover both types of connections. This example demonstrates that GCA cannot recover all the connections under these conditions, while both MDS methods recovered all the connections and at the same time differentiated between the different types of interactions.
Performance of MDS on simulated data with modulatory effects and external stimuli
We evaluated the performance of MDS-MLE and MDS-VB on simulated data sets by computing sensitivity, false positive rate and
Fig. 7. (A) Intrinsic and modulatory connections estimated by MDS using maximum likelihood estimation (MDS-MLE) and (B) variational Bayes estimation (MDS-VB). (C) Causal interactions estimated by Granger causal analysis (GCA). MDS-VB correctly identified both intrinsic and modulatory connections. MDS-MLE correctly estimated all the intrinsic and modulatory connections in the five-node network but also introduced a false modulatory connection from node 3 to node 1. GCA missed both intrinsic and modulatory connections from node 1 to 2 for reasons described in the text.
accuracy in finding intrinsic and modulatory causal interactions as a function of SNR, network size and the number of time samples. Figs. 8–10, respectively, show the performance of MDS-MLE and MDS-VB for time samples T = 500, 300 and 200. For each T and network size, the performance of MDS-MLE and MDS-VB was evaluated for SNR = 0, 5 and 10 dB. In each of these figures, panels A, B and C show the performance of MDS-MLE and MDS-VB with respect to sensitivity, false positive rate and accuracy in identifying the intrinsic and modulatory interactions for 2-, 3- and 5-node networks, respectively. The performance of these methods improved with increasing SNR and number of time samples (T).
Between the two MDS methods, MDS-VB showed superior performance compared to MDS-MLE across all SNRs, time samples and network sizes. MDS-VB performed significantly better at low SNR and for shorter time series. For example, for SNR = 0 dB, T = 200 time points and a 5-node network (Fig. 10C), the sensitivity of MDS-VB in recovering intrinsic and modulatory interactions was about 0.75 and 0.6, respectively, while MDS-MLE had sensitivities of only about 0.3 and 0.5, respectively. The accuracy of MDS-VB is also high (>0.8) under all conditions (panel C in Figs. 8–10) because its sensitivities are high and its false positive rates are very low (panels A and B in Figs. 8–10). More generally, in cases with high noise, shorter sample length and larger network size, MDS-VB consistently outperforms MDS-MLE.
Comparison of MDS and GCA on simulated data with modulatory effects and external stimuli
Finally, we compared the performance of GCA with the MDS methods on the same data sets and performance metrics. Figs. 8–10, respectively, show the comparative performance of GCA with MDS-VB and MDS-MLE for T = 500, 300 and 200 at various SNRs and network sizes. The results suggest that the performance of GCA is poor compared to both MDS methods with respect to sensitivity and accuracy in identifying causal interactions between brain nodes. Since GCA cannot distinguish between intrinsic and modulatory interactions, we computed the performance metrics by considering both connection types. The performance of GCA declined with decreasing SNR for networks of size 3 and 5 (Figs. 8–10). The performance of GCA was poor even for the 2-node network, as shown in Figs. 8A, 9A and 10A.
The sensitivity of GCA for this network is less than 10% becauseintrinsic and modulatory interactions have weights with oppositesigns. Since GCA does not model these interactions explicitly, it doesnot detect the interactions between the two nodes. On the other hand,both MDS methods showed better sensitivity and accuracy inidentifying both types of interactions at all SNRs for this network.
Comparison of MDS and GCA on simulated data in the absence of modulatory effects and external stimuli
Table 2 shows the relative performance of MDS-VB, MDS-MLE and GCA on 25 data sets simulated for a 3-node network at 5 dB SNR without any modulatory effects and external stimuli. The performance of GCA improved: it recovered the causal network with a sensitivity of 0.9, an FPR of 0 and an accuracy of 0.98. In this case, the performance of GCA is comparable to that of MDS, suggesting that in the absence of modulatory effects and external stimuli, GCA can perform as well as MDS even in the presence of HRF variations.
Effects of fMRI down-sampling on MDS performance
We examined the performance of MDS-VB on simulated data in which latent signals were generated at various delays using a sampling interval of 1 ms and convolved with HRFs at various delays. Causal interactions were then estimated based on the observed time series obtained at a sampling interval of 2 s. We examined the performance of MDS-VB under four different cases:
No latent signal delay but HRF is delayed by 500 and 2500 ms
In this case, there are no causal interactions between the two nodes with respect to the latent signals, but the observed fMRI time series are delayed with respect to each other because of delays in their respective HRFs. MDS-VB was accurate in that it did not recover any causal interactions (either intrinsic or modulatory) despite the variations in HRF.
No HRF delays
In this case, the HRFs in both nodes are identical. As shown in Fig. S1(A), the sensitivity of MDS-VB in recovering both intrinsic and modulatory interactions is above 0.9 (left panel), with FPRs below 0.1 (middle panel) and therefore accuracies above 0.9 (right panel), at all latent signal delays. The performance of MDS-VB improved with increasing latent signal delays.
Latent and HRF delays both in the same direction
In this scenario, HRF delays do not confound the causal interactions between the nodes at the latent signal level. For HRF delays of 500 and 2500 ms, the performance metrics shown in Figs. S1(B) and (C) suggest that MDS-VB is able to recover both intrinsic and modulatory causal interactions reliably. For the HRF delay of 2500 ms, there is a small drop in sensitivity at latent signal delays of 800 and 1000 ms.
HRF delays oppose the delays in latent signals
This is the most difficult situation for any method because HRF delays confound the causal interactions at the latent signal level. Fig. S2(B) shows the performance of MDS-VB when the HRF in node 2 peaks 500 ms before the HRF in node 1, while node 1 drives node 2 at the latent signal level. The performance of MDS-VB improved with increasing latent signal delay. Fig. S2(C) shows the performance of MDS-VB for an HRF delay of 2500 ms. The sensitivities at latent signal delays from 200 to 600 ms are higher (left panel), but they are accompanied by more false positives (middle panel) and therefore poor accuracies (right panel). The performance of MDS-VB in recovering causal interactions improved at latent signal delays of 800 and 1000 ms.
Fig. 8. (A) Sensitivity, false positive rate (FPR) and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCAfor all 3 networks and SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. Sample size T=500 time points.
Discussion
We have developed a novel dynamical systems method to model intrinsic and modulatory interactions in fMRI data. MDS uses a vector autoregressive state-space model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. Our proposed MDS method overcomes key limitations of commonly used methods for estimating causal relations from fMRI data.
Critically, causal interactions in MDS are modeled at the level of latent signals, rather than at the level of the observed BOLD-fMRI signals. Our simulations clearly demonstrate that this has the added benefit of eliminating the confounding effects of regional variability in the HRF. The parameters and latent variables of the state-space model were estimated using two different methods. In the MDS-MLE method, the statistical significance of the parameters of the state equation, which represent the causal interactions between multiple brain nodes, was tested using a Bootstrap method. In the MDS-VB method, we used non-informative priors to facilitate automatic relevance detection. We first discuss findings from our simulations and show that MDS-VB provides robust and accurate solutions
even at low SNRs and smaller numbers of observed samples (time points). We then contrast the performance of MDS with the widely used GCA method. In this context, we highlight instances where GCA works reasonably well and where it fails. Finally, we discuss several important conceptual issues concerning the investigation of dynamic causal interactions in fMRI, contrasting MDS with other recently developed methods.
Performance of MDS on simulated data sets—contrasting MLE and VB approaches
In the following sections, we evaluate and discuss the performance of MDS under various scenarios. Importantly, we demonstrate, for the first time, that VB approaches provide better estimates of the model parameters than MLE-based approaches. We investigated the performance of MDS-MLE and MDS-VB on simulated data sets generated at SNRs of 0, 5 and 10 dB for network structures of sizes 2, 3 and 5, and time samples of 200, 300 and 500. We simulated regional HRF variations in such a way that the directions of the hemodynamic response delays were opposite to the delays in the latent signals (Fig. 4). HRF delays could therefore influence the estimation of causal interactions when applied directly to the observed BOLD-fMRI
Fig. 9. (A) Sensitivity, FPR and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networksand SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 300 time points.
signals. This makes the problem of estimating causal interactions particularly challenging and provides novel insights into the strengths and weaknesses of the multiple approaches used here.
The performance of MDS was found to be robust when tested under various simulated conditions. Specifically, MDS was able to reliably recover both intrinsic and modulatory causal interactions from the simulated data sets, and its performance was found to be superior to conventional approaches such as GCA. Among the MDS methods, the performance of MDS-VB was superior to MDS-MLE with respect to performance metrics such as sensitivity, false positive rate and accuracy in identifying intrinsic and modulatory causal interactions (Figs. 7–10). MDS-VB showed significantly improved performance over MDS-MLE, especially under adverse conditions such as low SNRs, large network sizes and fewer observed samples (Fig. 10C).
The superior performance of MDS-VB can be attributed to the regularization imposed by the priors in this method. Our priors not only regularized the solution but also helped in achieving sparse solutions. By using sparsity-promoting priors, the weights corresponding to insignificant links are driven towards zero, thereby enabling automatic relevance detection (Tipping, 2001). This approach is useful not only for regularizing solutions when the number of unknown
parameters is high, but also for providing sparse and interpretable solutions. This feature of VB can be especially important in analyzing networks with a large number of nodes, an aspect often overlooked in most analyses of causality in complex networks.
Another advantage of Bayesian analysis lies in computing the statistical significance of the network connections estimated by the MDS methods. In the MLE approach, we need to resort to a Bootstrap approach, which can be computationally expensive. MDS-VB, on the other hand, provides posterior probabilities of each model parameter, as opposed to the point estimates of MLE, and these can be used to compute statistical significance. From a computational perspective, MDS-VB is several orders of magnitude faster than MDS-MLE because it does not require nonparametric tests for statistical significance testing. Taken together, these findings suggest that MDS-VB is a superior and more powerful method than MDS-MLE.
Comparison with GCA
We demonstrated the importance of modeling the influence of both external and modulatory stimuli for estimating causal networks by applying GCA to a five-node network. On this data set, GCA failed to detect both the modulatory and intrinsic connections
Fig. 10. (A) Sensitivity, FPR and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networksand SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 200 time points.
between nodes 1 and 2 (Fig. 7C). As mentioned earlier, GCA missed this connection because the network has both intrinsic and modulatory connections between these nodes, but with weights of opposite signs. Therefore, in GCA the net strength of this connection is very small, and it did not survive a conservative test of statistical significance. Our MDS methods, on the other hand, uncovered both of these connections. This phenomenon is most obvious in our simulations of the 2-node network, wherein GCA could not find causal interactions between the nodes (Figs. 8A, 9A and 10A). In this network too, both intrinsic and modulatory connections have weights with opposite signs. These results demonstrate the importance of explicitly modeling the influence of external and modulatory stimuli. Overall, the performance of GCA, when applied to the data sets generated at
Table 2
Relative performance of MDS and GCA in the absence of modulatory effects.

Method    Sensitivity  FPR   Accuracy
MDS-VB    0.98         0.02  0.98
MDS-MLE   0.92         0.03  0.96
GCA       0.9          0     0.98
various SNRs, networks and numbers of observations, was found to be inferior to both MDS methods. This was true with respect to both sensitivity and accuracy in identifying causal interactions between multiple brain nodes (Figs. 8–10). Compared to MDS, the performance of GCA drops significantly at lower SNRs. These results suggest that MDS is more robust to observation noise than GCA. Our simulations therefore suggest that MDS-VB outperforms GCA for networks consisting of fewer than 6 nodes. More extensive simulations, however, are needed to compare the performance of MDS with GCA for larger networks.
Conventional GCA methods do not take into account dynamic changes in modulatory inputs and their effect on context-dependent causal interactions. In order to compare GCA more directly with MDS, we examined causal interactions in the absence of modulatory influences. As expected, in this case the performance of GCA was comparable to that of MDS. Together, these findings suggest that GCA can accurately recover causal interactions in the absence of modulatory effects. Although newer dynamic GCA methods have been proposed, they appear to be designed more for improving the estimation of causal interactions than for examining context-dependent dynamic changes in causal interactions (Havlicek et al., 2010; Hemmelmann et al., 2009; Hesse et al., 2003; Sato et al., 2006).
Further simulation studies are needed to assess how well dynamicGCA can estimate context specific modulatory effects.
We next contrast our findings using GCA and MDS in the context of the equivalence of ARIMA and structural time series models. GCA is based on the autoregressive integrated moving average (ARIMA) models proposed by Box and Jenkins, whereas MDS is a structural time series model (Box et al., 1994). In the econometrics literature, it is well known that linear structural time series models have equivalent ARIMA model representations (Box et al., 1994). This equivalence has been under-appreciated in the neuroimaging literature, as demonstrated by the recent discussion regarding the relative merits of GCA and DCM (Friston, 2009a,b; Roebroeck et al., 2009). Our detailed simulations suggest that, under certain conditions, GCA is able to recover much of the causal network structure in spite of the presence of HRF delay confounds. This is most clearly illustrated by the simulations shown in Fig. 7C, where we found that GCA could recover the network structure except for the intrinsic/modulatory connection from node 1 to 2. Our simulations also suggest that GCA may not be able to uncover causal connections when there is a conflict between intrinsic and modulatory connections (Figs. 7C, 8–10), but in other cases it is able to recover the underlying networks. In estimating the causal interactions, the estimated model order for GCA using the Akaike information criterion (AIC) was more than 3. Note that in our simulations, the causal interactions at the latent signal level were generated using a VAR with model order 1 (Eq. (1)). It is plausible that in GCA the higher model order compensates for variations in HRF delay and for experimental effects such as context-specific modulatory connections (Deshpande et al., 2009). Our simulations suggest that this is indeed the case and that optimal model order selection in GCA results in improved estimation of causal interactions between nodes.
Nevertheless, structural time series based models like MDS and DCM can provide a better interpretation of network structure, since they can distinguish between intrinsic and context-specific modulatory causal interactions in the latent signals.
Effects of down-sampling on MDS performance
In most fMRI studies, data are typically acquired at sampling intervals of about 2 s (TR = 2 s). However, dynamical interactions between brain regions occur at faster time scales of 10–100 ms. To examine the effects of down-sampling fMRI data on the performance of MDS, we first simulated interactions between nodes at a sampling rate of 1 kHz and then re-sampled the time series to 0.5 Hz after convolving with region-specific HRFs. MDS-VB was then applied to these data sets to estimate causal interactions between nodes. We also examined the influence of HRF delays on the estimation of causal interactions under four scenarios (Figs. S1 and S2), similar to the strategy used by Deshpande and colleagues (Deshpande et al., 2009) to study the effect of HRF variability on GCA. In the first scenario, there were no causal interactions between nodes, but the HRFs were delayed between the nodes. In this case, MDS-VB performed accurately and did not infer any false causal interactions. This shows that MDS-VB can model and remove the effects of HRF variation while estimating causal interactions at the latent signal level. In the second scenario, we introduced causal interactions between nodes, but without HRF variations. MDS reliably estimated causal interactions for various delays in the latent signals (Fig. S1A). In the third scenario, we introduced causal interactions between nodes and also varied the HRFs such that the delays in the latent signals and HRFs were in the same direction. In this case, MDS was able to recover both intrinsic and modulatory causal interactions accurately (Figs. S1B, S1C). In the fourth scenario, when the delays in the latent signals and HRFs opposed each other, performance dropped significantly, just as with GCA (Deshpande et al., 2009). Further research is needed to examine whether causal interactions under this scenario are inherently unresolvable by MDS and other techniques such as DCM.
Comparison of MDS with other approaches
As noted above, like GCA, MDS can be used to estimate causal interactions between a large number of brain nodes. Unlike GCA, however, causal interactions are estimated on the underlying latent signals while simultaneously accounting for regional variations in the HRF. Furthermore, unlike GCA, MDS can differentiate between intrinsic and stimulus-induced modulatory interactions. Like DCM, MDS takes into account regional variations in the HRF while estimating the causal interactions between brain regions. And like DCM, MDS explicitly models external and modulatory inputs, allowing us to simultaneously estimate intrinsic and modulatory causal interactions between brain regions. Unlike DCM, however, MDS does not require the investigator to test multiple models and choose the one with the highest model evidence. This overcomes an important limitation of DCM: as the number of brain regions of interest increases, an exponentially large number of models needs to be examined; as a result, the computational burden in evaluating these models and identifying the appropriate model can become prohibitively high. MDS overcomes such problems and, as our study illustrates, incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.
Both DCM and MDS are state-space models, but DCM uses a deterministic state model (although a stochastic version has recently been developed (Daunizeau et al., 2009)), whereas MDS employs a stochastic model. Modeling latent interactions as a stochastic process is important for taking into account intrinsic variations in the latent signals that are not induced by experimental stimuli. Another important difference is that MDS uses empirical basis functions to model variations in the HRF, whereas DCM uses a biophysical Balloon model (Friston et al., 2003). Since the Balloon model is nonlinear, several approximations are required to solve it. In contrast, empirical HRF basis functions allow MDS to use a linear dynamical systems framework. The relative accuracy of these approaches is currently not known.
One important advantage of MDS is that, unlike methods based on vector autoregressive modeling such as GCA, it does not assume that the fMRI time series is stationary. This is important because the dynamics of the latent signals can be altered significantly by experimental stimuli, leading to highly non-stationary signals. In GCA, the time series is tested for stationarity either by examining its autocorrelation or by investigating the presence of unit roots. If the time series is found to be non-stationary, one commonly used approach to remove the non-stationarity is to replace the original time series with the difference of the current and previous time points (Seth, 2010). A problem with this manipulation is that it acts as a high-pass filter that can significantly distort the estimated causal interactions (Bressler and Seth, 2010).
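The high-pass character of first differencing can be verified with a short numerical sketch (illustrative sinusoids, not fMRI data):

```python
import numpy as np

# First differencing, y(t) = x(t) - x(t-1), has frequency response
# H(w) = 1 - exp(-iw), i.e. gain 2*sin(w/2): near zero at low
# frequencies and maximal at the Nyquist frequency -- a high-pass filter.
t = np.arange(512)
slow = np.sin(2 * np.pi * t / 256)   # slow oscillation (period 256 samples)
fast = np.sin(2 * np.pi * t / 4)     # fast oscillation (period 4 samples)

atten_slow = np.std(np.diff(slow)) / np.std(slow)   # ~2*sin(pi/256), i.e. ~0.025
atten_fast = np.std(np.diff(fast)) / np.std(fast)   # ~2*sin(pi/4), i.e. ~1.41
```

The slow component is attenuated by roughly a factor of 40, while the fast component is amplified, which is why differencing can distort estimates that depend on slow, task-related dynamics.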
Two dynamical systems based methods for modeling fMRI data have been proposed recently (Ge et al., 2009; Smith et al., 2009). Smith and colleagues used a switching linear dynamical systems model wherein modulatory inputs were treated as random variables (Smith et al., 2009). In contrast, MDS models them as deterministic quantities which are known for a given fMRI experiment. Modeling modulatory inputs as unknown random variables is useful for fMRI experiments in which the occurrence of modulatory inputs is unknown. However, for most fMRI studies modulatory inputs are known, and modeling them as unknown quantities unnecessarily increases the number of parameters to be estimated. Also, the switching dynamical systems model makes additional assumptions in computing the probability distributions of the state variables (Murphy, 1998). Further, Smith and colleagues used an MLE approach to estimate latent signals and model parameters. As we show in this study, compared to MLE, a VB approach yields more robust, computationally efficient and accurate model estimation even when the SNR and the number of time points are low, as is generally the case
with fMRI data. Another difference is that Smith and colleagues combine the intrinsic and modulatory matrices, i.e., for every j-th modulatory input, the connection matrix (Aj = A + Cj) is estimated from the data. In MDS, we estimate the intrinsic matrix A and the modulatory matrices Cj separately, which explicitly dissociates intrinsic and modulatory effects on causal interactions between brain regions. Another difference lies in testing the statistical significance of the estimated causal connections. In MDS-MLE, we use a non-parametric approach, whereas in MDS-VB, posterior probabilities of the model parameters are used for testing the significance of the causal interactions. Finally, it should be noted that the performance of this method under varying SNRs and sample sizes is not known since no simulations were performed.
Ge et al. (2009) used a different state-space approach to estimate causal interactions in the presence of external stimuli. They used vector autoregressive modeling for the state equation to model causal interactions among brain regions, whereas the observation model was nonlinear. They used an extended Kalman filtering approach to estimate the state variables and model parameters. This method was applied to local field potential data, so its usefulness for fMRI data is unclear. There are several other differences between MDS and this approach. MDS has been developed explicitly for fMRI data to account for HRF variations in brain regions while simultaneously estimating causal interactions. In the work of Ge and colleagues, both the latent signals and the unknown model parameters were treated as state variables, and extended Kalman filtering was used to obtain maximum likelihood estimates of these variables (Ge et al., 2009). In MDS, we have taken a different approach: state variables are separated from model parameters. This allowed us to use sparsity-promoting priors in the MDS-VB approach. Our results on simulated data suggest that MDS-VB outperforms MDS-MLE, especially at low SNRs and smaller numbers of temporal observations. In addition, Ge and colleagues used Kalman filtering to estimate state variables, while in MDS we used Kalman smoothing to estimate the latent signals (Ge et al., 2009). In Kalman smoothing, both past and future data are used to estimate latent signals, whereas filtering uses only past values to estimate the current values. In general, smoothing provides better estimates of latent signals than filtering (Bishop, 2006). Finally, although Ge and colleagues validated their approach on two three-node toy models, the performance of this method is not known under varying conditions such as different SNRs, network sizes and numbers of data samples.
Conclusions
The Bayesian multivariate dynamical systems framework we have developed here provides a robust method for estimating and interpreting causal network interactions in simulated BOLD-fMRI data. Extensive computer simulations demonstrate that this MDS method is more accurate and robust than GCA and that, among the MDS methods developed here, MDS-VB exhibits superior performance over MDS-MLE. Critically, MDS estimates both intrinsic and experimentally induced modulatory interactions in the latent signals, rather than the observed BOLD-fMRI signals. Unlike DCM, our proposed MDS framework does not require testing multiple models and may therefore be more useful for analyzing networks with a large number of nodes and connections. One limitation of this work is that our simulations were based on data sets created using the same model as the one used to estimate causal interactions. In this vein, preliminary analysis using simulations with delayed latent signals at millisecond temporal resolution suggests that MDS can accurately recover intrinsic and modulatory causal interactions in the presence of confounding delays in the HRF. Future studies will examine the performance of MDS using more realistic simulations in which causal influences are generated independently of any one particular model, as well as the application of MDS to real experimental fMRI data (Menon et al., in preparation).

Please cite this article as: Ryali, S., et al., Multivariate dynamical systems models for estimating causal interactions in fMRI, NeuroImage (2010), doi:10.1016/j.neuroimage.2010.09.052
Acknowledgments
This research was supported by grants from the National Institutes of Health (HD047520, HD059205, HD057610) and the National Science Foundation (BCS-0449927).
Appendix A
In this appendix, we provide detailed equations for estimating the model parameters and latent states of MDS using an expectation maximization (EM) algorithm.
Solving MDS Using Maximum Likelihood Estimation
The state-space and observation Eqs. (1)–(3) can be expressed in standard state-space form so that Kalman filtering and smoothing recursive equations can be used to estimate the probability distribution of the latent signals, which constitutes the E-step of our EM algorithm (Penny et al., 2005).
Let
$$x(t) = \left[s'(t)\;\; s'(t-1)\;\; \cdots\;\; s'(t-L+1)\right]' \tag{A.1}$$

Eqs. (1)–(3) can be written in terms of x(t) as

$$x(t) = G(t)\,x(t-1) + \tilde{D}\,U(t) + \tilde{w}(t) \tag{A.2}$$

$$y(t) = B\,\Phi\,x(t) + e(t) \tag{A.3}$$

where

$$G(t) = \tilde{A} + \sum_{j=1}^{J} v_j(t)\,\tilde{C}_j, \qquad
\tilde{A} = \begin{pmatrix} A \;\; 0_{M \times M(L-1)} \\ \Psi \end{pmatrix}, \qquad
\tilde{C}_j = \begin{pmatrix} C_j \;\; 0_{M \times M(L-1)} \\ 0_{M(L-1) \times ML} \end{pmatrix} \tag{A.4}$$

Here $\Psi = \left[I_{M(L-1)} \;\; 0_{M(L-1)\times M}\right]$ is the delay (shift) matrix that fills the lower rows of $\tilde{A}$, so that the lagged blocks of x(t) are copies of the corresponding blocks of x(t−1). Similarly,

$$\tilde{D} = \begin{pmatrix} D \;\; 0_{M \times M(L-1)} \\ 0_{M(L-1) \times ML} \end{pmatrix} \tag{A.5}$$

$$U(t) = \left[u'(t)\;\; 0_{1 \times M(L-1)}\right]' \tag{A.6}$$

$$\tilde{w}(t) = \left[w'(t)\;\; 0_{1 \times M(L-1)}\right]' \tag{A.7}$$

$$\tilde{w}(t) \sim N(0, \tilde{Q}) \tag{A.8}$$

$$\tilde{Q} = \begin{pmatrix} Q \;\; 0_{M \times M(L-1)} \\ 0_{M(L-1) \times ML} \end{pmatrix} \tag{A.9}$$

$$B = \begin{pmatrix} b_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & b_M \end{pmatrix} \tag{A.10}$$

$$\Phi = \begin{pmatrix} \phi & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \phi \end{pmatrix} \tag{A.11}$$

where $b_m$ contains the HRF basis coefficients of region m and $\phi$ is the matrix of HRF basis functions,

$$e(t) = \left[e_1(t)\;\; e_2(t)\;\; \cdots\;\; e_M(t)\right]' \tag{A.12}$$

$$e(t) \sim N(0, R) \tag{A.13}$$
$$R = \operatorname{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2\right) \tag{A.14}$$
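The embedding above can be sketched in code. The toy sizes (M = 2 regions, L = 3 lags, J = 1 modulatory input) and the parameter values are hypothetical, and the delay matrix Ψ is taken to be [I 0], one consistent reading of Eq. (A.4):

```python
import numpy as np

M, L, J = 2, 3, 1                        # regions, lags, modulatory inputs (toy sizes)
rng = np.random.default_rng(0)
A  = 0.5 * rng.standard_normal((M, M))   # intrinsic connections (invented values)
C1 = 0.3 * rng.standard_normal((M, M))   # modulatory connections (invented values)

def embed(W):
    """Place an M x M block in the top-left of an ML x ML matrix."""
    Wt = np.zeros((M * L, M * L))
    Wt[:M, :M] = W
    return Wt

A_t = embed(A)
A_t[M:, :M * (L - 1)] = np.eye(M * (L - 1))   # delay rows: shift lags downward
C1_t = embed(C1)                               # modulatory block, zero delay rows

def G(v1):
    # G(t) = A~ + sum_j v_j(t) C~_j   (Eq. A.4, here with J = 1)
    return A_t + v1 * C1_t

x = rng.standard_normal(M * L)           # x(t-1) = [s(t-1); s(t-2); s(t-3)]
x_next = G(v1=1.0) @ x
# the lagged entries of x(t) are shifted copies of x(t-1)
print(np.allclose(x_next[M:], x[:M * (L - 1)]))   # True
```

Only the top M rows of G(t) carry the causal parameters; the remaining rows simply implement the lag bookkeeping of Eq. (A.1).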
Let x(0) be the initial state, uncorrelated with the state noise $\tilde{w}(t)$ and the observation noise e(t), and normally distributed with mean $\mu_0$ and covariance $\Sigma_0$. Eqs. (A.2) and (A.3) now form a standard linear state-space system. Therefore, Kalman filtering and smoothing recursions can be used to carry out the E-step.
E-step
In the E-step, the probability distribution of the latent signal x(t), t=1,2,…,T given the observed data y(t), t=1,2,…,T is computed. Since the state noise, the observation noise and the initial state x(0) are assumed to be Gaussian, the latent signals x(t), t=1,2,…,T are also normally distributed, and their means and covariances can be estimated by forward (filtering) and backward (smoothing) recursion steps in the Kalman estimation framework. In the filtering step, the mean and covariance of x(t) are computed given the observations y(τ), τ=1,2,…,t. In the smoothing step, these means and covariances are updated so that they represent the mean and covariance at each time t given all the data y(t), t=1,2,…,T.
Kalman filtering

In the filtering step, the goal is to compute the following posterior distribution of x(t) given the observations y(τ), τ=1,2,…,t and the parameters of the model

$$p\left(x(t)\,|\,y(1), y(2), \ldots, y(t)\right) = N\left(x_t^t, \Sigma_t^t\right) \tag{A.15}$$
The mean and covariance of this distribution can be computed using the following forward recursive steps:

$$x_t^{t-1} = G(t)\,x_{t-1}^{t-1} + \tilde{D}\,U(t) \tag{A.16}$$

$$\Sigma_t^{t-1} = G(t)\,\Sigma_{t-1}^{t-1}\,G(t)' + \tilde{Q} \tag{A.17}$$

$$K(t) = \Sigma_t^{t-1} E' \left(E\,\Sigma_t^{t-1} E' + R\right)^{-1} \tag{A.18}$$

where E = BΦ,

$$x_t^t = x_t^{t-1} + K(t)\left(y(t) - E\,x_t^{t-1}\right) \tag{A.19}$$

$$\Sigma_t^t = \Sigma_t^{t-1} - K(t)\,E\,\Sigma_t^{t-1} \tag{A.20}$$

The above recursion is initialized using $x_1^0 = \mu_0$ and $\Sigma_1^0 = \Sigma_0$.
Kalman smoothing

In the smoothing step, the goal is to compute the posterior distribution of x(t) given all the observations y(τ), τ=1,2,…,T and the parameters

$$p\left(x(t)\,|\,y(1), y(2), \ldots, y(T)\right) = N\left(x_t^T, \Sigma_t^T\right) \tag{A.21}$$
The mean $x_t^T$ and covariance $\Sigma_t^T$ at each time t can be estimated using the following backward recursions:

$$x_t^T = x_t^t + J_t\left(x_{t+1}^T - G(t)\,x_t^t\right) \tag{A.22}$$

$$\Sigma_t^T = \Sigma_t^t + J_t\left(\Sigma_{t+1}^T - \Sigma_{t+1}^t\right)J_t' \tag{A.23}$$

where $J_t$ is defined as

$$J_t = \Sigma_t^t\,G(t)'\left(\Sigma_{t+1}^t\right)^{-1} \tag{A.24}$$
The above backward recursions are initialized by noting that, for t = T, $x_T^T$ and $\Sigma_T^T$ are available from Eqs. (A.19) and (A.20), respectively.
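The forward recursions (A.16)–(A.20) and backward recursions (A.22)–(A.24) can be sketched for a toy time-invariant linear-Gaussian system. All matrices and sizes below are hypothetical stand-ins (constant G, identity observation matrix, no input term):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 2, 50
G = np.array([[0.8, 0.1], [0.0, 0.7]])   # state transition (stands in for G(t))
E = np.eye(n)                             # observation matrix (stands in for B*Phi)
Q, R = 0.1 * np.eye(n), 0.5 * np.eye(n)

# simulate a realization of the toy system
x = np.zeros((T, n)); y = np.zeros((T, n))
for t in range(1, T):
    x[t] = G @ x[t-1] + rng.multivariate_normal(np.zeros(n), Q)
    y[t] = E @ x[t] + rng.multivariate_normal(np.zeros(n), R)

# forward (filter) pass, Eqs. (A.16)-(A.20)
xf = np.zeros((T, n)); Pf = np.zeros((T, n, n))
xp = np.zeros((T, n)); Pp = np.zeros((T, n, n))
Pf[0] = np.eye(n)
for t in range(1, T):
    xp[t] = G @ xf[t-1]                                     # predicted mean
    Pp[t] = G @ Pf[t-1] @ G.T + Q                           # predicted covariance
    K = Pp[t] @ E.T @ np.linalg.inv(E @ Pp[t] @ E.T + R)    # Kalman gain
    xf[t] = xp[t] + K @ (y[t] - E @ xp[t])
    Pf[t] = Pp[t] - K @ E @ Pp[t]

# backward (smoother) pass, Eqs. (A.22)-(A.24)
xs = xf.copy(); Ps = Pf.copy()
for t in range(T - 2, 0, -1):
    Jt = Pf[t] @ G.T @ np.linalg.inv(Pp[t+1])
    xs[t] = xf[t] + Jt @ (xs[t+1] - xp[t+1])
    Ps[t] = Pf[t] + Jt @ (Ps[t+1] - Pp[t+1]) @ Jt.T

err_f = np.mean((xf[1:] - x[1:]) ** 2)   # filtered estimation error
err_s = np.mean((xs[1:] - x[1:]) ** 2)   # smoothed estimation error
print(round(err_f, 3), round(err_s, 3))
```

Because the smoother conditions on the whole record, its error is typically smaller than the filter's, which is the motivation for using smoothed estimates in the E-step.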
M-step
In the M-step, the goal is to find the unknown parameters Θ = {A, C₁, …, C_J, D, Q, B, R} given the data and the current posterior distributions of x(t), t=1,2,…,T. The parameters Θ are estimated by maximizing the expected complete log-likelihood of the data, and the resulting estimates are therefore maximum likelihood estimates.
The complete log-likelihood of the data is given by

$$L = \log p\left(x(1), \ldots, x(T), y(1), \ldots, y(T)\,|\,\Theta\right)
= \log p\left(x(1)\,|\,\Theta\right) + \sum_{t=2}^{T}\log p\left(x(t)\,|\,x(t-1), \Theta\right) + \sum_{t=1}^{T}\log p\left(y(t)\,|\,x(t), \Theta\right) \tag{A.25}$$
Estimation of A, Cj's and D

The part of the complete log-likelihood that depends on the parameters A, C₁,…,C_J and D is given by

$$L\left(A, C_1, \ldots, C_J, D\right) \propto -0.5\sum_{t=2}^{T}\left(x_s(t) - \Big(A + \sum_{j=1}^{J} v_j(t)C_j\Big)F\,x(t-1) - d\,u(t)\right)' Q^{-1}\left(x_s(t) - \Big(A + \sum_{j=1}^{J} v_j(t)C_j\Big)F\,x(t-1) - d\,u(t)\right) \tag{A.26}$$

where $F = \left[I_M \;\; 0_{M \times M(L-1)}\right]$, $d = \operatorname{diag}(D)$ and $x_s(t) = x(1\!:\!M, t) = s(t)$. Taking expectations of Eq. (A.26) with respect to p(x(t)|y(1), y(2),…,y(T)) and then differentiating the resulting expression with respect to A, C_j and d yields the following coupled linear equations:

$$\left[\hat{A}\;\; \hat{C}_1 \cdots \hat{C}_J\;\; \hat{d}\right]
\begin{pmatrix}
\sum_{t=2}^{T} F(t)\,P(t-1)\,F(t)' & \sum_{t=2}^{T} F(t)\,x_{t-1}^T\,u(t) \\
\sum_{t=2}^{T} u(t)\,(x_{t-1}^T)'\,F(t)' & \sum_{t=2}^{T} u(t)^2
\end{pmatrix}
= \left[\sum_{t=2}^{T} P_s(t, t-1)\,F(t)' \;\;\; \sum_{t=2}^{T} m_s(t)\,u(t)\right] \tag{A.27}$$

where

$$P(t) = \Sigma_t^T + x_t^T\,(x_t^T)' \tag{A.28}$$

$$m_s(t) = x_t^T(1\!:\!M) \tag{A.29}$$

($m_s(t)$ is the vector of the first M elements of $x_t^T$),

$$F(t) = \left[I_M\;\; v_1(t)I_M\; \cdots\; v_J(t)I_M\right]' F \tag{A.30}$$

$$P(t, t-1) = \Sigma_t^T J_{t-1}' + x_t^T\,(x_{t-1}^T)' \tag{A.31}$$

and $P_s(t, t-1)$ denotes the first M rows of P(t, t−1).
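When the latent signals are treated as known (i.e., replacing the E-step expectations by point values), Eq. (A.27) reduces to an ordinary least-squares regression of s(t) on [s(t−1); v(t)s(t−1); u(t)]. A sketch with a hypothetical two-region network (all parameter values are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 2, 400
A_true = np.array([[0.5, 0.2], [-0.1, 0.4]])   # intrinsic connections
C_true = np.array([[0.0, 0.3], [0.0, 0.0]])    # modulatory connections
d_true = np.array([0.8, 0.2])                  # external input weights
v = (np.arange(T) % 40 < 20).astype(float)     # modulatory input (block design)
u = rng.standard_normal(T)                     # external input

s = np.zeros((T, M))
for t in range(1, T):
    s[t] = ((A_true + v[t] * C_true) @ s[t-1]
            + d_true * u[t] + 0.05 * rng.standard_normal(M))

# regressors: [s(t-1), v(t)*s(t-1), u(t)]  ->  coefficients [A, C, d]
X = np.hstack([s[:-1], v[1:, None] * s[:-1], u[1:, None]])
Theta, *_ = np.linalg.lstsq(X, s[1:], rcond=None)
A_hat, C_hat, d_hat = Theta[:M].T, Theta[M:2 * M].T, Theta[2 * M]
print(np.round(A_hat, 2))
print(np.round(C_hat, 2))
```

In the actual M-step the point values are replaced by the smoothed moments P(t−1), P_s(t, t−1) and m_s(t), but the linear-equation structure is the same.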
Estimation of Q

Taking expectations of Eq. (A.26) with respect to p(x(t)|y(1), y(2),…,y(T)) and differentiating the resulting expression with respect to Q, the estimate of Q is given by

$$\hat{Q} = \frac{1}{T-1}\sum_{t=2}^{T}\Big(
P_s(t) - P_s(t,t-1)\,F'\,\hat{G}(t)' - m_s(t)\,u(t)\,\hat{d}'
- \hat{G}(t)\,F\,P_s(t,t-1)' + \hat{G}(t)\,F\,P(t-1)\,F'\,\hat{G}(t)'
+ \hat{G}(t)\,F\,x_{t-1}^T\,u(t)\,\hat{d}'
- \hat{d}\,u(t)\,m_s(t)' + \hat{d}\,u(t)\,(x_{t-1}^T)'\,F'\,\hat{G}(t)'
+ \hat{d}\,u(t)^2\,\hat{d}'\Big) \tag{A.32}$$

where $\hat{G}(t) = \hat{A} + \sum_{j=1}^{J} v_j(t)\hat{C}_j$ and $P_s(t)$ is the first M×M submatrix of P(t). Note that the estimates $\hat{A}$, $\hat{C}_j$ and $\hat{d}$ obtained by solving Eq. (A.27) are used in Eq. (A.32) in place of A, C_j and d.
Estimation of B

Each row vector $b_m$, m=1,2,…,M can be estimated (we assume the noise covariance matrix R to be diagonal) by maximizing the conditional expectation of the complete log-likelihood in Eq. (A.25). Taking the derivative of the conditional expectation and equating it to zero, the estimate of $b_m$ is given by

$$\hat{b}_m' = \left(\Phi \sum_{t=1}^{T} P_m(t)\,\Phi'\right)^{-1} \Phi \sum_{t=1}^{T} y_m(t)\,x_t^T(m) \tag{A.33}$$

where $P_m(t) = E\left(x_m(t)\,x_m(t)'\,|\,y(1), y(2), \ldots, y(T)\right)$, which can be read off from P(t); here $x_m(t) = [s_m(t), s_m(t-1), \ldots, s_m(t-L+1)]'$ is the embedded signal of the m-th region, $y_m(t)$ is the m-th element of y(t), and $x_t^T(m)$ denotes the smoothed mean of $x_m(t)$ extracted from $x_t^T$.
Estimation of R

The diagonal observation covariance matrix R can be estimated by maximizing the conditional expectation of the complete log-likelihood in Eq. (A.25). The diagonal elements of R are given by

$$\hat{R}(m,m) = \frac{1}{T}\sum_{t=1}^{T}\left(y_m(t)^2 - 2\,\hat{b}_m\,\Phi\,y_m(t)\,x_t^T(m) + \hat{b}_m\,\Phi\,P_m(t)\,\Phi'\,\hat{b}_m'\right), \qquad m = 1, 2, \ldots, M \tag{A.34}$$
Estimation of μ₀ and Σ₀

The maximum likelihood estimates of the initial state mean $\mu_0$ and covariance $\Sigma_0$ are given by

$$\hat{\mu}_0 = x_1^T \tag{A.35}$$

$$\hat{\Sigma}_0 = \Sigma_1^T \tag{A.36}$$
The above E and M steps are repeated until the change in log-likelihood of the data between two iterations is below a specifiedthreshold.
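The alternation between these two steps can be sketched generically; the function names, the tolerance and the iteration cap below are placeholders, not part of the algorithm's specification:

```python
def run_em(e_step, m_step, log_lik, params, tol=1e-4, max_iter=100):
    """Generic EM loop: iterate E and M steps until the log-likelihood
    improvement falls below tol (or max_iter is reached)."""
    prev = -float("inf")
    for _ in range(max_iter):
        latents = e_step(params)     # E-step: posterior over latent signals
        params = m_step(latents)     # M-step: re-estimate parameters
        ll = log_lik(params)
        if ll - prev < tol:          # convergence: likelihood has plateaued
            break
        prev = ll
    return params
```

For MDS, `e_step` would run the Kalman smoother of Eqs. (A.15)–(A.24) and `m_step` the updates of Eqs. (A.27)–(A.36).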
Appendix B
Solving MDS using VB framework
In VB, the goal is to find the posterior distributions of the latent variables $q_S(S|Y)$ and the parameters $q_\Theta(\Theta|Y)$ by maximizing the lower bound on the log evidence L(q) given in Eq. (5).
VB-E-step
In this step, the posterior distributions of the latent variables $q_S(S|Y)$ are estimated given the current posterior probability of the model parameters $q_\Theta(\Theta|Y)$. As in the MLE approach, we compute the posteriors of the embedded latent signals x(t), from which the posterior of s(t) can be obtained. The distribution over these latent variables is obtained using a sequential algorithm similar to the Kalman smoothing used in the E-step of the MLE approach. In the VB version of Kalman smoothing, the point estimates of the parameters are replaced by expectations of the type E(ZWZ′), where Z is a parameter of the model and W a matrix. Although these expectations are straightforward to compute, they are computationally expensive for higher order models. We therefore use the approximation E(AWA′) = E(A)WE(A′), which gives qualitatively similar results and is computationally efficient. This approach was also taken in Cassidy and Penny (2002). As a result, the VB-E step is the same as the E-step in the MLE approach, and the mean and covariance of x(t) are given by $x_t^T$ and $\Sigma_t^T$.
VB-M step
In this step, the posterior distributions of the model parameters $q_\Theta(\Theta|Y)$ are estimated given the current posterior probability of the latent variables $q_S(S|Y)$. Using the probabilistic graphical model in Fig. 1, one can show that the joint posterior distribution of the parameters $q_\Theta(\Theta|Y)$ further factorizes as

$$q_\Theta(\Theta|Y) = q\left(A, C_1, \ldots, C_J, D, Q\right)\,q(B, R) \tag{B.1}$$
In this work, we also assume the state and observation noise covariance matrices (Q and R) to be diagonal. Therefore, the distributions of the elements in the rows of A, C₁,…,C_J, D and B can be inferred separately. Consider the state equation for the m-th node:

$$s_m(t) = \left(a_m + \sum_{j=1}^{J} v_j(t)\,c_{j,m}\right)s(t-1) + d_m\,u(t) + w_m(t), \qquad w_m(t) \sim N\left(0, \beta_m^{-1}\right) \tag{B.2}$$

where $a_m$ and $c_{j,m}$ are the m-th rows of A and $C_j$, respectively, and $\beta_m = 1/Q(m,m)$. In terms of the embedded signal x(t), the above equation can be written as

$$s_m(t) = \theta_m'\left[F(t)\,x(t-1);\; u(t)\right] + w_m(t) \tag{B.3}$$

where $\theta_m' = \left[a_m, c_{1,m}, \ldots, c_{J,m}, d_m\right]$ and $F(t) = \left[I_M\;\; v_1(t)I_M\; \cdots\; v_J(t)I_M\right]'F$ as in Eq. (A.30). We assume the following Gaussian-Gamma conjugate priors for $\theta_m$ and $\beta_m$:

$$p\left(\theta_m, \beta_m\,|\,\alpha\right) = N\!\left(0, \left(\beta_m A_\alpha\right)^{-1}\right)\mathrm{Ga}(a_o, b_o) \tag{B.4}$$
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{2M+1}]$ are the hyperpriors on the elements of $\theta_m$ and $A_\alpha = \operatorname{diag}(\alpha)$ (written here for J = 1; in general $\theta_m$ has M(J+1)+1 elements). Let the prior on α be

$$p(\alpha) = \prod_{i=1}^{2M+1} \mathrm{Ga}(c_o, d_o) \tag{B.5}$$
Therefore, by applying Eq. (9), the joint posterior for $\theta_m$ and $\beta_m$ is given by

$$q\left(\theta_m, \beta_m\,|\,Y\right) = N\!\left(\bar{\theta}_m, \beta_m^{-1}\Sigma_m\right)\mathrm{Ga}\left(a_{m,N}, b_{m,N}\right) \tag{B.6}$$
where

$$\Sigma_m^{-1} = \begin{pmatrix}
\sum_{t=2}^{T} F(t)\,P(t-1)\,F(t)' & \sum_{t=2}^{T} F(t)\,x_{t-1}^T\,u(t) \\
\sum_{t=2}^{T} u(t)\,(x_{t-1}^T)'\,F(t)' & \sum_{t=2}^{T} u(t)^2
\end{pmatrix} + A_\alpha \tag{B.7}$$

$$\bar{\theta}_m = \Sigma_m \begin{pmatrix}
\sum_{t=2}^{T} F(t)\,E\!\left(s_m(t)\,x(t-1)\,|\,Y\right) \\
\sum_{t=2}^{T} E\!\left(s_m(t)\,|\,Y\right) u(t)
\end{pmatrix} \tag{B.8}$$

$$a_{m,N} = a_o + \frac{T + 2M}{2} \tag{B.9}$$

$$b_{m,N} = b_o + 0.5\left(\sum_{t=2}^{T} E\!\left(s_m(t)^2\right) - \bar{\theta}_m'\,\Sigma_m^{-1}\,\bar{\theta}_m\right) \tag{B.10}$$

The posterior for the hyperparameters α is given by

$$q(\alpha\,|\,Y) = \prod_{i=1}^{2M+1} \mathrm{Ga}\left(c_N, d_{Ni}\right) \tag{B.11}$$

where

$$c_N = c_o + \frac{1}{2} \tag{B.12}$$

$$d_{Ni} = d_o + \frac{1}{2}\left(\bar{\theta}_m(i)^2\,\frac{a_{m,N}}{b_{m,N}} + \Sigma_m(i,i)\right) \tag{B.13}$$
The posteriors for $\theta_m$, $\beta_m$ and α are estimated for each m = 1,2,…,M, from which the posteriors for A, C₁,…,C_J, D and Q are computed.

Similarly, the posterior distributions of the model parameters in the observation equation are computed. Since R is assumed to be diagonal, the observation equation for the m-th node is given by
$$y_m(t) = b_m\,\Phi\,x_m(t) + e_m(t), \qquad e_m(t) \sim N\left(0, \lambda_m^{-1}\right) \tag{B.14}$$

where $\lambda_m = 1/R(m,m)$. Again assuming Gaussian-Gamma conjugate priors for $b_m$ and $\lambda_m$,

$$p\left(b_m, \lambda_m\,|\,\alpha\right) = N\!\left(0, \left(\lambda_m A_\alpha\right)^{-1}\right)\mathrm{Ga}(a_o, b_o) \tag{B.15}$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_P]$ are the hyperpriors on the elements of $b_m$ and $A_\alpha = \operatorname{diag}(\alpha)$. Let the prior on α be

$$p(\alpha) = \prod_{i=1}^{P} \mathrm{Ga}(c_o, d_o) \tag{B.16}$$
By applying Eq. (9), the joint posterior for $b_m$ and $\lambda_m$ is given by

$$q\left(b_m, \lambda_m\,|\,Y\right) = N\!\left(\bar{b}_m, \lambda_m^{-1}V_m\right)\mathrm{Ga}\left(a_{m,N}, b_{m,N}\right) \tag{B.17}$$

$$V_m^{-1} = \Phi \sum_{t=1}^{T} P_m(t)\,\Phi' + A_\alpha \tag{B.18}$$

$$\bar{b}_m = V_m\,\Phi \sum_{t=1}^{T} y_m(t)\,x_t^T(m) \tag{B.19}$$

$$a_{m,N} = a_o + \frac{T + P - 1}{2} \tag{B.20}$$

$$b_{m,N} = b_o + 0.5\left(\sum_{t=1}^{T} y_m(t)^2 - \bar{b}_m'\,V_m^{-1}\,\bar{b}_m\right) \tag{B.21}$$

$$c_N = c_o + \frac{1}{2} \tag{B.22}$$

$$d_{Ni} = d_o + \frac{1}{2}\left(\bar{b}_m(i)^2\,\frac{a_{m,N}}{b_{m,N}} + V_m(i,i)\right) \tag{B.23}$$

$$A_\alpha(i,i) = \frac{c_N}{d_{Ni}} \tag{B.24}$$
The posteriors for $b_m$, $\lambda_m$ and α are estimated for each m = 1,2,…,M. In this work, we set the hyperparameters $a_o$, $b_o$, $c_o$ and $d_o$ to 10⁻³.
The VB-E and VB-M steps are repeated until convergence.
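Eqs. (B.17)–(B.24) have the structure of a Bayesian linear regression with an automatic relevance determination (ARD) prior, as in sparse Bayesian learning (Tipping, 2001): each coefficient has its own precision α_i, and irrelevant coefficients are shrunk toward zero. A minimal sketch of these coupled updates for a single node's regression (the design matrix, true coefficients and noise level below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
P, T = 5, 200
X = rng.standard_normal((T, P))                  # stands in for the (Phi x_m(t))' rows
w_true = np.array([2.0, 0.0, 0.0, -1.0, 0.0])    # sparse true coefficients
y = X @ w_true + 0.1 * rng.standard_normal(T)

ao = bo = co = do = 1e-3                         # broad Gamma hyperpriors, as in the text
Aalpha = np.ones(P)                              # ARD precisions, one per coefficient
for _ in range(50):
    # q(b_m, lambda_m | Y): Eqs. (B.17)-(B.21)
    V = np.linalg.inv(X.T @ X + np.diag(Aalpha))
    b = V @ X.T @ y
    aN = ao + (T + P - 1) / 2
    bN = bo + 0.5 * (y @ y - b @ np.linalg.inv(V) @ b)
    # q(alpha | Y): Eqs. (B.22)-(B.24)
    cN = co + 0.5
    dN = do + 0.5 * (b ** 2 * aN / bN + np.diag(V))
    Aalpha = cN / dN

print(np.round(b, 2))   # small (irrelevant) coefficients are shrunk toward zero
```

The precisions α_i of coefficients unsupported by the data grow across iterations, which is what suppresses spurious connections in MDS-VB.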
Appendix C
Initialization
The above iterative procedure (E and M steps) needs to be initialized. In this work, we obtain initial values of the latent signals s(t) by estimating them at each node using the Wiener deconvolution method of Glover (1999), wherein the canonical HRF is used for the deconvolution step. We then estimate the initial values of A, C_j, d and Q by solving Eq. (1) by least squares, assuming that the s(t)'s are the true values. Similarly, the parameters B and R are estimated from Eq. (3) by a least squares approach. The EM algorithm is then run from these initial values until the required convergence is obtained.
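The deconvolution-based initialization can be sketched with a frequency-domain Wiener filter. The kernel and the fixed noise-to-signal regularizer below are illustrative assumptions; Glover (1999) derives the filter from estimated noise spectra:

```python
import numpy as np

def wiener_deconvolve(y, h, nsr=0.01):
    """Estimate latent s from y = h * s + noise via a Wiener filter.
    nsr is an assumed noise-to-signal power ratio (regularizer)."""
    n = len(y)
    H = np.fft.rfft(h, n)                       # kernel spectrum, zero-padded
    Y = np.fft.rfft(y, n)
    S = Y * np.conj(H) / (np.abs(H) ** 2 + nsr) # regularized spectral inversion
    return np.fft.irfft(S, n)

# toy check: convolve a random signal with a smooth kernel, then recover it
rng = np.random.default_rng(4)
T = 256
s = rng.standard_normal(T)
h = np.exp(-np.arange(8) / 2.0)                 # stands in for the canonical HRF
y = np.convolve(s, h)[:T] + 0.01 * rng.standard_normal(T)
s_hat = wiener_deconvolve(y, h)
print(round(np.corrcoef(s, s_hat)[0, 1], 2))
```

These rough per-node estimates only need to be good enough to start the EM iterations; the E and M steps subsequently refine them.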
Appendix D. Supplementary data
Supplementary data to this article can be found online at doi:10.1016/j.neuroimage.2010.09.052.
References
Abler, B., Roebroeck, A., Goebel, R., Hose, A., Schonfeldt-Lecuona, C., Hole, G., Walter, H.,2006. Investigating directed influences between activated brain areas in a motor-response task using fMRI. Magn. Reson. Imaging 24, 181–185.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer.
Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 1994. Time Series Analysis: Forecasting and Control. Pearson Education.
Bressler, S.L., Menon, V., 2010. Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci.
Bressler, S.L., Seth, A.K., 2010. Wiener–Granger causality: a well established methodology. Neuroimage.
Cassidy, M.J., Penny, W.D., 2002. Bayesian nonstationary autoregressive models for biomedical signal analysis. IEEE Trans. Biomed. Eng. 49, 1142–1152.
Daunizeau, J., Friston, K.J., Kiebel, S.J., 2009. Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Physica D 238, 2089–2118.
Deshpande, G., Hu, X., Stilla, R., Sathian, K., 2008. Effective connectivity during haptic perception: a study using Granger causality analysis of functional magnetic resonance imaging data. Neuroimage 40, 1807–1814.
Deshpande, G., Sathian, K., Hu, X., 2009. Effect of hemodynamic variability on Granger causality analysis of fMRI. Neuroimage.
Friston, K., 2009a. Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS Biol. 7, e33.
Friston, K., 2009b. Dynamic causal modeling and Granger causality comments on: The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. Neuroimage.
Friston, K.J., 2009c. Modalities, modes, and models in functional neuroimaging. Science 326, 399–403.
Friston, K.J., Harrison, L., Penny, W., 2003. Dynamic causal modelling. Neuroimage 19, 1273–1302.
Fuster, J.M., 2006. The cognit: a network model of cortical representation. Int. J. Psychophysiol. 60, 125–132.
Ge, T., Kendrick, K.M., Feng, J., 2009. A novel extended Granger Causal Model approach demonstrates brain hemispheric differences during face recognition learning. PLoS Comput. Biol. 5, e1000570.
Glover, G.H., 1999. Deconvolution of impulse response in event-related BOLD fMRI. Neuroimage 9, 416–429.
Goebel, R., Roebroeck, A., Kim, D.S., Formisano, E., 2003. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magn. Reson. Imaging 21, 1251–1261.
Guo, S., Wu, J., Ding, M., Feng, J., 2008. Uncovering interactions in the frequency domain. PLoS Comput. Biol. 4, e1000087.
Havlicek, M., Jan, J., Brazdil, M., Calhoun, V.D., 2010. Dynamic Granger causality based on Kalman filter for evaluation of functional network connectivity in fMRI data. Neuroimage.
Hemmelmann, D., Ungureanu, M., Hesse, W., Wustenberg, T., Reichenbach, J.R., Witte, O.W., Witte, H., Leistritz, L., 2009. Modelling and analysis of time-variant directed interrelations between brain regions based on BOLD-signals. Neuroimage.
Hesse, W., Moller, E., Arnold, M., Schack, B., 2003. The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. J. Neurosci. Methods 124, 27–44.
Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
Makni, S., Beckmann, C., Smith, S., Woolrich, M., 2008. Bayesian deconvolution of [corrected] fMRI data using bilinear dynamical systems. Neuroimage 42, 1381–1396.
Mechelli, A., Price, C.J., Noppeney, U., Friston, K.J., 2003. A dynamic causal modeling study on category effects: bottom-up or top-down mediation? J. Cogn. Neurosci. 15, 925–934.
Murphy, K.P., 1998. Switching Kalman Filters. Technical report, DEC/Compaq Cambridge Research Labs.
Passingham, R.E., Stephan, K.E., Kotter, R., 2002. The anatomical basis of functional localization in the cortex. Nat. Rev. Neurosci. 3, 606–616.
Penny, W., Ghahramani, Z., Friston, K., 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 983–993.
Prichard, D., Theiler, J., 1994. Generating surrogate data for time series with several simultaneously measured variables. Phys. Rev. Lett. 73, 951–954.
Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–285.
Rajapakse, J.C., Zhou, J., 2007. Learning effective brain connectivity with dynamic Bayesian networks. Neuroimage 37, 749–760.
Ramsey, J.D., Hanson, S.J., Hanson, C., Halchenko, Y.O., Poldrack, R.A., Glymour, C., 2009. Six problems for causal inference from fMRI. Neuroimage 49, 1545–1558.
Roebroeck, A., Formisano, E., Goebel, R., 2005. Mapping directed influence over the brain using Granger causality and fMRI. Neuroimage 25, 230–242.
Roebroeck, A., Formisano, E., Goebel, R., 2009. The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. Neuroimage.
Sato, J.R., Junior, E.A., Takahashi, D.Y., de Maria Felix, M., Brammer, M.J., Morettin, P.A., 2006. A method to produce evolving functional connectivity maps during the course of an fMRI experiment using wavelet-based time-varying Granger causality. Neuroimage 31, 187–196.
Seth, A.K., 2005. Causal connectivity of evolved neural networks during behavior. Network 16, 35–54.
Seth, A.K., 2010. A MATLAB toolbox for Granger causal connectivity analysis. J. Neurosci. Methods 186, 262–273.
Smith, J.F., Pillai, A., Chen, K., Horwitz, B., 2009. Identification and validation of effective connectivity networks in functional magnetic resonance imaging using switching linear dynamic systems. Neuroimage.
Sridharan, D., Levitin, D.J., Menon, V., 2008. A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proc. Natl. Acad. Sci. U. S. A. 105, 12569–12574.
Tipping, M., 2001. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244.
Valdes-Sosa, P.A., Sanchez-Bornot, J.M., Lage-Castellanos, A., Vega-Hernandez, M., Bosch-Bayard, J., Melie-Garcia, L., Canales-Rodriguez, E., 2005. Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 969–981.