
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 67, NO. 20, OCTOBER 15, 2019

Nonlinear Structural Vector Autoregressive Models With Application to Directed Brain Networks

Yanning Shen, Member, IEEE, Georgios B. Giannakis, Fellow, IEEE, and Brian Baingana, Member, IEEE

Abstract—Structural equation models (SEMs) and vector autoregressive models (VARMs) are two broad families of approaches that have been shown useful in effective brain connectivity studies. While VARMs postulate that a given region of interest in the brain is directionally connected to another one by virtue of time-lagged influences, SEMs assert that directed dependencies arise due to instantaneous effects, and may even be adopted when nodal measurements are not necessarily multivariate time series. To unify these complementary perspectives, linear structural vector autoregressive models (SVARMs) that leverage both instantaneous and time-lagged nodal data have recently been put forth. Albeit simple and tractable, linear SVARMs are quite limited since they are incapable of modeling nonlinear dependencies between neuronal time series. To this end, the overarching goal of the present paper is to considerably broaden the span of linear SVARMs by capturing nonlinearities through kernels, which have recently emerged as a powerful nonlinear modeling framework in canonical machine learning tasks, e.g., regression, classification, and dimensionality reduction. The merits of kernel-based methods are extended here to the task of learning the effective brain connectivity, and an efficient regularized estimator is put forth to leverage the edge sparsity inherent to real-world complex networks. Judicious kernel choice from a preselected dictionary of kernels is also addressed using a data-driven approach. Numerical tests on ECoG data captured through a study on epileptic seizures demonstrate that it is possible to unveil previously unknown directed links between brain regions of interest.

Index Terms—Network topology inference, structural vector autoregressive models, nonlinear models.

Manuscript received September 18, 2018; revised April 8, 2019 and August 28, 2019; accepted August 29, 2019. Date of publication September 11, 2019; date of current version September 20, 2019. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Vincent Y. F. Tan. This work was supported in part by the National Science Foundation (NSF) under Grants 1901134, 1711471, and 1500713, and in part by the National Institutes of Health (NIH) under Grant 1R01GM104975-01. The work of Y. Shen was also supported by the UMN Doctoral Dissertation Fellowship. (Corresponding author: Georgios B. Giannakis.)

Y. Shen was with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA. She is now with the Department of Electrical Engineering and Computer Science and the Center for Pervasive Communications and Computing, University of California, Irvine, CA 92697 USA (e-mail: [email protected]).

G. B. Giannakis is with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]).

B. Baingana was with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA. He is now with NextEra Analytics, St. Paul, MN 55107 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSP.2019.2940122

I. INTRODUCTION

SEVERAL contemporary studies in the neurosciences have converged on the well-accepted view that information processing capabilities of the brain are facilitated by the existence of a complex underlying network; see e.g., [39] for a comprehensive review. The general hope is that understanding the behavior of the brain through the lens of network science will reveal important insights, with an enduring impact on applications in both clinical and cognitive neuroscience.

However, brain networks are not directly observable, and must be inferred from processes observed or measured at nodes. To this end, functional magnetic resonance imaging (fMRI) has emerged as a powerful tool, capable of revealing varying blood oxygenation patterns modulated by brain activity [37]. Other related brain imaging modalities include positron emission tomography (PET), electroencephalography (EEG), and electrocorticography (ECoG), to name just a few. Most state-of-the-art tools for inference of brain connectivity leverage variants of causal and correlational analysis methods, applied to time series obtained from the imaging modalities [7], [12], [13], [17], [18].

Contemporary brain connectivity analyses fall under two broad categories, namely, functional connectivity, which pertains to discovery of non-directional pairwise correlations between regions of interest (ROIs), and effective connectivity, which instead focuses on inference of directional dependencies between them [14]. Granger causality [38], vector autoregressive models (VARMs) [18], structural equation models (SEMs) [34], [36], and dynamic causal modeling (DCM) [15] constitute widely used approaches for effective connectivity studies. VARMs postulate that connected ROIs exert time-lagged dependencies among one another, while SEMs assume instantaneous directed interactions among them. Interestingly, these points of view are unified through the so-termed structural vector autoregressive model (SVARM) [9], which postulates that the spatio-temporal behavior observed in brain imaging data results from both instantaneous and time-lagged interactions between ROIs. It has been shown that SVARMs lead to markedly more flexibility and explanatory power than VARMs and SEMs treated separately, at the expense of increased model complexity [9].

The fundamental appeal of the aforementioned effective connectivity approaches stems from their inherent simplicity, since they adopt linear models. However, this is an oversimplification that is highly motivated by the need for tractability, even though consideration of nonlinear models for directed dependence may lead to more accurate approaches for inference of brain connectivity. In fact, recognizing the limitations associated with linear models, several variants of nonlinear SEMs have been put forth in a number of recent works; see e.g., [16], [19], [23], [24], [26], [29], [51].

For example, [29] and [30] advocate SEMs in which nonlinear dependencies only appear among the so-termed exogenous variables. Furthermore, [23] puts forth a hierarchical Bayesian nonlinear modeling approach in which unknown random parameters capture the strength and directions of directed links among variables. Several other studies adopt polynomial SEMs, which offer an immediate extension to classical linear SEMs; see e.g., [19], [24], [26], [42], [51]. In all these contemporary approaches, it is assumed that the network connectivity structure is known a priori, and the developed algorithms only estimate the unknown edge weights. The Bayesian network method proposed in [22] likewise relies on probabilistic assumptions tied to prior information about the network structure. However, this is a rather major limitation, since such prior information may not be available in practice, especially when dealing with potentially massive networks, e.g., the brain.

Similarly, several variants of nonlinear VARMs have been shown useful in unveiling links that often remain undiscovered by traditional linear models; see e.g., [31]-[33], [47]. More recently, [31] proposed a kernel-based VARM, with nonlinear dependencies among nodes encoded by unknown functions belonging to a reproducing kernel Hilbert space.

Building upon these prior works, the present paper puts forth a novel additive nonlinear VARM to capture dependencies between observed ROI-based time series, without explicit knowledge of the edge structure. Similar to [43], [44], kernels are advocated as an encompassing framework for nonlinear learning tasks. Note that SVARMs admit an interesting interpretation as SEMs, with instantaneous terms viewed as endogenous variables, and time-lagged terms as exogenous variables. Since numerical measurement of external brain stimuli is often impractical, or extremely challenging in conventional experiments, adoption of such a fully-fledged SEM (with both endogenous and exogenous inputs) is often impossible with traditional imaging modalities.

A key feature of the novel approach is the premise that edges in the unknown network are sparse, that is, each ROI is linked to only a small subset of all potential ROIs that would constitute a maximally-connected power graph. This sparse edge connectivity has recently motivated the development of efficient regularized estimators, promoting the inference of sparse network adjacency matrices; see e.g., [1], [3], [21], [31], [48], [49] and references therein. Based on these prior works, this paper develops a sparse-regularized kernel-based nonlinear SVARM to estimate the effective brain connectivity from per-ROI time series. Compared with [31], the novel approach incorporates instantaneous variables, turns out to be more computationally efficient, and facilitates a data-driven approach for kernel selection.

The rest of this paper is organized as follows. Section II introduces the conventional SVARM, while Section III puts forth its novel nonlinear variant. Section IV advocates a sparsity-promoting regularized least-squares estimator for topology inference from the nonlinear SVARM, while Section V deals with an approach to learn the kernel that 'best' matches the data. Results of extensive numerical tests based on ECoG data from an epilepsy study are presented in Section VI, and pertinent comparisons with linear variants demonstrate the efficacy of the novel approach. Finally, Section VII concludes the paper, and highlights several potential future research directions opened up by this work.

Notation: Bold uppercase (lowercase) letters will denote matrices (column vectors), while operators $(\cdot)^\top$ and $\mathrm{diag}(\cdot)$ will stand for matrix transposition and diagonal matrices, respectively. The identity matrix will be represented by $\mathbf{I}$, while $\mathbf{0}$ will denote the all-zero matrix, and their dimensions will be clear from the context. Finally, $\ell_p$ and Frobenius norms will be denoted by $\|\cdot\|_p$ and $\|\cdot\|_F$, respectively.

II. PRELIMINARIES ON LINEAR SVARMS

Consider a directed network whose topology is unknown, comprising $N$ nodes, each associated with an observable time series $\{y_{it}\}_{t=1}^{T}$ measured over $T$ time slots, for $i = 1, \ldots, N$. Note that $y_{it}$ denotes the $t$-th sample of the time series measured at node $i$. In the context of the brain, each node could represent a ROI, while the per-ROI time series are obtainable from standard imaging modalities, e.g., EEG or fMRI time courses. The network topology or edge structure will be captured by the weighted graph adjacency matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$, whose $(i,j)$-th entry $a_{ij}$ is nonzero only if a directed effect exists from region $i$ to region $j$.

In order to unveil the hidden directed network topology, traditional linear SVARMs postulate that each $y_{jt}$ can be represented as a linear combination of instantaneous measurements at other nodes $\{y_{it}\}_{i \neq j}$, and their time-lagged versions $\{\{y_{i(t-\ell)}\}_{i=1}^{N}\}_{\ell=1}^{L}$ [9]. Specifically, $y_{jt}$ admits the following linear instantaneous plus time-lagged model

$$ y_{jt} = \sum_{i \neq j} a_{ji}^{0}\, y_{it} + \sum_{i=1}^{N} \sum_{\ell=1}^{L} a_{ji}^{\ell}\, y_{i(t-\ell)} + e_{jt} \tag{1} $$

with $a_{ij}^{\ell}$ capturing the directed influence of region $i$ upon region $j$ over a lag of $\ell$ time points, while $a_{ij}^{0}$ encodes the corresponding instantaneous directed relationship between them. The coefficients encode the directed structure of the network, that is, a directed link exists from node $i$ to node $j$, i.e., $a_{ij} \neq 0$, only if $a_{ij}^{0} \neq 0$, or if there exists $a_{ij}^{\ell} \neq 0$ for some $\ell = 1, \ldots, L$. If $a_{ij}^{0} = 0 \;\forall i, j$, then (1) reduces to classical Granger causality [38]. Similarly, setting $a_{ij}^{\ell} = 0 \;\forall i, j, \ell \neq 0$ reduces (1) to a linear SEM with no exogenous inputs [25]. Defining $\mathbf{y}_t := [y_{1t}, \ldots, y_{Nt}]^\top$, $\mathbf{e}_t := [e_{1t}, \ldots, e_{Nt}]^\top$, and the time-lagged adjacency matrices $\mathbf{A}_\ell \in \mathbb{R}^{N \times N}$ with $(i,j)$-th entry $[\mathbf{A}_\ell]_{ij} := a_{ij}^{\ell}$, one can write (1) in vector form as

$$ \mathbf{y}_t = \mathbf{A}_0 \mathbf{y}_t + \sum_{\ell=1}^{L} \mathbf{A}_\ell \mathbf{y}_{t-\ell} + \mathbf{e}_t \tag{2} $$

where $\mathbf{A}_0$ has zero diagonal entries $a_{ii}^{0} = 0$ for $i = 1, \ldots, N$.
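To make the data-generating mechanism in (2) concrete, the sketch below simulates a linear SVARM by solving the instantaneous equation for $\mathbf{y}_t$ at every step, i.e., $\mathbf{y}_t = (\mathbf{I} - \mathbf{A}_0)^{-1}(\sum_{\ell} \mathbf{A}_\ell \mathbf{y}_{t-\ell} + \mathbf{e}_t)$. This is an illustrative reading of the model, not the authors' code; the sizes, sparsity level, and noise scale are arbitrary choices, and invertibility of $\mathbf{I} - \mathbf{A}_0$ is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, T = 5, 2, 200                       # nodes, lag order, samples (arbitrary)

# Sparse coefficient matrices; small entries keep I - A0 invertible
# and the recursion stable in this toy example.
A = [0.2 * rng.standard_normal((N, N)) * (rng.random((N, N)) < 0.3)
     for _ in range(L + 1)]
np.fill_diagonal(A[0], 0.0)               # a_ii^0 = 0, cf. below (2)

y = np.zeros((T, N))
y[:L] = rng.standard_normal((L, N))       # arbitrary initial condition
for t in range(L, T):
    e_t = 0.1 * rng.standard_normal(N)
    lagged = sum(A[l] @ y[t - l] for l in range(1, L + 1))
    # Solve (I - A0) y_t = sum_l A_l y_{t-l} + e_t   [cf. (2)]
    y[t] = np.linalg.solve(np.eye(N) - A[0], lagged + e_t)
```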


Given the multivariate time series $\{\mathbf{y}_t\}_{t=1}^{T}$, the goal is to estimate the matrices $\{\mathbf{A}_\ell\}_{\ell=0}^{L}$, and consequently unveil the hidden network topology. Admittedly, overfitting is a potential risk since $L$ is assumed prescribed. Nevertheless, this can be mitigated via standard order selection methods that control model complexity, e.g., the Bayesian information criterion [10], or Akaike's information criterion [6].

Knowing which entries of $\mathbf{A}_0$ are nonzero, several approaches have been put forth to estimate their values. Examples are based upon ordinary least-squares [9], and hypothesis tests developed to detect the presence or absence of pairwise directed links under prescribed false-alarm rates [38]. Albeit conceptually simple and computationally tractable, the linear SVARM is incapable of capturing nonlinear dependencies inherent to complex networks such as the human brain. To this end, the present paper generalizes the linear SVARM in (1) to a nonlinear kernel-based SVARM.

It is also worth noting that most real-world networks (including the brain) exhibit edge sparsity, the tendency of each node to link with only a few other nodes compared to the maximal $\mathcal{O}(N)$ set of potential connections per node. This means that per $j$, only a few coefficients $\{a_{ij}^{\ell}\}$ are nonzero. In fact, several recent approaches exploiting edge sparsity have been advocated, leading to more efficient topology estimation; see e.g., [1], [3], [31].

III. FROM LINEAR TO NONLINEAR SVARMS

To enhance flexibility and accuracy, this section generalizes (1) so that nonlinear directed dependencies can be captured. The most general nonlinear model with both instantaneous (spatial) and time-lagged (temporal) dependencies can be written in multivariate form as $\mathbf{y}_t = \mathbf{f}(\mathbf{y}_t, \{\mathbf{y}_{t-\ell}\}_{\ell=1}^{L}) + \mathbf{e}_t$, or, entry-wise as

$$ y_{jt} = f_j(\mathbf{y}_{-jt}, \{\mathbf{y}_{t-\ell}\}_{\ell=1}^{L}) + e_{jt}, \quad j = 1, \ldots, N \tag{3} $$

where $\mathbf{y}_{-jt} := [y_{1t}, \ldots, y_{(j-1)t}, y_{(j+1)t}, \ldots, y_{Nt}]^\top$ collects all but the $j$-th nodal observation at time $t$, $\mathbf{y}_{t-\ell} := [y_{1(t-\ell)}, \ldots, y_{N(t-\ell)}]^\top$, and $f_j(\cdot)$ denotes a nonlinear function of its multivariate argument. With limited ($NT$) data available, $f_j$ in (3) entails $(L+1)N - 1$ variables. This fact motivates simpler model functions to cope with the emerging 'curse of dimensionality' in estimating $\{f_j\}_{j=1}^{N}$. A simplified form of (3) has been studied in [31] with $L = 1$, and without the instantaneous influences $\mathbf{y}_{-jt}$, which have been shown to be of importance in applications such as brain connectivity [34] and gene regulatory networks [8]. Such a model is simplified compared with (3) because the number of variables of $f_j$ reduces to $N$. Nevertheless, estimating such an $N$-variate functional model still suffers from the curse of dimensionality, especially when the size of typical networks scales up.

To circumvent this challenge, we further posit that the multivariate function in (3) is separable with respect to each of its $(L+1)N - 1$ variables. Such a simplification of (3) amounts to adopting a generalized additive model (GAM) [20, Ch. 9]. In the present context, the GAM adopted is $f_j(\mathbf{y}_{-jt}, \{\mathbf{y}_{t-\ell}\}_{\ell=1}^{L}) = \sum_{i \neq j} \bar{f}_{ij}^{0}(y_{it}) + \sum_{i=1}^{N} \sum_{\ell=1}^{L} \bar{f}_{ij}^{\ell}(y_{i(t-\ell)})$, where the nonlinear functions $\{\bar{f}_{ij}^{\ell}\}$ will be specified in the next section. Defining $\bar{f}_{ij}^{\ell}(y) := a_{ij}^{\ell} f_{ij}^{\ell}(y)$, the node $j$ observation at time $t$ is a result of both instantaneous and multi-lag effects; that is [cf. (1)]

$$ y_{jt} = \sum_{i \neq j} a_{ij}^{0} f_{ij}^{0}(y_{it}) + \sum_{i=1}^{N} \sum_{\ell=1}^{L} a_{ij}^{\ell} f_{ij}^{\ell}(y_{i(t-\ell)}) + e_{jt} \tag{4} $$

where, similar to (1), the $\{a_{ij}^{\ell}\}$ define the matrices $\{\mathbf{A}_\ell\}_{\ell=0}^{L}$. In order to avoid scaling ambiguity, we set $a_{ij}^{\ell} \in \{0, 1\}$ as binary variables indicating the presence of an edge, meaning a directed edge from node $i$ to node $j$ exists if the corresponding $a_{ij}^{\ell} \neq 0$ for any $\ell = 0, 1, \ldots, L$. Instead of having to estimate an $[(L+1)N - 1]$-variate function as in (3), or an $N$-variate function as in [31], (4) requires estimating $(L+1)N - 1$ univariate functions. Note that conventional linear SVARMs in (1) assume that the functions $\{f_{ij}^{\ell}\}$ in (4) are linear, a limitation that the ensuing Section IV will address by resorting to a reproducing kernel Hilbert space (RKHS) formulation to model $\{f_{ij}^{\ell}\}$.
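The additive structure of (4) is easy to state in code. The following sketch is purely illustrative (the link functions, binary activations, and sizes are all made up); it evaluates the noiseless part of $y_{jt}$ from instantaneous and one-lag terms.

```python
import numpy as np

rng = np.random.default_rng(9)
N, L = 3, 1

# Univariate link functions f_ij^l and binary activations a_ij^l
# (placeholders for the unknowns to be learned in Section IV).
f = {(i, j, l): (lambda c: (lambda y: np.tanh(c * y)))(rng.standard_normal())
     for i in range(N) for j in range(N) for l in range(L + 1)}
a = {(i, j, l): int(rng.random() < 0.5)
     for i in range(N) for j in range(N) for l in range(L + 1)}

def predict_yjt(j, y_now, y_lagged):
    """Additive model (4): instantaneous terms over i != j plus lagged terms."""
    inst = sum(a[i, j, 0] * f[i, j, 0](y_now[i]) for i in range(N) if i != j)
    lagd = sum(a[i, j, l] * f[i, j, l](y_lagged[l - 1][i])
               for i in range(N) for l in range(1, L + 1))
    return inst + lagd

y_now, y_prev = rng.standard_normal(N), [rng.standard_normal(N)]
print(predict_yjt(0, y_now, y_prev))
```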

Problem statement: Given $\{\mathbf{y}_t \in \mathbb{R}^N\}_{t=1}^{T}$, the goal now becomes to find the nonlinear functions $\{f_{ij}^{\ell}\}$, as well as the adjacency matrices $\{\mathbf{A}_\ell\}_{\ell=0}^{L}$ in (4).

IV. KERNEL-BASED SPARSE SVARMS

Suppose that each univariate function $f_{ij}^{\ell}(\cdot)$ in (4) belongs to the RKHS

$$ \mathcal{H}_i^{\ell} := \left\{ f_{ij}^{\ell} \,\middle|\, f_{ij}^{\ell}(y) = \sum_{t=1}^{\infty} \beta_{ijt}^{\ell}\, \kappa_i^{\ell}\big(y, y_{i(t-\ell)}\big) \right\} \tag{5} $$

where $\kappa_i^{\ell}(y, \psi): \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is a preselected basis (so-termed kernel) function that measures the similarity between $y$ and $\psi$. Different choices of $\kappa_i^{\ell}$ specify their own basis expansion spaces, and linear functions can be regarded as a special case associated with the linear kernel $\kappa_i^{\ell}(y, \psi) = y\psi$. An alternative popular kernel is the Gaussian one, given by $\kappa_i^{\ell}(y, \psi) := \exp[-(y - \psi)^2 / (2\sigma^2)]$. Defining the inner product as $\langle \kappa_i^{\ell}(y, \psi_1), \kappa_i^{\ell}(y, \psi_2) \rangle := \sum_{\tau} \kappa_i^{\ell}(y_\tau, \psi_1)\, \kappa_i^{\ell}(y_\tau, \psi_2)$, a kernel is reproducing if it satisfies $\langle \kappa_i^{\ell}(y, \psi_1), \kappa_i^{\ell}(y, \psi_2) \rangle = \kappa_i^{\ell}(\psi_1, \psi_2)$, which induces the RKHS norm $\|f_{ij}^{\ell}\|_{\mathcal{H}_i^{\ell}}^{2} = \sum_{\tau}\sum_{\tau'} \beta_{ij\tau}^{\ell}\, \beta_{ij\tau'}^{\ell}\, \kappa_i^{\ell}(y_{i\tau}, y_{i\tau'})$ [50].

Considering the measurements per node $j$, with functions $f_{ij}^{\ell} \in \mathcal{H}_i^{\ell}$, for $i = 1, \ldots, N$ and $\ell = 0, 1, \ldots, L$, the present paper advocates the following regularized least-squares (LS) estimates of the aforementioned functions, obtained as

$$ \{\hat{f}_{ij}^{\ell}\} = \arg\min_{\{f_{ij}^{\ell} \in \mathcal{H}_i^{\ell}\}} \frac{1}{2} \sum_{t=1}^{T} \Big[ y_{jt} - \sum_{i \neq j} a_{ij}^{0} f_{ij}^{0}(y_{it}) - \sum_{i=1}^{N} \sum_{\ell=1}^{L} a_{ij}^{\ell} f_{ij}^{\ell}(y_{i(t-\ell)}) \Big]^2 + \lambda \sum_{i=1}^{N} \sum_{\ell=0}^{L} \Omega\big( \| a_{ij}^{\ell} f_{ij}^{\ell} \|_{\mathcal{H}_i^{\ell}} \big) \tag{6} $$

where $\Omega(\cdot)$ denotes a regularizing function, which will be specified later. An important result that will be used in the following is the representer theorem [20, p. 169], according to which the optimal solution for each $f_{ij}^{\ell}$ in (6) is given by

$$ \hat{f}_{ij}^{\ell}(y) = \sum_{t=1}^{T} \beta_{ijt}^{\ell}\, \kappa_i^{\ell}\big(y, y_{i(t-\ell)}\big). \tag{7} $$

Although the function spaces in (5) include infinite basis expansions, since the given data are finite, namely $T$ per node, the optimal solution in (7) entails a finite basis expansion. Substituting (7) into (6), and letting $\boldsymbol{\beta}_{ij}^{\ell} := [\beta_{ij1}^{\ell}, \ldots, \beta_{ijT}^{\ell}]^\top$ and $\boldsymbol{\alpha}_{ij}^{\ell} := a_{ij}^{\ell} \boldsymbol{\beta}_{ij}^{\ell}$, the functional minimization in (6) boils down to optimizing over vectors $\{\boldsymbol{\alpha}_{ij}^{\ell}\}$. Specifically, (6) can be equivalently written for each $j$ in vector form as

$$ \{\hat{\boldsymbol{\alpha}}_{ij}^{\ell}\} = \arg\min_{\{\boldsymbol{\alpha}_{ij}^{\ell}\}} \frac{1}{2} \bigg\| \mathbf{y}_j - \sum_{i \neq j} \mathbf{K}_i^{0} \boldsymbol{\alpha}_{ij}^{0} - \sum_{i=1}^{N} \sum_{\ell=1}^{L} \mathbf{K}_i^{\ell} \boldsymbol{\alpha}_{ij}^{\ell} \bigg\|_2^2 + \lambda \sum_{i=1}^{N} \sum_{\ell=0}^{L} \Omega\left( \sqrt{ (\boldsymbol{\alpha}_{ij}^{\ell})^\top \mathbf{K}_i^{\ell} \boldsymbol{\alpha}_{ij}^{\ell} } \right) \tag{8} $$

where $\mathbf{y}_j := [y_{j1}, \ldots, y_{jT}]^\top$, and the $T \times T$ matrices $\{\mathbf{K}_i^{\ell}\}$ have entries $[\mathbf{K}_i^{\ell}]_{t,\tau} = \kappa_i^{\ell}(y_{it}, y_{i(\tau-\ell)})$. Furthermore, collecting all the observations at different nodes in $\mathbf{Y} := [\mathbf{y}_1, \ldots, \mathbf{y}_N] \in \mathbb{R}^{T \times N}$ and letting $\mathbf{K}^{\ell} := [\mathbf{K}_1^{\ell} \ldots \mathbf{K}_N^{\ell}]$, (8) can be written as

$$ \{\hat{\boldsymbol{\alpha}}_{ij}^{\ell}\} = \arg\min_{\alpha_{ii}^{0}=0,\, \{\boldsymbol{\alpha}_{ij}^{\ell}\}} \frac{1}{2} \bigg\| \mathbf{Y} - \sum_{\ell=0}^{L} \mathbf{K}^{\ell} \mathbf{W}_{\alpha}^{\ell} \bigg\|_F^2 + \lambda \sum_{j=1}^{N} \sum_{i=1}^{N} \sum_{\ell=0}^{L} \Omega\left( \sqrt{ (\boldsymbol{\alpha}_{ij}^{\ell})^\top \mathbf{K}_i^{\ell} \boldsymbol{\alpha}_{ij}^{\ell} } \right) \tag{9} $$

where the $NT \times N$ block matrix

$$ \mathbf{W}_{\alpha}^{\ell} := \begin{bmatrix} \boldsymbol{\alpha}_{11}^{\ell} & \cdots & \boldsymbol{\alpha}_{1N}^{\ell} \\ \vdots & \ddots & \vdots \\ \boldsymbol{\alpha}_{N1}^{\ell} & \cdots & \boldsymbol{\alpha}_{NN}^{\ell} \end{bmatrix} \tag{10} $$

exhibits a structure 'modulated' by the entries of $\mathbf{A}_\ell$. For instance, if $a_{ij}^{\ell} = 0$, then $\boldsymbol{\alpha}_{ij}^{\ell} := a_{ij}^{\ell} \boldsymbol{\beta}_{ij}^{\ell}$ is an all-zero block, irrespective of the values taken by $\boldsymbol{\beta}_{ij}^{\ell}$.
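The bookkeeping behind (9)-(10) can be checked with random stand-ins. The sketch below is ours (all matrices are placeholders rather than actual kernel evaluations); it assembles $\mathbf{K}^{\ell} = [\mathbf{K}_1^{\ell} \ldots \mathbf{K}_N^{\ell}]$ and the $NT \times N$ blocks $\mathbf{W}_{\alpha}^{\ell}$, and verifies that the fitting term yields a $T \times N$ matrix comparable with $\mathbf{Y}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, L = 4, 30, 1                         # arbitrary sizes

def random_psd(T, rng):
    """Random PSD matrix standing in for a kernel matrix K_i^ell."""
    B = rng.standard_normal((T, T))
    return B @ B.T / T

K = [[random_psd(T, rng) for _ in range(N)] for _ in range(L + 1)]
K_ell = [np.hstack(K[ell]) for ell in range(L + 1)]   # T x NT, cf. K^ell

# Block coefficient matrices W_alpha^ell: NT x N, block (i, j) = alpha_ij^ell.
W = [rng.standard_normal((N * T, N)) for _ in range(L + 1)]

fit = sum(K_ell[ell] @ W[ell] for ell in range(L + 1))  # T x N, cf. (9)
assert fit.shape == (T, N)
```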

Instead of the LS cost used in (6) and (9), alternative loss functions could be employed to promote robustness, e.g., the $\varepsilon$-insensitive loss, or the $\ell_1$-error norm; see e.g., [20, Ch. 12]. Regarding the regularizing function $\Omega(\cdot)$, typical choices are $\Omega(z) = |z|$, or $\Omega(z) = z^2$. The former is known to promote sparsity of edges, which is prevalent in most networks; see e.g., [39]. In principle, leveraging such prior knowledge naturally leads to more efficient topology estimators, since $\{\mathbf{A}_\ell\}$ are promoted to have only a few nonzero entries. The sparse nature of $\mathbf{A}_\ell$ manifests itself as block sparsity in $\mathbf{W}_{\alpha}^{\ell}$. Specifically, using $\Omega(z) = |z|$, one obtains the following estimator of the coefficient vectors $\{\boldsymbol{\alpha}_{ij}^{\ell}\}$

$$ \{\hat{\boldsymbol{\alpha}}_{ij}^{\ell}\} = \arg\min_{\alpha_{ii}^{0}=0,\, \{\boldsymbol{\alpha}_{ij}^{\ell}\}} \frac{1}{2} \bigg\| \mathbf{Y} - \sum_{\ell=0}^{L} \mathbf{K}^{\ell} \mathbf{W}_{\alpha}^{\ell} \bigg\|_F^2 + \lambda \sum_{\ell=0}^{L} \sum_{j=1}^{N} \sum_{i=1}^{N} \sqrt{ (\boldsymbol{\alpha}_{ij}^{\ell})^\top \mathbf{K}_i^{\ell} \boldsymbol{\alpha}_{ij}^{\ell} }. \tag{11} $$

Recognizing that summands in the regularization term of (11) can be written as $\sqrt{(\boldsymbol{\alpha}_{ij}^{\ell})^\top \mathbf{K}_i^{\ell} \boldsymbol{\alpha}_{ij}^{\ell}} = \| (\mathbf{K}_i^{\ell})^{1/2} \boldsymbol{\alpha}_{ij}^{\ell} \|_2$, which is a weighted $\ell_2$-norm of $\boldsymbol{\alpha}_{ij}^{\ell}$, the entire regularizer can henceforth be regarded as a weighted $\ell_{2,1}$-norm of $\mathbf{W}_{\alpha}^{\ell}$, which is known to be useful for promoting block sparsity. It is clear that (11) is a strongly convex problem, which admits a globally optimal solution. In fact, the structure of (11) lends itself naturally to efficient iterative proximal optimization methods, e.g., proximal gradient descent iterations [5, Ch. 7], or the alternating direction method of multipliers (ADMM) [41].

For a more detailed description of the algorithmic approaches adopted to unveil the hidden topology by solving (11), the reader is referred to the Appendix. All in all, Algorithm 1 is a summary of the novel iterative solver of (11) derived based on ADMM iterations. Per iteration, the complexity of ADMM is in the order of $\mathcal{O}(T^2 NL)$, which is linear in the network size $N$. A couple of remarks are now in order.

Remark 1: Selecting $\Omega(z) = z^2$ is known to control model complexity, and thus prevent overfitting [20, Ch. 3]. Let $\mathbf{D}^{\ell} := \mathrm{Bdiag}(\mathbf{K}_1^{\ell}, \ldots, \mathbf{K}_N^{\ell})$ and $\mathbf{D} := \mathrm{Bdiag}(\mathbf{D}^{0}, \ldots, \mathbf{D}^{L})$, where $\mathrm{Bdiag}(\cdot)$ is a block-diagonal matrix of its arguments. Substituting $\Omega(z) = z^2$ into (9), one obtains

$$ \{\hat{\boldsymbol{\alpha}}_{ij}^{\ell}\} = \arg\min_{\alpha_{ii}^{0}=0,\, \{\boldsymbol{\alpha}_{ij}^{\ell}\}} \frac{1}{2} \left\| \mathbf{Y} - \bar{\mathbf{K}} \mathbf{W}_{\alpha} \right\|_F^2 + \lambda\, \mathrm{trace}\big( \mathbf{W}_{\alpha}^\top \mathbf{D} \mathbf{W}_{\alpha} \big) \tag{12} $$

where $\bar{\mathbf{K}} := [\mathbf{K}^{0} \ldots \mathbf{K}^{L}]$, and $\mathbf{W}_{\alpha} := [(\mathbf{W}_{\alpha}^{0})^\top \ldots (\mathbf{W}_{\alpha}^{L})^\top]^\top$. Problem (12) is convex and can be solved in closed form as

$$ \hat{\bar{\boldsymbol{\alpha}}}_j = \left( \bar{\mathbf{K}}_j^\top \bar{\mathbf{K}}_j + 2\lambda \bar{\mathbf{D}}_j \right)^{-1} \bar{\mathbf{K}}_j^\top \mathbf{y}_j \tag{13} $$

where $\bar{\boldsymbol{\alpha}}_j$ denotes the $((L+1)N - 1)T \times 1$ vector obtained after removing entries of the $j$-th column of $\mathbf{W}_{\alpha}$ indexed by $\mathcal{I}_j := \{(j-1)T + 1, \ldots, jT\}$; $\bar{\mathbf{K}}_j$ collects the columns of $\bar{\mathbf{K}}$ excluding the columns indexed by $\mathcal{I}_j$; and the block-diagonal matrix $\bar{\mathbf{D}}_j$ is obtained after eliminating rows and columns of $\mathbf{D}$ indexed by $\mathcal{I}_j$. Using the matrix inversion lemma, the complexity of solving (13) is in the order of $\mathcal{O}(T^3 NL)$.
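The closed form (13) is a generalized ridge solve. A minimal sketch with random stand-ins follows (our illustration; the dimensions are arbitrary, and the factor $2\lambda$ follows from differentiating the quadratic objective of (12)).

```python
import numpy as np

rng = np.random.default_rng(3)
T, m = 30, 108            # m plays the role of ((L+1)N - 1) * T
Kj = rng.standard_normal((T, m))   # stand-in for K-bar_j
Dj = np.eye(m)                     # stand-in for the block-diagonal D-bar_j
yj = rng.standard_normal(T)
lam = 0.5

# alpha_j = (Kj^T Kj + 2 lam Dj)^{-1} Kj^T yj   [cf. (13)]
alpha_j = np.linalg.solve(Kj.T @ Kj + 2 * lam * Dj, Kj.T @ yj)
```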

Remark 2: Relying on an operator kernel (OK), the approach in [31] offers a more general nonlinear VARM (but not SVARM) than the one adopted here. However, [31] does not account for instantaneous or multiple-lagged effects. Meanwhile, estimating $\mathbf{f}(\mathbf{y}_{t-1})$ in [31] does not scale well as the size of the network ($N$) increases. Also, the OK-VARM is approximated in [31] using the Jacobian, which again adds to the complexity of the algorithm, and may degrade the generality of the proposed model. Finally, the model in [31] is limited in its ability to incorporate the structure of the network (e.g., edge sparsity). In order to incorporate prior information on the model structure, [31] ends up solving a nonconvex problem, which might experience local minima, and the flexibility in choosing kernel functions is also sacrificed. In contrast, our approach entails a natural extension to data-driven kernel selection, which will be outlined in the next section.

Fig. 1. (left) A simple illustration of a 5-node brain network; and (right) a set of five neuronal time series (e.g., ECoG voltage), each associated with a node. Per interval t, SVARMs postulate that directed dependencies between the 5 nodal time series may be due to instantaneous effects (blue links) and/or time-lagged effects (red links). Estimating the values of the unknown coefficients amounts to learning the directed (link) structure of the network.

Remark 3: Estimation of a nonlinear function generally requires a large number of samples ($T$), consequently incurring increased complexity. Clearly, the proposed method is prone to scalability issues when dealing with even moderately-sized networks. This motivates solving the kernel-based optimization problem "on the fly"; see also [40], [46] for recent examples. However, deriving such efficient and real-time solvers is beyond the scope of the present paper, whose focus is the novel nonlinear modeling framework. Only batch algorithms have been presented, but the pursuit of more efficient online algorithms constitutes an important future direction.

Remark 4: Note that the sparsity prior in the present work is imposed separately for each $\mathbf{A}_\ell$, which results in the block sparsity of $\mathbf{W}_{\alpha}^{\ell}$; see (10). This can easily be adapted to cases where more prior information is available regarding the structure of the network, for example, when the $\{\mathbf{A}_\ell\}$ share the same sparsity pattern. One could then stack the weight matrices in (10), and impose a group-sparse structure on the corresponding blocks of $\{\mathbf{W}_{\alpha}^{\ell}\}$. Such an adaptation does not affect the convexity of the problem in (11); hence, the resulting problem can still be solved to global optimality.

Remark 5: Stability of the generative K-SVARM depends on the matrices $\{\mathbf{A}_\ell\}$, the noise level, the initial condition, as well as the choice of the kernel functions. Its analysis is an interesting future research direction, but goes beyond the scope of the present paper.

V. DATA-DRIVEN KERNEL SELECTION

Choice of the kernel function determines the associated Hilbert space, and it is therefore of significant importance in estimating the nonlinear functions $\{f_{ij}^{\ell}\}$. Although Section IV assumed that the kernels $\{\kappa_i^{\ell}\}$ are available, this is not the case in general, and this section advocates a data-driven strategy for selecting them. Given a dictionary of reproducing kernels $\{\kappa_p\}_{p=1}^{P}$, it has been shown that any function in the convex hull $\mathcal{K} := \{\kappa \,|\, \kappa = \sum_{p=1}^{P} \theta_p \kappa_p,\; \theta_p \geq 0,\; \sum_{p=1}^{P} \theta_p = 1\}$ is a reproducing kernel [35], [45]. Therefore, the goal of the present section is to select a kernel from $\mathcal{K}$ that best fits the data. For ease of exposition, consider $\kappa_i^{\ell} = \kappa \in \mathcal{K}$, for all $\ell = 0, 1, \ldots, L$ and $i = 1, \ldots, N$ in (6); therefore $\mathcal{H}_i^{\ell} = \mathcal{H}(\kappa)$. Note that the formulation can be readily extended to settings where the $\{\kappa_i^{\ell}\}$ are different. Incorporating $\kappa$ as a variable function in (6) yields

$$ \{\hat{f}_{ij}^{\ell}\} = \arg\min_{\kappa \in \mathcal{K},\, \{f_{ij}^{\ell} \in \mathcal{H}(\kappa)\}} \frac{1}{2} \sum_{t=1}^{T} \Big[ y_{jt} - \sum_{i \neq j} a_{ij}^{0} f_{ij}^{0}(y_{it}) - \sum_{i=1}^{N} \sum_{\ell=1}^{L} a_{ij}^{\ell} f_{ij}^{\ell}(y_{i(t-\ell)}) \Big]^2 + \lambda \sum_{i=1}^{N} \sum_{\ell=0}^{L} \Omega\big( \| a_{ij}^{\ell} f_{ij}^{\ell} \|_{\mathcal{H}(\kappa)} \big) \tag{14} $$

where $\mathcal{H}(\kappa)$ denotes the Hilbert space associated with kernel function $\kappa$. With $\mathcal{H}_p$ denoting the RKHS induced by $\kappa_p$, it has been shown in [4] and [35] that the optimal $\{f_{ij}^{\ell}\}$ in (14) is expressible in a separable form as

$$ f_{ij}^{\ell}(y) := \sum_{p=1}^{P} f_{ij}^{\ell,p}(y) \tag{15} $$

where $f_{ij}^{\ell,p}$ belongs to the RKHS $\mathcal{H}_p$, for $p = 1, \ldots, P$. Substituting (15) into (14), one obtains

$$ \{\hat{f}_{ij}^{\ell}\} = \arg\min_{\{f_{ij}^{\ell,p} \in \mathcal{H}_p\}} \frac{1}{2} \sum_{t=1}^{T} \Big[ y_{jt} - \sum_{i \neq j} \sum_{p=1}^{P} a_{ij}^{0} f_{ij}^{0,p}(y_{it}) - \sum_{i=1}^{N} \sum_{\ell=1}^{L} \sum_{p=1}^{P} a_{ij}^{\ell} f_{ij}^{\ell,p}(y_{i(t-\ell)}) \Big]^2 + \lambda \sum_{i=1}^{N} \sum_{\ell=0}^{L} \sum_{p=1}^{P} \Omega\big( \| a_{ij}^{\ell} f_{ij}^{\ell,p} \|_{\mathcal{H}_p} \big). \tag{16} $$

Note that (16) and (6) have a similar structure; their only difference pertains to an extra summation over the $P$ candidate kernels. Hence, (16) can be solved in an efficient manner along the lines of the iterative solver of (6) listed under Algorithm 1 [cf. the discussion in Section IV]. Further details of the solution are omitted due to space limitations.
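Per (15), each estimated function is a sum of $P$ kernel expansions, one per dictionary element. A schematic evaluation follows (illustrative only; the $\beta$ coefficients are random placeholders for whatever the solver of (16) would return, and the two kernels are example dictionary choices).

```python
import numpy as np

rng = np.random.default_rng(5)
T, P = 50, 2
y_train = rng.standard_normal(T)
betas = [0.1 * rng.standard_normal(T) for _ in range(P)]  # placeholder estimates

kernels = [
    lambda y, psi: y * psi,                 # linear kernel
    lambda y, psi: (1.0 + y * psi) ** 2,    # polynomial kernel, order 2
]

def f_eval(y):
    """f(y) = sum_p sum_t beta^p_t kappa_p(y, y_train[t])   [cf. (7), (15)]."""
    return sum(beta @ kern(y, y_train) for beta, kern in zip(betas, kernels))

print(f_eval(0.7))
```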


Fig. 2. Plot of EIER vs. measurement ratio (T/N), with simulated data generated via a polynomial kernel of order P = 2. Note that K-SVARMs consistently outperform linear SVARMs.

Remark 6: Note that $\{\theta_p\}$ does not show up in the optimization problem (16), since all the coefficients can be readily absorbed into the nonlinear functions without affecting optimality. This is a consequence of the property that, given any function belonging to a prescribed RKHS, all its scaled versions belong to the same RKHS [cf. (5)]; see also [35] for a detailed proof.
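Consistent with Remark 6 and the convex-hull construction above, any convex combination of valid kernels is itself a reproducing kernel. A toy numerical check (ours, not from the paper) combines a linear and a Gaussian kernel and verifies that the resulting Gram matrix is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.standard_normal(40)

K_lin = np.outer(y, y)                                # linear kernel y * psi
K_gau = np.exp(-(y[:, None] - y[None, :]) ** 2 / 2)   # Gaussian, sigma^2 = 1

theta = 0.3                                           # theta_p >= 0, sum to 1
K_mix = theta * K_lin + (1 - theta) * K_gau

# Eigenvalues of a valid Gram matrix are nonnegative (up to round-off).
assert np.min(np.linalg.eigvalsh(K_mix)) > -1e-10
```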

VI. NUMERICAL TESTS

This section presents results from numerical tests conducted on both synthetic and real data to corroborate the effectiveness of the proposed approach. Simulated data were generated via a different model in order to assess the impact of the presence of nonlinear dependencies. Tests on real data were based on seizure experiments captured from a number of subjects.

A. Synthetic Data Tests

Data generation: Setting $L = 1$, synthetic data were generated via a random 20-node ($N = 20$) Erdős-Rényi graph, with edge probability 0.4. The resulting graph was encoded as a binary $20 \times 20$ adjacency matrix. Using this graph, simulated data were generated via both linear and nonlinear models. After drawing initial entries $y_{n,1} \sim \mathcal{N}(0, 1)$ and matrices $\{\mathbf{A}_\ell\}$, the output data $\{\mathbf{y}_t\}_{t=2}^{T}$ were generated recursively. Furthermore, the matrices $\{\mathbf{K}^{\ell}\}$ were generated using prescribed kernels, that is, entry $(i, j)$ was set to $k(y_{it}, y_{jt})$, where the kernel function $k(\cdot, \cdot)$ is known a priori. Entries of the coefficient vectors $\boldsymbol{\alpha}_{ij} \in \mathbb{R}^T$ were drawn independently from $\mathcal{N}(0, 1)$, while noise terms were generated i.i.d. as $e_{i1} \sim \mathcal{N}(0, \sigma_e^2)$.

Experiments were run for different values of $T$, with the edge detection threshold $\tau$ selected in each setting to obtain the lowest edge identification error rate (EIER), defined as

$$ \mathrm{EIER} := \frac{\| \mathbf{A} - \hat{\mathbf{A}} \|_0}{N(N-1)} \times 100\%. \tag{17} $$

Fig. 3. Plot of EIER vs. (T/N), with data generated using a Gaussian kernel with σ² = 1; K-SVARMs uniformly lead to lower errors than linear SVARMs over varying SNR levels, based on empirical observations.

Fig. 4. Plot of EIER vs. (T/N), with simulated data generated via a polynomial kernel of order P = 2; it can be empirically observed that K-SVARMs uniformly outperform linear SVARMs across varying edge densities.

The operator $\|\cdot\|_0$ denotes the number of nonzero entries of its argument. For all experiments, error plots were generated with EIER values averaged over 100 independent runs.

Test results: Figs. 2 and 3 depict EIER values plotted against the measurement ratio (T/N) under varying signal-to-noise ratios ($\mathrm{SNR} := 10\log(P_{\mathrm{signal}}/P_{\mathrm{noise}})$, where $P_{\mathrm{signal}}$ denotes the variance of $y_{j1}$) for polynomial and Gaussian kernels, respectively. The synthetic graph was generated with edge probability p = 0.3. Fig. 2 plots the EIER when data are generated by (4) using a polynomial kernel of order P = 2, and Fig. 3 plots the error performance realized with data generated via a Gaussian kernel with bandwidth σ² = 1. It is clear that adoption of nonlinear SVARMs yields markedly better performance than topology inference approaches based on linear SVARMs, which corroborates the effectiveness of the proposed algorithm in identifying the network topology when the dependencies among nodes are nonlinear. It is also worth observing that as the SNR decreases, the performance of the proposed algorithm deteriorates slightly, but it still yields much better performance than the linear approach.

Fig. 5. Plots of actual and inferred adjacency matrices resulting from adopting linear and nonlinear SVARMs with T = 40.

Fig. 6. ROC curves generated under different modeling assumptions: (a) K-SVARM based on a Gaussian kernel with σ² = 1; (b) K-SVARM based on a polynomial kernel of order P = 2; and (c) linear SVARM.

Fig. 5 plots heatmaps of actual and inferred adjacency matrices, under varying modeling assumptions. Plots of inferred adjacency matrices are based on a single realization of T = 40 samples, with white entries representing the presence of an edge, that is, $a_{ij}^{\ell} \neq 0$. As shown by the plots, accounting for nonlinearities yields more accurate recovery of the unknown network topology. Fig. 4 plots EIER values against the measurement ratio (T/N) when the underlying Erdős-Rényi graphs are generated using several values of the edge probability p ∈ {0.1, 0.5, 0.9}, with λ and η optimally chosen for each scenario. For each value of p, the nonlinear approach outperforms the linear variant, since nonlinear dependencies are accounted for.

In order to assess edge detection performance, receiver operating characteristic (ROC) curves are plotted under different modeling assumptions in Fig. 6. With $P_D$ denoting the probability of detection, and $P_{FA}$ the probability of false alarm, each point on the ROC corresponds to a pair $(P_{FA}, P_D)$ for a prescribed threshold. Fig. 6(a) results from tests run on data generated by Gaussian kernels with σ² = 1, while Fig. 6(b) corresponds to polynomial kernels of order P = 2. Using the area under the curve (AUC) as the edge-detection performance criterion, Figs. 6(a) and (b) clearly emphasize the benefits of accounting for nonlinearities. In both plots, kernel-based approaches result in higher AUC metrics than approaches resorting to linear SVARMs. Moreover, the kernel-based SVARM outperforms the kernel-based VARM, which does not take into account the instantaneous effects.

Fig. 6(c) plots ROC curves based on linear and kernel-based SVARMs, with simulated data actually generated using a linear SVARM. The curves are parameterized by the sparsity-control parameter λ. Not surprisingly, kernel-based SVARMs adopting polynomial kernels underperform the linear SVARM, due to the inherent model mismatch. However, the kernel SVARM endowed with a multi-kernel learning scheme (MK-SVARM) is shown to attain performance comparable to the linear SVARM when the prescribed dictionary comprises both linear and polynomial kernels.
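Each ROC point is a $(P_{FA}, P_D)$ pair obtained by sweeping a threshold over per-edge scores. A generic sketch follows (not the paper's evaluation code; `scores` could be, e.g., the norms $\|\hat{\boldsymbol{\alpha}}_{ij}\|_2$, and `truth` the ground-truth binary edge indicators).

```python
import numpy as np

def roc_points(scores, truth):
    """ROC as (P_FA, P_D) pairs swept over all observed thresholds.

    scores: per-edge test statistics; truth: binary edge labels
    (assumed to contain both classes).
    """
    pts = []
    for thr in np.sort(np.unique(scores))[::-1]:
        detected = scores >= thr
        p_d = detected[truth == 1].mean()    # probability of detection
        p_fa = detected[truth == 0].mean()   # probability of false alarm
        pts.append((p_fa, p_d))
    return pts
```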

Fig. 7 plots ROC curves based on linear and kernel-based nonlinear SVARMs with L = 2. The trend exhibited by these curves is similar to that in Fig. 6.

B. Real Data Tests

This section presents test results on seizure data, captured through experiments conducted in an epilepsy study [28]. Epilepsy refers to a chronic neurological condition characterized by recurrent seizures, globally afflicting over 20 million people, and often associated with abnormal neuronal activity within the brain. Diagnosis of the condition sometimes involves comparing EEG or ECoG time series obtained from a patient's brain before and after the onset of a seizure. Recent studies have shown increasing interest in the analysis of connectivity networks inferred from the neuronal time series, in order to gain more insights about the unknown physiological mechanisms underlying epileptic seizures. In this section, connectivity networks are inferred from the seizure data using the novel approach, and a number of comparative measures are computed from the identified network topologies.

Fig. 7. ROC curves generated under different modeling assumptions: (a) K-SVARM based on a Gaussian kernel with σ² = 1, L = 2; and (b) K-SVARM based on a polynomial kernel of order P = 2, L = 2.

C. Seizure Data Description

Seizure data were obtained for a 39-year-old female subject with a case of intractable epilepsy at the University of California, San Francisco (UCSF) Epilepsy Center; see also [28]. An 8 × 8 subdural electrode grid was implanted into the cortical surface of the subject's brain, and two accompanying electrode strips, each comprising six electrodes (a.k.a. depth electrodes), were implanted deeper into the brain. Over a period of five days, the combined electrode network recorded 76 ECoG time series, consisting of voltage levels measured in a region within close proximity of each electrode.

ECoG epochs containing eight seizures were extracted from the record and analyzed by a specialist. The time series at each electrode were first passed through a bandpass filter with cut-off frequencies of 1 and 50 Hz, and the so-termed ictal onset of each seizure was identified as follows. A board-certified neurophysiologist identified the initial manifestation of rhythmic high-frequency, low-voltage focal activity, which characterizes the onset of a seizure. Samples of data before and after this seizure onset were then extracted from the ECoG time series. The per-electrode time series were divided into 1 s windows, with 0.5 s overlaps between consecutive windows, and the average spectral power between 5 Hz and 15 Hz was computed per window. Finally, power spectra over all electrodes were averaged, and the ictal onset was identified by visual inspection of a dramatic increase (by at least an order of magnitude) in the average power. Two temporal intervals of interest were picked for further analysis, namely, the preictal and ictal intervals. The preictal interval is defined as a 10 s interval preceding seizure onset, while the ictal interval comprises the 10 s immediately afterwards. Further details about data acquisition and pre-processing are provided in [28].

The goal here was to assess whether modeling nonlinearities and adopting the novel kernel-based approach would yield significant insights pertaining to causal/effective dependencies between brain regions that linear variants would otherwise fail to capture. Toward this goal, several standard network analysis measures were adopted to characterize the structural properties of the inferred networks.

D. Inferred Networks

Prior to running the developed algorithm, 10 s intervals were chosen from the preprocessed ECoG data, with the sampling rate set to 400 Hz, and then divided into 20 successive segments, each comprising 200 data samples over a 0.5 s horizon. To illustrate this, suppose the 10 s interval starts at t = 0 s and ends at t = 10 s; then the first segment comprises samples taken over the interval [0 s, 0.5 s], the second one [0.5 s, 1 s], and so on. After this segmentation of the time series, directed network topologies were inferred using Algorithm 1 with L = 1, based on the 0.5 s segments instead of the entire signal, to ensure that the signal is approximately stationary per experiment run. A directed link from electrode i to j was drawn if at least one of the estimates of $a_{ij}^{\ell}$ turned out to be nonzero.
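The segmentation just described (10 s at 400 Hz, split into twenty non-overlapping 0.5 s segments of 200 samples each) amounts to a simple reshape; a sketch with random data standing in for the preprocessed ECoG record:

```python
import numpy as np

fs = 400                                   # sampling rate (Hz)
x = np.random.default_rng(6).standard_normal((10 * fs, 76))  # 10 s, 76 electrodes

seg_len = fs // 2                          # 0.5 s -> 200 samples
segments = x.reshape(-1, seg_len, x.shape[1])
assert segments.shape == (20, seg_len, 76)  # 20 segments of (200 samples, 76 nodes)
```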

Upon inference of the 20 networks from the data pertaining to each seizure, the presence or absence of an edge was established via a t-test with $P_{FA} = 0.1$. Inference results were not averaged across seizures, since different seizures may originate from disparate parts of the brain, leading to non-trivial differences between connectivity patterns among electrodes.

Fig. 8. Visualizations of 76-electrode networks inferred from ECoG data: (a) linear SVARM with L = 1 on preictal time series; (b) linear SVARM on ictal time series; (c) K-SVARM on preictal time series, using a Gaussian kernel with σ = 1; (d) the same K-SVARM on ictal time series; (e) K-SVARM with kernel selection on preictal time series; and finally (f) K-SVARM with kernel selection on ictal time series.

Networks inferred from the preictal and ictal intervals were compared using the linear SVARM, the kernel-based (K-)SVARM, and the K-SVARM with data-driven kernel selection. The lag length was set to L = 1 in all cases. For the K-SVARM, a Gaussian kernel with σ = 1 was selected; with ρ = 10, the regularization parameter λ of Algorithm 1 was selected via cross-validation. For the data-driven kernel selection scheme, two candidate kernels were employed, namely, a linear kernel and a polynomial kernel of order 2.

Fig. 8 depicts networks inferred by the different algorithms for both preictal and ictal intervals of the time series. The figure illustrates results obtained by the linear SVARM, and the K-SVARM approach with and without kernel selection. Each node in the network is representative of an electrode, and is depicted as a circle, while the node arrangement is kept consistent across the six visual representations. A cursory inspection of the visual maps reveals significant variations in connectivity patterns between ictal and preictal intervals for both models. Specifically, networks inferred via the K-SVARMs reveal a global decrease in the number of links emanating from each node, while those inferred via the linear model depict increases and decreases in links connected to different nodes. Interestingly, the K-SVARM with kernel selection recovered most of the edges inferred by the linear SVARM and by the K-SVARM using a Gaussian kernel, which implies that both linear and nonlinear interactions may exist in brain networks. Moreover, network topologies inferred via the K-SVARM with kernel selection are more similar to those obtained using a single kernel than to those of the linear SVARM, implying that the simple linear model is insufficient to capture the network topology in complex brain networks. However, one is unlikely to gain much insight by visual inspection of the network topologies alone. To further analyze differences between the networks inferred by the two models, and to assess the potential benefits gained by adopting the novel scheme, several network topology metrics are computed and compared in the next subsection.

E. Comparison of Network Metrics

First, the in- and out-degrees were computed for the nodes in each of the inferred networks. Note that the in-degree of a node counts its number of incoming edges, while the out-degree counts the number of outgoing edges. The total degree per node sums the in- and out-degrees, and is indicative of how well-connected a given node is. Fig. 9 depicts nodes in the network with their total degrees encoded by the radii of the circles associated with the nodes. As expected from the previous subsection, Figs. 9(a) and (b) demonstrate that the linear SVARM yields both increases and decreases in the inferred node degree. On the other hand, the nonlinear SVARM leads to a more spatially consistent observation, with most nodes exhibiting a smaller degree after the onset of a seizure (see Figs. 9(c) and (d)), which may imply that directed dependencies thin out between regions of the brain once a seizure starts; the same trend is also revealed by the K-SVARM with kernel selection (see Figs. 9(e) and (f)). See also Table I for the number of disappearing and appearing edges, as well as the number of nodes for which the degree increases or decreases.

Fig. 9. Node degrees of networks inferred from ECoG data encoded by circle radii: (a) linear SVARM on preictal data; (b) linear SVARM on ictal data; (c) K-SVARM on preictal time series; (d) K-SVARM on ictal data; (e) MKL-SVARM on preictal time series; (f) MKL-SVARM on ictal time series.

TABLE I. Changes of edges and nodal degrees from preictal to ictal stages, using networks inferred from ECoG seizure data via the linear, K-SVARM, and K-SVARM with kernel selection schemes. ↑: appear/increase; ↓: disappear/decrease.

In order to assess the reachability of brain regions before and after seizure onset, comparisons of the so-termed average shortest path lengths were carried out. The average shortest path length of a node is the average length of the shortest paths between the given node and all other nodes; see e.g., [27] for more details. The per-node average shortest path length for each inferred network is depicted in Fig. 10, with node radii similarly encoding the computed values. Little variation between preictal and ictal average shortest path lengths is seen for the linear model (Figs. 10(a) and (b)), while variations are more marked for the K-SVARM; see Figs. 10(c)-(f). It can be seen that modeling nonlinearities reveals subtle changes in the reachability of nodes between preictal and ictal phases.

Fig. 11 depicts the closeness centrality computed per node in the inferred networks. Closeness centrality measures how reachable a node is from all other nodes, and is generally defined as the reciprocal of the sum of geodesic distances of the node from all other nodes in the network; see also [27]. Once again, Fig. 11 depicts a more general decrease in closeness centralities after seizure onset in networks inferred by the nonlinear SVARM, as compared to the linear variant. This empirical result indicates a change in reachability between regions of the brain during an epileptic seizure.
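The local metrics compared in this subsection are all standard; given an inferred binary adjacency matrix, they can be computed, for instance, with NetworkX (shown only to make the definitions concrete; the paper does not state which tooling was used).

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(7)
A_hat = (rng.random((10, 10)) < 0.2).astype(int)   # placeholder inferred adjacency
np.fill_diagonal(A_hat, 0)

G = nx.from_numpy_array(A_hat, create_using=nx.DiGraph)  # edge i -> j if [A]_ij != 0

total_degree = {v: G.in_degree(v) + G.out_degree(v) for v in G}
closeness = nx.closeness_centrality(G)

# Per-node average shortest path length, over the nodes each one can reach.
avg_spl = {}
for v in G:
    dists = [d for u, d in nx.shortest_path_length(G, source=v).items() if u != v]
    avg_spl[v] = float(np.mean(dists)) if dists else float("inf")
```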

In addition to the local metrics, a number of global measures were computed over the entire inferred networks, and pertinent comparisons were drawn between the two phases; see Table II for a summary of the average global measures of the networks inferred from 8 different seizures. Several global metrics were considered, e.g., network density, global clustering coefficient, network diameter, and average number of neighbors. These metrics were obtained by averaging the metrics of the networks inferred from the 8 seizures.

Fig. 10. Same as in Fig. 9, for comparison based on the average shortest path length of inferred graphs.

TABLE II. Comparison of global metrics associated with networks inferred from ECoG seizure data using the linear, K-SVARM, and K-SVARM with kernel selection schemes. Major differences between the computed metrics indicate that one may gain insights from network topologies inferred via models that capture nonlinear dependencies.

Network density refers to the number of actual edges divided by the number of potential edges, while the global clustering coefficient is the fraction of connected triplets that form triangles, adjusted by a factor of three to compensate for double counting. On the other hand, the network diameter is the length of the longest geodesic, excluding infinity. Table II shows that networks inferred via the K-SVARM and MKL-SVARM exhibit lower network cohesion after seizure onset, as captured by network density, global clustering coefficient, and average number of neighbors, while the network diameter increases. These changes provide empirical evidence that the brain network becomes less connected, and that the diffusion of information is inhibited after the onset of an epileptic seizure.

In addition, note that we do not have access to the ground-truth topologies of the brain networks with which to evaluate the effectiveness of the nonlinear methods. In order to verify the benefit of the nonlinear method, we further carried out classification based on the obtained adjacency matrices as well as nodal feature vectors, including the degree, average shortest path length, and clustering coefficient of the network nodes. We adopted an SVM with 5-fold cross-validation. It can be observed from Table III that extracting nonlinear dependencies improves the accuracy of classification between ictal and preictal data. Therefore, the novel nonlinear model can help improve the accuracy of disease diagnosis.

Fig. 11. Same as in Fig. 9, for comparison based on the closeness centrality of inferred graphs.

TABLE III. Comparison of classification accuracy (%) using networks inferred from ECoG seizure data via the linear, K-SVARM, and K-SVARM with kernel selection schemes.

VII. CONCLUSIONS

This paper put forth a novel nonlinear SVARM framework that leverages kernels to infer effective connectivity networks in the brain. Postulating a generalized additive model with unknown functions to capture the hidden network structure, a novel regularized LS estimator that promotes sparse solutions was advocated. In order to solve the ensuing convex optimization problem, an efficient algorithm that resorts to ADMM iterations was developed, and a data-driven approach was introduced to select the appropriate kernel. Extensive numerical tests were conducted on ECoG seizure data from a study on epilepsy.

In order to assess the utility of the novel approach, several local and global metrics were adopted and computed on networks inferred before and after the onset of a seizure. By observing changes in network behavior that are revealed by standard metrics before and after seizure onset, it is possible to identify key structural differences that may be critical to explain the mysteries of epileptic seizures. With this in mind, the paper focused on identifying structural differences in the brain network that could not be captured by the simpler linear model. Interestingly, empirical results support the adoption of a nonlinear modeling perspective when analyzing differences in effective brain connectivity for epilepsy patients. Specifically, adopting the novel kernel-based approach revealed more significant differences between the preictal and ictal phases of ECoG time series. For instance, it turned out that some regions exhibited fewer dependencies, reduced reachability, and weakened information-routing capabilities after the onset of a seizure. Since the kernel-based model includes the linear SVARM as an instance, the conducted experiments suggest that one may gain more insights by adopting the nonlinear model, a conclusion that may yield informative benefits to studies of epilepsy that leverage network science.

This work paves the way for a number of exciting research directions in the analysis of brain networks. Although the inferred networks have been assumed static, overwhelming evidence suggests that the topologies of brain networks are dynamic, and may change over rather short time horizons. Future studies will extend this work to facilitate tracking of dynamic brain networks. Furthermore, the novel approach will be empirically tested on a wider range of neurological illnesses and disorders, and pertinent comparisons will be carried out to assess the merits of the advocated nonlinear modeling approach.



APPENDIX

TOPOLOGY INFERENCE VIA ADMM

Algorithm 1: ADMM for network topology identification.

1: Input: $\mathbf{Y}$, $\{\{\mathbf{K}_i^\ell\}_{i=1}^N\}_{\ell=1}^L$, $\tau_\alpha$, $\lambda$, $\rho$
2: Initialize: $\boldsymbol{\Gamma}[0] = \mathbf{0}_{NT \times N}$, $\boldsymbol{\Xi}[0] = \mathbf{0}_{NT \times N}$, $k = 0$
3: for $\ell = 1, \dots, L$ do
4:   $\mathbf{D}^\ell = \mathrm{Bdiag}(\mathbf{K}_1^\ell, \dots, \mathbf{K}_N^\ell)$
5:   $\mathbf{K}^\ell = [\mathbf{K}_1^\ell \dots \mathbf{K}_N^\ell]$
6: end for
7: $\mathbf{K} := [\mathbf{K}^0 \dots \mathbf{K}^L]$, $\mathbf{D} = \mathrm{Bdiag}(\mathbf{D}^0, \dots, \mathbf{D}^L)$
8: for $j = 1, \dots, N$ do
9:   $\mathcal{I}_j := \{(j-1)T + 1, \dots, jT\}$
10:  $\bar{\mathcal{I}}_j := \{(k, l) \mid k \notin \mathcal{I}_j \text{ or } l \notin \mathcal{I}_j\}$
11:  $\tilde{\mathcal{I}}_j := \{(k, l) \mid l \notin \mathcal{I}_j\}$
12:  $\mathbf{D}_j = [\mathbf{D}]_{\bar{\mathcal{I}}_j}$, $\mathbf{K}_j = [\mathbf{K}]_{\tilde{\mathcal{I}}_j}$
13: end for
14: while not converged do
15:   for $j = 1, \dots, N$ (in parallel) do
16:     $\mathbf{q}_j[k] = \rho \mathbf{D}_j^{1/2} \boldsymbol{\gamma}_j[k] + \mathbf{K}_j^\top \mathbf{y}_j - \mathbf{D}_j^{1/2} \boldsymbol{\xi}_j[k]$
17:     $\boldsymbol{\alpha}_j[k+1] = (\mathbf{K}_j^\top \mathbf{K}_j + \rho \mathbf{D}_j)^{-1} \mathbf{q}_j[k]$
18:     $\boldsymbol{\gamma}_{ij}^\ell[k] = \mathcal{P}_{\lambda/\rho}\big((\mathbf{K}_i^\ell)^{1/2} \boldsymbol{\alpha}_{ij}^\ell[k+1] + \boldsymbol{\xi}_{ij}^\ell[k]/\rho\big)$,
19:       for $i = 1, \dots, N$, $\ell = 0, \dots, L$
20:   end for
21:   $\mathbf{W}_\alpha[k+1] := [(\mathbf{W}_\alpha^0)^\top[k+1], \dots, (\mathbf{W}_\alpha^L)^\top[k+1]]^\top$
22:   $\boldsymbol{\Gamma}[k+1] := [(\boldsymbol{\Gamma}^0[k+1])^\top, \dots, (\boldsymbol{\Gamma}^L[k+1])^\top]^\top$
23:   $\boldsymbol{\Xi}[k+1] = \boldsymbol{\Xi}[k] + \rho(\mathbf{D}^{1/2} \mathbf{W}_\alpha[k+1] - \boldsymbol{\Gamma}[k+1])$
24:   $k = k + 1$
25: end while
26: Edge identification (after converging to $\boldsymbol{\alpha}_{ij}^{\ast}$):
27: $a_{ij}^{\ell\ast} \neq 0$ if $\|\boldsymbol{\alpha}_{ij}^{\ell\ast}\| \geq \tau_\alpha$, else $a_{ij}^{\ell\ast} = 0$, $\forall (i, j, \ell)$
28: return $\{\mathbf{A}^\ell\}_{\ell=0}^L$

Given matrices $\mathbf{Y}$ and $\mathbf{K} := [\mathbf{K}^0 \dots \mathbf{K}^L]$, this section capitalizes on convexity and the nature of the additive terms in (11) to develop an efficient topology inference algorithm. Proximal optimization approaches have recently been shown useful for convex optimization when the cost function comprises the sum of smooth and nonsmooth terms; see, e.g., [11]. Prominent among these approaches is the alternating direction method of multipliers (ADMM), upon which the novel algorithm is based; see, e.g., [41] for an early application of ADMM to distributed estimation.

For ease of exposition, let the equality constraints ($\boldsymbol{\alpha}_{jj}^\ell = \mathbf{0}$) temporarily remain implicit. Introducing the change of variables $\boldsymbol{\gamma}_{ij}^\ell := (\mathbf{K}_i^\ell)^{1/2} \boldsymbol{\alpha}_{ij}^\ell$, problem (11) can be recast as

$$\arg\min_{\{\boldsymbol{\alpha}_{ij}^\ell\}} \; (1/2) \Big\| \mathbf{Y} - \sum_{\ell=0}^{L} \mathbf{K}^\ell \mathbf{W}_\alpha^\ell \Big\|_F^2 + \sum_{\ell=0}^{L} g(\boldsymbol{\Gamma}^\ell)$$
$$\text{s.t.} \;\; \boldsymbol{\gamma}_{ij}^\ell - (\mathbf{K}_i^\ell)^{1/2} \boldsymbol{\alpha}_{ij}^\ell = \mathbf{0} \;\; \forall i, j, \ell \qquad (18)$$

where $\boldsymbol{\Gamma}^\ell := [\boldsymbol{\gamma}_1^\ell \dots \boldsymbol{\gamma}_N^\ell]$, $\boldsymbol{\gamma}_j^\ell := [(\boldsymbol{\gamma}_{1j}^\ell)^\top \dots (\boldsymbol{\gamma}_{Nj}^\ell)^\top]^\top$, and $g(\boldsymbol{\Gamma}^\ell) := \lambda \sum_{i=1}^{N} \sum_{j=1}^{N} \|\boldsymbol{\gamma}_{ij}^\ell\|_2$ is the nonsmooth regularizer. Let $\mathbf{D}^\ell := \mathrm{Bdiag}(\mathbf{K}_1^\ell, \dots, \mathbf{K}_N^\ell)$, and $\mathbf{D} := \mathrm{Bdiag}(\mathbf{D}^0, \dots, \mathbf{D}^L)$, where $\mathrm{Bdiag}(\cdot)$ is a block-diagonal matrix of its arguments. One can then write the augmented Lagrangian of (18) as

$$\mathcal{L}_\rho(\mathbf{W}_\alpha, \boldsymbol{\Gamma}, \boldsymbol{\Xi}) = (1/2)\|\mathbf{Y} - \mathbf{K}\mathbf{W}_\alpha\|_F^2 + g(\boldsymbol{\Gamma}) + \langle \boldsymbol{\Xi}, \mathbf{D}^{1/2}\mathbf{W}_\alpha - \boldsymbol{\Gamma} \rangle + (\rho/2)\|\boldsymbol{\Gamma} - \mathbf{D}^{1/2}\mathbf{W}_\alpha\|_F^2 \qquad (19)$$

where $\mathbf{W}_\alpha := [(\mathbf{W}_\alpha^0)^\top \dots (\mathbf{W}_\alpha^L)^\top]^\top$, and $\boldsymbol{\Gamma} := [(\boldsymbol{\Gamma}^0)^\top \dots (\boldsymbol{\Gamma}^L)^\top]^\top$. Note that $\boldsymbol{\Xi}$ is a matrix of dual variables that collects the Lagrange multipliers corresponding to the equality constraints in (18), $\langle \mathbf{P}, \mathbf{Q} \rangle$ denotes the inner product between $\mathbf{P}$ and $\mathbf{Q}$, while $\rho > 0$ is a prescribed penalty parameter. ADMM boils down to a sequence of alternating minimization iterations that minimize $\mathcal{L}_\rho(\mathbf{W}_\alpha, \boldsymbol{\Gamma}, \boldsymbol{\Xi})$ over the primal variables $\mathbf{W}_\alpha$ and $\boldsymbol{\Gamma}$, followed by a gradient ascent step over the dual variables $\boldsymbol{\Xi}$; see also [2], [41]. Per iteration $k+1$, this entails the following provably convergent steps; see, e.g., [41]

$$\mathbf{W}_\alpha[k+1] = \arg\min_{\mathbf{W}_\alpha} \mathcal{L}_\rho(\mathbf{W}_\alpha, \boldsymbol{\Gamma}[k], \boldsymbol{\Xi}[k]) \qquad (20a)$$
$$\boldsymbol{\Gamma}[k+1] = \arg\min_{\boldsymbol{\Gamma}} \mathcal{L}_\rho(\mathbf{W}_\alpha[k+1], \boldsymbol{\Gamma}, \boldsymbol{\Xi}[k]) \qquad (20b)$$
$$\boldsymbol{\Xi}[k+1] = \boldsymbol{\Xi}[k] + \rho(\mathbf{D}^{1/2}\mathbf{W}_\alpha[k+1] - \boldsymbol{\Gamma}[k+1]). \qquad (20c)$$
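As a rough sketch of how (20a)-(20c) translate into an iteration loop, one may write the following, where update_W and update_Gamma are hypothetical callables standing in for the primal minimizers made explicit in (22) and (23) below; in practice the fixed iteration count would be replaced by a residual-based stopping rule.

```python
import numpy as np

def admm_iterations(update_W, update_Gamma, D_sqrt, rho,
                    Gamma0, Xi0, num_iter=200):
    """Generic ADMM loop mirroring (20a)-(20c); D_sqrt is D^{1/2}."""
    Gamma, Xi = Gamma0, Xi0
    for _ in range(num_iter):
        W = update_W(Gamma, Xi)                # (20a): primal update of W_alpha
        Gamma = update_Gamma(W, Xi)            # (20b): primal update of Gamma
        Xi = Xi + rho * (D_sqrt @ W - Gamma)   # (20c): dual gradient ascent
    return W, Gamma, Xi
```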

Focusing on $\mathbf{W}_\alpha[k+1]$, note that (20a) decouples across the columns of $\mathbf{W}_\alpha$, and admits closed-form, parallelizable solutions. Incorporating the structural constraint $\boldsymbol{\alpha}_{jj}^0 = \mathbf{0}$, one obtains the following decoupled subproblem per column $j$

$$\boldsymbol{\alpha}_j[k+1] = \arg\min_{\boldsymbol{\alpha}_j} \; (1/2) \boldsymbol{\alpha}_j^\top \big( \mathbf{K}_j^\top \mathbf{K}_j + \rho \mathbf{D}_j \big) \boldsymbol{\alpha}_j - \boldsymbol{\alpha}_j^\top \mathbf{q}_j[k] \qquad (21)$$

where $\mathbf{q}_j[k]$ is constructed by removing the entries indexed by $\mathcal{I}_j$ from $\rho \mathbf{D}^{1/2} \boldsymbol{\gamma}_j[k] + \mathbf{K}^\top \mathbf{y}_j - \mathbf{D}^{1/2} \boldsymbol{\xi}_j[k]$, with $\boldsymbol{\xi}_j[k]$ denoting the $j$-th column of $\boldsymbol{\Xi}[k]$. Assuming $(\mathbf{K}_j^\top \mathbf{K}_j + \rho \mathbf{D}_j)$ is invertible, the per-column subproblem (21) admits the following closed-form solution per $j$

$$\boldsymbol{\alpha}_j[k+1] = \big( \mathbf{K}_j^\top \mathbf{K}_j + \rho \mathbf{D}_j \big)^{-1} \mathbf{q}_j[k]. \qquad (22)$$

On the other hand, (20b) can be solved per component vector $\boldsymbol{\gamma}_{ij}^\ell$, and a closed-form solution can be obtained via the so-termed block shrinkage operator for each $i$ and $j$, namely,

$$\boldsymbol{\gamma}_{ij}^\ell[k] = \mathcal{P}_{\lambda/\rho} \big( (\mathbf{K}_i^\ell)^{1/2} \boldsymbol{\alpha}_{ij}^\ell[k+1] + \boldsymbol{\xi}_{ij}^\ell[k] / \rho \big) \qquad (23)$$

where $\mathcal{P}_\lambda(\mathbf{z}) := (\mathbf{z} / \|\mathbf{z}\|_2) \max(\|\mathbf{z}\|_2 - \lambda, 0)$. Upon convergence, $\{a_{ij}^\ell\}$ can be determined by thresholding $\boldsymbol{\alpha}_{ij}^\ell$, and declaring an edge present from $i$ to $j$ if there exists any $\boldsymbol{\alpha}_{ij}^\ell \neq \mathbf{0}$, for $\ell = 1, \dots, L$.
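For concreteness, the block shrinkage operator in (23) and the final thresholding step admit the following compact sketch; the dictionary-based bookkeeping of coefficient blocks is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def block_shrinkage(z, lam):
    """P_lam(z) = (z / ||z||_2) * max(||z||_2 - lam, 0), cf. (23)."""
    nrm = np.linalg.norm(z)
    if nrm == 0.0:
        return np.zeros_like(z)
    return (z / nrm) * max(nrm - lam, 0.0)

def identify_edges(alpha_star, tau_alpha):
    """Declare an edge from i to j if any block norm ||alpha_ij^l|| >= tau_alpha.
    alpha_star maps (i, j) to the list of per-lag coefficient blocks."""
    return {(i, j): any(np.linalg.norm(a) >= tau_alpha for a in blocks)
            for (i, j), blocks in alpha_star.items()}
```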

REFERENCES

[1] D. Angelosante and G. B. Giannakis, "Sparse graphical modeling of piecewise-stationary time series," in Proc. Int. Conf. Acoust., Speech, Signal Process., Prague, Czech Republic, May 2011, pp. 1960–1963.


[2] B. Baingana, G. Mateos, and G. B. Giannakis, "Dynamic structural equation models for tracking topologies of social networks," in Proc. Workshop Comput. Advances Multi-Sensor Adaptive Process., Saint Martin, Dec. 2013, pp. 292–295.

[3] B. Baingana, G. Mateos, and G. B. Giannakis, "Proximal-gradient algorithms for tracking cascades over social networks," IEEE J. Sel. Topics Signal Process., vol. 8, no. 4, pp. 563–575, Aug. 2014.

[4] J. A. Bazerque and G. B. Giannakis, "Nonparametric basis pursuit via sparse kernel-based learning: A unifying view with advances in blind methods," IEEE Signal Process. Mag., vol. 30, no. 4, pp. 112–125, Jul. 2013.

[5] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 1999.

[6] H. Bozdogan, "Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions," Psychometrika, vol. 52, no. 3, pp. 345–370, Sep. 1987.

[7] C. Büchel and K. J. Friston, "Modulation of connectivity in visual pathways by attention: Cortical interactions evaluated with structural equation modelling and fMRI," Cerebral Cortex, vol. 7, no. 8, pp. 768–778, Dec. 1997.

[8] X. Cai, J. A. Bazerque, and G. B. Giannakis, "Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations," PLoS Comput. Biol., vol. 9, no. 5, May 2013, Art. no. e1003068.

[9] G. Chen et al., "Vector autoregression, structural equation modeling, and their synthesis in neuroimaging data analysis," Comput. Biol. Med., vol. 41, no. 12, pp. 1142–1155, Dec. 2011.

[10] S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in Proc. DARPA Broadcast News Transcription Understanding Workshop, Virginia, USA, Aug. 1998, pp. 127–132.

[11] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, Aug. 2004.

[12] G. Deshpande, X. Hu, R. Stilla, and K. Sathian, "Effective connectivity during haptic perception: A study using Granger causality analysis of functional magnetic resonance imaging data," NeuroImage, vol. 40, no. 4, pp. 1807–1814, May 2008.

[13] K. J. Friston, C. D. Frith, and R. S. Frackowiak, "Time-dependent changes in effective connectivity measured with PET," Human Brain Mapping, vol. 1, no. 1, pp. 69–79, Jan. 1993.

[14] K. J. Friston, "Functional and effective connectivity in neuroimaging: A synthesis," Human Brain Mapping, vol. 2, no. 1/2, pp. 56–78, Jan. 1994.

[15] K. J. Friston, L. Harrison, and W. Penny, "Dynamic causal modelling," NeuroImage, vol. 19, no. 4, pp. 1273–1302, Aug. 2003.

[16] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, "Topology identification and learning over graphs: Accounting for nonlinearities and dynamics," Proc. IEEE, vol. 106, no. 5, pp. 787–807, May 2018.

[17] D. R. Gitelman, W. D. Penny, J. Ashburner, and K. J. Friston, "Modeling regional and psychophysiologic interactions in fMRI: The importance of hemodynamic deconvolution," NeuroImage, vol. 19, no. 1, pp. 200–207, May 2003.

[18] R. Goebel, A. Roebroeck, D.-S. Kim, and E. Formisano, "Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping," Magn. Reson. Imag., vol. 21, no. 10, pp. 1251–1261, Dec. 2003.

[19] J. R. Harring, B. A. Weiss, and J.-C. Hsu, "A comparison of methods for estimating quadratic effects in nonlinear structural equation models," Psychological Methods, vol. 17, no. 2, pp. 193–214, Jun. 2012.

[20] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Berlin, Germany: Springer, 2009.

[21] S. Haufe, R. Tomioka, G. Nolte, K.-R. Müller, and M. Kawanabe, "Modeling sparse connectivity between underlying brain sources for EEG/MEG," IEEE Trans. Biomed. Eng., vol. 57, no. 8, pp. 1954–1963, May 2010.

[22] S. Itani, M. Ohannessian, K. Sachs, G. P. Nolan, and M. A. Dahleh, "Structure learning in causal cyclic networks," in Proc. Workshop Causality: Objectives Assessment, 2010, pp. 165–176.

[23] X. Jiang, S. Mahadevan, and A. Urbina, "Bayesian nonlinear structural equation modeling for hierarchical validation of dynamical systems," Mech. Syst. Signal Process., vol. 24, no. 4, pp. 957–975, Apr. 2010.

[24] K. G. Jöreskog, F. Yang, G. Marcoulides, and R. Schumacker, "Nonlinear structural equation models: The Kenny-Judd model with interaction effects," in Advanced Structural Equation Modeling: Issues and Techniques. London, U.K.: Psychology Press, Jan. 1996, pp. 57–88.

[25] D. Kaplan, Structural Equation Modeling: Foundations and Extensions. Newbury Park, CA, USA: Sage, 2009.

[26] A. Kelava, B. Nagengast, and H. Brandt, "A nonlinear structural equation mixture modeling approach for nonnormally distributed latent predictor variables," Structural Equ. Modeling: A Multidisciplinary J., vol. 21, no. 3, pp. 468–481, Jun. 2014.

[27] E. D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models. Berlin, Germany: Springer, 2009.

[28] M. A. Kramer, E. D. Kolaczyk, and H. E. Kirsch, "Emergent network topology at seizure onset in humans," Epilepsy Res., vol. 79, no. 2, pp. 173–186, May 2008.

[29] S.-Y. Lee and X.-Y. Song, "Model comparison of nonlinear structural equation models with fixed covariates," Psychometrika, vol. 68, no. 1, pp. 27–47, Mar. 2003.

[30] S.-Y. Lee, X.-Y. Song, and J. C. Lee, "Maximum likelihood estimation of nonlinear structural equation models with ignorable missing data," J. Educ. Behavioral Statist., vol. 28, no. 2, pp. 111–134, Jun. 2003.

[31] N. Lim, F. d'Alché-Buc, C. Auliac, and G. Michailidis, "Operator-valued kernel-based vector autoregressive models for network inference," Mach. Learn., vol. 99, no. 3, pp. 489–513, Jun. 2015.

[32] D. Marinazzo, M. Pellicoro, and S. Stramaglia, "Kernel-Granger causality and the analysis of dynamical networks," Physical Rev. E, vol. 77, no. 5, May 2008, Art. no. 056215.

[33] D. Marinazzo, M. Pellicoro, and S. Stramaglia, "Kernel method for nonlinear Granger causality," Phys. Rev. Lett., vol. 100, no. 14, Apr. 2008, Art. no. 144103.

[34] A. McIntosh and F. Gonzalez-Lima, "Structural equation modeling and its application to network analysis in functional brain imaging," Human Brain Mapping, vol. 2, no. 1/2, pp. 2–22, Oct. 1994.

[35] C. A. Micchelli and M. Pontil, "Learning the kernel function via regularization," J. Mach. Learn. Res., vol. 6, pp. 1099–1125, Jul. 2005.

[36] J. Peters, D. Janzing, and B. Schölkopf, "Causal inference on time series using restricted structural equation models," in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 154–162.

[37] R. Roche and S. Commins, Pioneering Studies in Cognitive Neuroscience. New York, NY, USA: McGraw-Hill Education, 2009.

[38] A. Roebroeck, E. Formisano, and R. Goebel, "Mapping directed influence over the brain using Granger causality and fMRI," NeuroImage, vol. 25, no. 1, pp. 230–242, Mar. 2005.

[39] M. Rubinov and O. Sporns, "Complex network measures of brain connectivity: Uses and interpretations," NeuroImage, vol. 52, no. 3, pp. 1059–1069, Sep. 2010.

[40] D. Sahoo, S. C. Hoi, and B. Li, "Online multiple kernel regression," in Proc. Intl. Conf. Knowl. Discov. Data Mining, Aug. 2014, pp. 293–302.

[41] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links—Part I: Distributed estimation of deterministic signals," IEEE Trans. Signal Process., vol. 56, pp. 350–364, Jan. 2008.

[42] Y. Shen, B. Baingana, and G. B. Giannakis, "Nonlinear structural equation models for network topology inference," in Proc. Conf. Info. Sci. Syst., Mar. 2016, pp. 163–168.

[43] Y. Shen, B. Baingana, and G. B. Giannakis, "Kernel-based structural equation models for topology identification of directed networks," IEEE Trans. Signal Process., vol. 65, no. 10, pp. 2503–2516, May 2017.

[44] Y. Shen, B. Baingana, and G. B. Giannakis, "Topology inference of directed graphs using nonlinear structural vector autoregressive models," in Proc. Intl. Conf. Acoust., Speech, Signal Process., 2017, pp. 6513–6517.

[45] Y. Shen, T. Chen, and G. B. Giannakis, "Random feature-based online multi-kernel learning in environments with unknown dynamics," J. Mach. Learn. Res., vol. 20, no. 22, pp. 1–36, 2019.

[46] Y. Shen and G. B. Giannakis, "Online identification of directional graph topologies capturing dynamic and nonlinear dependencies," in Proc. IEEE Data Science Workshop, 2018, pp. 195–199.

[47] X. Sun, "Assessing nonlinear Granger causality from multivariate time series," in Proc. Eur. Conf. Mach. Learn. Knowl. Discov. Databases, Sep. 2008, pp. 440–455.

[48] P. A. Valdés-Sosa et al., "Estimating brain functional connectivity with sparse multivariate autoregression," Philos. Trans. Roy. Soc. London B: Bio. Sci., vol. 360, no. 1457, pp. 969–981, May 2005.

[49] G. Varoquaux, A. Gramfort, J.-B. Poline, and B. Thirion, "Brain covariance selection: Better individual functional connectivity models using population prior," in Proc. Advances Neural Info. Process. Syst., 2010, pp. 2334–2342.

[50] G. Wahba, Spline Models for Observational Data (CBMS-NSF Regional Conference Series in Applied Mathematics). Philadelphia, PA, USA: SIAM, 1990.

[51] M. Wall and Y. Amemiya, "Estimation for polynomial structural equation models," J. Amer. Statist. Assoc., vol. 95, no. 451, pp. 929–940, Oct. 2000.


Yanning Shen (M'19) received the B.Sc. and M.Sc. degrees in electrical engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2011 and 2014, respectively, and the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, MN, USA, in 2019. She joined the Department of Electrical Engineering and Computer Science, University of California, Irvine, as an Assistant Professor in August 2019. Her research interests span the areas of machine learning, network science, data science, and statistical signal processing.

She received the UESTC Distinguished B.Sc. Thesis Award in 2011 and the Distinguished M.Sc. Thesis Award in 2014. She was a Best Student Paper Award finalist at the 2017 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing and at the 2017 Asilomar Conference on Signals, Systems, and Computers. She was selected as a Rising Star in EECS by Stanford University in 2017, and received the UMN Doctoral Dissertation Fellowship in 2018.

Georgios B. Giannakis (F'97) received the Diploma in electrical engineering from the National Technical University of Athens, Athens, Greece, in 1981, and the M.Sc. degree in electrical engineering, the M.Sc. degree in mathematics, and the Ph.D. degree in electrical engineering from the University of Southern California (USC), Los Angeles, CA, USA, in 1983, 1986, and 1986, respectively. From 1982 to 1986, he was with USC. He was with the University of Virginia from 1987 to 1998, and since 1999, he has been a Professor with the University of Minnesota, Minneapolis, MN, USA, where he holds an Endowed Chair in Wireless Telecommunications, a University of Minnesota McKnight Presidential Chair in ECE, and is the Director of the Digital Technology Center.

His general interests span the areas of communications, networking, and statistical signal processing, subjects on which he has authored or co-authored more than 450 journal papers, 750 conference papers, 25 book chapters, two edited books, and two research monographs (h-index 143). His research focuses on learning from big data, wireless cognitive radios, and network science with applications to social, brain, and power networks with renewables. He is the (co-)inventor of 33 issued patents, and the (co-)recipient of 8 best paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received Technical Achievement Awards from the SP Society (2000) and from EURASIP (2005), a Young Faculty Teaching Award, the G. W. Taylor Award for Distinguished Research from the University of Minnesota, and the IEEE Fourier Technical Field Award (2015). He is a Fellow of EURASIP, and has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SP Society.

Brian Baingana (M'16) received the B.Sc. degree in electrical engineering from Makerere University, Kampala, Uganda, in 2007, and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Minnesota, Minneapolis, MN, USA, in 2013 and 2016, respectively. His broader research interests include data and network science, and statistical learning, with applications in social networks, brain connectivity studies, and power grids. His recent interests have focused on developing real-time online algorithms for inference and learning over large-scale, time-varying networks.

Before his graduate studies, he worked with MTN Uganda, a telecommunications operator. He currently works as a Senior Data Scientist with NextEra Analytics. He received the prestigious university-wide Graduate School Fellowship for the academic years 2009–2012, and a paper award at the 2015 Global Conference on Signal and Information Processing.