2325-5870 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCNS.2016.2578044, IEEE Transactions on Control of Network Systems

Tracking Performance and Optimal Adaptation Step-Sizes of Diffusion-LMS Networks

Reza Abdolee, Member, IEEE, Vida Vakilian, Member, IEEE, Benoit Champagne, Senior Member, IEEE

Abstract—In this paper, we investigate the effect of adaptation step-sizes on the tracking performance of diffusion least-mean-squares (DLMS) algorithms in networks under non-stationary signal conditions. We assume that the network parameter vector being estimated varies over time according to a first-order random walk model. To find the optimal adaptation step-sizes over the network, we formulate a constrained nonlinear optimization problem and solve it iteratively through a log-barrier Newton algorithm. Our studies reveal that the optimal step-size of each node in the network depends not only on the statistics of the random walk and the energy profile of the node itself, but also on the energy and statistical profiles of its neighboring nodes. The results show that the optimal step-sizes can substantially improve the performance of DLMS algorithms in tracking time-varying parameters over networks. We also find that the DLMS algorithms have faster tracking ability and superior steady-state mean-square deviation (MSD) performance than DLMS in non-cooperative mode, since with the diffusion mode of cooperation, each node at each iteration can take a larger step toward the network optimal parameters.

Index Terms—Diffusion adaptation, sensor networks, LMS algorithms, time-varying parameters, distributed estimation.
I. INTRODUCTION

IN-NETWORK distributed adaptive signal processing is emerging as a key enabling technology for sensor networks to support the implementation of flexible cooperative learning and information processing schemes [1], [2]. This type of processing and learning mechanism, in which nearby nodes cooperate over wired or wireless links in the achievement of network-wide computational and control objectives without using a fusion center, will probably form one of the cornerstones of the future generation of data communications and control networks [3]–[9]. The potential applications of distributed adaptive processing are diverse, and they are currently being considered in several areas, including factory automation, robotics, intelligent transportation, health care monitoring, precision agriculture, smart spaces and telecommunications [10]–[15], [37].

There are several variant forms of distributed adaptive algorithms for parameter estimation over networks, including the two well-known types of techniques, i.e., consensus [16]–[28] and diffusion [29]–[33]. It was shown in [34] that for constant step-size adaptation, network states can grow unbounded due to an inherent asymmetry in the consensus dynamics. The same problem does not occur for diffusion strategies [34], and for this reason, we focus on these algorithms in this work.

R. Abdolee is with Qualcomm, Inc., San Diego, CA, 92121 (e-mail: [email protected]).
B. Champagne is with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC, H3A 0E9 Canada (e-mail: [email protected]).
V. Vakilian is with the Department of Computer and Electrical Engineering & Computer Science, California State University, CA, 93311 (e-mail: [email protected]).
The diffusion least-mean-squares (DLMS) algorithm is one of the efficient algorithms in the diffusion class, besides the diffusion recursive least-squares (DRLS) [35], [36], that can demonstrate excellent performance in the estimation of time-varying parameters over networks [34], [37]. This algorithm operates using a simple mode of cooperation, endows the network with adaptive and learning abilities, and can exhibit agile tracking performance [29], [30], [37].

The mean and mean-square convergence behavior of DLMS algorithms in stationary signal environments has been extensively investigated in many previous works, including [30], [37], [38]. However, only a few studies have examined the performance of this algorithm in non-stationary signal environments. The latest works on this topic are [34] and [39], where the authors compared the performance of DLMS with distributed consensus algorithms in time-varying environments. The results reported in these references confirmed that the DLMS algorithms outperform the consensus ones and exhibit superior tracking performance under such conditions. Nevertheless, these works only used arbitrary adaptation step-sizes that satisfy the mean and mean-square stability conditions of the algorithms. So far, however, the critically important effects of the step-size parameters on the convergence and tracking properties of the DLMS algorithms in non-stationary signal environments have not been thoroughly investigated. Furthermore, the link between the choice of optimal step-sizes and the dynamic model characterizing the time evolution of the underlying parameter vector subject to adaptive estimation has not been explored.

A. Contributions

In this paper, we study the tracking performance of DLMS algorithms in non-stationary signal environments, where the underlying parameter vector of interest evolves according to a first-order random walk model. To the best of our knowledge, such a study of the DLMS algorithm, along with the interpretation of the results from the perspective of control and networking engineering, has not been attempted before in the literature. In particular, our main contributions can be summarized as follows in terms of novelty and importance:

• Investigating the effects of adaptation step-sizes on the tracking performance of DLMS algorithms in networks with non-stationary signal conditions.


• Formulating a constrained nonlinear optimization problem to find a nearly optimal step-size parameter vector within the stability range of DLMS algorithms.

• Finding the effect of the network energy profile on the network's optimal adaptation step-size parameters.

• Implementing a log-barrier interior-point method and a gradient projection algorithm to find adaptation step-sizes in adaptive networks for time-varying model parameters.

To be more specific, we first show how an improper choice of adaptation step-sizes can substantially degrade the steady-state and tracking performance of DLMS algorithms. We then examine the mean-square error (MSE) performance of these algorithms with respect to changes in the adaptation step-sizes, and formulate an optimization problem to find their optimal values over the network, which enhances the steady-state MSE performance of the algorithms. Furthermore, we learn that, similar to a stand-alone LMS algorithm in a non-stationary environment, the steady-state performance of DLMS does not continue to improve as the adaptation step-sizes decrease, but instead starts deteriorating past a certain point. In addition, we find that, compared to LMS, DLMS algorithms demonstrate faster tracking ability and superior performance in steady-state mean-square deviation (MSD) and excess mean-square error (EMSE). According to our results, this is because with the diffusion mode of cooperation, each node at each iteration can take a larger step-size toward the network optimal point. Our analysis and simulation results also indicate that by using the optimal step-size at each node, we can substantially improve the steady-state MSE performance of the DLMS algorithms and enhance their tracking speed.

The paper is organized as follows. In Section II, we briefly review DLMS algorithms and explain their operation in the estimation of time-varying parameters over networks. In Section III, we characterize the performance of these algorithms in time-varying signal environments. In Section IV, we examine the tracking performance of DLMS algorithms with respect to the choice of step-size parameters. We present our numerical results in Section V, and conclude the paper in Section VI.

Notation: We use boldface letters for random variables and normal letters for deterministic variables. The symbol (·)^T denotes transposition for real vectors and matrices, and (·)* denotes conjugate transposition for complex vectors and matrices. We denote the trace operator by Tr(·), the spectral radius of a matrix by ρ(·), and the expectation by E[·]. We use diag{·} to extract the diagonal entries of a matrix, or to construct a diagonal matrix from a vector. The vec(·) operator vectorizes a matrix by stacking its columns on top of each other, and bvec(·) is the block-vectorization operator [29]. We denote Kronecker products by ⊗ and block Kronecker products by ⊗_b.

II. DIFFUSION LMS STRATEGIES

We consider a collection of N nodes distributed over a geographical area to estimate an unknown time-varying parameter vector w^o_i ∈ C^{M×1}, where i is the time index. Each node k ∈ {1, 2, · · · , N} at time instant i ∈ N collects the measurement sample d_k(i), which is related to the time-varying parameter vector through the following regression model:

d_k(i) = u_{k,i} w^o_i + v_k(i)   (1)

where u_{k,i} ∈ C^{1×M} represents the regression data and v_k(i) ∈ C represents the output measurement noise. We assume that w^o_i varies over time according to a first-order random walk model [40]:

w^o_i = w^o_{i-1} + q_i   (2)

where q_i ∈ C^{M×1} introduces a random perturbation to the parameter vector at time i. In this model, the random-walk sequence {q_i} is assumed to be zero-mean, independent and identically distributed (i.i.d.), with positive-definite covariance matrix Q = E[q_i q*_i]. Considering these properties of the random sequence {q_i} and relation (2), we observe that w^o_i has a constant mean, and hence E[w^o_i] = E[w^o_{i-1}]. We assume that the regression data u_{k,i} at each node k are zero-mean wide-sense stationary with covariance R_{u,k} = E[u*_{k,i} u_{k,i}]. For this model, however, the cross-covariance vector at each node k is time-varying, and we therefore denote it by r_{du,k,i} = E[u*_{k,i} d_k(i)]. For mathematical tractability, in our analysis we assume that {u_{k,i}} are i.i.d. over time and independent over space with positive-definite covariance matrices. The measurement noises {v_k(i)} are zero-mean random processes, i.i.d. over time and independent over space, with variances {σ²_{v,k}}. The noise {v_k(i)} is independent of the regressors {u_{m,j}} for all i, j and k, m. These are customary assumptions that have been normally used in the context of distributed adaptive filtering [29], [30], [34], [37], [38].
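To make the signal model concrete, the following sketch generates synthetic data according to (1)-(2). It assumes real-valued signals for simplicity (the paper allows complex data), and the dimensions and noise levels are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T = 4, 10, 1000          # parameter length, number of nodes, time steps (illustrative)
sigma_q, sigma_v = 0.05, 0.1   # random-walk and measurement noise levels (illustrative)

w_o = rng.standard_normal(M)              # initial parameter vector w^o_{-1}
U = rng.standard_normal((T, N, M))        # regressors u_{k,i}, i.i.d. Gaussian over time and space
D = np.empty((T, N))                      # measurements d_k(i)

for i in range(T):
    w_o = w_o + sigma_q * rng.standard_normal(M)   # first-order random walk, eq. (2)
    for k in range(N):
        # linear regression model with additive noise, eq. (1)
        D[i, k] = U[i, k] @ w_o + sigma_v * rng.standard_normal()
```

Each node thus observes a scalar stream d_k(i) driven by the common, drifting parameter vector.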

It can be verified that if the moments R_{u,k} and r_{du,k,i} are available at the fusion center, the minimum mean-square error (MMSE) estimate of the parameter vector through a centralized solution, at each time instant i, will be:

w^o_i = ( Σ_{k=1}^N R_{u,k} )^{-1} Σ_{k=1}^N r_{du,k,i}   (3)

If we assume that the parameter vector w^o_i varies slowly, the following centralized LMS algorithm can be used to estimate and track its value:

w_i = w_{i-1} + μ Σ_{k=1}^N u*_{k,i} ( d_k(i) − u_{k,i} w_{i-1} )   (4)

where μ > 0 is the step-size parameter.
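Recursion (4) can be sketched directly in code. This is a minimal real-valued implementation under the data model of Section II; the function name `centralized_lms` and the test parameters are illustrative assumptions.

```python
import numpy as np

def centralized_lms(D, U, mu):
    """Centralized LMS, eq. (4):
    w_i = w_{i-1} + mu * sum_k u_{k,i}^T (d_k(i) - u_{k,i} w_{i-1}).
    D: (T, N) measurements, U: (T, N, M) regressors, mu: scalar step-size.
    Returns the (T, M) trajectory of estimates w_i."""
    T, N, M = U.shape
    w = np.zeros(M)                      # zero initialization, w_{-1} = 0
    W = np.empty((T, M))
    for i in range(T):
        # fused innovation from all N nodes (requires a fusion center)
        innov = sum(U[i, k] * (D[i, k] - U[i, k] @ w) for k in range(N))
        w = w + mu * innov
        W[i] = w
    return W
```

On stationary data this converges toward the fixed parameter vector; tracking a random walk trades steady-state error against agility through μ, as quantified later in the paper.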

Motivated by the advantages of distributed algorithms, such as energy and communication bandwidth efficiency, the DLMS algorithms [30], [37], [38] can be used to estimate and track the parameter vector w^o_i over the network. The Adapt-then-Combine (ATC) variant of the DLMS algorithm takes a form similar to that used for the estimation of time-invariant parameters, i.e.:

ψ_{k,i} = w_{k,i-1} + μ_k Σ_{ℓ∈N_k} c_{ℓ,k} u*_{ℓ,i} ( d_ℓ(i) − u_{ℓ,i} w_{k,i-1} )   (5)

w_{k,i} = Σ_{ℓ∈N_k} a_{ℓ,k} ψ_{ℓ,i}   (6)

where μ_k > 0 is the local step-size, and the vectors ψ_{k,i} and w_{k,i}


are the intermediate estimates of w^o_i at node k, and N_k is the set of neighboring nodes (including node k itself) with which node k can share information. In (5), the coefficients c_{ℓ,k} are non-negative entries of the right-stochastic matrix C ∈ R^{N×N}, where c_{ℓ,k} = 0 if ℓ ∉ N_k, and Σ_{k=1}^N c_{ℓ,k} = 1. In (6), the a_{ℓ,k} are non-negative coefficients of a left-stochastic matrix A ∈ R^{N×N}, which satisfy the following conditions: a_{ℓ,k} = 0 if ℓ ∉ N_k, and Σ_{ℓ∈N_k} a_{ℓ,k} = 1. There are various methods to obtain the matrices C and A [30].
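The ATC recursion (5)-(6) can be sketched as follows for real-valued data. The helper name `atc_dlms` and the particular combination matrices used in the usage note (uniform-averaging A, identity C) are illustrative assumptions; they are just one valid choice satisfying the stochasticity constraints above.

```python
import numpy as np

def atc_dlms(D, U, mu, A, C):
    """ATC diffusion LMS, eqs. (5)-(6), for real-valued data.
    D: (T, N) measurements, U: (T, N, M) regressors, mu: (N,) local step-sizes.
    A: left-stochastic (columns sum to 1), C: right-stochastic (rows sum to 1);
    entry [l, k] weights information flowing from node l to node k.
    Returns the (N, M) estimates w_{k,T-1}."""
    T, N, M = U.shape
    W = np.zeros((N, M))                         # w_{k,i-1} for all nodes k
    for i in range(T):
        Psi = np.empty((N, M))
        for k in range(N):                       # adapt step, eq. (5)
            grad = sum(C[l, k] * U[i, l] * (D[i, l] - U[i, l] @ W[k])
                       for l in range(N) if C[l, k] > 0)
            Psi[k] = W[k] + mu[k] * grad
        W = A.T @ Psi                            # combine step, eq. (6)
    return W
```

The combine step `A.T @ Psi` computes w_{k,i} = Σ_ℓ a_{ℓ,k} ψ_{ℓ,i} for all nodes at once. Swapping the two steps gives the CTA variant discussed next.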

The Combine-then-Adapt (CTA) form of DLMS [30], [37], [38] can be obtained by reversing the order of the adapt and combine steps in (5) and (6), and can be written as:

ψ_{k,i-1} = Σ_{ℓ∈N_k} a_{ℓ,k} w_{ℓ,i-1}   (7)

w_{k,i} = ψ_{k,i-1} + μ_k Σ_{ℓ∈N_k} c_{ℓ,k} u*_{ℓ,i} ( d_ℓ(i) − u_{ℓ,i} ψ_{k,i-1} )   (8)

Detailed information about the operation and computational complexity of these algorithms can be found in previous studies, including [30], [34], [37], [38].

III. NETWORK PERFORMANCE CHARACTERIZATION

In this section, we derive expressions to characterize the network MSD and EMSE of the DLMS algorithms when the model parameter vector is rapidly time-varying. We apply the same analysis strategy as used in [30], [38]. The new expressions, however, differ from those in previous works in that they can characterize the network MSE for relatively large step-size values. As will be explained in the sequel, the use of large step-sizes in DLMS algorithms is sometimes necessary in order to maintain fast tracking ability should the parameter vector of interest change quickly.

The general form of the DLMS algorithms for a time-varying parameter can be written as:

φ_{k,i-1} = Σ_{ℓ∈N_k} a^{(1)}_{ℓ,k} w_{ℓ,i-1}   (9)

ψ_{k,i} = φ_{k,i-1} + μ_k Σ_{ℓ∈N_k} c_{ℓ,k} u*_{ℓ,i} ( d_ℓ(i) − u_{ℓ,i} φ_{k,i-1} )   (10)

w_{k,i} = Σ_{ℓ∈N_k} a^{(2)}_{ℓ,k} ψ_{ℓ,i}   (11)

where a^{(1)}_{ℓ,k} and a^{(2)}_{ℓ,k} are the (ℓ, k) elements of the left-stochastic combination matrices A_1 and A_2, respectively. We use the general form of DLMS since it includes the CTA and ATC algorithms as special cases. In particular, once the analytical expressions are obtained for general A_1 and A_2, setting A_1 = I and A_2 = A yields the analysis results for ATC, while A_1 = A and A_2 = I gives the results for the CTA algorithm. In addition, setting A_1 = A_2 = C = I generates the analysis results for DLMS networks in non-cooperative mode.

We begin by introducing the local error vectors at node k, i.e., w̃_{k,i} ≜ w^o_i − w_{k,i}, ψ̃_{k,i} ≜ w^o_i − ψ_{k,i}, and φ̃_{k,i} ≜ w^o_i − φ_{k,i}. These are used to define the global error vectors of the network, i.e., φ̃_i ≜ col{φ̃_{1,i}, · · · , φ̃_{N,i}}, ψ̃_i ≜ col{ψ̃_{1,i}, · · · , ψ̃_{N,i}}, and w̃_i ≜ col{w̃_{1,i}, · · · , w̃_{N,i}}. We further define the extended weighting matrices 𝒜_2 ≜ A_2 ⊗ I_M, 𝒜_1 ≜ A_1 ⊗ I_M, and 𝒞 ≜ C ⊗ I_M. Next, we introduce the following matrices and vectors:

ℛ_i ≜ Σ_{ℓ=1}^N diag{ c_{ℓ,1} u*_{ℓ,i} u_{ℓ,i}, · · · , c_{ℓ,N} u*_{ℓ,i} u_{ℓ,i} }   (12)

ℳ ≜ diag{ μ_1 I_M, · · · , μ_N I_M }   (13)

g_i ≜ 𝒞^T col{ u*_{1,i} v_1(i), · · · , u*_{N,i} v_N(i) }   (14)

p_i ≜ 1_N ⊗ q_i   (15)

where 1_N is a column vector of size N with unit entries. Using these definitions and subtracting w^o_i from both sides of (9)-(11), we then arrive at:

w̃_i = 𝒜^T_2 ( I − ℳ ℛ_i ) 𝒜^T_1 w̃_{i-1} − 𝒜^T_2 ℳ g_i + 𝒜^T_2 ( I − ℳ ℛ_i ) 𝒜^T_1 p_i   (16)

which shows the evolution of the network error vector over time. We note that the instantaneous network error recursion (16) has one additional error term compared with the standard DLMS recursion reported in [30], [38], i.e., e_i = 𝒜^T_2 ( I − ℳ ℛ_i ) 𝒜^T_1 p_i. Under the previous assumptions that q_i and v_k(i) are zero-mean and independent of u_{k,i} for all k, we obtain E[p_i] = E[e_i] = E[g_i] = 0. Considering these equalities and taking the expectation of the global error vector w̃_i in (16), we arrive at:

E[w̃_i] = 𝒜^T_2 ( I − ℳ ℛ ) 𝒜^T_1 E[w̃_{i-1}]   (17)

where ℛ ≜ E[ℛ_i] = Σ_{ℓ=1}^N diag{ c_{ℓ,1} R_{u,ℓ}, . . . , c_{ℓ,N} R_{u,ℓ} }.

Expression (17) is similar to the mean-error recursion of the standard DLMS algorithm. Therefore, the mean-stability condition of the standard DLMS algorithm, as developed in [30], [38], remains valid under the non-stationary conditions considered in this work. Hence, for the algorithm to be stable in the mean, the step-size at each node k must satisfy:

0 < μ_k < 2 / ρ( Σ_ℓ c_{ℓ,k} R_{u,ℓ} ),   k ∈ {1, . . . , N}   (18)

We will use (18) to find the optimal step-size parameters of the network in the following section.
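Condition (18) is straightforward to evaluate numerically. The sketch below, with the hypothetical helper `max_stable_stepsizes`, computes the per-node upper bounds given the regressor covariances R_{u,ℓ} and a right-stochastic C (real-valued data assumed):

```python
import numpy as np

def max_stable_stepsizes(R_list, C):
    """Per-node upper bounds from the mean-stability condition (18):
    0 < mu_k < 2 / rho(sum_l c_{l,k} R_{u,l}).
    R_list: list of N (M x M) covariances R_{u,l}; C: (N, N) right-stochastic."""
    N = C.shape[0]
    bounds = np.empty(N)
    for k in range(N):
        # neighborhood-weighted covariance seen by node k
        Rk = sum(C[l, k] * R_list[l] for l in range(N))
        bounds[k] = 2.0 / np.max(np.abs(np.linalg.eigvals(Rk)))  # 2 / spectral radius
    return bounds
```

For example, with C = I and R_{u,ℓ} = 2I the bound reduces to 2/ρ(2I) = 1 at every node, matching (18) in the non-cooperative case.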

A. Steady-State Mean-Square Performance

To characterize the mean-square performance of the DLMS algorithms in non-stationary signal environments, we use (16) to form the variance relation by computing the weighted squared norm of (16) and taking expectations on both sides:

E‖w̃_i‖²_Σ = E‖w̃_{i-1}‖²_{Σ′} + E‖p_i‖²_{Σ′} + E[ g*_i ℳ^T 𝒜_2 Σ 𝒜^T_2 ℳ g_i ]   (19)

where Σ > 0 is a weighting matrix of compatible dimensions that we are free to choose, Σ′ = 𝒜_1 ( I − ℳ ℛ_i )* 𝒜_2 Σ 𝒜^T_2 ( I − ℳ ℛ_i ) 𝒜^T_1, E‖p_i‖²_{Σ̄′} = E[ p*_i Σ̄′ p_i ] = Tr(Σ̄′ 𝒫), Σ̄′ = E[Σ′], and 𝒫 = E[ p_i p*_i ]. We obtain (19) by using the fact that w̃_{i-1} is independent of g_i and p_i, and that E[g_i] = E[p_i] = 0 according to the network data statistics presented in Section II. Relation (19) can be written


as:

E‖w̃_i‖²_Σ = E‖w̃_{i-1}‖²_{Σ′} + Tr( Σ 𝒜^T_2 ℳ 𝒢 ℳ 𝒜_2 ) + Tr( Σ̄′ 𝒫 )   (20)

where 𝒢 ≜ E[ g_i g*_i ], which can be written as 𝒢 = 𝒞^T diag{ σ²_{v,1} R_{u,1}, . . . , σ²_{v,N} R_{u,N} } 𝒞. Introducing σ ≜ bvec(Σ) and σ′ ≜ bvec(Σ̄′), the variance relation (20) becomes:

E‖w̃_i‖²_σ = E‖w̃_{i-1}‖²_{σ′} + [ bvec( 𝒜^T_2 ℳ 𝒢 ℳ 𝒜_2 ) ]^T σ + [ bvec( 𝒫^T ) ]^T σ′   (21)

where σ′ = ℱσ, and ℱ is computed below for two cases: a small step-size approximation, and an exact closed-form evaluation under a Gaussian distribution assumption. For the first case, using the property of conditional expectation together with the independence of w̃_{i-1} and ℛ_i, we can expand the products in Σ′ to obtain:

E[Σ′] = 𝒜_1 𝒜_2 Σ 𝒜^T_2 𝒜^T_1 − 𝒜_1 ℛ ℳ 𝒜_2 Σ 𝒜^T_2 𝒜^T_1 − 𝒜_1 𝒜_2 Σ 𝒜^T_2 ℳ ℛ 𝒜^T_1 + O(ℳ²)   (22)

where

O(ℳ²) = E[ 𝒜_1 ℛ_i ℳ 𝒜_2 Σ 𝒜^T_2 ℳ ℛ_i 𝒜^T_1 ]   (23)

For small step-sizes this term can be neglected, since it depends on {μ²_k}. Considering this and using σ′ = ℱσ, we obtain:

ℱ ≈ ( 𝒜_1 ⊗_b 𝒜_1 )( I − I ⊗_b ℛℳ − ℛ^T ℳ ⊗_b I )( 𝒜_2 ⊗_b 𝒜_2 )   (24)

As will be demonstrated below, however, this approximation causes large errors if relatively large step-sizes are chosen. In what follows, to circumvent this limitation, we derive an exact closed-form expression for ℱ that is required to characterize the network behavior in the estimation and tracking of rapidly varying parameter vectors. To make the derivation simpler, we assume that the distribution of the regression data u_{k,i} is Gaussian.
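The trace and block-vectorization identities used in passing from (20) to (21), i.e., Tr(ΣX) = [bvec(X^T)]^T bvec(Σ), can be checked numerically. This sketch assumes the common blockwise, column-major definition of bvec (stack vec of each M×M block, blocks ordered down each block column); since bvec is then a fixed permutation of vec, the identity follows. The helper name `bvec` is illustrative.

```python
import numpy as np

def bvec(A, M):
    """Block-vectorize the (N*M x N*M) matrix A with M x M blocks:
    vec each block A_{ij}, stacking blocks column-major (i fastest, then j)."""
    N = A.shape[0] // M
    parts = []
    for j in range(N):          # block column
        for i in range(N):      # block row
            parts.append(A[i*M:(i+1)*M, j*M:(j+1)*M].reshape(-1, order="F"))
    return np.concatenate(parts)
```

Because bvec permutes the entries of vec(·) identically for every matrix, inner products of block-vectorizations reproduce traces: [bvec(B^T)]^T bvec(A) = Tr(AB), which is exactly how the weighted-norm terms in (20) turn into the inner products of (21).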

Remark 1. In the estimation of time-invariant parameters in adaptive networks, it has been shown that the MSD decreases as the adaptation step-size parameters become smaller [30], [38]; choosing small step-sizes then only leads to a lower convergence speed. The situation is, however, different if the parameter vector of interest varies over time. In this case, the step-sizes cannot be chosen arbitrarily small. This can be explained using the adaptation step (10) of the general DLMS algorithm. If at each node k the step-size μ_k is chosen to be very small, then the innovation term

∆φ_{k,i} = μ_k Σ_{ℓ∈N_k} c_{ℓ,k} u*_{ℓ,i} ( d_ℓ(i) − u_{ℓ,i} φ_{k,i-1} )   (25)

which is used to update φ_{k,i} in (10), also becomes very small. In turn, this adversely affects the tracking performance of the algorithm when the perturbation term q_i in (2) is relatively large. This is because if the change from w^o_{i-1} to w^o_i is large, due to a large random-walk increment q_i, the DLMS algorithms need to add a proportionally large ∆φ_{k,i} to track the change. However, when the step-sizes μ_k are chosen very small, the innovation terms ∆φ_{k,i} become small and the algorithms fail to track the changes. Therefore, in the estimation of time-varying parameters in adaptive networks, the step-sizes of DLMS algorithms cannot be chosen so small that the term O(ℳ²) in (22) can be neglected. This motivates the derivation of an exact closed-form expression for ℱ to properly characterize the MSE of DLMS algorithms in tracking time-varying parameters.

To find an exact closed-form expression for ℱ, we need to compute the term given in (23). As shown in Appendix A, if we assume that the regressors u_{k,i} are zero-mean circular complex Gaussian random vectors, ℱ will be:

ℱ = ( 𝒜_1 ⊗_b 𝒜_1 )( I − I ⊗_b ℛℳ − ℛ^T ℳ ⊗_b I )( 𝒜_2 ⊗_b 𝒜_2 ) + ∆ℱ   (26)

where ∆ℱ is given in (27). In this expression, β = 2 for real-valued data, while β = 1 for complex-valued data. Now the steady-state MSD and EMSE expressions of the network can be obtained from (21) as follows. By definition, the MSD at node k is η(k) = lim_{i→∞} E‖w̃_{k,i}‖², which, alternatively, can be retrieved from the global error vector w̃_i as

η(k) = lim_{i→∞} E‖w̃_i‖²_{diag(e_k)⊗I_M}   (28)

where e_k ∈ Z^{N×1} is a column vector with a one at position k and zeros elsewhere. We use (21) to write:

lim_{i→∞} E‖w̃_i‖²_{(I−ℱ)σ} = ( [ bvec( 𝒜^T_2 ℳ 𝒢^T ℳ 𝒜_2 ) ]^T + [ bvec( 𝒫^T ) ]^T ℱ ) σ   (29)

Using (28) and (29), we obtain:

η(k) = ( bvec( 𝒜^T_2 ℳ 𝒢^T ℳ 𝒜_2 )^T + bvec( 𝒫^T )^T ℱ ) σ_{msd,k}   (30)

where σ_{msd,k} = ( I − ℱ )^{-1} bvec( diag(e_k) ⊗ I_M ). Note that (I − ℱ) is nonsingular since ρ(ℱ) < 1. In the same way, we can compute the EMSE for each node k, starting from its

∆ℱ = ( 𝒜_1 ⊗_b 𝒜_1 ) { ( ℛ^T ⊗_b ℛ ) + Σ_{m=1}^N [ diag{ vec( C_m 1_{N×N} C_m ) } ] ⊗ [ (β − 1)( R^T_{u,m} ⊗ R_{u,m} ) + r_m r*_m ] } ( ℳ ⊗_b ℳ )( 𝒜_2 ⊗_b 𝒜_2 )   (27)


definition, i.e., ζ(k) = lim_{i→∞} E‖w̃_i‖²_{diag(e_k)⊗R_{u,k}}, which finally leads us to

ζ(k) = ( bvec( 𝒜^T_2 ℳ 𝒢^T ℳ 𝒜_2 )^T + bvec( 𝒫^T )^T ℱ ) σ_{emse,k}   (31)

with σ_{emse,k} = ( I − ℱ )^{-1} bvec( diag(e_k) ⊗ R_{u,k} ).

B. Mean-Square Transient Analysis

Starting from (21), we obtain the following expression to characterize the mean-square behavior of the algorithm in the transient state:

E‖w̃_i‖²_σ = E‖w̃_{i-1}‖²_σ + ‖w^o_{-1}‖²_{ℱ^i(I−ℱ)σ} + s^T ℱ^i σ   (32)

where s = [ bvec( 𝒜^T_2 ℳ 𝒢^T ℳ 𝒜_2 ) ] + ℱ^T [ bvec( 𝒫^T ) ]. By respectively replacing σ with bvec( diag(e_k) ⊗ I_M ) and bvec( diag(e_k) ⊗ R_{u,k} ) in (32), we then arrive at two expressions for the evolution of the MSD and EMSE at each node k. Note that to obtain these expressions, we consider zero initialization at each node over the network, i.e., w_{k,-1} = 0 for all k. The steady-state and transient network MSD are, respectively, defined as the average of the steady-state and transient MSD of the nodes; the same holds for the network steady-state and transient EMSE. These results will be used in Section V to demonstrate the transient MSD and EMSE performance of diffusion-LMS networks.

IV. TRACKING PERFORMANCE AND OPTIMAL STEP-SIZES

In this section, we examine the tracking performance of DLMS algorithms in non-stationary signal environments for both cooperative and non-cooperative networks. We also investigate how an optimal choice of the step-size parameter vector can enhance the tracking ability of these algorithms and minimize their steady-state MSD. In particular, we obtain the optimal step-sizes of the DLMS algorithms by defining a constrained nonlinear optimization problem and solving it through a log-barrier Newton algorithm.

A. Optimal Step-Size in a Non-cooperative Network

The MSD and EMSE of each node k in a non-cooperative network can be obtained from (30) and (31) by setting

A_1 = A_2 = C = I_N   (33)

For this scenario, these expressions can be rewritten more explicitly as:

η(k) = ( μ²_k σ²_{v,k} r*_k + b* F_k ) ( I − F_k )^{-1} m_k   (34)

ζ(k) = ( μ²_k σ²_{v,k} r*_k + b* F_k ) ( I − F_k )^{-1} r_k   (35)

where b = vec(Q), r_k = vec(R_{u,k}), m_k = vec(I_M), and

F_k = I − 2μ_k R^T_{u,k} ⊗ I_M + μ²_k [ β( R^T_{u,k} ⊗ R_{u,k} ) + r_k r*_k ]   (36)
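For real-valued data (β = 2), expressions (34) and (36) can be evaluated directly. The sketch below (hypothetical helper `node_msd`, with illustrative second-order statistics that are not the paper's simulation values) also sweeps the step-size to expose the interior minimizer discussed in this section, returning infinity for step-sizes outside the mean-square stability range:

```python
import numpy as np

def node_msd(mu, Ru, sigma_v2, Q, beta=2):
    """Steady-state MSD of a stand-alone node via eqs. (34) and (36), real data.
    b = vec(Q), r_k = vec(R_u,k), m_k = vec(I_M). Returns inf if mu is
    mean-square unstable (spectral radius of F_k >= 1)."""
    M = Ru.shape[0]
    r = Ru.reshape(-1, order="F")             # vec(R_u,k), column-major
    b = Q.reshape(-1, order="F")              # vec(Q)
    m = np.eye(M).reshape(-1, order="F")      # vec(I_M)
    I = np.eye(M * M)
    Fk = (I - 2 * mu * np.kron(Ru.T, np.eye(M))
          + mu**2 * (beta * np.kron(Ru.T, Ru) + np.outer(r, r)))   # eq. (36)
    if np.max(np.abs(np.linalg.eigvals(Fk))) >= 1:
        return np.inf
    return (mu**2 * sigma_v2 * r + b @ Fk) @ np.linalg.solve(I - Fk, m)  # eq. (34)

# Illustrative sweep: with a time-varying parameter (Q > 0) the MSD is
# minimized at an interior step-size, not as mu -> 0.
Ru, sv2, Q = np.eye(2), 1.0, 1e-4 * np.eye(2)
grid = np.geomspace(1e-3, 0.5, 300)
msd = np.array([node_msd(mu, Ru, sv2, Q) for mu in grid])
mu_opt = grid[np.argmin(msd)]
```

In the scalar small-μ regime this expression behaves like μσ²_v/2 + σ²_q/(2μ), the classic LMS tracking trade-off, which is why the minimum sits strictly inside the stability interval.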

We call the signal d_k(i) slowly time-varying if σ²_q ≪ σ²_{v,k}, and otherwise refer to it as rapidly time-varying. We now proceed to explain why the exact closed-form expression (26) needs to be taken into account in finding the

Fig. 1: Steady-state MSD (dB) of node k versus the step-size μ_k in a slowly time-varying environment where σ²_q = 0.0025 σ²_{v,k}. Curves: theory (exact), theory (small step-size), and simulation, with the optimal step-size marked.

Fig. 2: Steady-state MSD (dB) of node k versus the step-size μ_k in rapidly time-varying environments where σ²_q = 0.25 σ²_{v,k}. Curves: theory (exact), theory (small step-size), and simulation, with the optimal step-size marked. The minimum point of the solid curve (theory with approximation) does not match that of the cross-marked curve (simulation result).

optimal value of the step-sizes over networks with rapidly time-varying signals. For simplicity, we only present the MSD performance results for a single node over the network. To perform these numerical experiments, we use the DLMS algorithm in the non-cooperative mode (i.e., A_1 = A_2 = C = I). The regression data and the measurement noise are circular complex-valued Gaussian. Figs. 1 and 2 show the steady-state MSD of node k after 6000 iterations against the step-size µ_k. For the small step-size approximation, we have

F_k ≈ I − 2µ_k (R^T_{u,k} ⊗ I_M)    (37)

while the exact F_k is given by (36). As shown in Fig. 1, in a slowly time-varying signal environment where Tr(Q) = Mσ²_q = 0.005 σ²_{v,k}, the optimal values of the step-size that give the minimum MSD at node k are identical for the exact value and the small step-size approximation of F_k. The same holds for the optimal choice of step-size in the steady-state EMSE curve, which is not shown here. In contrast, as shown in Fig. 2, in a rapidly time-varying signal environment where Tr(Q) = Mσ²_q = 0.5 σ²_{v,k}, the optimal values of the step-sizes obtained from the small step-size approximation are far from the correct values. Therefore, to compute the optimal step-size parameters µ_k in rapidly


varying signal environments, the terms depending on µ_k² in F_k cannot be ignored.

As seen from (34) and (35), in fast non-stationary signal environments, the random walk q_i and ∆F depend on µ_k² in F_k, which makes the MSD and EMSE of each node k nonlinear functions of the step-sizes µ_k¹. Therefore, unlike in the stationary signal environment, the MSD and EMSE performance does not improve as the step-sizes approach zero. In fact, in the non-stationary scenario, there exists an optimal step-size in the stability range of the algorithm for which the MSD and EMSE are minimized. The optimal step-size that minimizes the MSD of each node can be found by solving the following optimization problem:

µ_k^{o,msd} = arg min_{µ_k} η(k, µ)

s.t. 0 < µ_k < 2 / Tr( Σ_{ℓ∈N_k} c_{ℓ,k} R_{u,ℓ} )    (38)

The constraint in (38) is obtained from the DLMS stability range (18). For slowly time-varying signals, a closed-form solution can be obtained by using the small step-size approximation in the evaluation of F_k. To this end, we use (34) and (35) to obtain:

η(k) = µ_k σ²_{v,k} r_k* (R^T_{u,k} ⊗ R_{u,k})^{−1} m_k / 2 + b* F_k (R^T_{u,k} ⊗ R_{u,k})^{−1} m_k / (2µ_k)    (39)

ζ(k) = µ_k σ²_{v,k} r_k* (R^T_{u,k} ⊗ R_{u,k})^{−1} r_k / 2 + b* F_k (R^T_{u,k} ⊗ R_{u,k})^{−1} r_k / (2µ_k)    (40)

It can be verified that (39) and (40) are convex with respect to µ_k ∈ {µ_1, . . . , µ_N}². Therefore, computing the derivatives of the MSD and EMSE with respect to µ_k and setting them to zero yields, respectively:

µ_k^{o,msd} = sqrt( b* (R^T_{u,k} ⊗ I_M)^{−1} m_k / ( σ²_{v,k} r_k* (R^T_{u,k} ⊗ I_M)^{−1} m_k ) )    (41)

µ_k^{o,emse} = sqrt( b* (R^T_{u,k} ⊗ I_M)^{−1} r_k / ( σ²_{v,k} r_k* (R^T_{u,k} ⊗ I_M)^{−1} r_k ) )    (42)

These expressions give the optimal step-size values that lead to the minimum steady-state MSD and EMSE at a given node k in slowly-varying signal environments in non-cooperative mode. These results are in agreement with the optimal step-size parameter obtained for the stand-alone LMS filter in [41]. In rapidly varying signal environments, these expressions are invalid, as explained above. Alternatively, we propose to use a constrained iterative optimization method, specifically the log-barrier Newton method [42], to obtain the locally optimal step-sizes over the network.
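The closed-form expression (41) is straightforward to compute. A minimal sketch (the function name is ours; inputs are the quantities defined below (35)):

```python
import numpy as np

def mu_opt_msd(Ru, Q, sigma_v2):
    """Closed-form optimal MSD step-size (41) for a stand-alone node,
    valid in the small step-size regime (slowly time-varying signals)."""
    M = Ru.shape[0]
    b = Q.reshape(-1, 1, order='F')            # b = vec(Q)
    r = Ru.reshape(-1, 1, order='F')           # r = vec(R_u,k)
    m = np.eye(M).reshape(-1, 1, order='F')    # m = vec(I_M)
    A = np.kron(Ru.T, np.eye(M))               # R^T_{u,k} (x) I_M
    Am = np.linalg.solve(A, m)
    num = (b.conj().T @ Am).item().real
    den = sigma_v2 * (r.conj().T @ Am).item().real
    return np.sqrt(num / den)
```

For white regressors (R_{u,k} = I) and Q = σ²_q I, this reduces to sqrt(σ²_q / σ²_{v,k}): the faster the random walk relative to the noise floor, the larger the optimal step-size.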

B. Optimal Step-Sizes of DLMS Algorithms

We now propose an algorithm to obtain the optimal step-sizes of DLMS algorithms that leads to lower estimation error

¹It can be verified that if we ignore µ_k² in F_k, the MSD and EMSE will be linear functions of µ_k whose values monotonically decrease as µ_k becomes smaller.

²In general, η(k) and ζ(k) may not be convex for all network topologies and signal conditions.

[Figure: steady-state MSD (dB) surface versus µ_1 and µ_2.]

Fig. 3: Steady-state MSD performance of a two-node network as a function of µ_1 and µ_2.

and faster tracking ability in the network. We begin this part by illustrating the MSD performance of a simple network with N = 2 against variation of the step-size parameters (see Fig. 3). As shown in this figure, there exists an optimal step-size parameter vector at which the DLMS algorithm attains its minimum MSD. Similar to the non-cooperative LMS algorithm, in non-stationary signal environments, the steady-state MSD and EMSE of DLMS algorithms increase as the step-sizes move away from their optimal points. A similar behavior is seen in networks with multiple nodes. We can find the optimal adaptation step-size vector of the network by solving the following optimization problem:

(µ^o_1, · · · , µ^o_N) = arg min_{(µ_1,··· ,µ_N)} { (γ + αF) (1/N) Σ_{k=1}^N σ_k^{msd} }

s.t. 0 < µ_k < 2 / Tr( Σ_{ℓ∈N_k} c_{ℓ,k} R_{u,ℓ} )    (43)

where k ∈ {1, 2, . . . , N}, α = bvec(P^T)^T, and γ = bvec(A_2^T M G^T M A_2)^T. Note that the constraint in (43) guarantees that the obtained solution will satisfy the stability condition (18). A similar optimization problem can be formed for minimization of the EMSE over the network. The nonlinear constrained optimization problem (43) can be converted to an unconstrained optimization problem by adding a log-barrier function to the objective, i.e.,

(µ^o_1, · · · , µ^o_N) = arg min_{(µ_1,··· ,µ_N)} L(µ)    (44)

where

L(µ) = (γ + αF) (1/N) Σ_{k=1}^N σ_k^{msd} − (1/t) Σ_{k=1}^N [ log(µ_k) + log( 2 / Tr( Σ_{ℓ∈N_k} c_{ℓ,k} R_{u,ℓ} ) − µ_k ) ]    (45)

and t > 0 controls the accuracy of the solution. The solution of the log-barrier method, denoted by µ, is suboptimal and lies away from the optimal point by N/t. Therefore, as t → ∞, it


approaches the solution¹ of the original problem (43), denoted by µ^o = (µ^o_1, . . . , µ^o_N). Here, we introduce the threshold ε as the desired stopping criterion of the following interior-point optimization algorithm. The nominal values for ε and t are 10^{−7} and 0.01, respectively. In this approach, summarized under Algorithm 1, f(µ, t) can be any iterative gradient algorithm, such as steepest descent or the Newton algorithm. In our implementation of Algorithm 1, we employ

Algorithm 1: Step-size optimization

while N/t > ε
    µ_i = f(µ_{i−1}, t)    (update the estimated step-size)
    t = 2t    (increase t)

the Newton algorithm, as it offers a fast convergence speed. Note that at each iteration i the solution is away from the optimal point by N/t. We double the value of t at each iteration and use the previous solution of the Newton algorithm to eventually approach the optimal step-size parameters. In this implementation, for each i, several Newton iterations, indexed by j, are performed before doubling the value of t. The criterion to exit the Newton iterations will be given at the end of this section. The iterative Newton algorithm can be represented as:

µ_j = µ_{j−1} − α (∇²_µ L(µ_{j−1}))^{−1} ∇_µ L(µ_{j−1})    (46)

where α > 0 is the Newton step-size, and ∇_µL(µ_{j−1}) and ∇²_µL(µ_{j−1}) are, respectively, the gradient vector and Hessian matrix of the augmented MSD function L(µ) with respect to µ:

∇_µL(µ) ≜ [ ∂L(µ)/∂µ_1, ∂L(µ)/∂µ_2, · · · , ∂L(µ)/∂µ_N ]^T    (47)

∇²_µL(µ) ≜

    [ ∂²L(µ)/∂µ_1²      ∂²L(µ)/∂µ_1∂µ_2   · · ·   ∂²L(µ)/∂µ_1∂µ_N ]
    [ ∂²L(µ)/∂µ_2∂µ_1   ∂²L(µ)/∂µ_2²      · · ·   ∂²L(µ)/∂µ_2∂µ_N ]
    [       ...                ...         · · ·          ...      ]
    [ ∂²L(µ)/∂µ_N∂µ_1   ∂²L(µ)/∂µ_N∂µ_2   · · ·   ∂²L(µ)/∂µ_N²    ]    (48)

To improve the convergence speed of the Newton algorithm, a line search algorithm such as the Armijo rule [43] can be implemented to compute the value of α at each iteration. Nevertheless, implementing a line search algorithm is more useful for algorithms with slow convergence speed, such as gradient descent. Introducing l_k ≜ 2 / Tr( Σ_{ℓ∈N_k} c_{ℓ,k} R_{u,ℓ} ), the entries of the first-order gradient are computed as:

∂L(µ)/∂µ_n = (1/N) { ϱ ∂F(µ)/∂µ_n + ∂γ(µ)/∂µ_n } Σ_{k=1}^N σ_k^{msd} − (1/t) ( 1/µ_n − 1/(l_n − µ_n) )    (49)

¹The optimization problem (43) is not convex in general. The term "solution" is used to refer to one of the local stationary points of this function within the stability range of DLMS that leads to a lower MSD.

where ϱ = α + (αF + γ)(I − F)^{−1} and the partial derivatives are:

∂F(µ)/∂µ_n = −(A_1 ⊗_b A_1) [ (R^T ∂M(µ)/∂µ_n) ⊗_b (I − RM) + (I − R^T M) ⊗_b (R ∂M(µ)/∂µ_n) ] (A ⊗_b A)    (50)

∂γ(µ)/∂µ_n = [ bvec( A_2^T (∂M(µ)/∂µ_n) G^T M A_2 ) + bvec( A_2^T M G^T (∂M(µ)/∂µ_n) A_2 ) ]^T    (51)

∂M(µ)/∂µ_n = diag{0, · · · , I_M, · · · , 0}    (52)

The elements of the Hessian matrix are computed as:

∂²L(µ)/∂µ_m∂µ_n = { (∂ϱ(µ)/∂µ_m)(∂F(µ)/∂µ_n) + ϱ ∂²F(µ)/∂µ_m∂µ_n + ∂²γ(µ)/∂µ_m∂µ_n − ( ϱ ∂F(µ)/∂µ_n + ∂γ(µ)/∂µ_n ) (I − F)^{−1} ∂F(µ)/∂µ_m } (1/N) Σ_{k=1}^N σ_k^{msd} + δ_{nm} (1/t) ( 1/µ_n² + 1/(l_n − µ_n)² )    (53)

where

∂ϱ(µ)/∂µ_m = ( α ∂F(µ)/∂µ_m + ∂γ(µ)/∂µ_m ) (I − F)^{−1} − (αF + γ)(I − F)^{−1} (∂F(µ)/∂µ_m) (I − F)^{−1}    (54)

∂²F(µ)/∂µ_m∂µ_n = (A_1 ⊗_b A_1) [ (R^T ∂M(µ)/∂µ_n) ⊗_b (R ∂M(µ)/∂µ_m) + (R^T ∂M(µ)/∂µ_m) ⊗_b (R ∂M(µ)/∂µ_n) ] (A ⊗_b A)    (55)

∂²γ(µ)/∂µ_m∂µ_n = [ bvec( A_2^T (∂M(µ)/∂µ_n) G^T (∂M(µ)/∂µ_m) A_2 ) + bvec( A_2^T (∂M(µ)/∂µ_m) G^T (∂M(µ)/∂µ_n) A_2 ) ]^T    (56)

The stopping criterion for the Newton iterations is the Newton decrement condition

∇_µL(µ_j)^T (∇²_µL(µ_j))^{−1} ∇_µL(µ_j) < ε    (57)

As shown, the optimal step-size of each node depends not only on the energy profile of the node itself but also on that of its neighbors. The initial value of the parameter vector, i.e., µ_k(−1), can impact the final converging point of Algorithm 1 if chosen arbitrarily. Specifically, our experiments suggest that these initial values should be chosen within the stability range (18). In particular, the following nominal choice is recommended:

µ_k(−1) = 1 / ( 5 Tr( Σ_{ℓ∈N_k} c_{ℓ,k} R_{u,ℓ} ) )    (58)
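The overall procedure of Algorithm 1 can be sketched as follows. Since the analytic gradient (49) and Hessian (53) require the full network model, this illustration substitutes numerical derivatives and a steepest-descent inner solver f(µ, t) (the text notes f may be any iterative gradient algorithm); `msd` stands in for the network MSD objective in (43), and all names are ours.

```python
import numpy as np

def barrier_opt(msd, mu0, limits, eps=1e-7, t0=0.01, inner=100, lr=5e-3, h=1e-6):
    """Log-barrier interior-point sketch of Algorithm 1:
    minimize msd(mu) subject to 0 < mu_k < limits_k."""
    mu = np.asarray(mu0, dtype=float)
    limits = np.asarray(limits, dtype=float)
    N, t = mu.size, t0

    def L(x, t):
        # augmented objective (45): MSD term plus log-barrier for 0 < mu_k < l_k
        if np.any(x <= 0.0) or np.any(x >= limits):
            return np.inf
        return msd(x) - (np.log(x).sum() + np.log(limits - x).sum()) / t

    while N / t > eps:                                # outer loop of Algorithm 1
        for _ in range(inner):                        # inner solver f(mu, t)
            g = np.array([(L(mu + h * e, t) - L(mu - h * e, t)) / (2 * h)
                          for e in np.eye(N)])        # numerical gradient
            step = lr
            while L(mu - step * g, t) > L(mu, t):     # backtrack to stay feasible
                step *= 0.5
                if step < 1e-15:
                    break
            mu = mu - step * g
        t *= 2.0                                      # double t, as in Algorithm 1
    return mu
```

With the nominal ε = 10^{−7} and t = 0.01, the loop doubles t a few dozen times, progressively tightening the barrier around the constrained minimizer.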

V. NUMERICAL RESULTS

In this section, we present the results of our computer simulations to demonstrate the tracking performance of DLMS algorithms and to verify the theoretical findings. We consider a connected network with N = 10 nodes that are placed


[Figure: positions of nodes 1–10 in the unit square (x, y).]

Fig. 4: Network topology.

[Figure: per-node noise variance σ²_{v,k} (top) and regressor power Tr(R_{u,k}) (bottom), for nodes k = 1, . . . , 10.]

Fig. 5: Network noise variance σ²_{v,k} and regressor power Tr(R_{u,k}).

randomly on a unit square area (x, y) ∈ [0, 1] × [0, 1], as shown in Fig. 4. The maximum communication range of each node is 0.4 unit length, i.e., two nodes are considered neighbors if their distance is less than 0.4. The network energy profile is shown in Fig. 5. The objective of the network is to cooperatively estimate and track a time-varying parameter vector w^o_i of size M = 2 that evolves with time according to the first-order random walk model given in (2). For this model, we consider both rapidly and slowly time-varying parameters. In this simulation, we choose Tr(Q) = 0.0025 min(σ²_{v,1}, · · · , σ²_{v,N}) for slowly varying parameters. We set A_1 = I, compute the matrix C according to the Metropolis rule [29], and obtain the combination matrix A_2 according to the relative-degree criterion [30]. In the results, we use C_Met and A_Rel to denote the Metropolis and relative-degree combination matrices, respectively. We use Algorithm 1 to find the optimal step-sizes of the network. For this algorithm, we use ε = 10^{−7} and initialize t at 0.01. We choose the initial value µ_k(−1) according to (58) for all k.
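The building blocks of this setup can be sketched as follows: Metropolis weights [29], relative-degree weights [30], and the random walk model (2). The function names are ours, and `adj` is assumed to be a 0/1 symmetric adjacency matrix derived from the 0.4 communication range.

```python
import numpy as np

def metropolis(adj):
    """Metropolis rule: c_{lk} = 1/max(n_k, n_l) for neighbors l != k,
    with the diagonal absorbing the remaining mass (columns sum to 1)."""
    N = adj.shape[0]
    nbr = adj + np.eye(N)                # each node counts as its own neighbor
    deg = nbr.sum(axis=1)
    C = np.zeros((N, N))
    for k in range(N):
        for l in range(N):
            if l != k and nbr[k, l]:
                C[l, k] = 1.0 / max(deg[k], deg[l])
        C[k, k] = 1.0 - C[:, k].sum()
    return C

def relative_degree(adj):
    """Relative-degree rule: a_{lk} = n_l / sum_{m in N_k} n_m for l in N_k."""
    N = adj.shape[0]
    nbr = adj + np.eye(N)
    deg = nbr.sum(axis=1)
    A = nbr * deg[:, None]
    return A / A.sum(axis=0, keepdims=True)

def random_walk(w0, Q, n_steps, rng):
    """Trajectory of the model (2): w_i = w_{i-1} + q_i, q_i ~ N(0, Q)."""
    M = w0.size
    steps = rng.multivariate_normal(np.zeros(M), Q, size=n_steps)
    return w0 + np.cumsum(steps, axis=0)
```

Both weight rules produce column-stochastic matrices, which is what the DLMS combination steps require.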

Figs. 6 and 7 show that the algorithm finds the nodes' optimal step-sizes after about 30 iterations. These figures also show how the steady-state MSD and EMSE of the network improve at each iteration of this algorithm.

Fig. 8 shows the MSD of the network versus the mean of the step-sizes, at each iteration j, for cooperative and non-cooperative networks. These results indicate that the network

[Figure: network EMSE and MSD (dB) versus interior-point method iteration.]

Fig. 6: Network performance improvement in terms of MSD and EMSE versus iteration number in step-size optimization.

[Figure: per-node MSD (dB) versus interior-point method iteration, for nodes 1–10.]

Fig. 7: Node performance improvement in terms of MSD versus iteration number in step-size optimization.

[Figure: network MSD (dB) versus (1/N) Σ_{k=1}^N µ_k, for A = A_Rel, C = C_Met and for A = I, C = I; initial and optimal points marked.]

Fig. 8: The network MSD versus average step-size parameters.


TABLE I: Performance comparison of the step-size optimization methods for slowly-varying network environments.

    Step-size   Formula (41)   Algorithm 1
    µ^o_1       0.00905        0.00918
    µ^o_2       0.00900        0.00910
    µ^o_3       0.01739        0.01760
    µ^o_4       0.01946        0.01967
    µ^o_5       0.01355        0.01373
    µ^o_6       0.04472        0.04780
    µ^o_7       0.01471        0.01510
    µ^o_8       0.00715        0.00724
    µ^o_9       0.01678        0.01748
    µ^o_10      0.00667        0.00675

with DLMS algorithms has higher tracking agility than the network with non-cooperative LMS.

In Table I, we compare the results of Algorithm 1 and the closed-form expression (41) in finding the optimal step-sizes of the network in the non-cooperative mode where the signal is slowly time-varying. We observe that the numerical difference between the results of these two methods is insignificant and that Algorithm 1 performs well.

In Fig. 9, we evaluate the performance of DLMS algorithms in tracking time-varying parameters and examine the impact of the step-size parameters on their performance. We observe that the network estimates the underlying parameters of interest after a few samples and keeps tracking the parameter changes over time.

Figs. 10 and 11 show the steady-state and transient behavior of DLMS algorithms in terms of MSD with optimal and non-optimal step-sizes. We use the optimal step-sizes obtained from the previous simulations, and use equal step-sizes within the stability range of the algorithm as the non-optimal values. For this simulation, the non-optimal values are chosen as µ_k = 0.01 for all k. The algorithm is tested for two network modes, i.e., cooperative and non-cooperative. As these results indicate, the network in the cooperative mode with A_Rel and C_Met outperforms the network in the non-cooperative case. We also observe that the MSD performance of all nodes over the network is almost identical in the cooperative mode, while in the non-cooperative mode the performance discrepancy between nodes is more than 5 dB. This implies that the nodes reach a consensus on the estimated system parameters despite having different energy and noise profiles. In all scenarios, the simulation and analysis results coincide well.

From Figs. 10 and 11, we also observe that the optimal step-size parameters, {µ^o_k}, significantly increase the convergence speed of the algorithm and enhance the accuracy of the estimation.

Fig. 12 shows the performance of DLMS [5] with fixed step-sizes and with the proposed optimal step-sizes in a rapidly varying signal environment. For this simulation, we increase the value of Tr(Q) by 10 times to model the fast-varying signal condition. All the other parameters are chosen the same as in the previous case. The nearly optimal step-sizes were obtained using the proposed interior-point method. The computed values

[Figure: true and simulated entries w_i(1) and w_i(2) versus sample i.]

Fig. 9: Time evolution of adaptive parameter estimates over the network.

[Figure: steady-state MSD (dB) per node k; simulation and theory curves for A_2 = A_Rel, C = C_Met and for A_2 = I, C = I, each with optimal and non-optimal step-sizes.]

Fig. 10: The steady-state MSD performance of DLMS in different modes, i.e., ATC and non-cooperative modes.

[Figure: network MSD (dB) versus sample i; simulation and theory curves for A_2 = A_Rel, C = C_Met with optimal and non-optimal step-sizes.]

Fig. 11: Network transient MSD performance using the DLMS algorithm with different choices of the combination matrices and optimal and non-optimal step-size parameters.


[Figure: network MSD and EMSE (dB) versus time i; simulation and theory curves for the DLMS of [5] and for DLMS with optimal step-sizes.]

Fig. 12: The MSE performance of the conventional DLMS algorithm [5] and DLMS with optimal step-sizes in a fast-varying signal environment.

[Figure: MSD and EMSE (dB) trajectories versus Σ_{k=1}^N µ_k for IPM with Newton and for projected gradient; initial and converged points marked.]

Fig. 13: Trajectory of PG and IPM with Newton versus the average value of the adaptation step-sizes.

are µ_1 = 0.0542, µ_2 = 0.0440, µ_3 = 0.0583, µ_4 = 0.1630, µ_5 = 0.1099, µ_6 = 0.2445, µ_7 = 0.0566, µ_8 = 0.0419, µ_9 = 0.0917, and µ_10 = 0.0279. As expected, for the fast-varying case, the computed step-sizes are larger than those obtained for the slow-varying case. As shown in Fig. 12, the DLMS algorithm with the optimal step-sizes not only converges faster but also achieves a lower error than the DLMS with non-optimal step-size values. We also note that the theory and simulation results coincide well for all cases.

To find the optimal step-size values, we have also implemented the gradient projection method, also referred to as the projected gradient (PG) algorithm. The numerical results for this algorithm and for the interior-point method (IPM) with Newton iterations are presented in Figs. 13 and 14. Fig. 13 shows the convergence behavior of the PG algorithm and compares it with that of the IPM algorithm. As illustrated in this figure, both algorithms take almost similar trajectories for Σ_{k=1}^N µ_k to reach the desired step-size parameter vector. However, as Fig. 14 shows, the IPM converges after 35 iterations whereas the PG algorithm takes about 2000 iterations. It should be noted that

[Figure: MSD and EMSE (dB) versus iteration number for projected gradient and for IPM with Newton.]

Fig. 14: The MSD and EMSE convergence behavior versus iterations with PG and IPM with Newton.

the PG algorithm has lower computational complexity than the IPM. Our numerical simulations reveal that the performance of the PG algorithm highly depends on the step-size it uses. Therefore, a line search algorithm should be implemented along with it. In contrast, the performance of the IPM with Newton is satisfactory even without a line search algorithm.
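For comparison with the barrier approach, a minimal PG sketch for the same box-constrained problem is given below. As before, this is our own simplified stand-in (numerical gradients, fixed step-size, `msd` as a placeholder objective), not the paper's analytic implementation.

```python
import numpy as np

def projected_gradient(msd, mu0, limits, lr=1e-2, iters=2000, h=1e-6, margin=1e-6):
    """Gradient step on the objective followed by projection onto the box
    {mu : margin <= mu_k <= limits_k - margin} from the constraint in (43)."""
    mu = np.asarray(mu0, dtype=float)
    limits = np.asarray(limits, dtype=float)
    N = mu.size
    for _ in range(iters):
        g = np.array([(msd(mu + h * e) - msd(mu - h * e)) / (2 * h)
                      for e in np.eye(N)])          # numerical gradient
        mu = np.clip(mu - lr * g, margin, limits - margin)  # projection step
    return mu
```

The per-iteration cost is low (no Hessian), but, consistent with the observation above, the fixed step-size `lr` strongly affects how many iterations are needed.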

VI. CONCLUSIONS

We examined the performance of DLMS algorithms in non-stationary signal environments where the underlying model parameters of interest vary over time. We showed that there exists an optimal step-size parameter vector for which the DLMS algorithms reach their minimum MSD in the steady state. We found this optimal step-size parameter vector by formulating a constrained nonlinear optimization problem and solving it through a log-barrier Newton algorithm. Our studies show that the optimal step-size parameter at each node in the network depends not only on the statistics of the random walk and the energy profile of the node itself but also on the energy and statistical profile of its neighboring nodes. The results show that the optimal step-size parameters obtained through constrained optimization can substantially improve the performance of DLMS algorithms in tracking time-varying parameters over networks.

APPENDIX A
DERIVATION OF (27)

We first note that when u_{k,i} are zero-mean circular complex-valued Gaussian random vectors, then for any Hermitian matrix Γ of compatible dimensions it holds that [41]:

E[u*_{k,i} u_{k,i} Γ u*_{k,i} u_{k,i}] = β (R_{u,k} Γ R_{u,k}) + R_{u,k} Tr(Γ R_{u,k})    (59)

where β = 1 for complex regressors and β = 2 when the regressors are real. Based on this result, we obtain:

E[u*_{k,i} u_{k,i} Γ u*_{ℓ,i} u_{ℓ,i}] = R_{u,k} Γ R_{u,ℓ} + δ_{kℓ} (β − 1) R_{u,k} Γ R_{u,k} + δ_{kℓ} R_{u,k} Tr(Γ R_{u,k})    (60)


where δ_{kℓ} is the Kronecker delta function. We now proceed to compute F using this result. The block vectorization of the first three terms in (22) is straightforward; we only present the computation of the fourth term. Taking the block vectorization of the fourth term yields:

bvec( E[A_1 R*_i Q R_i A_1^T] ) = (A_1 ⊗_b A_1) bvec(Π),    (61)

where here Q = M A Σ A^T M and Π = E[R*_i Q R_i]. The [k, ℓ]-th block of Π is given by:

Π_{k,ℓ} = Σ_m Σ_n c_{m,k} c_{n,ℓ} E[u*_{m,i} u_{m,i} Q_{k,ℓ} u*_{n,i} u_{n,i}]
        = Σ_m c_{m,k} c_{m,ℓ} E[u*_{m,i} u_{m,i} Q_{k,ℓ} u*_{m,i} u_{m,i}]
        + Σ_m Σ_n (1 − δ_{mn}) c_{m,k} c_{n,ℓ} E[u*_{m,i} u_{m,i} Q_{k,ℓ} u*_{n,i} u_{n,i}]    (62)

Using this expression, we obtain:

Π_{k,ℓ} = Σ_m c_{m,k} c_{m,ℓ} [ (β − 1) R_{u,m} Q_{k,ℓ} R_{u,m} + R_{u,m} Tr(Q_{k,ℓ} R_{u,m}) ] + Σ_m Σ_n c_{m,k} c_{n,ℓ} [ R_{u,m} Q_{k,ℓ} R_{u,n} ]    (63)

Some algebra and data rearrangement then give:

bvec(Π) = Σ_{m=1}^N [ diag{ vec(C_m 1_{N×N} C_m) } ] ⊗ [ (β − 1)(R^T_{u,m} ⊗ R_{u,m}) + r_m r_m* ] bvec(Q) + (R^T ⊗_b R) bvec(Q)    (64)

where in this relation C_m = diag(e_m^T C), r_m = vec(R_{u,m}), and 1_{N×N} is the N × N matrix with unit entries. Finally, by computing the block vectorization of the first three terms in (22) and using (64), we arrive at (26).
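The fourth-moment identity (59), on which this appendix rests, is easy to check numerically. The sketch below draws row regressors u with covariance R = E[u* u] and compares the sample fourth moment against β RΓR + R Tr(ΓR); the function name and the Monte Carlo setup are ours.

```python
import numpy as np

def fourth_moment_mc(R, Gamma, n=200_000, real=False, seed=0):
    """Monte Carlo estimate of E[u* u Gamma u* u] for 1 x M row regressors u
    with E[u* u] = R: circular complex Gaussian (real=False) or real Gaussian."""
    rng = np.random.default_rng(seed)
    M = R.shape[0]
    Rh = np.linalg.cholesky(R)                     # R = Rh Rh^*
    if real:
        U = rng.standard_normal((n, M)) @ Rh.T     # rows u with E[u^T u] = R
    else:
        Z = (rng.standard_normal((n, M)) + 1j * rng.standard_normal((n, M))) / np.sqrt(2)
        U = Z @ Rh.conj().T                        # circular complex, E[u* u] = R
    s = np.einsum('ni,ij,nj->n', U, Gamma, U.conj())     # scalars u Gamma u*
    return np.einsum('n,ni,nj->ij', s, U.conj(), U) / n  # sample E[u* u Gamma u* u]
```

Comparing the estimate with β = 1 (complex) and β = 2 (real) confirms the two cases of (59) to within Monte Carlo accuracy.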

REFERENCES

[1] R. Abdolee, B. Champagne, and A. H. Sayed, "Diffusion LMS strategies for parameter estimation over fading wireless channels," in Proc. IEEE Int. Conf. on Communications (ICC), Budapest, Hungary, June 2013, pp. 1926–1930.

[2] R. Abdolee, B. Champagne, and A. H. Sayed, "Estimation of space-time varying parameters using a diffusion LMS algorithm," IEEE Trans. on Signal Processing, vol. 62, no. 2, pp. 403–418, Jan. 2014.

[3] C. G. Lopes and A. H. Sayed, "Incremental adaptive strategies over distributed networks," IEEE Trans. on Signal Processing, vol. 55, no. 8, pp. 4064–4077, Aug. 2007.

[4] S. S. Ram, V. Veeravalli et al., "Distributed subgradient projection algorithm for convex optimization," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Process., Taipei, Taiwan, Apr. 2009, pp. 3653–3656.

[5] K. Srivastava and A. Nedic, "Distributed asynchronous constrained stochastic optimization," IEEE J. Selected Topics on Signal Processing, vol. 5, no. 4, pp. 772–790, Apr. 2011.

[6] S. Lee and A. Nedic, "Distributed random projection algorithm for convex optimization," IEEE J. Selected Topics on Signal Processing, vol. 7, no. 2, pp. 221–229, Apr. 2013.

[7] R. Abdolee, B. Champagne, and A. H. Sayed, "A diffusion LMS strategy for parameter estimation in noisy regressor applications," in Proc. European Signal Processing Conference (EUSIPCO), Bucharest, Romania, Aug. 2012, pp. 749–753.

[8] ——, "Diffusion LMS for source and process estimation in sensor networks," in Proc. IEEE Statistical Signal Processing Workshop (SSP), Aug. 2012, pp. 165–168.

[9] ——, "Diffusion adaptation over multi-agent networks with wireless link impairments," IEEE Trans. Mobile Computing, vol. 15, no. 6, pp. 1362–1376, June 2016.

[10] D. Estrin, L. Girod, G. Pottie, and M. Srivastava, "Instrumenting the world with wireless sensor networks," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Process., Salt Lake City, Utah, USA, May 2001, pp. 2033–2036.

[11] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: A survey," IEEE Commun. Mag., vol. 38, pp. 393–422, Mar. 2002.

[12] R. Abdolee and B. Champagne, "Diffusion LMS algorithms for sensor networks over non-ideal inter-sensor wireless channels," in Proc. Int. Conf. on Dist. Computing in Sensor Systems and Workshops, Barcelona, Spain, June 2011, pp. 1–6.

[13] P. Di Lorenzo and S. Barbarossa, "A bio-inspired swarming algorithm for decentralized access in cognitive radio," IEEE Trans. on Signal Processing, vol. 59, no. 12, pp. 6160–6174, Dec. 2011.

[14] F. Cattivelli and A. H. Sayed, "Distributed detection over adaptive networks using diffusion adaptation," IEEE Trans. on Signal Processing, vol. 59, no. 5, pp. 1917–1932, May 2011.

[15] L. Atzori, A. Iera, and G. Morabito, "The internet of things: A survey," Computer Networks, vol. 54, no. 15, pp. 2787–2805, May 2010.

[16] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," in Proc. Int. Symp. on Inf. Proc. in Sensor Networks (IPSN), 2004, pp. 20–27.

[17] G. Mateos, I. D. Schizas, and G. B. Giannakis, "Performance analysis of the consensus-based distributed LMS algorithm," EURASIP J. Adv. Signal Process., pp. 1–19, Nov. 2009.

[18] D. Jakovetic, J. Xavier, and J. M. Moura, "Weight optimization for consensus algorithms with correlated switching topology," IEEE Trans. on Signal Processing, vol. 58, no. 7, pp. 3788–3801, July 2010.

[19] K. Chan, A. Swami, Q. Zhao, and A. Scaglione, "Consensus algorithms over fading channels," in Military Communications Conference, San Jose, CA, Nov. 2010, pp. 549–554.

[20] S. Kar and J. M. Moura, "Distributed consensus algorithms in sensor networks with imperfect communication: Link failures and channel noise," IEEE Trans. on Signal Processing, vol. 57, no. 1, pp. 355–369, Jan. 2009.

[21] ——, "Distributed consensus algorithms in sensor networks: Quantized data and random link failures," IEEE Trans. on Signal Processing, vol. 58, no. 3, pp. 1383–1400, Mar. 2010.

[22] S. Kar and J. Moura, "Convergence rate analysis of distributed gossip (linear parameter) estimation: Fundamental limits and tradeoffs," IEEE J. Selected Topics on Signal Processing, vol. 5, no. 4, pp. 674–690, Aug. 2011.

[23] S. Barbarossa and G. Scutari, "Bio-inspired sensor network design," IEEE Signal Processing Magazine, vol. 24, no. 3, pp. 26–35, May 2007.

[24] P. Braca, S. Marano, and V. Matta, "Running consensus in wireless sensor networks," in Proc. Int. Conf. on Information Fusion, Cologne, Germany, June 2008, pp. 1–6.

[25] U. A. Khan and J. M. Moura, "Distributing the Kalman filter for large-scale systems," IEEE Trans. on Signal Processing, vol. 56, no. 10, pp. 4919–4935, Oct. 2008.

[26] I. D. Schizas, G. Mateos, and G. B. Giannakis, "Distributed LMS for consensus-based in-network adaptive processing," IEEE Trans. on Signal Processing, vol. 57, no. 6, pp. 2365–2382, June 2009.

[27] T. Aysal, M. Yildiz, A. Sarwate, and A. Scaglione, "Broadcast gossip algorithms for consensus," IEEE Trans. on Signal Processing, vol. 57, no. 7, pp. 2748–2761, July 2009.

[28] A. Bertrand and M. Moonen, "Consensus-based distributed total least squares estimation in ad hoc wireless sensor networks," IEEE Trans. on Signal Processing, pp. 2320–2330, May 2011.

[29] C. G. Lopes and A. H. Sayed, "Diffusion least-mean squares over adaptive networks: Formulation and performance analysis," IEEE Trans. on Signal Processing, vol. 56, no. 7, pp. 3122–3136, July 2008.

[30] F. S. Cattivelli and A. H. Sayed, "Diffusion LMS strategies for distributed estimation," IEEE Trans. on Signal Processing, vol. 58, no. 3, pp. 1035–1048, Mar. 2010.

[31] L. Li and J. A. Chambers, "Distributed adaptive estimation based on the APA algorithm over diffusion networks with changing topology," in Proc. IEEE Statist. Signal Process. Workshop, Cardiff, Wales, Sep. 2009, pp. 757–760.

[32] N. Takahashi, I. Yamada, and A. H. Sayed, "Diffusion least-mean squares with adaptive combiners: Formulation and performance analysis," IEEE Trans. on Signal Processing, vol. 58, no. 9, pp. 4795–4810, Sept. 2010.


[33] A. Khalili, M. Tinati, A. Rastegarnia, and J. Chambers, "Steady-state analysis of diffusion LMS adaptive networks with noisy links," IEEE Trans. on Signal Processing, vol. 60, no. 2, pp. 974–979, Feb. 2012.

[34] S. Y. Tu and A. H. Sayed, "Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks," IEEE Trans. on Signal Processing, vol. 60, no. 12, pp. 6217–6234, Dec. 2012.

[35] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, "Diffusion recursive least-squares for distributed estimation over adaptive networks," IEEE Trans. on Signal Processing, vol. 56, pp. 1865–1877, May 2008.

[36] R. Abdolee and B. Champagne, "Distributed blind adaptive algorithms based on constant modulus for wireless sensor networks," in Proc. IEEE Int. Conf. on Wireless and Mobile Commun., Sept. 2010, pp. 303–308.

[37] J. Chen and A. H. Sayed, "Diffusion adaptation strategies for distributed optimization and learning over networks," IEEE Trans. on Signal Processing, vol. 60, no. 8, pp. 4289–4305, Aug. 2012.

[38] A. H. Sayed, "Diffusion adaptation over networks," in E-Reference Signal Processing, vol. 3, R. Chellapa and S. Theodoridis, Eds., pp. 323–454, Academic Press, 2014. Also available as arXiv:1205.4220v1 [cs.MA], May 2012.

[39] X. Zhao, S. Y. Tu, and A. H. Sayed, "Diffusion adaptation over networks under imperfect information exchange and non-stationary data," IEEE Trans. on Signal Processing, vol. 60, no. 7, pp. 3460–3475, July 2012.

[40] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering, 2nd ed. Addison-Wesley, 1994.

[41] A. H. Sayed, Adaptive Filters. Wiley-IEEE Press, 2008.

[42] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.

[43] J. Nocedal and S. Wright, Numerical Optimization. Springer Science & Business Media, 2006.

Reza Abdolee received his Ph.D. from the Department of Electrical and Computer Engineering, McGill University, Montreal, Canada, in 2014. He is currently with Qualcomm, Inc., San Diego, CA. In 2012, he was a research scholar at Bell Labs, Alcatel-Lucent, Stuttgart, Germany. In 2011, he was a visiting scholar at the Department of Electrical Engineering, University of California, Los Angeles (UCLA). From 2006 to 2008, Dr. Abdolee worked as a staff engineer at the Wireless Communication Center, University of Technology, Malaysia (UTM), where he implemented a switch-beam smart antenna system for wireless network applications. His research interests include digital hardware design, communication theory, statistical signal processing, and optimization. Dr. Abdolee was a recipient of several awards and scholarships, including the NSERC Postgraduate Scholarship, the FQRNT Doctoral Research Scholarship, the McGill Graduate Research Mobility Award, and a DAAD-RISE International Internship scholarship from Germany.

Vida Vakilian received her Ph.D. degree in Electrical Engineering from École Polytechnique de Montréal, Canada, in 2014. She is currently an Assistant Professor at California State University, Bakersfield (CSUB). Before joining CSUB, she worked on several research topics in collaboration with industry and R&D laboratories. From Feb. 2013 to Feb. 2014, she was a co-op Engineer at InterDigital Communications Inc., where she worked on the design and development of advanced signal processing algorithms for 3rd Generation Partnership Project (3GPP) standards-based cellular systems. From Aug. to Dec. 2012, she was a research intern at Bell Labs, Alcatel-Lucent, Germany. During this period, she contributed to the 5GNOW project, which aimed to develop new robust PHY-layer concepts suited for future wireless systems. From Apr. to Sep. 2011, she was a visiting scholar at the University of California, Davis, where she analyzed the performance of reconfigurable MIMO systems in spatially correlated frequency-selective fading channels. Her research interests include communication theory, statistical signal processing, and hardware design and integration.

Benoit Champagne received the B.Ing. degree in Engineering Physics from the École Polytechnique de Montréal in 1983, the M.Sc. degree in Physics from the Université de Montréal in 1985, and the Ph.D. degree in Electrical Engineering from the University of Toronto in 1990. From 1990 to 1999, he was an Assistant and then Associate Professor at INRS-Télécommunications, Université du Québec, Montréal. In 1999, he joined McGill University, Montreal, where he is now a Full Professor within the Department of Electrical and Computer Engineering. He also served as Associate Chairman of Graduate Studies in the Department from 2004 to 2007. His research focuses on the development and performance analysis of advanced algorithms for the processing of information-bearing signals by digital means.

His research has been funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Fonds de Recherche sur la Nature et les Technologies of the Government of Quebec, Prompt Québec, as well as major industrial sponsors, including Nortel Networks, Bell Canada, InterDigital, and Microsemi. He has been an Associate Editor for the IEEE Signal Processing Letters, the IEEE Trans. on Signal Processing, and the EURASIP Journal on Applied Signal Processing. He has also served on the Technical Committees of several international conferences in the fields of communications and signal processing. He is a Senior Member of the IEEE.