[IEEE 2011 International Joint Conference on Neural Networks (IJCNN 2011 - San Jose) - San Jose, CA,...

An Echo State Network Architecture Based on Volterra Filteringand PCA with Application to the Channel Equalization Problem

Levy Boccato, Amauri Lopes, Romis Attux and Fernando Jose Von Zuben

Abstract— Echo state networks represent a promising alter-native to the classical approaches involving recurrent neuralnetworks, as they ally processing capability, due to the existenceof feedback loops within the dynamical reservoir, with a sim-plified training process. However, the existing networks cannotfully explore the potential of the underlying structure, since theoutputs are computed via linear combinations of the internalstates. In this work, we propose a novel architecture for an echostate network that employs the Volterra filter structure in theoutput layer together with the Principal Component Analysistechnique. This idea not only improves the processing capabilityof the network, but also preserves the simplicity of the trainingprocess. The proposed architecture has been analyzed in thecontext of the channel equalization problem, and the obtainedresults highlight the adequacy and the advantages of the novelnetwork, which achieved a convincing performance, overcomingthe other echo state networks, especially in the most challengingscenarios.

I. INTRODUCTION

A key research topic in the field of intelligent signal pro-cessing is the study of effective solutions based on dynamicaland real-time systems. Due to the increasing complexityof the most relevant practical problems, the developmentand application of models endowed with feedback loops isvirtually a constant demand. Furthermore, it is essential thatthe proposed models be capable of performing nonlinearmappings, and also accessing the time-history of the inputand output signals, as well as the internal states.

In this context, recurrent neural networks (RNN) emerge asan interesting option, since they allow feedback connectionsbetween neurons belonging to distinct layers, which engen-ders the access to the history of the signals. The effective useof an RNN structure involves the adaptation of all connectionweights, including those associated with the feedback links,so that the network behaves adequately. Within a supervisedlearning framework, this is tipically carried out with the aidof methods like the backpropagation-through-time (BPTT)[1] and real-time recurrent learning (RTRL) [2].

However, all these approaches are associated with well-known drawbacks. For instance, during the training phase,computing the derivatives of the cost function with respect tothe parameters that must be adapted becomes an arduous taskdue to the feedback loops. Furthermore, the constant menace

Levy Boccato, Romis Attux and Fernando Jose Von Zuben arewith the Department of Computer Engineering and Industrial Automa-tion (DCA), Amauri Lopes is with the Department of Communica-tions (DECOM), School of Electrical and Computer Engineering (FEEC),University of Campinas, Sao Paulo, Brazil, (email: {lboccato, attux,vonzuben}@dca.fee.unicamp.br, [email protected]).

This work was sponsored by grants from CAPES, CNPq and FAPESP(2010/51027-8).

of reaching an unstable configuration for the network mayeclipse, to a certain extent, the processing capability intrinsicto these structures.

In this scenario, echo state networks (ESN), originallyproposed by Jaeger [3], represent a robust and promisingsolution, as they preserve, in a measure, the processingpotential of the RNN structure, and introduce a significantsimplification into the network training process. This iscleverly accomplished by setting the connection weightsof the recurrent internal layer, also called the dynamicalreservoir, without employing information about the desiredsignal. Thus, the remaining task is to adapt the weights of theoutput linear combiner, which can be performed by meansof any method that solves a least squares problem.

It is important to stress that the idea of creating thereservoir of dynamics without resorting to an error signalimplies a certain reduction in the representation capability ofthe network, especially if compared with an ideally-adjustedsimilar structure. Therefore, performing an adequate choiceof the reservoir weights is a vital element to the design ofthe ESN. This observation was the main concern of the workby Ozturk et al. [4]. Employing an information-theoreticmeasure, the authors analyzed the correlation between theecho states, and then proposed a method for selecting thevalues of the reservoir weights which favors the occurrenceof a higher degree of diversity when compared to the originalproposal of Jaeger [3].

Nevertheless, both ESNs cannot make a fully effective useof the higher-order statistics of the signals coming from thereservoir, since the output layer still corresponds to a linearcombiner, for the sake of keeping the training process assimple as possible. Thus, in order to overcome this limitation,moving towards the use of a nonlinear output layer may bean interesting perspective.

In this work, we propose a novel ESN architecture: thelinear combiner is replaced with the structure of a Volterrafilter [5], which not only contributes to a better exploitationof the information associated with the internal signals, butalso preserves the simplicity of the training process, sincethe problem of adjusting the output weights remains linearwith respect to the free parameters. Additionally, aiming toprevent the curse of dimensionality [6], i.e., the occurrenceof an excessive growth in the number of weights to beadapted as the number of echo states increases, we employedthe Principal Component Analysis (PCA) [7] [8] techniquebefore transmitting the echo states to the output layer, thusreducing the number of effective parameters that need to beadjusted.

Proceedings of International Joint Conference on Neural Networks, San Jose, California, USA, July 31 – August 5, 2011

978-1-4244-9637-2/11/$26.00 ©2011 IEEE 580

A relevant signal processing problem, and one that suitswell the features of echo state networks, is the problem ofchannel equalization, which is characterized by an intrinsictrade-off between processing capability and computationaleffort [9].

In simple terms, the problem of channel equalizationcorresponds to designing a device, called equalizer, whichis present in the receiver and whose role is to cancel thedistortions introduced by the physical environment usedfor transmission, thus enabling the correct recovery of theoriginal information. Although linear filters have been widelyused as equalizers due to their simplicity and mathematicaltractability, when distortions become too pronounced, theperformance of such equalizers is severely deteriorated. Inthese cases, we are forced to resort to more sophisticatedstructures [10].

During the last decades, the possibility of designingefficient nonlinear equalizers has been extensively inves-tigated, yielding a myriad of successful techniques basedon approaches like polynomial filters and artificial neuralnetworks [9]. This possibility is an additional motivation for asystematic analysis of the application of echo state networksto the channel equalization problem, which constitutes themain objective of this work. The other main contributionconsists, as outlined above, of the proposal and evaluationof a new ESN architecture, which, as the results shall reveal,corresponds to a promising and efficient solution for bothoffline and real-time applications.

This paper is organized as follows: Section II describes thearchitecture of the echo state network proposed by Jaeger [3],as well as the model introduced in [4], whereas the proposednetwork is explained in Section III. The fundamentals of thechannel equalization problem are presented in Section IV,focusing on the supervised case, i.e., when a desired signalis used to guide the network adaptation. Next, Section Vdisplays the set of results obtained with the ESN, confirmingthe adequacy of the application of ESN in the equalizationproblem, as well as highlighting the benefits of the proposednetwork. Finally, some concluding remarks and future per-spectives are presented in Section VI.

II. ECHO STATE NETWORKS

Consider a discrete-time recurrent neural network with 𝐾input units, 𝑁 internal units and 𝐿 outputs, as depicted inFigure 1.

The vector u(𝑛) = [𝑢1(𝑛) . . . 𝑢𝐾(𝑛)]𝑇 denotes the

activations of the input units, which are transmitted to theinternal neurons by means of a linear combination. Theweight matrix W𝑖𝑛 ∈ ℛ𝑁×𝐾 specifies the coefficients ofsuch combinations.

The internal layer, also called dynamical reservoir, is com-posed of fully-connected nonlinear units whose activations,given by x(𝑛) = [𝑥1(𝑛) . . . 𝑥𝑁 (𝑛)]

𝑇 , correspond to thenetwork states, and are updated as follows:

x(𝑛+ 1) = f(W𝑖𝑛u(𝑛+ 1) +Wx(𝑛) +W𝑏𝑎𝑐𝑘y(𝑛)

),(1)

where W ∈ ℛ𝑁×𝑁 brings the weights of the connectionswithin the reservoir, W𝑏𝑎𝑐𝑘 ∈ ℛ𝑁×𝐿 contains the weightsassociated with the connections that project the output sam-ples back to the internal units, and f(⋅) = (𝑓1(⋅), . . . , 𝑓𝑁 (⋅))denotes the activation functions of the internal units.

Finally, the outputs of the network, represented by thevector y(𝑛) = [𝑦1(𝑛) . . . 𝑦𝐿(𝑛)]

𝑇 , are determined by thefollowing expression:

y(𝑛+ 1) = f𝑜𝑢𝑡(W𝑜𝑢𝑡x(𝑛+ 1)

), (2)

where W𝑜𝑢𝑡 ∈ ℛ𝐿×𝑁 is the output weight matrix andf𝑜𝑢𝑡(⋅) = (𝑓𝑜𝑢𝑡

1 (⋅), . . . , 𝑓𝑜𝑢𝑡𝐿 (⋅)) specifies the activation func-

tions of the output units.

u(𝑛) x(𝑛)

W𝑖𝑛

W

W𝑜𝑢𝑡

W𝑏𝑎𝑐𝑘

y(𝑛)

Fig. 1. The basic recurrent neural network architecture.

In 2001, Jaeger studied the dynamics of the RNN architec-ture of Figure 1, verifying that, under certain circumstances,the network states x(𝑛) become asymptotically independentof the initial condition. In other words, if the network startsfrom two distinct initial states x(0) and x(0), and the samesequence of input signals is received, then the obtainedsequences of states x(𝑛) and x(𝑛) converge to close values.When this property holds, the effect of the initial statesvanishes, and the reservoir dynamics depends solely on theinput history, so that the network has echo states [3].

Moreover, Jaeger demonstrated two sufficient conditionswith respect to echo states: the first condition determinesthat in order to an RNN present the echo state property,the largest singular value of the internal weight matrix Wmust be smaller than unity (𝜎𝑚𝑎𝑥 (W) < 1) [3] [11] 1; thesecond condition establishes the non-existence of echo statesin terms of the spectral radius (the largest among the absoluteeigenvalues) of the internal weight matrix W: if ∥𝑊∥ > 1,then the network does not have echo states 2.

Based on these observations, Jaeger proposed an ingeniousstrategy that simplifies the adaptation process of the RNNstructure: firstly, construct a weight matrix W satisfying

1This condition is valid only in the context of RNNs without outputfeedback and with tanh(⋅) as the internal neuron activation function.

2This condition requires no output feeback, tanh(⋅) nonlinearity and zeroinput.

581

∥𝑊∥ < 1 3; next, arbitrarily define the input weight matrixWin, since it does not harm the echo state property; finally,by employing a linear combiner as the output layer, theadaptation of the remaining weights is converted into theproblem of minimizing the mean-squared error. Hence, anylinear regression algorithm can be used to perform such task.This idea forms the essence of the ESNs [3].

ESNs represent a promising and interesting alternativeto the conventional RNNs as they are capable of bringingtogether the ability of accessing the time history of the inputand output signals - which they inherit from the RNNs- and a simple training procedure, since the connectionweights within the reservoir are predefined. The strategy ofmaintaining such weights fixed is what provides agility to thetraining algorithm, in contrast to the standard approach forthe adaptation of recurrent neural networks, which involvesadjusting the whole set of weights. The works in [12], [13],[14] and [15] are examples of successful applications of theESN approach to many different problems, such as systemidentification, time series prediction and motor control.

It is important to remember that the dynamical reservoiris completely specified by the weight matrix W, and itshould be as rich as possible in order to allow an adequateapproximation of the desired signal. However, in accordancewith the spirit of the echo state approach, the reservoiris designed without access to information regarding theparticularities of the application, which, in a certain sense,characterizes a reduction in the representation capability ofthe network when compared to a similar structure with anadjustable weight matrix W.

Recently, Ozturk et al. introduced the idea of employingthe average state entropy (ASE) as a measure of the richnessof the ESN reservoir [4]. Moreover, they observed that thediversity of the echo states is, in some degree, a reflex ofthe structure of the eigenvalues of W. This study led to theproposal of a different method for designing the reservoir:in order to maximize the ASE, the poles of the linearizedsystem derived from the ESN dynamics should be uniformlydistributed - given a chosen radius - within the unit circle,which evokes the conceptual framework of Kautz filters.

Henceforth, we shall refer to this latter network with theacronym ASE-ESN, whereas ESN will correspond to theoriginal network conceived by Jaeger.

III. PROPOSED NETWORK

As outlined above, the ESN can be interpreted in terms ofa trade-off between performance and simplicity. On the onehand, the learning methodology employed in the ESN cannotactualize all the potential from the underlying processingstructure, since the recurrent connections within the reservoirare not adapted by means of an error signal. On the otherhand, relinquishing such adaptation avoids the difficultiesinherent to classical RNN learning algorithms, which is aremarkable feature.

3As remarked by Jaeger, in all experiments he carried out, this weakercondition practically ensured the echo state property, even though theproperty itself is stated in terms of 𝜎𝑚𝑎𝑥.

In the core of this trade-off lies the design of the echo statereservoir. Since no information regarding an error signal isavailable, the idea is to produce the highest possible diversityof dynamics, in order to better supply with relevant informa-tion the output layer, which is then adapted with the aidof a reference signal. However, in the existent proposals ofecho state networks, the output layer corresponds to a linearcombiner, which cannot exploit the higher-order statistics ofthe information coming from the nonlinear dynamics of theneurons of the reservoir.

In order to circumvent this problem, the idea of employ-ing a nonlinear output layer emerges. However, aiming atpreserving the spirit of simplicity inherent to the ESN, it iscrucial that the output layer remains linear with respect to thefree parameters: after all, in this case, it is possible to find aclosed-form solution in the least-squares (or Wiener) sense[9]. Taking into account these facts, our proposal is to usea Volterra filter structure for the output layer [5]. Thus, thenetwork outputs shall be composed of linear combinationsof polynomial terms, as follows:

𝑦𝑖(𝑛) = ℎ0 +

𝑁∑

𝑖=1

ℎ1(𝑖)𝑥𝑖(𝑛) +

𝑁∑

𝑖=1

𝑁∑

𝑗=1

ℎ2(𝑖, 𝑗)𝑥𝑖(𝑛)𝑥𝑗(𝑛)

+𝑁∑

𝑖=1

𝑁∑

𝑗=1

𝑁∑

𝑘=1

ℎ3(𝑖, 𝑗, 𝑘)𝑥𝑖(𝑛)𝑥𝑗(𝑛)𝑥𝑘(𝑛) + . . . , (3)

where 𝑥𝑘(𝑛) represents the k-th echo state in the instant 𝑛,and 𝑦𝑖(𝑛) is the i-th network output. It is straightforwardto see that the coefficients ℎ𝑖(⋅) are linearly related to theoutput, as desired.

However, the Volterra structure introduces a new concern:as the number of echo states (𝑁 ) increases, the number ofkernels tends to grow dramatically. Indeed, this becomesevident if we observe the expression for the number of non-ambiguous kernels 𝑁𝑘𝑒𝑟:

𝑁𝑘𝑒𝑟 = 1+𝑁+𝑁(𝑁 + 1)

2+𝑁(𝑁 + 1)(𝑁 + 2)

6+ . . . (4)

As we can see, depending on the maximum order of thekernels we intend to use, the respective number of coeffi-cients that must be adjusted tends to be severely increased,which threatens the practical application of this structurealong with the ESN.

Aiming to mitigate this problem, we decided to use astrategy capable of reducing the number of effective sig-nals transmitted to the output layer, based on the classicalcompression technique called Principal Component Analysis(PCA) [7] [8] [16]. Thus, instead of transmitting the wholeset of echo states, we use only a relatively small numberof principal components, which reduces the quantity ofweights that need to be adjusted without necessarily causinga significant loss of information.

Let x ∈ ℝ𝑁×𝑇𝑠 be the matrix containing the activation of

the internal units considering 𝑇𝑠 input samples. The sample

582

covariance matrix of the network states is computed via

C =xx𝑇

𝑇𝑠, (5)

and V ∈ ℂ𝑁×𝑁𝑝𝑐 is the matrix formed by the eigenvectors

of C that correspond to its 𝑁𝑝𝑐 largest eigenvalues4. Hence,the principal components are given by [8]:

s𝑖 = V𝑇𝑖 x, 𝑖 = 1, . . . , 𝑁𝑝𝑐. (6)

The perspective of using PCA is encouraged by the ob-servation that there is a non-negligible degree of redundancebetween the echo states. In fact, the study carried out in [4]already evinced the presence of a strong correlation betweensuch signals, which means that it is possible to retain a fewcomponents of x(𝑛) representing the most significant portionof the dynamics coming from the reservoir. Moreover, wemay benefit from PCA as it decreases the computationalcomplexity of the subsequent operations, and it tends toreduce noise, since the data not contained in the first 𝑁𝑝𝑐

components may be mostly due to this random factor.To summarize, our proposal is to employ PCA and the

Volterra filter in order to achieve compression and a reductionof the number of coefficients to be adapted, and also to obtaina more effective exploitation of the information regarding theecho states.

As the proposed network - which can be applied to generalsignal processing tasks - will be evaluated in the contextof channel equalization, it is now important to describe thefundamentals of this problem in more detail.

IV. FUNDAMENTALS OF CHANNEL EQUALIZATION

A communication system is designed to allow that infor-mation be efficiently sent from a transmitter to a receiverusing an available channel. In practice, successful communi-cation inevitably requires that the received data be suitablyprocessed, as the channel is always responsible for distortingto some degree the signals it conveys. For our purposes, thisdistortion can be modeled in terms of a system whose inputand output are, respectively, the transmitted signal 𝑠(𝑛) andthe received signal 𝑟(𝑛), as shown in Figure 2.

𝑠(𝑛)Channel

𝑟(𝑛)

Fig. 2. Model of a Communication System.

The nature and the intensity of the modifications sufferedby the sent data are essentially related to the features ofthe channel model, which can be either linear or nonlinear,for example, and also possess different degrees of stochasticcharacter.

A well-established strategy to counterbalance the channeleffects is to employ a specially-tailored filter - an equalizer- at the receiver. The role of this filter must be inverse to

4It is important to remember that the network states should have zeromean in order to correctly employ the PCA technique.

that played by the channel, and, ideally, its output shouldcorrespond to an undistorted version of the transmittedsignal, i.e.,

𝑦(𝑛) = 𝑘𝑠(𝑛− 𝑑), (7)

where 𝑦(𝑛) is the equalizer output, 𝑘 is a constant gain and 𝑑denotes the equalization delay. This condition is commonlyreferred to as zero-forcing [9].

The design of an equalizer encompasses the choices of:1) a suitable filtering structure; 2) a sound equalizationcriterion; and 3) an efficient optimization approach to adjustthe parameters of the filter.

The first choice is crucial, as it sets an inexorable limitto the reachable equalization performance. The linear ornonlinear character of the channel, the degree of influence ofpast samples on the present channel output (i.e. the channelmemory), the nature / intensity of the existing noise - aninherently stochastic factor -: each of these factors createsdifficulties that must be coped with by the filtering structure.A linear channel can be, in some cases, properly dealt withusing a linear equalizer, but, contrarily to what might beexpected at first sight, a nonlinear structure can be decisive tosolve certain linearly-generated problems [10]. The channelmemory has a deep influence on the required equalizermemory, both for linear and nonlinear models, and in thiscontext the use of recurrent structures arises in a naturalway. Finally, the structure of the existing channel noise mayrequire the use of a nonlinear structure - as a consequence,for example, of a pronounced non-Gaussian character [17] -or even of a recurrent structure if temporal dependence is akey factor.

The second choice amounts essentially to a translation ofthe aim of the equalization task - vide equation (7) - intomathematical terms. A classical choice has been to use amean-squared error (MSE) criterion, also known as Wienercriterion, which leads to the following cost function:

𝐽𝑀𝑆𝐸 = 𝐸{[𝑠(𝑛− 𝑑)− 𝑦(𝑛)]2

}(8)

where 𝐸{⋅} corresponds to the statistical expectation opera-tor. Supervised alternatives to the MSE have been proposedin the context of the research field called information-theoretic learning [17], but they will not be used in this work.It is also important to mention the existence of unsupervisedequalization methods [8], which are based on a statisticalframework centered, either directly or indirectly, on higher-order moments.

Finally, given the structure and the chosen criterion, theremaining task is to employ an optimization approach to findan actual solution with respect to the filter free parameters.In the MSE context, if the filtering structure is feedforwardand linear with respect to its free parameters, the engenderedcost function has a single minimum, the so-called Wienersolution, which can be reached by means of a direct least-squares estimation procedure or of iterative methods like theLMS and RLS algorithms [9]. On the other hand, if thestructure is nonlinear with respect to the free parametersand/or recurrent, the MSE cost function may possess multiple

583

optima, which establishes a more complex search problem.From this discussion, we are led to some considerations.

It is undoubtedly possible to affirm that, from a strictlystructural point of view, the most desirable equalizationdevice is a nonlinear and recurrent structure, as it would becapable of dealing in the most general way with both memoryand mapping requirements. However, from an optimizationstandpoint, to find the optimal MSE solution may be difficultboth due to the cost function multimodality and also to thenecessity of properly defining the direction and stepsize ofthe adjustment in iterative algorithms.

Interestingly, as discussed earlier in this work, the echostate approach emerges as an attractive equalization solution,as it is both nonlinear and recurrent without posing complexoptimization problems. As a consequence, the remainderof this work should be interpreted not only as a series oftests involving a new echo state network, but also as asystematic analysis of the potential application of the echostate approach to equalization tasks.

V. EXPERIMENTAL RESULTS

In this section, we present the methodology employedfor the training and test of the echo state networks in thechannel equalization problem, as well as the performanceachieved by each network considering different scenarios. Allthe experiments were carried out in Matlab R⃝ environment.

A. Methodology

In all simulations, the source signal 𝑠(𝑘) assumes thevalues {+1,−1} with equal probability (BPSK modulation).Each echo state network aims to estimate the original infor-mation 𝑠(𝑘) using solely the received signal 𝑟(𝑘) at the sametime instant, meaning that there is no equalization delay, andthat the number of inputs and outputs are always equal toone (𝐾 = 𝐿 = 1), which characterizes a challenging casefrom the equalization standpoint. Additionally, we do notconsider the presence of noise, which means that the inputsof the ESNs 𝑢(𝑘) are exactly equal to 𝑟(𝑘), except in oneparticular scenario, in which we also analyzed the influenceof the signal-to-noise ratio (SNR) over the performance ofthe ESNs.

Each network was trained using 𝑇𝑠 = 1100 samples ofthe input signal, being the first hundred samples discardedto eliminate transient effects. The same procedure was em-ployed in the testing phase.

The basic measure used to assess the quality of theequalization performed by the ESNs is the mean-squarederror (MSE) between the desired signal 𝑑 and the outputoffered by the networks, 𝑦𝐸𝑆𝑁 , which is estimated as shownin Equation (9).

𝑀𝑆𝐸 =1

𝑇𝑠

𝑇𝑠∑

𝑖=1

[𝑑(𝑖)− 𝑦𝐸𝑆𝑁 (𝑖)]2 (9)

In order to consistently evaluate the performance of thenetworks, we considered an average MSE (AMSE) for 20independent runs. Additionally, we present the standard

deviation of the MSE values obtained in such experiments,which offers an interesting view over the variability of theperformance achieved with the echo state networks in theequalization of each communication channel.

We also let the number of internal neurons assume differ-ent values, so that we may observe their influence over theperformance of each network. A small exception was madewith respect to the proposed network: based on preliminarysimulations, the number of internal units remains equal to𝑁 = 40 in all experiments, whereas the number of principalcomponents varies.

Based on the procedure described in [3], the ESN wasconstructed as follows: the input weights were set to −1or +1 with equal probability, whereas the entries of thereservoir weight matrix W were set to −0.4, 0.4 and 0 withprobabilities of 0.025, 0.025 and 0.95, respectively.

For both the ASE-ESN and the proposed network, Win

was adjusted similarly to the procedure used in the ESN.With respect to the reservoir weight matrix W, the methodproposed in [4] was employed for the ASE-ESN, using aspectral radius equal to 0.8. Since the proposed network canbe easily adapted to different techniques for the reservoirdesign, we analyzed its performance using the methodsoffered by [3] and [4]. We shall identify these versions asproposed ESN and proposed ASE-ESN, respectively. For thesake of brevity, we restricted the presentation of the resultsfor the approach which led to the best performance. Finally,Wback was set to zero for all networks, as performed in [3]and [4].

For all networks, given a desired signal 𝑑(𝑛), the out-put weight matrix Wout was determined according to theWiener solution [9]. However, it is pertinent to emphasizethat the output weights are related to different signals: inthe ESN and the ASE-ESN, the echo states x are linearlycombined to generate the outputs. On the other hand, asexplained in Section III, the proposed network employsthe PCA and the Volterra filter as the basic framework togenerate the outputs, and, in this case, the output weightscorrespond to the coefficients used in the combination ofthe kernel functions. Another relevant aspect is that, in theproposed network, we employed a third-order Volterra filterto compute the outputs, without considering the quadraticterms for the sake of simplicity.

Having described the methodology and the main aspectsrelated to the design of each network, we proceed to thepresentation and analysis of the results obtained with theESNs.

B. First Channel

The first channel we shall consider is characterized by thefollowing transfer function:

𝐻(𝑧) = 0.5 + 𝑧−1 (10)

Even though this seems to be a relatively simple channel,it cannot be equalized with linear filters when the delay inthe equalization is zero, since, in this case, the correspondingchannel states cannot be linearly separated.

584

The ESNs have been trained and tested according tothe methodology described in the previous section, and thecomparative performance of the architectures are presentedin Table I. Exceptionally in this scenario, the AMSE valuesobtained with both the proposed ESN and proposed ASE-ESN are displayed. The values presented inside parenthesescorrespond to the standard deviation.

TABLE I

AMSE VALUES OBTAINED WITH EACH ECHO STATE NETWORK

CONSIDERING CHANNEL 1.

Channel 1 AMSENetwork Parameter Training Test

ESN𝑁 = 10 2.66(±3.22)e-01 2.72(±3.27)e-01𝑁 = 40 1.87(±2.65)e-05 2.07(±3.03)e-05𝑁 = 60 2.37(±2.19)e-05 2.74(±2.58)e-05

ASE-ESN𝑁 = 10 1.24(±0.60)e-01 1.28(±0.59)e-01𝑁 = 40 1.65(±0.97)e-02 2.54(±3.27)e-02𝑁 = 60 9.56(±7.08)e-03 1.72(±2.88)e-02

Proposed𝑁𝑝𝑐 = 3 1.02(±0.78)e-02 1.06(±0.78)e-02𝑁𝑝𝑐 = 5 1.09(±0.81)e-03 1.58(±1.78)e-03

ASE-ESN 𝑁𝑝𝑐 = 6 4.09(±4.04)e-04 8.61(±17.7)e-04𝑁𝑝𝑐 = 10 4.71(±2.47)e-06 6.63(±18.2)e-04


ESN 𝑁𝑝𝑐 = 6 2.56(±3.80)e-06 3.06(±4.60)e-06𝑁𝑝𝑐 = 10 3.34(±4.02)e-10 2.06(±4.86)e-08

Some interesting remarks can be drawn from Table I.Firstly, it is possible to observe that, as expected, theperformance of the ESNs are improved when the numberof neurons inside the reservoir, or the number of principalcomponents, is increased. Secondly, by comparing the AMSEvalues obtained with the ASE-ESN and the proposed ASE-ESN, the benefits acquired with the use of a nonlinear outputlayer become evident. Indeed, consider the case where 𝑁 =60 for the former network, and 𝑁𝑝𝑐 = 6 for the proposedone, which means that the number of output weights to beadapted in both networks is approximately equal (60 and 62weights, respectively). In this situation, the proposed networkled to a significantly better performance.

We can also observe in Table I that there is a disparitybetween the AMSE values with the ESN and the ASE-ESN,which suggest that, for this particular case, designing thereservoir weight matrix according to the Jaeger’s proposalleads to a better performance. This observation is corrobo-rated when we compare the AMSE values associated witheach version of the proposed network. As shown in TableI, the proposed model achieved a better performance whenthe reservoir was designed as in the ESN, reaching smallerAMSE values than its counterpart, which, again, indicatesthe potential of the proposed architecture.

One last pertinent comment is related to the variability ofthe MSE values obtained with each network: it is possible toinfer from Table I, with the aid of the standard deviationvalues, that the proposed model, even in the worst case,offers a significant improvement in the performance, for bothtraining and test sequences, when compared with the ESNand the ASE-ESN.

C. Second Channel

In 1999, Montalvao et al. [18] demonstrated that feed-forward structures, either linear or nonlinear, are not suita-ble tools for the equalization of communication channelsthat present coincident states, i.e., whose transfer functionpresents zeros over the unit circle. Therefore, this class ofchannels demands the use of feedback connections in theequalizer in order to allow a correct separation of the states.One example of such class is the channel with transferfunction 𝐻(𝑧) = 1 + 𝑧−1: both the input sequences ofsymbols (𝑠(𝑘) = −1, 𝑠(𝑘−1) = 1) and (𝑠(𝑘) = 1, 𝑠(𝑘−1) =−1) generate the value 𝑟(𝑘) = 0.

For this second scenario, Table II displays the obtainedAMSE values with each echo state network.

TABLE II

AMSE VALUES OBTAINED WITH EACH ESN FOR CHANNEL 2.


ESN𝑁 = 10 7.97(±4.87)e-02 7.58(±4.84)e-02𝑁 = 40 1.96(±2.22)e-03 2.05(±1.78)e-03𝑁 = 60 1.54(±1.58)e-04 7.34(±9.52)e-04



ASE-ESN 𝑁𝑝𝑐 = 6 5.90(±7.52)e-04 1.56(±1.83)e-03

The obtained results confirm that the presence of recurrentconnections within the reservoir enables the ESNs to main-tain a memory of the past received signals, which is decisiveto an adequate recovery of the original information. Thismeans that the idea of fixing the reservoir weights, whichsimplifies the training process of the recurrent architecture,still provides sufficient processing capability for the network.

However, the proposed network was not able to offerany improvement in terms of AMSE, as we can observein Table II. The main reason behind the slight performancedegradation for the proposed network can be associated withthe essence of this equalization problem, which may be moredependent on the existence of feedback connections than onthe intrisic difficulty of the required mapping. Nevertheless,we emphasize that the performance of the proposed networkwas quite satisfactory.

D. Third Channel

Let 𝐻(𝑧) = 0.5 + 0.71𝑧−1 + 0.5𝑧−2 be the transferfunction of the channel. According to Proakis [19], this isthe linear channel with three coefficients that introduces themost severe distortions into the transmitted signal.

Firstly, as performed in the previous scenarios, we shallanalyze the performance of each echo state network consid-ering the noiseless case. The obtained results are presentedin Table III.

The AMSE values displayed in Table III clearly reveal theperformance gain associated with the novel structure. In fact,

585

the application of a nonlinear output layer, which forms thebasic idea of the proposal, led to a significant performanceimprovement with respect to the ESN and the ASE-ESN.

TABLE III



ESN𝑁 = 10 3.03(±1.07)e-01 3.08(±1.13)e-01𝑁 = 40 4.73(±1.51)e-02 5.18(±1.61)e-02𝑁 = 60 1.72(±0.74)e-02 2.01(±0.68)e-02



ESN 𝑁𝑝𝑐 = 6 5.56(±5.95)e-04 6.60(±7.20)e-04𝑁𝑝𝑐 = 10 9.00(±8.40)e-07 1.01(±1.88)e-05

It is pertinent to remark that employing the method of [4]for designing the reservoir weight matrix of the proposedarchitecture also offered a good performance, although thebenefits were not as significant as occured with the proposedESN.

Now, we consider the case where the received samples arecorrupted with a Gaussian additive noise. This means thatthe inputs of the echo state networks are given by 𝑢(𝑘) =𝑟(𝑘) + 𝜎𝒩 (0, 1), where 𝜎 denotes the standard deviation ofthe noise and 𝒩 (0, 1) corresponds to a zero-mean and unit-variance Gaussian random variable.

Assuming 𝑁 = 60 for both the ESN and the ASE-ESN,and 𝑁𝑝𝑐 = 6 for the proposed network, we performed 20independent experiments for each network considering fivedifferent signal-to-noise values. The obtained AMSE valuesare exhibited in Table IV. With respect to the design of thereservoir, we adopted the procedure in [3] for the proposednetwork, since this choice led to a better performance in thenoiseless case.

TABLE IV

AMSE VALUES ASSOCIATED WITH EACH ECHO STATE NETWORK

VERSUS THE SNR.

Channel 3 AMSESNR Network Training Test

0 dBESN 8.0714e-01 9.3295e-01

ASE-ESN 8.0659e-01 1.0009e-00Proposed ESN 8.0567e-01 9.1407e-01

5 dBESN 6.6104e-01 7.5504e-01


10 dBESN 3.8648e-01 4.4366e-01


15 dBESN 1.6712e-01 1.9371e-01


20 dBESN 7.6860e-02 9.0315e-02


As shown in Table IV, there is a significant increase in thereachable error levels as the noise becomes more pronounced.Nonetheless, for all the values of SNR, the proposed networkled to the best performance, albeit the difference betweenthe AMSE values associated with each network is relativelysmall.

E. Fourth Channel

The next channel involved in this work is defined by thetransfer function 𝐻(𝑧) = 0.38+0.6𝑧−1+0.6𝑧−2+0.38𝑧−3,and presents coincident states, which, as already mentionedin Section V-C, can only be adequately separated usinga structure endowed with feedback connections. Moreover,as remarked in [19], this channel corresponds to the four-coefficient channel that poses the most severe distortions tothe transmitted signal.

The procedure adopted for the analysis of the performanceof the ESNs followed the methodology described in SectionV-A, with the difference that the number of samples usedduring the training and test of each network is increasedto 5500, where the first 500 samples are excluded to avoidtransient effects. The obtained results are shown in Table V.

TABLE V



ESN𝑁 = 10 5.24(±0.89)e-01 5.25(±0.90)e-01𝑁 = 40 1.74(±0.16)e-01 1.73(±0.19)e-01𝑁 = 60 1.37(±0.11)e-01 1.39(±0.11)e-01



ESN 𝑁𝑝𝑐 = 6 3.78(±1.33)e-02 3.85(±1.35)e-02𝑁𝑝𝑐 = 10 1.89(±0.92)e-03 2.27(±1.43)e-03

As we can observe in Table V, the proposed networkhas achieved significantly better results than those associatedwith the ESN and the ASE-ESN, which means that, inthis case, not only the presence of recurrent connectionsis necessary, but also a nonlinear output layer is definetelyessencial for an adequate separation of the channel states.These results, along with those reported in Section V-D, showthat, when the distortions in the transmitted signal becomemore pronounced, the flexibility provided by the proposednetwork is really capable of leading to more effective channelequalization.

F. Fifth Channel

Finally, we consider the case where the channel is nonli-near, as shown in the following expression:

𝑦𝑐ℎ𝑎𝑛𝑛𝑒𝑙(𝑘) = 𝑦(1)(𝑘) + 0.25𝑦2(1)(𝑘), (11)

where 𝑦(1)(𝑘) represents the outputs of the first linear chan-nel defined in Equation (10).

586

The nonlinear nature of the channel poses additionaldifficulties to the equalization task, so that the ability of per-forming nonlinear mappings represents a decisive element forenabling an adequate recovery of the information. Therefore,this scenario is very useful for the analysis of the processingcapability of each ESN. Ultimately, the importance of thisscenario is also emphasized by the evidence that, in certaincontexts, like satellite communications, nonlinear channelsmay be good representations of naturally occurring channels.

In Table VI, the performance obtained with the ESNs ispresented.

TABLE VI

PERFORMANCE OF THE ESNS FOR THE NONLINEAR CHANNEL.


ESN𝑁 = 10 5.66(±1.84)e-01 5.70(±1.85)e-01𝑁 = 40 9.97(±3.94)e-02 1.04(±0.38)e-01𝑁 = 60 4.38(±2.26)e-02 5.34(±2.77)e-02



ESN 𝑁𝑝𝑐 = 6 2.19(±2.20)e-05 2.86(±3.25)e-05𝑁𝑝𝑐 = 10 1.22(±1.34)e-08 3.21(±5.19)e-07

The results shown in Table VI clearly reveal the advan-tages of the proposed network in this case: indeed, the AMSEvalues achieved are many orders of magnitude smaller thanthose associated with the ESN and the ASE-ESN.

Therefore, based on these evidences, it is possible to affirmthat the use of a more flexible structure as the output layer ofthe echo state network, like the Volterra filter, was certainlya key factor for the good performance achieved by theproposed ESN architecture, especially in the cases for whichthe distortions became more severe. Furthermore, with theaid of the PCA technique, the verified improvement in thequality of the equalization was reached without a significantincrease in the complexity of the network training process,which is another attractive feature offered by the proposal.

VI. CONCLUSIONS

In this paper, we have presented a novel echo state networkarchitecture, characterized by the use of PCA compressionand a Volterra-based output layer. These extensions providedsignificant improvements in the capability of extracting theinformation related to the echo states, as well as in theability to deal with nonlinear behaviors, which are essentialcharacteristics within the spirit of the ESNs.

The channel equalization problem has been selected as theapplication domain for the proposed network and the otherESNs, especially because of its intrinsic trade-off betweenprocessing capability and computational cost. The resultsobtained in this work not only have demonstrated the ade-quacy of the echo state approach in the equalization problem,but also reveal the increment in performance brought bythe proposed network. This becomes more evident in the

cases where the channel introduces severe distortions intothe signals, particularly in the nonlinear channel case, whenthe proposed network obtained a remarkable performance.

There still remain many different issues to be considered infuture works, like, for example, the design of the dynamicalreservoir. As analyzed in Section V, the procedure adoptedto select the reservoir weight matrix exerts a significantinfluence over the performance of the ESNs. In the contextof supervised channel equalization, the original proposal ofJaeger led to a superior performance in almost all scenarios.However, this does not necessarily mean that this procedurewill always be the most adequate choice, as, indeed, has beenverified in [4]. This means that deeper investigations on thissubject are necessary. Fortunately, the proposed architecturecan be straightforwardly adapted to different proposals forthe reservoir design, which means that it may also profit fromthese studies. Moreover, it should be interesting to extendthe application of the proposed network to the case of blindchannel equalization and to other classical problems, like thatof time series prediction.

REFERENCES

[1] P. J. Werbos, “Backpropagation through time: what it does and howto do it,” in Proceedings of the IEEE, vol. 78, no. 10, 1990, pp. 1550–1560.

[2] R. Williams and D. Zipser, “A learning algorithm for continuallyrunning fully recurrent neural networks,” Neural Computation, vol. 1,pp. 270–280, 1989.

[3] H. Jaeger, “The echo state approach to analyzing and training recurrentneural networks,” Bremem: German National Research Center forInformation Technology, Tech. Rep. GMD Report 148, 2001.

[4] M. C. Ozturk, D. Xu, and J. C. Principe, “Analysis and design of echostate networks,” Neural Computation, vol. 19, pp. 111–138, 2007.

[5] V. J. Mathews, “Adaptive polynomial filters,” IEEE Signal ProcessingMagazine, vol. 8, pp. 10–26, 1991.

[6] R. E. Bellman, Dynamic Programming. Princeton University Press,1957.

[7] I. T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.[8] A. Hyvarinen, J. Karhunen, and E. Oja, Independent component

analysis. John Wiley & Sons, 2001.[9] S. Haykin, Adaptive filter theory, 3rd ed. NJ: Prentice Hall, 1996.

[10] T. Adali, “Why a nonlinear solution for a linear problem?” in Proceed-ings of IEEE Workshop on Neural Networks for Signal Processing,1999, pp. 157–165.

[11] H. Jaeger, “Short term memory in echo state networks,” Bremem:German National Research Center for Information Technology, Tech.Rep. 152, 2002.

[12] ——, “Adaptive nonlinear system identification with echo state net-works,” in Advances in Neural Information Processing Systems, 2003,pp. 593–600.

[13] H. Jaeger and H. Hass, “Harnessing nonlinearity: Predicting chaoticsystems and saving energy in wireless communication,” Science, vol.304, no. 5667, pp. 78–80, 2004.

[14] M. Salmen and P. Ploger, “Echo-state networks used for motorcontrol,” Robotics and Automation, vol. 18, pp. 1953–1958, 2005.

[15] R. Sacchi, M. Ozturk, J. Principe, A. Carneiro, and I. da Silva, “Waterinflow forecasting using the echo state network: a brazilian case study,”in Proceedings of International Joint Conference on Neural Networks(IJCNN 2007), 2007, pp. 2403–2408.

[16] M. Kendall, Multivariate analysis. Charles Griffin & Company, 1975.[17] D. Erdogmus and J. Prıncipe, “From linear adaptive filtering to nonli-

near information processing - the design and analysis of informationprocessing systems,” Signal Processing Magazine, IEEE, vol. 23, pp.14–33, 2006.

[18] J. Montalvao, B. Dorizzi, and J. C. M. Mota, “Some theoreticallimits of efficiency of linear and nonlinear equalizer,” Journal ofCommunications and Information Systems, vol. 14, pp. 85–92, 1999.

[19] J. G. Proakis, Digital Communications. McGraw-Hill, 1995.

587

[IEEE 2011 International Joint Conference on Neural Networks (IJCNN 2011 - San Jose) - San Jose, CA,...

Documents

Transcript of [IEEE 2011 International Joint Conference on Neural Networks (IJCNN 2011 - San Jose) - San Jose, CA,...