Research Article
Fuzzy Wavelet Neural Network Using a Correntropy Criterion for Nonlinear System Identification
Leandro L. S. Linhares,1,2 Aluisio I. R. Fontes,1,2 Allan M. Martins,3 Fábio M. U. Araújo,2 and Luiz F. Q. Silveira2
1 Postgraduate Program in Electrical and Computer Engineering (PPgEEC), Federal University of Rio Grande do Norte, 59078-970 Natal, RN, Brazil
2 Department of Computer Engineering, Federal University of Rio Grande do Norte, 59078-970 Natal, RN, Brazil
3 Department of Electrical Engineering, Federal University of Rio Grande do Norte, 59078-970 Natal, RN, Brazil
Correspondence should be addressed to Aluisio I. R. Fontes; aluisioigor@gmail.com
Received 17 September 2014; Accepted 28 November 2014
Academic Editor: Yudong Zhang
Copyright © 2015 Leandro L. S. Linhares et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Recent researches have demonstrated that Fuzzy Wavelet Neural Networks (FWNNs) are an efficient tool to identify nonlinear systems. In these structures, features related to fuzzy logic, wavelet functions, and neural networks are combined in an architecture similar to the Adaptive Neurofuzzy Inference Systems (ANFIS). In practical applications, the experimental data set used in the identification task often contains unknown noise and outliers, which decrease the FWNN model reliability. In order to reduce the negative effects of these erroneous measurements, this work proposes the direct use of a similarity measure based on information theory in the FWNN learning procedure. The Mean Squared Error (MSE) cost function is replaced by the Maximum Correntropy Criterion (MCC) in the traditional error backpropagation (BP) algorithm. The input-output maps of a real nonlinear system studied in this work are identified from an experimental data set corrupted by different outlier rates and additive white Gaussian noise. The results demonstrate the advantages of the proposed cost function using the MCC as compared to the MSE. This work also investigates the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is the only free parameter of correntropy.
1 Introduction
System identification is a modeling procedure where the mathematical representation of the input-output maps of dynamical systems can be obtained with the aid of experimental data. This procedure is a prominent alternative for the efficient modeling of complex systems without the need for complex mathematical concepts. For this reason, system identification plays an important role in control engineering related tasks such as classification and decision making, monitoring, control, and prediction [1–8].
Artificial Neural Networks (ANNs) represent one of the most successful identification techniques used to model nonlinear dynamical systems [9]. This is due to their ability to learn by examples, associated with intrinsic robustness and nonlinear characteristics [10–13]. Recently, a wide variety of network structures have been used to model the input-output maps of nonlinear systems [5, 14, 15]. Multilayer Perceptron (MLP), Radial Basis Function (RBF) networks, neurofuzzy hybrid structures, for example, Adaptive Neurofuzzy Inference Systems (ANFIS), and Wavelet Neural Networks (WNNs) are examples of ANNs commonly used in applications involving nonlinear systems [9, 13, 16, 17].
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 678965, 12 pages, http://dx.doi.org/10.1155/2015/678965

WNNs combine the flexibility of ANNs and the curve fitting ability of wavelet functions [18–20]. Besides, they can be improved in terms of extending the domain of validity by the addition of an extra layer of fuzzy structures, to achieve the coarse delimitation of the universe of discourse, resulting in Fuzzy Wavelet Neural Networks (FWNNs) [5]. The architecture of the FWNN is very close to that of the traditional ANFIS [21], although wavelets are used as membership functions (MFs) [22, 23] or in the consequent part of the fuzzy rules through the use of WNNs as local models. In the literature it is possible to find several research works applying FWNNs to deal with modeling, control, function approximation, and nonlinear system identification, among others [6, 24–28].
In [29], Linhares et al. evaluate an alternative FWNN structure to identify the nonlinear dynamics of a multisection liquid tank. The proposed structure is similar to the ones presented by Yilmaz and Oysal [5], Abiyev and Kaynak [6], and Lu [24]. However, the FWNN presented in [29] uses only wavelets in the consequent part of the fuzzy rules. The wavelets in each node of the FWNN consequent layer are weighted by the activation signals of the fuzzy rules. Therefore, the local models of such FWNN are solely represented by a set of wavelet functions, which differs from [5, 6, 24]. The results presented in [29] demonstrate that the modified FWNN structure maintains the generalization capability and other important features presented by traditional FWNNs, despite the reduction in the complexity of these structures.
In practical applications, the experimental data set used in the identification procedure is often corrupted by unknown noise and outliers. Outliers are incorrect measurements which markedly deviate from the typical ranges of the other observations [30]. The main sources of outliers are sporadic malfunctions of sensors and equipment [31]. The presence of noise and outliers in experimental data negatively affects the performance and reliability of the dynamical model under identification, because the model tries to fit such undesired measurements [30, 32, 33]. Despite the many outlier detection methods presented in the literature, many approaches are not able to detect all the outliers. Therefore, the resulting data obtained after the application of such methods may still be contaminated with outliers [30, 31].
Generally, the learning process of neural networks is based on a gradient method, for example, the classical error backpropagation (BP) algorithm, which uses the Mean Squared Error (MSE) as its cost function. However, the applicability of the MSE to obtain a model that represents an input-output relationship is optimal only if the probability distribution function (pdf) of the errors is Gaussian [34], whereas the error distribution in most cases is non-Gaussian and nonlinear [8]. In the literature, some researches demonstrate that the use of the Maximum Correntropy Criterion (MCC), replacing the traditional MSE, is an effective approach to handle the problem of prediction and identification when the dynamical system has unknown noise and outliers [7, 8, 30, 35]. The correntropy evaluation allows the extraction of additional information from the available data, because such similarity measure takes into account all the moments of a probability distribution, which are typically not observed by the MSE [7].
In this work, the reliability of the FWNN recently proposed in [29] is evaluated when different percentages of outliers and noise contaminate the experimental data used to identify a nonlinear system. The aforementioned neural network is used to identify the dynamic relationship between the input and output of a multisection liquid tank. In order to train the FWNN, the BP algorithm is used, although the traditional MSE cost function is replaced by the Maximum Correntropy Criterion using an adaptive adjustment of its kernel size, which is the free parameter of the MCC. The models obtained using each one of the quality measures are properly evaluated and compared. Despite the advantages of correntropy over the MSE, little effort has been reported towards the application of correntropy to identify nonlinear systems using neural networks [7, 8]. The results presented in this work demonstrate that the FWNN architecture proposed in [29] is less sensitive to the presence of outliers and noise when it is trained using the MCC. In addition, this work also investigates the influence of the kernel size on the performance of the MCC in the BP algorithm.
This paper is organized as follows. Section 2 presents the definition and the basic mathematical theory of the correntropy similarity measure. Section 3 describes the FWNN proposed in [29], which is applied in this work to identify an experimental nonlinear dynamical system considering the presence of outliers and noise. Section 4 presents the updating equations of the BP algorithm, which are modified according to the MCC. Section 5 describes the proposed identification architecture in detail. Section 6 presents the multisection liquid tank under study, while the performance of the FWNN models obtained using the MSE and MCC cost functions is evaluated considering the presence of both outliers and noise in experimental data. Finally, concluding remarks are given in Section 7.
2 Correntropy
Correntropy is a generalized similarity measure between two arbitrary scalar random variables $X$ and $Y$, defined by [36]

$$V_\sigma(X, Y) = E_{XY}\left[k_\sigma(X, Y)\right] = \iint k_\sigma(x, y)\, p_{XY}(x, y)\, dx\, dy, \tag{1}$$

where $p_{XY}$ is the joint probability distribution, $E[\cdot]$ is the expectation operator, and $k_\sigma(\cdot, \cdot)$ is a symmetric positive definite kernel. In this work, $k_\sigma(\cdot, \cdot)$ is a Gaussian kernel given as

$$k_\sigma(x_j, x_i) = K_\sigma(x_j - x_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x_j - x_i)^2}{2\sigma^2}\right], \tag{2}$$

where $\sigma$ is the variance, defined as the kernel size. The kernel size may be interpreted as the resolution with which correntropy measures similarity in a space with characteristics of high dimensionality [36].
By applying a Taylor series expansion to the Gaussian function in (1) and assuming that all the moments of the joint pdf are finite, such equation becomes

$$V_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{k=0}^{\infty} \frac{(-1)^k}{2^k \sigma^{2k} k!}\, E\left[(X - Y)^{2k}\right]. \tag{3}$$
In practice, the joint pdf in (1) is unknown and only a finite amount of data $\{(x_i, y_i)\}_{i=1}^{N}$ is available, leading to the sample correntropy estimator defined by

$$\hat{v}_\sigma(X, Y) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(x_i, y_i). \tag{4}$$
Correntropy involves all the even moments of the difference between $X$ and $Y$. Compared with the MSE, $E[(X - Y)^2]$, which is a quadratic function in the joint input space, correntropy includes second-order and higher-order statistical information [37]. However, for sufficiently large values of $\sigma$, the second-order moment is predominant and the measure approaches correlation [38].
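The contrast between the two measures can be sketched numerically. The snippet below is an illustrative implementation of the Gaussian kernel (2) and the sample estimator (4) — the function names are ours, not the paper's — showing that a single outlier dominates the MSE while barely moving the correntropy value:

```python
import math

def gaussian_kernel(e, sigma):
    """Gaussian kernel K_sigma(e) of Eq. (2)."""
    return math.exp(-e * e / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def sample_correntropy(x, y, sigma):
    """Sample correntropy estimator of Eq. (4)."""
    return sum(gaussian_kernel(xi - yi, sigma) for xi, yi in zip(x, y)) / len(x)

def mse(x, y):
    """Mean Squared Error between two sequences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

x = [0.0] * 10
y = [0.1] * 9 + [100.0]   # nine small errors plus one gross outlier
# mse(x, y) is dominated by the outlier, while sample_correntropy(x, y, 1.0)
# stays close to its outlier-free value.
```

Because the Gaussian kernel of a 100-unit error is essentially zero, the outlier contributes almost nothing to the correntropy sum, whereas under the MSE it alone contributes $100^2/10 = 1000$ to the average.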
Nowadays, correntropy has been successfully used in a wide variety of applications where the signals are non-Gaussian or nonlinear, for example, automatic modulation classification [39], classification systems of pathological voices [40], and principal component analysis (PCA) [41].
2.1 Maximum Correntropy Criterion for Model Estimation. The correntropy concept can be extended to model estimation. The variable $X$ can be considered as a mathematical expression of the unknown function $f(X, w)$, where $X$ is an input set, $X = \{x_i \in \mathbb{R}^m\}_{i=1}^{N}$, and the model parameters are $w = [w_1, \ldots, w_m]^T$, which approximates the dependence on an output set $Y = \{y_i \in \mathbb{R}\}_{i=1}^{N}$ [42].
Therefore, it is possible to determine the optimal solution for the MCC from (4) as [43]

$$\arg\max_{w} J(w) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma\left(f(x_i, w), y_i\right) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(e_i), \tag{5}$$

where $e_i = f(x_i, w) - y_i$, $i = 1, \ldots, N$, are the errors generated by the model during the supervised learning for each of the $N$ training samples. It is worth mentioning that such criterion is used as the cost function of the BP algorithm to adjust the parameters of the FWNN.
One of the advantages of using correntropy in system identification lies in the robustness of such measure against impulsive noise, due to the use of the Gaussian kernel in (5), which is close to zero, that is, $k_\sigma(f(X, w), Y) \approx 0$, when $f(X, w)$ or $Y$ is an outlier. Correntropy is positive and bounded, and for the Gaussian kernel it gives $0 < \hat{v}_\sigma(X, Y) \le 1/(\sqrt{2\pi}\,\sigma)$.

The Gaussian variance (also called kernel size) is a free parameter that must be selected by the user [38]. Therefore, when correntropy is estimated, the resulting values depend on the selected kernel size. In addition, the kernel size of correntropy influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness to impulsive noise during adaptation [37, 43]. If the training data size is not large enough, the kernel size must be chosen considering tradeoffs between outlier rejection and estimation efficiency [44].
Some approaches can be employed to determine the kernel size, for example, the statistical method [45], Silverman's rule [46], cross-validation techniques [47, 48], and the shape of the prediction error distribution [44]. This work uses an adaptive kernel size algorithm [42], which is given by

$$\sigma = \frac{\max_i \left|e_i\right|}{2\sqrt{2}}, \quad i = 1, \ldots, N. \tag{6}$$
In order to assess the improved performance of an adaptive kernel size over fixed ones, Section 6 shows how the error evolves during the FWNN training for different values of the kernel size.
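As a minimal sketch (the function name is ours, not the paper's), the adaptive rule of (6) reduces to a one-liner over the current error vector:

```python
import math

def adaptive_kernel_size(errors):
    """Adaptive kernel size of Eq. (6): sigma = max_i |e_i| / (2 * sqrt(2))."""
    return max(abs(e) for e in errors) / (2.0 * math.sqrt(2.0))
```

Recomputing sigma from each epoch's errors ties the kernel width to the current error spread: large early errors yield a wide kernel, and the kernel narrows as training converges.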
3 Fuzzy Wavelet Neural Networks
3.1 Brief Review. Wavelets are obtained by scaling and translating a special function $\psi(x)$, localized in both time/space and frequency, called the mother wavelet, which can be defined in such a way as to serve as a basis to describe other functions. Wavelets are extensively used in the fields of signal analysis, identification and control of dynamical systems, computer vision, and computer graphics, among other applications [49–52]. Given $\psi(x)$, the corresponding family of wavelets is obtained by

$$\Psi_j(\mathbf{x}) = \left|\mathbf{d}_j\right|^{-1/2} \psi\left(\frac{\mathbf{x} - \mathbf{t}_j}{\mathbf{d}_j}\right), \quad \mathbf{d}_j \ne 0,\ j = 1, 2, \ldots, k, \tag{7}$$

where $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$ and $\Psi_j(\mathbf{x})$ is obtained from $\psi(\mathbf{x})$ by scaling it by a factor $\mathbf{d}_j = \{d_{1j}, d_{2j}, \ldots, d_{nj}\}$ and translating it by $\mathbf{t}_j = \{t_{1j}, t_{2j}, \ldots, t_{nj}\}$.
A WNN is a nonlinear regression structure that can represent input-output maps by combining wavelets with appropriate scalings and translations [53]. The output of a WNN is determined as follows:

$$\hat{y} = \sum_{j=1}^{k} w_j \Psi_j(\mathbf{x}) = \sum_{j=1}^{k} w_j \left|\mathbf{d}_j\right|^{-1/2} \psi\left(\frac{\mathbf{x} - \mathbf{t}_j}{\mathbf{d}_j}\right), \tag{8}$$

where $w_j$ are the synaptic weights, $\mathbf{x}$ is the input vector, and $\mathbf{d}_j$ and $\mathbf{t}_j$ are parameters characterizing the wavelets.
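For concreteness, a scalar-input WNN built on the Mexican hat wavelet later adopted in (12) can be sketched as follows (an illustrative sketch with our own naming, not code from the paper):

```python
import math

def mexican_hat(x, t, d):
    """Mexican hat wavelet as in Eq. (12), translated by t and dilated by d > 0."""
    u = (x - t) / d
    return (1.0 - u * u) * math.exp(-0.5 * u * u) / math.sqrt(d)

def wnn_output(x, weights, translations, dilations):
    """Scalar-input WNN output of Eq. (8): a weighted sum of wavelets."""
    return sum(w * mexican_hat(x, t, d)
               for w, t, d in zip(weights, translations, dilations))
```

Each wavelet contributes a localized "bump" around its translation $t$, and the weights $w_j$ blend these bumps into the approximated input-output map.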
In a concise manner, the purpose of FWNNs is to incorporate WNNs into the ANFIS structure in order to obtain faster convergence and better approximation capabilities, eventually with a greater number of parameters to be adjusted. The fuzzy rules allow tackling the uncertainties, while the wavelets contribute to improving the accuracy in the process of approximating input-output maps [6].
3.2 FWNN Architecture. A particular instance of FWNN proposed in [29] is applied in this work to identify a real nonlinear system, investigating its performance and reliability when the experimental data set is corrupted by unknown noise and outliers. In this FWNN architecture, the consequent part of the fuzzy rules is described only by wavelet functions, which differs from other structures such as those proposed in [5, 6, 24]. The basic architecture of the FWNN can be seen in Figure 1, and its layers are described as follows.
Figure 1 Graphical representation of the FWNN architecture
Layer 1. The input layer just transfers the input signal vector $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$ to the next layer.
Layer 2. In the fuzzification layer, the membership functions are parameterized to match the specific requirements of a variety of applications. For instance, a Gaussian membership function can be described by the following equation:

$$A_{qr}(x_q) = \exp\left[-\frac{1}{2}\left(\frac{x_q - a_{qr}}{b_{qr}}\right)^2\right], \tag{9}$$

where, for $q = 1, 2, \ldots, n$ and $r = 1, 2, \ldots, k_q$, $A_{qr}$ is associated with the $r$th membership function appearing in a given rule and evaluated for the $q$th component of the input vector. The adjustable parameters are $a_{qr}$ and $b_{qr}$, representing the center and width of the membership function, respectively.
Layer 3. This is the inference layer. Assuming that there are $m$ rules, where $R_i$ is a given rule and $i = 1, 2, \ldots, m$, each rule produces an output $\mu_i$ by aggregating the $A_{qr}$ using a T-norm. The output of the $p$th rule in this layer is

$$\mu_p = \prod_{q=1}^{n} A_{q r_q}(x_q), \quad p = 1, 2, \ldots, m, \tag{10}$$

where $r_1 = 1, \ldots, k_1$; $r_2 = 1, \ldots, k_2$; $\ldots$; $r_n = 1, \ldots, k_n$.
All the rule outputs of this layer are added up at the summation node located between Layers 3 and 4. The output $\beta$ of this node is later used in the normalization stage.
Layer 4. In the normalization layer, the normalized activation of the $i$th rule, denoted by $\bar{\mu}_i$, is given by

$$\bar{\mu}_i = \frac{\mu_i}{\beta} = \frac{\mu_i}{\sum_{i=1}^{m} \mu_i}, \quad i = 1, 2, \ldots, m. \tag{11}$$
Layer 5. This is the consequent layer of the FWNN. In this work, the Mexican hat family of wavelets is adopted, as in [5, 6, 54]. Its mathematical representation is given by

$$\psi(x) = \frac{1}{\sqrt{d}}\left(1 - \left(\frac{x - t}{d}\right)^2\right) \exp\left[-0.5\left(\frac{x - t}{d}\right)^2\right]. \tag{12}$$
The inputs of the wavelet layer are the normalized weights $\bar{\mu}_i$ and the input vector $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$, while the outputs of this layer, represented by $f_j$, are given by

$$f_j = \bar{\mu}_j \Psi_j, \quad j = 1, 2, \ldots, m,$$

$$f_j = \bar{\mu}_j \sum_{i=1}^{n} \frac{1}{\sqrt{d_{ij}}}\left(1 - T_{ij}^2\right) \exp\left[-0.5\, T_{ij}^2\right], \tag{13}$$

where the term $T_{ij} = (x_i - t_{ij})/d_{ij}$, $d_{ij} > 0$, is adopted to simplify the mathematical notation and $n$ is the number of wavelet functions in a node of Layer 5.
Layer 6. In the output layer, all signals from the wavelet neurons are summed up as follows:

$$\hat{y} = \sum_{j=1}^{m} f_j. \tag{14}$$
By observing Figure 1 and considering (9) to (14), it is possible to notice that the FWNN adjustable parameters are located in the second and fifth layers. The membership functions and wavelet functions are adjusted according to the application, using any learning algorithm, such as the BP algorithm.
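The six layers described above amount to a short forward pass. The sketch below uses our own illustrative naming and data layout (one Gaussian membership pair and one wavelet pair per input, per rule) and follows (9)–(14); it is not the authors' code:

```python
import math

def gaussian_mf(x, a, b):
    """Gaussian membership function of Eq. (9): center a, width b."""
    return math.exp(-0.5 * ((x - a) / b) ** 2)

def fwnn_forward(x, mf_params, wavelet_params):
    """One forward pass through the six FWNN layers for input vector x.

    mf_params[j]      -> list of (a, b) pairs, one per input, for rule j
    wavelet_params[j] -> list of (t, d) pairs, one per input, for rule j
    """
    # Layers 2-3: fuzzification and product-inference rule activations, Eq. (10).
    mu = [math.prod(gaussian_mf(xi, a, b) for xi, (a, b) in zip(x, rule))
          for rule in mf_params]
    beta = sum(mu)                   # summation node between Layers 3 and 4
    mu_bar = [m / beta for m in mu]  # Layer 4: normalization, Eq. (11)
    # Layer 5: normalized activations weight the Mexican hat sums, Eq. (13).
    f = []
    for mb, rule in zip(mu_bar, wavelet_params):
        psi = sum((1.0 - ((xi - t) / d) ** 2)
                  * math.exp(-0.5 * ((xi - t) / d) ** 2) / math.sqrt(d)
                  for xi, (t, d) in zip(x, rule))
        f.append(mb * psi)
    return sum(f)                    # Layer 6: output, Eq. (14)
```

With a single rule centered on the input, the normalized activation is 1 and the output reduces to the rule's wavelet sum, which makes the layer-by-layer flow easy to check by hand.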
4 Error Backpropagation Algorithm with MCC
The classical BP algorithm is the learning algorithm used in this work to adjust the free parameters of the FWNN models. According to [54], this algorithm is probably the most frequently used technique to train a FWNN. Despite its functionality, it presents some shortcomings, such as the fact that it may get stuck in a local minimum of the error surface and that the training convergence rate is generally slow [55–57]. However, it is well known that the use of wavelet functions in neural network structures reduces such inconveniences [6, 58].
A neural system should be designed to present a desired behavior; hence, it is necessary to define a cost function for this task. It provides an evaluation of the quality of the solution obtained by the neural model [59]. Gradient-based learning algorithms, such as the BP algorithm, require the differentiation of the chosen cost function with respect to the adjustable parameters of the FWNN model. Therefore, it is necessary to obtain the partial derivatives of the chosen cost function with respect to the parameters $d_{ij}$ and $t_{ij}$ of the wavelets and the parameters $a_{qr}$ and $b_{qr}$ of the membership functions $A_{qr}$.
Typically, MSE is the cost function used with the BP algorithm [10]. Such classical cost criterion is replaced by the MCC in this work, in order to increase the reliability of the FWNN model when the identified dynamical system presents outliers and noise. When using the MCC, the main goal is to maximize the correntropy similarity measure between two random process variables. In the FWNN learning procedure, such variables are the desired output $y_d$ and the estimated output $\hat{y}$ provided by the FWNN model. Considering the estimation error of the FWNN model given by $e = y_d - \hat{y}$, maximizing the MCC is equivalent to minimizing

$$\mathcal{E}_{av} = \frac{1}{N} \sum_{n=1}^{N}\left[\frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n)\right], \tag{15}$$
where $N$ is the number of samples in the experimental data. Equation (15) corresponds to the cost function used during the minimization process of the BP algorithm applied to adjust the parameters of the FWNN models. As such parameters are adjusted sequentially, (16) defines the instantaneous correntropy used to update the wavelet function and membership function parameters of the FWNN after each training pair is presented to the network. Consider

$$\mathcal{E} = \frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n). \tag{16}$$
By differentiating $\mathcal{E}$ with respect to $d_{ij}$ and $t_{ij}$, it gives

$$\frac{\partial \mathcal{E}}{\partial d_{ij}} = -\frac{e_n K_\sigma(e_n)\, \bar{\mu}_i \exp\left[-0.5\, T_{ij}^2\right]\left(3T_{ij}^2 - T_{ij}^4\right)}{\sigma^2 \sqrt{d_{ij}^3}},$$

$$\frac{\partial \mathcal{E}}{\partial t_{ij}} = -\frac{e_n K_\sigma(e_n)\, \bar{\mu}_i \exp\left[-0.5\, T_{ij}^2\right]\left(3T_{ij} - T_{ij}^3\right)}{\sigma^2 \sqrt{d_{ij}^3}}. \tag{17}$$
Now, differentiating $\mathcal{E}$ with respect to $a_{qr}$ and $b_{qr}$, it gives

$$\frac{\partial \mathcal{E}}{\partial a_{qr}} = \frac{\partial \mathcal{E}}{\partial \hat{y}}\, \frac{\partial \hat{y}}{\partial \Psi}\, \frac{\partial \Psi}{\partial \bar{\mu}}\, \frac{\partial \bar{\mu}}{\partial \mu}\, \frac{\partial \mu}{\partial A}\, \frac{\partial A}{\partial a_{qr}},$$

$$\frac{\partial \mathcal{E}}{\partial b_{qr}} = \frac{\partial \mathcal{E}}{\partial \hat{y}}\, \frac{\partial \hat{y}}{\partial \Psi}\, \frac{\partial \Psi}{\partial \bar{\mu}}\, \frac{\partial \bar{\mu}}{\partial \mu}\, \frac{\partial \mu}{\partial A}\, \frac{\partial A}{\partial b_{qr}}, \tag{18}$$
where, from (9) with center $a_{qr}$ and width $b_{qr}$,

$$\frac{\partial A_{qr}}{\partial a_{qr}} = \begin{cases} \dfrac{A_{qr}\left(x_q - a_{qr}\right)}{b_{qr}^2}, & \text{if } x_q \ne a_{qr},\\[4pt] 0, & \text{if } x_q = a_{qr}, \end{cases}$$

$$\frac{\partial A_{qr}}{\partial b_{qr}} = \begin{cases} \dfrac{A_{qr}\left(x_q - a_{qr}\right)^2}{b_{qr}^3}, & \text{if } x_q \ne a_{qr},\\[4pt] 0, & \text{if } x_q = a_{qr}. \end{cases} \tag{19}$$
Following the delta rule mentioned in [10], the parameters of the proposed FWNN are updated as follows:

$$d_{ij}(k+1) = d_{ij}(k) - \eta \frac{\partial \mathcal{E}}{\partial d_{ij}},$$
$$t_{ij}(k+1) = t_{ij}(k) - \eta \frac{\partial \mathcal{E}}{\partial t_{ij}},$$
$$a_{qr}(k+1) = a_{qr}(k) - \eta \frac{\partial \mathcal{E}}{\partial a_{qr}},$$
$$b_{qr}(k+1) = b_{qr}(k) - \eta \frac{\partial \mathcal{E}}{\partial b_{qr}}, \tag{20}$$
where $\eta$ is the learning rate. For the training algorithm initialization, the wavelet and membership function parameters are set with random numbers from a uniform distribution.
The replacement of the traditional MSE by the MCC inserts another learning parameter into the BP algorithm. As already explained, the success of correntropy is based on the appropriate adjustment of the kernel size of its Gaussian functions. This new parameter influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness. Therefore, if an unsuitable kernel size is chosen, the expected improved performance of the MCC will not be confirmed [60]. For this reason, an adaptive kernel method is applied in this work (see (6)) to adjust the kernel size over the learning epochs.
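The practical effect of swapping the MSE for the MCC can be seen in the error-weighting factor $e_n K_\sigma(e_n)/\sigma^2$ that multiplies every partial derivative in (17). The sketch below (illustrative naming, not the authors' code) computes this factor and shows that, unlike the MSE gradient $2e$, it vanishes for outlier-sized errors:

```python
import math

def k_sigma(e, sigma):
    """Gaussian kernel K_sigma(e) of Eq. (2)."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def mcc_error_weight(e, sigma):
    """Error-weighting factor (e / sigma^2) * K_sigma(e) common to the MCC
    gradients of Eq. (17); it is exponentially small for outlier-sized e."""
    return (e / sigma ** 2) * k_sigma(e, sigma)
```

With $\sigma = 1$, an error of 10 yields a weight on the order of $10^{-21}$, while the MSE gradient $2e$ would be 20 — this is the mechanism by which outliers are effectively ignored during MCC training.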
5 Proposed Identification Architecture
The proposed architecture adopted in this work identifies the dynamic relationship between the input and output of a multisection tank for water storage. The system is evaluated when the experimental data used during the identification task is corrupted with noise and outliers. The proposed architecture is based on the series-parallel identification scheme described in [13], with small modifications due to the experimental data set characteristics and the learning
Figure 2 Proposed identification architecture
procedure used to adjust the parameters of the FWNN model. Figure 2 presents a schematic diagram of the identification architecture proposed in this work.
The inputs of the FWNN model are past values of the input signal $u(k)$ and of the system output corrupted with noise and outliers, $d(k) = y(k) + n(k)$, while the estimated output is given by $\hat{y}(k)$. The work developed in [9] shows that well-known linear modeling structures, such as FIR (Finite Impulse Response), ARX (AutoRegressive with eXogenous input), ARMAX (AutoRegressive Moving Average with eXogenous input), OE (Output Error), and SSIF (State Space Innovations Form), may be extended by using nonlinear functions or representations, thus leading to the nonlinear modeling structures NFIR, NARX, NARMAX, NOE, and NSSIF. This concept is used to define the inputs of the FWNN models obtained in this study.
According to [9], the advantage of a NARX model is that none of its regressors depends on past outputs of the model, which ensures that the predictor remains stable. This is particularly important in the nonlinear case, since the stability issue is much more complex than in linear systems. Considering that the inputs of the FWNN models in this work are described exactly as the regression vector of the NARX modeling structure, they inherit important characteristics from such structure. Figure 3 shows more details on the FWNN inputs in accordance with the NARX structure, where $\lambda$, $\tau$, and $\alpha$ are constants that define a model of order $\alpha$ and delay $\tau$.
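A NARX regression vector in the layout of Figure 3 can be assembled as in the sketch below (a hypothetical helper of ours, parameterized by the paper's $\lambda$, $\tau$, and $\alpha$):

```python
def narx_regressors(u, y, k, lam=1, tau=0, alpha=1):
    """NARX input vector at time k: past outputs y(k), ..., y(k - lam)
    followed by past inputs u(k - tau), ..., u(k - tau - alpha)."""
    past_outputs = [y[k - i] for i in range(lam + 1)]
    past_inputs = [u[k - tau - i] for i in range(alpha + 1)]
    return past_outputs + past_inputs
```

For example, with lam = alpha = 1 and tau = 0, the function returns [y(k), y(k-1), u(k), u(k-1)]; no regressor depends on past model outputs, which is the stability property discussed above.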
Figure 2 illustrates that the FWNN model parameters are updated according to the error signal $e(k) = d(k) - \hat{y}(k)$ by using a learning algorithm, for example, the BP algorithm, adopting the MCC as its cost criterion. As previously explained in Section 2, the success of the MCC also
Figure 3 NARX model structure using FWNN
depends on the correct choice of the kernel size. Therefore, the adaptive method described by (6) is used in this work to adjust the kernel size during the learning epochs.
6 Experiments and Results
In order to evaluate the performance of the FWNN when the traditional MSE cost criterion of the error backpropagation algorithm is replaced by the MCC, the aforementioned neural network is used to identify a real dynamical system, considering that its experimental data is corrupted by noise and outliers.
6.1 Multisection Liquid Tank. The multisection liquid tank consists of an acrylic tank for the containment of liquids, with three abrupt changes in its cross-sectional area, as can be seen in Figure 4. The liquid tank was originally designed for educational purposes, in order to be used in studies of identification and control of dynamical systems [61]. It was also used in [29] to evaluate the performance of
(a) Schematic diagram of the multisection liquid tank. (b) Multisection liquid tank.
Figure 4 Three-section liquid tank with distinct cross-sectional areas
the alternative FWNN structure employed in this work. In addition to the acrylic tank structure, the system is composed of a water reservoir, a water pump, a pressure sensor, an electronic power driver, and an electronic interface with AD (analog-to-digital) and DA (digital-to-analog) converters.
The nonlinearity present in the liquid output flow, which is due to the different pressure levels at the tank base in accordance with the height of the liquid column, can be clearly noticed in the aforementioned dynamical system. Besides, the distinct cross-sectional areas make such nonlinearity even more evident. It is worth mentioning that the abrupt transitions between the tank sections are also responsible for discontinuities. The whole system can be seen as a set of three coupled nonlinear systems, since each tank section has its own dynamic behavior.
6.2 System Identification. Initially, in order to collect the experimental data set used during the learning and testing phases of the FWNN models identified in this work, the water pump is excited with an APRBS (Amplitude Modulated Pseudorandom Binary Sequence) and the water level inside the tank is registered at a sample rate of $T_s = 2$ Hz.
For the generation of the persistent excitation signal, the following parameters are considered: the minimum hold time $T_h = 10$ s, the minimum amplitude $A_{\min} = 0$ V, and the maximum amplitude $A_{\max} = 15$ V. Since only positive values of voltage are considered in this case study, the pump only operates to shift the liquid from the reservoir to the multisection tank.
After the system excitation, the collected data is corrupted with additive white Gaussian noise and two different percentages of outliers (1% and 3%). The resulting data are divided into two sets comprising approximately 80% and 20% of the total amount. The first set is used to train the FWNN model and the second one is used during the testing phase.
The whole data set is normalized to fit within the range [0, 1], in order to avoid numerical problems during the FWNN learning procedure. Since the multisection tank is a first-order nonlinear system, and also considering Figure 3, the inputs of the FWNN models are defined with $\lambda = \alpha = 1$ and $\tau = 0$. Thus, $u(k)$, $u(k-1)$, and $y(k)$ are defined as inputs to the FWNN models to predict $y(k+1)$.
The BP algorithm presented in Section 4 is used to adjust the parameters $d_{ij}$, $t_{ij}$, $a_{qr}$, and $b_{qr}$ of the FWNN. After a trial-and-error procedure, the learning rate $\eta = 0.0001$ was found to be a good choice to identify the multisection tank. It is worth mentioning that the results presented in this work were obtained after 350 learning epochs.
Figure 5 presents the model validation when 1% of the original experimental data set is corrupted with outliers and additive white Gaussian noise is inserted. In this figure, the tank water level in cm is shown as a function of the sample time step, where each time step is equivalent to 0.5 seconds, as defined by the sample rate $T_s = 2$ Hz. The terms FWNN-MCC and FWNN-MSE are used to identify the FWNN models obtained using the MCC and the MSE as the cost criterion of the BP algorithm, respectively. It is evident that FWNN-MCC has the best performance, due to the use of higher-order statistical information. On the other hand, the FWNN-MSE model, based on second-order moments, presents some problems to efficiently identify the input-output dynamic relationship of the multisection tank at some points of the validation curve. The presence of outliers in the experimental data has a significant negative impact on the FWNN model when the MSE criterion is used in the learning procedure, since the error due to the outliers is increased at a square rate. The same behavior is not observed in FWNN-MCC when 1% of the experimental data are corrupted by outliers, because the outliers' power is weighted by the Gaussian kernel.
8 Mathematical Problems in Engineering
[Figure 5: Validation of the FWNN-MCC with 1% outlier rate in experimental data. Plot of water level (cm) versus time step; curves: Desired, FWNN-MCC, FWNN-MSE.]
[Figure 6: Validation of the FWNN-MCC with 3% outlier rate in experimental data. Plots of water level (cm) versus time step, with two zoomed regions; curves: Desired, FWNN-MCC, FWNN-MSE.]
Figure 6 shows the model validation when 3% of the original experimental data is corrupted with outliers and additive white Gaussian noise is inserted. In Figure 6, only the validation points are plotted, to allow better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement of the FWNN-MCC model. Both models present problems at some points, although the performance of FWNN-MCC is better in the identification of the multisection tank dynamics, and it also seems to be less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that the MCC has intrinsic robustness due to the local estimation produced by the kernel size.
It is also important to mention that the correntropy criterion has a free parameter, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the performance of the FWNN learning procedure. Figure 7 shows the MSE obtained over the 350 epochs for three different fixed kernel sizes, namely 0.01, 0.1, and 1.0, and also using the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in attenuating the outliers and noise.

[Figure 7: Evolution of the MSE for different kernel sizes ($\sigma = 1.0$, $\sigma = 0.1$, $\sigma = 0.01$, and adaptive $\sigma$) over 350 epochs.]

Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data contain 1% and 3% outliers. During the initial epochs of the BP algorithm, the kernel size is quite oscillatory; however, its behavior becomes more stable as training approaches the hundredth epoch.
7. Conclusions
This work has analyzed the performance of an FWNN applied to the identification of a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since the model tries to fit behaviors that are not part of the dynamical system. The most common learning techniques applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has therefore proposed replacing this traditional evaluation measure with a similarity measure based on information theory, denominated correntropy. The MCC was used as the cost function of the error backpropagation algorithm, in order to reduce the negative effects of unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more properly, while also being less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is a free parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has shown itself to be more efficient than keeping this parameter fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture is susceptible to falling into local minima, which limits the ability of correntropy to remove the outliers.

[Figure 8: Adaptive kernel size behavior over the learning epochs, for 1% and 3% outlier rates.]
Further research will focus on the following items: (1) analyzing the application of the MCC in association with different training algorithms for the FWNN architecture, in order to avoid the harmful effects of outliers; metaheuristic algorithms such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods, to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and perform fault diagnosis, among other applications.
Acronyms
A/D: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
D/A: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670-1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700-705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310-321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56-59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599-1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants: a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133-3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657-1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134-145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721-4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167-177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4-27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617-620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66-71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689-694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332-335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889-898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862-874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401-4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665-685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1-664-V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766-770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046-3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200-211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385-3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798-3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19-25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361-366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53-58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780-792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713-8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1-8, Stockholm, Sweden, 2006.
[36] I. Santamaría, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187-2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012-2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1-8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485-1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361-366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286-5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1-5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401-407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353-360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131-1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885-1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385-4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081-5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276-281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvao, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221-232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463-470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86-89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111-119, 2006.
[57] S. Abid, R. Fnaicch, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424-430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605-609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25-32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Príncipe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203-213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS Modificada para Identificação e Controle de Plantas com Ampla Faixa de Operação e Não Linearidade Acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613-15632, 2013.
although wavelets are used as membership functions (MFs) [22, 23] or in the consequent part of fuzzy rules, through the use of WNNs as local models. In the literature it is often possible to find several research works applying FWNNs to deal with modeling, control, function approximation, and nonlinear system identification, among others [6, 24-28].
In [29], Linhares et al. evaluate an alternative FWNN structure to identify the nonlinear dynamics of a multisection liquid tank. The aforementioned structure is similar to the ones presented by Yilmaz and Oysal [5], Abiyev and Kaynak [6], and Lu [24]. However, the FWNN presented in [29] uses only wavelets in the consequent part of the fuzzy rules. The wavelets in each node of the FWNN consequent layer are weighted by the activation signals of the fuzzy rules. Therefore, the local models of such an FWNN are solely represented by a set of wavelet functions, which differs from [5, 6, 24]. The results presented in [29] demonstrate that the modified FWNN structure maintains the generalization capability, and also other important features presented by traditional FWNNs, despite the reduction in the complexity of the structure.
In practical applications, the experimental data set used in the identification procedure is often corrupted by unknown noise and outliers. Outliers are incorrect measurements which markedly deviate from the typical ranges of other observations [30]. The main source of outliers is sporadic malfunctioning of sensors and equipment [31]. The presence of noise and outliers in experimental data negatively affects the performance and reliability of the dynamical model under identification, because the model tries to fit such undesired measurements [30, 32, 33]. Although many outlier detection methods have been presented in the literature, many approaches are not able to detect all the outliers; therefore, the resulting data obtained after the application of such methods may still be contaminated with outliers [30, 31].
Generally, the learning process of neural networks is based on a gradient method, for example, the classical error backpropagation (BP) algorithm, which uses the Mean Squared Error (MSE) as its cost function. However, the applicability of the MSE to obtain a model that represents an input-output relationship is optimal only if the probability density function (pdf) of the errors is Gaussian [34], whereas the error distribution in most cases is non-Gaussian and nonlinear [8]. In the literature we can find some researches demonstrating that the use of the Maximum Correntropy Criterion (MCC), replacing the traditional MSE, is an effective approach to handle the problem of prediction and identification when the dynamical system has unknown noise and outliers [7, 8, 30, 35]. The correntropy evaluation allows the extraction of additional information from the available data, because such a similarity measure takes into account all the moments of a probability distribution, which are typically not observed by the MSE [7].
In this work, the reliability of the FWNN recently proposed in [29] is evaluated when different percentages of outliers and noise contaminate the experimental data used to identify a nonlinear system. The aforementioned neural network is used to identify the dynamic relationship between the input and output of a multisection liquid tank. In order to train the FWNN, the BP algorithm is used, although the traditional MSE cost function is replaced by the Maximum Correntropy Criterion, with an adaptive adjustment of its kernel size, which is the free parameter of the MCC. The models obtained using each of the quality measures are properly evaluated and compared. Despite the advantages of correntropy over the MSE, little effort has been reported towards the application of correntropy to identify nonlinear systems using neural networks [7, 8]. The results presented in this work demonstrate that the FWNN architecture proposed in [29] is less sensitive to the presence of outliers and noise when it is trained using the MCC. In addition, this work also investigates the influence of the kernel size on the performance of the MCC in the BP algorithm.
This paper is organized as follows. Section 2 presents the definition and the basic mathematical theory of the correntropy similarity measure. Section 3 describes the FWNN proposed in [29], which is applied in this work to identify an experimental nonlinear dynamical system considering the presence of outliers and noise. Section 4 presents the updating equations of the BP algorithm, which are modified according to the MCC. Section 5 describes the proposed identification architecture in detail. Section 6 presents the multisection liquid tank under study, and evaluates the performance of the FWNN models obtained using the MSE and MCC cost functions, considering the presence of both outliers and noise in the experimental data. Finally, concluding remarks are given in Section 7.
2. Correntropy
Correntropy is a generalized similarity measure between two arbitrary scalar random variables $X$ and $Y$, defined by [36]

$V_\sigma(X, Y) = E_{XY}[k_\sigma(X, Y)] = \iint k_\sigma(x, y)\, p_{XY}(x, y)\, dx\, dy,$ (1)

where $p_{XY}$ is the joint probability distribution, $E[\cdot]$ is the expectation operator, and $k_\sigma(\cdot, \cdot)$ is a symmetric positive definite kernel. In this work, $k_\sigma(\cdot, \cdot)$ is a Gaussian kernel, given as

$k_\sigma(x_j, x_i) = K_\sigma(x_j - x_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x_j - x_i)^2}{2\sigma^2}\right],$ (2)

where $\sigma$ is the width of the Gaussian, defined as the kernel size. The kernel size may be interpreted as the resolution at which correntropy measures similarity in a high-dimensional feature space [36].
By applying a Taylor series expansion to the Gaussian function in (1), and assuming that all the moments of the joint pdf are finite, such equation becomes

$V_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{k=0}^{\infty} \frac{(-1)^k}{2^k \sigma^{2k}\, k!}\, E\left[(X - Y)^{2k}\right].$ (3)
In practice, the joint pdf in (1) is unknown and only a finite amount of data $\{(x_i, y_i)\}_{i=1}^{N}$ is available, leading to the sample correntropy estimator, defined by

$\hat{v}_\sigma(X, Y) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(x_i, y_i).$ (4)
Correntropy involves all the even moments of the difference between $X$ and $Y$. Compared with the MSE, $E[(X - Y)^2]$, which is a quadratic function in the joint input space, correntropy includes second-order as well as higher-order statistical information [37]. However, for sufficiently large values of $\sigma$, the second-order moment is predominant and the measure approaches correlation [38].
Nowadays, correntropy has been successfully used in a wide variety of applications where the signals are non-Gaussian or nonlinear, for example, automatic modulation classification [39], classification systems of pathological voices [40], and principal component analysis (PCA) [41].
2.1. Maximum Correntropy Criterion for Model Estimation. The correntropy concept can be extended to model estimation. The variable $X$ can be considered as the argument of the unknown function $f(X, w)$, where $X = \{x_i \in R^m\}_{i=1}^{N}$ is an input set and $w = [w_1, \ldots, w_m]^T$ are the model parameters, so that $f(X, w)$ approximates the dependence on an output set $Y = \{y_i \in R^m\}_{i=1}^{N}$ [42].
Therefore, it is possible to determine the optimal solution for the MCC from (4) as [43]

$\arg\max_{w} J(w) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma\big(f(x_i, w), y_i\big) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(e_i),$ (5)

where $e_i = f(x_i, w) - y_i$, for $i = 1, \ldots, N$, are the errors generated by the model during the supervised learning, for each of the $N$ training samples. It is worth mentioning that this criterion is used as the cost function of the BP algorithm to adjust the parameters of the FWNN.
One of the advantages of using correntropy in system identification lies in the robustness of this measure against impulsive noise, due to the use of the Gaussian kernel in (5), which is close to zero, that is, $k_\sigma(f(x_i, w), y_i) \approx 0$, when $f(x_i, w)$ or $y_i$ is an outlier. Correntropy is positive and bounded, satisfying $0 < \hat{v}_\sigma(X, Y) \le 1/(\sqrt{2\pi}\,\sigma)$ for the Gaussian kernel.

The Gaussian kernel size is a free parameter that must be selected by the user [38]. Therefore, when the correntropy is estimated, the resulting values depend on the selected kernel size. In addition, the kernel size of correntropy influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness to impulsive noise during adaptation [37, 43]. If the training data size is not large enough, the kernel size must be chosen considering tradeoffs between outlier rejection and estimation efficiency [44].
Some approaches can be employed to determine the kernel size, for example, the statistical method [45], Silverman's rule [46], cross-validation techniques [47, 48], and the shape of the prediction error distribution [44]. This work uses an adaptive kernel size algorithm [42], which is given by

$\sigma = \frac{\max_i |e_i|}{2\sqrt{2}}, \quad i = 1, \ldots, N.$ (6)
In order to assess the improved performance of an adaptive kernel size over fixed ones, Section 6 shows how the error evolves during the FWNN training for different values of the kernel size.
3. Fuzzy Wavelet Neural Networks
3.1. Brief Review. Wavelets are obtained by scaling and translating a special function $\psi(x)$, localized in both time/space and frequency, called the mother wavelet, which can be defined in such a way as to serve as a basis to describe other functions. Wavelets are extensively used in the fields of signal analysis, identification and control of dynamical systems, computer vision, and computer graphics, among other applications [49-52]. Given $\psi(x)$, the corresponding family of wavelets is obtained by

$\Psi_j(\mathbf{x}) = |\mathbf{d}_j|^{-1/2}\, \psi\left(\frac{\mathbf{x} - \mathbf{t}_j}{\mathbf{d}_j}\right), \quad \mathbf{d}_j \ne 0, \; j = 1, 2, \ldots, k,$ (7)

where $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$, and $\Psi_j(\mathbf{x})$ is obtained from $\psi(\mathbf{x})$ by scaling it by a factor $\mathbf{d}_j = \{d_{1j}, d_{2j}, \ldots, d_{nj}\}$ and translating it by $\mathbf{t}_j = \{t_{1j}, t_{2j}, \ldots, t_{nj}\}$.
A WNN is a nonlinear regression structure that canrepresent input-output maps by combining wavelets withappropriate scalings and translations [53] The output of aWNN is determined as follows
119910 =
119896
sum
119895=1w119895Ψ119895(x) =
119896
sum
119895=1w119895
10038161003816100381610038161003816d119895
10038161003816100381610038161003816
minus12120595(
x minus t119895
d119895
) (8)
where w_j are the synaptic weights, x is the input vector, and d_j and t_j are the parameters characterizing the wavelets.
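A minimal numerical sketch of (8) is given below. The treatment of |d_j| as the product of the componentwise dilations, and of the multidimensional wavelet as a product of one-dimensional Mexican hat factors, is one possible reading of the vector notation in (7), not a construction stated by the paper:

```python
import numpy as np

def mexican_hat(u):
    # 1-D Mexican hat shape: psi(u) = (1 - u^2) exp(-0.5 u^2)
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def wnn_output(x, w, d, t):
    # Equation (8): y = sum_j w_j |d_j|^(-1/2) psi((x - t_j) / d_j)
    # x: input vector (n,); w: weights (k,); d, t: dilations/translations (k, n)
    y = 0.0
    for j in range(len(w)):
        u = (x - t[j]) / d[j]
        y += w[j] * np.abs(np.prod(d[j])) ** -0.5 * np.prod(mexican_hat(u))
    return y
```

With a single unit wavelet centered at the input (k = n = 1, w = 1, d = 1, t = 0), the output equals psi(0) = 1.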
In a concise manner, the purpose of FWNNs is to incorporate WNNs into the ANFIS structure in order to obtain faster convergence and better approximation capabilities, eventually with a greater number of parameters to be adjusted. The fuzzy rules allow tackling the uncertainties, while the wavelets contribute to improving the accuracy in the process of approximating input-output maps [6].
3.2. FWNN Architecture. A particular instance of FWNN proposed in [29] is applied in this work to identify a real nonlinear system, investigating its performance and reliability when the experimental data set is corrupted by unknown noise and outliers. In this FWNN architecture, the consequent part of its fuzzy rules is described only by wavelet functions, which differs from other structures such as those proposed in [5, 6, 24]. The basic architecture of the FWNN can be seen in Figure 1, and its layers are described as follows.
4 Mathematical Problems in Engineering
[Figure 1: Graphical representation of the FWNN architecture. Layer 1: inputs x₁, …, xₙ; Layer 2: membership functions A_qr; Layer 3: rules R₁, …, R_m with firing strengths μ₁, …, μ_m and the summation node producing β; Layer 4: normalization nodes N₁, …, N_m; Layer 5: wavelet consequents Ψ₁, …, Ψ_m; Layer 6: output summation producing ŷ.]
Layer 1. The input layer just transfers the input signal vector x = [x₁, x₂, …, xₙ] to the next layer.
Layer 2. In the fuzzification layer, the membership functions are parameterized to match the specific requirements of a variety of applications. For instance, a Gaussian membership function can be described by the following equation:
$$A_{qr}\left(x_q\right) = \exp\left[-\frac{1}{2}\left(\frac{x_q - a_{qr}}{b_{qr}}\right)^2\right], \tag{9}$$
where, for q = 1, 2, …, n and r = 1, 2, …, k_q, A_{qr} is associated with the rth membership function appearing in a given rule and evaluated for the qth component of the input vector. The adjustable parameters are a_{qr} and b_{qr}, representing the center and width of the membership function, respectively.
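A direct transcription of (9), with the center a and width b as the adjustable parameters:

```python
import numpy as np

def gaussian_mf(x, a, b):
    # Equation (9): A_qr(x_q) = exp[-(1/2) ((x_q - a_qr) / b_qr)^2]
    # a is the center (membership value 1 there), b is the width
    return np.exp(-0.5 * ((x - a) / b) ** 2)
```

At the center the membership is exactly 1, and one width away from the center it falls to exp(-1/2).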
Layer 3. This is the inference layer. Assuming that there are m rules, where R_i is a given rule and i = 1, 2, …, m, each rule produces an output μ_i by aggregating A_{qr} using a T-norm. The output of the pth rule in this layer is
$$\mu_p = \prod_{q=1}^{n} A_{qr_q}\left(x_q\right), \quad p = 1, 2, \ldots, m, \tag{10}$$

where each rule index p corresponds to one combination (r₁, r₂, …, r_n), with r₁ = 1, …, k₁; r₂ = 1, …, k₂; …; r_n = 1, …, k_n.
All the rule outputs of this layer are added up at the summation node located between Layers 3 and 4. The output β of this node is later used in the normalization stage.
Layer 4. In the normalization layer, the normalized firing strength of the ith rule, denoted by μ̄_i, is given by
$$\bar{\mu}_i = \frac{\mu_i}{\beta} = \frac{\mu_i}{\sum_{i=1}^{m}\mu_i}, \quad i = 1, 2, \ldots, m. \tag{11}$$
Layer 5. This is the consequent layer of the FWNN. In this work, the Mexican Hat family of wavelets is adopted, as in [5, 6, 54]. Its mathematical representation is given by
$$\psi(x) = \frac{1}{\sqrt{d}}\left(1 - \left(\frac{x - t}{d}\right)^2\right)\exp\left[-0.5\left(\frac{x - t}{d}\right)^2\right]. \tag{12}$$
The inputs of the wavelet layer are the normalized weights μ̄_i and the input vector x = [x₁, x₂, …, xₙ], while the outputs of this layer, represented by f_j, are given by
$$f_j = \bar{\mu}_j \Psi_j, \quad j = 1, 2, \ldots, m,$$

$$f_j = \bar{\mu}_j \sum_{i=1}^{n}\left(1 - T_{ij}^2\right)\exp\left[-0.5\,T_{ij}^2\right]\frac{1}{\sqrt{d_{ij}}}, \tag{13}$$
where the term T_{ij} = (x_i − t_{ij})/d_{ij}, with d_{ij} > 0, is adopted to simplify the mathematical notation and n is the number of wavelet functions in a node of Layer 5.
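The Layer 5 computation in (13) for a single rule can be sketched as follows (illustrative helper, assuming d_ij > 0 componentwise):

```python
import numpy as np

def wavelet_node(x, mu_bar_j, t_j, d_j):
    # Equation (13): f_j = mu_bar_j * sum_i (1 - T_ij^2) exp(-0.5 T_ij^2) / sqrt(d_ij)
    # with T_ij = (x_i - t_ij) / d_ij; x, t_j, d_j are vectors of length n
    T = (x - t_j) / d_j
    return mu_bar_j * np.sum((1.0 - T**2) * np.exp(-0.5 * T**2) / np.sqrt(d_j))
```

When the input coincides with the translations (T = 0) and the dilations are unity, each of the n terms equals 1, so the node output is simply n times the normalized firing strength.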
Layer 6. In the output layer, all signals from the wavelet neurons are summed up as follows:
$$\hat{y} = \sum_{j=1}^{m} f_j. \tag{14}$$
By observing Figure 1 and considering (9) to (14), it is possible to notice that the FWNN adjustable parameters are located in the second and fifth layers. The membership functions and wavelet functions are adjusted according to the application using any learning algorithm, such as the BP algorithm.
4. Error Backpropagation Algorithm with MCC
The classical BP algorithm is the learning algorithm used in this work to adjust the free parameters of the FWNN models. According to [54], this algorithm is probably the most frequently used technique to train a FWNN. Despite its functionality, it presents some shortcomings, such as the fact that it may get stuck in a local minimum of the error surface and that the training convergence rate is generally slow [55–57]. However, it is well known that the use of wavelet functions in neural network structures reduces such inconveniences [6, 58].
A neural system should be designed to present a desired behavior; hence, it is necessary to define a cost function for this task. It provides an evaluation of the quality of the solution obtained by the neural model [59]. Gradient-based learning algorithms, such as the BP algorithm, require the differentiation of the chosen cost function with respect to the adjustable parameters of the FWNN model. Therefore, it is necessary to obtain the partial derivatives of the chosen cost function with respect to the parameters d_{ij} and t_{ij} of the wavelets and the parameters a_{qr} and b_{qr} of the membership functions A_{qr}.
Typically, MSE is the cost function used with the BP algorithm [10]. This classical cost criterion is replaced by MCC in this work in order to increase the reliability of the FWNN model when the identified dynamical system presents outliers and noise. When using MCC, the main goal is to maximize the correntropy similarity measure between two random process variables. In the FWNN learning procedure, such variables are the desired output y_d and the estimated output ŷ provided by the FWNN model. Considering the estimation error of the FWNN model, given by e = y_d − ŷ, maximizing the MCC is equivalent to minimizing
$$\mathcal{E}_{av} = \frac{1}{N}\sum_{n=1}^{N}\left[\frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma\left(e_n\right)\right], \tag{15}$$
where N is the number of samples in the experimental data. Equation (15) corresponds to the cost function used during the minimization process of the BP algorithm applied to adjust the parameters of the FWNN models. As such parameters are adjusted sequentially, (16) defines the instantaneous correntropy cost used to update the wavelet function and membership function parameters of the FWNN after each training pair is presented to this network. Consider
$$\mathcal{E} = \frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma\left(e_n\right). \tag{16}$$
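The robustness argument behind (15) and (16) can be checked numerically: each sample contributes at most 1/(√(2π)σ) to the MCC cost, so a single large outlier saturates rather than dominates, in contrast to the MSE, which it dominates quadratically. A sketch with illustrative error vectors (not the paper's data):

```python
import numpy as np

def gaussian_kernel(e, sigma):
    # K_sigma(e) = (1 / (sqrt(2 pi) sigma)) exp(-e^2 / (2 sigma^2))
    return np.exp(-e**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def mcc_cost(errors, sigma):
    # Equation (15): E_av = (1/N) sum_n [1/(sqrt(2 pi) sigma) - K_sigma(e_n)]
    return np.mean(1.0 / (np.sqrt(2.0 * np.pi) * sigma) - gaussian_kernel(errors, sigma))

clean = np.array([0.1, -0.1, 0.05, -0.05])
dirty = np.array([0.1, -0.1, 0.05, 10.0])         # one outlier
sigma = 0.5
mse_ratio = np.mean(dirty**2) / np.mean(clean**2)  # MSE explodes (> 1000x)
mcc_bound = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)   # per-sample cap on (16)
```

The MCC cost of the corrupted vector grows, but it can never exceed the cap, which is why the outlier's influence on the gradients is bounded.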
By differentiating 𝓔 with respect to d_{ij} and t_{ij}, it gives
$$\frac{\partial \mathcal{E}}{\partial d_{ij}} = -\frac{e_n K_\sigma\left(e_n\right)\bar{\mu}_i \exp\left[-0.5\,T_{ij}^2\right]\left(3T_{ij}^2 - T_{ij}^4\right)}{\sigma^2\sqrt{d_{ij}^3}},$$

$$\frac{\partial \mathcal{E}}{\partial t_{ij}} = -\frac{e_n K_\sigma\left(e_n\right)\bar{\mu}_i \exp\left[-0.5\,T_{ij}^2\right]\left(3T_{ij} - T_{ij}^3\right)}{\sigma^2\sqrt{d_{ij}^3}}. \tag{17}$$
Now, differentiating 𝓔 with respect to a_{qr} and b_{qr}, it gives
$$\frac{\partial \mathcal{E}}{\partial a_{qr}} = \frac{\partial \mathcal{E}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial \Psi}\,\frac{\partial \Psi}{\partial \bar{\mu}}\,\frac{\partial \bar{\mu}}{\partial \mu}\,\frac{\partial \mu}{\partial A}\,\frac{\partial A}{\partial a_{qr}},$$

$$\frac{\partial \mathcal{E}}{\partial b_{qr}} = \frac{\partial \mathcal{E}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial \Psi}\,\frac{\partial \Psi}{\partial \bar{\mu}}\,\frac{\partial \bar{\mu}}{\partial \mu}\,\frac{\partial \mu}{\partial A}\,\frac{\partial A}{\partial b_{qr}}, \tag{18}$$
where, from (9), with a_{qr} the center and b_{qr} the width,

$$\frac{\partial A_{qr}}{\partial a_{qr}} = A_{qr}\,\frac{x_q - a_{qr}}{b_{qr}^2},$$

$$\frac{\partial A_{qr}}{\partial b_{qr}} = A_{qr}\,\frac{\left(x_q - a_{qr}\right)^2}{b_{qr}^3}. \tag{19}$$

Both derivatives vanish when x_q = a_{qr}, that is, when the input lies exactly at the center of the membership function.
Following the delta rule mentioned in [10], the parameters of the proposed FWNN are updated as follows:
$$d_{ij}(k+1) = d_{ij}(k) - \eta\,\frac{\partial \mathcal{E}}{\partial d_{ij}},$$

$$t_{ij}(k+1) = t_{ij}(k) - \eta\,\frac{\partial \mathcal{E}}{\partial t_{ij}},$$

$$a_{qr}(k+1) = a_{qr}(k) - \eta\,\frac{\partial \mathcal{E}}{\partial a_{qr}},$$

$$b_{qr}(k+1) = b_{qr}(k) - \eta\,\frac{\partial \mathcal{E}}{\partial b_{qr}}, \tag{20}$$
where η is the learning rate. For the training algorithm initialization, the wavelet and membership function parameters are set with random numbers drawn from a uniform distribution.
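The delta-rule updates in (20) amount to one gradient-descent step per parameter group. A minimal sketch, with hypothetical gradient values, since evaluating (17) and (18) requires the full network state:

```python
import numpy as np

def delta_update(params, grads, eta=1e-4):
    # Equation (20): theta(k+1) = theta(k) - eta * dE/dtheta, applied
    # uniformly to the wavelet parameters d, t and membership parameters a, b
    return {name: value - eta * grads[name] for name, value in params.items()}

# Hypothetical current values and gradients for a single-wavelet model
params = {"d": np.array([1.0]), "t": np.array([0.2])}
grads = {"d": np.array([2.0]), "t": np.array([-1.0])}
new = delta_update(params, grads, eta=0.1)
```

Because the updates are applied sequentially (one training pair at a time), this step runs once per sample with the instantaneous cost (16), not once per epoch.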
The replacement of the traditional MSE by MCC inserts another learning parameter into the BP algorithm. As already explained, the success of the correntropy criterion relies on the appropriate adjustment of the kernel size of its Gaussian functions. This new parameter influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness. Therefore, if an unsuitable kernel size is chosen, the expected improved performance of the MCC will not be confirmed [60]. For this reason, an adaptive kernel method is applied in this work (see (6)) to adjust the kernel size over the learning epochs.
5. Proposed Identification Architecture
The architecture proposed in this work identifies the dynamic relationship between the input and output of a multisection tank for water storage. The system is evaluated when the experimental data used during the identification task are corrupted with noise and outliers. The proposed architecture is based on the series-parallel identification scheme described in [13], with small modifications due to the characteristics of the experimental data set and the learning procedure used to adjust the parameters of the FWNN model. Figure 2 presents a schematic diagram of the identification architecture proposed in this work.

[Figure 2: Proposed identification architecture. The dynamical system output y(k) is corrupted by the noise and outlier signal n(k), giving d(k) = y(k) + n(k). Delayed versions (z⁻¹, …, z⁻ᵈ) of u(k) and d(k) feed the FWNN, whose estimate ŷ(k) yields the error e(k) = d(k) − ŷ(k) used by the learning algorithm, together with the cost criterion and the adaptive kernel size.]
The inputs of the FWNN model are past values of the input signal u(k) and of the system output corrupted with noise and outliers, d(k) = y(k) + n(k), while the estimated output is given by ŷ(k). The work developed in [9] shows that well-known linear modeling structures, such as FIR (Finite Impulse Response), ARX (AutoRegressive eXogenous input), ARMAX (AutoRegressive Moving Average eXogenous input), OE (Output Error), and SSIF (State Space Innovations Form), may be extended by using nonlinear functions or representations, thus leading to the nonlinear modeling structures NFIR, NARX, NARMAX, NOE, and NSSIF. This concept is used to define the inputs of the FWNN models obtained in this study.
According to [9], the advantage of a NARX model is that none of its regressors depends on past outputs of the model, which ensures that the predictor remains stable. This is particularly important in the nonlinear case, since the stability issue is much more complex than in linear systems. Considering that the inputs of the FWNN models in this work are described exactly as the regression vector of the NARX modeling structure, they inherit important characteristics from such a structure. Figure 3 shows more details on the FWNN inputs in accordance with the NARX structure, where λ, τ, and α are constants that define a model of order α and delay τ.
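Assembling the NARX regression vector of Figure 3 can be sketched as below; the sequences u and y and the default index settings are illustrative assumptions, not the paper's data:

```python
import numpy as np

def narx_inputs(u, y, k, lam=1, tau=0, alpha=1):
    # Regression vector of Figure 3: past measured outputs y(k), ..., y(k - lam)
    # and past inputs u(k - tau), ..., u(k - tau - alpha); lam, tau, and alpha
    # play the roles of the constants lambda, tau, and alpha in the text
    past_y = [y[k - i] for i in range(lam + 1)]
    past_u = [u[k - tau - i] for i in range(alpha + 1)]
    return np.array(past_y + past_u)

u = np.arange(10.0)         # hypothetical excitation record
y = 0.5 * np.arange(10.0)   # hypothetical measured output record
x = narx_inputs(u, y, k=5)  # [y(5), y(4), u(5), u(4)]
```

Note that the regressors use measured outputs, not fed-back model outputs, which is exactly why the NARX predictor remains stable.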
Figure 2 illustrates that the FWNN model parameters are updated according to the error signal e(k) = d(k) − ŷ(k) by using a learning algorithm, for example, the BP algorithm, adopting the MCC as its cost criterion. As previously explained in Section 2, the success of the MCC also depends on the correct choice of the kernel size. Therefore, the adaptive method described by (6) is used in this work to adjust the kernel size during the learning epochs.

[Figure 3: NARX model structure using the FWNN, whose inputs are the delayed outputs y(k), y(k − 1), …, y(k − λ) and the delayed inputs u(k − τ), …, u(k − τ − α), producing the prediction ŷ(k).]
6. Experiments and Results
In order to evaluate the performance of the FWNN when the traditional MSE cost criterion of the error backpropagation algorithm is replaced by the MCC, the aforementioned neural network is used to identify a real dynamical system, considering that its experimental data are corrupted by noise and outliers.
6.1. Multisection Liquid Tank. The multisection liquid tank consists of an acrylic tank for the containment of liquids with three abrupt changes in its cross-sectional area, as can be seen in Figure 4. The liquid tank was originally designed for educational purposes, to be used in studies of identification and control of dynamical systems [61]. It was also used in [29] to evaluate the performance of the alternative FWNN structure employed in this work. In addition to the acrylic tank structure, the system is composed of a water reservoir, a water pump, a pressure sensor, an electronic power driver, and an electronic interface with AD (analog-to-digital) and DA (digital-to-analog) converters.

[Figure 4: Three-section liquid tank with distinct cross-sectional areas: (a) schematic diagram of the multisection liquid tank; (b) the multisection liquid tank.]
The nonlinearity present in the liquid output flow, which is due to the different pressure levels at the tank base in accordance with the height of the liquid column, can be clearly noticed in the aforementioned dynamical system. Besides, the distinct cross-sectional areas make such nonlinearity even more evident. It is worth mentioning that the abrupt transitions between the tank sections are also responsible for discontinuities. The whole system can be seen as a set of three coupled nonlinear systems, since each tank section has its own dynamic behavior.
6.2. System Identification. Initially, in order to collect the experimental data set used during the learning and testing phases of the FWNN models identified in this work, the water pump is excited with an APRBS (Amplitude Modulated Pseudorandom Binary Sequence) and the water level inside the tank is registered at a sample rate of T_s = 2 Hz.
For the generation of the persistent excitation signal, the following parameters are considered: minimum hold time T_h = 10 s, minimum amplitude A_min = 0 V, and maximum amplitude A_max = 15 V. Since only positive voltage values are considered in this case study, the pump operates only to shift the liquid from the reservoir to the multisection tank.
After the system excitation, the collected data are corrupted with additive white Gaussian noise and two different percentages of outliers (1% and 3%). The resulting data are divided into two sets comprising approximately 80% and 20% of the total amount. The first set is used to train the FWNN model and the second one is used during the testing phase. The whole data set is normalized to fit within the range [0, 1] in order to avoid numerical problems during the FWNN learning procedure. Since the multisection tank is a first-order nonlinear system, and also considering Figure 3, the inputs of the FWNN models are defined with λ = α = 1 and τ = 0. Thus, u(k), u(k − 1), and y(k) are defined as inputs to the FWNN models to predict y(k + 1).
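The data preparation described above (additive Gaussian noise, sparse outliers, [0, 1] normalization, and the approximate 80%/20% split) can be sketched as follows; all numeric settings except the split proportions are illustrative assumptions, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(y, noise_std=0.05, outlier_rate=0.01, outlier_scale=5.0):
    # White Gaussian noise on every sample, plus a sparse set of outliers;
    # noise_std and outlier_scale here are illustrative, not the paper's
    d = y + rng.normal(0.0, noise_std, size=y.shape)
    mask = rng.random(y.shape) < outlier_rate
    d[mask] += outlier_scale * rng.standard_normal(mask.sum())
    return d

def normalize(x):
    # Scales the whole record into [0, 1] before training, as in the text
    return (x - x.min()) / (x.max() - x.min())

level = normalize(corrupt(np.sin(np.linspace(0.0, 6.0, 1000))))
train_set, test_set = level[:800], level[800:]  # approximately 80% / 20%
```

Normalizing after corruption, as done here, lets the outliers stretch the scale; normalizing the clean record first is an equally plausible ordering that the text does not pin down.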
The BP algorithm presented in Section 4 is used to adjust the parameters d_{ij}, t_{ij}, a_{qr}, and b_{qr} of the FWNN. After a trial-and-error procedure, the learning rate η = 0.0001 was found to be a good choice to identify the multisection tank. It is worth mentioning that the results presented in this work were obtained after 350 learning epochs.
Figure 5 presents the model validation when 1% of the original experimental data set is corrupted with outliers and additive white Gaussian noise is inserted. In this figure, the tank water level in cm is shown as a function of the sample time step, where each time step is equivalent to 0.5 seconds, as defined by the sample rate T_s = 2 Hz. The terms FWNN-MCC and FWNN-MSE identify the FWNN models obtained using MCC and MSE as the cost criterion of the BP algorithm, respectively. It is evident that FWNN-MCC has the best performance, due to the use of higher-order statistical information. On the other hand, the FWNN-MSE model, based on second-order moments, presents some problems to efficiently identify the input-output dynamic relationship of the multisection tank at some points of the validation curve. The presence of outliers in the experimental data has a significant negative impact on the FWNN model when the MSE criterion is used in the learning procedure, since the error due to the outliers is increased at a square rate. The same behavior is not observed in FWNN-MCC when 1% of the experimental data are corrupted by outliers, because the outlier power is weighted by the Gaussian kernel.
[Figure 5: Validation of the FWNN-MCC with 1% outlier rate in experimental data. Water level (cm) versus time step, comparing the desired output with the FWNN-MCC and FWNN-MSE predictions.]
[Figure 6: Validation of the FWNN-MCC with 3% outlier rate in experimental data. Water level (cm) versus time step for the desired output, FWNN-MCC, and FWNN-MSE, with two magnified regions.]
Figure 6 shows the model validation when 3% of the original experimental data are corrupted with outliers and additive white Gaussian noise is inserted. In Figure 6, only the validation points are plotted to allow better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement of the FWNN-MCC model. Both models present problems at some points, although the FWNN-MCC model performs better in the identification of the multisection tank dynamics, as it also seems to be less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that MCC has intrinsic robustness due to the local estimation produced by the kernel size.
It is also important to mention that the correntropy criterion has a free parameter, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the FWNN learning procedure performance. Figure 7 shows the MSE obtained over the 350 epochs for three different fixed kernel sizes, namely, 0.01, 0.1, and 1.0, and also using the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in the attenuation of outliers and noise.

[Figure 7: Evolution of the MSE over the learning epochs for fixed kernel sizes σ = 1.0, σ = 0.1, and σ = 0.01, and for the adaptive σ.]
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data contain 1% and 3% of outliers. During the initial epochs of the BP algorithm, the kernel size is quite oscillatory. However, its behavior becomes more stable as training approaches the hundredth epoch.
7. Conclusions
This work has analyzed the performance of a FWNN applied to identify a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since it tries to fit behaviors that are not part of the dynamical system. The most common learning techniques applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has then proposed the replacement of this traditional evaluation measure by a similarity measure based on information theory, denominated correntropy. Therefore, the MCC was used in this paper as the cost function of the error backpropagation algorithm in order to reduce the negative effects of the unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more properly, being also less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is a free parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has shown to be more efficient compared to the case when this parameter remains fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture is susceptible to falling into local minima, limiting the correntropy action to remove the outliers.

[Figure 8: Adaptive kernel size behavior over the learning epochs for 1% and 3% outlier rates.]
Further research will focus on the following items: (1) analyzing the application of the MCC associated with different algorithms to train the FWNN architecture and avoid the harmful effects of outliers; metaheuristic algorithms, such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm, are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and perform fault diagnosis, among others.
Acronyms

AD: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
DA: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants—a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1-664–V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaría, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J J Cordova W Yu and X Li ldquoHaar wavelet neural networksfor nonlinear system identificationrdquo in Proceedings of the 6thIEEE Multi-Conference on Systems and Control (MSC rsquo12) pp276ndash281 Dubrovnik Croatia October 2012
[53] R K Galvao V M Becerra J M Calado and P M SilvaldquoLinear-wavelet networksrdquo International Journal of AppliedMathematics and Computer Science vol 14 no 2 pp 221ndash2322004
[54] M Davanipoor M Zekri and F Sheikholeslam ldquoFuzzy waveletneural network with an accelerated hybrid learning algorithmrdquoIEEE Transactions on Fuzzy Systems vol 20 no 3 pp 463ndash4702012
[55] V V J Rajapandian and N Gunaseeli ldquoModified standardback propagation algorithm with optimum initialization forfeed forward neural networksrdquo International Journal of ImagingScience and Engineering vol 1 no 3 pp 86ndash89 2007
[56] T Kathirvalavakumar and PThangavel ldquoA modified backprop-agation training algorithm for feedforward neural networksrdquoNeural Processing Letters vol 23 no 2 pp 111ndash119 2006
12 Mathematical Problems in Engineering
[57] S. Abid, F. Fnaiech, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Principe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS Modificada para Identificação e Controle de Plantas com Ampla Faixa de Operação e Não Linearidade Acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
In practice, the joint pdf in (1) is unknown and only a finite amount of data $(x_i, y_i)_{i=1}^{N}$ is available, leading to the sample correntropy estimator defined by

$$\hat{V}_\sigma(X, Y) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(x_i, y_i). \quad (4)$$
Correntropy involves all the even moments of the difference between $X$ and $Y$. Compared with the MSE, $E[(X-Y)^2]$, which is a quadratic function in the joint input space, correntropy includes second-order as well as higher-order statistical information [37]. However, for sufficiently large values of $\sigma$, the second-order moment predominates and the measure approaches correlation [38].
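For concreteness, the sample estimator in (4) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation; the use of NumPy and the Gaussian kernel normalization are assumptions consistent with the definitions above.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel k_sigma evaluated on the difference e = x - y."""
    e = np.asarray(e, float)
    return np.exp(-e ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def correntropy(x, y, sigma=1.0):
    """Sample correntropy estimator of (4): mean kernel value over paired samples."""
    e = np.asarray(x, float) - np.asarray(y, float)
    return float(np.mean(gaussian_kernel(e, sigma)))
```

Note that a single large outlier contributes almost nothing to the mean, since the kernel decays exponentially with the error, whereas the same outlier would dominate an MSE estimate.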
Nowadays, correntropy has been successfully used in a wide variety of applications where the signals are non-Gaussian or nonlinear, for example, automatic modulation classification [39], classification of pathological voices [40], and principal component analysis (PCA) [41].
2.1. Maximum Correntropy Criterion for Model Estimation. The correntropy concept can be extended to model estimation. The variable $X$ can be considered a mathematical expression of the unknown function $f(X, w)$, where $X = \{x_i \in R^m\}_{i=1}^{N}$ is an input set and $w = [w_1, \ldots, w_m]^T$ are the model parameters, which approximate the dependence on an output set $Y = \{y_i \in R\}_{i=1}^{N}$ [42].
Therefore, it is possible to determine the optimal solution for the MCC from (4) as [43]

$$\arg\max_{w} J(w) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma\left(f(x_i, w), y_i\right) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(e_i), \quad (5)$$

where $e_i = f(x_i, w) - y_i$, $i = 1, \ldots, N$, are the errors generated by the model during supervised learning for each of the $N$ training samples. It is worth mentioning that this criterion is used as the cost function of the BP algorithm to adjust the parameters of the FWNN.
One of the advantages of using correntropy in system identification lies in its robustness against impulsive noise, due to the use of the Gaussian kernel in (5): $k_\sigma(f(x_i, w), y_i) \approx 0$ when $f(x_i, w)$ or $y_i$ is an outlier. Correntropy is positive and bounded, with $0 < \hat{V}_\sigma(X, Y) \le 1/(\sqrt{2\pi}\,\sigma)$ for the Gaussian kernel.

The Gaussian variance (also called kernel size) is a free parameter that must be selected by the user [38]. Therefore, when correntropy is estimated, the resulting values depend on the selected kernel size. In addition, the kernel size influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness to impulsive noise during adaptation [37, 43]. If the training data set is not large enough, the kernel size must be chosen by trading off outlier rejection against estimation efficiency [44].
Several approaches can be employed to determine the kernel size, for example, statistical methods [45], Silverman's rule [46], cross-validation techniques [47, 48], and the shape of the prediction error distribution [44]. This work uses an adaptive kernel size algorithm [42], which is given by

$$\sigma = \frac{\max_i \left|e_i\right|}{2\sqrt{2}}, \quad i = 1, \ldots, N. \quad (6)$$
In order to assess the improvement of an adaptive kernel size over fixed ones, Section 6 shows how the error evolves during the FWNN training for different values of the kernel size.
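The adaptive rule in (6) is a one-liner in code. The following sketch (an assumption-free transcription of (6), not the authors' code) recomputes the kernel size from the current batch of errors:

```python
import numpy as np

def adaptive_kernel_size(errors):
    """Adaptive kernel size from (6): sigma = max_i |e_i| / (2 * sqrt(2))."""
    return float(np.max(np.abs(errors)) / (2.0 * np.sqrt(2.0)))
```

In a training loop this would typically be called once per epoch, so that the kernel shrinks as the bulk of the errors shrinks.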
3. Fuzzy Wavelet Neural Networks

3.1. Brief Review. Wavelets are obtained by scaling and translating a special function $\psi(x)$, localized in both time/space and frequency, called the mother wavelet, which can be defined in such a way as to serve as a basis to describe other functions. Wavelets are extensively used in the fields of signal analysis, identification and control of dynamical systems, computer vision, and computer graphics, among other applications [49–52]. Given $\psi(x)$, the corresponding family of wavelets is obtained by

$$\Psi_j(\mathbf{x}) = \left|\mathbf{d}_j\right|^{-1/2} \psi\left(\frac{\mathbf{x} - \mathbf{t}_j}{\mathbf{d}_j}\right), \quad \mathbf{d}_j \ne 0, \; j = 1, 2, \ldots, k, \quad (7)$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_n]$ and $\Psi_j(\mathbf{x})$ is obtained from $\psi(\mathbf{x})$ by scaling it by a factor $\mathbf{d}_j = [d_{1j}, d_{2j}, \ldots, d_{nj}]$ and translating it by $\mathbf{t}_j = [t_{1j}, t_{2j}, \ldots, t_{nj}]$.
A WNN is a nonlinear regression structure that can represent input-output maps by combining wavelets with appropriate scalings and translations [53]. The output of a WNN is determined as follows:

$$\hat{y} = \sum_{j=1}^{k} w_j \Psi_j(\mathbf{x}) = \sum_{j=1}^{k} w_j \left|\mathbf{d}_j\right|^{-1/2} \psi\left(\frac{\mathbf{x} - \mathbf{t}_j}{\mathbf{d}_j}\right), \quad (8)$$

where $w_j$ are the synaptic weights, $\mathbf{x}$ is the input vector, and $\mathbf{d}_j$ and $\mathbf{t}_j$ are the parameters characterizing the wavelets.
Concisely, the purpose of FWNNs is to incorporate WNNs into the ANFIS structure in order to obtain faster convergence and better approximation capabilities, possibly at the cost of a greater number of parameters to be adjusted. The fuzzy rules allow tackling uncertainties, while the wavelets contribute to improving the accuracy of the approximated input-output maps [6].
3.2. FWNN Architecture. A particular FWNN instance proposed in [29] is applied in this work to identify a real nonlinear system, investigating its performance and reliability when the experimental data set is corrupted by unknown noise and outliers. In this FWNN architecture, the consequent part of the fuzzy rules is described only by wavelet functions, which differs from other structures such as those proposed in [5, 6, 24]. The basic architecture of the FWNN can be seen in Figure 1, and its layers are described as follows.
Figure 1: Graphical representation of the FWNN architecture (Layer 1: inputs $x_1, \ldots, x_n$; Layer 2: membership functions $A_{qr}$; Layer 3: rules $R_1, \ldots, R_m$; Layer 4: normalization nodes $N_1, \ldots, N_m$; Layer 5: wavelets $\Psi_1, \ldots, \Psi_m$; Layer 6: output $\hat{y}$).
Layer 1. The input layer just transfers the input signal vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]$ to the next layer.
Layer 2. In the fuzzification layer, the membership functions are parameterized to match the specific requirements of a variety of applications. For instance, a Gaussian membership function can be described by

$$A_{qr}(x_q) = \exp\left[-\frac{1}{2}\left(\frac{x_q - a_{qr}}{b_{qr}}\right)^2\right], \quad (9)$$

where, for $q = 1, 2, \ldots, n$ and $r = 1, 2, \ldots, k_q$, $A_{qr}$ is associated with the $r$th membership function appearing in a given rule and evaluated for the $q$th component of the input vector. The adjustable parameters are $a_{qr}$ and $b_{qr}$, representing the center and width of the membership function, respectively.
Layer 3. This is the inference layer. Assuming that there are $m$ rules, where $R_i$ is a given rule and $i = 1, 2, \ldots, m$, each rule produces an output $\mu_i$ by aggregating the $A_{qr}$ using a T-norm. The output of the $p$th rule in this layer is

$$\mu_p = \prod_{q=1}^{n} A_{q r_q}(x_q), \quad p = (r_1, r_2, \ldots, r_n), \quad (10)$$

where $r_1 = 1, \ldots, k_1$; $r_2 = 1, \ldots, k_2$; $\ldots$; $r_n = 1, \ldots, k_n$.
All the rule outputs of this layer are added up at the summation node located between Layers 3 and 4. The output $\beta$ of this node is later used in the normalization stage.
Layer 4. In the normalization layer, the normalized firing strength of the $i$th rule, denoted by $\bar{\mu}_i$, is given by

$$\bar{\mu}_i = \frac{\mu_i}{\beta} = \frac{\mu_i}{\sum_{i=1}^{m} \mu_i}, \quad i = 1, 2, \ldots, m. \quad (11)$$
Layer 5. This is the consequent layer of the FWNN. In this work, the Mexican hat family of wavelets is adopted, as in [5, 6, 54]. Its mathematical representation is given by

$$\psi(x) = \frac{1}{\sqrt{d}}\left(1 - \left(\frac{x - t}{d}\right)^2\right) \exp\left[-0.5\left(\frac{x - t}{d}\right)^2\right]. \quad (12)$$
The inputs of the wavelet layer are the normalized weights $\bar{\mu}_i$ and the input vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]$, while the outputs of this layer, represented by $f_j$, are given by

$$f_j = \bar{\mu}_j \Psi_j, \quad j = 1, 2, \ldots, m,$$
$$f_j = \bar{\mu}_j \sum_{i=1}^{n} \frac{1}{\sqrt{d_{ij}}}\left(1 - T_{ij}^2\right) \exp\left[-0.5\,T_{ij}^2\right], \quad (13)$$

where the term $T_{ij} = (x_i - t_{ij})/d_{ij}$, $d_{ij} > 0$, is adopted to simplify the mathematical notation and $n$ is the number of wavelet functions in a node of Layer 5.
Layer 6. In the output layer, all signals from the wavelet neurons are summed up as follows:

$$\hat{y} = \sum_{j=1}^{m} f_j. \quad (14)$$
By observing Figure 1 and considering (9) to (14), it is possible to notice that the adjustable FWNN parameters are located in the second and fifth layers. The membership functions and wavelet functions are adjusted according to the application using a learning algorithm, such as the BP algorithm.
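The six layers above can be sketched as a single forward pass. The following Python sketch assumes Gaussian membership functions (9), a product T-norm (10), and Mexican hat consequents (12)-(13); the array shapes and the `rules` encoding (one membership-function index per input variable) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fwnn_forward(x, a, b, t, d, rules):
    """
    One FWNN forward pass through Layers 2-6, following (9)-(14).
    x: (n,) input vector; a, b: (n, k) Gaussian MF centers/widths;
    t, d: (m, n) wavelet translations/dilations (d > 0);
    rules: list of m tuples, one MF index per input variable.
    """
    # Layer 2: Gaussian membership degrees, eq. (9)
    A = np.exp(-0.5 * ((x[:, None] - a) / b) ** 2)           # shape (n, k)
    # Layer 3: rule firing strengths via product T-norm, eq. (10)
    mu = np.array([np.prod([A[q, r] for q, r in enumerate(rule)])
                   for rule in rules])                        # shape (m,)
    # Layer 4: normalization by the summation node output beta, eq. (11)
    mu_bar = mu / np.sum(mu)
    # Layer 5: Mexican hat wavelet consequents, eqs. (12)-(13)
    T = (x[None, :] - t) / d                                  # shape (m, n)
    psi = np.sum((1.0 - T ** 2) * np.exp(-0.5 * T ** 2) / np.sqrt(d), axis=1)
    f = mu_bar * psi
    # Layer 6: sum of all wavelet-neuron signals, eq. (14)
    return float(np.sum(f))
```

With the input placed exactly at the wavelet translations ($T = 0$) each one-dimensional wavelet evaluates to $1/\sqrt{d_{ij}}$, which gives a convenient sanity check for the implementation.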
4. Error Backpropagation Algorithm with MCC
The classical BP algorithm is used in this work to adjust the free parameters of the FWNN models. According to [54], this algorithm is probably the most frequently used technique to train an FWNN. Despite its functionality, it presents some shortcomings, such as the fact that it may get stuck in a local minimum of the error surface and that the training convergence rate is generally slow [55–57]. However, it is well known that the use of wavelet functions in neural network structures reduces such inconveniences [6, 58].
A neural system should be designed to present a desired behavior; hence, it is necessary to define a cost function for this task, which evaluates the quality of the solution obtained by the neural model [59]. Gradient-based learning algorithms, such as the BP algorithm, require the differentiation of the chosen cost function with respect to the adjustable parameters of the FWNN model. Therefore, it is necessary to obtain the partial derivatives of the cost function with respect to the parameters $d_{ij}$ and $t_{ij}$ of the wavelets and the parameters $a_{qr}$ and $b_{qr}$ of the membership functions $A_{qr}$.
Typically, MSE is the cost function used with the BP algorithm [10]. This classical criterion is replaced by MCC in this work in order to increase the reliability of the FWNN model when the identified dynamical system presents outliers and noise. When using MCC, the main goal is to maximize the correntropy similarity measure between two random variables. In the FWNN learning procedure, such variables are the desired output $y_d$ and the estimated output $\hat{y}$ provided by the FWNN model. Considering the estimation error of the FWNN model, given by $e = y_d - \hat{y}$, maximizing the MCC is equivalent to minimizing
$$\mathcal{E}_{av} = \frac{1}{N} \sum_{n=1}^{N} \left[\frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n)\right], \quad (15)$$
where $N$ is the number of samples in the experimental data. Equation (15) corresponds to the cost function used during the minimization process of the BP algorithm applied to adjust the parameters of the FWNN models. As such parameters are adjusted sequentially, (16) defines the instantaneous correntropy used to update the wavelet function and membership function parameters of the FWNN after each training pair is presented to the network. Consider
$$\mathcal{E} = \frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n). \quad (16)$$
By differentiating $\mathcal{E}$ with respect to $d_{ij}$ and $t_{ij}$, it gives

$$\frac{\partial \mathcal{E}}{\partial d_{ij}} = -\frac{e_n K_\sigma(e_n)\,\bar{\mu}_i \exp\left[-0.5\,T_{ij}^2\right]\left(3T_{ij}^2 - T_{ij}^4\right)}{\sigma^2 \sqrt{d_{ij}^3}},$$

$$\frac{\partial \mathcal{E}}{\partial t_{ij}} = -\frac{e_n K_\sigma(e_n)\,\bar{\mu}_i \exp\left[-0.5\,T_{ij}^2\right]\left(3T_{ij} - T_{ij}^3\right)}{\sigma^2 \sqrt{d_{ij}^3}}. \quad (17)$$
Now, differentiating $\mathcal{E}$ with respect to $a_{qr}$ and $b_{qr}$, it gives

$$\frac{\partial \mathcal{E}}{\partial a_{qr}} = \frac{\partial \mathcal{E}}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Psi} \frac{\partial \Psi}{\partial \bar{\mu}} \frac{\partial \bar{\mu}}{\partial \mu} \frac{\partial \mu}{\partial A} \frac{\partial A}{\partial a_{qr}},$$

$$\frac{\partial \mathcal{E}}{\partial b_{qr}} = \frac{\partial \mathcal{E}}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \Psi} \frac{\partial \Psi}{\partial \bar{\mu}} \frac{\partial \bar{\mu}}{\partial \mu} \frac{\partial \mu}{\partial A} \frac{\partial A}{\partial b_{qr}}, \quad (18)$$
where

$$\frac{\partial A_{qr}}{\partial a_{qr}} = \begin{cases} A_{qr} \dfrac{(x_q - a_{qr})}{b_{qr}^2}, & \text{if } x_q \ne a_{qr}, \\ 0, & \text{if } x_q = a_{qr}, \end{cases}$$

$$\frac{\partial A_{qr}}{\partial b_{qr}} = \begin{cases} A_{qr} \dfrac{(x_q - a_{qr})^2}{b_{qr}^3}, & \text{if } x_q \ne a_{qr}, \\ 0, & \text{if } x_q = a_{qr}. \end{cases} \quad (19)$$
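These Gaussian membership-function derivatives are easy to cross-check numerically. The sketch below (an illustrative check, assuming the Gaussian form of (9) with derivatives $\partial A/\partial a = A(x-a)/b^2$ and $\partial A/\partial b = A(x-a)^2/b^3$) compares the analytic expressions against central finite differences:

```python
import numpy as np

def gaussian_mf(x, a, b):
    """Gaussian membership function of eq. (9)."""
    return np.exp(-0.5 * ((x - a) / b) ** 2)

def dA_da(x, a, b):
    """Analytic derivative of the MF with respect to its center a."""
    return gaussian_mf(x, a, b) * (x - a) / b ** 2

def dA_db(x, a, b):
    """Analytic derivative of the MF with respect to its width b."""
    return gaussian_mf(x, a, b) * (x - a) ** 2 / b ** 3

def fd(f, arg, h=1e-6):
    """Central finite difference to cross-check an analytic derivative."""
    return (f(arg + h) - f(arg - h)) / (2.0 * h)
```

Such a finite-difference check is a standard safeguard before running gradient-based training, since a sign or exponent error in (17)-(19) would silently degrade the BP updates.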
Following the delta rule mentioned in [10], the parameters of the proposed FWNN are updated as follows:

$$d_{ij}(k+1) = d_{ij}(k) - \eta \frac{\partial \mathcal{E}}{\partial d_{ij}},$$
$$t_{ij}(k+1) = t_{ij}(k) - \eta \frac{\partial \mathcal{E}}{\partial t_{ij}},$$
$$a_{qr}(k+1) = a_{qr}(k) - \eta \frac{\partial \mathcal{E}}{\partial a_{qr}},$$
$$b_{qr}(k+1) = b_{qr}(k) - \eta \frac{\partial \mathcal{E}}{\partial b_{qr}}, \quad (20)$$

where $\eta$ is the learning rate. For the training algorithm initialization, the wavelet and membership function parameters are set with random numbers drawn from a uniform distribution.
The replacement of the traditional MSE by MCC introduces another learning parameter into the BP algorithm. As already explained, the success of correntropy depends on the appropriate adjustment of the kernel size of its Gaussian functions. This parameter influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness. Therefore, if an unsuitable kernel size is chosen, the expected performance improvement of the MCC will not be achieved [60]. For this reason, an adaptive kernel method is applied in this work (see (6)) to adjust the kernel size over the learning epochs.
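The practical difference between the two cost criteria is visible in the per-sample factor that multiplies the gradients. Differentiating (16) yields a weight proportional to $e\,K_\sigma(e)/\sigma^2$, while the MSE weight grows linearly with $e$. A small comparison sketch (illustrative, not the authors' code):

```python
import numpy as np

def mse_weight(e):
    """Per-sample gradient weight under MSE: proportional to the error itself."""
    return e

def mcc_weight(e, sigma):
    """
    Per-sample gradient weight obtained by differentiating (16):
    e * K_sigma(e) / sigma^2, so large (outlier) errors are
    exponentially suppressed instead of amplified.
    """
    k = np.exp(-e ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return e * k / sigma ** 2
```

This is why an outlier that would dominate an MSE-driven update contributes almost nothing under MCC, provided the kernel size is adjusted to the scale of the nominal errors.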
5. Proposed Identification Architecture
The architecture adopted in this work identifies the dynamic relationship between the input and output of a multisection tank for water storage. The system is evaluated when the experimental data used during the identification task are corrupted with noise and outliers. The proposed architecture is based on the series-parallel identification scheme described in [13], with small modifications due to the characteristics of the experimental data set and the learning procedure used to adjust the parameters of the FWNN model. Figure 2 presents a schematic diagram of the proposed identification architecture.

Figure 2: Proposed identification architecture, in which the dynamical system output $y(k)$ is corrupted by $n(k)$, yielding $d(k) = y(k) + n(k)$, and the FWNN is adjusted from the error $e(k) = d(k) - \hat{y}(k)$ using the MCC cost criterion with an adaptive kernel size.
The inputs of the FWNN model are past values of the input signal $u(k)$ and of the system output corrupted with noise and outliers, $d(k) = y(k) + n(k)$, while the estimated output is given by $\hat{y}(k)$. The work developed in [9] shows that well-known linear modeling structures, such as FIR (Finite Impulse Response), ARX (AutoRegressive eXogenous input), ARMAX (AutoRegressive Moving Average eXogenous input), OE (Output Error), and SSIF (State Space Innovations Form), may be extended by using nonlinear functions or representations, thus leading to the nonlinear modeling structures NFIR, NARX, NARMAX, NOE, and NSSIF. This concept is used to define the inputs of the FWNN models obtained in this study.
According to [9], the advantage of a NARX model is that none of its regressors depends on past outputs of the model, which ensures that the predictor remains stable. This is particularly important in the nonlinear case, since the stability issue is much more complex than in linear systems. Considering that the inputs of the FWNN models in this work are described exactly as the regression vector of the NARX modeling structure, they inherit important characteristics from that structure. Figure 3 shows more details on the FWNN inputs in accordance with the NARX structure, where $\lambda$, $\tau$, and $\alpha$ are constants that define a model of order $\alpha$ and delay $\tau$.
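Assembling the NARX regression vectors from the raw input/output sequences can be sketched as follows. This is an illustrative helper (the function name and the list-based indexing are assumptions); it mirrors the structure of Figure 3, with the paper's choice $\lambda = \alpha = 1$, $\tau = 0$ as defaults:

```python
import numpy as np

def narx_regressors(u, y, lam=1, tau=0, alpha=1):
    """
    Build NARX regression vectors [y(k),...,y(k-lam), u(k-tau),...,u(k-tau-alpha)]
    with targets y(k+1), mirroring Figure 3 (this work uses lam=alpha=1, tau=0).
    """
    start = max(lam, tau + alpha)
    X, Y = [], []
    for k in range(start, len(u) - 1):
        past_y = [y[k - i] for i in range(lam + 1)]
        past_u = [u[k - tau - i] for i in range(alpha + 1)]
        X.append(past_y + past_u)
        Y.append(y[k + 1])
    return np.array(X), np.array(Y)
```

Because the regressors use measured outputs $d(k)$ rather than the model's own past predictions, the one-step-ahead predictor built this way stays stable regardless of the model's open-loop behavior.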
Figure 2 illustrates that the FWNN model parameters are updated according to the error signal $e(k) = d(k) - \hat{y}(k)$ by using a learning algorithm, for example, the BP algorithm. By adopting the MCC as the cost criterion, the learning algorithm is applied to the FWNN model. As previously explained in Section 2, the success of the MCC also depends on the correct choice of the kernel size. Therefore, the adaptive method described by (6) is used in this work to adjust the kernel size during the learning epochs.

Figure 3: NARX model structure using FWNN, with inputs $y(k), y(k-1), \ldots, y(k-\lambda)$ and $u(k-\tau), \ldots, u(k-\tau-\alpha)$.
6. Experiments and Results
In order to evaluate the performance of the FWNN when the traditional MSE cost criterion of the error backpropagation algorithm is replaced by MCC, the aforementioned neural network is used to identify a real dynamical system, considering that its experimental data are corrupted by noise and outliers.
6.1. Multisection Liquid Tank. The multisection liquid tank consists of an acrylic tank for the containment of liquids, with three abrupt changes in its cross-sectional area, as can be seen in Figure 4. The liquid tank was originally designed for educational purposes, to be used in studies of identification and control of dynamical systems [61]. It was also used in [29] to evaluate the performance of the alternative FWNN structure employed in this work. In addition to the acrylic tank structure, the system is composed of a water reservoir, a water pump, a pressure sensor, an electronic power driver, and an electronic interface with AD (analog-to-digital) and DA (digital-to-analog) converters.

Figure 4: Three-section liquid tank with distinct cross-sectional areas: (a) schematic diagram of the multisection liquid tank; (b) the multisection liquid tank.
The nonlinearity present in the liquid output flow, which is due to the different pressure levels at the tank base according to the height of the liquid column, can be clearly noticed in this dynamical system. Moreover, the distinct cross-sectional areas make the nonlinearity even more evident, and the abrupt transitions between the tank sections are also responsible for discontinuities. The whole system can be seen as a set of three coupled nonlinear systems, since each tank section has its own dynamic behavior.
6.2. System Identification. Initially, in order to collect the experimental data set used during the learning and testing phases of the FWNN models identified in this work, the water pump is excited with an APRBS (Amplitude Modulated Pseudorandom Binary Sequence) and the water level inside the tank is registered at a sample rate of $T_s = 2\,\mathrm{Hz}$. For the generation of the persistent excitation signal, the following parameters are considered: minimum hold time $T_h = 10\,\mathrm{s}$, minimum amplitude $A_{\min} = 0\,\mathrm{V}$, and maximum amplitude $A_{\max} = 15\,\mathrm{V}$. Since only positive voltage values are considered in this case study, the pump operates only to shift the liquid from the reservoir to the multisection tank.
After the system excitation, the collected data are corrupted with additive white Gaussian noise and two different percentages of outliers (1% and 3%). The resulting data are divided into two sets comprising approximately 80% and 20% of the total amount; the first set is used to train the FWNN model and the second one is used during the testing phase. The whole data set is normalized to fit within the range $[0, 1]$ in order to avoid numerical problems during the FWNN learning procedure. Since the multisection tank is a first-order nonlinear system, and also considering Figure 3, the inputs of the FWNN models are defined with $\lambda = \alpha = 1$ and $\tau = 0$. Thus, $u(k)$, $u(k-1)$, and $y(k)$ are defined as inputs to the FWNN models to predict $y(k+1)$.
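The normalization step mentioned above is a plain min-max scaling; a minimal sketch (the function name is an assumption, and the paper does not specify how the scaling constants are stored for denormalization):

```python
import numpy as np

def minmax_normalize(signal):
    """Scale a signal to the range [0, 1], as done with the tank data set."""
    s = np.asarray(signal, float)
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo)
```

In practice the scaling constants computed on the training set should also be applied to the test set, so that both sets share the same mapping.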
The BP algorithm presented in Section 4 is used to adjust the parameters $d_{ij}$, $t_{ij}$, $a_{qr}$, and $b_{qr}$ of the FWNN. After a trial-and-error procedure, the learning rate $\eta = 0.0001$ was found to be a good choice for identifying the multisection tank. It is worth mentioning that the results presented in this work were obtained after 350 learning epochs.
Figure 5 presents the model validation when 1% of the original experimental data set is corrupted with outliers and additive white Gaussian noise is inserted. In this figure, the tank water level in cm is shown as a function of the sample time step, where each time step is equivalent to 0.5 seconds, as defined by the sample rate $T_s = 2\,\mathrm{Hz}$. The terms FWNN-MCC and FWNN-MSE identify the FWNN models obtained using MCC and MSE as the cost criterion of the BP algorithm, respectively. It is evident that FWNN-MCC has the best performance, due to its use of higher-order statistical information. On the other hand, the FWNN-MSE model, based on second-order moments, presents some problems in efficiently identifying the input-output dynamic relationship of the multisection tank at some points of the validation curve. The presence of outliers in the experimental data has a significant negative impact on the FWNN model when the MSE criterion is used in the learning procedure, since the error due to the outliers is amplified at a squared rate. The same behavior is not observed in FWNN-MCC when 1% of the experimental data are corrupted by outliers, because the outlier power is weighted by the Gaussian kernel.
Figure 5: Validation of the FWNN-MCC with 1% outlier rate in experimental data (desired output versus FWNN-MCC and FWNN-MSE predictions of the water level).
Figure 6: Validation of the FWNN-MCC with 3% outlier rate in experimental data (desired output versus FWNN-MCC and FWNN-MSE predictions, with two zoomed regions).
Figure 6 shows the model validation when 3% of the original experimental data are corrupted with outliers and additive white Gaussian noise is inserted. In Figure 6, only the validation points are plotted to allow better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement of the FWNN-MCC model. Both models present problems at some points, although the performance of the FWNN-MCC model is improved in the identification of the multisection tank dynamics, as it also seems to be less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that MCC has intrinsic robustness due to the local estimation produced by the kernel size.
It is also important to mention that the correntropy criterion has a free parameter, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the performance of the FWNN learning procedure. Figure 7 shows the MSE obtained over the 350 epochs for three different fixed kernel sizes, namely, 0.01, 0.1, and 1.0, and also using the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in attenuating outliers and noise.

Figure 7: Evolution of the MSE for different kernel sizes ($\sigma = 0.01$, $\sigma = 0.1$, $\sigma = 1.0$, and adaptive $\sigma$).
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data contain 1% and 3% of outliers. During the initial epochs of the BP algorithm, the kernel size is quite oscillatory; however, its behavior becomes more stable as training approaches the hundredth epoch.
7. Conclusions
This work has analyzed the performance of an FWNN applied to identify a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since it tries to fit behaviors that are not part of the dynamical system. The most common learning techniques applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has proposed the replacement of this traditional evaluation measure by a similarity measure based on information theory, denominated correntropy. Therefore, the MCC was used in this paper as the cost function of the error backpropagation algorithm in order to reduce the negative effects of unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more properly, being also less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is a free parameter of correntropy. The addition of this new parameter in the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has shown to be more efficient than keeping this parameter fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture is susceptible to falling into local minima, which limits the ability of correntropy to remove the outliers.

Figure 8: Adaptive kernel size behavior over the learning epochs (adaptive $\sigma$ with 1% and 3% outlier rates).
Further research will focus on the following items: (1) analyzing the application of the MCC associated with different training algorithms for the FWNN architecture in order to avoid the harmful effects of outliers; metaheuristic algorithms, such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm, are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and perform fault diagnosis, among other tasks.
Acronyms
AD: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude-Modulated Pseudorandom Binary Sequence
10 Mathematical Problems in Engineering
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
DA: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants: a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1-664–V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaria, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvao, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, R. Fnaicch, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Principe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS modificada para identificação e controle de plantas com ampla faixa de operação e não linearidade acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
Figure 1: Graphical representation of the FWNN architecture (Layer 1: inputs x1, ..., xn; Layer 2: membership functions A_qr; Layer 3: rules R1, ..., Rm; Layer 4: normalization; Layer 5: wavelet functions Ψ1, ..., Ψm; Layer 6: output ŷ).
Layer 1. The input layer just transfers the input signal vector x = [x1, x2, ..., xn] to the next layer.
Layer 2. In the fuzzification layer, the membership functions are parameterized to match the specific requirements of a variety of applications. For instance, a Gaussian membership function can be described by the following equation:
$$A_{qr}(x_q) = \exp\left[-\frac{1}{2}\left(\frac{x_q - a_{qr}}{b_{qr}}\right)^2\right], \qquad (9)$$

where, for q = 1, 2, ..., n and r = 1, 2, ..., k_q, A_qr is associated with the rth membership function appearing in a given rule and evaluated for the qth component of the input vector. The adjustable parameters are a_qr and b_qr, representing the center and width of the membership function, respectively.
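As a minimal sketch of (9), the Gaussian membership degree can be computed as follows (the function name is illustrative, not part of the paper):

```python
import math

def gaussian_mf(x, a, b):
    """Gaussian membership function of Eq. (9): center a, width b."""
    return math.exp(-0.5 * ((x - a) / b) ** 2)

peak = gaussian_mf(2.0, 2.0, 0.5)  # membership is 1 at the center
```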
Layer 3. This is the inference layer. Assuming that there are m rules, where R_i is a given rule and i = 1, 2, ..., m, each rule is supposed to produce an output μ_i by aggregating A_qr using a T-norm. The output of the pth rule in this layer is

$$\mu_p = \prod_{q=1}^{n} A_{qr}(x_q), \qquad p = r_1 r_2 \cdots r_n, \qquad (10)$$

where r_1 = 1, ..., k_1; r_2 = 1, ..., k_2; ...; r_n = 1, ..., k_n.
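The product T-norm of (10) can be sketched as below; one membership function per input component is selected by the rule, and the firing strength is the product of the resulting degrees (names are illustrative):

```python
import math

def gaussian_mf(x, a, b):
    """Gaussian membership function of Eq. (9)."""
    return math.exp(-0.5 * ((x - a) / b) ** 2)

def rule_strength(x, centers, widths):
    """Firing strength of one rule (Eq. (10)): product T-norm over
    the membership degrees of each input component."""
    mu = 1.0
    for xq, a, b in zip(x, centers, widths):
        mu *= gaussian_mf(xq, a, b)
    return mu

mu_p = rule_strength([0.2, 0.8], centers=[0.0, 1.0], widths=[1.0, 1.0])
```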
All the rule outputs of this layer are added up at the summation node located between Layers 3 and 4. The output β of this node is later used in the normalization stage.
Layer 4. In the normalization layer, the normalization factor for the output of the ith rule, denoted by μ̄_i, is given by

$$\bar{\mu}_i = \frac{\mu_i}{\beta} = \frac{\mu_i}{\sum_{i=1}^{m}\mu_i}, \qquad i = 1, 2, \ldots, m. \qquad (11)$$
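A sketch of the Layer-4 normalization of (11) (illustrative helper name):

```python
def normalize(strengths):
    """Layer-4 normalization (Eq. (11)): each firing strength is divided
    by the sum beta of all strengths, so the outputs add up to 1."""
    beta = sum(strengths)
    return [mu / beta for mu in strengths]

norm = normalize([0.5, 0.3, 0.2])
```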
Layer 5. This is the consequent layer of the FWNN. In this work the Mexican hat family of wavelets is adopted, as in [5, 6, 54]. Its mathematical representation is given by

$$\psi(x) = \frac{1}{\sqrt{d}}\left(1 - \left(\frac{x - t}{d}\right)^2\right)\exp\left[-0.5\left(\frac{x - t}{d}\right)^2\right]. \qquad (12)$$
The inputs of the wavelet layer are the normalized weights μ̄_i and the input vector x = [x1, x2, ..., xn], while the outputs of this layer, represented by f_j, are given by

$$f_j = \bar{\mu}_j \Psi_j, \qquad j = 1, 2, \ldots, m,$$

$$f_j = \bar{\mu}_j \sum_{i=1}^{n}\left(1 - T_{ij}^2\right)\exp\left[-0.5\,T_{ij}^2\right]\frac{1}{\sqrt{d_{ij}}}, \qquad (13)$$

where the term T_ij = (x_i − t_ij)/d_ij, d_ij > 0, is adopted to simplify the mathematical notation and n is the number of wavelet functions in a node of Layer 5.
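The consequent of one rule, (12)-(13), can be sketched as follows (function and argument names are illustrative):

```python
import math

def wavelet_output(x, mu_bar, t, d):
    """Consequent of one rule (Eqs. (12)-(13)): normalized strength
    times a sum of Mexican hat wavelets, T_ij = (x_i - t_ij) / d_ij."""
    s = 0.0
    for xi, tij, dij in zip(x, t, d):
        T = (xi - tij) / dij
        s += (1.0 - T * T) * math.exp(-0.5 * T * T) / math.sqrt(dij)
    return mu_bar * s

# At the wavelet centers (T = 0) each term contributes its maximum value.
f_j = wavelet_output([0.0, 0.0], mu_bar=0.5, t=[0.0, 0.0], d=[1.0, 1.0])
```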
Layer 6. In the output layer, all signals from the wavelet neurons are summed up as follows:

$$\hat{y} = \sum_{j=1}^{m} f_j. \qquad (14)$$
By observing Figure 1 and considering (9) to (14), it is possible to notice that the adjustable FWNN parameters are located in the second and fifth layers. The membership functions and wavelet functions are adjusted according to the application using any learning algorithm, such as the BP algorithm.
4. Error Backpropagation Algorithm with MCC
The classical BP algorithm is the learning algorithm used in this work to adjust the free parameters of the FWNN models. According to [54], this algorithm is probably the most frequently used technique to train a FWNN. Despite its functionality, it presents some shortcomings, such as the fact that it may get stuck in a local minimum of the error surface and that the training convergence rate is generally slow [55–57]. However, it is well known that the use of wavelet functions in neural network structures reduces such inconveniences [6, 58].
A neural system should be designed to present a desired behavior; hence, it is necessary to define a cost function for this task. It provides an evaluation of the quality of the solution obtained by the neural model [59]. Gradient-based learning algorithms, such as the BP algorithm, require the differentiation of the chosen cost function with respect to the adjustable parameters of the FWNN model. Therefore, it is necessary to obtain the partial derivatives of the chosen cost function with respect to the parameters d_ij and t_ij of the wavelets and the parameters a_qr and b_qr of the membership functions A_qr.
Typically, the MSE is the cost function used with the BP algorithm [10]. Such a classical cost criterion is replaced by the MCC in this work in order to increase the reliability of the FWNN model when the identified dynamical system presents outliers and noise. When using the MCC, the main goal is to maximize the correntropy similarity measure between two random process variables. In the FWNN learning procedure such variables are the desired output y_d and the estimated output ŷ provided by the FWNN model. Considering the estimation error of the FWNN model given by e = y_d − ŷ, maximizing the MCC is equivalent to minimizing
$$\mathcal{E}_{av} = \frac{1}{N}\sum_{n=1}^{N}\left[\frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n)\right], \qquad (15)$$
where N is the number of samples in the experimental data. Equation (15) corresponds to the cost function used during the minimization process of the BP algorithm applied to adjust the parameters of the FWNN models. As such parameters are adjusted sequentially, (16) defines the instantaneous correntropy cost used to update the wavelet function and membership function parameters of the FWNN after each training pair is presented to this network. Consider
$$\mathcal{E} = \frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n). \qquad (16)$$
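The costs (15)-(16), with the Gaussian kernel K_σ(e) = exp(−e²/2σ²)/(√(2π)σ), can be sketched as follows; note that each error's contribution is bounded by 1/(√(2π)σ), which is what attenuates outliers (function names are illustrative):

```python
import math

def gaussian_kernel(e, sigma):
    """K_sigma(e) = exp(-e^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-e * e / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def mcc_cost(errors, sigma):
    """Eq. (15): minimizing this average is equivalent to maximizing
    the correntropy between the desired and estimated outputs."""
    c = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return sum(c - gaussian_kernel(e, sigma) for e in errors) / len(errors)

# A zero error contributes nothing; a huge (outlier) error contributes
# at most the bounded value 1 / (sqrt(2 pi) sigma).
cost_clean = mcc_cost([0.0, 0.0], sigma=1.0)
cost_outlier = mcc_cost([0.0, 100.0], sigma=1.0)
```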
By differentiating 𝓔 with respect to d_ij and t_ij, it gives

$$\frac{\partial\mathcal{E}}{\partial d_{ij}} = -\frac{e_n K_\sigma(e_n)\,\bar{\mu}_i \left(\exp\left[-0.5\,T_{ij}^2\right]\left(3T_{ij}^2 - T_{ij}^4\right)\right)}{\sigma^2\sqrt{d_{ij}^3}},$$

$$\frac{\partial\mathcal{E}}{\partial t_{ij}} = -\frac{e_n K_\sigma(e_n)\,\bar{\mu}_i \left(\exp\left[-0.5\,T_{ij}^2\right]\left(3T_{ij} - T_{ij}^3\right)\right)}{\sigma^2\sqrt{d_{ij}^3}}. \qquad (17)$$
Now, differentiating 𝓔 with respect to a_qr and b_qr, it gives

$$\frac{\partial\mathcal{E}}{\partial a_{qr}} = \frac{\partial\mathcal{E}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial \Psi}\,\frac{\partial \Psi}{\partial \bar{\mu}}\,\frac{\partial \bar{\mu}}{\partial \mu}\,\frac{\partial \mu}{\partial A}\,\frac{\partial A}{\partial a_{qr}},$$

$$\frac{\partial\mathcal{E}}{\partial b_{qr}} = \frac{\partial\mathcal{E}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial \Psi}\,\frac{\partial \Psi}{\partial \bar{\mu}}\,\frac{\partial \bar{\mu}}{\partial \mu}\,\frac{\partial \mu}{\partial A}\,\frac{\partial A}{\partial b_{qr}}, \qquad (18)$$
where, from the Gaussian membership function (9),

$$\frac{\partial A_{qr}}{\partial a_{qr}} = A_{qr}\,\frac{x_q - a_{qr}}{b_{qr}^2},$$

$$\frac{\partial A_{qr}}{\partial b_{qr}} = A_{qr}\,\frac{(x_q - a_{qr})^2}{b_{qr}^3}, \qquad (19)$$

both derivatives vanishing when x_q = a_qr.
Following the delta rule mentioned in [10], the parameters of the proposed FWNN are updated as follows:

$$d_{ij}(k+1) = d_{ij}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial d_{ij}},$$

$$t_{ij}(k+1) = t_{ij}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial t_{ij}},$$

$$a_{qr}(k+1) = a_{qr}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial a_{qr}},$$

$$b_{qr}(k+1) = b_{qr}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial b_{qr}}, \qquad (20)$$
where η is the learning rate. For the training algorithm initialization, the wavelet and membership function parameters are set with random numbers drawn from a uniform distribution.
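The delta-rule updates of (20) can be illustrated, under simplifying assumptions, on a toy scalar model ŷ = θx rather than the full FWNN; for this model the derivative reduces to ∂𝓔/∂θ = −(e/σ²)K_σ(e)x. The names and constants are illustrative only.

```python
import math

def kernel(e, sigma):
    """Gaussian kernel K_sigma(e) from the correntropy cost."""
    return math.exp(-e * e / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def delta_rule_step(theta, x, y_d, eta, sigma):
    """One delta-rule update in the spirit of Eq. (20) for the toy
    model y_hat = theta * x: dE/dtheta = -(e / sigma^2) K_sigma(e) x."""
    e = y_d - theta * x
    grad = -(e / sigma ** 2) * kernel(e, sigma) * x
    return theta - eta * grad

# Repeated updates drive theta toward the value that reproduces y_d.
theta = 0.0
for _ in range(2000):
    theta = delta_rule_step(theta, x=1.0, y_d=2.0, eta=0.5, sigma=1.0)
```

Note how the kernel factor K_σ(e) shrinks the step for very large errors, which is exactly the mechanism that de-emphasizes outliers during training.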
The replacement of the traditional MSE by the MCC inserts another learning parameter into the BP algorithm. As already explained, the success of the correntropy is based on the appropriate adjustment of the kernel size of its Gaussian functions. This new parameter influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness. Therefore, if an unsuitable kernel size is chosen, the expected improved performance of the MCC will not be confirmed [60]. For this reason, an adaptive kernel method is applied in this work (see (6)) to adjust the kernel size over the learning epochs.
5. Proposed Identification Architecture
The architecture adopted in this work identifies the dynamic relationship between the input and output of a multisection tank for water storage. The system is evaluated when the experimental data used during the identification task is corrupted with noise and outliers. The proposed architecture is based on the series-parallel identification scheme described in [13], with small modifications due to the experimental data set characteristics and the learning
Figure 2: Proposed identification architecture (the FWNN, fed with delayed versions of u(k) and of the corrupted output d(k) = y(k) + n(k), is adjusted by the learning algorithm under the MCC cost criterion with adaptive kernel size, using the error e(k) = d(k) − ŷ(k)).
procedure used to adjust the parameters of the FWNN model. Figure 2 presents a schematic diagram of the identification architecture proposed in this work.
The inputs of the FWNN model are past values of the input signal u(k) and of the system output corrupted with noise and outliers, d(k) = y(k) + n(k), while the estimated output is given by ŷ(k). The work developed in [9] shows that well-known linear modeling structures, such as FIR (Finite Impulse Response), ARX (AutoRegressive eXogenous input), ARMAX (AutoRegressive Moving Average eXogenous input), OE (Output Error), and SSIF (State Space Innovations Form), may be extended by using nonlinear functions or representations, thus leading to the nonlinear modeling structures NFIR, NARX, NARMAX, NOE, and NSSIF. This concept is used to define the inputs of the FWNN models obtained in this study.
According to [9], the advantage of a NARX model is that none of its regressors depends on past outputs of the model, which ensures that the predictor remains stable. This is particularly important in the nonlinear case, since the stability issue is much more complex than in linear systems. Considering that the inputs of the FWNN models in this work are described exactly as the regression vector of the NARX modeling structure, they inherit important characteristics from such a structure. Figure 3 shows more details on the FWNN inputs in accordance with the NARX structure, where λ, τ, and α are constants that define a model of order α and delay τ.
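The NARX regression vector of Figure 3 can be sketched as follows (the helper name and the toy data are illustrative, not from the paper):

```python
def narx_inputs(y, u, k, lam, tau, alpha):
    """Regression vector of Fig. 3: [y(k), ..., y(k - lam),
    u(k - tau), ..., u(k - tau - alpha)] for a model of order
    alpha and delay tau."""
    past_y = [y[k - i] for i in range(lam + 1)]
    past_u = [u[k - tau - i] for i in range(alpha + 1)]
    return past_y + past_u

y = [0.0, 0.1, 0.3, 0.6]
u = [1.0, 1.0, 0.5, 0.5]
x = narx_inputs(y, u, k=3, lam=1, tau=0, alpha=1)  # [y(3), y(2), u(3), u(2)]
```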
Figure 2 illustrates that the FWNN model parameters are updated according to the error signal e(k) = d(k) − ŷ(k) by using a learning algorithm, for example, the BP algorithm. The learning algorithm is applied to the FWNN model adopting the MCC as its cost criterion. As previously explained in Section 2, the success of the MCC also
Figure 3: NARX model structure using FWNN (inputs y(k), y(k − 1), ..., y(k − λ) and u(k − τ), ..., u(k − τ − α); output ŷ(k)).
depends on the correct choice of the kernel size. Therefore, the adaptive method described by (6) is used in this work to adjust the kernel size during the learning epochs.
6. Experiments and Results
In order to evaluate the performance of the FWNN when the traditional MSE cost criterion of the error backpropagation algorithm is replaced by the MCC, the aforementioned neural network is used to identify a real dynamical system, considering that its experimental data is corrupted by noise and outliers.
6.1. Multisection Liquid Tank. The multisection liquid tank consists of an acrylic tank for containment of liquids with three abrupt changes in its cross-sectional area, as can be seen in Figure 4. The liquid tank was originally designed for educational purposes, in order to be used in studies of identification and control of dynamical systems [61]. It was also used in [29] to evaluate the performance of
Figure 4: Three-section liquid tank with distinct cross-sectional areas: (a) schematic diagram of the multisection liquid tank, with section dimensions in cm; (b) photograph of the multisection liquid tank.
the alternative FWNN structure employed in this work. In addition to the acrylic tank structure, the system is composed of a water reservoir, a water pump, a pressure sensor, an electronic power driver, and an electronic interface with AD (analog-to-digital) and DA (digital-to-analog) converters.
The nonlinearity present in the liquid output flow, which is due to the different pressure levels at the tank base in accordance with the height of the liquid column, can be clearly noticed in the aforementioned dynamical system. Besides, the distinct cross-sectional areas make such nonlinearity even more evident. It is worth mentioning that the abrupt transitions between the tank sections are also responsible for discontinuities. The whole system can be seen as a set of three coupled nonlinear systems, since each tank section has its own dynamic behavior.
6.2. System Identification. Initially, in order to collect the experimental data set used during the learning and testing phases of the FWNN models identified in this work, the water pump is excited with an APRBS (Amplitude-Modulated Pseudorandom Binary Sequence) and the water level inside the tank is registered at a sample rate of T_s = 2 Hz.
For the generation of the persistent excitation signal, the following parameters are considered: minimum hold time T_h = 10 s, minimum amplitude A_min = 0 V, and maximum amplitude A_max = 15 V. Since only positive voltage values are considered in this case study, the pump operates only to shift the liquid from the reservoir to the multisection tank.
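An APRBS of this kind can be sketched as below: a random amplitude level is held for a fixed number of samples and then switched (the function name and the way the hold time is fixed rather than randomized are simplifying assumptions).

```python
import random

def aprbs(n_steps, hold, a_min, a_max, seed=0):
    """Amplitude-modulated PRBS sketch: hold a random level in
    [a_min, a_max] for `hold` samples, then jump to a new level."""
    rng = random.Random(seed)
    levels = []
    while len(levels) < n_steps:
        levels.extend([rng.uniform(a_min, a_max)] * hold)
    return levels[:n_steps]

# T_h = 10 s at T_s = 2 Hz corresponds to 20 samples per hold interval.
excitation = aprbs(n_steps=200, hold=20, a_min=0.0, a_max=15.0)
```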
After the system excitation, the collected data is corrupted with additive white Gaussian noise and two different percentages of outliers (1% and 3%). The resulting data are divided into two sets, comprising approximately 80% and 20% of the total amount. The first set is used to train the FWNN model and the second one is used during the testing phase.
The whole data set is normalized to fit within the range [0, 1] in order to avoid numerical problems during the FWNN learning procedure. Since the multisection tank is a first-order nonlinear system, and also considering Figure 3, the inputs of the FWNN models are defined with λ = α = 1 and τ = 0. Thus, u(k), u(k − 1), and y(k) are defined as inputs to the FWNN models to predict y(k + 1).
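The normalization to [0, 1] and the roughly 80/20 train/test split can be sketched as follows (helper names are illustrative):

```python
def normalize_unit(data):
    """Scale a data set to [0, 1] to avoid numerical problems."""
    lo, hi = min(data), max(data)
    return [(v - lo) / (hi - lo) for v in data]

def split_train_test(data, train_fraction=0.8):
    """Roughly 80% for training, the remaining 20% for the test phase."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

scaled = normalize_unit([2.0, 4.0, 6.0, 8.0, 10.0])
train, test = split_train_test(scaled)
```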
The BP algorithm presented in Section 4 is used to adjust the parameters d_ij, t_ij, a_qr, and b_qr of the FWNN. After a trial-and-error procedure, the learning rate η = 0.0001 was found to be a good choice to identify the multisection tank. It is worth mentioning that the results presented in this work were obtained after 350 learning epochs.
Figure 5 presents the model validation when 1% of the original experimental data set is corrupted with outliers and additive white Gaussian noise is inserted. In this figure, the tank water level in cm is shown as a function of the sample time step, where each time step is equivalent to 0.5 seconds, as defined by the sample rate T_s = 2 Hz. The terms FWNN-MCC and FWNN-MSE are used to identify the FWNN models obtained using MCC and MSE as the cost criterion of the BP algorithm, respectively. It is evident that FWNN-MCC has the best performance, due to the use of higher-order statistical information. On the other hand, the FWNN-MSE model, based on second-order moments, presents some problems to efficiently identify the input-output dynamic relationship of the multisection tank at some points of the validation curve. The presence of outliers in the experimental data has a significant negative impact on the FWNN model when the MSE criterion is used in the learning procedure, since the error due to the outliers is amplified quadratically. The same behavior is not observed in FWNN-MCC when 1% of the experimental data are corrupted by outliers, because the outliers' power is weighted by the Gaussian kernel.
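The closing remark, that MSE amplifies an outlier error quadratically while the correntropy cost weights it through a Gaussian kernel and therefore saturates, can be checked numerically. This sketch uses the instantaneous correntropy cost from Section 4; the particular error values and kernel size are arbitrary.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    return np.exp(-e**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def mse_cost(e):
    return e**2

def mcc_cost(e, sigma=0.1):
    # Instantaneous correntropy cost: bounded above by 1/(sqrt(2*pi)*sigma).
    return 1.0 / (np.sqrt(2 * np.pi) * sigma) - gaussian_kernel(e, sigma)

small, outlier = 0.05, 5.0
# The squared-error cost grows without bound with the error magnitude...
ratio_mse = mse_cost(outlier) / mse_cost(small)
# ...while the correntropy cost saturates: an outlier contributes at most
# the bounded penalty 1/(sqrt(2*pi)*sigma), regardless of its size.
ratio_mcc = mcc_cost(outlier, 0.1) / mcc_cost(small, 0.1)
```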
8 Mathematical Problems in Engineering
[Figure 5 plot: tank water level (cm) versus time step; curves for the desired output, FWNN-MCC, and FWNN-MSE.]
Figure 5: Validation of the FWNN-MCC with 1% outlier rate in experimental data.
[Figure 6 plots: tank water level (cm) versus time step; the full validation record and two zoomed regions; curves for the desired output, FWNN-MCC, and FWNN-MSE.]
Figure 6: Validation of the FWNN-MCC with 3% outlier rate in experimental data.
Figure 6 shows the model validation when 3% of the original experimental data is corrupted with outliers and additive white Gaussian noise is inserted. In Figure 6, only the validation points are plotted, to allow a better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement of the FWNN-MCC model. Both models present problems at some points, although the performance of the FWNN-MCC one is improved in the identification of the multisection tank dynamics, as it also seems to be less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that MCC has intrinsic robustness due to the local estimation produced by the kernel size.
It is also important to mention that the correntropy criterion has a free parameter, namely, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the performance of
[Figure 7 plot: MSE versus learning epochs for fixed kernel sizes σ = 1.0, σ = 0.1, and σ = 0.01, and for the adaptive σ.]
Figure 7 Evolution of MSE for different kernel sizes
the FWNN learning procedure. Figure 7 shows the MSE obtained over the 350 epochs for three different fixed kernel sizes, namely, 0.01, 0.1, and 1.0, and also using the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in the attenuation of outliers and noise.
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data is composed of 1% and 3% of outliers. During the initial epochs of the BP algorithm, the kernel size is quite oscillatory. However, the behavior of the kernel size becomes more stable as it approaches the hundredth epoch.
7 Conclusions
This work has analyzed the performance of a FWNN when applied to identify a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since it tries to fit behaviors that are not part of the dynamical system. The most common learning techniques applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has then proposed the replacement of this traditional evaluation measure by a similarity measure based on information theory, denominated correntropy. Therefore, the MCC was used in this paper as the cost function of the error backpropagation algorithm, in order to reduce the negative effects of the unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more properly, being also less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is a free
[Figure 8 plot: kernel size versus learning epochs; curves for the adaptive σ with 1% and with 3% outlier noise.]
Figure 8 Adaptive kernel size behavior
parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has shown to be more efficient compared to the case when this parameter remains fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture is susceptible to falling into local minima, limiting the correntropy action to remove the outliers.
Further research will focus on the following items: (1) analyzing the application of the MCC associated with different algorithms to train the FWNN architecture in order to avoid the harmful effects of outliers; metaheuristic algorithms, such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm, are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and perform fault diagnosis, among others.
Acronyms
AD: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
DA: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants—a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day-ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1-664–V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaría, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvão, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, F. Fnaiech, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Principe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS modificada para identificação e controle de plantas com ampla faixa de operação e não linearidade acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
4 Error Backpropagation Algorithm with MCC
The classical BP algorithm is the learning algorithm used in this work to adjust the free parameters of the FWNN models. According to [54], this algorithm is probably the most frequently used technique to train a FWNN. Despite its functionality, it presents some shortcomings, such as the fact that it may get stuck on a local minimum of the error surface and that the training convergence rate is generally slow [55–57]. However, it is well known that the use of wavelet functions in neural network structures reduces such inconveniences [6, 58].
A neural system should be designed to present a desired behavior; hence, it is necessary to define a cost function for this task. It provides an evaluation of the quality of the solution obtained by the neural model [59]. Gradient-based learning algorithms, such as the BP algorithm, require the differentiation of the chosen cost function with respect to the adjustable parameters of the FWNN model. Therefore, it is necessary to obtain the partial derivatives of the chosen cost function with respect to the parameters d_ij and t_ij of the wavelets and the parameters a_qr and b_qr of the membership functions A_qr.
Typically, MSE is the cost function used with the BP algorithm [10]. This classical cost criterion is replaced by MCC in this work in order to increase the reliability of the FWNN model when the identified dynamical system presents outliers and noise. When using MCC, the main goal is to maximize the correntropy similarity measure between two random process variables. In the FWNN learning procedure, such variables are the desired output y_d and the estimated output ŷ provided by the FWNN model. Considering the estimation error of the FWNN model given by e = y_d − ŷ, maximizing the MCC is equivalent to minimizing

\[ \mathcal{E}_{av} = \frac{1}{N}\sum_{n=1}^{N}\left[\frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n)\right], \tag{15} \]

where N is the number of samples in the experimental data. Equation (15) corresponds to the cost function used during the minimization process of the BP algorithm applied to adjust the parameters of the FWNN models. As such parameters are adjusted sequentially, (16) defines the instantaneous correntropy cost used to update the wavelet-function and membership-function parameters of the FWNN after each training pair is presented to this network:

\[ \mathcal{E} = \frac{1}{\sqrt{2\pi}\,\sigma} - K_\sigma(e_n). \tag{16} \]
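A direct implementation of (15) and (16) is straightforward. This sketch assumes only the Gaussian kernel K_σ(e) = exp(−e²/2σ²)/(√(2π)σ) used throughout the paper.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def kernel(e, sigma):
    """Gaussian kernel K_sigma(e)."""
    return np.exp(-e**2 / (2.0 * sigma**2)) / (SQRT_2PI * sigma)

def mcc_batch_cost(y_d, y_hat, sigma):
    """Average cost of (15): minimizing it maximizes correntropy."""
    e = np.asarray(y_d) - np.asarray(y_hat)
    return np.mean(1.0 / (SQRT_2PI * sigma) - kernel(e, sigma))

def mcc_instant_cost(e_n, sigma):
    """Instantaneous cost of (16), used for sequential updates."""
    return 1.0 / (SQRT_2PI * sigma) - kernel(e_n, sigma)
```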
By differentiating \(\mathcal{E}\) with respect to d_ij and t_ij, it gives

\[ \frac{\partial\mathcal{E}}{\partial d_{ij}} = -\frac{e_n K_\sigma(e_n)\,\mu_i \left(\exp\left[-0.5\,T_{ij}^{2}\right]\left(3T_{ij}^{2} - T_{ij}^{4}\right)\right)}{\sigma^{2}\sqrt{d_{ij}^{3}}}, \]
\[ \frac{\partial\mathcal{E}}{\partial t_{ij}} = -\frac{e_n K_\sigma(e_n)\,\mu_i \left(\exp\left[-0.5\,T_{ij}^{2}\right]\left(3T_{ij} - T_{ij}^{3}\right)\right)}{\sigma^{2}\sqrt{d_{ij}^{3}}}. \tag{17} \]
Now, differentiating \(\mathcal{E}\) with respect to a_qr and b_qr, it gives

\[ \frac{\partial\mathcal{E}}{\partial a_{qr}} = \frac{\partial\mathcal{E}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial \Psi}\,\frac{\partial \Psi}{\partial \bar{\mu}}\,\frac{\partial \bar{\mu}}{\partial \mu}\,\frac{\partial \mu}{\partial A_{qr}}\,\frac{\partial A_{qr}}{\partial a_{qr}}, \]
\[ \frac{\partial\mathcal{E}}{\partial b_{qr}} = \frac{\partial\mathcal{E}}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial \Psi}\,\frac{\partial \Psi}{\partial \bar{\mu}}\,\frac{\partial \bar{\mu}}{\partial \mu}\,\frac{\partial \mu}{\partial A_{qr}}\,\frac{\partial A_{qr}}{\partial b_{qr}}, \tag{18} \]
where

\[ \frac{\partial A_{qr}}{\partial b_{qr}} = \begin{cases} \dfrac{A_{qr}\left(x_q - b_{qr}\right)}{a_{qr}^{2}}, & x_q \neq b_{qr}, \\[1ex] 0, & x_q = b_{qr}, \end{cases} \]
\[ \frac{\partial A_{qr}}{\partial a_{qr}} = \begin{cases} \dfrac{A_{qr}\left(x_q - b_{qr}\right)^{2}}{a_{qr}^{3}}, & x_q \neq b_{qr}, \\[1ex] 0, & x_q = b_{qr}. \end{cases} \tag{19} \]
Following the delta rule mentioned in [10], the parameters of the proposed FWNN are updated as follows:

\[ d_{ij}(k+1) = d_{ij}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial d_{ij}}, \qquad t_{ij}(k+1) = t_{ij}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial t_{ij}}, \]
\[ a_{qr}(k+1) = a_{qr}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial a_{qr}}, \qquad b_{qr}(k+1) = b_{qr}(k) - \eta\,\frac{\partial\mathcal{E}}{\partial b_{qr}}, \tag{20} \]
where η is the learning rate. For the training algorithm initialization, the wavelet and membership function parameters are set with random numbers drawn from a uniform distribution.
The replacement of the traditional MSE by MCC inserts another learning parameter into the BP algorithm. As already explained, the success of the correntropy is based on the appropriate adjustment of the kernel size of its Gaussian functions. This new parameter influences the nature of the performance surface, the presence of local optima, the rate of convergence, and robustness. Therefore, if an unsuitable kernel size is chosen, the expected improved performance of the MCC will not be confirmed [60]. For this reason, an adaptive kernel method is applied in this work (see (6)) to adjust the kernel size over the learning epochs.
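Equations (17)–(20) specialize the chain rule to the FWNN internals, but the net effect of swapping MSE for MCC is model-agnostic: since dE/de = e K_σ(e)/σ² for the instantaneous cost (16), the MCC gradient is the MSE gradient reweighted by K_σ(e)/σ², which vanishes for outlier-sized errors. The sketch below assumes ∂ŷ/∂θ has already been computed by backpropagation; the Silverman-style kernel-size rule is an illustrative stand-in, since the paper's adaptive rule (6) is defined elsewhere.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def k_sigma(e, sigma):
    return np.exp(-e**2 / (2.0 * sigma**2)) / (SQRT_2PI * sigma)

def mcc_gradient(e, dyhat_dtheta, sigma):
    """Gradient of the instantaneous cost (16) w.r.t. one parameter.

    dE/de = e * K_sigma(e) / sigma**2 and de/dtheta = -dyhat/dtheta,
    so the update direction is the MSE one reweighted by
    K_sigma(e)/sigma**2: large (outlier) errors are suppressed.
    """
    return -(e * k_sigma(e, sigma) / sigma**2) * dyhat_dtheta

def delta_rule_step(theta, e, dyhat_dtheta, sigma, eta=1e-4):
    """One update of (20): theta(k+1) = theta(k) - eta * dE/dtheta."""
    return theta - eta * mcc_gradient(e, dyhat_dtheta, sigma)

def silverman_sigma(errors):
    """Illustrative adaptive kernel size (assumption): Silverman's
    rule of thumb applied to the current error distribution."""
    errors = np.asarray(errors)
    return 1.06 * errors.std() * len(errors) ** (-1.0 / 5.0)
```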
5 Proposed Identification Architecture
The proposed architecture adopted in this work identifies the dynamic relationship between the input and output of a multisection tank for water storage. The system is evaluated when the experimental data used during the identification task is corrupted with noise and outliers. The proposed architecture is based on the series-parallel identification scheme described in [13], with small modifications due to the experimental data set characteristics and the learning
[Figure 2 diagram: the dynamical system driven by u(k) produces y(k), which is corrupted by the noise signal and outliers n(k), giving d(k) = y(k) + n(k); tapped delay lines z^{-1}, ..., z^{-d} of u(k) and d(k) feed the FWNN, whose output ŷ(k) yields the error e(k) = d(k) − ŷ(k) used by the learning algorithm together with the cost criterion and the adaptive kernel size.]
Figure 2 Proposed identification architecture
procedure used to adjust the parameters of the FWNN model. Figure 2 presents a schematic diagram of the identification architecture proposed in this work.
The inputs of the FWNN model are past values of the input signal u(k) and of the system output corrupted with noise and outliers, d(k) = y(k) + n(k), while the estimated output is given by ŷ(k). The work developed in [9] shows that well-known linear modeling structures, such as FIR (Finite Impulse Response), ARX (AutoRegressive eXogenous input), ARMAX (AutoRegressive Moving Average eXogenous input), OE (Output Error), and SSIF (State Space Innovations Form), may be extended by using nonlinear functions or representations, thus leading to the nonlinear modeling structures NFIR, NARX, NARMAX, NOE, and NSSIF. This concept is used to define the inputs of the FWNN models obtained in this study.
According to [9], the advantage of a NARX model is that none of its regressors depends on past outputs of the model, which ensures that the predictor remains stable. This is particularly important in the nonlinear case, since the stability issue is much more complex there than in linear systems. Considering that the inputs of the FWNN models in this work are described exactly as the regression vector of the NARX modeling structure, they inherit important characteristics from such a structure. Figure 3 shows more details on the FWNN inputs in accordance with the NARX structure, where λ, τ, and α are constants that define a model of order α and delay τ.
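A minimal sketch of how the NARX regression vector can be assembled is shown below. The exact indexing convention is an assumption reconciled with the worked example in Section 6 (λ = α = 1, τ = 0 yields the inputs u(k), u(k − 1), and y(k)); Figure 3's y(k − λ) endpoint differs by one index from that example.

```python
import numpy as np

def narx_regressors(u, y, lam=1, tau=0, alpha=1):
    """Build NARX input/target pairs for one-step-ahead prediction.

    Each input row collects lam past outputs y(k), ..., y(k - lam + 1)
    and the delayed inputs u(k - tau), ..., u(k - tau - alpha); the
    target is y(k + 1).
    """
    start = max(lam - 1, tau + alpha)   # first k with a full history
    rows, targets = [], []
    for k in range(start, len(y) - 1):
        past_y = [y[k - i] for i in range(lam)]
        past_u = [u[k - tau - i] for i in range(alpha + 1)]
        rows.append(past_y + past_u)
        targets.append(y[k + 1])
    return np.array(rows), np.array(targets)
```

With lam=1, tau=0, alpha=1 each row is [y(k), u(k), u(k − 1)] and the target is y(k + 1), matching the multisection-tank case.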
Figure 2 illustrates that the FWNN model parameters are updated according to the error signal e(k) = d(k) − ŷ(k) by using a learning algorithm, for example, the BP algorithm. By adopting the MCC as its respective cost criterion, the learning algorithm is applied to the FWNN model. As previously explained in Section 2, the success of the MCC also
[Figure 3 diagram: regressors y(k), y(k − 1), ..., y(k − λ) and u(k − τ), ..., u(k − τ − α) feeding the FWNN.]
Figure 3 NARX model structure using FWNN
depends on the correct choice of the kernel size. Therefore, the adaptive method described by (6) is used in this work to adjust the kernel size during the learning epochs.
6 Experiments and Results
In order to evaluate the performance of the FWNN when the traditional MSE cost criterion of the error backpropagation algorithm is replaced by MCC, the aforementioned neural network is used to identify a real dynamical system, considering that its experimental data is corrupted by noise and outliers.
6.1. Multisection Liquid Tank. The multisection liquid tank consists of an acrylic tank for the containment of liquids with three abrupt changes in its cross-sectional area, as can be seen in Figure 4. The liquid tank was originally designed for educational purposes, in order to be used in studies of identification and control of dynamical systems [61]. It was also used in [29] to evaluate the performance of
[Figure 4: (a) schematic diagram of the multisection liquid tank, with section widths of 5 cm, 6 cm, and 7 cm, section heights of 12 cm, 10 cm, 10 cm, and 10 cm, a pressure sensor (Ps), an outlet, and a water basin; (b) photograph of the multisection liquid tank.]
Figure 4 Three-section liquid tank with distinct cross-sectional areas
the alternative FWNN structure employed in this work. In addition to the acrylic tank structure, the system is composed of a water reservoir, a water pump, a pressure sensor, an electronic power driver, and an electronic interface with AD (analog-to-digital) and DA (digital-to-analog) converters.
The nonlinearity in the liquid outflow, caused by the different pressure levels at the tank base according to the height of the liquid column, can be clearly noticed in this dynamical system. Moreover, the distinct cross-sectional areas make such nonlinearity even more evident. It is worth mentioning that the abrupt transitions between the tank sections are also responsible for discontinuities. The whole system can be seen as a set of three coupled nonlinear systems, since each tank section has its own dynamic behavior.
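The qualitative behavior described above can be captured by a generic level model, dh/dt = (q_in − c·√h)/A(h), with a piecewise-constant cross-sectional area A(h). All numeric values below are illustrative assumptions, not parameters reported in the paper:

```python
import math

# Hypothetical parameters: the paper does not give the tank model; this is a
# generic illustration of why the dynamics are nonlinear and piecewise.
SECTION_AREAS = [19.6, 28.3, 38.5]   # cm^2, one per tank section (assumed)
SECTION_TOPS = [12.0, 22.0, 32.0]    # cumulative heights where sections end (assumed)

def area_at(h):
    """Cross-sectional area changes abruptly between sections."""
    for top, area in zip(SECTION_TOPS, SECTION_AREAS):
        if h <= top:
            return area
    return SECTION_AREAS[-1]

def dh_dt(h, pump_flow, outflow_coeff=1.0):
    # A Torricelli-like outflow ~ sqrt(h) plus the piecewise area make the
    # system nonlinear, with discontinuous dynamics at section boundaries.
    return (pump_flow - outflow_coeff * math.sqrt(max(h, 0.0))) / area_at(h)
```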
6.2. System Identification. Initially, in order to collect the experimental data set used during the learning and testing phases of the FWNN models identified in this work, the water pump is excited with an APRBS (Amplitude Modulated Pseudorandom Binary Sequence) and the water level inside the tank is registered at a sample rate of T_s = 2 Hz. For the generation of the persistent excitation signal, the following parameters are considered: minimum hold time T_h = 10 s, minimum amplitude A_min = 0 V, and maximum amplitude A_max = 15 V. Since only positive voltage values are considered in this case study, the pump operates only to shift liquid from the reservoir to the multisection tank.
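An APRBS of this kind can be generated by holding a random amplitude for a random duration no shorter than the minimum hold time. The sketch below assumes uniform random levels and hold times, which the paper does not specify:

```python
import numpy as np

def aprbs(n_steps, hold_steps, a_min, a_max, rng=None):
    """Amplitude-modulated PRBS: hold a random amplitude for a random
    number of steps (>= hold_steps), then jump to a new random level.

    Illustrative generator; only T_h, A_min, and A_max come from the text.
    """
    rng = np.random.default_rng(rng)
    signal = np.empty(n_steps)
    i = 0
    while i < n_steps:
        hold = rng.integers(hold_steps, 3 * hold_steps + 1)  # assumed range
        level = rng.uniform(a_min, a_max)
        signal[i:i + hold] = level
        i += hold
    return signal

# T_s = 2 Hz and T_h = 10 s give a minimum hold of 20 samples
excitation = aprbs(n_steps=1400, hold_steps=20, a_min=0.0, a_max=15.0, rng=0)
```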
After the system excitation, the collected data is corrupted with additive white Gaussian noise and two different percentages of outliers (1% and 3%). The resulting data are divided into two sets comprising approximately 80% and 20% of the total amount: the first set is used to train the FWNN model and the second one is used during the testing phase. The whole data set is normalized to fit within the range [0, 1] in order to avoid numerical problems during the FWNN learning procedure. Since the multisection tank is a first-order nonlinear system, and also considering Figure 3, the inputs of the FWNN models are defined with λ = α = 1 and τ = 0. Thus, u(k), u(k − 1), and y(k) are defined as inputs to the FWNN models to predict y(k + 1).
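The data preparation just described (noise plus sparse outliers, normalization to [0, 1], and an approximate 80/20 split) can be sketched as follows; the noise variance and outlier magnitude are illustrative assumptions, since the paper does not report them:

```python
import numpy as np

def corrupt_and_split(y, outlier_rate, noise_std, outlier_scale=5.0, rng=None):
    """Add white Gaussian noise plus sparse outliers, normalize to [0, 1],
    and split ~80/20 into training and test sets."""
    rng = np.random.default_rng(rng)
    d = y + rng.normal(0.0, noise_std, size=y.shape)
    n_out = int(round(outlier_rate * y.size))
    idx = rng.choice(y.size, size=n_out, replace=False)
    d[idx] += outlier_scale * rng.standard_normal(n_out)   # gross errors
    d = (d - d.min()) / (d.max() - d.min())                # normalize to [0, 1]
    split = int(0.8 * d.size)
    return d[:split], d[split:]

train, test = corrupt_and_split(np.sin(np.linspace(0, 10, 1000)),
                                outlier_rate=0.01, noise_std=0.05, rng=0)
```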
The BP algorithm presented in Section 4 is used to adjust the parameters d_ij, t_ij, a_qr, and b_qr of the FWNN. After a trial-and-error procedure, the learning rate η = 0.0001 was found to be a good choice for identifying the multisection tank. It is worth mentioning that the results presented in this work were obtained after 350 learning epochs.
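A minimal sketch of one gradient step under the MCC: the chain rule weights each sample's gradient by the Gaussian kernel of its error. The toy below uses a model that is linear in its parameters as a stand-in for the FWNN, with illustrative step size and kernel width rather than the paper's η = 0.0001:

```python
import numpy as np

def mcc_gradient_step(w, X, d, eta, sigma):
    """One epoch of gradient ASCENT on the correntropy cost for a model
    linear in its parameters, yhat = X @ w (the chain rule through the
    actual FWNN would replace X here).

    dJ/dw = mean( exp(-e^2 / (2 sigma^2)) * e / sigma^2 * dyhat/dw )
    """
    e = d - X @ w                                   # e(k) = d(k) - yhat(k)
    kernel = np.exp(-e**2 / (2.0 * sigma**2))
    grad = (kernel * e / sigma**2)[:, None] * X     # per-sample gradients
    return w + eta * grad.mean(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
d = X @ w_true
w = np.zeros(3)
for _ in range(350):                                # epochs, as in the paper
    w = mcc_gradient_step(w, X, d, eta=2.0, sigma=2.0)
```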
Figure 5 presents the model validation when 1% of the original experimental data set is corrupted with outliers and additive white Gaussian noise is inserted. In this figure, the tank water level in cm is shown as a function of the sample time step, where each time step is equivalent to 0.5 s, as defined by the sample rate T_s = 2 Hz. The terms FWNN-MCC and FWNN-MSE identify the FWNN models obtained using the MCC and the MSE, respectively, as the cost criterion of the BP algorithm. The FWNN-MCC clearly has the best performance, due to its use of higher-order statistical information. On the other hand, the FWNN-MSE model, based on second-order moments, fails to efficiently identify the input-output dynamic relationship of the multisection tank at some points of the validation curve. The presence of outliers in the experimental data has a significant negative impact on the FWNN model when the MSE criterion is used in the learning procedure, since the error due to the outliers is amplified quadratically. The same behavior is not observed in the FWNN-MCC when 1% of the experimental data are corrupted by outliers, because the outlier power is weighted by the Gaussian kernel.
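The contrast between the two criteria is easy to quantify: the MSE contribution of an error grows quadratically, while its correntropy kernel value collapses toward zero. A small numerical illustration:

```python
import numpy as np

errors = np.array([0.1, 0.5, 1.0, 5.0, 10.0])     # the last two act like outliers
sigma = 1.0

mse_contribution = errors**2                       # grows quadratically
mcc_weight = np.exp(-errors**2 / (2 * sigma**2))   # Gaussian kernel weighting

# An error of 10 contributes one million times more than an error of 0.01
# would to the MSE, while its correntropy kernel value is essentially zero:
# gross errors are effectively ignored by the MCC.
```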
Figure 5: Validation of the FWNN-MCC with a 1% outlier rate in the experimental data (water level in cm versus time step; curves: desired output, FWNN-MCC, and FWNN-MSE).
Figure 6: Validation of the FWNN-MCC with a 3% outlier rate in the experimental data (water level in cm versus time step; the full validation set and two zoomed regions are shown; curves: desired output, FWNN-MCC, and FWNN-MSE).
Figure 6 shows the model validation when 3% of the original experimental data is corrupted with outliers and additive white Gaussian noise is inserted. In Figure 6, only the validation points are plotted, to allow better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement of the FWNN-MCC model. Both models present problems at some points, although the FWNN-MCC performs better in identifying the multisection tank dynamics and also appears less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that the MCC has intrinsic robustness due to the local estimation produced by the kernel size.

It is also important to mention that the correntropy criterion has a free parameter, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the performance of the FWNN learning procedure. Figure 7 shows the MSE obtained over the 350 epochs for three different fixed kernel sizes (0.01, 0.1, and 1.0) and also for the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in attenuating outliers and noise.

Figure 7: Evolution of the MSE over the learning epochs for fixed kernel sizes σ = 1.0, σ = 0.1, and σ = 0.01 and for the adaptive σ.
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data contains 1% and 3% of outliers. During the initial epochs of the BP algorithm, the kernel size oscillates considerably; however, its behavior becomes more stable as training approaches the hundredth epoch.
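Equation (6), the paper's adaptive kernel size rule, is not reproduced in this excerpt. A common stand-in for such a rule is Silverman's rule of thumb applied to the current epoch's errors [44, 46]; the sketch below is therefore illustrative, not the paper's actual update:

```python
import numpy as np

def silverman_kernel_size(errors):
    """Silverman's rule of thumb as an example of an adaptive kernel size:
    sigma = 1.06 * min(std(e), IQR(e) / 1.34) * N^(-1/5).

    Illustrative stand-in for the adaptive update referenced in the text.
    """
    e = np.asarray(errors, dtype=float)
    n = e.size
    spread = min(e.std(), (np.percentile(e, 75) - np.percentile(e, 25)) / 1.34)
    return 1.06 * spread * n ** (-1.0 / 5.0)

rng = np.random.default_rng(0)
sigma = silverman_kernel_size(rng.normal(0.0, 1.0, 1000))
# As the model fits better and the errors shrink, the kernel size shrinks
# with them, sharpening the rejection of residual outliers.
```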
7. Conclusions
This work has analyzed the performance of a FWNN applied to identify a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since it tries to fit behaviors that are not part of the dynamical system. The most common learning techniques applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has therefore proposed the replacement of this traditional evaluation measure by a similarity measure based on information theory, denominated correntropy. The MCC was thus used as the cost function of the error backpropagation algorithm in order to reduce the negative effects of unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more properly, while also being less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is a free parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has shown to be more efficient than keeping this parameter fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture is susceptible to falling into local minima, which limits the ability of correntropy to remove the outliers.

Figure 8: Adaptive kernel size behavior over the learning epochs for experimental data with 1% and 3% outlier rates.
Future research will focus on the following items: (1) analyzing the application of the MCC in association with different training algorithms for the FWNN architecture in order to avoid the harmful effects of outliers; metaheuristic algorithms such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and perform fault diagnosis, among other tasks.
Acronyms
AD: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
DA: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants—a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1-664–V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaria, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvao, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, R. Fnaicch, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Principe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS modificada para identificação e controle de plantas com ampla faixa de operação e não linearidade acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
According to [9] the advantage of a NARX model is thatnone of its regressors depends on past outputs of the modelwhich ensures that the predictor remains stableThis is partic-ularly important in the nonlinear case since the stability issuein this particular case is much more complex than in linearsystems Considering that the inputs of the FWNN modelsin this work are described exactly as the regression vectorof the NARX modeling structure they inherit importantcharacteristics from such structure Figure 3 shows moredetails on the FWNN inputs in accordance with the NARXstructure where 120582 120591 and 120572 are constants that define a modelof order 120572 and delay 120591
Figure 2 illustrates that the FWNNmodel parameters areupdated according to the error signal 119890(119896) = 119889(119896) minus 119910(119896) byusing a learning algorithm for example the BP algorithmBy adopting the MCC as its respective cost criterion thelearning algorithm is applied to the FWNN model As it waspreviously explained in Section 2 the success of theMCCalso
y(k)
y(k minus 1)
y(k minus 120582)
u(k minus 120591)
u(k minus 120591 minus 120572)
FWNN
Figure 3 NARX model structure using FWNN
depends on the correct choice of the kernel size Thereforethe adaptive method described by (6) is used in this work toadjust the kernel size during the learning epochs
6 Experiments and Results
In order to evaluate the performance of the FWNNwhen thetraditional MSE cost criterion of the error backpropagationalgorithm is replaced by MCC the aforementioned neuralnetwork is used to identify a real dynamical system consid-ering that its experimental data is corrupted by noise andoutliers
61 Multisection Liquid Tank The multisection liquid tankconsists of an acrylic tank for containment of liquids withthree abrupt changes in its cross-sectional area as it can beseen in Figure 4 The liquid tank was originally designedfor educational purposes in order to be used in studiesof identification and control of dynamical systems [61]It was also used in [29] to evaluate the performance of
Mathematical Problems in Engineering 7
5 cm
6 cm
7 cm
12cm
10cm
10cm
10cm
PsOutWater basin
(a) Schematic diagram of the multisec-tion liquid tank
(b) Multisection liquid tank
Figure 4 Three-section liquid tank with distinct cross-sectional areas
the alternative FWNN structure employed in this work Inaddition to the acrylic tank structure the system is composedby a water reservoir a water pump a pressure sensor anelectronic power driver and an electronic interface with AD(analog-to-digital) and DA (digital-to-analog) converters
The nonlinearity presented in the liquid flow outputwhich is due to the different pressure levels at the tankbase in accordance with the height of the liquid columncan be clearly noticed in the aforementioned dynamicalsystem Besides the distinct cross-sectional areas make suchnonlinearity even more evident It is worth mentioning thatthe abrupt transitions between the tank sections are alsoresponsible for discontinuitiesThe whole system can be seenas a set of three coupled nonlinear systems since each tanksection has its own dynamic behavior
62 System Identification Initially in order to collect theexperimental data set used during the learning and testingphase of the identified FWNN models in this work thewater pump is excited with anAPRBS (AmplitudeModulatedPseudorandom Binary Sequence) and the water level insidethe tank is registered at a sample rate of 119879
119904= 2Hz
For the generation of the persistent excitation signal thefollowing parameters are considered theminimumhold time119879ℎ= 10 s minimal amplitude 119860min = 0V and maximum
amplitude 119860max = 15V Since only positive values of voltageare considered in this case study the pump only operates inorder to shift the liquid from the reservoir to themultisectiontank
After the system excitation the collected data is corruptedwith additive white Gaussian noise and two different percent-ages of outliers (1 and 3) The resulting data are dividedinto two sets comprising approximately 80 and 20 ofthe total amount The first set is used to train the FWNNmodel and the second one is used during the testing phase
The whole data set is normalized to fit within the range [0 1]in order to avoid numerical problems during the FWNNlearning procedure Since the multisection tank is a first-order nonlinear system and also considering Figure 3 theinputs of the FWNN models are defined with 120582 = 120572 = 1and 120591 = 0Thus 119906(119896) 119906(119896minus 1) and 119910(119896) are defined as inputsto the FWNNmodels to predict 119910(119896 + 1)
The BP algorithm presented in Section 4 is used to adjustthe parameters 119889
119894119895 119905119894119895 119886119902119903 and 119887
119902119903of the FWNN After a
trial-and-error procedure the learning rate 120578 = 00001 wasfound as a good choice to identify the multisection tank It isworthmentioning that the results presented in this workwereobtained after 350 learning epochs
Figure 5 presents the model validation when 1 of theoriginal experimental data set is corrupted with outliers andadditive white Gaussian noise is inserted In this figure thetank water level in cm is in function of the sample timestep where each time step is equivalent to 05 secondsdefined by the sample rate 119879
119904= 2Hz The terms FWNN-
MCC and FWNN-MSE are used to identify the FWNNmodels obtained using MCC and MSE as cost criterion ofBP algorithm respectively It is evident that FWNN-MCChas the best performance due to the use of the higher-order statistical information On the other hand the FWNN-MSE model based on second-order moments presents someproblems to efficiently identify the input-output dynamicrelationship of the multisection tank at some points of thevalidation curveThe presence of outliers in the experimentaldata has a significant negative impact on the FWNN modelwhen the MSE criterion is used in the learning procedureonce the error due to the outliers is increased by a squarerate The same behavior is not observed in FWNN-MCCwhen 1 of the experimental data are corrupted by outliersbecause the outliers power is weighted by the Gaussiankernel
8 Mathematical Problems in Engineering
Figure 5: Validation of the FWNN-MCC with 1% outlier rate in experimental data (water level in cm versus time step; curves: desired output, FWNN-MCC, and FWNN-MSE).
Figure 6: Validation of the FWNN-MCC with 3% outlier rate in experimental data (water level in cm versus time step; curves: desired output, FWNN-MCC, and FWNN-MSE; two zoomed regions detail the outlier effects).
Figure 6 shows the model validation when 3% of the original experimental data is corrupted with outliers and additive white Gaussian noise is inserted. In Figure 6, only the validation points are plotted to allow better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement of the FWNN-MCC model. Both models present problems at some points, although the FWNN-MCC model performs better in identifying the multisection tank dynamics, as it also seems to be less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that the MCC has intrinsic robustness due to the local estimation produced by the kernel size.
It is also important to mention that the correntropy criterion has a free parameter, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the performance of the FWNN learning procedure. Figure 7 shows the MSE obtained over the 350 epochs for three different fixed kernel sizes, namely 0.01, 0.1, and 1.0, and also using the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in attenuating the effects of outliers and noise.

Figure 7: Evolution of the MSE over the learning epochs for different kernel sizes (σ = 1.0, σ = 0.1, σ = 0.01, and the adaptive σ).
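Equation (6), which defines the adaptive kernel size, is not reproduced in this excerpt. As an illustrative stand-in for a data-driven kernel size recomputed from the current error sample each epoch, the sketch below uses Silverman's rule of thumb for bandwidth selection [46]; this function is an assumption, not the paper's actual update:

```python
import math

def silverman_sigma(errors):
    """Silverman's rule-of-thumb bandwidth: 1.06 * std(errors) * n^(-1/5).
    Illustrative stand-in for the paper's adaptive kernel size, equation (6)."""
    n = len(errors)
    mean = sum(errors) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    return 1.06 * std * n ** (-1 / 5)

# Recompute the kernel size each epoch from that epoch's prediction errors:
epoch_errors = [0.05, -0.1, 0.2, -0.02, 0.08]
print(silverman_sigma(epoch_errors))
```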
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data contains 1% and 3% outlier rates. During the initial epochs of the BP algorithm, the kernel size is quite oscillatory. However, its behavior becomes more stable as the learning approaches the hundredth epoch.

Figure 8: Adaptive kernel size behavior over the learning epochs (adaptive σ with 1% and 3% outlier rates).

7. Conclusions

This work has analyzed the performance of a FWNN applied to identify a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since the model tries to fit behaviors that are not part of the dynamical system. The most common learning techniques applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has therefore proposed replacing this traditional evaluation measure with a similarity measure based on information theory, known as correntropy. The MCC was used as the cost function of the error backpropagation algorithm in order to reduce the negative effects of the unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more accurately and are also less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is a free parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has proven to be more efficient than keeping this parameter fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture is susceptible to convergence to local minima, which limits the ability of the correntropy criterion to remove the outliers.
Further research will focus on the following items: (1) analyzing the application of the MCC with different training algorithms for the FWNN architecture in order to avoid the harmful effects of outliers; metaheuristic algorithms such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and diagnose faults, among other tasks.
Acronyms
AD: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
DA: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants: a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1664–V1668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaria, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvao, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, R. Fnaicch, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Principe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS Modificada para Identificação e Controle de Plantas com Ampla Faixa de Operação e Não Linearidade Acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
[15] T Kara and I Eker ldquoExperimental nonlinear identificationof a two mass systemrdquo in Proceedings of the IEEE Conferenceon Control Applications (CCA rsquo03) vol 1 pp 66ndash71 IstanbulTurkey 2003
[16] M O Efe ldquoA comparison of ANFIS MLP and SVM inidentification of chemical processesrdquo in Proceedings of the IEEEControl Applications amp Intelligent Control pp 689ndash694 StPetersburg Russia 2009
[17] X Xue J Lu and W Xiang ldquoNonlinear system identificationwith modified differential evolution and RBF networksrdquo in Pro-ceedings of the 5th IEEE International Conference on AdvancedComputational Intelligence ( ICACI rsquo12) pp 332ndash335 NanjingChina October 2012
[18] Q Zhang and A Benveniste ldquoWavelet networksrdquo IEEE Trans-actions on Neural Networks vol 3 no 6 pp 889ndash898 1992
[19] S A Billings andH-LWei ldquoAnew class of wavelet networks fornonlinear system identificationrdquo IEEE Transactions on NeuralNetworks vol 16 no 4 pp 862ndash874 2005
[20] F-J Lin P-H Shen and Y-S Kung ldquoAdaptive wavelet neuralnetwork control for linear synchronous motor servo driverdquoIEEE Transactions on Magnetics vol 41 no 12 pp 4401ndash44122005
[21] J-S R Jang ldquoANFIS adaptive-network-based fuzzy inferencesystemrdquo IEEE Transactions on Systems Man and Cyberneticsvol 23 no 3 pp 665ndash685 1993
[22] L Zhao J-P Zhang J Yang and Y Chu ldquoSoftware reliabilitygrowth model based on fuzzy wavelet neural networkrdquo inProceedings of the 2nd International Conference on FutureComputer and Communication (ICFCC rsquo10) pp V1664ndashV1668IEEE Wuhan China May 2010
[23] J-R Song and H-B Shi ldquoDynamic system modeling basedon wavelet recurrent fuzzy neural networkrdquo in Proceedingsof the 7th International Conference on Natural Computation(ICNC rsquo11) vol 2 pp 766ndash770 Shanghai China July 2011
[24] C H Lu ldquoWavelet fuzzy neural networks for identification andpredictive control of dynamic systemsrdquo IEEE Transactions onIndustrial Electronics vol 58 no 7 pp 3046ndash3058 2011
Mathematical Problems in Engineering 11
[25] D W C Ho P-A Zhang and J Xu ldquoFuzzy wavelet networksfor function learningrdquo IEEE Transactions on Fuzzy Systems vol9 no 1 pp 200ndash211 2001
[26] A Ebadat N Noroozi A A Safavi and S H Mousavi ldquoNewfuzzy wavelet network for modeling and control the modelingapproachrdquoCommunications inNonlinear Science andNumericalSimulation vol 16 no 8 pp 3385ndash3396 2011
[27] S HMousavi NNoroozi A Safavi andA Ebadat ldquoModelingand control of nonlinear systems using novel fuzzy wavelet net-works the output adaptive control approachrdquo CommunicationsinNonlinear Science andNumerical Simulation vol 16 no 9 pp3798ndash3814 2011
[28] S Ganjefar and M Tofighi ldquoA fuzzy wavelet neural networkstabilizer design using genetic algorithm for multi-machinesystemsrdquo Przegląd Elektrotechniczny vol 89 no 5 pp 19ndash252013
[29] L L S Linhares J M Araujo Jr F M U Araujo and TYoneyama ldquoA nonlinear system identification approach basedon fuzzy wavelet neural networkrdquo Journal of Intelligent andFuzzy Systems 2014
[30] Y Liu and J Chen ldquoCorrentropy-based kernel learning fornonlinear system identification with unknown noise an indus-trial case studyrdquo in Proceedings of the 10th IFAC Symposium onDynamics and Control of Process Systems (DYCOPS rsquo13) vol 10pp 361ndash366 Mumbai India December 2013
[31] J C Munoz and J Chen ldquoRemoval of the effects of outliers inbatch process data through maximum correntropy estimatorrdquoChemometrics and Intelligent Laboratory Systems vol 111 no 1pp 53ndash58 2012
[32] T Soderstrom ldquoSystem identification for the errors-in-variablesproblemrdquo Transactions of the Institute of Measurement andControl vol 34 no 7 pp 780ndash792 2012
[33] S Khatibisepehr and B Huang ldquoDealing with irregular datain soft sensors Bayesian method and comparative studyrdquoIndustrial and Engineering Chemistry Research vol 47 no 22pp 8713ndash8723 2008
[34] C M Bishop Neural Networks for Pattern Recognition OxfordUniversity Press Oxford UK 1995
[35] VMiranda C Cerqueira andCMonteiro ldquoTraining a FISwithepso under an entropy criterion for wind power predictionrdquoin Proceedings of the International Conference on ProbabilisticMethods Applied to Power Systems (PMAPS rsquo06) pp 1ndash8Stockholm Sweden 2006
[36] I Santamarıa P P Pokharel and J C Principe ldquoGeneralizedcorrelation function definition properties and application toblind equalizationrdquo IEEE Transactions on Signal Processing vol54 no 6 pp 2187ndash2197 2006
[37] S Zhao B Chen and J C Principe ldquoKernel adaptive filteringwith maximum correntropy criterionrdquo in Proceedings of theInternational Joint Conference on Neural Network (IJCNN rsquo11)pp 2012ndash2017 San Jose Calif USA August 2011
[38] J C Principe Information Theoretic Learning Renyirsquos Entropyand Kernel Perspectives Springer 2010
[39] A I Fontes ADMMartins L F Silveira and J Principe ldquoPer-formance evaluation of the correntropy coefficient in automaticmodulation classificationrdquo Expert Systems with Applicationsvol 42 no 1 pp 1ndash8 2014
[40] A I R Fontes P T V Souza A D D Neto A M MartinsandL FQ Silveira ldquoClassification systemof pathological voicesusing correntropyrdquo Mathematical Problems in Engineering vol2014 Article ID 924786 7 pages 2014
[41] R He B-G Hu W-S Zheng and X-W Kong ldquoRobustprincipal component analysis based on maximum correntropycriterionrdquo IEEE Transactions on Image Processing vol 20 no 6pp 1485ndash1494 2011
[42] Y Liu and J Chen ldquoCorrentropy-based kernel learning fornonlinear system identification with unknown noise an indus-trial case studyrdquo in Proceedings of the 10th IFAC Symposiumon Dynamics and Control of Process Systems pp 361ndash366December 2013
[43] W Liu P P Pokharel and J C Principe ldquoCorrentropyproperties and applications in non-Gaussian signal processingrdquoIEEE Transactions on Signal Processing vol 55 no 11 pp 5286ndash5298 2007
[44] S Zhao B Chen and J C Principe ldquoAn adaptive kernel widthupdate for correntropyrdquo in Proceedings of the International JointConference on Neural Networks (IJCNN rsquo12) pp 1ndash5 BrisbaneAustralia 2012
[45] M C Jones J S Marron and S J Sheather ldquoA brief surveyof bandwidth selection for density estimationrdquo Journal of theAmerican Statistical Association vol 91 no 433 pp 401ndash4071996
[46] B W Silverman Density Estimation for Statistics and DataAnalysis vol 3 CRC Press New York NY USA 1986
[47] A W Bowman ldquoAn alternative method of cross-validation forthe smoothing of density estimatesrdquo Biometrika vol 71 no 2pp 353ndash360 1984
[48] D W Scott and G R Terrell ldquoBiased and unbiased cross-validation in density estimationrdquo Journal of the AmericanStatistical Association vol 82 no 400 pp 1131ndash1146 1987
[49] N Terzija and H McCann ldquoWavelet-based image reconstruc-tion for hard-field tomographywith severely limited datardquo IEEESensors Journal vol 11 no 9 pp 1885ndash1893 2011
[50] A Bouzida O Touhami R Ibtiouen A Belouchrani MFadel and A Rezzoug ldquoFault diagnosis in industrial inductionmachines through discrete wavelet transformrdquo IEEE Transac-tions on Industrial Electronics vol 58 no 9 pp 4385ndash4395 2011
[51] J Gao Z Leng Y Qin Z Ma and X Liu ldquoShort-term trafficflow forecasting model based on wavelet neural networkrdquo inProceedings of the 25th Chinese Control and Decision Conference(CCDC rsquo13) pp 5081ndash5084 Guiyang China May 2013
[52] J J Cordova W Yu and X Li ldquoHaar wavelet neural networksfor nonlinear system identificationrdquo in Proceedings of the 6thIEEE Multi-Conference on Systems and Control (MSC rsquo12) pp276ndash281 Dubrovnik Croatia October 2012
[53] R K Galvao V M Becerra J M Calado and P M SilvaldquoLinear-wavelet networksrdquo International Journal of AppliedMathematics and Computer Science vol 14 no 2 pp 221ndash2322004
[54] M Davanipoor M Zekri and F Sheikholeslam ldquoFuzzy waveletneural network with an accelerated hybrid learning algorithmrdquoIEEE Transactions on Fuzzy Systems vol 20 no 3 pp 463ndash4702012
[55] V V J Rajapandian and N Gunaseeli ldquoModified standardback propagation algorithm with optimum initialization forfeed forward neural networksrdquo International Journal of ImagingScience and Engineering vol 1 no 3 pp 86ndash89 2007
[56] T Kathirvalavakumar and PThangavel ldquoA modified backprop-agation training algorithm for feedforward neural networksrdquoNeural Processing Letters vol 23 no 2 pp 111ndash119 2006
12 Mathematical Problems in Engineering
[57] S Abid R Fnaicch and M Najim ldquoA fast feedforward trainingalgorithm using a modified form of the standard backpropaga-tion algorithmrdquo IEEE Transactions on Neural Networks vol 12no 2 pp 424ndash430 2001
[58] M Davanipoor M Zekri and F Sheikholeslam ldquoThe prefer-ence of fuzzy wavelet neural network to ANFIS in identificationof nonlinear dynamic plants with fast local variationrdquo in Pro-ceedings of the 18th Iranian Conference on Electrical Engineering(ICEE rsquo10) pp 605ndash609 Isfahan Iran May 2010
[59] E Soria J D Martin and P J G Lisboa ldquoClassical trainingmethodsrdquo in Metaheuristic Procedures for Training Neural Net-works E Alba andRMarti Eds pp 25ndash32 SpringerNewYorkNY USA 2006
[60] A Singh and J C Prıncipe ldquoInformation theoretic learningwithadaptive kernelsrdquo Signal Processing vol 91 no 2 pp 203ndash2132011
[61] C A G Fonseca Estrutura ANFIS Modificada para identifi-cacao e Controle de Plantas com Ampla Faixa de Operacao e naoLinearidade Acentuada [PhD thesis] Universidade Federal doRio Grande do Norte (UFRN) Natal Brazil 2012
[62] JM deAraujo Jr JM P deMenezes A AM deAlbuquerqueO D M Almeida and F M U de Araujo ldquoAssessmentand certification of neonatal incubator sensors through aninferential neural networkrdquo Sensors vol 13 no 11 pp 15613ndash15632 2013
Figure 5: Validation of the FWNN-MCC with a 1% outlier rate in the experimental data (water level in cm versus time step; curves: desired output, FWNN-MCC, and FWNN-MSE).
Figure 6: Validation of the FWNN-MCC with a 3% outlier rate in the experimental data (water level in cm versus time step, with two zoomed regions; curves: desired output, FWNN-MCC, and FWNN-MSE).
Figure 6 shows the model validation when 3% of the original experimental data are corrupted with outliers and additive white Gaussian noise is inserted. Only the validation points are plotted in Figure 6, to allow better visualization of the outliers and their respective effects on the FWNN-MCC and FWNN-MSE models. Two regions are highlighted in Figure 6, demonstrating the improvement achieved by the FWNN-MCC model. Both models present problems at some points, although the FWNN-MCC model identifies the multisection tank dynamics more accurately and also appears to be less sensitive to outliers and noise than the FWNN-MSE model. It is noteworthy that the MCC is intrinsically robust due to the local estimation produced by the kernel size.
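This robustness is easy to check numerically. The sketch below (illustrative only; the kernel size of 0.5 and the residual values are arbitrary choices, not taken from the paper) compares how a single gross outlier affects the MSE against the empirical correntropy of the residuals, V(e) = (1/N) Σ exp(-e_k²/2σ²):

```python
import numpy as np

def mse(e):
    """Mean Squared Error: every residual contributes quadratically."""
    return np.mean(np.asarray(e) ** 2)

def correntropy(e, sigma=0.5):
    """Empirical correntropy of the residuals with a Gaussian kernel.
    Large residuals drive the kernel toward zero, so an outlier adds
    almost nothing. (The kernel size 0.5 is an arbitrary choice here.)"""
    e = np.asarray(e)
    return np.mean(np.exp(-e**2 / (2.0 * sigma**2)))

errors = np.full(100, 0.1)   # well-behaved residuals
errors_out = errors.copy()
errors_out[0] = 10.0         # a single gross outlier

print(mse(errors), mse(errors_out))                  # MSE blows up ~100x
print(correntropy(errors), correntropy(errors_out))  # correntropy barely moves
```

The quadratic MSE is dominated by the one bad sample, while the Gaussian kernel saturates it toward zero, which is exactly the "local estimation" property mentioned above.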
It is also important to mention that the correntropy criterion has a free parameter, the kernel size, which is at the core of the learning process [38]. An adaptive kernel is applied in this work to improve the performance of
Figure 7: Evolution of the MSE over 350 training epochs for the fixed kernel sizes σ = 1.0, σ = 0.1, and σ = 0.01 and for the adaptive σ.
the FWNN learning procedure. Figure 7 shows the MSE obtained over 350 epochs for three different fixed kernel sizes, namely, 0.01, 0.1, and 1.0, and also using the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in the attenuation of outliers and noise.
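To make the role of the kernel size in the MCC-based gradient update concrete, the following sketch contrasts MSE and MCC training on a toy scalar linear model rather than the paper's FWNN; the data, kernel size σ = 0.2, learning rate, and epoch count are all illustrative assumptions. In the MCC update each residual is weighted by its Gaussian kernel value, so outlier samples receive vanishing influence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 2.0 * x + 0.01 * rng.standard_normal(200)  # true slope is 2.0
y[-10:] += 8.0                                  # 5% of the samples become outliers

def fit_slope(x, y, use_mcc, sigma=0.2, lr=0.1, epochs=500):
    """Gradient training of a scalar model y ~ w*x under MSE or MCC."""
    w = 0.0
    for _ in range(epochs):
        e = y - w * x
        if use_mcc:
            # MCC (ascent direction): each residual is weighted by its
            # Gaussian kernel value, so outliers contribute almost nothing.
            grad = np.mean(np.exp(-e**2 / (2.0 * sigma**2)) * e * x) / sigma**2
        else:
            # MSE (descent direction, sign already folded in): every
            # residual contributes in full, outliers included.
            grad = np.mean(e * x)
        w += lr * grad
    return w

w_mse = fit_slope(x, y, use_mcc=False)
w_mcc = fit_slope(x, y, use_mcc=True)
print(w_mse, w_mcc)  # the MSE slope is dragged above 2.5; MCC stays near 2.0
```

The two update rules differ only in the per-sample kernel weight, which is also why the choice of σ matters: too small a σ suppresses legitimate errors and can stall learning, while too large a σ makes the MCC behave like the MSE.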
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data contain 1% and 3% outlier rates. During the initial epochs of the BP algorithm the kernel size oscillates considerably; however, its behavior becomes more stable as training approaches the hundredth epoch.
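The paper's adaptive update, its Eq. (6), is not reproduced in this excerpt. As a hedged illustration of the same idea, the sketch below adapts the kernel size to the current epoch's error distribution using Silverman's rule of thumb [46], one common bandwidth heuristic, so that σ shrinks as the residuals concentrate around zero:

```python
import numpy as np

def silverman_kernel_size(errors):
    """One common adaptive choice: Silverman's rule of thumb applied to
    the current epoch's error sample. (Illustrative only; the paper's
    own update rule, its Eq. (6), is not reproduced here.)"""
    e = np.asarray(errors, dtype=float)
    n = e.size
    iqr = np.subtract(*np.percentile(e, [75, 25]))  # 75th minus 25th percentile
    # Use the smaller of the sample std and the rescaled IQR so that a few
    # outliers cannot inflate the bandwidth estimate.
    spread = min(e.std(), iqr / 1.349) if iqr > 0 else e.std()
    return 1.06 * spread * n ** (-0.2)

rng = np.random.default_rng(1)
early_errors = 1.0 * rng.standard_normal(500)   # large residuals, early epochs
late_errors = 0.05 * early_errors               # small residuals, late epochs
print(silverman_kernel_size(early_errors), silverman_kernel_size(late_errors))
```

Recomputing σ from the residuals each epoch reproduces the qualitative behavior in Figure 8: an oscillatory kernel size early on that settles as the error distribution stabilizes.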
Figure 8: Behavior of the adaptive kernel size during training for experimental data with 1% and 3% outlier rates.

7. Conclusions

This work has analyzed the performance of a FWNN applied to the identification of a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in the experimental data reduce the reliability of the identified model, since the model tries to fit behaviors that are not part of the dynamical system. The learning techniques most commonly applied to adjust the FWNN parameters in identification applications are gradient-based methods that use the MSE as their cost function. This paper has therefore proposed replacing this traditional evaluation measure with a similarity measure based on information theory, denominated correntropy. The MCC was thus used as the cost function of the error backpropagation algorithm in order to reduce the negative effects of the unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more accurately and are less sensitive to outliers and noise than the FWNN-MSE models. This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is the only free parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC depends strongly on its proper adjustment. Within this context, the adopted adaptive kernel has shown itself to be more efficient than keeping this parameter fixed during the whole FWNN learning process. The adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and has contributed to attenuating the effects of the outliers and noise. Due to the use of the BP algorithm, the proposed architecture remains susceptible to local minima, which limits the ability of the correntropy criterion to remove the outliers.
Further research will focus on the following items: (1) analyzing the application of the MCC in association with different training algorithms for the FWNN architecture in order to avoid the harmful effects of outliers; metaheuristic algorithms such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the functionality of the MCC; (3) applying the proposed architecture to identify reliable dynamical models to be used in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and perform fault diagnosis, among other tasks.
Acronyms
AD: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude-Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
DA: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, "A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV," IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, "Health condition prediction of gears using a recurrent neural network approach," IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, "Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, "Fuzzy neural network-based model reference adaptive inverse control for induction machines," in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD '09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, "Fuzzy wavelet neural network models for prediction and identification of dynamical systems," IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, "Fuzzy wavelet neural networks for identification and control of dynamic plants – a novel structure and a comparative study," IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, "Prediction method for network traffic based on maximum correntropy criterion," China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, "Crop classification by forward neural network with adaptive chaotic particle swarm optimization," Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, "Fruit classification using computer vision and feedforward neural network," Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, "Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models," in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, "Experimental nonlinear identification of a two mass system," in Proceedings of the IEEE Conference on Control Applications (CCA '03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, "A comparison of ANFIS, MLP and SVM in identification of chemical processes," in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, "Nonlinear system identification with modified differential evolution and RBF networks," in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI '12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, "Wavelet networks," IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, "A new class of wavelet networks for nonlinear system identification," IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, "Adaptive wavelet neural network control for linear synchronous motor servo drive," IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, "Software reliability growth model based on fuzzy wavelet neural network," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1-664–V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, "Dynamic system modeling based on wavelet recurrent fuzzy neural network," in Proceedings of the 7th International Conference on Natural Computation (ICNC '11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, "Wavelet fuzzy neural networks for identification and predictive control of dynamic systems," IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Soderstrom, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaria, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Principe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Principe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvao, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, R. Fnaicch, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Principe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS modificada para identificação e controle de plantas com ampla faixa de operação e não linearidade acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
Mathematical Problems in Engineering 9
[Figure 7: Evolution of the MSE for different kernel sizes (MSE versus epochs, 0–350, for σ = 1.0, σ = 0.1, σ = 0.01, and the adaptive σ).]
the FWNN learning procedure performance. Figure 7 shows the MSE obtained over the 350 training epochs for three different fixed kernel sizes, namely, 0.01, 0.1, and 1.0, and also for the adaptive kernel. The adaptive kernel size method, mathematically described by (6), has the highest convergence rate and the best performance in attenuating the effects of outliers and noise.
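The contrast between the two cost functions can be sketched numerically. The snippet below is a minimal illustration with synthetic residuals (the helper names `mse` and `correntropy` are ours, not from the paper); the Gaussian-kernel correntropy estimate is the standard sample average of κ_σ(e_i), which MCC training maximizes:

```python
import numpy as np

def mse(errors):
    # Mean Squared Error: every residual contributes quadratically.
    return np.mean(errors ** 2)

def correntropy(errors, sigma):
    # Gaussian-kernel correntropy estimate; each residual contributes
    # at most 1/N, so gross errors saturate instead of dominating.
    return np.mean(np.exp(-errors ** 2 / (2.0 * sigma ** 2)))

rng = np.random.default_rng(0)
errors = rng.normal(0.0, 0.1, size=200)   # residuals of a well-fitted model
outliers = errors.copy()
outliers[:6] += 10.0                      # 3% gross outliers

print(mse(errors), mse(outliers))                         # MSE explodes
print(correntropy(errors, 0.1), correntropy(outliers, 0.1))  # barely moves
```

A few corrupted samples inflate the MSE by orders of magnitude, while the correntropy value changes only marginally; this is the mechanism by which the MCC cost attenuates outliers in the FWNN learning procedure.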
Figure 8 presents the behavior of the adaptive kernel size during the learning stage of the FWNN-MCC model when the experimental data set is composed of 1% and 3% of outliers. During the initial epochs of the BP algorithm the kernel size is quite oscillatory; however, it becomes more stable as training approaches the hundredth epoch.

[Figure 8: Adaptive kernel size behavior (kernel size versus epochs, 0–350, for adaptive σ with 1% and with 3% outlier noise).]

7. Conclusions

This work has analyzed the performance of a FWNN applied to the identification of a real nonlinear dynamical system in the presence of unknown noise and outliers. Such erroneous measurements in experimental data reduce the reliability of the identified model, since the model tries to fit behaviors that are not part of the dynamical system. The learning techniques most commonly applied to adjust the FWNN parameters in identification tasks are gradient-based methods that use the MSE as their cost function. This paper has therefore proposed replacing this traditional evaluation measure with a similarity measure based on information theory, called correntropy: the MCC was used as the cost function of the error backpropagation algorithm in order to reduce the negative effects of the unknown noise and outliers. The results have demonstrated that the FWNN-MCC models, based on the MCC cost function, represent the input-output dynamics of the multisection liquid tank more accurately, and are less sensitive to outliers and noise, than the FWNN-MSE models.

This work has also investigated the influence of the kernel size on the performance of the MCC in the BP algorithm, since it is the only free parameter of correntropy. The addition of this new parameter to the learning procedure of the FWNN can be considered a disadvantage of the proposed architecture, mainly because the MCC is very dependent on its proper adjustment. Within this context, the adopted adaptive kernel has shown to be more efficient than keeping the parameter fixed during the whole FWNN learning process: the adaptive kernel size method has improved the convergence rate of the backpropagation algorithm and contributed to attenuating the effects of the outliers and noise. Because it relies on the BP algorithm, the proposed architecture remains susceptible to local minima, which limits the ability of the correntropy criterion to suppress the outliers.
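As a rough sketch of how an adaptive kernel interacts with the BP update, the snippet below uses Silverman's density-estimation rule of thumb [46] as a plausible per-epoch σ update; it is a stand-in, not a reproduction of the exact rule in (6), and all function names are hypothetical. It also shows the Gaussian factor by which each residual's gradient contribution is weighted under the MCC:

```python
import numpy as np

def adaptive_kernel_size(errors):
    # Silverman's rule of thumb for bandwidth selection [46]:
    # sigma = 1.06 * std * N^(-1/5). As residuals shrink over training,
    # the kernel size shrinks with them.
    n = errors.size
    return 1.06 * np.std(errors) * n ** (-1.0 / 5.0)

def mcc_gradient_weight(errors, sigma):
    # Under the MCC cost, the BP gradient scales each error term by
    # exp(-e^2 / (2 sigma^2)), so gross outliers contribute almost
    # nothing to the weight update (unlike the MSE, which scales by 1).
    return np.exp(-errors ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
e = rng.normal(0.0, 0.2, size=500)   # one epoch's residuals
e[:5] = 8.0                          # a few gross outliers
sigma = adaptive_kernel_size(e)
w = mcc_gradient_weight(e, sigma)
print(sigma)        # kernel size adapted to the current error spread
print(w[:5].max())  # outliers receive a near-zero gradient weight
```

Small residuals keep a weight close to 1, so ordinary learning proceeds as with the MSE, while the outlier samples are effectively silenced in the update.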
Future research will focus on the following items: (1) applying the MCC in association with different training algorithms for the FWNN architecture, in order to avoid the harmful effects of outliers; metaheuristic algorithms, such as the Genetic Algorithm, Particle Swarm Optimization, and the Bat Algorithm, are good options, since they are less sensitive to local minima than the BP algorithm; (2) including and comparing different adaptive kernel methods to improve the effectiveness of the MCC; (3) applying the proposed architecture to identify reliable dynamical models for use in advanced control strategies, such as predictive controllers; (4) evaluating the feasibility of applying the FWNN-MCC as an inferential system to estimate chemical compositions, calibrate sensors [62], and diagnose faults, among other tasks.
Acronyms

A/D: Analog-to-digital
ANFIS: Adaptive Neurofuzzy Inference System
ANN: Artificial Neural Network
APRBS: Amplitude Modulated Pseudorandom Binary Sequence
ARMAX: AutoRegressive Moving Average eXogenous input model
ARX: AutoRegressive eXogenous input model
BP: Error backpropagation algorithm
D/A: Digital-to-analog
FIR: Finite Impulse Response model
FWNN: Fuzzy Wavelet Neural Network
FWNN-MCC: FWNN obtained using MCC
FWNN-MSE: FWNN obtained using MSE
MCC: Maximum Correntropy Criterion
MF: Membership function
MLP: Multilayer Perceptron network
MSE: Mean Squared Error
NARMAX: Nonlinear AutoRegressive Moving Average eXogenous input model
NARX: Nonlinear AutoRegressive eXogenous input model
NFIR: Nonlinear Finite Impulse Response model
NOE: Nonlinear Output Error model
NSSIF: Nonlinear State Space Innovations Form model
OE: Output Error model
PCA: Principal component analysis
RBF: Radial Basis Function network
SSIF: State Space Innovations Form model
WNN: Wavelet Neural Network
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] X. Lu, K. L. V. Iyer, K. Mukherjee, and N. C. Kar, “A dual purpose triangular neural network based module for monitoring and protection in bi-directional off-board level-3 charging of EV/PHEV,” IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 1670–1678, 2012.
[2] Z. Tian and M. J. Zuo, “Health condition prediction of gears using a recurrent neural network approach,” IEEE Transactions on Reliability, vol. 59, no. 4, pp. 700–705, 2010.
[3] Y.-Y. Lin, J.-Y. Chang, and C.-T. Lin, “Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 2, pp. 310–321, 2013.
[4] Z. Shao, Y. Zhan, and Y. Guo, “Fuzzy neural network-based model reference adaptive inverse control for induction machines,” in Proceedings of the International Conference on Applied Superconductivity and Electromagnetic Devices (ASEMD ’09), pp. 56–59, Chengdu, China, 2009.
[5] S. Yilmaz and Y. Oysal, “Fuzzy wavelet neural network models for prediction and identification of dynamical systems,” IEEE Transactions on Neural Networks, vol. 21, no. 10, pp. 1599–1609, 2010.
[6] R. H. Abiyev and O. Kaynak, “Fuzzy wavelet neural networks for identification and control of dynamic plants—a novel structure and a comparative study,” IEEE Transactions on Industrial Electronics, vol. 55, no. 8, pp. 3133–3140, 2008.
[7] R. J. Bessa, V. Miranda, and J. Gama, “Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting,” IEEE Transactions on Power Systems, vol. 24, no. 4, pp. 1657–1666, 2009.
[8] H. Qu, W. Ma, J. Zhao, and T. Wang, “Prediction method for network traffic based on maximum correntropy criterion,” China Communications, vol. 10, no. 1, pp. 134–145, 2013.
[9] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, Berlin, Germany, 2001.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
[11] Y. Zhang and L. Wu, “Crop classification by forward neural network with adaptive chaotic particle swarm optimization,” Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
[12] Y. Zhang, S. Wang, G. Ji, and P. Phillips, “Fruit classification using computer vision and feedforward neural network,” Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
[13] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[14] A. Banakar and M. F. Azeem, “Identification and prediction of nonlinear dynamical plants using TSK and wavelet neuro-fuzzy models,” in Proceedings of the 3rd IEEE Conference on Intelligent Systems, pp. 617–620, London, UK, September 2006.
[15] T. Kara and I. Eker, “Experimental nonlinear identification of a two mass system,” in Proceedings of the IEEE Conference on Control Applications (CCA ’03), vol. 1, pp. 66–71, Istanbul, Turkey, 2003.
[16] M. O. Efe, “A comparison of ANFIS, MLP and SVM in identification of chemical processes,” in Proceedings of the IEEE Control Applications & Intelligent Control, pp. 689–694, St. Petersburg, Russia, 2009.
[17] X. Xue, J. Lu, and W. Xiang, “Nonlinear system identification with modified differential evolution and RBF networks,” in Proceedings of the 5th IEEE International Conference on Advanced Computational Intelligence (ICACI ’12), pp. 332–335, Nanjing, China, October 2012.
[18] Q. Zhang and A. Benveniste, “Wavelet networks,” IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
[19] S. A. Billings and H.-L. Wei, “A new class of wavelet networks for nonlinear system identification,” IEEE Transactions on Neural Networks, vol. 16, no. 4, pp. 862–874, 2005.
[20] F.-J. Lin, P.-H. Shen, and Y.-S. Kung, “Adaptive wavelet neural network control for linear synchronous motor servo drive,” IEEE Transactions on Magnetics, vol. 41, no. 12, pp. 4401–4412, 2005.
[21] J.-S. R. Jang, “ANFIS: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[22] L. Zhao, J.-P. Zhang, J. Yang, and Y. Chu, “Software reliability growth model based on fuzzy wavelet neural network,” in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC ’10), pp. V1-664–V1-668, IEEE, Wuhan, China, May 2010.
[23] J.-R. Song and H.-B. Shi, “Dynamic system modeling based on wavelet recurrent fuzzy neural network,” in Proceedings of the 7th International Conference on Natural Computation (ICNC ’11), vol. 2, pp. 766–770, Shanghai, China, July 2011.
[24] C. H. Lu, “Wavelet fuzzy neural networks for identification and predictive control of dynamic systems,” IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 3046–3058, 2011.
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, “Fuzzy wavelet networks for function learning,” IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, “New fuzzy wavelet network for modeling and control: the modeling approach,” Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, “Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach,” Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, “A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems,” Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, “A nonlinear system identification approach based on fuzzy wavelet neural network,” Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, “Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study,” in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS ’13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, “Removal of the effects of outliers in batch process data through maximum correntropy estimator,” Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Söderström, “System identification for the errors-in-variables problem,” Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, “Dealing with irregular data in soft sensors: Bayesian method and comparative study,” Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, “Training a FIS with EPSO under an entropy criterion for wind power prediction,” in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS ’06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaría, P. P. Pokharel, and J. C. Príncipe, “Generalized correlation function: definition, properties, and application to blind equalization,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Príncipe, “Kernel adaptive filtering with maximum correntropy criterion,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN ’11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Príncipe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Príncipe, “Performance evaluation of the correntropy coefficient in automatic modulation classification,” Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, “Classification system of pathological voices using correntropy,” Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, “Robust principal component analysis based on maximum correntropy criterion,” IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, “Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study,” in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Príncipe, “Correntropy: properties and applications in non-Gaussian signal processing,” IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Príncipe, “An adaptive kernel width update for correntropy,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN ’12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, “A brief survey of bandwidth selection for density estimation,” Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, “An alternative method of cross-validation for the smoothing of density estimates,” Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, “Biased and unbiased cross-validation in density estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, “Wavelet-based image reconstruction for hard-field tomography with severely limited data,” IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, “Fault diagnosis in industrial induction machines through discrete wavelet transform,” IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, “Short-term traffic flow forecasting model based on wavelet neural network,” in Proceedings of the 25th Chinese Control and Decision Conference (CCDC ’13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, “Haar wavelet neural networks for nonlinear system identification,” in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC ’12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvao, V. M. Becerra, J. M. Calado, and P. M. Silva, “Linear-wavelet networks,” International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, “Fuzzy wavelet neural network with an accelerated hybrid learning algorithm,” IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, “Modified standard back propagation algorithm with optimum initialization for feed forward neural networks,” International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, “A modified backpropagation training algorithm for feedforward neural networks,” Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, R. Fnaiech, and M. Najim, “A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm,” IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, “The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation,” in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE ’10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martin, and P. J. G. Lisboa, “Classical training methods,” in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Marti, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Príncipe, “Information theoretic learning with adaptive kernels,” Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS Modificada para Identificação e Controle de Plantas com Ampla Faixa de Operação e Não Linearidade Acentuada [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, “Assessment and certification of neonatal incubator sensors through an inferential neural network,” Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.
12 Mathematical Problems in Engineering
[57] S Abid R Fnaicch and M Najim ldquoA fast feedforward trainingalgorithm using a modified form of the standard backpropaga-tion algorithmrdquo IEEE Transactions on Neural Networks vol 12no 2 pp 424ndash430 2001
[58] M Davanipoor M Zekri and F Sheikholeslam ldquoThe prefer-ence of fuzzy wavelet neural network to ANFIS in identificationof nonlinear dynamic plants with fast local variationrdquo in Pro-ceedings of the 18th Iranian Conference on Electrical Engineering(ICEE rsquo10) pp 605ndash609 Isfahan Iran May 2010
[59] E Soria J D Martin and P J G Lisboa ldquoClassical trainingmethodsrdquo in Metaheuristic Procedures for Training Neural Net-works E Alba andRMarti Eds pp 25ndash32 SpringerNewYorkNY USA 2006
[60] A Singh and J C Prıncipe ldquoInformation theoretic learningwithadaptive kernelsrdquo Signal Processing vol 91 no 2 pp 203ndash2132011
[61] C A G Fonseca Estrutura ANFIS Modificada para identifi-cacao e Controle de Plantas com Ampla Faixa de Operacao e naoLinearidade Acentuada [PhD thesis] Universidade Federal doRio Grande do Norte (UFRN) Natal Brazil 2012
[62] JM deAraujo Jr JM P deMenezes A AM deAlbuquerqueO D M Almeida and F M U de Araujo ldquoAssessmentand certification of neonatal incubator sensors through aninferential neural networkrdquo Sensors vol 13 no 11 pp 15613ndash15632 2013
Mathematical Problems in Engineering 11
[25] D. W. C. Ho, P.-A. Zhang, and J. Xu, "Fuzzy wavelet networks for function learning," IEEE Transactions on Fuzzy Systems, vol. 9, no. 1, pp. 200–211, 2001.
[26] A. Ebadat, N. Noroozi, A. A. Safavi, and S. H. Mousavi, "New fuzzy wavelet network for modeling and control: the modeling approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 8, pp. 3385–3396, 2011.
[27] S. H. Mousavi, N. Noroozi, A. Safavi, and A. Ebadat, "Modeling and control of nonlinear systems using novel fuzzy wavelet networks: the output adaptive control approach," Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3798–3814, 2011.
[28] S. Ganjefar and M. Tofighi, "A fuzzy wavelet neural network stabilizer design using genetic algorithm for multi-machine systems," Przegląd Elektrotechniczny, vol. 89, no. 5, pp. 19–25, 2013.
[29] L. L. S. Linhares, J. M. Araujo Jr., F. M. U. Araujo, and T. Yoneyama, "A nonlinear system identification approach based on fuzzy wavelet neural network," Journal of Intelligent and Fuzzy Systems, 2014.
[30] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems (DYCOPS '13), vol. 10, pp. 361–366, Mumbai, India, December 2013.
[31] J. C. Munoz and J. Chen, "Removal of the effects of outliers in batch process data through maximum correntropy estimator," Chemometrics and Intelligent Laboratory Systems, vol. 111, no. 1, pp. 53–58, 2012.
[32] T. Söderström, "System identification for the errors-in-variables problem," Transactions of the Institute of Measurement and Control, vol. 34, no. 7, pp. 780–792, 2012.
[33] S. Khatibisepehr and B. Huang, "Dealing with irregular data in soft sensors: Bayesian method and comparative study," Industrial and Engineering Chemistry Research, vol. 47, no. 22, pp. 8713–8723, 2008.
[34] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[35] V. Miranda, C. Cerqueira, and C. Monteiro, "Training a FIS with EPSO under an entropy criterion for wind power prediction," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS '06), pp. 1–8, Stockholm, Sweden, 2006.
[36] I. Santamaría, P. P. Pokharel, and J. C. Príncipe, "Generalized correlation function: definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187–2197, 2006.
[37] S. Zhao, B. Chen, and J. C. Príncipe, "Kernel adaptive filtering with maximum correntropy criterion," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '11), pp. 2012–2017, San Jose, Calif, USA, August 2011.
[38] J. C. Príncipe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010.
[39] A. I. Fontes, A. D. M. Martins, L. F. Silveira, and J. Príncipe, "Performance evaluation of the correntropy coefficient in automatic modulation classification," Expert Systems with Applications, vol. 42, no. 1, pp. 1–8, 2014.
[40] A. I. R. Fontes, P. T. V. Souza, A. D. D. Neto, A. M. Martins, and L. F. Q. Silveira, "Classification system of pathological voices using correntropy," Mathematical Problems in Engineering, vol. 2014, Article ID 924786, 7 pages, 2014.
[41] R. He, B.-G. Hu, W.-S. Zheng, and X.-W. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1485–1494, 2011.
[42] Y. Liu and J. Chen, "Correntropy-based kernel learning for nonlinear system identification with unknown noise: an industrial case study," in Proceedings of the 10th IFAC Symposium on Dynamics and Control of Process Systems, pp. 361–366, December 2013.
[43] W. Liu, P. P. Pokharel, and J. C. Príncipe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
[44] S. Zhao, B. Chen, and J. C. Príncipe, "An adaptive kernel width update for correntropy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), pp. 1–5, Brisbane, Australia, 2012.
[45] M. C. Jones, J. S. Marron, and S. J. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, no. 433, pp. 401–407, 1996.
[46] B. W. Silverman, Density Estimation for Statistics and Data Analysis, vol. 3, CRC Press, New York, NY, USA, 1986.
[47] A. W. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, no. 2, pp. 353–360, 1984.
[48] D. W. Scott and G. R. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1131–1146, 1987.
[49] N. Terzija and H. McCann, "Wavelet-based image reconstruction for hard-field tomography with severely limited data," IEEE Sensors Journal, vol. 11, no. 9, pp. 1885–1893, 2011.
[50] A. Bouzida, O. Touhami, R. Ibtiouen, A. Belouchrani, M. Fadel, and A. Rezzoug, "Fault diagnosis in industrial induction machines through discrete wavelet transform," IEEE Transactions on Industrial Electronics, vol. 58, no. 9, pp. 4385–4395, 2011.
[51] J. Gao, Z. Leng, Y. Qin, Z. Ma, and X. Liu, "Short-term traffic flow forecasting model based on wavelet neural network," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC '13), pp. 5081–5084, Guiyang, China, May 2013.
[52] J. J. Cordova, W. Yu, and X. Li, "Haar wavelet neural networks for nonlinear system identification," in Proceedings of the 6th IEEE Multi-Conference on Systems and Control (MSC '12), pp. 276–281, Dubrovnik, Croatia, October 2012.
[53] R. K. Galvão, V. M. Becerra, J. M. Calado, and P. M. Silva, "Linear-wavelet networks," International Journal of Applied Mathematics and Computer Science, vol. 14, no. 2, pp. 221–232, 2004.
[54] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 463–470, 2012.
[55] V. V. J. Rajapandian and N. Gunaseeli, "Modified standard back propagation algorithm with optimum initialization for feed forward neural networks," International Journal of Imaging Science and Engineering, vol. 1, no. 3, pp. 86–89, 2007.
[56] T. Kathirvalavakumar and P. Thangavel, "A modified backpropagation training algorithm for feedforward neural networks," Neural Processing Letters, vol. 23, no. 2, pp. 111–119, 2006.
[57] S. Abid, F. Fnaiech, and M. Najim, "A fast feedforward training algorithm using a modified form of the standard backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 424–430, 2001.
[58] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "The preference of fuzzy wavelet neural network to ANFIS in identification of nonlinear dynamic plants with fast local variation," in Proceedings of the 18th Iranian Conference on Electrical Engineering (ICEE '10), pp. 605–609, Isfahan, Iran, May 2010.
[59] E. Soria, J. D. Martín, and P. J. G. Lisboa, "Classical training methods," in Metaheuristic Procedures for Training Neural Networks, E. Alba and R. Martí, Eds., pp. 25–32, Springer, New York, NY, USA, 2006.
[60] A. Singh and J. C. Príncipe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203–213, 2011.
[61] C. A. G. Fonseca, Estrutura ANFIS modificada para identificação e controle de plantas com ampla faixa de operação e não linearidade acentuada [Modified ANFIS structure for identification and control of plants with a wide operating range and strong nonlinearity] [Ph.D. thesis], Universidade Federal do Rio Grande do Norte (UFRN), Natal, Brazil, 2012.
[62] J. M. de Araujo Jr., J. M. P. de Menezes, A. A. M. de Albuquerque, O. D. M. Almeida, and F. M. U. de Araujo, "Assessment and certification of neonatal incubator sensors through an inferential neural network," Sensors, vol. 13, no. 11, pp. 15613–15632, 2013.