Interactively recurrent fuzzy functions with multi objective learning and its application to chaotic time series prediction

Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx, DOI: 10.3233/IFS-151839, IOS Press
Sobhan Goudarzi∗, Mohammad Bagher Khodabakhshi and Mohammad Hassan Moradi
Department of Biomedical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Abstract. Fuzzy functions (FFs) models were introduced as an alternate representation of fuzzy rule based approaches. This paper presents novel Interactively Recurrent Fuzzy Functions (IRFFs) for nonlinear chaotic time series prediction. Chaotic sequences depend strongly on their initial conditions as well as on their past states, so feed forward FFs models cannot perform properly. To overcome this weakness, a recurrent structure of FFs is proposed by placing local and global feedbacks in the output parts of the multidimensional subspaces. The IRFFs' optimized parameters should minimize the output error and maximize cluster density. To achieve these contradictory goals, the Non-dominated Sorting Genetic Algorithm II (NSGAII) is applied for simultaneously optimizing the objectives. Also, the feedback loop parameters are tuned by utilizing a gradient descent algorithm with a line search strategy based on the strong Wolfe condition. The experimental setup includes comparative studies on the prediction of benchmark chaotic sequences and real lung sound data. Further simulations demonstrate that our proposed approach effectively learns complex temporal sequences and outperforms fuzzy rule based approaches and feed forward FFs.
Keywords: Recurrent fuzzy functions, chaotic time series prediction, non-dominated sorting genetic algorithm, unsupervised optimal fuzzy clustering, multivariate adaptive regression spline
1. Introduction

Over the last three decades of research on dynamical systems, it has been shown that chaos is widely observed in both natural and engineering systems. Several methods of computational intelligence and soft computing have been widely used for chaotic signal prediction and identification. Neuro-fuzzy systems are one of the interesting soft computing approaches utilized in various applications such as modelling, classification, and prediction [1–8]. Turksen introduced another type of fuzzy structure, called Fuzzy Functions (FFs) [9, 10], which has a simpler structure compared with neuro-fuzzy rule based systems. The multidimensional structure characteristic of the input space of FFs leads to the elimination of combination operators, such as the t-norm and t-conorm, in the antecedent fuzzy subsets. This is one of the main differences between the multidimensional structures and the rule based structures [11]. Moreover, the obtained membership values as well as the input variables are utilized to estimate the FFs' functions, which are equivalent to the consequent parts of the rule based systems. These functions can be estimated by different regression methods like Least Square Estimation (LSE) [10], Multivariate Adaptive Regression Spline (MARS) [12], and Support Vector Machine (SVM) [13].

∗Corresponding author. S. Goudarzi, MSc Student, Department of Biomedical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran. Fax: +98 2166468186; E-mail: [email protected].
The attractor of chaotic systems contains an interactive structure in the state space. Therefore, the corresponding time series has a considerable dependency on its past states and is very sensitive to the initial conditions, so its prediction is much more difficult than that of static/algebraic systems [14].

1064-1246/15/$35.00 © 2015 – IOS Press and the authors. All rights reserved

Recurrent structures enable a model to address temporal sequences powerfully by responding to memory information based on the system's past states. In contrast to feed forward fuzzy rule based systems, the combination of recurrent structures and fuzzy systems has been investigated in several fuzzy models. Two categories of recurrent structures were introduced: local and global feedbacks. By adopting global feedbacks, fuzzy structures such as the Takagi–Sugeno–Kang (TSK) type recurrent fuzzy network (TRFN) [15], the enhanced memory TRFN [16], the high-order recurrent neuro-fuzzy system (HO-RNFS) [17], and the recurrent self-organizing neural fuzzy inference network (RSONFIN) [18] have been proposed. Some other fuzzy neural networks using only local feedbacks have also been proposed [2, 18–20]. In [18] a recurrent self-evolving fuzzy neural network with local feedbacks (RSEFNN-LF) is proposed, whose recurrent structure comes from locally feeding the firing strength of a fuzzy rule back to itself. The dynamic fuzzy neural network (DFNN) [19], the self-recurrent Wavelet Neural Network (SRWNN) [2], and the Indirect Stable Adaptive Fuzzy Wavelet Neural Controller (ISAFWNC) [20] use a local feedback structure similar to RSEFNN-LF, with the exception that their local feedbacks are located in their conclusion parts. As a combined local and global method, the interactively recurrent self-evolving fuzzy neural network (IRSFNN) [21] has been introduced, whose recurrent structure is formed by external loops and internal feedbacks.
In the same way, the main goal of this paper is to extend the FFs approach to a recurrent structure which can be utilized for nonlinear and chaotic time series prediction. In contrast to the conventional FFs structure, which uses a weighted average for aggregation, in this study a recurrent structure is applied to improve the modeling capacity of the relationships among the states of the time series, resulting in our proposed scheme, Interactively Recurrent Fuzzy Functions. Membership values of the input vectors are calculated by using Unsupervised Optimal Fuzzy Clustering (UOFC), and then multidimensional fuzzy subspaces are obtained. The IRFFs' recurrent links are placed in the output parts of the multidimensional subspaces and lead to the creation of the FFs' interactive structure. Besides the scalar input variables, their membership values are then used by Multivariate Adaptive Regression Spline (MARS) to determine each cluster's FF.
One important task in designing IRFFs is the optimization of free parameters such as the number of fuzzy subsets and the fuzzy exponents as well as the feedback weights. In some studies, the number of rules is fixed [2, 15, 22] or generated automatically based on a firing strength criterion [18, 21]. In our proposed system, the objectives are minimizing the IRFFs' error and increasing the density of its fuzzy subspaces. Simultaneous optimization of both objectives by using conventional approaches is practically infeasible. To overcome this issue, we applied the Non-dominated Sorting Genetic Algorithm II (NSGAII) [23] to optimize the two objectives at the same time. Also, the interactive feedback weights are trained with a gradient descent algorithm with a line search based on the strong Wolfe condition, which automatically tunes the learning rate. Some well-known chaotic time series and a set of lung sound signals recorded in the laboratory are considered for the prediction task and discussed to verify the IRFFs' performance. Figure 1 shows the proposed scheme for the IRFFs system:
Fig. 1. Block diagram of IRFFs system.
The rest of this paper is organized as follows: the preliminaries of state space reconstruction are briefly reviewed in Section 2. Then, in Section 3, the UOFC method for multidimensional fuzzy subspaces is presented. MARS regression and the NSGA II algorithm for optimization are described separately in Sections 4 and 5. Details of the proposed IRFFs' structure and parameter learning scheme are presented in Section 6. In Section 7, we compare our system with conventional methods. Finally, Section 8 is reserved for conclusions and future works.
2. State space reconstruction
The state space may be used to study the geometric and dynamic properties of a given dynamic system. But in most problems it is not possible to measure all of the dynamical/state variables, and only a time series of observations is available. The dynamics of the original system are unknown given scalar data, so one of the most basic steps of analyzing a time series is state space reconstruction.

According to Takens' embedding theorem [24], the reconstructed state space of a time series x(n) is given by:

s(n) = [x(n), x(n + τ), …, x(n + (m − 1)τ)],  n = 1, 2, …, M  (1)

where s(n), m, and τ are the nth state, the embedding dimension, and the time delay of the space reconstruction, respectively. Also, M = N − (m − 1)τ, where N is the length of the time series.
The key issue in state space reconstruction is the proper determination of m and τ, because different delays and embedding dimensions produce different state spaces for one time series. For calculating the embedding dimension, the False Nearest Neighbor (FNN) method [25] was utilized. The main idea of FNN is to examine how the number of neighbors of a point along a signal trajectory changes with increasing embedding dimension. By examining how the number of neighbors changes as a function of dimension, an appropriate embedding can be determined. If the embedding dimension is chosen too low, many of the neighbors will be false, but at an appropriate embedding dimension or higher, the neighbors are real.

Also, τ must be chosen in such a way that the components of the states are independent. The best method for determining τ uses the mutual information function, which captures nonlinear and general dependence between two variables. The criterion for determining τ is the first local minimum of the mutual information function over different values of the delay [26].
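The first-minimum criterion above can be sketched with a simple histogram-based mutual information estimate. This is only a minimal illustration, not the estimator used in [26]; the function names and the bin count are our assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information I(x; y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1)                 # marginal of x
    py = pxy.sum(axis=0)                 # marginal of y
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))

def first_minimum_delay(series, max_tau=50, bins=16):
    """Return the first local minimum of I(x(n); x(n + tau)) over tau = 1..max_tau."""
    mi = [mutual_information(series[:-t], series[t:], bins) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] < mi[t + 1]:
            return t + 1                 # mi[0] corresponds to tau = 1
    return int(np.argmin(mi)) + 1        # fallback: global minimum
```

For a periodic signal the returned delay typically falls well below one period, which is the intended behavior of the first-minimum rule.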
3. Unsupervised Optimal Fuzzy Clustering (UOFC)
Assuming that the number of clusters is known, the objective function of the basic FCM algorithm is as follows [27]:

J_q(V, U) = Σ_{i=1}^{M} Σ_{j=1}^{C} u_ij^q d(x_i, v_j)  (2)

where q > 1 is the fuzziness and x_i, v_j, M, and C are the ith input datum, the center of the jth cluster, the number of data points, and the number of clusters, respectively. Also, U is an M × C matrix whose ijth element equals the membership degree of x_i in the jth cluster, V is a C × m matrix which contains the m-dimensional cluster centers, and d(x_i, v_j) is the distance between x_i and the jth cluster center.

After minimization, the membership degrees and cluster prototypes are as follows [27]:

u_ij = 1 / Σ_{k=1}^{C} (d(x_i, v_j) / d(x_i, v_k))^{1/(q−1)}  (3)

v_j = Σ_{i=1}^{M} u_ij^q x_i / Σ_{i=1}^{M} u_ij^q  (4)

If we use an exponential distance (d_e) instead of the Euclidean norm, we obtain the UOFC algorithm introduced by Gath and Geva [28]. In this case, the algorithm learns hyper-ellipsoidal clusters with different densities and different numbers of clusters, which solves more complex problems [28]. The covariance matrices of the clusters are used to take their shapes into account in the calculations. The fuzzy covariance matrix of the ith cluster is as follows [29]:

F_i = Σ_{k=1}^{M} u_ik^q (x_k − v_i)(x_k − v_i)^T / Σ_{k=1}^{M} u_ik^q,  1 ≤ i ≤ C  (5)

And the distance function is as follows [29]:

d_e²(x_k, v_i) = ((2π)^{n/2} V_i / α_i) exp((1/2)(x_k − v_i)^T F_i^{−1} (x_k − v_i))  (6)

where α_i = (1/M) Σ_{k=1}^{M} u_ik is the prior probability of the ith cluster and V_i = |F_i|^{1/2} is its fuzzy hypervolume. Based on the concept of hypervolume and cluster density, Gath and Geva [28] proposed two indices for the automatic determination of the number of clusters. Here we use one of these indices:

PD = S / FH  (7)

where PD is the partition density. Moreover, FH = Σ_{j=1}^{C} V_j is the total fuzzy hypervolume and S = Σ_{i=1}^{C} Σ_{j=1}^{M} u_ij is the total sum of membership degrees. Compact clusters lead to large values of PD, so a large PD indicates compact clusters [28].
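As a concrete reference point, the basic FCM alternation of Equations (3) and (4) with the Euclidean norm can be sketched as below; UOFC additionally swaps in the exponential distance of Equation (6). The function name, random initialization, and fixed iteration count are our assumptions.

```python
import numpy as np

def fcm(X, C, q=2.0, iters=100, seed=0):
    """Basic fuzzy c-means: alternate Eq. (4) centers and Eq. (3) memberships."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), C))
    U /= U.sum(axis=1, keepdims=True)            # each row of U sums to 1
    for _ in range(iters):
        W = U ** q
        V = W.T @ X / W.sum(axis=0)[:, None]     # Eq. (4): fuzzy-weighted centers
        # squared Euclidean distances between every point and every center
        d = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d ** (-1.0 / (q - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True) # Eq. (3)
    return U, V
```

On two well-separated blobs this recovers the blob means as cluster prototypes.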
4. Multivariate adaptive regression spline
Suppose y indicates the dependent variable and x = (x_1, …, x_n) is the vector of explanatory variables in the set of observations, where x_i = (x_i1, x_i2, …, x_im) ∈ R^m. Least squares regression is one of the conventional methods utilized for modeling the relation between the explanatory and output variables. In this method, b_j(x), j = 1, 2, …, P are basis functions, and the coefficients β_j are calculated by minimizing the Sum of Squared Errors (SSE) defined below [30]:

SSE = Σ_{i=1}^{n} (y_i − β_0 − β_1 b_1(x_i) − ⋯ − β_P b_P(x_i))²  (8)

If a specific form is assumed for the basis functions b_j(x), the predictive model is non-adaptive; if they are determined based on the observed data, the model is called adaptive. MARS is one of the fast and compatible adaptive regression approaches. This nonparametric regression method, which is a generalization of stepwise linear regression, can effectively reveal hidden patterns and nonlinear relations in data sets [30]. In MARS the basis functions are chosen as hinge functions (h1, h2), formed by a pair of functions known as splines, given as [31]:

h1 = max{0, x − c}
h2 = max{0, c − x}  (9)

where c is a constant called the knot. Figure 2 depicts the spline functions.
Since for each of the explanatory variables x_j there are n independent observations, by considering each observation as a knot we have 2n functions for each x_j. By considering the set B, which includes all functions h1_j, h2_j, j = 1, …, d and products of two or more of them, each member of B can be utilized as a basis function in Equation 8 [31]. MARS is performed in forward and backward steps as follows:

Fig. 2. Plot of spline function a) h1, b) h2.
Forward phase: This step starts with the intercept b_0, the average of the response values, and then iteratively adds basis functions to the model in pairs. The coefficients β in Equation 8 are estimated so as to produce the largest decrease in SSE. To add a new basis function, MARS must search over all of the following items:

- statements already in the model;
- all of the variables (one of which can be selected for making the basis function);
- all values of the variable (to specify the knot of the new h function).

Adding new statements ends once the change in SSE becomes negligible or the given maximum number of statements is reached. At the end of this phase, a model with many statements is provided that generalizes poorly, and thus a backward elimination phase is needed.
The backward phase starts with the last model of the forward phase and in each step removes the basis function whose elimination yields the smallest increase in SSE. By eliminating each basis function, a new model is obtained that can be a candidate for the final model. The performance of the new model is assessed with Generalized Cross Validation (GCV), and this process continues until all of the basis functions are removed. MARS selects an optimum model based on minimizing the GCV. The GCV for the kth step of the removal phase is calculated as below [31]:
GCV = RSS / (N × (1 − (effective number of parameters)/N)²)  (10)

effective number of parameters = number of MARS terms + penalty × (number of MARS terms − 1)/2  (11)
where N and 2 < λ < 4 are the number of observations and the smoothing (penalty) parameter, respectively. Also, (number of MARS terms − 1)/2 is the number of knots of the hinge functions in the model.
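The hinge-pair construction of Equation (9), plugged into the least squares fit of Equation (8), can be illustrated for a single fixed knot. This shows only the fitting step; the forward/backward knot search of MARS itself is omitted, and the function names are ours.

```python
import numpy as np

def hinge_pair(x, c):
    """Eq. (9): the reflected spline pair with knot c."""
    return np.maximum(0.0, x - c), np.maximum(0.0, c - x)

def fit_hinge_model(x, y, knot):
    """Least squares fit of y ~ b0 + b1*h1(x) + b2*h2(x) (Eq. (8) with P = 2)."""
    h1, h2 = hinge_pair(x, knot)
    B = np.column_stack([np.ones_like(x), h1, h2])   # basis matrix
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    return beta

x = np.linspace(-2, 2, 200)
y = np.abs(x)                        # V-shaped target: exactly two hinges at c = 0
beta = fit_hinge_model(x, y, knot=0.0)
```

Because |x| = max{0, x} + max{0, −x}, the fit recovers β = (0, 1, 1) exactly, which is a handy sanity check on the basis construction.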
5. Non-dominated Sorting Genetic Algorithm II (NSGA II)
To solve many real-world problems, it is necessary to optimize several objectives simultaneously. Thus, Multi-Objective (MO) optimization has recently become popular and has been applied in different applications, like [32]. Here, MO optimization can be stated as finding the vector p* = [p*_1, p*_2, …, p*_n]^T of solutions which optimizes the following vector function:

f(p) = [f_1(p), f_2(p), …, f_k(p)]^T  (12)
where k is the number of cost functions to be optimized. There are some subtleties in finding the optimum vector p* in such situations. When there are two or more conflicting objective functions, all objectives need to be optimized simultaneously. The main difficulty in this situation is comparing one solution with another. When the relative importance of the objectives is unknown, there are multiple solutions, each of which is considered acceptable and equivalent. At the end of the procedure, there is a set of solutions named the Pareto set. A decision vector p* is called Pareto optimal if and only if there is no solution that dominates p*. In other words, p* is Pareto optimal if there exists no feasible vector p which causes a reduction in some criterion without a simultaneous increase in at least one other. Detailed information about the definition of Pareto optimality is available in [33].
There are several methods for the implementation of MO optimization, such as NSGA [34], MOGA [35], and NPGA [36]. NSGAII is an evolutionary, population-based metaheuristic algorithm with a predefined number of generations (gen) and population size (pop) in each generation. By utilizing a specific selection operator, the algorithm creates a mating pool that combines the parent and offspring populations and selects the best solutions. Finally, NSGAII finds the best solutions in each generation and creates the Pareto optimal set.
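The dominance relation underlying NSGAII's sorting can be sketched as a minimal first-front extraction for minimization. The full algorithm additionally peels off successive fronts and applies crowding-distance selection, which are omitted here; the function names are ours.

```python
def dominates(a, b):
    """True if a dominates b (minimization): no worse in every objective,
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_front(points):
    """The first front of the sorting: solutions no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]
```

For example, among the objective vectors (1, 5), (2, 3), (3, 4), (4, 1), (5, 5), the point (3, 4) is dominated by (2, 3) and (5, 5) is dominated by (1, 5), so the first front contains the other three.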
6. The proposed Interactively Recurrent Fuzzy Functions (IRFFs)

The proposed IRFFs come as an extended version of FFs and are illustrated in Fig. 3; the links between clusters create the interactively recurrent dynamical structure. By utilizing the interaction feedback, the proposed IRFFs capture information from the other subspaces. Our IRFFs system follows these steps:
Fig. 3. The proposed IRFFs framework.
1. State space reconstruction.
2. Calculating membership values by use of UOFC clustering in the state space.
3. Producing the augmented input by appending membership values to the primary inputs.
4. Estimating an appropriate function for each cluster utilizing the MARS method.
5. Calculating aggregation weights in the form of interactions between rules and past states.
6. Weighted averaging for output calculation.
6.1. IRFFs' structure
Step 1: Here, X = [x_1 … x_N] is the input, where N is the number of data points. In state space reconstruction, the trajectory matrix with lag τ and dimension m is obtained as follows:

S = [ x_1            x_{1+τ}        ⋯  x_{1+(m−1)τ}
      ⋮               ⋱                ⋮
      x_{N−(m−1)τ}   x_{N−(m−2)τ}   ⋯  x_N ]  (13)

Each row of S is one state in the state space. So the nth input datum is s(n) = (x(n), x(n + τ), …, x(n + (m − 1)τ)) and the corresponding output is y(n) = x(n + mτ).
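Building the trajectory matrix of Equation (13) can be sketched as follows (the function name is ours):

```python
import numpy as np

def trajectory_matrix(x, m, tau):
    """Eq. (13): row n is the delay vector (x(n), x(n+tau), ..., x(n+(m-1)tau))."""
    M = len(x) - (m - 1) * tau          # number of reconstructed states
    return np.array([x[n : n + m * tau : tau] for n in range(M)])
```

For x = (0, 1, …, 9) with m = 3 and τ = 2 this yields the 6 × 3 matrix whose first row is (0, 2, 4) and last row is (5, 7, 9).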
Step 2: The state space points are clustered to determine the system subspaces. UOFC clustering results in separating the system behaviors. After termination of the clustering, the membership degree of each state in each cluster is determined by Equation 3.
Step 3: In this step, the augmented inputs of the FFs are created. The membership values and their transformations are appended to the primary inputs:

ϕ_i = [s, μ, f(μ)],  i = 1, …, M

where μ is the membership vector containing the membership values of the different clusters and f(μ) indicates a transformation of the membership values (details on selecting an appropriate transformation are available in [10]).
Step 4: To estimate the functions of IRFFs, MARS is applied, which uses the matrix φ as follows:

φ_i = [ s_1  μ_1  f(μ_1)
        ⋮    ⋮    ⋮
        s_K  μ_K  f(μ_K) ],  i = 1, …, C  (14)

where K is the number of members of a cluster, every row is an explanatory variable, and the corresponding output is x_{n+mτ}. Here, a step prior to prediction is performed and a NARMAX model is created for each cluster, corresponding to that cluster's behavior.
Step 5: The weights of the recurrent part of IRFFs are considered to form the local and global feedback loops. The outputs of this step are utilized as aggregation weights in the calculation of the IRFFs' output. These weights are computed as follows:

ψ_k^t(u) = (1 − γ_k) u_k^t + Σ_{i=1, i≠k}^{C} λ_ik ψ_i^{t−1},  k = 1, …, C,  t = 1, …, M  (15)

Here γ_k = Σ_{i=1}^{C} λ_ik; t, u_k^t, ψ, and λ are the sample number, the membership value of the tth datum in cluster k, the aggregation weight, and the coefficient of the interactive and recurrent relation, respectively. The aggregation weights depend not only on the current membership values of each cluster, but also on the previous membership values of the other clusters and of the cluster itself.
Step 6: Finally, the system output is calculated as follows:

f(x_j) = Σ_{i=1}^{C} ψ_i(u) · FF_i(ϕ_j) / Σ_{i=1}^{C} ψ_i(u)  (16)

where FF_i(ϕ_j) is the output of the ith function rule for the jth input datum.
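Equations (15) and (16) can be sketched as below. The initialization ψ(1) = u(1) is our assumption, since the paper does not state how the recursion is started; the function names are also ours.

```python
import numpy as np

def aggregation_weights(U, lam):
    """Eq. (15): psi_k(t) = (1 - gamma_k) u_k(t) + sum_{i != k} lam[i, k] psi_i(t-1),
    with gamma_k = sum_i lam[i, k].  U is M x C memberships, lam is C x C."""
    M, C = U.shape
    gamma = lam.sum(axis=0)
    psi = np.zeros((M, C))
    psi[0] = U[0]                        # assumed start: no previous state at t = 1
    for t in range(1, M):
        for k in range(C):
            interact = sum(lam[i, k] * psi[t - 1, i] for i in range(C) if i != k)
            psi[t, k] = (1.0 - gamma[k]) * U[t, k] + interact
    return psi

def irff_output(psi_t, ff_t):
    """Eq. (16): normalized weighted average of the per-cluster FF outputs."""
    return np.dot(psi_t, ff_t) / psi_t.sum()
```

With λ = 0 the recursion degenerates to ψ = u, i.e., the conventional feed forward FFs aggregation, which makes the role of the recurrent coefficients easy to see.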
6.2. IRFFs' learning

In this subsection the structure and parameter learning of IRFFs are presented. The free design parameters are the number of clusters (C), the degree of fuzziness (m), the penalty per knot (in the GCV), and the recurrent layer coefficients (λ). Figure 4 presents the flow chart of the IRFFs' learning scheme, which includes structure and parameter learning.
6.2.1. Structure learning

In order to have compact clusters and better IRFFs performance, the following two objectives are considered:

1. Minimizing the prediction error value:

E = (1/2)(y_t − f(x_t))²  (17)

where y_t is the desired output value and f(x_t) is the IRFFs' output value for the tth sample.

2. Maximizing the partition density value of the fuzzy clustering (Equation 7).
Fig. 4. Flowchart of structure and parameter learning.
To achieve an optimal tradeoff between the two objectives, the structure learning phase uses the NSGAII algorithm to optimize the IRFFs' structural parameters (m, C, GCV). At the end of the training step, the best values m*, C*, and GCV* are specified and used for the test data.
6.2.2. Parameter learning

In every generation of NSGAII, the recurrent layer coefficients (λ) are obtained by applying a gradient descent algorithm with a line search based on the strong Wolfe condition. Its main advantage is the automatic determination of the learning rate (η).

Learning of the interactively recurrent weights based on the modeling error is done as an iterative algorithm. Based on the error function introduced in Equation 17, we trained the coefficients by using the following equations:

λ_ij^{t+1} = λ_ij^t + Δλ_ij^t + α Δλ_ij^{t−1}  (18)

Δλ_ij^t = −η ∂E/∂λ_ij,  i, j = 1, …, C  (19)
where 0 < η < 1 and α are the learning step and the momentum weight (which is practically selected around 0.8), respectively. Developing the previous expressions, we have:

∂E/∂λ_ij = −(y − f(x)) ∂f/∂λ_ij  (20)

where

∂f/∂λ_ij = (ψ_i(t − 1) − u_i(t)) FF_i(q),  i = j
∂f/∂λ_ij = (ψ_j(t − 1) − u_i(t)) FF_i(q) + (ψ_i(t − 1) − u_j(t)) FF_j(q),  i ≠ j  (21)

The equal case corresponds to a recurrent weight and the unequal case to an interactive weight; here we make the critical assumption that the interactive weights are bidirectional.
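The momentum update of Equations (18) and (19) can be sketched with a fixed learning step. The paper tunes η by a Wolfe-condition line search, which is omitted here; the function name and default values are our assumptions.

```python
def momentum_step(lam, grad, prev_delta, eta=0.1, alpha=0.8):
    """One update of Eqs. (18)-(19) for a single coefficient lambda_ij:
    delta = -eta * dE/dlambda (Eq. 19), then lambda += delta + alpha * prev_delta."""
    delta = -eta * grad                      # Eq. (19)
    return lam + delta + alpha * prev_delta, delta   # Eq. (18)
```

Carrying the previous increment forward with α ≈ 0.8 smooths the trajectory of λ across iterations, which is the usual motivation for the momentum term.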
7. Simulations

This section presents three examples: chaotic sequence prediction, time series prediction in a noisy environment, and multichannel lung sound modeling. In all of the following examples the validity of our proposed scheme is confirmed by simulation. Here, the population size and the number of generations are set to 100 and 500, respectively. Also, the Root Mean Squared Error (RMSE) and the percent root mean squared difference (PRD) are considered as the quality criteria:

RMSE = √( Σ_{i=1}^{N} (y(i) − ŷ(i))² / N )  (22)

PRD% = √( Σ_{i=1}^{N} (y(i) − ŷ(i))² / Σ_{i=1}^{N} (y(i))² ) × 100  (23)

where ŷ is the IRFFs' output.
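The two criteria of Equations (22) and (23) are straightforward to implement (the function names are ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Eq. (22): root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def prd(y, y_hat):
    """Eq. (23): percent root mean squared difference."""
    return 100.0 * np.sqrt(np.sum((y - y_hat) ** 2) / np.sum(y ** 2))
```

Note that PRD normalizes by the signal energy, so it is scale-invariant, while RMSE carries the units of the signal.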
7.1. Example 1: Lorenz equations time series
Here, the time series prediction problem is applied to data generated from the Lorenz equations, given by [37]:

dx/dt = a(y − x)
dy/dt = cx − xz − y  (24)
dz/dt = xy − bz

where x, y, and z are state variables and a, b, and c are adjusting parameters. For a = 10, b = 8/3, and c = 28 the Lorenz equations exhibit chaotic behavior.
After generating the data sequences with a time step of 0.05 and initial state values of (1, 1, 1), we remove the first 201 transition points and keep the next 1000 points for every state variable. The first 800 points are utilized as the training dataset and the last 200 points as the checking set. Based on Takens' embedding theorem [24], the delay and embedding dimension are set to 1 and 3, respectively. Figure 5 shows the prediction result for the series x and the excellent ability of IRFFs to predict the Lorenz time series.
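The data generation described above can be sketched as follows. The paper does not name the integrator used for Equation (24), so classical RK4 with internal substeps (for stability at the 0.05 output step) is our assumption, as is the function name.

```python
import numpy as np

def lorenz_x(n_points, dt=0.05, a=10.0, b=8.0 / 3.0, c=28.0, drop=201, substeps=5):
    """Integrate Eq. (24) with classical RK4 and return the x component,
    discarding the first `drop` transient points as in the paper's setup."""
    def f(s):
        x, y, z = s
        return np.array([a * (y - x), c * x - x * z - y, x * y - b * z])
    h = dt / substeps
    s = np.array([1.0, 1.0, 1.0])        # initial state (1, 1, 1)
    xs = []
    for _ in range(n_points + drop):
        for _ in range(substeps):        # several small RK4 steps per output sample
            k1 = f(s)
            k2 = f(s + h / 2 * k1)
            k3 = f(s + h / 2 * k2)
            k4 = f(s + h * k3)
            s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs.append(s[0])
    return np.array(xs[drop:])
```

The resulting x sequence stays on the Lorenz attractor (roughly |x| < 20) while switching irregularly between the two lobes.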
Table 1 lists comparative results of Lorenz equations time series prediction based on the proposed IRFFs and other approaches. Here, we compared three FRB and two FFs systems, including the Recurrent Self-Evolving Fuzzy Neural Network (RSEFNN) [18], the Fuzzy Recurrent Wavelet Neural Network (FRWNN) [2], the Squared Root Cubature Kalman Filter-γ Echo State Network (SCKF-γESN) [38], Fuzzy Functions with LSE (FFs-LSE), and Fuzzy Functions with MARS (FFs-MARS) [10, 12]. The results indicate that, by the addition of the recurrent layer and the interactive structure in IRFFs, better RMSE and PRD than those of the other fuzzy structures are achieved. Although the performance of IRFFs and of recurrent structures such as RSEFNN and FRWNN is similar, the number of clusters in IRFFs is smaller than their number of fuzzy rules.

Table 1. RMSE and PRD results for the Lorenz time series

Method      Data   RMSE    PRD     Optimum parameters
FFs-LSE     Train  0.0284  5.1599  C = 5
            Test   0.0275  4.7939
FFs-MARS    Train  0.0246  3.9934  GCV = 3, C = 5, m = 3.6
            Test   0.0265  4.2456
RSEFNN      Train  0.0131  2.6143  7 rules
            Test   0.0142  2.7366
FRWNN       Train  0.0153  3.0457  –
            Test   0.0187  3.5939
SCKF-γESN   Train  0.0315  7.183   –
            Test   –       –
IRFFs       Train  0.0097  1.9620  GCV = 2.83, C = 2, m = 2.31
            Test   0.0114  2.2599
7.2. Example 2: Noisy Mackey–Glass time series
The Mackey–Glass time series s(t) is extracted from the following differential equation [39]:

ds(t)/dt = 0.2 s(t − l) / (1 + s^{10}(t − l)) − 0.1 s(t)  (25)

where l is the lag parameter. Here, chaotic behavior occurs for l > 17. We obtain the Mackey–Glass time sequence by applying the fourth-order Runge–Kutta method [40] to Equation 25. In this study, the initial value s(0) is set to 1.2 and s(k) is derived for 1 ≤ k ≤ 1200. Also, since the first 201 points are transition points, they are removed; the next 500 generated data points are utilized for training and the remaining data are used for testing.
Now, we want to compare the performance of different prediction methods in the presence of additive noise with a uniform distribution. By setting SNR = 6 dB, the results of Fig. 6 and Table 2 are achieved. Figure 6 shows that the system is able to reasonably follow the Mackey–Glass time series in the presence of noise.
In Table 2, the performance of the proposed IRFFs is compared with the following fuzzy forecasters: 2u-function type-2 fuzzy systems (2u T2 FLS) [41], the Quasi-T2 Fuzzy Logic System (Q-T2 FLS) [42], and the other systems used in Example 1. Multi objective learning of IRFFs generates 5 multidimensional fuzzy subspaces while FFs-MARS has 7, so IRFFs has a simpler structure than the FFs approaches, which decreases the output error. Moreover, Table 2 shows that, because of using local and global feedbacks, our proposed method outperforms the 2u T2 FLS [41] and Q-T2 FLS [42] approaches, which use uncertainty in their type-2 membership functions.

Fig. 5. Prediction result for Lorenz chaotic time series.

Fig. 6. IRFFs' performance for noisy Mackey–Glass time series.

Table 2. RMSE and PRD results for the noisy Mackey–Glass time series

Method      Data   RMSE    PRD      Optimum parameters
FFs-LSE     Train  0.1319  20.7890  C = 5
            Test   0.1277  19.8974
FFs-MARS    Train  0.1204  19.2004  GCV = 3, C = 7, m = 2.9
            Test   0.1273  19.8079
RSEFNN      Train  0.1038  17.4791  7 rules
            Test   0.1040  17.7887
FRWNN       Train  0.1143  18.2368  –
            Test   0.1323  19.3422
2uF-T2 FLS  Train  0.1478  21.1567  –
            Test   –       –
Q-T2 FLS    Train  0.2026  26.6893  –
            Test   –       –
IRFFs       Train  0.0954  16.9428  GCV = 2.68, C = 5, m = 2.33
            Test   0.1016  17.1537
7.3. Example 3: Real lung sound signal modeling

Since the lung sound signal is chaotic and has very complex dynamics [43], its estimation and modeling are used in different areas like classification based on model parameters, denoising, and compression [44]. Here, we use signals recorded in the Department of Pneumology of Shariati Hospital by researchers of the Bioinstrument and Biological Signal Processing Laboratory of Amirkabir University of Technology [45]. The data were recorded using a 6-channel respiratory sound acquisition device in a non-soundproof room. Each record contains 3 inhales and 3 exhales with a 1-second pause after each breath.

In the state space reconstruction, the parameter values m = 6 and τ = 8 are obtained. For the estimation and prediction purpose, one inhale of a lung sound was picked from a healthy subject. Due to the chaotic nature of the lung sound signals and the strong relationships among their states, the shapes of the signals are very complex.
Figure 7 shows the original lung sound signal along with the one predicted by our proposed method in all 6 channels. This figure confirms that the temporal dynamics of the lung sound signals in each channel are effectively modeled by the interactive structure. In channel 3, although IRFFs does not track the signal in a very short time interval around sample 1600, its performance in the remaining intervals and in all of the other channels is desirable. Also, Fig. 8 shows the error of the proposed method for the channel 6 data.

Fig. 7. Original and predicted lung sound data.

Fig. 8. Residual error for test data in channel 6.
Table 3 lists the results of different FFs methods and other interactive structures on lung sound data modelling. The innovative learning algorithm generates optimal values compared to the other methods and leads to lower training and test errors as well as a less complex system.

Table 3
RMSE and PRD results of the lung sound time series

Method     Data    RMSE     PRD      Optimum parameters
FFs-LSE    Train   0.0223   5.8984   C = 5
           Test    0.0226   5.9319
FFs-MARS   Train   0.0173   4.9132   GCV = 3, C = 6, m = 2.9
           Test    0.0192   5.1637
RSEFNN     Train   0.0160   4.6003   –
           Test    0.0183   4.9966
FRWNN      Train   0.0153   4.0257   –
           Test    0.0177   4.7939
IRFFs      Train   0.0120   3.8140   GCV = 2.17, C = 4, m = 2.54
           Test    0.0136   3.1542
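The two figures of merit in Table 3 can be computed as follows. The paper does not restate the formulas, so the sketch below assumes the standard definitions: root mean square error, and percent root-mean-square difference as commonly used for biomedical signals:

```python
import numpy as np

def rmse(x, xhat):
    """Root mean square error between a signal and its prediction."""
    x, xhat = np.asarray(x, float), np.asarray(xhat, float)
    return np.sqrt(np.mean((x - xhat) ** 2))

def prd(x, xhat):
    """Percent root-mean-square difference (standard definition
    assumed; 100 * sqrt(sum of squared errors / signal energy))."""
    x, xhat = np.asarray(x, float), np.asarray(xhat, float)
    return 100.0 * np.sqrt(np.sum((x - xhat) ** 2) / np.sum(x ** 2))

# Tiny worked example with made-up values.
x    = np.array([1.0, 2.0, 3.0, 4.0])
xhat = np.array([1.1, 1.9, 3.2, 3.8])
print(round(rmse(x, xhat), 4))  # 0.1581
print(round(prd(x, xhat), 4))   # 5.7735
```

PRD normalizes by signal energy, so it allows comparison across channels with different amplitude ranges, unlike the raw RMSE.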
8. Conclusion
In this paper, a recurrent structure for chaotic time-series prediction is added to the fuzzy functions system. By placing local and global feedbacks in the output parts of FFs' multidimensional subspaces, an interactive structure is achieved and the IRFFs system is proposed. IRFFs uses the recurrent structure to memorize previous states of the system. Self-feedbacks in IRFFs enable capturing the critical local information of time series sequences. Furthermore, global feedbacks, which feed the output part of each multidimensional subspace to the others, enable IRFFs to effectively address temporal sequences by responding to memory information from prior system states. Therefore, the generalization of IRFFs is improved in comparison with feed forward FFs.
Multi-dimensional fuzzy subspaces are created by means of UOFC. This method can account for different cluster shapes, densities and numbers of data points in each cluster, which is necessary in time series clustering for identifying all kinds of system behaviors.
Also, the MARS method has been applied for linear and nonlinear regression of each multidimensional subspace. In fact, MARS can learn different kinds of dynamical behavior.
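The core idea of MARS regression can be illustrated with a minimal sketch. The snippet below builds the characteristic hinge-function basis and fits it by least squares; note that the knot is fixed here for simplicity, whereas real MARS (and the version used in this paper) selects knots by forward addition and GCV-guided backward pruning:

```python
import numpy as np

def hinge_features(x, knots):
    """MARS-style basis: an intercept plus, for each knot t, the
    reflected pair of hinge functions max(0, x - t) and max(0, t - x)."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

# Piecewise-linear target with a kink at 0; the fixed knot is an
# assumption of this sketch, not the paper's knot-selection method.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = np.abs(x) + 0.05 * rng.standard_normal(x.size)

B = hinge_features(x, knots=[0.0])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
yhat = B @ coef
print(round(np.sqrt(np.mean((y - yhat) ** 2)), 3))  # close to the 0.05 noise level
```

Because each hinge pair contributes a piecewise-linear segment, a sum of such terms can approximate both locally linear and strongly nonlinear dynamics, which is why MARS suits the heterogeneous subspaces produced by clustering.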
The structure and parameter learning algorithm enables IRFFs to automatically identify its desired structure. It eliminates the sensitivity of the learning algorithm to the initial selection of parameters and provides an admissible structure for the given data sets. Simultaneous optimization of the parameters with NSGAII provides minimal structures and maximum prediction capability. Therefore, IRFFs has a less complex architecture than that of conventional FFs. Also, in each generation of NSGAII, the recurrent coefficients need to be optimized. A gradient descent algorithm with a line search based on the strong Wolfe condition is utilized for this purpose. Because the learning rate is determined automatically, both the possibility of divergence and the increased computational complexity resulting from an inappropriate selection of the learning rate are eliminated, and the convergence of the algorithm becomes faster.
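The idea of the paragraph above can be sketched as follows. This is not the paper's implementation: the objective is a toy quadratic standing in for the recurrent-coefficient loss, and the step-length search scans a simple halving grid for a step satisfying the strong Wolfe conditions rather than using a full bracketing line search:

```python
import numpy as np

def strong_wolfe_step(f, grad, w, p, c1=1e-4, c2=0.9):
    """Return a step length along direction p satisfying the strong
    Wolfe conditions (sufficient decrease + curvature), found by
    scanning a halving grid -- a simple stand-in for a bracketing
    line search."""
    f0, g0 = f(w), grad(w)
    d0 = g0 @ p                       # directional derivative (< 0 for descent)
    a = 1.0
    for _ in range(40):
        armijo = f(w + a * p) <= f0 + c1 * a * d0
        curvature = abs(grad(w + a * p) @ p) <= c2 * abs(d0)
        if armijo and curvature:
            return a
        a *= 0.5
    return a                          # fallback: a tiny but safe step

# Toy quadratic objective standing in for the recurrent-coefficient loss.
A = np.diag([1.0, 10.0])
f = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w = np.array([3.0, -2.0])
for _ in range(500):
    p = -grad(w)                      # steepest-descent direction
    w = w + strong_wolfe_step(f, grad, w, p) * p
print(np.round(w, 6))                 # approaches the minimizer [0, 0]
```

Because the step length is chosen by the Wolfe test at every iteration, no hand-tuned learning rate is needed, which is the property the conclusion credits for avoiding divergence.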
To confirm the performance of IRFFs, two benchmark time series and one recorded lung sound time series were utilized for simulation. This is the first time that FFs structures have been implemented for biological signal processing. In all examples, IRFFs was compared with the latest popular FRB and FFs methods. As future work, we will build a framework for approximating FFs in the consequent of each multidimensional subspace, based on a fuzzy method, in such a way that the parameters of the fuzzy functions can be used for pattern recognition purposes.
References

[1] H. Gholizade-Narm and M.R. Shafiee Chafi, Using repetitive fuzzy method for chaotic time series prediction, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 28 (2015), 1937–1946.
[2] S. Ganjefar and M. Tofighi, Single-hidden-layer fuzzy recurrent wavelet neural network: Applications to function approximation and system identification, Information Sciences 294 (2015), 269–285.
[3] J.-S.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, Systems, Man and Cybernetics, IEEE Transactions on 23 (1993), 665–685.
[4] B.C. Tripathy and A.J. Dutta, Lacunary bounded variation sequence of fuzzy real numbers, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 24 (2013), 185–189.
[5] N. Nithya and K. Duraiswamy, Correlated gain ratio based fuzzy weighted association rule mining classifier for diagnosis health care data, Journal of Intelligent and Fuzzy Systems.
[6] H.-A. Bazoobandi and M. Eftekhari, A fuzzy based memetic algorithm for tuning fuzzy wavelet neural network parameters, Journal of Intelligent and Fuzzy Systems.
[7] B.C. Tripathy, N. Braha and A.J. Dutta, A new class of fuzzy sequences related to the ℓp space defined by Orlicz function, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 26 (2014), 1273–1278.
[8] B.C. Tripathy and S. Debnath, γ-open sets and γ-continuous mappings in fuzzy bitopological spaces, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 24 (2013), 631–635.
[9] I.B. Turksen, A review of developments from fuzzy rule bases to fuzzy functions, Hacettepe Journal of Mathematics and Statistics 41 (2012).
[10] I.B. Turksen, Fuzzy functions with LSE, Applied Soft Computing 8 (2008), 1178–1188.
[11] O. Uncu and I.B. Turksen, A novel fuzzy system modeling approach: Multidimensional structure identification and inference, in Fuzzy Systems, 2001. The 10th IEEE International Conference on, 2001, pp. 557–561.
[12] M. Zarandi, M. Zarinbal, N. Ghanbari and I. Turksen, A new fuzzy functions model tuned by hybridizing imperialist competitive algorithm and simulated annealing. Application: Stock price prediction, Information Sciences 222 (2013), 213–228.
[13] A. Celikyilmaz and I. Burhan Turksen, Fuzzy functions with support vector machines, Information Sciences 177 (2007), 5163–5177.
[14] H. Kantz and T. Schreiber, Nonlinear time series analysis, vol. 7: Cambridge University Press, 2004.
[15] C.-F. Juang, A TSK-type recurrent fuzzy network for dynamic systems processing by neural network and genetic algorithms, Fuzzy Systems, IEEE Transactions on 10 (2002), 155–170.
[16] D.G. Stavrakoudis and J. Theocharis, A recurrent fuzzy neural network for adaptive speech prediction, in Systems, Man and Cybernetics, 2007 ISIC IEEE International Conference on, 2007, pp. 2056–2061.
[17] J. Theocharis, A high-order recurrent neuro-fuzzy system with internal dynamics: Application to the adaptive noise cancellation, Fuzzy Sets and Systems 157 (2006), 471–500.
[18] C.-F. Juang, Y.-Y. Lin and C.-C. Tu, A recurrent self-evolving fuzzy neural network with local feedbacks and its application to dynamic system processing, Fuzzy Sets and Systems 161 (2010), 2552–2568.
[19] P. Mastorocostas and J.B. Theocharis, A recurrent fuzzy-neural model for dynamic system identification, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 32 (2002), 176–190.
[20] M. Alizadeh and M. Tofighi, Full-adaptive THEN-part equipped fuzzy wavelet neural controller design of FACTS devices to suppress inter-area oscillations, Neurocomputing 118 (2013), 157–170.
[21] Y.-Y. Lin, J.-Y. Chang and C.-T. Lin, Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network, Neural Networks and Learning Systems, IEEE Transactions on 24 (2013), 310–321.
[22] J.T. Connor, R.D. Martin and L.E. Atlas, Recurrent neural networks and robust time series prediction, Neural Networks, IEEE Transactions on 5 (1994), 240–254.
[23] K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, Evolutionary Computation, IEEE Transactions on 6 (2002), 182–197.
[24] F. Takens, Detecting strange attractors in turbulence, in Dynamical systems and turbulence, Warwick 1980, ed: Springer, 1981, pp. 366–381.
[25] M.B. Kennel, R. Brown and H.D. Abarbanel, Determining embedding dimension for phase-space reconstruction using a geometrical construction, Physical Review A 45 (1992), 3403.
[26] A.M. Fraser and H.L. Swinney, Independent coordinates for strange attractors from mutual information, Physical Review A 33 (1986), 1134.
[27] S. Theodoridis and K. Koutroumbas, Pattern Recognition: Elsevier Science, 2008.
[28] I. Gath and A.B. Geva, Unsupervised optimal fuzzy clustering, Pattern Analysis and Machine Intelligence, IEEE Transactions on 11 (1989), 773–780.
[29] J.C. Bezdek and J.C. Dunn, Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distributions, Computers, IEEE Transactions on 100 (1975), 835–838.
[30] T. Hastie, R. Tibshirani and J. Friedman, The elements of statistical learning, vol. 2: Springer, 2009.
[31] J.H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics, 1991, pp. 1–67.
[32] K. Deb, Multi-objective optimization using evolutionary algorithms, vol. 16: John Wiley & Sons, 2001.
[33] I. Saha, U. Maulik and D. Plewczynski, A new multi-objective technique for differential fuzzy clustering, Applied Soft Computing 11 (2011), 2765–2776.
[34] N. Srinivas and K. Deb, Multiobjective optimization using nondominated sorting in genetic algorithms, Evolutionary Computation 2 (1994), 221–248.
[35] K. Deb, Multi-objective genetic algorithms: Problem difficulties and construction of test problems, Evolutionary Computation 7 (1999), 205–230.
[36] J. Horn, N. Nafpliotis and D.E. Goldberg, A niched Pareto genetic algorithm for multiobjective optimization, in Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, 1994, pp. 82–87.
[37] S. Banerjee, M. Mitra and L. Rondoni, Applications of chaos and nonlinear dynamics in engineering, vol. 1: Springer Science & Business Media, 2011.
[38] M. Han, M. Xu, X. Liu and X. Wang, Online multivariate time series prediction using SCKF-γESN model, Neurocomputing 147 (2015), 315–323.
[39] J. Belair, L. Glass, U. an der Heiden and J. Milton, Dynamical disease: Identification, temporal aspects and treatment strategies of human illness, Chaos: An Interdisciplinary Journal of Nonlinear Science 5 (1995), 1–7.
[40] R. Butt, Introduction to Numerical Analysis Using MATLAB®: Jones & Bartlett Learning, 2009.
[41] S.F. Molaeezadeh and M.H. Moradi, A 2uFunction representation for non-uniform type-2 fuzzy sets: Theory and design, International Journal of Approximate Reasoning 54 (2013), 273–289.
[42] J.M. Mendel, F. Liu and D. Zhai, α-plane representation for type-2 fuzzy sets: Theory and applications, Fuzzy Systems, IEEE Transactions on 17 (2009), 1189–1207.
[43] C. Ahlstrom, A. Johansson, P. Hult and P. Ask, Chaotic dynamics of respiratory sounds, Chaos, Solitons & Fractals 29 (2006), 1054–1062.
[44] R. Palaniappan, K. Sundaraj and N.U. Ahamed, Machine learning in lung sound analysis: A systematic review, Biocybernetics and Biomedical Engineering 33 (2013), 129–135.
[45] P.J.M. Fard, M.H. Moradi and S. Saber, Chaos to randomness: Distinguishing between healthy and non-healthy lung sound behaviour, Australasian Physical & Engineering Sciences in Medicine, 2014, pp. 1–8.