IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1075
GenSoFNN: A Generic Self-Organizing FuzzyNeural Network
W. L. Tung and C. Quek, Member, IEEE
Abstract—Existing neural fuzzy (neuro-fuzzy) networks proposed in the literature can be broadly classified into two groups. The first group is essentially fuzzy systems with self-tuning capabilities and requires an initial rule base to be specified prior to training. The second group of neural fuzzy networks, on the other hand, is able to automatically formulate the fuzzy rules from the numerical training data. No initial rule base needs to be specified prior to training. A cluster analysis is first performed on the training data and the fuzzy rules are subsequently derived through the proper connections of these computed clusters. However, most existing neural fuzzy systems (whether they belong to the first or second group) encounter one or more of the following major problems: 1) an inconsistent rule base; 2) heuristically defined node operations; 3) susceptibility to noisy training data and the stability-plasticity dilemma; and 4) the need for prior knowledge such as the number of clusters to be computed. Hence, a novel neural fuzzy system that is immune to the above-mentioned deficiencies is proposed in this paper. This new neural fuzzy system is named the generic self-organizing fuzzy neural network (GenSoFNN). The GenSoFNN network has strong noise-tolerance capability through a new clustering technique known as discrete incremental clustering (DIC). The fuzzy rule base of the GenSoFNN network is consistent and compact, as GenSoFNN has built-in mechanisms to identify and prune redundant and/or obsolete rules. Extensive simulations were conducted using the proposed GenSoFNN network and its performance is encouraging when benchmarked against other neural and neural fuzzy systems.
Index Terms—Backpropagation (BP), compact and consistent rule base, compositional rule inference (CRI), generic self-organizing fuzzy neural network (GenSoFNN), laser data, learning vector quantization (LVQ), noise tolerance, one-pass learning, rule pruning, traffic modeling and prediction, 2-spiral.
I. INTRODUCTION

NEURAL fuzzy networks are realizations of the functionality of fuzzy systems using neural networks. The main advantage of a neural fuzzy network is its ability to model a problem domain using a linguistic model instead of complex mathematical models. The linguistic model is essentially a fuzzy rule base consisting of a set of IF-THEN fuzzy rules that are highly intuitive and easily comprehended by human users. In addition, the black-box nature of the neural-network paradigm is resolved, as the connectionist structure of a neural fuzzy network essentially defines the IF-THEN fuzzy rules. Moreover, a neural fuzzy network can self-adjust the parameters of the fuzzy rules using neural-network-based learning algorithms.
Manuscript received May 9, 2001; revised October 26, 2001. The authors are with the Intelligent Systems Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore (e-mail: [email protected]).
Publisher Item Identifier S 1045-9227(02)05561-3.
Existing neural fuzzy systems proposed in the literature can be broadly classified into two groups. The first group is essentially fuzzy systems with self-tuning capabilities and requires an initial rule base to be specified prior to training. The second group of neural fuzzy networks, on the other hand, is able to automatically formulate the fuzzy rules from the numerical training data. No initial rule base needs to be specified prior to training. The main advantage that the latter group of neural fuzzy systems has over the former is that they are not subject to the Achilles' heel of traditional fuzzy systems. This is because the first group of neural fuzzy systems may have difficulty in obtaining the initial rule base. That is, it may be difficult to verbalize the knowledge of human experts or formalize it into IF-THEN fuzzy rules if the system is complex. However, most existing neural fuzzy systems (whether they belong to the first or second group) encounter one or more of the following major problems: 1) an inconsistent rule base; 2) heuristically defined node operations; 3) susceptibility to noisy training data and the stability-plasticity dilemma; and 4) the need for prior knowledge such as the number of clusters to be computed.
A consistent rule base is especially important for the knowledge interpretation of a neural fuzzy system. The fuzzy rules extracted from the neural fuzzy network will be meaningless and/or obscure if a fuzzy label can be represented by more than one fuzzy set and these fuzzy sets are allowed to evolve differently during the training phase. In addition, the operations of the neural fuzzy network need to be clearly defined and mapped to formal fuzzy inference schemes such as the compositional rule of inference (CRI), the approximate analogous reasoning schema (AARS), or the truth value restriction (TVR). If not, the inference steps of the neural fuzzy network become logically heuristic and mathematically unclear.
The choice of clustering technique in neural fuzzy networks is also an important consideration. The established pseudo outer-product based fuzzy neural network (POPFNN) family of networks has weak resistance to noisy/spurious training data. This is due to the use of partition-based clustering techniques such as fuzzy c-means (FCM), learning vector quantization (LVQ), and LVQ-inspired techniques such as modified LVQ, fuzzy Kohonen partitioning (FKP), and pseudo FKP to perform the cluster analysis. Such clustering techniques require prior knowledge such as the number of clusters present in a data set and are not sufficiently flexible to handle nonpartitionable problems such as the XOR dilemma and the 2-spiral problem. Generally, neural fuzzy networks that employ partition-based clustering techniques also lack
1045-9227/02$17.00 2002 IEEE
Fig. 1. Structure of GenSoFNN.
the flexibility to incorporate new clusters of data after the training has completed. This is known as the stability-plasticity dilemma.
Hence, a novel neural fuzzy system that is immune to the above-mentioned deficiencies is proposed in this paper. The new neural fuzzy system is named the generic self-organizing fuzzy neural network (GenSoFNN). The GenSoFNN network automatically formulates the fuzzy rules from the numerical training data, in contrast to the ANFIS and ARIC models, and maintains a consistent rule base. Each fuzzy label in the input-output dimensions is uniquely represented by only one cluster (fuzzy set). The GenSoFNN network employs a new clustering technique known as discrete incremental clustering (DIC) to enhance its noise-tolerance capability. DIC creates separate clusters for noisy/spurious data that have poor correlation to the genuine or valid data and does not require prior knowledge of the number of clusters present in the training data set. In addition, the proposed GenSoFNN network does not require the predefinition of the number of fuzzy rules, as the rule formulation process is entirely data-driven. GenSoFNN is suitable for on-line applications as its training cycle takes place in a single pass of the training data.
This paper is organized as follows. Section II describes the general structure of the GenSoFNN and its on-line training cycle. Section III presents the GenSoFNN-CRI(S) network that is developed by mapping the CRI inference scheme onto the GenSoFNN structure. In Section IV, the GenSoFNN-CRI(S) network is evaluated using three different simulations and its performance is benchmarked against other neural and neural fuzzy systems. Section V concludes this paper.
The training cycle of the GenSoFNN network (Fig. 1) consists of three phases: self-organizing, rule formulation, and parameter learning. These are performed sequentially within a single pass of the training data. The DIC clustering technique is developed and incorporated into the GenSoFNN network to automatically compute the input-output clusters from the numerical training data. The fuzzy rules are subsequently formulated by connecting the appropriate input and output clusters during the rule-mapping phase of the training cycle. Finally, the popular backpropagation (BP) learning algorithm based on negative gradient descent is employed to tune the parameters of the GenSoFNN network.
A. Structure of the GenSoFNN
The GenSoFNN network consists of five layers of nodes. Each input node has a single input; together these inputs form the input vector presented to the GenSoFNN. Each output node computes a single output; together these outputs form the output vector of the GenSoFNN network with respect to the input stimulus. In addition, a vector of desired network outputs is required during the parameter learning phase of the training cycle. The trainable weights of the GenSoFNN network are found in layers 2 and 5 (enclosed in rectangular boxes in Fig. 1). Layer 2 links contain the parameters of the input fuzzy sets while layer 5 links contain the parameters of the output fuzzy sets. The weights of the remaining connections are unity. The trainable weights (parameters) are interpreted as
TUNG AND QUEK: GenSoFNN: A GENERIC SELF-ORGANIZING FUZZY NEURAL NETWORK 1077
the corners of the trapezoidal-shaped fuzzy sets computed by the GenSoFNN network. They are denoted as the left and right support points and the left and right kernel points of each trapezoid, where the subscripts denote the presynaptic and postsynaptic nodes, respectively. For clarity in subsequent discussions, generic variables are used to refer to arbitrary nodes in layers 1, 2, 3, 4, and 5, respectively. The output of a node is denoted with subscripts that specify its origin.
Each input node may have a different number of input terms (fuzzy labels); hence, the number of layer 2 nodes is the total number of input term nodes across all input dimensions. Layer 3 consists of the rule nodes. At layer 4, an output term node may have more than one fuzzy rule attached to it. Each output node in layer 5 can have a different number of output terms; hence, the number of layer 4 nodes is the total number of output term nodes across all output dimensions. In Fig. 1, the black solid arrows denote the links that are used during the feedforward operation of the GenSoFNN network. The dashed, grey arrows denote the backward links used during the self-organizing phase of the training cycle of the GenSoFNN. The GenSoFNN network adopts Mamdani's fuzzy model and the kth fuzzy rule R_k
has the form defined in (1):

R_k: IF x_1 is A^1_j AND ... AND x_i is A^i_j AND ... THEN y_1 is B^1_l AND ... AND y_m is B^m_l AND ...   (1)

where A^i_j denotes the jth fuzzy label of the ith input that is connected to rule R_k, and B^m_l denotes the lth fuzzy label of the mth output to which rule R_k is connected.
Two motivations drive the development of the GenSoFNN network. The first is to define a systematic way of crafting the linguistic model required in neural fuzzy systems and to avoid the above-mentioned deficiencies faced by many of the existing neural fuzzy networks. The second motivation is to create a generalized network architecture whereby different fuzzy inference schemes such as CRI can be mapped onto such a network with ease. This closely relates to our definition of what a neural fuzzy network is: the integration of a fuzzy system and a neural network, whereby the operations of the hybrid system should be functionally equivalent to a similar standalone fuzzy system. Hence, the operations and outputs of the various nodes in the GenSoFNN network are defined by the fuzzy inference scheme adopted by the network. However, the generic operations of the proposed GenSoFNN can be defined as follows. Each layer has a forward-based aggregation function and an activation function. In addition, the label Net denotes the aggregated input to an arbitrary node.
Layer 3: The feedforward operation aggregates, for each rule node, the outputs of the input fuzzy labels connected to that rule. The backward operation (for the self-organizing phase) is defined analogously on the backward-based outputs of the fuzzy labels connected to the rule; the subscripts are reversed to denote the backward flow of data.
Layer 4: The forward operation aggregates, for each output term node, the activations of all the rules in the GenSoFNN (from the first to the last) that have that output label as part of their consequent. The backward operation (for the self-organizing phase) computes the backward output of the node; the order of the subscripts is reversed to reflect the backward operation.
Layer 5: The forward operation aggregates the outputs of the connected layer 4 nodes to produce the network output. The backward operation (for the self-organizing phase) feeds the desired outputs of the GenSoFNN network backward through the layer.
The detailed node operations are defined by the fuzzy inference system adopted by the GenSoFNN network. For instance, when the CRI inference scheme utilizing Mamdani's implication rule is mapped onto the GenSoFNN network, the generic forward operation of a rule node reduces to taking the minimum (T-norm) of the memberships of its antecedent labels.
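For concreteness, the node operations just described can be sketched in Python. The function names are illustrative, and the min/max pairing reflects the standard T-norm/S-norm choice for Mamdani-style CRI inference rather than the paper's exact equations:

```python
def rule_node_forward(antecedent_memberships):
    # CRI with Mamdani implication: a rule's firing strength is the
    # minimum (T-norm) of its antecedent label memberships.
    return min(antecedent_memberships)

def output_label_forward(firing_strengths):
    # A layer-4 output label aggregates all rules that share it as
    # consequent with the maximum (S-norm).
    return max(firing_strengths)
```

A rule whose antecedents fire at 0.8, 0.3, and 0.6 thus fires at 0.3; an output label attached to rules firing at 0.3 and 0.7 is activated at 0.7.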
B. Self-Organization (Clustering) of GenSoFNN
The proposed GenSoFNN network models a problem domain by first performing a cluster analysis of the numerical training data and subsequently deriving the fuzzy rule base from the computed clusters. Generally, clustering techniques may be classified into hierarchical-based and partition-based techniques. Hierarchical-based clustering techniques include single link and complete link. The main drawback of hierarchical clustering is that the clustering is static: points committed to a given cluster in the early stages cannot move to a different cluster. This violates our vision of a dynamic neural fuzzy system where the network can self-organize and self-adapt with changing environments. Prototype-based partition clustering techniques, on the other hand, are dynamic and the data points can move from one cluster to another under varying conditions. However, partition-based clustering techniques require prior knowledge such as the number of classes
in the training data. Such information may be unknown and is difficult to estimate for some data sets, such as traffic flow data. For classification tasks such as the XOR and 2-spiral problems, computing a predefined number of clusters may not be good enough to satisfactorily solve the problems. Moreover, partition-based clustering techniques suffer from the stability-plasticity dilemma, where new information cannot be learned without running the risk of eroding old (previously learned) but valid knowledge. Hence, such deficiencies serve as the main motivations behind the development of the discrete incremental clustering (DIC) technique. This new clustering technique is not limited by the need to have prior knowledge of the number of clusters and is able to robustly
Fig. 2. New cluster (fuzzy set) in DIC with respect to the ith input dimension, where max(x_i) and min(x_i) denote the maximum and minimum inputs.
handle noisy/spurious data while preserving the dynamism of partition-based clustering techniques.
1) DIC: The proposed DIC technique uses raw numerical values of a training data set with no preprocessing. In the current implementation, DIC computes trapezoidal-shaped fuzzy sets, and each fuzzy label (fuzzy set) belonging to the same input-output dimension has little or no overlapping of its kernel with its immediate neighbors. This is similar to, but does not have the same restrictions as, a pseudopartition of the data space. The DIC technique maintains a consistent representation of the fuzzy sets (fuzzy labels) by performing clustering on a local basis. That is, the number of fuzzy sets for each input-output dimension need not be the same. This is similar to the ART concept. However, unlike ART, if the fuzzy label (fuzzy set) for a particular input-output dimension already exists, then it is not recreated. Hence, DIC ensures that a fuzzy label is uniquely defined by a fuzzy set, and this serves as a basis to formulate a consistent rule base in the GenSoFNN network. The proposed DIC technique has five parameters: a plasticity parameter, a tendency parameter TD, an input threshold IT, an output threshold OT, and a fuzzy set support parameter SLOPE.
a) Fuzzy Set Support Parameter SLOPE: Each new cluster in DIC begins as a triangular fuzzy set, as shown in Fig. 2(a). The kernel of a new cluster (fuzzy set) takes the value of the data point that triggers its creation, and its support is defined by the parameter SLOPE. As training continues, the cluster grows to include more points but maintains the same amount of buffer region on both sides of the kernel [Fig. 2(b)]. The same applies for the output clusters.
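A minimal sketch of this trapezoidal representation, assuming the four-corner (support and kernel points) parameterization described in Section II-A; the function names and tuple layout are illustrative:

```python
def trapezoid_membership(x, left_support, left_kernel, right_kernel, right_support):
    """Membership of x in a trapezoidal fuzzy set.

    A new DIC cluster starts with left_kernel == right_kernel (a triangle);
    the supports sit SLOPE away on either side of the kernel.
    """
    if left_kernel <= x <= right_kernel:
        return 1.0
    if left_support < x < left_kernel:
        return (x - left_support) / (left_kernel - left_support)
    if right_kernel < x < right_support:
        return (right_support - x) / (right_support - right_kernel)
    return 0.0

def new_cluster(point, slope):
    # A newly created cluster is a triangle centred on the triggering point.
    return (point - slope, point, point, point + slope)
```

For example, `new_cluster(2.0, 0.5)` yields the triangle (1.5, 2.0, 2.0, 2.5), in which the triggering point 2.0 has full membership.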
b) Plasticity Parameter: A cluster grows by expanding its kernel. This expansion is controlled by the plasticity parameter. A cluster expands its kernel when it is the best-fit cluster (has the highest membership value) for a data point and this point has not yet appeared within its kernel. The plasticity parameter determines the amount by which a cluster (fuzzy set) expands its kernel to include the new data point. To address the stability-plasticity dilemma, the initial plasticity value for all newly formed input-output clusters is preset to 0.5. The value of this parameter decreases as the cluster expands its kernel. The first quadrant of a cosine waveform (Fig. 3) is used to model the change of plasticity in a cluster.
The plasticity in Fig. 3 is intuitively interpreted as the maximum expansion a cluster (fuzzy set) can have, and a parameter STEP controls the increment of the underlying angle from 0 to 1.57 rad. Hence, the amount of expansion a cluster can adopt decreases with the number of expansions.
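The cosine-based plasticity schedule can be sketched as follows. This is a reconstruction under stated assumptions: initial plasticity 0.5, with the angle advanced by STEP per kernel expansion and capped at pi/2:

```python
import math

def plasticity_schedule(step, n_expansions):
    """Plasticity after n kernel expansions, modeled on the first
    quadrant of a cosine wave (angle capped at pi/2, i.e. 1.57 rad)."""
    alpha = min(n_expansions * step, math.pi / 2)
    return 0.5 * math.cos(alpha)
```

With this schedule, plasticity starts at 0.5 for a fresh cluster and decays monotonically toward zero as the cluster keeps expanding, which matches the intended stability-plasticity behavior.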
c) Tendency Parameter TD: The tendency parameter TD is analogous to a cluster's willingness to grow when it is the
Fig. 3. Modeling of the plasticity parameter.
Fig. 4. Dynamics of the tendency parameter TD.
best-fit cluster for a data point that falls outside its kernel. The parameter TD complements the use of the plasticity parameter, which only decreases with the number of times a cluster expands its kernel. The parameter TD maintains the relevance of a cluster and prevents it from incorporating too many data points that have low fitness or membership values with respect to the cluster. Otherwise, the kernel of a cluster may become overly large and the semantic meaning of the fuzzy label the cluster represents may become obscure and poorly defined. The initial value of TD for a newly created cluster is preset at 0.5, and the cluster stops expanding its kernel when TD reaches zero. The rate of decrease depends on the fitness or membership values of the data points that the cluster incorporates, as shown in (12). Each time a cluster expands to absorb a point, TD is decremented in proportion to one minus the point's membership value, scaled by a constant; here the constant is defined as -0.5. The constant has to be less than zero, since the term (one minus the membership value) lies in the range [0, 1) and TD could otherwise never reach or fall below zero. (The case of zero membership is not valid, as such a point is then not relevant to the expanding cluster.) When TD is less than or equal to zero, the cluster stops growing and sets its plasticity parameter to zero. The dynamics of the parameter TD are illustrated by Fig. 4.
Hence, the less relevant the data points (with small membership values) that a cluster tries to incorporate or absorb, the faster its TD decreases, and vice versa. Thus, the tendency parameter TD and the plasticity parameter work together to maintain the integrity of the input clusters and the fuzzy labels they represent. The same applies for the output clusters.
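A sketch of the TD update just described; the constant of -0.5 and the exact additive form are assumptions reconstructed from the prose around (12):

```python
GAMMA = -0.5  # must be negative so that TD can actually reach zero

def update_tendency(td, membership):
    """Decrease a cluster's tendency TD after it absorbs a point.

    The lower the point's membership (relevance) to the cluster,
    the larger the drop in TD."""
    assert 0.0 < membership <= 1.0, "irrelevant points never trigger expansion"
    return td + GAMMA * (1.0 - membership)
```

A perfectly fitting point (membership 1.0) leaves TD unchanged, while a marginal point with membership 0.5 drops TD from 0.5 to 0.25, so a run of poorly fitting points quickly exhausts the cluster's willingness to grow.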
d) Thresholds (IT and OT): The input (output) threshold, IT (OT), specifies the minimum fitness or membership value an input (output) data point must have before it is considered relevant to any existing input (output) clusters or fuzzy sets. If
Fig. 5. Effects of IT on clusters for the ith input. (The same applies for OT and the output clusters.)
the membership value of the input (output) data point with respect to the existing best-fit input (output) cluster falls below the predefined IT (OT), then a new cluster is created based on that data point. In addition, IT (OT) determines the degree of overlapping of an input (output) cluster with its immediate neighbors (see Fig. 5).
Hence, the larger the preset value of IT (OT), the closer together are the computed input (output) clusters. In order to prevent excessive overlapping of the input (output) clusters (whereby the fuzzy labels become obscure or poorly defined), IT (OT) is predefined at 0.5. The following algorithm performs clustering of the input space. (The same applies to clustering of the output space.) More details on the DIC technique are reported in the literature.

Algorithm DIC
Assume a data set of P training vectors, where each vector represents one input training vector to the GenSoFNN network.
Initialize STEP and SLOPE.
For each training vector do
    For each input dimension i do
        If there are no fuzzy labels (clusters) in the ith input dimension
            Create a new cluster using the ith input value
        Else
            Find the best-fit cluster Winner for the ith input value using (13), i.e., the existing label with the highest membership value
            If membership of Winner > IT /* membership value greater than input threshold */
                Update kernel of Winner /* grow cluster Winner */
            Else
                Create a new cluster using the ith input value
            End If-Else
        End If-Else
    End For
End For
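The control flow of the algorithm above can be sketched for a single input dimension. The kernel-expansion step is deliberately simplified (constant plasticity, no TD bookkeeping), so this illustrates the create-or-grow decision rather than serving as a faithful DIC implementation:

```python
IT = 0.5      # input threshold
SLOPE = 0.1   # fuzzy set support parameter

def membership(cluster, x):
    # Trapezoidal membership: cluster = [left_support, left_kernel,
    # right_kernel, right_support].
    ls, lk, rk, rs = cluster
    if lk <= x <= rk:
        return 1.0
    if ls < x < lk:
        return (x - ls) / (lk - ls)
    if rk < x < rs:
        return (rs - x) / (rs - rk)
    return 0.0

def dic_one_dimension(points, plasticity=0.5):
    """Sketch of the DIC loop for one input dimension."""
    clusters = []
    for x in points:
        if not clusters:
            clusters.append([x - SLOPE, x, x, x + SLOPE])
            continue
        winner = max(clusters, key=lambda c: membership(c, x))
        if membership(winner, x) > IT:
            # Grow the winner's kernel toward x (simplified expansion),
            # keeping the SLOPE buffer on the expanded side.
            if x < winner[1]:
                winner[1] -= plasticity * (winner[1] - x)
                winner[0] = winner[1] - SLOPE
            elif x > winner[2]:
                winner[2] += plasticity * (x - winner[2])
                winner[3] = winner[2] + SLOPE
        else:
            clusters.append([x - SLOPE, x, x, x + SLOPE])
    return clusters
```

Running this on the points 0.0, 0.02, and 1.0 grows one cluster around the first two (nearby) points and spawns a separate cluster for the outlying third point, which is exactly the noise-isolation behavior described above.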
The parameters used in the DIC technique are constants except for two: the STEP and SLOPE parameters. In the current implementation, the selection of the parameters STEP and SLOPE is heuristic and varies with different tasks. However,
there are several guidelines to help in selecting suitable values for these two parameters. A small STEP value results in fat fuzzy sets with large kernels, and vice versa. On the other hand, a small SLOPE value results in steep slopes (nearly crisp fuzzy sets), and the fuzziness of the fuzzy sets (input and output clusters) increases as the value of SLOPE increases.
C. Rule Formulation of GenSoFNN
The fuzzy rules of the GenSoFNN network are formulated using a rule mapping process RuleMAP. Under the GenSoFNN framework, the input space partition (ISP) of a rule is the collective term for all the input fuzzy labels (layer 2 nodes) that contribute to the antecedent of that rule node (refer to Fig. 1). Similarly, the output space partition (OSP) of a rule refers to all the output fuzzy labels (layer 4 nodes) that form the consequent of that rule node. During the rule mapping process, each rule activates its ISP and OSP. For the ISPs, this means a firing of layers 1 and 2 of the GenSoFNN with the input stimulus feeding into layer 1. To activate the OSPs, layers 4 and 5 are fired with the desired outputs feeding backward from the output nodes of layer 5. The backward links depicted by the dashed, gray arrows in Fig. 1 are used for the activation of the OSPs. For each rule, the aggregated input due to the activation of its ISP and the aggregated input due to the activation of its OSP are computed as in the layer 3 and layer 4 operations, respectively. Two user-defined parameters govern the updating of the fuzzy rules in GenSoFNN. When a fuzzy rule is updated in GenSoFNN, the labels (fuzzy sets) in its ISP and OSP grow to incorporate the input vector and the desired output vector, respectively. An existing rule must satisfy (14) to qualify for update: the aggregated activations of both its ISP and its OSP must exceed the respective user-defined thresholds.
The flowchart of the rule mapping process RuleMAP with theembedded self-organizing and parameter learning phases is pre-sented as Fig. 6.
The function EstLink identifies the proper connections between the input fuzzy labels (layer 2 nodes), the fuzzy rules (layer 3 nodes), and the output fuzzy labels (layer 4 nodes). Overlapping input-output labels are annexed and their respective rules are combined if necessary to maintain a consistent rule base. A new rule and a new input space partition are created in tandem with the creation of a new output space partition. This is to prevent the crafting of ambiguous rules where an ISP maps to two or more OSPs (that is, the same condition maps to different consequents). The same reason prompts the creation of a new rule when an existing ISP and OSP are connected to different rules. Details of the rule mapping process RuleMAP are described in the literature. The process RuleMAP is responsible for the structural learning of the GenSoFNN network. The crafted rule base is consistent but not compact, as there may be numerous redundant and/or obsolete rules. Redundant and obsolete rules are the results of the dynamic training of the GenSoFNN, where the fuzzy sets of the fuzzy rules are constantly tuned by the backpropagation algorithm.

Fig. 6. Flowchart of RuleMAP.

To maintain the integrity and accuracy as well as the compactness of the rule base, these redundant rules have to be deleted. The deletion of obsolete and redundant rules is performed at the end of each training epoch.
1) Deletion of Redundant and/or Obsolete Rules: Each rule node is time-stamped with a training epoch number during its creation. The training epoch number is initialized at zero prior to the start of the training cycle and increases with each iteration of the training data set. Whenever a rule is updated, its time-stamp reflects the current training epoch number. Rules with a time-stamp that is more than a training epoch old are considered obsolete/redundant rules and are deleted at the end of the current training epoch.
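The time-stamp pruning rule can be sketched as follows; the dictionary-based rule representation is illustrative:

```python
def prune_rules(rules, current_epoch, max_age=1):
    """Drop rules whose time-stamp is more than max_age training
    epochs older than the current epoch."""
    return [rule for rule in rules if current_epoch - rule["stamp"] <= max_age]
```

For example, at the end of epoch 3 a rule last updated in epoch 3 survives, while one last updated in epoch 1 is two epochs old and is deleted.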
D. Parameter Learning of GenSoFNN
The backpropagation learning equations for the parameter learning phase depend on the fuzzy inference scheme adopted by the GenSoFNN network. In Section III, the GenSoFNN-CRI(S) network is presented. GenSoFNN-CRI(S) is created by mapping the CRI inference scheme together with the Mamdani implication rule onto the generic structure of the GenSoFNN network. Singleton fuzzifiers are implemented in layer 1 of the GenSoFNN-CRI(S) network in order to logically map the operations of the network to the CRI inference scheme. Hence, the (S) in the network's name refers to the singleton fuzzifiers.
Section II described the basic structure and training cycle of the GenSoFNN network. However, the operations and the outputs of the various nodes in the GenSoFNN network have yet to be defined. This is resolved by mapping an inference scheme onto the basic GenSoFNN architecture. Subsequently, the equations describing the learning operations (of the backpropagation algorithm) for the parameter learning phase in the training cycle of the GenSoFNN network can be derived. These equations are used to tune the fuzzy sets of the term nodes in layers 2 and 4. The GenSoFNN-CRI(S) network results from mapping the CRI reasoning scheme (with Mamdani's implication rule) onto the basic GenSoFNN architecture. The GenSoFNN-CRI(S) network has the same structure as the basic GenSoFNN network (Fig. 1). The CRI inference scheme provides a strong fuzzy logic theoretical foundation for the operations of the GenSoFNN-CRI(S) network. This ensures that the operations of the GenSoFNN-CRI(S) network imitate the human cognitive process. Details of how the mappings are performed are reported in the literature.
IV. SIMULATION RESULTS AND ANALYSIS
The GenSoFNN-CRI(S) network is evaluated using three different simulations: 1) 2-spiral classification; 2) traffic prediction; and 3) time series prediction of a set of laser data. The background of the data sets and the objectives of the simulations are given in the respective sections. For all the simulations, the parameters specified in Table I are used.
A. 2-Spiral Classification
The 2-spiral classification problem is a complex neural-network benchmark task developed by Lang. The task involves learning to correctly classify the points of two intertwined spirals (denoted here as the Class 0 and Class 1 spirals, respectively). The two spirals each make three complete turns in a two-dimensional (2-D) plane, with 32 points per turn plus an endpoint, totaling 97 points per spiral (Fig. 7).
Lang et al. reported that this problem cannot be solved using a conventional feedforward neural network based on the BP learning algorithm. Instead, they proposed a special network with a 2-5-5-5-1 structure that has 138 trainable weights. The fuzzy ARTMAP system has also been trained using the standard 2-spiral data set consisting of 194 points; its evaluation is performed using the training set as well as a test set that consists of two dense spirals, each with 385 points. For the evaluation of the proposed GenSoFNN-CRI(S) network, the training set is the standard 2-spiral data set consisting of 194 points. The test set consists of two dense spirals with 385 points each and is generated using (15) to (18) for the Class 0 and Class 1 spirals.
TABLE IPREDEFINED GenSoFNN-CRI(S) NETWORK PARAMETERS
Fig. 7. 2-spiral problem.
Fig. 8. 2-spiral results of GenSoFNN-CRI(S) versus SLOPE with STEP = 0.01.
There are two inputs and a single output. During a training epoch, the outermost Class 0 point is presented first, followed by the outermost Class 1 point, and the sequence continues, alternating between the two spirals and moving toward the center of each spiral. Fig. 8 shows the effect of the parameter SLOPE on the classification rate of GenSoFNN-CRI(S) for the 2-spiral task. Both the training and test sets are used in the evaluation.
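The spiral points can be generated with the commonly used Lang-Witbrock parameterization sketched below (three turns, 32 points per turn plus an endpoint); this standard form is an assumption and may differ in detail from the paper's equations (15)-(18):

```python
import math

def two_spirals(points_per_spiral=97):
    """Lang-Witbrock 2-spiral benchmark: each spiral makes three
    turns, with the radius shrinking toward the centre."""
    data = []
    for i in range(points_per_spiral):
        phi = i * math.pi / 16.0          # 32 points per full turn
        r = 6.5 * (104 - i) / 104.0       # radius shrinks toward the centre
        x, y = r * math.sin(phi), r * math.cos(phi)
        data.append(((x, y), 1))          # Class 1 spiral
        data.append(((-x, -y), 0))        # Class 0 spiral (point-mirrored)
    return data
```

With the default of 97 points per spiral this yields the standard 194-point training set; a denser test set can be produced by shrinking the angular step and increasing the point count accordingly.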
It is seen that the classification rate on the training set is not affected by the change in the parameter SLOPE, which varies from 0.05 to 0.075, and is maintained at 100%. That is, all 194 points are correctly classified. However, the classification rate on the test set decreases rapidly from 100% at SLOPE = 0.05 to 84.3% at SLOPE = 0.075. This is probably due to the increased fuzziness of the clusters (fuzzy sets) that results from a larger SLOPE. As the fuzziness of the clusters increases, more uncertainty and ambiguity arise between the fuzzy sets (due to gentler slopes). Hence, the test set, which contains a higher density of points packed into two spirals, gives a poorer classification rate with increasing SLOPE, as points appearing between the fuzzy (uncertain) regions of clusters are subject to a high probability of wrong classification. Table II shows the best classification results for the 2-spiral task using GenSoFNN-CRI(S), fuzzy ARTMAP, and Lang's proposed neural structure.
TABLE IIBEST CLASSIFICATION RESULTS IN 2-SPIRAL SIMULATIONS
Lang et al. considered the task as completed when each of the 194 points in the two spirals used for training produces an output within 0.4 of its target output value. On the other hand, Carpenter et al. used the most stringent criteria to train the fuzzy ARTMAP system using the standard 2-spiral data set in order to obtain 100% classification for the dense spirals. As a result, the fuzzy ARTMAP system creates 194 ART categories for the standard 2-spiral data set that contains 194 points. In comparison, GenSoFNN-CRI(S) achieves 100% classification for both the standard 2-spiral data set and the dense spirals with only 23 fuzzy sets in each of the two input dimensions. This occurs when SLOPE is 0.05, and a total of 156 rules are created (as compared to fuzzy ARTMAP's 194 categories). Moreover, the output responses of the GenSoFNN-CRI(S) network to the 194 points in the standard 2-spiral data set are all within 0.01 of the desired values, as compared to the value of 0.4 specified by Lang.
B. Traffic Prediction
This simulation is conducted to evaluate the effectiveness of the GenSoFNN-CRI(S) network in universal approximation and data modeling using a set of traffic flow data. The raw traffic flow data for the simulation were collected at a site (Site 29) located at exit 15 along the east-bound Pan Island Expressway (PIE) in Singapore (see Appendix A) using loop detectors embedded beneath the road surface. There are a total of five lanes at the site: two exit lanes and three straight lanes for the main traffic. For this experiment, only the traffic flow data for the three straight lanes were considered. The traffic data set has four input attributes: time and the traffic densities of the three lanes. The purpose of this simulation is to model the traffic flow trend at the site using the GenSoFNN-CRI(S) network. The trained GenSoFNN-CRI(S) network is then used to obtain predictions of the traffic density of a particular lane at a time interval ahead, where the interval ranges from 5 to 60 min. Fig. 9 shows a plot of the traffic flow density data for the three straight lanes spanning a period of six days, from 5th to 10th September 1996.
For the simulation, three cross-validation groups of training and test sets are used: CV1, CV2, and CV3. The training windows are labeled as such in Fig. 9. The square of the Pearson product-moment correlation value (denoted as R²) is used to compute the accuracy of the predicted traffic trends obtained using the GenSoFNN-CRI(S) network. The predictions made by the GenSoFNN-CRI(S) network for τ from 5 to 60 min for the lane 1 traffic density (using CV1 as the training set) are shown in Fig. 10. The squared errors of each prediction are also included for analysis.
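The accuracy index used above, the squared Pearson product-moment correlation, can be computed directly from the actual and predicted density series. A minimal sketch (function name and interface are illustrative, not from the paper):

```python
def r_squared(actual, predicted):
    """Squared Pearson product-moment correlation between two series.

    Serves as the accuracy index for a predicted traffic-density trend:
    1.0 means a perfect linear relationship, 0.0 means none.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)
```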
When the prediction is made at a 5-min interval, the predicted trend follows the actual traffic density trend very closely,
Fig. 9. Traffic density of three straight lanes along PIE.
hence a high accuracy index of 0.740678 [Fig. 10(a)]. The GenSoFNN-CRI(S) is able to predict the peaks and the troughs of the lane 1 traffic density accurately. However, as the time interval τ increases, more errors are observed in the subsequent predicted trends, as illustrated by Fig. 10(b)–(e). The increased errors are particularly obvious at the peaks and troughs of the predictions for the larger time intervals. The increased errors for longer time intervals are expected, as more uncertainty sets in for larger values of τ. The squared errors of all the predictions also show that more errors are expected at the peaks than at the troughs. This is probably due to the sharp transitions and oscillations in the traffic density that characterize the peaks. The mean squared errors (MSEs) for the different predictions in Fig. 10(a)–(e) as τ increases from 5 to 60 min are shown in Fig. 10(f). Hence, Fig. 10(f) shows that the accuracy of the predictions decreases as the time interval τ increases.
The same set of experiments is repeated using a multilayer perceptron (MLP) network with four input nodes, ten hidden nodes, and one output node. The structure of the MLP was decided after several experiments. The bipolar sigmoidal function with an output range of [−1, 1] is used as the activation function for the hidden and output nodes. The traffic density data set has to be normalized to a range of [0, 1] to fit the data points into the output range of the bipolar sigmoidal function. The input nodes simply relay the input signals to the hidden nodes. The MLP network is trained with the backpropagation algorithm. During the prediction phase, the network functions in a feedforward mode. The average accuracy of the predictions (denoted as Avg R²) by the GenSoFNN-CRI(S) and the MLP across the three cross-validation groups (CV1, CV2, and CV3), as τ increases from 5 to 60 min for all three lanes, is plotted in Fig. 11.
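The MLP's activation and the pre-processing step described above can be sketched as follows. The bipolar sigmoid with range (−1, 1) is algebraically tanh(x/2); the paper does not state a slope parameter, so a unit slope is assumed here:

```python
import math

def bipolar_sigmoid(x):
    """Bipolar sigmoid with output range (-1, 1); equals tanh(x/2)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def normalize(values, lo, hi):
    """Rescale raw traffic densities linearly to [0, 1], as done before
    training the MLP (lo/hi are the data's min and max)."""
    return [(v - lo) / (hi - lo) for v in values]
```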
Compared against the results of the GenSoFNN-CRI(S), the MLP achieves better predictions initially (at the shortest time intervals). However, for predictions at larger values of τ, the MLP network experiences a drastic drop in accuracy. This drop in prediction accuracy is obvious for all three lanes. In addition, the trained MLP is a black box, and the linguistic rules defining the traffic flow pattern cannot be extracted from it. Hence, the GenSoFNN-CRI(S) network may not be as accurate as the MLP in the initial predictions, but it offers a better overall performance as τ increases from 5 to 60 min. In addition, an intuitive set of fuzzy rules can be extracted from its trained structure to describe the dynamics of the traffic conditions. This contrast in prediction accuracy for different time intervals also
TUNG AND QUEK: GenSoFNN: A GENERIC SELF-ORGANIZING FUZZY NEURAL NETWORK 1083
Fig. 10. Prediction of lane 1 density using GenSoFNN-CRI(S) (Training set is CV1).
Fig. 11. Avg. R² versus time interval τ for GenSoFNN-CRI(S) and MLP (4-10-1).
reveals that the proposed DIC technique has a better noise-handling capability than the MLP network. The results of the GenSoFNN-CRI(S) network are subsequently benchmarked against those of other neural and neural fuzzy systems in Table III.
Two indicators are used for benchmarking the various systems. The first indicator is Var (the change in the R² value from τ = 5 min to τ = 60 min, expressed as a percentage of the former) and the second indicator is Avg Var, the mean of the Var values across all three lanes. These two indicators reflect the consistency of the predictions made by the benchmarked systems as τ changes from 5 to 60 min across the three lanes. Table III shows that the performance of the GenSoFNN-CRI(S) network compares favorably against the Falcon class of networks and the GAMFFRC system, a genetic algorithm (GA)-based system that is able to automatically construct fuzzy membership functions and fuzzy
TABLE III: BENCHMARKING OF SIMULATION RESULTS OF TRAFFIC PREDICTION
rules from numerical data. However, an analysis of the detailed results reveals that the accuracy of their predictions is poor as compared to that of the GenSoFNN-CRI(S) network. Moreover, the results of the GAMFFRC system cannot be easily reproduced, as GA is a search paradigm that derives its strength from randomness. In addition, the numbers of fuzzy rules and fuzzy labels have to be predefined in the GAMFFRC system, and the fuzzy rules derived by the GAMFFRC system may not be consistent, as there is no control over the evolution of the fuzzy labels. Table III shows that GenSoFNN-CRI(S) has superior performance to all the networks except Falcon-MART. However, Falcon-MART uses more than 200 rules for the simulation, as compared to the 120–130 fuzzy rules derived by the GenSoFNN-CRI(S) network.
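The two benchmarking indicators described above reduce to a few lines of arithmetic. The formula for Var is inferred from its verbal description in the text ("change expressed as a percentage of the former"); the function names are illustrative:

```python
def var_indicator(r2_at_5min, r2_at_60min):
    """Var: drop in R^2 from tau = 5 min to tau = 60 min, expressed
    as a percentage of the value at 5 min."""
    return (r2_at_5min - r2_at_60min) / r2_at_5min * 100.0

def avg_var(var_per_lane):
    """Avg Var: mean of the per-lane Var values (three lanes here)."""
    return sum(var_per_lane) / len(var_per_lane)
```

A smaller Var means the system's prediction accuracy degrades less as the prediction horizon grows, which is how consistency is compared across systems in Table III.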
C. Time Series Prediction Using Laser Data
The third simulation uses a set of laser data from the Santa Fe Institute (hereby denoted as SFI) time series prediction and analysis competition. The laser data are publicly available at SFI. The original laser data set from the SFI competition consists of 1000 observations of the fluctuations in
1084 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002
Fig. 12. Laser data used in the SFI competition.
a far-infrared (FIR) laser. The task is to use these 1000 points to predict the continuation of the series for the next 100 points (i.e., the observations at the 1001st to 1100th time instants). The plot of the first 1000 observations in the laser data set is shown in Fig. 12.
The laser data time series has relatively simple oscillations (though gradually increasing in amplitude) but contains global events that are hard to predict (the rapid decay of the oscillations). In order to train the GenSoFNN-CRI(S) network for the prediction task, a training set is generated from the 1000 points in the laser data set. This training set consists of five inputs and one output. The data points in the training set are governed by

x(t + 1) = f(x(t), x(t − 1), x(t − 2), x(t − 3), x(t − 4))

where x(t + 1) is the next output to be predicted, {x(t), x(t − 1), x(t − 2), x(t − 3), x(t − 4)} is the set of the five past observations, and f is the nonlinear function relating the next observation to the five past observations, which the GenSoFNN-CRI(S) tries to model. Hence, x(t + 1) acts as the output in the training set while the five past observations form the inputs.

The predicted continuation of the time series by GenSoFNN-CRI(S) for the laser data for the next 100 time instants is given in Fig. 13. The true continuation of the laser series is shown as the darker of the two plots.
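The sliding-window construction of the training pairs described above, with five past observations as inputs and the next observation as the target, can be sketched as follows (variable names are mine):

```python
def make_training_pairs(series, window=5):
    """Build (inputs, target) pairs from a time series using a sliding
    window: target = series[t], inputs = the `window` preceding points.
    Mirrors the laser-data training set: x(t+1) = f(x(t), ..., x(t-4))."""
    pairs = []
    for t in range(window, len(series)):
        inputs = tuple(series[t - window:t])  # the five past observations
        target = series[t]                    # the next observation
        pairs.append((inputs, target))
    return pairs
```

For the 1000-point laser series this yields 995 training pairs, since the first five points have no complete history.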
The predicted continuation of the laser series by the GenSoFNN-CRI(S) network is accurate except at the point of rapid decay of the oscillations. However, the general trend of the decay (as seen from the similar transitions) is still captured by GenSoFNN-CRI(S), and the continuation of the prediction thereafter is fairly accurate, as shown in Fig. 13. A measure of prediction accuracy is given by the normalized mean-squared error (NMSE)

NMSE = (1 / (σ²N)) Σ_{k=1}^{N} (x_k − x̂_k)²

where x_k is the true value of the kth point of the series of length N, x̂_k is the predicted value, and σ² is the variance of the true time series during the prediction interval. A value of NMSE = 1 corresponds to simply predicting the average of the time series. The NMSE value is used as an indicator to benchmark against the results reported for the SFI competition.
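The NMSE measure is a direct transcription of its definition: the sum of squared prediction errors, normalized by the length and the variance of the true series over the prediction interval. A minimal sketch:

```python
def nmse(true_vals, predicted):
    """Normalized mean-squared error over the prediction interval:
    NMSE = sum_k (x_k - xhat_k)^2 / (N * sigma^2), where sigma^2 is the
    variance of the true series over that interval. NMSE = 1 means the
    predictor does no better than the mean of the true series."""
    n = len(true_vals)
    mean_t = sum(true_vals) / n
    sigma2 = sum((x - mean_t) ** 2 for x in true_vals) / n
    sse = sum((x - xh) ** 2 for x, xh in zip(true_vals, predicted))
    return sse / (n * sigma2)
```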
As can be seen from Table IV, the prediction by GenSoFNN-CRI(S) is more accurate than those of all the feedforward networks of various structures, but inferior to the systems
Fig. 13. Continuation of the laser series of length 100 by GenSoFNN-CRI(S).
TABLE IV: BENCHMARKING OF SIMULATION RESULTS OF LASER SERIES
proposed by Sauer (DCE) and Wan (IDL). A major factor contributing to the large NMSE value of 0.244, as compared against DCE's 0.080 and IDL's 0.028, is the failure of the GenSoFNN-CRI(S) to closely approximate the characteristic of the laser series at the point of rapid decay in the oscillations. This is probably because the DCE and IDL structures are better equipped to capture distinct temporal information, which is very significant in prediction tasks with sudden, varying global events, as in the case of the rapid decay in oscillations for the laser data time series.
Conclusions

In this paper, a novel neural fuzzy architecture named GenSoFNN is proposed. The CRI fuzzy inference scheme using Mamdani's implication is subsequently incorporated into the general structure of the proposed GenSoFNN network to create the GenSoFNN-CRI(S) network. In this way, the operations and outputs of the various nodes in the GenSoFNN-CRI(S) network are clearly defined and have a strong fuzzy logic foundation. This ensures that the functionality of the GenSoFNN-CRI(S) network is similar to that of the human cognitive process.
The key strength of the GenSoFNN-CRI(S) network is that an intuitive and consistent fuzzy rule base describing the dynamics (behavior) of the problem domain can be extracted from its trained structure. The GenSoFNN-CRI(S) network maintains a consistent fuzzy rule base by ensuring that each fuzzy label in the rule base is represented by only one cluster (fuzzy set). The fuzzy rule base formulated by the GenSoFNN-CRI(S) network is consistent as well as compact, because redundant or obsolete fuzzy rules are pruned at the end of each training epoch. This maintains the integrity of the derived fuzzy rule base and ensures that the dynamics (behavior) of the problem domain are properly modeled. The GenSoFNN-CRI(S) network employs a new clustering technique called DIC to compute the trapezoidal-shaped fuzzy sets during its self-organization phase. DIC is superior to partition-based clustering techniques such as LVQ and FCM: it does not require prior knowledge of
Fig. 14. (a) Location of Site 29 along PIE (Singapore). (b) Actual site at exit 15.
the number of classes in the data set, and it has good noise-tolerance capability.
In addition, the GenSoFNN-CRI(S) network does not require an initial rule base to be specified prior to training, which is a prerequisite for systems like ANFIS and ARIC. This removes the tedious task of having to translate subconscious knowledge of a problem domain into IF-THEN fuzzy rules. The GenSoFNN-CRI(S) network has great flexibility to include new training data and new fuzzy rules even after training terminates. This is a great contrast to the POPFNN class of networks, where the rule base is fixed after training. In the latter, incorporating new fuzzy rules or new clusters of data usually means retraining the entire system.
The performance of the GenSoFNN-CRI(S) network is evaluated using three simulations: 1) 2-spiral classification; 2) traffic prediction; and 3) time series prediction using a set of laser data. The results of the GenSoFNN-CRI(S) network have been encouraging when benchmarked against other neural fuzzy systems and traditional systems such as the MLP network.
Appendix A

The site location (Site 29) at which the traffic flow data for the second experiment were collected is shown in Fig. 14. The arrows show the direction of traffic flow.
References

K. K. Ang, H. C. Quek, and M. Pasquier, "POPFNN-CRI(S): Pseudo outer product-based fuzzy neural network using the compositional rule of inference and singleton fuzzifier," IEEE Trans. Syst., Man, Cybern. B, 2002, to be published.

H. Benkirane, M. A. Kbir, K. Maalmi, and R. Benslimane, "Hierarchical fuzzy partition for pattern classification with fuzzy if-then rules," Pattern Recognition Lett., vol. 21, pp. 503–509, 2000.

H. R. Berenji and P. Khedkar, "Learning and tuning fuzzy logic controllers through reinforcements," IEEE Trans. Neural Networks, vol. 3, pp. 724–740, Sept. 1992.

J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.

G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps," IEEE Trans. Neural Networks, vol. 3, pp. 698–713, Sept. 1992.

W. J. Chen, "GA-Optimized Fuzzy Systems," M.Eng. thesis, School of Comput. Eng., Nanyang Technol. Univ., Singapore, 2000.

H. Frigui and R. Krishnapuram, "A robust competitive clustering algorithm with applications in computer vision," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, pp. 450–465, May 1999.

R. N. Goldman and J. S. Weinberg, Statistics: An Introduction. Englewood Cliffs, NJ: Prentice-Hall, 1985.

J. C. Gower and G. J. S. Ross, "Minimum spanning trees and single linkage cluster analysis," Appl. Statist., vol. 18, pp. 54–64, 1969.

S. Grossberg, "Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions," Biol. Cybern., vol. 23, pp. 187–202, 1976.

U. Hubner, N. B. Abraham, and C. O. Weiss, "Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser," Phys. Rev. A, vol. 40, p. 6354, 1989.

H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Selecting fuzzy if-then rules for classification problems using genetic algorithms," IEEE Trans. Fuzzy Syst., vol. 3, pp. 260–270, Aug. 1995.

J. S. Jang, "ANFIS: Adaptive-network-based fuzzy inference systems," IEEE Trans. Syst., Man, Cybern., vol. 23, pp. 665–685, Sept. 1993.

S. C. Johnson, "Hierarchical clustering schemes," Psychometrika, vol. 32, no. 3, pp. 241–254, 1967.

T. K. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., vol. 43, pp. 59–69, 1982.

K. J. Lang and M. J. Witbrock, "Learning to tell two spirals apart," in Proc. 1988 Connectionist Models Summer School, 1988, pp. 52–59.

C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Upper Saddle River, NJ: Prentice-Hall, 1996.

C. J. Lin and C. T. Lin, "An ART-based fuzzy adaptive learning control network," IEEE Trans. Fuzzy Syst., vol. 5, pp. 477–496, Aug. 1997.

R. L. Mantaras, Approximate Reasoning Models. Chichester, West Essex, U.K.: Ellis Horwood, 1990.

E. H. Mamdani, "Application of fuzzy logic to approximate reasoning using linguistic systems," IEEE Trans. Comput., vol. C-26, pp. 1182–1191, Nov. 1977.

D. Nauck, F. Klawonn, and R. Kruse, Foundations of Neuro-Fuzzy Systems. New York: Wiley, 1997.

H. C. Quek and R. W. Zhou, "POPFNN: A pseudo outer-product based fuzzy neural network," Neural Networks, vol. 9, no. 9, pp. 1569–1581, 1996.

H. C. Quek and R. W. Zhou, "POPFNN-AARS(S): A pseudo outer-product based fuzzy neural network," IEEE Trans. Syst., Man, Cybern., vol. 29, pp. 859–870, Nov. 1999.

H. C. Quek and W. L. Tung, "A novel approach to the derivation of fuzzy membership functions using the Falcon-MART architecture," Pattern Recognition Lett., vol. 22, no. 9, pp. 941–958, 2000.

H. C. Quek and W. L. Tung, "PACL-FNNS: A novel class of Falcon-like fuzzy neural networks based on positive and negative exemplars," in Intelligent System Techniques and Applications. Boca Raton, FL: CRC, 2000.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, vol. 1, ch. 8.

Repository for the laser data. [Online]. Available: ftp.santafe.edu

S. C. Shin and S. B. Park, "GA-based predictive control for nonlinear processes," Electron. Lett., vol. 34, no. 20, pp. 1980–1981, 1998.

G. K. Tan, "Feasibility of Predicting Congestion States with Neural Networks," Final Year Project thesis, School of Civil and Structural Eng., Nanyang Technol. Univ., Singapore, 1997.

W. L. Tung, "A Generalized Platform for Fuzzy Neural Network," School of Comput. Eng., Nanyang Technol. Univ., Singapore, Tech. Rep. ISL-TR-01/01, 2001.

W. L. Tung and C. Quek, "Derivation of GenSoFNN-CRI(S) from CRI-based fuzzy system," School of Comput. Eng., Nanyang Technol. Univ., Singapore, Tech. Rep. ISL-TR-04/01, 2001.

I. B. Turksen and Z. Zhong, "An approximate analogical reasoning scheme based on similarity measures and interval valued fuzzy sets," Fuzzy Sets Syst., vol. 34, pp. 323–346, 1990.

L. X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 1414–1427, Nov. 1992.

A. S. Weigend and N. A. Gershenfeld, "Time series prediction: Forecasting the future and understanding the past," presented at the Nat. Advanced Res. Workshop Comparative Time Series Anal., A. S. Weigend and N. A. Gershenfeld, Eds., Santa Fe, NM, 1992.

L. A. Zadeh, "Calculus of fuzzy restrictions," in Fuzzy Sets and Their Applications to Cognitive and Decision Processes. New York: Academic, 1975, pp. 1–39.
W. L. Tung received the B.A.Sc. degree in computer engineering from Nanyang Technological University, Singapore, in 2000. He is currently pursuing the Ph.D. degree at the same university.
His research interests include neural networks,fuzzy systems, fuzzy rule-based systems, and geneticalgorithms.
C. Quek (M'83) received the B.Sc. degree in electrical and electronics engineering and the Ph.D. degree in intelligent control from Heriot-Watt University, Edinburgh, U.K.

He is an Associate Professor and a Member of the Intelligent Systems Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include intelligent control, intelligent architectures, artificial intelligence in education, neural networks, fuzzy systems, fuzzy rule-based systems, and genetic algorithms.