
Computer Science Review 3 (2009) 19–40


Survey

Multi-criterion Pareto based particle swarm optimized polynomial neural network for classification: A review and state-of-the-art

S. Dehuri a,*, S.-B. Cho b

a Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore-756019, Orissa, India
b Soft Computing Laboratory, Department of Computer Science, Yonsei University, 262 Seongsanno, Seodaemun-gu, Seoul 120-749, Republic of Korea

ARTICLE INFO

Article history:
Received 5 July 2008
Received in revised form 14 November 2008
Accepted 14 November 2008

ABSTRACT

In this paper, we propose a multi-objective Pareto based particle swarm optimization (MOPPSO) method to minimize the architectural complexity and maximize the classification accuracy of a polynomial neural network (PNN). To support this, we provide an extensive review of the literature on multi-objective particle swarm optimization and PNN. Classification using PNN can be considered a multi-objective problem rather than a single-objective one: measures such as classification accuracy and architectural complexity used for evaluating PNN-based classification can be thought of as two conflicting criteria. Using these two metrics as the criteria of the classification problem, the proposed MOPPSO technique attempts to find a set of non-dominated solutions with a less complex PNN architecture and high classification accuracy. An extensive experimental study has been carried out to compare the effectiveness of the proposed method with a chosen state-of-the-art multi-objective particle swarm optimization (MOPSO) algorithm using several benchmark datasets. A comprehensive bibliography is included for further enhancement of this area.
© 2008 Elsevier Inc. All rights reserved.


1. Introduction

Classification is one of the most studied tasks in Data Mining and Knowledge Discovery in Databases (DM and KDD) [1–7], pattern recognition [8–12], image processing [13–15] and bio-informatics [16–21]. In solving classification tasks, classical algorithms such as PNN [22,23] and its variants [24] measure performance by considering only one evaluation criterion, i.e. classification accuracy. However,

* Corresponding author. Tel.: +82 2 2123 3877; fax: +82 2 365 2579. E-mail addresses: [email protected] (S. Dehuri), [email protected] (S.-B. Cho).
1574-0137/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.cosrev.2008.11.002

one more important criterion, the architectural complexity embedded in PNN, is completely ignored. The PNN architecture takes more computation time as the partial descriptions (PDs) grow layer by layer over the training period and make the network more complex. Furthermore, the complexity of PNN depends on parameters such as the number of input variables, the order of the polynomial, the number of layers of the polynomial neural network and the number of PDs in a layer. All of these together make



the PNN architecture more complex. This implicit criterion (i.e. architectural complexity) is one of the main bottlenecks in using PNN for harvesting knowledge from large datasets. The pattern recognition and image processing communities can tolerate this architectural complexity because their datasets are small-scale. Therefore, to make this network a useful tool for data mining we need to minimize the complexity of the architecture without compromising classification accuracy. To acquaint the reader with PNN, we give a comprehensive study of the PNN architecture and its state-of-the-art in the first part of the article.

Decision-makers here desire solutions that simultaneously optimize multiple objectives, such as architectural complexity and classification accuracy, and obtain an acceptable trade-off among the objectives. Multi-criteria problems often admit a range of solutions, none of which dominates the others with respect to the multiple objectives. These solutions constitute the Pareto frontier of non-dominated solutions, each offering a different level of trade-off. Multi-objective genetic algorithms (MOGAs) [25–30] are a popular approach to confronting these types of problems. The use of genetic algorithms (GAs) [31–33] as the tool of preference is due to such problems being typically complex, with both a large number of parameters to be adjusted and several objectives to be optimized. GAs, which maintain a population of solutions, are in addition able to explore several parts of the Pareto front simultaneously. Particle swarm optimization (PSO) [34–40] shares these characteristics. So, given the promising results reported in the literature [41] comparing PSO to GA techniques in the single-objective domain, a transfer of PSO to the multi-objective domain seems a natural next step.

The architectural complexity, an important criterion of PNN, is reduced based on the following assumption. The proposed PNN model is a three-layer architecture: the input layer contains only the input features, the hidden layer contains PDs, and the output layer contains only one neuron. We can select an optimal set from the PDs generated in the hidden layer, along with the input features, using MOPPSO as a driving force to further decrease the architectural complexity. This optimal set of features is fed to the output layer. Further, the proposed MOPPSO technique implicitly optimizes the weights between the hidden layer and the output layer. Furthermore, to support the design methodology of the proposed technique, we review several multi-objective particle swarm optimization (MOPSO) methods and their applications in the second part of the paper.

In a nutshell, the paper is organized into three parts: (i) a review of PNN, (ii) a comprehensive survey of MOPSO and its applications, and (iii) the proposed method for classification. Fig. 1 shows a hierarchical diagrammatic view of the three main wings of this article.

The remainder of this paper is organized as follows. Section 2 describes the design and analysis of the basic architecture and algorithmic view of PNN and its state-of-the-art. In Section 3 we provide some basic concepts of multi-objective optimization and review the current theoretical and practical development of PSO for multi-objective problems. The proposed method is formulated and discussed in Section 4. In Section 5 an experimental study and the comparative performance of the proposed method are presented. Section 6 draws the conclusions with a few lines of open research.

Fig. 1 – A hierarchical view of the components of this article.

2. Polynomial Neural Network

2.1. PNN architecture

The PNN architecture is based on the Group Method of Data Handling (GMDH) [42–44]. GMDH [45–49] was developed by Ivakhnenko in the late 1960s for identifying non-linear relations between input and output variables, inspired by the form of the Kolmogorov–Gabor polynomial. However, there are several drawbacks associated with GMDH, such as its limited generic structure and overly complex network, and these prompted a new class of neural networks known as polynomial neural networks (PNNs). In summary, these networks come with a high level of flexibility, as each node (PD) can have a different number of input variables as well as exploit a different order of polynomial (linear, quadratic, cubic, etc.). Unlike neural networks whose topologies are commonly decided prior to all detailed (parametric) learning, the PNN architecture is not fixed in advance but becomes fully optimized, both structurally and parametrically.

Even though various types of topologies of the PNN are available [50], here we explain the basic architecture of the PNN for familiarity. The PNN architecture utilizes a class of polynomials such as linear, quadratic, cubic, etc. By choosing the most significant number of variables and an order of the polynomial, we can obtain the best PDs from those extracted according to the selected nodes of each layer. Additional layers are generated until the best performance of the extended model is obtained. Such a methodology leads to an optimal PNN structure. Let us assume that the input–output data are given in the following form:

$(X_i, y_i) = (x_{i1}, x_{i2}, \ldots, x_{im}, y_i)$, where $i = 1, 2, \ldots, n$, $n$ is the number of samples and $m$ is the number of features. In matrix form it is represented as follows:
$$\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1m} & : & y_1 \\
x_{21} & x_{22} & \cdots & x_{2m} & : & y_2 \\
\vdots & \vdots & \ddots & \vdots & : & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm} & : & y_n
\end{bmatrix}.$$

The input–output relationship of the above data by the PNN model can be described in the following manner: $y = f(x_1, x_2, \ldots, x_m)$.


Fig. 2 – Generalized architecture of PNN.

Fig. 3 – Basic building blocks of PNN architecture.

The estimated output can be approximated by a Volterra functional series, the discrete form of which is the Kolmogorov–Gabor polynomial [9], written as follows:

$$y = a_0 + \sum_{1 \le i \le m} a_i x_i + \sum_{1 \le i, j \le m} a_{ij} x_i x_j + \sum_{1 \le i, j, k \le m} a_{ijk} x_i x_j x_k + \cdots \qquad (1)$$

where the $a_k$ denote the coefficients (weights) of the Kolmogorov–Gabor polynomial and $x$ is an input variable. The architecture of the basic PNN is shown in Fig. 2. The basic building block (i.e., a PD) of the PNN model is shown in Fig. 3.

To compute the value of the output $y$, we construct a PD for each possible pair of independent variables. For example, if the number of independent variables is $m$, then the total number of possible PDs is ${}^{m}C_2 = \frac{m!}{2!(m-2)!}$ (e.g., $m = 5$ features yield 10 PDs in the first layer). One can determine the parameters of the PDs by the method of least-squares fit using the training samples. Furthermore, we choose an optimal set of PDs from the first layer based on their predictive performance, construct a new set of PDs for the next layer of the PNN, and repeat this operation until a stopping criterion is met. Once the final-layer PDs have been constructed, the node that shows the best performance is selected as the output node and the remaining PDs are discarded. Furthermore, by backtracking, the nodes of the previous layers that have no influence on the output node PD are deleted.

2.2. High-level algorithm of PNN

The algorithm of PNN is described as the following sequence of steps:

1. Determine the system's input variables and, if needed, carry out normalization of the input data.

2. Partition the given samples into training and testing samples: the input–output dataset is divided into two parts, a training part denoted TR and a test part denoted TS. Let the total number of samples be n; then obviously n = TR + TS. Using the training part we construct the PNN model (including an estimation of the coefficients of the PDs of every layer of the PNN), and the test data are used to evaluate the estimated PNN.

3. Select a structure of the PNN: The structure of the PNN is selected based on the number of input variables and the order of the PDs in each layer. PNN structures can be categorized into two types, namely a basic PNN and a modified PNN. In the basic PNN the number of input variables of the PDs is the same in every layer, whereas in the modified PNN the number of input variables of the PDs varies from layer to layer.

4. Generate PDs: In particular, we select the input variables of a node from the m input variables $x_1, x_2, \ldots, x_m$. The total number of PDs located at the current layer differs according to the number of input variables selected from the nodes of the preceding layer. This results in $c = m!/(r!(m-r)!)$ nodes, where r is the number of chosen input variables. The choice of the input variables and the order of a PD is very important for selecting the best model with respect to the characteristics of the data, the model design strategy, non-linearity and predictive capability.

5. Estimate the coefficients of the PD: The vector of coefficients $\vec{a} = (a_0, a_1, a_2, a_3, a_4, a_5)$ is derived by minimizing the mean squared error between $y_i$ and $y_{ji}$,
$$E = \frac{1}{TR} \sum_{i=1}^{TR} (y_i - y_{ji})^2,$$
where $y_{ji} = a_0 + a_1 x_p + a_2 x_q + a_3 x_p^2 + a_4 x_q^2 + a_5 x_p x_q$, $1 \le p, q \le m$, $j = 1, 2, \ldots, m(m-1)/2$, is computed on the basis of two input variables (i.e., $r = 2$) and a second-order polynomial.

In order to find the coefficients, we need to minimize the error criterion E. Differentiating E with respect to all the coefficients, we get a set of linear equations, which in matrix form can be written as
$$Y = X A,$$
or equivalently $X^{\mathsf{T}} Y = X^{\mathsf{T}} X A \;\Rightarrow\; A = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Y$. A detailed description of these matrices is given in the Appendix. (A small numerical sketch of this fitting step is given at the end of this subsection.)

This procedure is implemented repeatedly for all nodes of the layer and also for all layers of the PNN, starting from the input layer and moving towards the output layer.

Further, the following simple algorithm finds the indices of the input features $\langle p, q \rangle$, $1 \le p, q \le m$, $p \ne q$, for each $PD_k$, $k = 1, 2, \ldots, m(m-1)/2$, assuming that the number of layers for which we compute the PDs is one.

Features(p, q)_PDk {
    k = 1;
    FOR i = 1 to m − 1
        FOR j = i + 1 to m
            PD_k receives input from the features p = i and q = j;
            k = k + 1;
        END FOR
    END FOR
}


Alternatively, the index $k$ for each PD can be computed directly from the tuple $\langle p, q \rangle$, $1 \le p < q \le m$, by the following formula:
$$k = \left( \sum_{t=1}^{p} (m - t) \right) - (m - q).$$

6. Select PDs with the best predictive capability: Each PD is estimated and evaluated using both the training and testing datasets. Using the evaluated values, choose the PDs which give the best predictive performance for the output variable. Normally we use a prespecified cutoff value of the performance for all PDs: in order to be retained at the next generation, a PD has to exhibit performance above the cutoff value.

7. Check the stopping criterion: Two termination methods can be exploited.
7.1. The following stopping condition indicates that an optimal PNN model has been accomplished at the previous layer, and the modeling can be terminated. This condition reads as $E_c \ge E_p$, where $E_c$ is the minimal identification error of the current layer, and $E_p$ denotes the minimal identification error that occurred at the previous layer.
7.2. The PNN algorithm terminates when the number of iterations (predetermined by the designer) is reached. When setting up a stopping (termination) criterion, one should be prudent in achieving a balance between model accuracy and the overall computational complexity associated with the development of the model.

If neither of the two criteria above is satisfied, then go to step 4; otherwise execute step 8.

8. Post-processing steps: Once the final layer is constructed (i.e., the "otherwise" branch of step 7 is taken), the node exhibiting the best performance is chosen as the output node. All the remaining nodes in that layer are discarded. Further, all the nodes of the previous layers that have no influence on the estimated output node are also removed by backtracking.
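To make the PD generation and fitting steps concrete, the following is a minimal numerical sketch (not from the paper) that builds all ${}^{m}C_2$ second-order PDs for a toy dataset, estimates each PD's six coefficients by least squares, and checks the closed-form index formula given above. The function names (pd_design_matrix, fit_pd, pd_index) and the toy data are illustrative assumptions.

import numpy as np
from itertools import combinations

def pd_design_matrix(xp, xq):
    """Design matrix for a 2nd-order PD of two inputs:
    y_hat = a0 + a1*xp + a2*xq + a3*xp^2 + a4*xq^2 + a5*xp*xq."""
    return np.column_stack([np.ones_like(xp), xp, xq, xp**2, xq**2, xp * xq])

def fit_pd(X_tr, y_tr, p, q):
    """Estimate the six PD coefficients by least squares (the normal equations
    A = (X^T X)^-1 X^T Y, solved here with a numerically stabler routine)."""
    D = pd_design_matrix(X_tr[:, p], X_tr[:, q])
    a, *_ = np.linalg.lstsq(D, y_tr, rcond=None)
    return a

def pd_index(p, q, m):
    """Index k of the PD fed by features (p, q), 1 <= p < q <= m,
    via the closed form k = sum_{t=1..p}(m - t) - (m - q)."""
    return sum(m - t for t in range(1, p + 1)) - (m - q)

# Toy data: n = 50 samples, m = 4 features, so mC2 = 6 PDs in the first layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 1.0 + 2.0 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=50)

for k, (p, q) in enumerate(combinations(range(1, 5), 2), start=1):
    a = fit_pd(X, y, p - 1, q - 1)                      # 0-based column access
    D = pd_design_matrix(X[:, p - 1], X[:, q - 1])
    err = np.mean((D @ a - y) ** 2)                     # training MSE of this PD
    assert k == pd_index(p, q, 4)                       # closed-form index matches loop order
    print(f"PD{k} on features ({p},{q}): MSE = {err:.4f}")

In a full PNN run, the PDs with the lowest errors (or those above a prespecified cutoff, as in step 6) would be retained to form the inputs of the next layer.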

2.3. Recent developments of PNN

In this section, we introduce a few selected and more informative polynomial neural networks that have recently been developed and applied to different areas of interest.

Vasechkina and Yarin [51] proposed an evolving polynomial neural network constructed by means of a genetic algorithm and showed directions for its use in biology, ecology and other natural sciences. However, the work does not fully exploit the nature of the polynomial neural architecture. The proposed method evolves a feed-forward neural network architecture by a self-organizing process and does not require learning as a separate step, since evaluation of the weights is carried out simultaneously with the architecture construction. The network architecture is built by adding hidden layers to the network, while the configuration of connections between neurons is defined by means of the GA. The output of a neuron is represented as a polynomial function of its inputs, with coefficients evaluated using the least-squares method.

Schetinin [52] showed how a polynomial neural network can successfully extract polynomial classification rules from labeled electroencephalogram (EEG) signals. To represent the classification rules in an analytical form, he used polynomial neural networks trained by a modified Group Method of Data Handling (GMDH). Park et al. [53] introduced the concept of fuzzy polynomial neural networks (FPNNs), a hybrid modeling architecture combining polynomial neural networks (PNNs) and fuzzy neural networks (FNNs) to extract rules. In this method the FNNs contribute to the formation of the premise part of the rule-based structure of the FPNN, while the consequent part of the FPNN is designed using PNNs.

Oh and Pedrycz [54] introduced and investigated a class of neural architectures of polynomial neural networks and discussed a comprehensive design methodology. The experimental part of their study involves two representative time series, the Box–Jenkins gas furnace data and a pH neutralization process. Aksyonova et al. [55] presented a robust polynomial neural network for quantitative structure–activity relationship studies. The method is able to select non-linear models characterized by high prediction ability and is insensitive to outliers and irrelevant variables; this feature of the work can be an attractive point for data mining researchers. Oh et al. [56] proposed a new approach to self-organizing polynomial neural networks by means of genetic algorithms. The proposed GA-based SOPNN gives rise to a structurally optimized network and comes with a substantial level of flexibility in comparison to PNNs. The design procedure applied in the construction of each layer of a PNN deals with its structural optimization, involving the selection of preferred nodes with specific local characteristics (such as the number of input variables, the order of the polynomial, and a collection of the specific subset of input variables), and addresses specific aspects of parametric optimization. Oh et al. [57] developed a new architecture of hybrid fuzzy polynomial neural networks (HFPNN) that is based on a genetically optimized multi-layer perceptron, together with a comprehensive design methodology involving mechanisms of genetic optimization. The construction of HFPNN exploits fundamental technologies of computational intelligence, namely fuzzy sets, neural networks, and genetic algorithms (GAs).

Kim and Park [58] proposed an architecture similar to that of [56] but applied it to a different domain of interest. The model is implemented to better use the optimal inputs and the order of the polynomial in each node of the PNN. The appropriate inputs and order are evolved accordingly and are tuned gradually throughout the GA iterations. They employed a binary encoding, and each chromosome is made of three sub-chromosomes, which represent the order, the number of inputs, and the input candidates for modeling. Liatsis et al. [59] proposed adaptive polynomial neural networks for time series forecasting; in this work the structure and weight vectors are determined by the use of evolutionary algorithms. Zarandi et al. [60] used fuzzy polynomial neural networks for approximating the compressive strength of concrete. To enhance the performance of the FPNN, back-propagation and least-squares error algorithms are utilized for the tuning of the system. Six


different FPNN architectures are constructed, trained and tested using the experimental data of 458 different concrete mix designs collected from three distinct sources. Park et al. [61] studied a new polynomial neural network with a layer over-passing structure, developed to replace a relatively time-consuming reservoir simulator through a robust and systematic search algorithm. The networks are subject to some form of training based on a representative sample of simulations that can be used as a re-usable knowledge base of information for addressing many different management questions.

Misra et al. [24] proposed a reduced and comprehensible polynomial neural network (RCPNN) for classification. In this network the partial descriptions are developed for a single layer, and the outputs of these PDs along with the original set of features are fed to the output layer, which has only one neuron. The weights between the hidden layer and the output layer are optimized by two different methods, gradient descent and particle swarm optimization (PSO) [36]. Misra et al. [62] proposed another variant of RCPNN for the classification task of data mining, known as RPNSN. In this network, the PDs are developed for a single layer. PSO is then used to select the optimal set of PDs and features in a kind of wrapper approach, which are then fed to the output layer.

3. PSO for multi-objective problems

3.1. Multi-objective problems

Multi-objective optimization, otherwise known as multi-criteria, multi-performance or vector optimization, is defined as the problem of finding "a vector of decision variables which satisfies constraints and optimizes a vector function whose elements represent the objective functions". These functions form a mathematical description of performance criteria, which usually conflict with each other. Hence the term optimize means finding a solution which would give values of all the objective functions acceptable to the user.

Multi-objective optimization methods, as the name suggests, deal with finding optimal solutions to problems having multiple objectives. In this type of problem, a solution that is optimal with respect to a single criterion rarely satisfies the user. Mathematically this can be stated as follows: find a vector $\vec{x} = \langle x_1, x_2, \ldots, x_d \rangle$ which simultaneously optimizes the vector function $\vec{f}(\vec{x}) = \langle f_1(\vec{x}), f_2(\vec{x}), \ldots, f_n(\vec{x}) \rangle$. Fig. 4 shows the mapping of a decision variable from the decision space $R^d$ to the objective space $F^n$.

The multi-objective optimization concept states that when many objectives are simultaneously optimized, there is no single optimal solution; instead there is a set of optimal solutions called the Pareto optimal set, each element representing a certain trade-off among the objectives.

Strong Pareto optimality: We say that a vector of decision variables $\vec{x} \in R^d$ is strongly Pareto optimal if there does not exist another $\vec{x}' \in R^d$ such that (i) $f_i(\vec{x}') \le f_i(\vec{x})$ for all $i = 1, 2, \ldots, n$ and (ii) $f_i(\vec{x}') < f_i(\vec{x})$ for at least one $i$.

Weak Pareto optimality: A vector of decision variables $\vec{x} \in R^d$ is said to be a weakly Pareto optimal solution if there does not exist another $\vec{x}' \in R^d$ such that $f_i(\vec{x}') < f_i(\vec{x})$ for all $i = 1, 2, \ldots, n$.

The vectors $\vec{x}$ corresponding to the solutions included in the Pareto optimal set are called non-dominated. The plot of the objective functions whose non-dominated vectors are in the Pareto optimal set is called the Pareto front. Fig. 5 shows a simple Pareto front of two conflicting objectives.
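As an illustration of these definitions, the following is a small sketch (assuming all objectives are to be minimized; the names dominates and non_dominated are illustrative) of a Pareto dominance test and a brute-force extraction of the non-dominated set from a matrix of objective vectors.

import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (all objectives
    minimized): no worse in every objective and strictly better in at least one."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def non_dominated(F):
    """Return the indices of the non-dominated rows of an (N x n) objective matrix."""
    keep = []
    for i, fi in enumerate(F):
        if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i):
            keep.append(i)
    return keep

# Toy bi-objective example: (architectural complexity, classification error).
F = np.array([[3, 0.10], [5, 0.05], [4, 0.08], [6, 0.12], [3, 0.09]])
print(non_dominated(F))   # [1, 2, 4] -- each offers a different trade-off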

3.2. Multi-objective problem solving approaches

Broadly, we can categorize the approaches to multi-objective problems into three types:

1. The conventional weighted-sum approach: transforming the original multi-objective problem into a single-objective problem by using a weighted formula.

2. The lexicographic approach, where the objectives are ranked in order of priority.

3. The Pareto approach, which consists of finding as many non-dominated solutions as possible and returning the set of non-dominated solutions to the user.

In the conventional weighted-sum approach a multi-objective problem is transformed into a single-objective problem. Basically, a numerical weight is assigned to each objective (evaluation criterion) and then the values of the weighted criteria are combined into a single value by either adding or multiplying all the weighted criteria.

Thus, the fitness of a given candidate solution can be measured by one of the following two formulas:

$$f(\vec{x}) = \sum_{i=1}^{n} w_i \, f_i(\vec{x}) \qquad (2)$$
or
$$f(\vec{x}) = \prod_{i=1}^{n} f_i(\vec{x})^{w_i}, \qquad (3)$$
where $w_i$, $i = 1, \ldots, n$, denotes the weight assigned to criterion $f_i(\vec{x})$ and $n$ is the number of evaluation criteria.

This method is popular and widely used because of its simplicity. However, it has several drawbacks. First, the setting of the weights in these formulas is ad hoc. Second, once a formula with precise values of the weights has been defined and given to a data mining algorithm, the algorithm will effectively be trying to find the best model for that particular setting of weights, missing the opportunity to find other models that might actually be more interesting to the user by representing a better trade-off between the different quality criteria. Third, weighted formulas involving a linear combination of different quality criteria cannot reach solutions lying in a non-convex region of the Pareto front.
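For concreteness, a minimal sketch of the additive and multiplicative aggregations of Eqs. (2) and (3); the criteria values and weights below are illustrative, and choosing the weights is precisely the ad-hoc step criticized above.

import numpy as np

def weighted_sum(f, w):
    """Eq. (2): additive aggregation of n criteria values f with weights w."""
    return float(np.dot(w, f))

def weighted_product(f, w):
    """Eq. (3): multiplicative aggregation, each criterion raised to its weight."""
    return float(np.prod(np.asarray(f, dtype=float) ** np.asarray(w, dtype=float)))

# Two criteria (e.g. accuracy and a simplicity score) with arbitrary weights.
f = [0.92, 0.50]
w = [0.7, 0.3]
print(weighted_sum(f, w), weighted_product(f, w))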

In the lexicographic approach, different priorities are assigned to different objectives, and the objectives are then optimized in order of their priority. So when two or more candidate solutions are compared with each other to choose the best one, their performance measures are first compared with respect to the highest-priority objective. If one candidate solution is significantly better than the other with respect to that objective, the former is chosen. Otherwise the performance measures of the two candidate solutions are


Fig. 4 – Mapping a decision variable from $R^d$ into $F^n$.

Fig. 5 – Pareto front of a simple multi-objective problem.

compared with respect to the second objective. Again, if one candidate solution is significantly better than the other with respect to that objective, the former is chosen; otherwise the performance measures of the two candidate solutions are compared with respect to the third criterion. This process is repeated until one finds a clear winner or until all the criteria have been used. In the latter case, if there is no clear winner, one can simply select the solution optimizing the highest-priority objective.

The lexicographic approach has an advantage over the conventional weighted approach, as it treats each of the criteria separately, recognizing that each criterion measures a different aspect of the quality of a candidate solution. Its disadvantage is the need to specify a tolerance threshold for each criterion, which decides when a difference counts as significant.
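A small sketch of a lexicographic comparison with per-objective tolerance thresholds (all names and threshold values are illustrative); these thresholds are exactly the quantities that are hard to specify, as noted above.

def lexicographic_better(f_a, f_b, tolerances):
    """Compare two candidates on objectives listed in priority order (all
    maximized here). A per-objective tolerance decides when the difference is
    'significant' enough to declare a winner at that level.
    Returns True if f_a is preferred, False if f_b is, None if tied on all."""
    for a, b, tol in zip(f_a, f_b, tolerances):
        if abs(a - b) > tol:
            return a > b
    return None   # no clear winner after all criteria

# Highest-priority objective first (e.g. accuracy, then a simplicity score).
print(lexicographic_better([0.91, 0.30], [0.85, 0.80], tolerances=[0.02, 0.05]))  # True
print(lexicographic_better([0.91, 0.30], [0.90, 0.80], tolerances=[0.02, 0.05]))  # False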

In the Pareto approach, instead of transforming a multi-objective problem into a single-objective problem and then solving it with a single-objective search method, a multi-objective algorithm is used to solve the original multi-objective problem. The idea is that the solution of a multi-objective optimization problem is normally not a single value but a set of values, also called the Pareto set. The proposed method for solving the classification task belongs to this category.

The disadvantage of the Pareto approach is that it is very difficult to choose the single best non-dominated solution from a set of non-dominated solutions, and in practice the user will use a single solution. As an advantage, the Pareto approach is more generic than the Minimum Description Length principle, since the latter copes only with accuracy and simplicity, whereas the Pareto approach can cope with any kind of non-commensurable model quality criteria.

3.3. Multi-objective PSO

3.3.1. Basics of PSO

Particle swarm optimization (PSO) is a population-based stochastic approach to optimization. It is modeled on the social behaviors observed in animals or insects, e.g. bird flocking, fish schooling, and animal herding [63]. James Kennedy and Russell Eberhart originally proposed it in 1995 [64]. Since its inception, PSO has gained increasing popularity among researchers and practitioners as a robust and efficient technique for solving difficult optimization problems. In PSO, individual particles of a swarm represent potential solutions, which move through the problem search space seeking an optimal, or good enough, solution. The particles broadcast their current positions to neighboring particles. The position of each particle is adjusted according to its velocity (i.e. rate of change) and the differences between its current position and, respectively, the best position found by its neighbors and the best position it has found so far. As the model is iterated, the swarm focuses more and more on an area of the search space containing high-quality solutions.

PSO has close ties to artificial-life models. Early work by Reynolds on the flocking model Boids [65], and Heppner's studies on rules governing large numbers of birds flocking synchronously [66], indicated that emergent group dynamics such as bird flocking behavior are based on local interactions. These studies were the foundation for the subsequent development of PSO for application to optimization. PSO is in some ways similar to cellular automata (CA), which are often used for generating self-replicating patterns based on very simple rules, e.g. John Conway's Game of Life. CAs have three main attributes: (1) individual cells are updated in parallel; (2) the value of each new cell depends only on the old values of the cell and its neighbors; and (3) all cells are updated using the same rules [67]. Particles in a swarm are analogous to CA cells, whose states are updated in many dimensions simultaneously.

The social behavior of animals, and in some cases of humans, is governed by similar rules. However, human social


behavior is more complex than a flock's movement. Besides physical motion, humans adjust their beliefs, in a belief space. Although two persons cannot occupy the same point of their physical environment, they can hold the same beliefs, occupying the same position in belief space, without collision. This abstractness of human social behavior is intriguing and has motivated simulations of it. There is a general belief, reinforced by numerous examples from nature, that social sharing of information among the individuals of a population may provide an evolutionary advantage. In this context, it is the rule rather than the exception that individuals in plant and animal systems live within groups. Why is this? The reason is that there are many things a group of individuals can do that an isolated individual cannot. This was the core idea behind the development of PSO.

3.3.2. PSO algorithm

In PSO, the velocity of each particle is modified iteratively using its personal best position (i.e. the best position found by the particle so far) and the best position found by particles in its neighborhood. As a result, each particle searches around a region defined by its personal best position and the best position from its neighborhood. Suppose that the search space is $D$-dimensional; the $i$th particle of the swarm can be represented by a $D$-dimensional vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$. The velocity (position change) of this particle can be represented by another $D$-dimensional vector $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. The best position previously visited by the $i$th particle is denoted $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$ denotes the best position found by particles in its neighborhood. The swarm is manipulated according to the following two equations:

$$v_{id}^{n+1} = w \times v_{id}^{n} + c_1 \times r_1^{n} \times \left( p_{id}^{n} - x_{id}^{n} \right) + c_2 \times r_2^{n} \times \left( p_{gd}^{n} - x_{id}^{n} \right) \qquad (4)$$
$$x_{id}^{n+1} = x_{id}^{n} + v_{id}^{n+1} \qquad (5)$$

where $d = 1, 2, \ldots, D$; $i = 1, 2, \ldots, N$, and $N$ is the size of the swarm; $c_1$ and $c_2$ are called the cognitive and social constants; $w$ is called the inertia; $r_1, r_2$ are random numbers uniformly generated from $[0, 1]$; and $n = 1, 2, \ldots$ denotes the iteration number. Eq. (4) shows that the velocity of a particle is determined by three parts: the momentum, the cognitive part, and the social part. The momentum term, the product of the inertia and the previous velocity, carries the particle in the direction it has traveled so far; the cognitive part represents the tendency of the particle to return to the best position it has visited so far; and the social part represents the tendency of the particle to be attracted towards the best position found by the entire swarm.

The position $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$ in the social part is the best position found by particles in the neighborhood of the $i$th particle. Different neighborhood topologies can be used to control information propagation between particles; examples include the ring, star, and von Neumann topologies. Constricted information propagation, resulting from small neighborhood topologies such as von Neumann, has been shown to perform better on complex problems, whereas larger neighborhoods generally perform better on simpler problems [68]. Generally, a PSO implementation that chooses $p_g$ from within a restricted local neighborhood is referred to as lbest PSO, whereas choosing $p_g$ from the entire swarm results in a gbest PSO. The following algorithm summarizes a basic PSO; assume that the problem is a maximization one.

Algorithm_PSO( ) {
    Randomly generate an initial swarm
    REPEAT
        FOR each particle i
            IF f(x_i) > f(p_i) THEN
                p_i ← x_i
            END IF
        END FOR
        p_g = max(p_neighbors)
        Update velocity using Eq. (4)
        Update position using Eq. (5)
    UNTIL termination criterion is met
}

Earlier studies showed that the velocity as defined in Eq. (4) has a tendency to explode to large values, resulting in particles exceeding the boundaries of the search space. This is more likely to happen when a particle is far from $p_g$ and $p_i$. To overcome this problem, a velocity clamping method can be adopted in which the maximum allowed velocity value is set to $V_{max}$ in each dimension of $v_i$. This method does not necessarily prevent particles from leaving the search space, nor does it ensure convergence. However, it does limit the particle step size, thereby preventing further divergence of the particles.
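The following is a minimal, self-contained sketch of a gbest PSO for a single-objective maximization problem, implementing Eqs. (4) and (5) with inertia and velocity clamping; the function name, parameter defaults and test function are illustrative assumptions, not taken from the paper.

import numpy as np

def gbest_pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
              bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Basic gbest PSO maximizing f over a box-bounded dim-dimensional space."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))      # positions
    v = np.zeros_like(x)                                   # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmax(pbest_val)].copy()                 # global best position

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # Eq. (4)
        v = np.clip(v, -v_max, v_max)                            # velocity clamping
        x = x + v                                                # Eq. (5)
        vals = np.array([f(p) for p in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, pbest_val.max()

# Maximize f(x) = -sum(x^2): the optimum is at the origin.
best_x, best_val = gbest_pso(lambda p: -np.sum(p**2), dim=3)
print(best_x, best_val)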

3.3.3. PSO for multiple objectives

Until recently, PSO had been applied only to single-objective optimization problems; however, in a large number of design applications there are several competing quantitative measures that define the quality of a solution. For instance, designing aircraft requires simultaneous optimization of fuel efficiency, payload, and weight. Similarly, for bridge construction a good design is characterized by low total mass and high stiffness. These objectives typically cannot be met by a single solution, so, by adjusting the various design parameters, the designer may seek to discover what combinations of these objectives are available, given a set of constraints. The relative simplicity of PSO and its population-based approach have made it a natural candidate to be extended for multi-objective optimization. Many different strategies for solving multi-objective problems using PSO have been published since 2002 [69–72]. However, although most of these studies were developed in tandem, each of them implements MOPSO in a different fashion. Given the wealth of multi-objective evolutionary algorithms (MOEAs) [73,74] in the literature this may not seem particularly surprising; however, the PSO heuristic puts a number of constraints on MOPSO to which MOEAs are not subject. In PSO itself the swarm population is fixed in size and its members cannot be replaced, only adjusted through their pbest and gbest, which are themselves easy to define. However, in order to facilitate a multi-objective approach to PSO, a set of non-dominated solutions


(the best individuals found so far by the search process) must replace the single global best individual of the standard single-objective PSO case. In addition, there may be no single previous best individual for each member of the swarm. Choosing which gbest and pbest to direct a swarm member's flight is therefore non-trivial in MOPSO.

In general, when solving a multi-objective problem using evolutionary or non-evolutionary techniques, the following three main goals need to be achieved: (i) maximize the number of elements of the Pareto optimal set found; (ii) minimize the distance of the Pareto front produced by the algorithm with respect to the true (global) Pareto front (assuming we know its location); and (iii) maximize the spread of solutions found, so that we can have a distribution of vectors as smooth and uniform as possible.

Given the population nature of PSO, it is desirable to produce several (different) non-dominated solutions in a single run. So, as with any other evolutionary algorithm, the three main issues to be considered when applying PSO to multi-objective optimization are: (i) how to select gbest particles in order to give preference to non-dominated solutions over those that are dominated; (ii) how to retain the non-dominated solutions found during the search process in order to report solutions that are non-dominated with respect to all past populations, not only the current one, while keeping these solutions well spread along the Pareto front; and (iii) how to maintain diversity in the swarm in order to avoid convergence to a single solution.

As we saw in the previous section, when solving single-objective optimization problems, the gbest that each particle uses to update its position is completely determined once a neighborhood topology is established. However, in the case of multi-objective optimization problems, each particle might have a set of different gbests from which just one can be selected to update its position. Such a set of gbests is usually stored in a place separate from the swarm, which we will call the external archive, denoted EX_ARCHIVE. This is a repository in which the non-dominated solutions found so far are stored. The solutions contained in the external archive are used as global bests when the positions of the particles of the swarm have to be updated. Furthermore, the contents of the external archive are also usually reported as the final output of the algorithm. The following algorithm describes how a general MOPSO works.

Algorithm_MOPSO( )
1. INITIALIZATION of the Swarm
2. EVALUATE the fitness of each particle of the swarm
3. EX_ARCHIVE = SELECT the non-dominated solutions from the Swarm
4. t = 0
5. REPEAT
6.     FOR each particle
7.         SELECT the gbest
8.         UPDATE the Position
9.         MUTATION /* Optional */
10.        EVALUATE the Particle
11.        UPDATE the pbest
12.    END FOR
13.    UPDATE the EX_ARCHIVE with gbests
14.    t = t + 1
15. UNTIL (t <= MAXIMUM_ITERATIONS)
16. Report Results in the EX_ARCHIVE

First the swarm is initialized. Then a set of gbests is also initialized with the non-dominated particles from the swarm. As mentioned before, the set of gbests is usually stored in an external archive, which we call EX_ARCHIVE. Later on, some quality measure is calculated for all the gbests in order to select (usually) one gbest for each particle of the swarm. At each generation, for each particle, a leader is selected and the flight is performed. Most of the existing MOPSOs apply some sort of mutation operator after performing the flight. Then the particle is evaluated and its corresponding pbest is updated. A new particle replaces its pbest usually when the pbest is dominated or when the two are incomparable (i.e. both are non-dominated with respect to each other). After all the particles have been updated, the set of gbests is updated, too. Finally, the quality measure of the set of gbests is recalculated. This process is repeated for a certain number of iterations.
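A minimal sketch of how such an external archive EX_ARCHIVE can be maintained and used for leader selection, assuming minimization of all objectives. Real MOPSOs replace the random pruning and random leader choice below with density-based measures (crowding distance, adaptive grids, niche counts, etc.); all names here are illustrative.

import numpy as np

def dominates(f_a, f_b):
    """Pareto dominance for minimization (same convention as Section 3.1)."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def update_archive(archive, candidate, max_size=100, rng=None):
    """Insert a (position, objectives) pair into the archive, keeping only
    mutually non-dominated entries; drop a random member on overflow."""
    rng = rng or np.random.default_rng()
    x, f = candidate
    if any(dominates(fa, f) for _, fa in archive):
        return archive                                   # candidate is dominated
    archive = [(xa, fa) for xa, fa in archive if not dominates(f, fa)]
    archive.append((np.asarray(x, dtype=float), np.asarray(f, dtype=float)))
    if len(archive) > max_size:
        archive.pop(rng.integers(len(archive)))
    return archive

def select_gbest(archive, rng=None):
    """Pick one leader for a particle; here uniformly at random from the archive."""
    rng = rng or np.random.default_rng()
    return archive[rng.integers(len(archive))][0]

# Usage: feed every evaluated particle into the archive, then draw leaders from it.
archive = []
archive = update_archive(archive, ([0.1, 0.2], [3.0, 0.10]))
archive = update_archive(archive, ([0.4, 0.1], [5.0, 0.05]))
leader = select_gbest(archive)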

Let us discuss a few of the MOPSO algorithms developed by different researchers over the years. We can classify the algorithms into five categories: (i) weighted-sum approaches; (ii) lexicographic approaches; (iii) sub-population approaches; (iv) Pareto-based approaches; and (v) combined approaches.

3.3.3.1. Weighted-sum approach. Under this category we consider approaches that combine (or "aggregate") all the objectives of the problem into a single one; in other words, the multi-objective problem is transformed into a single-objective one. Parsopoulos and Vrahatis [72] adopted three types of aggregating functions: (1) a conventional linear aggregating function (where the weights are fixed during the run), (2) a dynamic aggregating function (where the weights are gradually modified during the run), and (3) the bang-bang weighted aggregation approach (where the weights are abruptly modified during the run). The approach of Baumgartner et al. [75], based on the fully connected topology, uses linear aggregating functions. Suresh et al. [76] present a multi-agent particle swarm optimization to design an optimal composite box-beam helicopter rotor blade.

3.3.3.2. Lexicographic approach. In this method, the user is asked to rank the objectives in order of importance. The optimum solution is then obtained by minimizing the objective functions separately, starting with the most important one and proceeding according to the assigned order of importance of the objectives [77]. Lexicographic ordering tends to be useful only when few objective functions are used (two or three), and it may be sensitive to the ordering of the objectives [78]. The algorithm by Hu and Eberhart [71] optimizes only one objective at a time using a scheme similar to lexicographic ordering. This approach adopts the ring (local best) topology with no external archive.

3.3.3.3. Sub-population approaches. These approaches involve the use of several subpopulations as single-objective


optimizers. The subpopulations then exchange information or recombine among themselves, aiming to produce trade-offs among the different solutions previously generated for the objectives that were separately optimized. Parsopoulos et al. [79] suggested a parallel version of the Vector Evaluated Particle Swarm (VEPSO) method for multi-objective problems. VEPSO is a multi-swarm variant of PSO, inspired by the Vector Evaluated Genetic Algorithm (VEGA) [80,81]. Chow and Tsui [82] used PSO as an autonomous agent response-learning algorithm. To this end, the authors propose to decompose the reward function of the autonomous agent into a set of local reward functions and, in this way, to model the response extraction process as a multi-objective optimization problem. Koduru et al. [83] proposed a hybrid PSO–Nelder–Mead Simplex multi-objective optimization algorithm based on a numerical metric called ε-fuzzy dominance. Within each iteration of this approach, in addition to the position and velocity update of each particle using PSO, the k-means algorithm is applied to divide the population into smaller clusters, and the Nelder–Mead Simplex algorithm is used separately within each cluster for added local search. Janson et al. [84] used a multi-objective particle swarm optimization algorithm to simultaneously optimize the intra-molecular energies occurring between the atoms of the flexible ligand and the macro-molecule; a clustering method is applied to form several sub-swarms in order to thoroughly sample the search space. Zhang and Xue [85] proposed a dynamic sub-swarms multi-objective particle swarm optimization algorithm. Based on the solution distribution of multi-objective optimization problems, it separates particles into multiple sub-swarms, each of which adopts an improved clustering archiving technique and operates PSO in a comparably independent way; clustering eventually enhances the distribution quality of the solutions. Omkar et al. [86] present a generic method/model for multi-objective design optimization of laminated composite components, based on the vector evaluated particle swarm optimization (VEPSO) algorithm. In this work a modified version of the VEPSO algorithm for discrete variables has been developed and implemented successfully for the multi-objective design optimization of composites. Mostaghim et al. [87] proposed and empirically compared two parallel versions of multi-objective particle swarm optimization, which differ in the way they divide the swarm into sub-swarms that can be processed independently on different processors.

3.3.3.4. Pareto-based approaches. These approaches use leader selection techniques based on Pareto dominance. The basic idea of all the approaches considered here is to select as leaders the particles that are non-dominated with respect to the swarm. Note, however, that several variations of the leader selection scheme are possible, since most authors adopt additional information to select leaders (e.g., information provided by a density estimator) in order to avoid a random selection of a leader from the current set of non-dominated solutions. Moore and Chapman [88] presented an algorithm, in an unpublished document, based on Pareto dominance; the authors emphasize the importance of performing both an individual and a group search (a cognitive component and a social component). Ray and Liew [89] used

Pareto dominance and combined concepts of evolutionary techniques with the particle swarm. Fieldsend and Singh [70] used an unconstrained elite external archive (in which a special data structure called the "dominated tree" is adopted) to store the non-dominated individuals found along the search process. The archive interacts with the primary population in order to define leaders. The selection of the gbest for a particle in the swarm is based on the structure defined by the dominated tree: first, a composite point of the tree is located based on dominance relations, and then the closest member (in objective function space) of the composite point is chosen as the leader. In addition, a set of personal best particles found (non-dominated) is maintained for each swarm member, and the selection among them is performed uniformly. This approach also uses a "turbulence" operator, basically a mutation operator acting on the velocity value used by the PSO algorithm. The proposal of Coello et al. [69,90] is based on the idea of having an external archive in which every particle deposits its flight experiences after each flight cycle. The updates to the external archive are performed considering a geographically based system defined in terms of the objective function values of each particle. Toscano and Coello [91] used the concept of Pareto dominance to determine the flight direction of a particle. The authors adopt clustering techniques to divide the population of particles into several swarms; this aims to provide a better distribution of solutions in decision variable space. Srinivasan and Hou [92] proposed an approach, called Particle Swarm Inspired Evolutionary Algorithm (PS-EA), which is a hybrid between PSO and an evolutionary algorithm. The main aim is to use EA operators (mutation, for example) to emulate the workings of PSO mechanisms, based on a fully connected topology. Mostaghim and Teich [93] proposed a sigma method in which the leader for each particle is selected so as to improve the convergence and diversity of a MOPSO approach.

In further work, Mostaghim and Teich [94] studied the influence of ε-dominance [95] on MOPSO methods. ε-dominance is compared with existing clustering techniques for fixing the external archive size, and the solutions are compared in terms of computational time, convergence and diversity. Mostaghim and Teich [96] proposed a new method called covering MOPSO (cvMOPSO), which works in two phases. In phase 1, a MOPSO algorithm is run with an external archive of restricted size, with the goal of obtaining a good approximation of the Pareto front. In phase 2, the non-dominated solutions obtained in phase 1 are considered as the input external archive of the cvMOPSO. The approach of Bartz et al. [97] starts from the idea of introducing elitism (through the use of an external archive) into PSO. Different methods for selecting and deleting particles (leaders) from the archive are analyzed to generate a satisfactory approximation of the Pareto front.

Li's [98] approach is based on a fully connected topology and incorporates the main mechanisms of NSGA-II [99] into the PSO algorithm. Reyes and Coello's [100] approach is based on Pareto dominance and the use of a nearest-neighbor density estimator for the selection of leaders (by means of a binary tournament). This proposal uses two external archives: one for storing the leaders currently used for performing


the flight and another for storing the final solutions. The density estimator is used to filter the list of leaders whenever the maximum limit imposed on that list is exceeded. Alvarez-Benitez et al. [101] propose methods based exclusively on Pareto dominance for selecting leaders from an unconstrained non-dominated (external) archive. Ho et al. [102] proposed a novel formula for updating the velocity and position of particles, based on three main modifications to the known flight formula for the fully connected topology. Villalobos-Arias et al. [103] proposed a new mechanism to promote diversity in multi-objective optimization problems. Although the approach is independent of the search engine adopted, they incorporate it into the MOPSO proposed in [90]. The new approach is based on the use of stripes applied in objective function space.

The main idea of Salazar-Lechuga and Rowe's [104] approach is to use PSO to guide the search with the help of niche counts (applied in objective function space) [105] to spread the particles along the Pareto front. Raquel and Naval [106] incorporate the concept of a nearest-neighbor density estimator for selecting the global best particle and also for deleting particles from the external archive of non-dominated solutions. The approach of Zhao and Cao [107] is very similar to the proposal of Coello and Lechuga [69]; however, the authors maintain two external archives. One of them is actually a list that keeps the pbest particle for each member of the swarm; the other stores the non-dominated solutions found along the evolutionary process. This truncated archive is similar to the adaptive grid of PAES [108].

Janson and Merkle [109] proposed a hybrid particle swarm optimization algorithm for multi-objective optimization, called ClustMPSO. ClustMPSO combines the PSO algorithm with clustering techniques to divide all particles into several sub-swarms.

Tsou et al. [110] proposed an improved particle swarm Pareto optimizer in which local search and clustering mechanisms are incorporated. The local search mechanism prevents premature convergence, hence enhancing the convergence of the optimizer to the true Pareto-optimal front. The clustering mechanism reduces the non-dominated solutions to a handful, so that diversity can be maintained and the search sped up. Goldbarg et al. [111] presented a particle swarm optimization for the bi-objective degree-constrained minimum spanning tree problem, with operators for the particle's velocity based upon local search and path-relinking procedures.

Baltar and Fontane [112] used a multi-objective particle swarm optimization algorithm to find non-dominated (Pareto) solutions when minimizing deviations from outflow water quality targets of temperature, dissolved oxygen (DO), total dissolved solids, and potential of hydrogen (pH). Xu and Rahmat-Samii [113] studied the MOPSO proposed in [100] by applying it to two complex multi-objective electromagnetic problems: a 16-element array antenna synthesized for a trade-off between beam efficiency and half-power beam width, and a single shaped reflector antenna optimized for higher gains of multiple feeds. Gill et al. [114] proposed a new multi-objective particle swarm optimization by introducing the Pareto rank concept,

which is used for parameter estimation in hydrology. Liu et al. [115] proposed a variable neighborhood particle swarm optimization for multi-objective flexible job-shop scheduling problems; the proposed method combines the concept of variable neighborhood search (VNS) and particle swarm optimization (PSO). Niu and Shen [116] proposed an adaptive multi-objective particle swarm optimization, used to search for the optimal color image fusion parameters that achieve the optimal fusion indices.

Koppen and Veenhuis [117] proposed a new multi-objective particle swarm optimization algorithm using the fuzzy-Pareto-dominance (FPD) relation. FPD can be seen as a paradigm or meta-heuristic to formally extend single-objective optimization algorithms to multi-objective ones. Tripathi et al. [118] describe a novel time variant multi-objective particle swarm optimization (TV-MOPSO). TV-MOPSO is made adaptive in nature by allowing its vital parameters (viz., inertia weight and acceleration coefficients) to change with iterations. A new diversity parameter is used to ensure sufficient diversity amongst the solutions of the non-dominated fronts, while at the same time retaining convergence to the Pareto-optimal front. Reddy and Nagesh Kumar [119] proposed a MOPSO with a newly introduced variable size external repository and an efficient elitist-mutation (EM) operator. Rahimi-Vahed and Mirghorbani [120] adopted a multi-objective particle swarm optimization for solving the flow shop scheduling problem, exploiting a new concept of the ideal point and a new approach to specify the superior particle's position vectors in the swarm for finding the locally Pareto-optimal frontier of the problem. Abido [121] introduced a two-level non-dominated solutions approach to multi-objective particle swarm optimization to handle the limitations associated with MOPSO.

3.3.3.5. Combined approaches. Mahfouf et al. [122] proposed an Adaptive Weighted PSO (AWPSO) algorithm, in which the velocity is modified by including an acceleration term that increases as the number of iterations increases. This aims to enhance the global search ability towards the end of the run and to help the algorithm jump out of local optima.

Xiao-hua et al. [123] proposed an Intelligent Particle Swarm Optimization (IPSO) algorithm for multi-objective problems based on an Agent-Environment-Rules (AER) model to provide an appropriate selection pressure to propel the swarm population towards the Pareto-optimal front. Nakamura et al. [124] proposed a new multi-objective particle swarm optimization incorporating design sensitivities concerning objective and constraint functions, applied to structural and mechanical design optimization problems.

4. Proposed method

Neural networks are computational models capable of learning through adjustments of topology/architecture and internal weight parameters according to a training algorithm in response to some training samples. Yao [125] described three common approaches to neural network training: (1) for a neural network with a fixed architecture,


find a near-optimal set of connection weights; (2) find a near-optimal neural network architecture; and (3) simultaneously find both a near-optimal set of connection weights and the neural network topology/architecture.

Recall that multi-objective optimization deals with simultaneous optimization of several possibly conflicting objectives and generates the Pareto set. Each solution in the Pareto set represents a trade-off among the various parameters that optimize the given objectives. In supervised learning, model selection involves finding a good trade-off between at least two objectives: predictive accuracy and architectural complexity. The usual approach is to formulate the problem in a single-objective manner by taking a weighted sum of the objectives. Abbass [126] presented several reasons as to why this method is inefficient. Thus the multiple objective optimization (MOO) approach [127–129] is suitable, since the architecture with connection weights and the predictive accuracy can be determined concurrently and a Pareto set can be obtained in a single simulation, from which the user has more flexibility to choose a final solution.

4.1. Problem formulation

In data mining, scalability is one of the important issues; the scalability of the architecture can be judged from its architectural complexity. In this article we consider the architectural complexity of the polynomial neural network, which is based on the following parameters:

1. Input variables
2. Order of the polynomial
3. Number of layers of the polynomial neural network
4. Number of PDs in a layer.

Let us assume that the number of input variables r remains constant for all layers, and let m be the dimension of the pattern vector. The order of the polynomial and the number of layers are P and L respectively. Based on these assumptions we can calculate the total possible number of PDs in each layer of the PNN.

1st layer: $m_1 = \dfrac{m!}{r!\,(m-r)!}$,

2nd layer: $m_2 = \dfrac{m_1!}{r!\,(m_1-r)!}$,

...

$L$th layer: $m_L = \dfrac{m_{L-1}!}{r!\,(m_{L-1}-r)!}$,

where $L$ is the number of hidden layers. Therefore, the total possible number of PDs (excluding the input and output layers) in the architecture is

$$M = m_1 + m_2 + \cdots + m_L = \sum_{i=1}^{L} m_i = \frac{1}{r!}\left\{\frac{m!}{(m-r)!} + \frac{m_1!}{(m_1-r)!} + \cdots + \frac{m_{L-1}!}{(m_{L-1}-r)!}\right\}. \tag{6}$$

For simplicity, let us write $m \equiv m_0$; replacing $m$ with $m_0$ in Eq. (6), we get

$$M = \frac{1}{r!}\sum_{j=0}^{L-1}\frac{m_j!}{(m_j-r)!}. \tag{7}$$
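To make this counting concrete, the following is a minimal sketch (ours, not taken from the paper's implementation) that computes the per-layer PD counts m_1, ..., m_L and the total M of Eq. (7):

from math import comb

def pds_per_layer(m0: int, r: int, L: int) -> list:
    """Number of PDs in each of the L layers when every layer combines
    its inputs r at a time: m_k = C(m_{k-1}, r)."""
    counts, m = [], m0
    for _ in range(L):
        m = comb(m, r)          # m_k = m_{k-1}! / (r! (m_{k-1} - r)!)
        counts.append(m)
    return counts

# Example: m0 = 4 input features, r = 2
print(pds_per_layer(4, 2, 1))        # [6]  -> one hidden layer of 6 PDs
print(sum(pds_per_layer(4, 2, 2)))   # M of Eq. (7) for L = 2: 6 + 15 = 21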

As a result, the architectural complexity can be defined as follows:

$$f_{AC} = f_{NV} + f_{NP} + f_{OP} + f_{NL} = r + M + P + L = r + P + L + \sum_{i=1}^{L} m_i = (r + P + L) + \frac{1}{r!}\sum_{j=0}^{L-1}\frac{m_j!}{(m_j-r)!} \tag{8}$$

where $r$, $P$, $L$ and $m_0$ are independent variables and $m_1, m_2, \ldots, m_L$ are dependent variables.

Hence $f_{AC} = f_{AC}(r, P, L, m_0) = f_{AC1} + f_{AC2}$, where $f_{AC1} = r + P + L$ and $f_{AC2} = \frac{1}{r!}\sum_{j=0}^{L-1}\frac{m_j!}{(m_j-r)!}$.

As the values of r, P and L are small, there is no need to optimize the function fAC1; the only function we have to optimize is fAC2.

Further, in this work we are generating only one hidden layer, but we are considering the input variables twice (i.e. in the input layer and in the hidden layer).

Therefore Eq. (8) boils down to

$$f_{AC} = (r + P + 1) + \frac{1}{r!}\,\frac{m_0!}{(m_0-r)!}.$$

This implies that $f_{AC1} = r + P + 1$ and $f_{AC2} = \frac{1}{r!}\,\frac{m_0!}{(m_0-r)!}$.

It should be noted that, since we consider the input features in the hidden layer along with the PDs, the objective function measuring the architectural complexity is

$$f_{AC} = f_{AC2} + (m_0 + 1) = \frac{m_0!}{r!\,(m_0-1)!}\left(\frac{(m_0-1)!}{(m_0-r)!} + r!\right) + 1. \tag{9}$$
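Since fAC2 = (1/r!) · m0!/(m0 − r)! is simply the binomial coefficient C(m0, r), Eq. (9) can be evaluated directly. A minimal sketch (the function name is ours):

from math import comb, factorial

def architectural_complexity(m0: int, r: int) -> int:
    """f_AC of Eq. (9): PDs of the single hidden layer plus the m0
    re-used input features and one bias unit."""
    f_ac2 = factorial(m0) // (factorial(r) * factorial(m0 - r))  # = C(m0, r)
    assert f_ac2 == comb(m0, r)
    return f_ac2 + m0 + 1

print(architectural_complexity(4, 2))   # 6 PDs + 4 features + 1 bias = 11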

The measure for predictive accuracy is defined based on the concept of a confusion matrix. Let the problem be a C-class problem. Then the confusion matrix for the C-class problem is defined as follows.

Actual   Predicted
         C1    C2    ...   CC
C1       a11   a12   ...   a1c
C2       a21   a22   ...   a2c
...
CC       ac1   ac2   ...   acc

The entries in the confusion matrix have the following meaning in the context of our study: a11 is the number of instances of class C1 correctly predicted as C1, a21 is the number of instances of class C2 incorrectly predicted as C1, and so on.

Several standard terms have been defined. The predictive accuracy is the proportion of the total number of predictions that were correct. It is determined using Eq. (10):

$$f_A = \frac{\sum_{i=1}^{c} a_{ii}}{\sum_{i=1}^{c}\sum_{j=1}^{c} a_{ij}}. \tag{10}$$


Fig. 6 – Architecture of the MOPPSO based PNN classifier.

The correct classification accuracy corresponding to the ith (i = 1, 2, ..., C) class is

$$f_{c_i} = \frac{a_{ii}}{\sum_{j=1}^{c} a_{ij}}, \quad i = 1, 2, \ldots, C. \tag{11}$$

The incorrect classification accuracy corresponding to the ith (i = 1, 2, ..., C) class is

$$f_{Ic_i} = \frac{\sum_{j=1,\, j\neq i}^{c} a_{ij}}{\sum_{j=1}^{c} a_{ij}}, \quad i = 1, 2, \ldots, C. \tag{12}$$

The accuracy determined using Eq. (10) might not be an adequate performance measure when the class distributions are unbalanced. For example, suppose there are 100 samples, 95 of which are class 1, 3 of which are class 2 and 2 of which are class 3. If the system classifies all of them as class 1, the accuracy would be 95% even though the classifier missed every sample belonging to class 2 and class 3.

Therefore we measure the predictive accuracy by using Eq. (13):

$$f_{PA} = \frac{1}{c}\left\langle \frac{a_{11}}{\sum_{j=1}^{c} a_{1j}} + \frac{a_{22}}{\sum_{j=1}^{c} a_{2j}} + \cdots + \frac{a_{cc}}{\sum_{j=1}^{c} a_{cj}} \right\rangle. \tag{13}$$
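A minimal sketch (ours) of the overall accuracy of Eq. (10) and the class-balanced accuracy of Eq. (13), computed from a confusion matrix whose rows are actual classes and columns are predicted classes:

def overall_accuracy(cm):
    """Eq. (10): correct predictions over all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def balanced_accuracy(cm):
    """Eq. (13): mean of the per-class recalls, robust to class imbalance."""
    c = len(cm)
    return sum(cm[i][i] / sum(cm[i]) for i in range(c)) / c

# Imbalanced example from the text: 95/3/2 samples, all predicted as class 1
cm = [[95, 0, 0], [3, 0, 0], [2, 0, 0]]
print(overall_accuracy(cm))    # 0.95
print(balanced_accuracy(cm))   # ~0.33, exposing the missed classes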

Finally, the two objective functions of our proposed method are:

1. classification accuracy, measured by Eq. (13), and
2. architectural complexity, measured by Eq. (9).

In the context of the MOO domain, we can define the problem as follows:

Definition. Find a vector $\vec{x} = \langle x_1, x_2, \ldots, x_n \rangle$ that will optimize the vector function $\vec{f}(\vec{x}) = \{f_{AC}(\vec{x}), f_{PA}(\vec{x})\}$, where $\vec{x}$ is the vector of decision variables.

Further, our objective is to find a vector $\vec{x}$ that maximizes

$$f_{PA} = \frac{1}{c}\left\langle \frac{a_{11}}{\sum_{j=1}^{c} a_{1j}} + \frac{a_{22}}{\sum_{j=1}^{c} a_{2j}} + \cdots + \frac{a_{cc}}{\sum_{j=1}^{c} a_{cj}} \right\rangle$$

and minimizes

$$f_{AC} = f_{AC2} + (m_0 + 1) = \frac{m_0!}{r!\,(m_0-1)!}\left(\frac{(m_0-1)!}{(m_0-r)!} + r!\right) + 1.$$

4.2. Architecture

We have proposed a multi-objective Pareto based particle swarm optimization for simultaneous optimization of the classification accuracy and architectural complexity of the PNN. The structural representation of the system is given in Fig. 6. In this system we have developed PDs for a single layer (i.e. the hidden layer). Along with the output of the first layer (i.e., the constructed PDs), we have considered the original features of the dataset together with a bias unit. The MOPPSO technique is used to minimize the number of PDs and features without compromising the classification accuracy. In addition, MOPPSO optimizes the weight vector between the hidden layer and the output layer in the course of running the algorithm.

In the above system, m represents the number of features in the dataset and $N = \binom{m}{2}$ represents the number of PDs in the hidden layer. In addition to all the features, one bias unit is also included in the system at this level.
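To illustrate the data flow of Fig. 6, the following is one plausible reading (ours; the paper does not spell out the output computation) of how a selected subset of PDs, the original features and a bias could be combined by a particle's weight vector into a single output:

import numpy as np
from itertools import combinations

def pnn_output(x, pd_coeffs, mask, weights):
    """Hidden layer = quadratic PDs over all feature pairs; their outputs
    are concatenated with the original features and a bias unit, masked by
    the particle's selection bits, and linearly combined with the particle's
    weights (hypothetical reading of Fig. 6)."""
    pds = []
    for (p, q), a in zip(combinations(range(len(x)), 2), pd_coeffs):
        xp, xq = x[p], x[q]
        pds.append(a[0] + a[1]*xp + a[2]*xq + a[3]*xp*xq + a[4]*xp**2 + a[5]*xq**2)
    units = np.concatenate([pds, x, [1.0]])        # N PDs, m features, bias
    sel = np.concatenate([mask, [1.0]])            # bias is always active
    return float(np.dot(units * sel, weights))

x = np.array([0.2, 0.7, 0.1])                      # m = 3 -> N = 3 PDs
pd_coeffs = [np.ones(6)] * 3                       # placeholder PD coefficients
mask = np.array([1, 0, 1, 1, 1, 0], dtype=float)   # selects PDs/features
weights = np.full(7, 0.5)                          # N + m + 1 weights
print(pnn_output(x, pd_coeffs, mask, weights))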

4.3. Algorithms

The proposed algorithm simultaneously optimizes the architectural complexity and the classification accuracy, which conflict with each other. In this study a particle represents a set of connection weights together with the hidden layer, consisting of a number of PDs and a set of features with one bias unit. Hence, a swarm consists of a population of such connection weights, PDs and features of the hidden layer. To apply PSO to a multi-objective optimization problem, matters such as the representation of particles and the representation of velocity must first be fixed a priori.


Fig. 7 – A typical instance of representing a particle with r = 2 and a 3-dimensional pattern vector.

4.3.1. Representation of particles

The position of the particle is logically divided into two parts:

1. The first part represents the hidden layer, consisting of PDs and features.

2. The second part consists of connection weights.

The first part is represented by binary values, i.e. 0 or 1. The value 1 means the corresponding unit is selected, while 0 means it is not selected. The second part is a set of connection weight values uniformly generated from the interval [0, 1]. Here we use a fixed-length particle, which is suitably mapped into a variable-length one. Fig. 7 represents a particle for a 3-dimensional input pattern vector with r = 2. From these values we can easily determine the length of the particle; for this particular case the length of the particle is 13. The length of the particle varies from domain to domain, but within a particular domain it remains fixed for all the particles of the swarm.

In this example, the first part of the particle contains a binary value for each cell, chosen from the domain {0, 1}. The value 1 or 0 represents whether the corresponding PD/feature is selected or not. Note that this approach implicitly assumes positional encoding: in each particle, the first cell refers to the first PD, the second to the second PD, and so on until all PDs are represented; thereafter all the features are represented consecutively until the end of the first part. In the second part of the particle the weight values are represented corresponding to the PDs and features, and the last cell of the second part refers to the weight value of the bias unit. In any case, it is important to note that this positional convention simplifies the process of updating the position of the particle.

In general, for a particular domain with m attributes and r variables (chosen for constructing the PDs), the length of the particle can be determined by the following formula:

$$\text{Particle\_Length} = \left(\frac{m!}{r!\,(m-r)!} + m\right) + 1. \tag{14}$$
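A minimal sketch (ours) of this fixed-length encoding, assuming Eq. (14) counts the weight part (one weight per PD/feature plus the bias) and that the full position of Fig. 7 concatenates the binary selection mask with these weights (6 + 7 = 13 cells for m = 3, r = 2):

import random
from math import comb

def weight_part_length(m: int, r: int) -> int:
    """Eq. (14): C(m, r) PDs + m features + 1 bias weight."""
    return comb(m, r) + m + 1

def random_particle(m: int, r: int):
    """First part: binary selection mask over PDs and features.
    Second part: a weight in [0, 1] per unit plus one for the bias."""
    n_units = comb(m, r) + m
    mask = [random.randint(0, 1) for _ in range(n_units)]
    weights = [random.uniform(0.0, 1.0) for _ in range(n_units + 1)]
    return mask, weights

mask, weights = random_particle(m=3, r=2)
print(weight_part_length(3, 2), len(mask) + len(weights))   # 7, 13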

4.3.2. Representation of velocity

The velocity of each particle is also logically divided into two parts and is represented as a vector. The first position contains a positive integer varying between 1 and Vmax. It indicates how many bits of the first part should be changed, at a particular moment in time, to become the same as those of the global best position, i.e., the velocity of the particle flying toward the best position. The number of differing bits between two particles relates to the difference between their positions. The remaining positions of the velocity vector correspond to the second part of the particle.
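A minimal sketch (ours) of how the integer velocity of the first part can be applied: at most v of the bits that differ from the guiding (global best) position are copied from it:

import random

def move_binary_part(mask, guide_mask, v):
    """Copy at most v differing bits from the guide mask, so the
    particle's first part flies toward the best position found so far."""
    differing = [i for i, (a, b) in enumerate(zip(mask, guide_mask)) if a != b]
    for i in random.sample(differing, min(v, len(differing))):
        mask[i] = guide_mask[i]
    return mask

print(move_binary_part([0, 1, 1, 0], [1, 1, 0, 0], v=1))   # one bit closer to the guide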

The following major steps describe the proposed method.

1. DETERMINE the number of input variables and the order of the polynomial forming the partial description (PD) of the data.

2. CONSTRUCT all PDs for the hidden layer.

3. RUN MOPPSO_PNN using the objectives architectural complexity and classification accuracy.

4. DECLARE a set of Pareto solutions.

MOPPSO_PNN is the heart of the above algorithm. The details of that procedure are described in Table 1.

The fitness of the particle is calculated by the following high-level procedure.

Fitness_Particle()
{
  FOR each particle
    Find all the active bits from the first part of the particle;
    FOR each active bit
      Compute the coefficients using the entire training set;
    END FOR
  END FOR
  FOR each particle
    Compute Eqs. (13) and (9) using the training set with the current set of particle weights;
  END FOR
}

The k-medoid [133–135] procedure for computing the index i for EX_ARCHIVE is as follows:


Table 1 – MOPPSO_PNN(Total_Number_of_Variables m, Number_of_Input_Variables r).

1. Initialize the swarm SW:
   For i = 0 to Max   /* Max = number of particles */
     Initialize SW[i];
2. Evaluate each of the particles in SW by invoking the procedure Fitness_Particle().
3. Store the positions of the particles that represent non-dominated vectors in the external archive called EX_ARCHIVE.
4. Initialize the memory of each particle (this memory serves as a guide to travel through the search space and is also stored in the repository):
   For i = 0 to Max
     Pbest[i] = SW[i]
5. Repeat
   Compute the speed of each particle using the following expression:
     VELOCITY[i] = W × VELOCITY[i] + Rand1 × (Pbest[i] − SW[i]) + Rand2 × (EX_ARCHIVE[i] − SW[i])
   where W (inertia weight) takes a value of 0.5; Rand1 and Rand2 are random numbers in the range [0, 1]; Pbest[i] is the best position that particle i has had; EX_ARCHIVE[i] is a value taken from the repository, whose index i is selected by k-medoid clustering, invoking the procedure medoid(k), with the less populated clusters. That is, the medoid corresponding to the less populated cluster, denoted as i, is chosen for the velocity update. SW[i] is the current value of particle i.
   Compute the new positions of the particles by adding the speed produced in the previous step:
     SW[i] = SW[i] + VELOCITY[i]
   Evaluate each of the particles in SW by the procedure Fitness_Particle().
   Update the contents of EX_ARCHIVE: here we have to decide whether a certain solution should be added to the repository or not. The non-dominated particles found at each iteration in the primary population of our algorithm are compared with the contents of EX_ARCHIVE which, at the beginning of the search, is empty. If it is empty then the current solution is accepted. If the new solution is dominated by a particle within EX_ARCHIVE, then it is automatically discarded. Otherwise, if none of the elements contained in the external repository dominates the solution wishing to enter, then that solution is stored in EX_ARCHIVE. If there are solutions in the archive that are dominated by the new element, then those solutions are removed from EX_ARCHIVE. Finally, the external repository is restricted to its maximum allowable capacity by the nearest neighbor technique [130–132].
   When the current position of the particle is better than the position contained in its memory, the particle's position is updated using:
     Pbest[i] = SW[i];
   The criterion to decide which position from memory should be retained is simply to apply Pareto dominance (i.e. if the current position is dominated by the position in memory, then the position in memory is kept; otherwise, the current position replaces the one in memory; if neither of them dominates the other, then we select one of them randomly).
6. Until (the termination criterion is satisfied).

medoid(k)
{
  Arbitrarily choose k particles as the initial medoids;
  REPEAT
    Assign each remaining particle to the cluster with the nearest medoid;
    Randomly select a non-medoid particle SW[R];
    Compute the total cost, S, of swapping SW[j] with SW[R];
    IF S < 0 THEN
      Swap SW[j] with SW[R] to form the new set of k medoids;
  UNTIL no change;
}

Here SW[R] represents the non-medoid particle and SW[j] refers to the current medoid.
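A minimal sketch (ours) of the Pareto-dominance test and the EX_ARCHIVE update rule described in step 5 of Table 1, with a solution encoded as the pair (f_PA, f_AC), where f_PA is to be maximized and f_AC minimized:

def dominates(a, b):
    """a = (f_PA, f_AC); a dominates b if it is no worse in both
    objectives (higher accuracy, lower complexity) and better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better

def update_archive(archive, candidate):
    """Insert `candidate` unless some archived solution dominates it;
    drop archived solutions that the candidate dominates."""
    if any(dominates(a, candidate) for a in archive):
        return archive
    archive = [a for a in archive if not dominates(candidate, a)]
    archive.append(candidate)
    return archive

arch = []
for sol in [(0.93, 40.0), (0.96, 45.0), (0.90, 55.0)]:
    arch = update_archive(arch, sol)
print(arch)   # [(0.93, 40.0), (0.96, 45.0)] -- (0.90, 55.0) is dominated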

5. Simulations and results

The performance of the model is evaluated using a set of benchmark datasets. Out of these, the most frequently used in the area of neural networks and neuro-fuzzy systems are IRIS, WINE, PIMA, and BUPA Liver Disorders. All these databases are taken from the UCI Machine Learning repository [136].

5.1. Description of the datasets

Let us briefly discuss the datasets which we have taken for our experimental setup.

IRIS dataset: This is the most popular and simple classification dataset, based on multivariate characteristics of a plant species (length and thickness of its petal and sepal), divided into three distinct classes (Iris Setosa, Iris Versicolor and Iris Virginica) of 50 instances each. One class is linearly separable from the other two; the latter two are not linearly separable from each other. In a nutshell, it has 150 instances and 5 attributes. Out of the 5 attributes, four are predicting attributes and one is the goal attribute. All the predicting attributes are real valued.

WINE dataset: This dataset resulted from a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. In the classification context, this is a well-posed problem with well-behaved class structures. The total number of instances is 178, distributed as 59 for class 1, 71 for class 2 and 48 for class 3. The number of attributes is 14 including the class attribute, and all 13 predicting attributes are continuous in nature. There are no missing attribute values in this dataset.

PIMA Indians diabetes database: This database is a collection of female patients at least 21 years old of Pima Indian heritage. It contains 768 instances, 2 classes (positive and negative) and 9 attributes including the class attribute. The


Table 2 – Description of datasets used.

Dataset  Patterns  Attributes  Classes  Patterns in class 1  Patterns in class 2  Patterns in class 3
IRIS     150       4           3        50                   50                   50
WINE     178       13          3        59                   71                   48
PIMA     768       8           2        268                  500                  –
BUPA     345       6           2        145                  200                  –

Table 3 – Division of the datasets and their pattern distribution.

Dataset  Part  Patterns  Patterns in class 1  Patterns in class 2  Patterns in class 3
IRIS     Set1  75        25                   25                   25
IRIS     Set2  75        25                   25                   25
WINE     Set1  89        29                   36                   24
WINE     Set2  89        30                   35                   24
PIMA     Set1  384       134                  250                  –
PIMA     Set2  384       134                  250                  –
BUPA     Set1  172       72                   100                  –
BUPA     Set2  173       73                   100                  –

Table 4 – Parameters considered for simulation of the proposed model.

Parameters           Values
Size of the swarm    20
Maximum iterations   100
Inertia weight       0.729844
Cognitive parameter  1.49445
Social parameter     1.49445
EX_ARCHIVE size      20
k                    3

attributes contain either integer or real values. There are no missing attribute values in the dataset.

BUPA liver disorders: This dataset is related to the diagnosis of liver disorders and was created by BUPA Medical Research Ltd. It consists of 345 records and 7 attributes including the class attribute. The class attribute takes only two values over the entire database. The first 5 attributes are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each record corresponds to a single male individual.

Table 2 presents the summary of the main features of the datasets used for the experimental studies.

Each dataset is divided into two parts; the division of the datasets and their class distributions are shown in Table 3. One part is used for building the model and the other part is used for testing the model.

5.2. Parameter setup

The parameters used for the simulation studies are given in Table 4. In addition to all the user-defined parameters of standard PSO, we have to set the maximum capacity of the external archive EX_ARCHIVE. The main objective of the external archive is to keep a historical record of the non-dominated vectors found along the search process. K-medoid clustering fetches the non-dominated solution from EX_ARCHIVE for the velocity update, and the archive is restricted to its maximum allowable capacity by nearest neighbor search. The maximum allowable size of EX_ARCHIVE is given in Table 4.

Table 5 – Confusion matrix obtained from IRIS dataset.

Actual class            Predicted class
                        C1       C2       C3
C1                      50       0        0
C2                      1        49       0
C3                      0        0        50
Total prediction        0.3400   0.3267   0.3333
Correct prediction      1.0000   0.9800   1.0000
Classification accuracy: 0.993

Table 6 – Confusion matrix obtained from WINE dataset.

Actual class            Predicted class
                        C1       C2       C3
C1                      59       0        0
C2                      0        70       1
C3                      0        0        48
Total prediction        0.3315   0.3933   0.2753
Correct prediction      1.0000   0.9859   1.0000
Classification accuracy: 0.9953

Note that no systematic parameter optimization has been attempted so far; the parameter set in Table 4 was used in our experiments. We use the value k = 3 since this value provided good results in all the domains considered in this article. Tuning the remaining values for a specific dataset may give rise to improved performance.

5.3. Results and analysis

First we show some experimental results of MOPPSO to clearly demonstrate the effect of considering non-dominance. The confusion matrices show the experimental results, which are selected from our proposed method by considering a sort of lexicographic ordering on the objectives for the entire dataset. The optimum solution is then obtained by maximizing the classification accuracy separately, starting with the most important objective and proceeding according to the assigned order


Table 7 – Confusion matrix obtained from PIMA dataset.

Actual class            Predicted class
                        C1       C2
C1                      134      134
C2                      44       456
Total prediction        0.2318   0.7682
Correct prediction      0.5000   0.9120
Classification accuracy: 0.7060

Table 8 – Confusion matrix obtained from BUPA dataset.

Actual class            Predicted class
                        C1       C2
C1                      85       60
C2                      30       170
Total prediction        0.3333   0.6667
Correct prediction      0.5862   0.8500
Classification accuracy: 0.7181

of importance of the objectives. The confusion matrices for IRIS and WINE are given in Tables 5 and 6, and the confusion matrices for PIMA and BUPA are given in Tables 7 and 8.

Table 5 shows the results obtained from MOPPSO when classification accuracy is ranked higher than architectural complexity for the IRIS dataset. We can see that in the learning stage only one class-2 sample is misclassified.

Similarly to Table 5, classification accuracy is given higher priority than architectural complexity in Table 6, and we can easily see that in the learning stage only one class-2 sample is misclassified. However, Tables 7 and 8 show a larger number of misclassified samples during the learning stage.

With the same lexicographic approach we minimize the architectural complexity of the network with a tolerable degradation of classification accuracy. Fig. 8 shows the architectural complexity obtained for each dataset, in terms of the percentage of PDs used in the proposed method, using training sets 1 and 2 alternately.

5.4. Comparative performance

In order to know how competitive the proposed method is, we decided to compare it against the state-of-the-art MOPSO [90] with a modification of the particle and velocity representations. As we are solving the specific problem of maximizing the classification accuracy and minimizing the architectural complexity of the PNN, the parameters recommended for MOPSO in [90] are no longer used as such; therefore, based on our solution approach, we modified the parameters. The size of the swarm, maximum iterations, inertia weight, cognitive parameter, social parameter and the size of the external archive remain the same as mentioned in Table 4. However, we kept the number of divisions at 12, since this value provided good results in most cases. This parameter is used to determine the number of hypercubes that will be generated in objective function space.

However, we avoid using the mutation operator of MOPSO in the comparative study as it may cause the problem of permutations. This operator induces exploration of the whole search space and allows diversity to be maintained in the swarm, so that the algorithm is able to find solutions among all possible permutations of the network.

Tables 9–12 show the classification accuracy and the architectural complexity, in terms of the percentage of PDs used, obtained by the two algorithms on the four benchmark datasets during training. These tables provide the maximum, minimum, and average classification accuracy and architectural complexity over 20 simulations.

From Table 9 one can see that, even though the best classification accuracy of the proposed method is 0.4% lower than that of MOPSO, on average it is better. Moreover, there is a considerable gap of 7.14% between their average architectural complexities.

For the WINE data the proposed method provides a good average performance in classification accuracy. However, in the average case MOPSO outperforms the proposed method in architectural complexity.

Table 9 – Comparative performance of proposed method with MOPSO by considering classification accuracy and architectural complexity for IRIS dataset during training.

                                      Proposed method   MOPSO
Classification accuracy    Maximum    99.3              99.7
                           Minimum    92.0              91.4
                           Average    96.0              95.2
Architectural complexity   Maximum    50.00             59.34
                           Minimum    33.33             36.29
                           Average    41.89             49.03

Table 10 – Comparative performance of proposed method with MOPSO by considering classification accuracy and architectural complexity for WINE dataset during training.

                                      Proposed method   MOPSO
Classification accuracy    Maximum    99.43             94.87
                           Minimum    93.21             89.23
                           Average    96.6271           92.0730
Architectural complexity   Maximum    32.05             30.01
                           Minimum    19.23             20.67
                           Average    25.6849           24.4143


Table 11 – Comparative performance of proposed method with MOPSO by considering classification accuracy and architectural complexity for PIMA dataset during training.

                                      Proposed method   MOPSO
Classification accuracy    Maximum    76.823            69.102
                           Minimum    70.34             65.213
                           Average    74.0911           66.7597
Architectural complexity   Maximum    32.54             41.7
                           Minimum    21.43             30.10
                           Average    25.8268           35.2310

Table 12 – Comparative performance of proposed method with MOPSO by considering classification accuracy and architectural complexity for BUPA dataset during training.

                                      Proposed method   MOPSO
Classification accuracy    Maximum    73.9061           70.196
                           Minimum    65.87             62.324
                           Average    69.4466           65.4395
Architectural complexity   Maximum    26.67             38.23
                           Minimum    20.1              32.8
                           Average    23.5519           35.4032

Table 13 – Comparative performance of proposed method with MOPSO by considering classification accuracy and architectural complexity for the four datasets during testing.

         Proposed method                                    MOPSO
         Classification accuracy  Architectural complexity  Classification accuracy  Architectural complexity
IRIS     97.8                     38.4                      95.3                     50.02
WINE     98.31                    26.86                     92.34                    24.0
PIMA     72.1                     28.63                     65.3                     37.71
BUPA     70.3                     23.42                     69.105                   36.012

Fig. 8 – Architectural complexity in terms of percentage of PDs selected (a) Set 1, and (b) Set 2.

In the case of the PIMA and BUPA datasets, the performance of the proposed method is superior with respect to both criteria.

Table 13 presents the comparative results of the two classifiers on the four benchmark datasets during the testing stage. The results in Table 13 show that the proposed approach outperforms the MOPSO method.

The results of Table 13 confirm that both performance criteria (i.e., classification accuracy and architectural complexity) are genuinely part of classifier design. Hence simultaneous optimization of these two criteria is a fruitful research direction.

6. Conclusions

In this paper, we have proposed a multi-objective Pareto based particle swarm optimization method for simultaneous optimization of the architectural complexity and classification accuracy of the PNN. To support the proposed method we reviewed the state of the art in polynomial neural networks and multi-objective particle swarm optimization. Our method generates PDs for a single layer of the basic PNN model. MOPPSO selects the optimal set of PDs and input features, which are fed to the hidden layer of our method. Further,



the method implicitly optimizes the weight vectors between the hidden layer and the output layer. The set of non-dominated solutions is maintained in an external archive called EX_ARCHIVE, which is controlled by the nearest neighbor method. Further, we employed the k-medoid algorithm for fetching the non-dominated solution from EX_ARCHIVE to update the velocity. The experimental studies demonstrated that our method gives the user more flexibility in choosing a final solution from the Pareto front. The performance of the method is better in terms of complexity, which is also treated as one of the crucial aspects in the data mining community.

Important directions for future research include empirical comparisons of different methods and a sensitivity analysis of the proposed method with proper tuning of the parameters. Extensive simulation studies are required for further validation of the proposed method and are part of the authors' current research interests.

Acknowledgements

The authors would like to thank the Department of Science and Technology, Govt. of India, for financial support under the BOYSCAST fellowship 2007–2008, and the Brain Korea 21 project on Next Generation Mobile Software at Yonsei University, South Korea.

Appendix

The matrices X, Y and A of Section 2 can be defined as follows. Let us consider the equations for the first PD of the first layer, which receives only two input features p and q of the training data subset, given in Box I:

$$
\begin{aligned}
e_1 &= y_1 - (a_{10} + a_{11}x_{1p} + a_{12}x_{1q} + a_{13}x_{1p}x_{1q} + a_{14}x_{1p}^2 + a_{15}x_{1q}^2)\\
e_2 &= y_2 - (a_{10} + a_{11}x_{2p} + a_{12}x_{2q} + a_{13}x_{2p}x_{2q} + a_{14}x_{2p}^2 + a_{15}x_{2q}^2)\\
&\;\;\vdots\\
e_{TR} &= y_{TR} - (a_{10} + a_{11}x_{TRp} + a_{12}x_{TRq} + a_{13}x_{TRp}x_{TRq} + a_{14}x_{TRp}^2 + a_{15}x_{TRq}^2)
\end{aligned}
\qquad \text{(Box I)}
$$

In general, Eq. (3) can be written as

$$e_i = y_i - (a_{10} + a_{11}x_{ip} + a_{12}x_{iq} + a_{13}x_{ip}x_{iq} + a_{14}x_{ip}^2 + a_{15}x_{iq}^2), \quad \forall i,\ 1 \le i \le TR. \tag{A.1}$$



The equation for the least-squares fit is

$$E = e_1^2 + e_2^2 + \cdots + e_{TR}^2 = \sum_{i=1}^{TR} e_i^2. \tag{A.2}$$

In order to minimize the error, we differentiate Eq. (A.2) with respect to all the unknown coefficients and set the derivatives to zero (Box II). Writing $e_i = y_i - (a_{10} + a_{11}x_{ip} + a_{12}x_{iq} + a_{13}x_{ip}x_{iq} + a_{14}x_{ip}^2 + a_{15}x_{iq}^2)$ as above,

$$
\frac{\partial E}{\partial a_{10}} = -2\sum_{i=1}^{TR} e_i = 0,\quad
\frac{\partial E}{\partial a_{11}} = -2\sum_{i=1}^{TR} x_{ip}\,e_i = 0,\quad
\frac{\partial E}{\partial a_{12}} = -2\sum_{i=1}^{TR} x_{iq}\,e_i = 0,
$$
$$
\frac{\partial E}{\partial a_{13}} = -2\sum_{i=1}^{TR} x_{ip}x_{iq}\,e_i = 0,\quad
\frac{\partial E}{\partial a_{14}} = -2\sum_{i=1}^{TR} x_{ip}^2\,e_i = 0,\quad
\frac{\partial E}{\partial a_{15}} = -2\sum_{i=1}^{TR} x_{iq}^2\,e_i = 0.
$$

Expanding the above equations yields the normal equations given in Box III:

$$
\begin{aligned}
\sum_{i} y_i &= a_{10}\,TR + a_{11}\sum_i x_{ip} + a_{12}\sum_i x_{iq} + a_{13}\sum_i x_{ip}x_{iq} + a_{14}\sum_i x_{ip}^2 + a_{15}\sum_i x_{iq}^2\\
\sum_{i} y_i x_{ip} &= a_{10}\sum_i x_{ip} + a_{11}\sum_i x_{ip}^2 + a_{12}\sum_i x_{ip}x_{iq} + a_{13}\sum_i x_{ip}^2 x_{iq} + a_{14}\sum_i x_{ip}^3 + a_{15}\sum_i x_{ip}x_{iq}^2\\
\sum_{i} y_i x_{iq} &= a_{10}\sum_i x_{iq} + a_{11}\sum_i x_{ip}x_{iq} + a_{12}\sum_i x_{iq}^2 + a_{13}\sum_i x_{ip}x_{iq}^2 + a_{14}\sum_i x_{ip}^2 x_{iq} + a_{15}\sum_i x_{iq}^3\\
\sum_{i} y_i x_{ip}x_{iq} &= a_{10}\sum_i x_{ip}x_{iq} + a_{11}\sum_i x_{ip}^2 x_{iq} + a_{12}\sum_i x_{ip}x_{iq}^2 + a_{13}\sum_i x_{ip}^2 x_{iq}^2 + a_{14}\sum_i x_{ip}^3 x_{iq} + a_{15}\sum_i x_{ip}x_{iq}^3\\
\sum_{i} y_i x_{ip}^2 &= a_{10}\sum_i x_{ip}^2 + a_{11}\sum_i x_{ip}^3 + a_{12}\sum_i x_{ip}^2 x_{iq} + a_{13}\sum_i x_{ip}^3 x_{iq} + a_{14}\sum_i x_{ip}^4 + a_{15}\sum_i x_{ip}^2 x_{iq}^2\\
\sum_{i} y_i x_{iq}^2 &= a_{10}\sum_i x_{iq}^2 + a_{11}\sum_i x_{ip}x_{iq}^2 + a_{12}\sum_i x_{iq}^3 + a_{13}\sum_i x_{ip}x_{iq}^3 + a_{14}\sum_i x_{ip}^2 x_{iq}^2 + a_{15}\sum_i x_{iq}^4
\end{aligned}
$$

where all sums run over $i = 1, \ldots, TR$. This set of linear equations can be expressed in the matrix form

$$Y = XA \;\Rightarrow\; X^{T}Y = X^{T}(XA) \;\Rightarrow\; X^{T}Y = (X^{T}X)A.$$

Hence the coefficients of the PD can be determined by $A = (X^{T}X)^{-1}(X^{T}Y)$, where $A = [a_{10}, a_{11}, a_{12}, a_{13}, a_{14}, a_{15}]^{T}$,

$$Y = \left[\sum_i y_i,\ \sum_i y_i x_{ip},\ \sum_i y_i x_{iq},\ \sum_i y_i x_{ip}x_{iq},\ \sum_i y_i x_{ip}^2,\ \sum_i y_i x_{iq}^2\right]^{T},$$

and $X$ is the matrix given in Box IV:

$$X = \begin{bmatrix}
TR & \sum x_{ip} & \sum x_{iq} & \sum x_{ip}x_{iq} & \sum x_{ip}^2 & \sum x_{iq}^2\\
\sum x_{ip} & \sum x_{ip}^2 & \sum x_{ip}x_{iq} & \sum x_{ip}^2 x_{iq} & \sum x_{ip}^3 & \sum x_{ip}x_{iq}^2\\
\sum x_{iq} & \sum x_{ip}x_{iq} & \sum x_{iq}^2 & \sum x_{ip}x_{iq}^2 & \sum x_{ip}^2 x_{iq} & \sum x_{iq}^3\\
\sum x_{ip}x_{iq} & \sum x_{ip}^2 x_{iq} & \sum x_{ip}x_{iq}^2 & \sum x_{ip}^2 x_{iq}^2 & \sum x_{ip}^3 x_{iq} & \sum x_{ip}x_{iq}^3\\
\sum x_{ip}^2 & \sum x_{ip}^3 & \sum x_{ip}^2 x_{iq} & \sum x_{ip}^3 x_{iq} & \sum x_{ip}^4 & \sum x_{ip}^2 x_{iq}^2\\
\sum x_{iq}^2 & \sum x_{ip}x_{iq}^2 & \sum x_{iq}^3 & \sum x_{ip}x_{iq}^3 & \sum x_{ip}^2 x_{iq}^2 & \sum x_{iq}^4
\end{bmatrix},$$

with TR denoting the number of training samples and all sums running over $i = 1, \ldots, TR$.
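As a minimal sketch (ours), the coefficients of one PD can be obtained with a standard least-squares routine; the TR × 6 design matrix below is the one whose normal equations produce the X of Box IV:

import numpy as np

def fit_pd(xp, xq, y):
    """Least-squares coefficients a10..a15 of the quadratic PD
    y ~ a10 + a11*xp + a12*xq + a13*xp*xq + a14*xp**2 + a15*xq**2,
    i.e. the solution of A = (X^T X)^{-1} X^T Y."""
    X = np.column_stack([np.ones_like(xp), xp, xq, xp * xq, xp**2, xq**2])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
xp, xq = rng.random(50), rng.random(50)
y = 1 + 2 * xp - xq + 0.5 * xp * xq          # synthetic training targets
print(np.round(fit_pd(xp, xq, y), 3))        # ~[1, 2, -1, 0.5, 0, 0]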

References

[1] H.-P. Kriegel, et al., Future trends in data mining, DataMining and Knowledge Discovery 15 (1) (2007) 87–97.Springer Netherlands.

[2] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data min-ing to knowledge discovery: An overview, in: U.M. Fayyad,G. Piatetsky-Shaoiro, P. Smyth, R. Uthurusamy (Eds.), Ad-vances in Knowledge Discovery and Data Mining, AAAIPress, Menlo Park, CA, 1996, pp. 1–34.

[3] Y. Peng, G. Kou, Y. Shi, Z. Chen, A systemic framework forthe field of data mining and knowledge discovery, in: Proc.Workshops on the Sixth International Conference on DataMining Technique, ICDM’06, 2006, pp. 395–399.

[4] X. Wu, et al., Top 10 algorithms in data mining, Knowledgeand Information Systems 14 (2008) 1–37.

[5] J.-G. Lee, J. Han, X. Li, H. Gonzalez, TraClass: Trajectory clas-sification using hierarchical region-based and trajectory-based clustering, in: Proc. 2008 Int. Conf. on Very Large DataBase, VLDB’08, Auckland, New Zealand, Aug. 2008.

[6] D. Cai, X. He, J. Han, SRDA: An efficient algorithm forlarge scale discriminant analysis, IEEE Transactions onKnowledge and Data Engineering 20 (1) (2008) 1–12.

[7] J. Han, H. Cheng, D. Xin, X. Yan, Frequent pattern mining:Current status and future directions, Data Mining andKnowledge Discovery 15 (1) (2007) 55–86.

[8] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2ndedition, Wiley, New York, 2001.

[9] S. Theodoridis, K. Koutroumbas, Pattern Recognition, 3rdedition, Elsevier, 2006.

[10] C.M. Bishop, Pattern Recognition and Machine Learning,Springer, 2006.

[11] J.C. Russ, The Image Processing Handbook,ISBN: 0849372542, 2006.

[12] S.K. Pal, P. Mitra, Pattern Recognition Algorithms for DataMining, CRC Press, Boca Raton, FL, 2004.

[13] R.C. Gonzalez, R.E. Woods, Digital Image Processing, ISBN: 0-201-50803-6, 1992.

[14] W. Burger, M.J. Burge, Digital Image Processing: AnAlgorithmic Approach Using Java, Springer, 2007.

[15] C. Kwan, et al., A novel approach for spectral unmixing,classification, and concentration estimation of chemicaland biological agents, IEEE Transactions on Geoscience andRemote Sensing 44 (2) (2006) 409–419.

[16] S. Ma, J. Huang, Penalized feature selection and classifica-tion in bioinformatics, Brief Bioinform. 9 (5) (2008) 392–403.

[17] H.W. Ressom, et al., Classification algorithms for phenotypeprediction in genomics and proteomics, Front. Biosci. 13(2008) 691–708.

[18] R.L. Somorjai, et al., Class prediction and discovery usinggene microarray and proteomics mass spectrocopy data:Curses, caveats, cautions, Bioinformatics 19 (12) (2003)1484–1491.

[19] P. Baldi, S. Brunak, Bioinformatics: The Machine LearningApproach, 2nd edition, MIT Press, 2001.

[20] J.-H. Hong, S.-B. Cho, A probabilistic multi-class strategyof one-versus-rest support vector machines for cancerclassification, Neurocomputing 71 (2008) 3275–3281.

[21] S.-B. Cho, H.-H. Won, Cancer classification using ensembleof neural networks with multiple significant gene subsets,Applied Intelligence 26 (3) (2007) 243–250.

[22] S. Das, The polynomial neural network, InformationSciences 87 (4) (1995) 231–246.

[23] N. Nikolaev, H. Iba, Adaptive Learning of PolynomialNetworks, Springer, New York, 2006.

[24] B.B. Misra, S. Dehuri, P.K. Dash, G. Panda, A reduced andcomprehensible polynomial neural network for classifica-tion, Pattern Recognition Letters 29 (2008) 1705–1712.

[25] C.A.C. Coello, G.B. Lamont, D.A. Van Veldhuizen, Evolution-ary Algorithms for Solving Multi-objective Problems, 2ndedition, Springer, New York, 2007.

[26] S. Dehuri, S. Ghosh, A. Ghosh, Genetic algorithm foroptimization of multiple objectives in knowledge discoveryfrom large databases, in: A. Ghosh, et al. (Eds.), Multi-objective Evolutionary Algorithms for Knowledge Discoveryfrom Databases, Springer, 2008, pp. 1–22.


[27] S. Dehuri, et al., Application of elitist multi-objective geneticalgorithms for classification rule generation, Applied SoftComputing Journal 8 (1) (2008) 477–487.

[28] A. Ghosh, S. Dehuri, S. Ghosh (Eds.), Multi-objectiveEvolutionary Algorithms for Knowledge Discovery inDatabases, Springer, 2008.

[29] K. Deb, Multi-objective Optimization Using EvolutionaryAlgorithms, Wiley, Chichester, London, 2001.

[30] Ma G.C. Tapia, C.A.C. Coello, Applications of multi-objectiveevolutionary algorithms in economics and finance: Asurvey, in: Proceedings of IEEE Congress on EvolutionaryComputation, CEC’ 2007, IEEE Press, Singapore, 2007,pp. 532–539.

[31] D.E. Goldberg, Genetic Algorithms in Search, Optimizationand Machine Learning, Kluwer Academic Publishers,Boston, MA, 1989.

[32] D.B. Fogel, Evolutionary Computation: Toward a NewPhilosophy of Machine Intelligence, 3rd edition, IEEE Press,Piscataway, NJ, 2006.

[33] A.E. Eiben, J.E. Smith, Introduction to Evolutionary Comput-ing, Corr. 2nd Printing, Springer, 2007.

[34] J. Kennedy, R. Eberhart, Swarm Intelligence, MorganKaufmann Academic Press, 2001.

[35] M. Dorigo, M. Birattari, Swarm intelligence, Scholarpedia 2(9) (2007) 1462.

[36] M. Clerc, Particle Swarm Optimization, ISTE, 2006.
[37] M.G. Hinchey, R. Sterritt, C. Rouff, Swarms and swarm intelligence, IEEE Computer 40 (4) (2007) 111–113.
[38] J.-B. Weldner, Nanocomputers and Swarm Intelligence, ISTE, 2007.
[39] A.P. Engelbrecht, Fundamentals of Computational Swarm Intelligence, John Wiley & Sons, 2005.
[40] A.P. Engelbrecht (Ed.), Computational Intelligence: An Introduction, John Wiley & Sons, England, 2002.
[41] R.C. Eberhart, Y. Shi, Comparison between genetic algorithms and particle swarm optimization, in: V.W. Porto, N. Saravanan, D. Waagen, A.E. Eiben (Eds.), Evolutionary Programming VII, Springer, 1998, pp. 611–616.

[42] N.L. Nikolaev, H. Iba, Automated discovery of polynomialsby inductive genetic programming, in: J. Zutkow, J. Ranch(Eds.), Principles of Data Mining and Knowledge Discovery(PKDD’99), Springer, Berlin, 1999, pp. 456–462.

[43] A.G. Ivakhnenko, A.A. Zholnarskiy, Estimating the coeffi-cients of polynomials in parametric GMDH algorithms bythe improved instrumental variables method, Journal of Au-tomation and Information Sciences c/c of Avtomatika 25 (3)(1992) 25–32.

[44] A.G. Ivakhnenko, Polynomial theory of complex systems,IEEE Transactions on Systems, Man and Cybernetics-Part A1 (4) (1971) 364–378.

[45] A.G. Ivakhnenko, H.R. Madala, Inductive Learning Algorithmfor Complex Systems Modelling, CRC Inc., Boca Raton, 1994.

[46] V.S. Stepashko, GMDH algorithms as a basis for automatingthe process of modeling from empirical data, Soviet Journalof Automation and Information Sciences c/c of Avtomatika21 (4) (1988) 42–52.

[47] A.G. Ivakhnenko, The group method of data handling —A rival of the method of stochastic approximation, SovietAutomatic Control c/c of Avtomatika 1 (3) (1968) 43–55.

[48] A.G. Ivakhnenko, The group method of data handlingin prediction problems, Soviet Automatic Control c/c ofAvtomatika 9 (6) (1976) 21–30.

[49] J.A. Muller, A.G. Ivakhnenko, F. Lemke, GMDH algorithms forcomplex systems modeling, Mathematical and ComputerModeling of Dynamical Systems 4 (4) (1998) 275–316.

[50] S.-K. Oh, W. Pedrycz, Self organizing polynomial neuralnetworks based on polynomial and fuzzy polynomialneurons: Analysis and design, Fuzzy Sets and Systems 142(2004) 163–198.

[51] E.F. Vasechkina, V.D. Yarin, Evolving polynomial neuralnetwork by means of genetic algorithm: Some applicationexamples, Complexity International 9 (2001) 1–13.

[52] V. Schetinin, Polynomial neural networks learnt to classifyEEG signals, NIMIA-SC2001, 2001.

[53] B.-J. Park, W. Pedrycz, S.-K. Oh, Fuzzy polynomial neuralnetworks: Hybrid architectures of fuzzy modeling, IEEETransactions on Fuzzy Systems 10 (5) (2002) 607–621.

[54] S.-K. Oh, W. Pedrycz, The design of self-organizingpolynomial neural networks, Information Sciences 141(2002) 237–258.

[55] T.I. Aksyonova, V.V. Volkovich, I.V. Tetko, Robust polynomialneural networks in quantitative-structure activity relation-ship studies, Systems Analysis Modeling Simulation 43 (10)(2003) 1331–1339.

[56] S.-K. Oh, et al., A new approach to self-organizing poly-nomial neural networks by means of genetic algorithms,in: F. Yin, J. Wang, C. Guo (Eds.), in: LNCS, vol. 3173, 2004,pp. 174–179.

[57] S.-K. Oh, W. Pedrycz, H.-S. Park, Multi-layer hybrid fuzzypolynomial neural networks: A design in the frameworkof computational intelligence, Neurocomputing 64 (2005)397–431.

[58] D. Kim, G.-T. Park, A new design of polynomial neuralnetworks in the framework of genetic algorithms, IEICETransactions on Information and System E89-D (8) (2006)2429–2438.

[59] P. Liatsis, A. Foka, J.Y. Goulermas, L. Mandic, Adaptivepolynomial neural networks for time series forecasting,in: Proc. 49th International Symposium ELMAR-2007, 2007,pp.35–39.

[60] M.H. Fazel Zarandi, et al., Fuzzy polynomial neural networksfor approximation of the compressive strength of concrete,Applied Soft Computing Journal 8 (2008) 488–498.

[61] H.-J. Park, J.-S. Lim, J.M. Kang, Applications of advancedpolynomial neural networks for prediction of multi-wellreservoir performance, Energy Sources, Part A 30 (2008)452–463.

[62] B.B. Misra, S. Dehuri, P.K. Dash, G. Panda, Reducedpolynomial neural swarm net for classification task in datamining, in: Proc. IEEE World Congress on ComputationalIntelligence, Honk Kong, 2008.

[63] J. Kennedy, R.C. Eberhart, Particle swarm optimization,in: Proceedings of the 1995 International Conference onNeural Networks, vol. 4, IEEE Press, Piscataway, NJ, 1995 pp.1942–1948.

[64] J. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, MorganKaufmann Publishers, San Francisco, CA, 2004.

[65] C.W. Reynolds, Flocks, herds and schools: A distributedbehavioral model, Computer Graphics 21 (4) (1987) 25–34.

[66] F. Heppner, U. Grenander, A stochastic nonlinear model forcoordinated bird flocks, in: S. Krasner (Ed.), The Ubiquity ofChaos, AAAS Publications, Washington, DC, 1990.

[67] R. Rucker, Seek! Four Walls Eight Windows, New York, 1999.
[68] R. Mendes, J. Kennedy, J. Neves, The fully informed particle swarm: Simpler may be better, IEEE Transactions on Evolutionary Computation 8 (3) (2004) 204–210.

[69] C.A.C. Coello, M.S. Lechunga, MOPSO: A proposal for multi-ple objective particle swarm optimization, in: Proceedings ofthe 2002 Congress on Evolutionary Computation, Part of the2002 IEEE World Congress on Computational Intelligence,IEEE Press, Hawaii, 2002, pp. 1051–1056.

[70] J.E. Fieldsend, S. Singh, A multi-objective algorithm basedupon particle swarm optimization, an efficient datastructure and turbulence, in: Proc. UK Workshop onComputational Intelligence (UKCI’02), Birmingham, UK,2002, pp. 37–44.


[71] X. Hu, R. Eberhart, Multi-objective optimization using dy-namic neighborhood particle swarm optimization, in: Pro-ceedings of the 2002 Congress on evolutionary Computa-tion, Part of the 2002 IEEEWorld Congress on ComputationalIntelligence, IEEE Press, Hawaii, 2002, pp. 1677–1681.

[72] K.E. Parsopoulos, M.N. Vrahatis, Particle swarm optimiza-tion method in multi-objective problems, in: Proc. 2002 ACMSymposium on Applied Computing, SAC 2002, 2002, pp.603–607.

[73] K.C. Tan, E.F. Khor, T.H. Lee, Multi-objective EvolutionaryAlgorithms and Applications, Springer-Verlag, London,2005.

[74] A. Abraham, L.C. Jain, R. Goldberg (Eds.), EvolutionaryMulti-objective Optimization Theoretical Advances andApplications, Springer-Verlag, London, 2005.

[75] U. Baumgartner, Ch. Magele, W. Renhart, Pareto optimalityand particle swarm optimization, IEEE Transactions onMagnetics 40 (2) (2004) 1172–1175.

[76] S. Suresh, P.B. Sujit, A.K. Rao, Particle swarm optimizationapproach for multi-objective composite box-beam design,Composite Structures 81 (2007) 598–605.

[77] K. Miettinen, Nonlinear Multi-objective Optimization,Kluwer Academic Publishers, Boston, MA, 1999.

[78] C.A.C. Coello, A comprehensive survey of evolutionarybased multi-objective optimization techniques, Knowledgeand Information Systems: An International Journal 1 (3)(1999) 269–308.

[79] K.E. Parsopaulos, D.K. Tasoulis, M.N. Vrahatis, Multi-objective optimization using parallel vector evaluatedparticle swarm optimization, in: Proceedings of the IASTEDInternational Conference on Artificial Intelligence andApplications, AIA 2004, vol. 2, ACTA Press, Innsbruck,Austria, 2004, pp. 823–828.

[80] J.D. Schaffer, Multiple objective optimization with vectorevaluated genetic algorithms, Ph.D. Thesis, VanderbiltUniversity, 1984.

[81] J.D. Schaffer, Multiple objective optimization with vectorevaluated genetic algorithms, in: Genetic Algorithms andTheir Applications: Proceedings of the First InternationalConference on Genetic Algorithms, Lawrence Erlbaum,1985, pp. 23–100.

[82] C.-K. Chow, H.-T Tsui, Autonomous agent responselearning by a multi-species particle swarm optimization,in: Congress on Evolutionary Computation, CEC’ 2004, vol.1, IEEE Press, Portland, OR, USA, 2004, pp. 778–785.

[83] P. Koduru, S. Das, S.M. Welch, Multi-objective hybridPSO using ε-fuzzy dominance, in: Proc. Genetic andEvolutionary Computation Conference, London, England,United Kingdom, 2007.

[84] S. Janson, D. Merkle, M. Middendorf, Molecular docking withmulti-objective particle swarm optimization, Applied SoftComputing Journal 8 (2008) 666–675.

[85] Q. Zhang, S. Xue, An improved multi-objective particleswarm optimization algorithm, in: Proceedings of ISICA2007, in: LNCS, vol. 4683, 2007, pp. 372–381.

[86] S.N. Omkar, et al., Vector evaluated particle swarm opti-mization (VEPSO) for multi-objective design optimization ofcomposite structures, Computers and Structures 86 (2008)1–14.

[87] S. Mostaghim, J. Branke, H. Schmeck, Multi-objectiveparticle swarm optimization on computer grids, in:Proc. Genetic and Evolutionary Computation Conference,London, England, United Kingdom, 2007, pp. 869–875.

[88] J. Moore, R. Chapman, Application of particle swarmto multi-objective optimization, Department of ComputerScience and Software Engineering, Auburn University, 1999.

[89] T. Ray, K.M. Liew, A swarm metaphor for multi-objectivedesign optimization, Engineering Optimization 34 (2) (2002)141–153.

[90] C.A.C. Coello, G.T. Polido, M.S. Lechuga, Handling multipleobjectives with particle swarm optimization, IEEE Transac-tions on Evolutionary Computation 8 (3) (2004) 256–279.

[91] G.T. Pulido, C.A.C. Coello, Using clustering techniques toimprove the performance of a particle swarm optimizer,in: Proceedings of the Genetic and Evolutionary Computa-tion Conference, in: LNCS, vol. 3102, Springer-Verlag, 2004,pp. 225–237.

[92] D. Srinivasan, T.-H. Seow, Particle swarm inspired evolution-ary algorithm (PS-EA) for multi-objective optimization prob-lem, in: Congress on Evolutionary Computation, vol. 3, IEEEPress, Australia, 2003, pp. 2292–2297.

[93] S. Mostaghim, J. Teich, Strategies for finding good localguides in multi-objective particle swarm optimization(MOPSO), in: Proceedings of IEEE Swarm IntelligenceSymposium, IEEE Press, Indiana, 2003, pp. 26–33.

[94] S. Mostaghim, J. Teich, The role of ε-dominance inmulti-objective particle swarm optimization methods,in: Congress on Evolutionary Computation, vol. 3, IEEE Press,Canberra, Australia, 2003, pp. 1764–1771.

[95] M. Laumanns, L. Thiele, K. Dev, E. Zitzler, Combiningconvergence and diversity in evolutionary multi-objectiveoptimization, Evolutionary Computation 10 (3) (2002)263–282.

[96] S. Mostaghim, J. Teich, Covering Pareto optimal fronts bysubswarms in multi-objective particle swarm optimization,in: Congress on Evolutionary Computation, vol. 2, IEEE Press,Portland, OR, USA, 2004, pp. 1404–1411.

[97] T.B. -Beielstein, et al., Particle swarm optimizers forPareto optimization with enhance archiving techniques,in: Congress on Evolutionary Computation, vol. 3, IEEE Press,Canberra, Australia, 2003, pp. 1780–1787.

[98] X. Li, A non-dominated sorting particle swarm optimizer formulti-objective optimization, in: Proceedings of the Geneticand Evolutionary Computation Conference, in: LNCS, vol.3102, Springer-Verlag, 2004, pp. 117–128.

[99] K. Deb, et al., A fast and elitist multi-objective geneticalgorithm: NSGA II, IEEE Transactions on EvolutionaryComputation 6 (2) (2002) 182–197.

[100] M.R. Sierra, C.A.C. Coello, Improving PSO based multi-objective optimization using crowding, mutation, and ε-dominance, in: Third International Conference on Evolu-tionary Multi-criterion Optimization, in: LNCS, vol. 3410,Springer-Verlag, Mexico, 2005, pp. 505–519.

[101] J.E. Alvarez-Benitez, et al., A MOPSO algorithm based exclusively on Pareto dominance concepts, in: Third International Conference on Evolutionary Multi-criterion Optimization, in: LNCS, vol. 3410, Springer-Verlag, Mexico, 2005, pp. 459–473.

[102] S.L. Ho, et al., A particle swarm optimization based methodfor multi-objective design optimizations, IEEE Transactionson Magnetics 41 (5) (2005) 1756–1759.

[103] M.A. Villalobos-Arias, et al., A proposal to use stripes to maintain diversity in a multi-objective particle swarm optimizer, in: Proceedings of the 2005 IEEE Swarm Intelligence Symposium, IEEE Press, Pasadena, California, USA, 2005, pp. 22–29.

[104] M. Salazar-Lechuga, J. Rowe, Particle swarm optimization and fitness sharing to solve multi-objective optimization problems, in: Congress on Evolutionary Computation (CEC'2005), IEEE Press, Edinburgh, Scotland, UK, 2005, pp. 1204–1211.

[105] D.E. Goldberg, J. Richardson, Genetic algorithms with shar-ing for multi-modal function optimization, in: Proceedingsof the Second International Conference on Genetic Algo-rithms, Lawrence Erlbaum Associates, 1987, pp. 41–49.


[106] C.R. Raquel Jr., P.C. Naval, An effective use of crowdingdistance in multi-objective particle swarm optimization,in: Proceedings of the Genetic and Evolutionary Computa-tion Conference (GECCO- 2005), ACM Press, Washington, DC,USA, 2005, pp. 257–264.

[107] B. Zhao, Y. Cao, Multiple objective particle swarm optimiza-tion technique for economic load dispatch, Journal of Zhe-jiang University SCIENCE 6 (5) (2005) 420–427.

[108] J.D. Knowles, D.W. Corne, Approximating the non-dominated front using the Pareto archived evolutionstrategy, Evolutionary Computation 8 (2) (2000) 149–172.

[109] S. Janson, D. Merkle, A new multi-objective particleswarm optimization algorithm using clustering appliedto automated docking, in: Hybrid Metaheuristics, SecondInternational Workshop, HM 2005, in: LNCS, vol. 3636,Barcelona, Spain, 2005, pp. 128–142.

[110] C.-S. Tsou, et al., An improved particle swarm Paretooptimizer with local search and clustering, in: Proceedingsof 6th International Conference on Simulated Evolution andLearning, in: LNCS, vol. 4247, 2006, pp. 400–407.

[111] E.F.G. Goldbarg, et al. Particle swarm optimization forthe bi-objective degree-constrained minimum spanningtree, in: Proc. IEEE Congress on Evolutionary Computation,Vancouver, BC, Canada, 2006, pp. 420–427.

[112] A.M. Baltar, D.G. Fontane, A generalized multi-objectiveparticle swarm optimization solver for spreadsheet models:Application to water quality, Hydrology Days, 2006.

[113] S. Xu, Y. Rahmat-Samii, Multi-objective Particle Swarm Optimization for High Performance Array and Reflector Antennas, IEEE Press, 2006, pp. 3292–3296.

[114] M.K. Gill, et al., Multi-objective particle swarm optimizationfor parameter estimation in hydrology, Water ResourcesResearch 42 (2006) 1–14.

[115] H. Liu, et al., Variable neighborhood particle swarm opti-mization for multi-objective flexible job-shop schedulingproblems, in: Proceedings of 6th International Conferenceon Simulated Evolution and Learning, in: LNCS, 4247, 2006,pp. 197–204.

[116] Y. Niu, L. Shen, An adaptive multi-objective particleswarm optimization for color image fusion, in: Proc.6th International Conference on Simulated Evolutionand Learning, in: LNCS, vol. 4247, Hefei, China, 2006,pp. 473–480.

[117] M. Koppen, C. Veenhuis, Multi-objective particle swarmoptimization by fuzzy-Pareto-dominance meta-heuristic,International Journal of Hybrid Intelligent Systems 3 (2006)179–186.

[118] P.K. Tripathi, et al., Multi-objective particle swarm optimiza-tion with time variant inertia and acceleration coefficients,Information Sciences 177 (2007) 5033–5049.

[119] M.J. Reddy, D.N. Kumar, Multi-objective particle swarmoptimization for generating optimal trade-offs in reservoiroperation, Hydrological Processes 21 (2007) 2897–2909.

[120] A.R. Rahimi-Vahed, S.M. Mirghorbani, A multi-objectiveparticle swarm for a flow shop scheduling problem, Journalof Comb. Optim. 13 (2007) 79–102.

[121] M.A. Abido, Two-level of non-dominated solutions approachto multi-objective particle swarm optimization, in: Proc. Ge-netic and Evolutionary Computation Conference, London,England, United Kingdom, 2007, pp. 1–8.

[122] M. Mahfouf, M.-Y. Chen, D.A. Linkens, Adaptive weightedparticle swarm optimization for multi-objective optimaldesign of alloy steels, in: Parallel Problem Solving fromNature — PPSN VIII, in: LNCS, vol. 3242, Springer Verlag,Birmingham, UK, 2004, pp. 762–771.

[123] Z.X. -Hua, M.H. -Yun, J.L. -Cheng, Intelligent particle swarmoptimization in multi-objective optimization, in: Congresson Evolutionary Computation, CEC’2005, IEEE Press, Edin-burgh, Scotland, UK, 2005, pp. 714–719.

[124] M. Nakamura, et al. A multi-objective particle swarm op-timization incorporating design sensitivities, in: Proc. 11thAIAA/ISSMO Multidisciplinary Analysis and OptimizationConference, Virginia, 2006.

[125] X. Yao, Evolving artificial neural networks, Proceedings ofthe IEEE 87 (1999) 1423–1447.

[126] H. Abbass, An evolutionary artificial neural networksapproach to breast cancer diagnosis, Artificial Intelligencein Medicine 25 (3) (2002) 265–281.

[127] J.G. Lin, Three methods for determining Pareto optimalsolutions of multiple objective problems, in: Y.C. Ho,S.K. Mitter (Eds.), Directions in Large Scale Systems, PlenumPress, New York, 1975, pp. 117–138.

[128] J.G. Lin, Multiple objective problems: Pareto optimalsolutions by method of proper equality constraints, IEEETransactions on Automatic Control AC-21 (5) (1976) 641–650.

[129] M. Ehrgott, Multi-criteria Optimization, Springer, 2005.
[130] T.M. Cover, P.E. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory IT-13 (1968) 21–27.
[131] J.H. Freidman, F. Baskett, L. Shustek, An algorithm for finding nearest neighbors, IEEE Transactions on Computers C-24 (10) (1975) 1000–1006.

[132] S. Cha, S.N. Srihari, A fast nearest neighbor search algorithmby filtration, Pattern Recognition 35 (2) (2002) 515–525.

[133] J. Han, M. Kamber, Data Mining: Concepts and Techniques,2nd edition, Morgan Kaufmann Publishers, 2006.

[134] L. Kaufmann, P.J. Rousseeuw, Clustering by means ofmediod, in: Statistical Data Analysis Based on L1 Norm,Elsevier, 1987, pp. 405–416.

[135] S.-C. Chu, J.F. Roddick, J.S. Pan, An efficient k-medoidsbased algorithm using previous medoid index, triangularinequality elimination criteria and partial distance search,in: Proceedings of International Conference on DataWarehousing and Knowledge Discovery, DaWaK, Springer-verlag, London, 2002, pp. 63–72.

[136] C.L. Blake, C.J. Merz, UCI repository of machine learningdatabases. http://www.ics.uci.edu/~mlearn/MLRepository.