

A Hybrid Elitist Pareto-based Coordinate Exchange Algorithm for Constructing Multi-criteria Optimal Experimental Designs

Yongtao Cao · Byran J. Smucker · Timothy J. Robinson


Abstract This paper presents a new Pareto-based coordinate exchange algorithm for populating or approximating the true Pareto front for multi-criteria optimal experimental design problems that arise naturally in a range of industrial applications. This heuristic combines an elitist-like operator inspired by evolutionary multi-objective optimization algorithms with a coordinate exchange operator that is commonly used to construct optimal designs. Benchmarking results from both a two-dimensional and a three-dimensional example demonstrate that the proposed hybrid algorithm can generate highly reliable Pareto fronts with less computational effort than existing procedures in the statistics literature. The proposed algorithm also utilizes a multi-start operator, which makes it readily parallelizable for high performance computing infrastructures.

Keywords Multi-criteria optimal experimental design · Pareto front · hybrid algorithm

Yongtao Cao
Department of Mathematics, Indiana University of Pennsylvania, Indiana, PA 15705, USA
Tel.: +1 724-357-4767
Fax: +1 724-357-7908
E-mail: [email protected]

Byran J. Smucker
Department of Statistics, Miami University, Oxford, OH USA

Timothy J. Robinson
Department of Statistics, University of Wyoming, Laramie, WY USA

1 Introduction

There is a general and growing acknowledgment within the experimental design community that in realistic applications there are typically multiple, conflicting objectives. It is well known that an experimental design which is optimal with respect to a single criterion may be arbitrarily bad with respect to other criteria and thus may be unacceptable for the decision maker. The Pareto optimality approach to multi-criteria optimal experimental design develops a set of trade-off designs by simultaneously evaluating several conflicting design criteria. The collection of designs that provide the best tradeoffs among objectives is called the Pareto set, and the collection of criteria values of the designs in the Pareto set is known as the Pareto front (PF); this will be more carefully defined later. In real-world applications, the true PF is generally unknown and hence the major challenge of solving multi-criteria optimal experimental design problems involves identifying the true PF, or at least a good approximation of it, in an efficient and reliable manner.

Three algorithms have been used in the statistics literature for generating Pareto fronts of experimental designs. First, Park (2009) considered the application of a multi-objective genetic algorithm (MOGA) for generating optimal second-order response surface designs. MOGA is a heuristic search method that begins by generating a large number of random designs (e.g. 10000) and then pairing them to obtain a set of ‘parents’ (5000 in our example). These parents then generate ‘offspring’ by simulating biological evolution and heredity using stochastic operators such as crossover, mutation, and elite-preserving cloning to optimize objective functions that assess the fitness of each individual design. This process continues for a specified number of generations to obtain an approximated Pareto front. A drawback of the use of MOGA for populating a Pareto front is that the algorithm’s performance depends heavily upon how the stochastic operators are tuned. Since this tuning is problem-specific, it is difficult to imagine routine use of a genetic algorithm in practice. It should be noted that the tuning issue is not unique to the construction of a Pareto front, but is a general issue with genetic algorithms (Limmun et al., 2013). Previous work in the experimental design community (Borkowski, 2003; Sexton et al., 2006) suggests augmenting the genetic algorithm with a local grid search on the generated solutions. While this suggestion may be practical for some situations in single-objective design optimization, its implementation for populating the Pareto front would result in a great deal of computational inefficiency.

Secondly, Lu et al. (2011) proposed a Pareto aggregating point exchange (PAPE) algorithm for providing multi-criteria optimal screening designs. The PAPE algorithm is a generalization of classical point-exchange algorithms (Fedorov, 1972; Cook and Nachtsheim, 1980) that starts by randomly generating a set of starting designs. Each starting design is a random sample of design points from a user-specified candidate set of points. For a random starting design, each point is exchanged sequentially with every single point in the candidate set, and designs that are currently Pareto optimal are stored in an archive so that an approximate Pareto front is constructed. The Pareto fronts resulting from each starting design are then combined to produce a final approximated front. One potential difficulty in the PAPE structure is that it is not always easy to construct such a candidate list (highly constrained mixture designs, for instance), but even more troubling are the computational inefficiencies that the PAPE engenders. For design problems with more than a few factors, or in applications which have large candidate lists, the PAPE is computationally inefficient if not prohibitive.

The third method for generating Pareto fronts in the experimental design literature involves a variation of the coordinate exchange algorithm (Meyer and Nachtsheim, 1995), a procedure commonly used in commercial optimal design implementations. Sambo et al. (2014) developed a coordinate-exchange two-phase local search algorithm (CE-TPLS) by combining the coordinate exchange operator and the traditional weighted-sum design approach. In the first phase, this algorithm generates designs which are optimal with respect to each single criterion. The optimal designs generated in phase 1 are stored in the current Pareto front, say A. In the second phase, the algorithm randomly selects a starting design from A and then creates a new objective function via a weighted sum of the criteria based on a randomly generated set of weights, and finds the design that results in the optimum value of this weighted average. If this design is non-dominated, it is stored in A along with those designs generated in phase 1. After repeating the second phase by selecting weights a user-specified number of times, a Pareto optimal set is obtained. Though this algorithm uses the coordinate exchange operator and thus avoids the difficulties associated with candidate lists, changing a multi-objective optimization problem into a single-objective one via the weighted-sum method does present some challenges. These include certain cases in which the Pareto front cannot be fully explored (Das and Dennis, 1997), illustrated in this article by the two-criteria example of Section 4.2.1.

This paper aims to address the issues discussed above by developing heuristics that (1) do not rely on tuning parameters, in contrast to MOGA and CE-TPLS; (2) can effectively explore the Pareto front, in contrast to MOGA, PAPE (because it relies on a candidate set of points) and at times CE-TPLS; and (3) possess relative computational efficiency, in contrast to PAPE. We begin, in Section 2, by providing a brief survey of the multi-objective optimization algorithm literature. Instead of being exhaustive, our review focuses on a discussion of the general framework for algorithm development. We then outline a framework for algorithm hybridization that has been recently emphasized in the multi-objective optimization community. We suggest that this framework can lay a foundation upon which high performance hybrid algorithms can be effectively implemented in the setting of multiple-criteria optimal design. The remainder of the manuscript is organized as follows. Section 3 presents the proposed Pareto-based coordinate exchange operator along with the hybridized elitist coordinate exchange algorithms for populating Pareto fronts. Section 4 presents two benchmark examples in the context of designing experiments for screening factors and illustrates the reliability and efficiency of the proposed algorithms versus existing Pareto front populating algorithms in the optimal design literature. In Section 5, the proposed algorithm is applied to the design of experiments for second-order response surface models. Section 6 offers conclusions and ideas for future work.

2 A framework for building hybrid Pareto front generating algorithms

A survey of the optimization literature finds that heuristics (or algorithms; we use these terms interchangeably in this paper) for optimization problems can typically be categorized as enumerative, deterministic, or stochastic. Within the context of optimal experimental design, an enumerative approach would involve evaluating all possible designs in terms of the criterion of interest and selecting the design with the best criterion value. Clearly this approach is infeasible for all but the smallest of problems. A deterministic design optimization algorithm is one in which, for a particular starting design, one always obtains the same (local) optimum. A stochastic design optimization algorithm introduces one or more probabilistic components to the search, so that a particular starting design may produce a variety of solutions. As mentioned in the introduction, there are three existing algorithms in the field of multiple-criteria optimal experimental design: point exchange, coordinate exchange and genetic algorithms. Point exchange and coordinate exchange algorithms are common deterministic heuristics used in searching for optimal designs, though the implementation of exchange algorithms in practice generally involves a stochastic framework imposed outside of the algorithm, via multiple random starts, in order to increase the probability that the global optimum is attained. MOGA is only one of many stochastic optimization heuristics that could be used for generating multiple-criteria optimal experimental designs.

While deterministic and stochastic algorithms have been shown to be successful for specific problems, it is unrealistic to expect that one or the other will be computationally efficient in handling all design problems (see the No Free Lunch Theorems of Wolpert and Macready (1997) for instance). Thus, recent work in the multi-objective optimization community suggests that the hybridization of deterministic and stochastic approaches can result in noticeable gains in terms of computational efficiency (see for instance Ajith et al. (2007); Mashwani (2011); and Sindhya et al. (2013)). Hybridization can be realized in several ways: (1) hybridizing different search methods (e.g. Sindhya et al. (2013) presented a hybrid framework for evolutionary multi-objective optimization which involves combining 6 different algorithms) or search operators from different algorithms (e.g. Wang et al. (2005) proposed a hybrid genetic algorithm that used both quantum-inspired operators as well as operators from the classical genetic algorithm); (2) hybridizing search and updating methods (e.g. Elhossini et al. (2010) adapted the particle swarm algorithm for multi-objective optimization); or (3) hybridizing different methods in different search phases (e.g. Yang et al. (2009) developed a hybrid algorithm that has three phases and different search methods are used in each). It is this hybridization framework that we explore when developing the Pareto front populating algorithms presented in this paper. Specifically, we exploit a modified Pareto-based coordinate exchange operator found in deterministic design optimization algorithms and combine it with an elitist-like operator inspired by evolutionary algorithms. We call it “elitist-like” for reasons we will clarify when we describe the operator in more detail.

3 Elitist Pareto-based coordinate exchange algorithm

Research in optimal design includes two fairly distinct approaches. The approximate, or asymptotic, design approach (e.g. Kiefer (1959, 1961); Kiefer et al. (1959); Farrell et al. (1967)) laid the mathematical foundation for optimal design by treating experimental designs as probability measures, assigning a probability distribution to each design point. Therefore, approximate theory does not require an integer-valued number of trials at any design point, which is why the resulting designs are called approximate. When the number of trials for any design point is an integer, the design is called exact. Thus, designs that are used by researchers in practice are always exact designs, and due to this we focus on exact designs in this paper. However, as soon as we do that, the optimization becomes much more difficult. This difficulty leads to the heuristics that are customary in exact optimal design.

In what follows, we first present the multi-criteria optimal design problem in the language of Pareto dominance. We then proceed to show the development of a Pareto-based coordinate exchange operator, an enhanced elitism operator, and how to combine the two into an integrated algorithm with a multi-start operator.

3.1 Pareto optimal experimental design

Let $\xi$ denote a design. In the context of exact designs with $N$ runs and $k$ factors, we treat $\xi$ as an $N \times k$ matrix. Assume there are $C \ge 2$ criteria of interest that are to be maximized. Denote the criteria values as a vector, $\mathbf{f}(\xi) = (f_1(\xi), \ldots, f_C(\xi))^T$, and let $\Xi$ denote the space of all possible designs of a given size and model specification. A design $\xi_1 \in \Xi$ is said to dominate design $\xi_2 \in \Xi$ if both of the following conditions are true:

(1) $f_j(\xi_1) \ge f_j(\xi_2)$ for all $j \in \{1, \ldots, C\}$.
(2) $f_j(\xi_1) > f_j(\xi_2)$ for at least one $j \in \{1, \ldots, C\}$.

Notationally, we write $\xi_1 \succ \xi_2$. If only condition (1) is true, we say $\xi_1$ weakly dominates $\xi_2$ and write $\xi_1 \succeq \xi_2$. The set of all non-dominated designs $\xi \in \Xi$ is called the Pareto optimal set. The set of $\mathbf{f}(\xi) = (f_1(\xi), \ldots, f_C(\xi))^T$ corresponding to the designs in the Pareto set is termed the Pareto front. As an illustration of a Pareto front, see the dark points that are displayed in Figure 1, in which two criteria $f_1$ and $f_2$ are to be maximized.
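The two dominance conditions translate almost directly into code. The following Python sketch is illustrative only: the function names and the example criterion vectors are assumptions introduced here, not taken from the paper, and all criteria are assumed to be maximized.

```python
def weakly_dominates(f1, f2):
    """Condition (1): f1 is at least as good as f2 on every criterion."""
    return all(a >= b for a, b in zip(f1, f2))


def dominates(f1, f2):
    """Conditions (1) and (2): never worse, and strictly better at least once."""
    return weakly_dominates(f1, f2) and any(a > b for a, b in zip(f1, f2))


def pareto_front(vectors):
    """Return the non-dominated criterion vectors, dropping exact duplicates."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


# Hypothetical criterion vectors (f1, f2); both criteria are maximized.
points = [(1.0, 5.0), (2.0, 4.0), (2.0, 3.0), (3.0, 1.0), (0.5, 0.5)]
front = pareto_front(points)
print(front)
```

Here (2.0, 3.0) is removed because (2.0, 4.0) dominates it, and (0.5, 0.5) is removed because every other point dominates it; the three remaining vectors are mutually non-dominated.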

The first goal when there are multiple conflicting design requirements is to find the Pareto optimal designs along the Pareto front. Secondly, the resulting Pareto front can be analyzed to learn about a diverse set of solutions while considering the trade-offs or inter-dependencies among design criteria. A good compromise design can be chosen from those lying on the Pareto front according to the experimenter’s preferences in light of these trade-offs.

3.2 Pareto-based coordinate exchange operator

Fig. 1 An example of a Pareto front for maximizing two criteria $f_1$ and $f_2$ (horizontal axis: $f_1$; vertical axis: $f_2$; legend: Pareto, Dominated).

For purposes of the present discussion, we will assume that each of the $k$ factors has two levels, $-1$ and $1$. This assumption does not impact the general implementation of our algorithm but only serves as an aid in conceptualizing the approach.

(1) For a given $N$-run design with $k$ factors, $\xi$, evaluate the user-specified $C$-dimensional criterion vector, $\mathbf{f}(\xi) = (f_1(\xi), \ldots, f_C(\xi))^T$.

(2) Initialize $P$ and $PF$ as null sets; then add $\xi$ to $P$ and add $\mathbf{f}(\xi)$ to $PF$.

(3) Pareto-based coordinate exchange operator. For $i = 1$ to $N$ and $j = 1$ to $k$, swap the $(i, j)$ coordinate of $\xi$ with the remaining levels in $\{-1, 1\}$ to produce new designs, i.e., generate neighbors of $\xi$ along its rows. As an illustration, we show the first exchange in the progression of Figure 2a to 2b for an $N = 12$ run design with $k = 4$ factors under the main effects only model. Since we are seeking a set of non-dominated designs, for a newly generated design, say $\xi^*$, two comparison procedures are needed to determine if $\xi^*$ will be kept. As explained below, the first comparison decides whether to replace $\xi$ with $\xi^*$, and the second comparison decides whether to add $\xi^*$ to the current Pareto set.

    (i) Comparison 1.
        If $\xi \succeq \xi^*$, then drop $\xi^*$;
        If $\xi^* \succ \xi$, then set $\xi = \xi^*$ and use $\xi^*$ for comparison 2;
        Else, just use $\xi^*$ for comparison 2.

    (ii) Comparison 2.
        If a design $\xi^*$ is not dropped in comparison 1, then it is compared to the designs in $P$, the current Pareto set of designs. If there exists a design in $P$ that dominates $\xi^*$, then discard $\xi^*$; if $\xi^*$ dominates at least one of the designs in the current generation of $P$, then add $\xi^*$ to $P$ and remove the designs dominated by $\xi^*$; if $\xi^*$ neither dominates nor is dominated by any designs in the current generation of $P$, then just add $\xi^*$ to $P$.

    (iii) If $\xi = \xi^*$ in (i), reset $i = j = 1$ and return to the beginning of Step (3). Else, continue.

Stop the above procedure when no new design $\xi^*$ such that $\xi^* \succ \xi$ can be generated. At the end of the exchange we will have $p$ non-dominated designs in the set $P$ (i.e., $P = \{\xi_1, \xi_2, \ldots, \xi_p\}$) and $p$ associated criterion vectors in the set $PF$ (i.e., $PF = \{\mathbf{f}(\xi_1), \mathbf{f}(\xi_2), \ldots, \mathbf{f}(\xi_p)\}$).

Fig. 2 Illustration of the coordinate exchange operator for a design of twelve runs and four factors each at two levels. The large X means this exchange can be omitted since the two levels are the same.
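To make Steps (1)-(3) concrete, the following Python sketch implements one reading of the operator for two-level designs. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the "drop" test in Comparison 1 is taken to be weak dominance, identical designs are not stored twice in the archive (an implementation choice made here), and the two illustrative criteria are the determinant of the main-effects information matrix and the negated trace of $\mathbf{A}\mathbf{A}'$ for the two-factor-interaction alias matrix, both written so that larger is better.

```python
from itertools import combinations

import numpy as np


def weakly_dominates(f1, f2):
    return all(a >= b for a, b in zip(f1, f2))


def dominates(f1, f2):
    return weakly_dominates(f1, f2) and any(a > b for a, b in zip(f1, f2))


def criteria(design):
    """Illustrative criteria (both maximized): det(X1'X1) and -tr(AA')."""
    n, k = design.shape
    x1 = np.column_stack([np.ones(n), design])          # intercept + main effects
    x2 = np.column_stack([design[:, a] * design[:, b]   # two-factor interactions
                          for a, b in combinations(range(k), 2)])
    m = x1.T @ x1
    det = float(round(np.linalg.det(m)))                # integer-valued for +/-1 designs
    if det <= 0:
        return (0.0, float("-inf"))                     # singular design: worst values
    alias = np.linalg.solve(m, x1.T @ x2)
    return (det, -round(float(np.trace(alias @ alias.T)), 8))


def update_archive(archive, design, f):
    """Comparison 2: keep the archive P mutually non-dominated."""
    if any(dominates(g, f) or np.array_equal(d, design) for d, g in archive):
        return archive                                  # dominated or already stored
    kept = [(d, g) for d, g in archive if not dominates(f, g)]
    kept.append((design.copy(), f))
    return kept


def pareto_coordinate_exchange(start, criteria):
    xi, f_xi = start.copy(), criteria(start)            # Step (1)
    archive = [(xi.copy(), f_xi)]                       # Step (2)
    n, k = xi.shape
    restart = True
    while restart:                                      # Step (3)
        restart = False
        for i, j in ((i, j) for i in range(n) for j in range(k)):
            cand = xi.copy()
            cand[i, j] = -cand[i, j]                    # exchange coordinate (i, j)
            f_c = criteria(cand)
            if weakly_dominates(f_xi, f_c):             # Comparison 1: drop candidate
                continue
            archive = update_archive(archive, cand, f_c)
            if dominates(f_c, f_xi):                    # (iii): replace xi and restart
                xi, f_xi = cand, f_c
                restart = True
                break
    return archive


rng = np.random.default_rng(1)
start = rng.choice([-1, 1], size=(12, 4))               # random 12-run, 4-factor start
archive = pareto_coordinate_exchange(start, criteria)
print(sorted(f for _, f in archive))
```

The loop terminates because the current design is replaced only by a design that strictly dominates it, and the space of two-level designs is finite.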

3.3 Elitist Pareto-based coordinate exchange algorithm (EPCEA)

We now develop a hybrid Pareto front populating algorithm by incorporating the previously developed Pareto-based coordinate exchange operator, from Section 3.2, with an elitism operator as well as a multiple random starts operator. Consider the illustration in Figure 3 as we describe the general implementation of the proposed procedure.

Fig. 3 The structure of the elitist Pareto-based coordinate exchange algorithm. The squares represent designs for which the coordinate exchange operator has been previously applied. The circles represent new designs that can be explored further.

(1) Randomly generate a design $\xi_1$. One can consider the top leftmost point in Figure 3 as an example.

(2) Perform the Pareto-based coordinate exchange operator on this starting design. Denote the resulting Pareto optimal set as $P_1^0$ and the Pareto front as $PF_1^0$, where the superscript indicates the generation of Pareto sets/fronts. Assume that this Pareto front has $p_1^0$ points. In the example in Figure 3, $PF_1^0$ has $p_1^0 = 3$ points.

(3) Enhanced elitism operator.

    (3a) Perform the Pareto-based coordinate exchange operator on each design in $P_1^0$ to produce $p_1^0$ Pareto sets and fronts. Note that we should skip the initial design, $\xi_1$, if it is included in $P_1^0$. If this is the case, there will be $p_1^0 - 1$ sets of Pareto optimal designs. Combine the $p_1^0$ (or $p_1^0 - 1$) Pareto sets as well as $P_1^0$ and denote the resultant Pareto set as $P_1^1$, and the Pareto front as $PF_1^1$. In Figure 3, it can be seen that there are 3 designs in $P_1^0$ and the initial design is not on the associated Pareto front. The first design in $P_1^0$ produces a single new design besides itself; the second design produces no new design; and the third design produces two new designs. Combine the three Pareto fronts (i.e. the five designs) based on Pareto dominance and update $P_1^0$ to $P_1^1$, which now consists of four designs, with two carried over from $P_1^0$ (the two squares) and two newly generated non-dominated designs (the two solid circles).

    (3b) Repeat Step (3a) by performing the Pareto-based coordinate exchange operator on the designs in the current Pareto optimal set $P_1^g$ but not in the previous set $P_1^{(g-1)}$ for $g = 1, 2, \ldots$. Stop when $P_1^g = P_1^{(g-1)}$, then set $P_1 = P_1^g$ and $PF_1 = PF_1^g$.

(4) Multi-start operator. Repeat Steps (1)-(3) with $S - 1$ additional new starting designs, and obtain $P_2, \ldots, P_S$ as well as $PF_2, \ldots, PF_S$.

(5) Combine the $P_i$ ($i = 1, \ldots, S$) based on Pareto dominance to produce the final set of Pareto optimal designs, $P$, and the associated Pareto front $PF$.
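The control flow of Steps (1)-(5) can be sketched independently of the particular local operator. In the Python sketch below, `operator` is assumed to be any function that maps a design to a dictionary of locally non-dominated designs and their criterion vectors (the role played by the operator of Section 3.2). All names are hypothetical, and the toy criteria and the one-pass toy operator are deliberately simplified stand-ins chosen only to keep the example short.

```python
import random


def dominates(f1, f2):
    return (all(a >= b for a, b in zip(f1, f2))
            and any(a > b for a, b in zip(f1, f2)))


def merge(*archives):
    """Combine archives {design: criterion vector} based on Pareto dominance."""
    pool = {}
    for archive in archives:
        pool.update(archive)
    return {d: f for d, f in pool.items()
            if not any(dominates(g, f) for g in pool.values())}


def elitist_search(start, operator):
    """Steps (2)-(3) for a single starting design."""
    current = operator(start)              # generation 0 Pareto set, Step (2)
    explored = {start}                     # the start itself is skipped in Step (3a)
    while True:
        fresh = [d for d in current if d not in explored]
        if not fresh:                      # no new designs: the set did not change
            return current
        explored.update(fresh)
        current = merge(current, *(operator(d) for d in fresh))


def epcea(random_start, operator, n_starts):
    """Steps (4)-(5): independent random starts, merged by Pareto dominance."""
    return merge(*(elitist_search(random_start(), operator)
                   for _ in range(n_starts)))


def toy_criteria(x):
    """Hypothetical criteria: number of +1 entries and number of sign changes."""
    return (sum(v == 1 for v in x), sum(a != b for a, b in zip(x, x[1:])))


def toy_operator(x):
    """Stand-in operator: Pareto set of x and its single-coordinate neighbors."""
    neighbors = [x[:i] + (-x[i],) + x[i + 1:] for i in range(len(x))]
    return merge({d: toy_criteria(d) for d in [x] + neighbors})


rng = random.Random(0)
result = epcea(lambda: tuple(rng.choice((-1, 1)) for _ in range(6)),
               toy_operator, n_starts=3)
print(sorted(set(result.values())))
```

Because each start is processed independently before the final merge, the calls to `elitist_search` could be distributed across workers, which is the sense in which the multi-start operator makes the procedure parallelizable.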

We note that in the evolutionary algorithm literature the word “elitist” has a fairly specific meaning having to do with the preservation of high fitness individuals from one generation to the next. Our use of this term is similar but not identical: in our procedure good designs are preserved from one step to the next as starting points for further exploration. We name Step (3) in EPCEA the enhanced elitism operator because all the non-dominated solutions in the searching process will be used for generating new solutions. Note that there are two possible outcomes for each of the non-dominated solutions: either a solution will persist throughout the entire procedure because a design which dominates it is not found, or it will be usurped by a newly found non-dominated design. Thus we use the term “elitist” or “elitist-like” throughout.

One might naturally ask if the proposed enhanced elitism operator is better than a one-stage coordinate exchange algorithm (i.e. the algorithm described in Section 3.2) which relies on an increased number of random starts. We call this one-stage approach a non-elitist Pareto-based coordinate exchange algorithm (NEPCEA) and found in our algorithm development that the search performance is most efficient when choosing good starting designs (i.e. those designs that are non-dominated). This is demonstrated in Sections 4.1.1 and 4.1.2 where we compare the EPCEA with NEPCEA. We find that, while both algorithms are similar in efficiency for the smaller problem, the EPCEA is noticeably more efficient for the larger.

Note that in Step (3b) of the EPCEA algorithm we terminate the search procedure when $P_1^g = P_1^{(g-1)}$, which raises a question about convergence. We expect our algorithm to converge because for a given design the coordinate exchange operator searches only a finite number of the design’s neighbors, and each newly discovered non-dominated solution is used only once for generating new solutions. We confirm computationally, for the three-dimensional example in Section 4.1.2, that there are no convergence problems.


Table 1 The four factors and their coded and uncoded values in the two-criteria benchmark example.

                               Uncoded values    Coded values
Factor                         Min      Max      Min     Max
X1: NaCl (%)                   3        25       -1      1
X2: pH                         1        4        -1      1
X3: Shaking time (min)         2        10       -1      1
X4: Shaking rate (times/min)   167      267      -1      1

4 Benchmark examples

For real-world multi-criteria optimal experimental design problems, the true Pareto front is unknown. Hence, it is essential to develop benchmark design problems for testing how well an algorithm performs based upon the Pareto front it generates. In this section we demonstrate the performance of the proposed Elitist Pareto-based Coordinate Exchange Algorithm (EPCEA) by evaluating and comparing its search ability with other algorithms on two benchmark examples: a two-criteria and a three-criteria design problem.

4.1 Introducing two benchmark examples

The two-criteria example is inspired by Jones and Nachtsheim (2011). The authors were interested in the construction of two-level main-effect screening designs with protection against two-factor interactions. We apply this idea to a real-world consulting problem. The three-criteria benchmark example is based on a published design problem (Lu et al., 2011).

4.1.1 Two-criteria benchmark example

Consider a chemical experiment where a researcher is interested in optimizing the extraction of an organic substance from cultures developed in a laboratory. It is thought that the concentration of NaCl and pH of the solution may help to increase the yield of the extraction process. Shaking time and shaking rate of the mixing flask are also potential factors which influence extraction yield. Table 1 gives the minimum and maximum of the uncoded factors and their corresponding coded values.

Due to resource limitations, the design size is limited to 12 runs and the researcher seeks a two-level screening design that (1) can estimate the main effects model precisely and (2) can protect against the possible presence of any of the two-factor interactions. Thus the primary model is $\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \boldsymbol{\varepsilon}$, where $\mathbf{Y}$ is the $12 \times 1$ response vector, $\boldsymbol{\varepsilon}$ is tentatively assumed to have mean vector $\mathbf{0}_{12 \times 1}$ and variance-covariance matrix $\sigma^2 \mathbf{I}_{12 \times 12}$, and $\mathbf{X}_1$ contains the intercept and the four main effects. The potential model is $\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, where $\mathbf{X}_2$ is a $12 \times 6$ matrix containing the six two-factor interactions.

Table 2 Criterion values for the designs composing the true Pareto front for the two-criterion example.

Design   D-efficiency   Bias-minimization efficiency
1        95.34%         100%
2        95.69%         48.33%
3        96.56%         45.80%
4        97.67%         33.33%
5        100%           16.67%

The first goal can be formalized as finding a design that maximizes the D-efficiency of the primary model, i.e., $|\mathbf{X}_1'\mathbf{X}_1|$; the second goal can be formalized as finding a design that minimizes a function of the bias for estimating $\boldsymbol{\beta}_1$, i.e., $\mathrm{tr}(\mathbf{A}\mathbf{A}')$, where $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$ is known as the alias matrix and shows the bias transmitted to $\hat{\boldsymbol{\beta}}_1$ due to model mis-specification. For details regarding the latter criterion, readers are referred to Jones and Nachtsheim (2011) and Bursztyn and Steinberg (2006).

If experimental runs are allowed to be replicated, theoretically there are $\binom{16+12-1}{12} = 17{,}383{,}860$ possible designs. Using an enumerative method, we find five designs that comprise the true Pareto optimal set. The Pareto front is presented in Figure 4, and the results are compared in terms of D-efficiency, $\left( |\mathbf{X}_1'\mathbf{X}_1| \, / \, |\mathbf{X}_{1\mathrm{opt}}'\mathbf{X}_{1\mathrm{opt}}| \right)^{1/p}$, and the bias-minimization efficiency, $\mathrm{tr}(\mathbf{A}_{\mathrm{opt}}\mathbf{A}_{\mathrm{opt}}') \, / \, \mathrm{tr}(\mathbf{A}\mathbf{A}')$, where $p$ is the number of coefficients in the specified model and the “opt” designation denotes that the matrices represent optimal designs according to their respective criteria.
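The criteria and the design count above can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, and the example design (the first four columns of the 12-run Plackett-Burman design, built from its standard generating row) is an assumption chosen because its criterion values are easy to verify by hand, not because it is claimed to be one of the designs in Table 2.

```python
import math
from itertools import combinations

import numpy as np


def model_matrices(design):
    """X1: intercept + main effects; X2: all two-factor interactions."""
    n, k = design.shape
    x1 = np.column_stack([np.ones(n), design])
    x2 = np.column_stack([design[:, a] * design[:, b]
                          for a, b in combinations(range(k), 2)])
    return x1, x2


def d_criterion(design):
    x1, _ = model_matrices(design)
    return np.linalg.det(x1.T @ x1)                       # |X1'X1|


def bias_criterion(design):
    x1, x2 = model_matrices(design)
    alias = np.linalg.solve(x1.T @ x1, x1.T @ x2)         # A = (X1'X1)^{-1} X1'X2
    return np.trace(alias @ alias.T)                      # tr(AA')


def d_efficiency(design, best_design):
    p = design.shape[1] + 1                               # number of coefficients
    return (d_criterion(design) / d_criterion(best_design)) ** (1.0 / p)


def bias_efficiency(design, best_design):
    return bias_criterion(best_design) / bias_criterion(design)


# 12-run multisets drawn from the 16 points of the 2^4 factorial.
n_designs = math.comb(16 + 12 - 1, 12)

# Example design: first four columns of the 12-run Plackett-Burman design.
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
pb12 = np.vstack([np.roll(gen, s) for s in range(11)]
                 + [-np.ones(11, dtype=int)])
design = pb12[:, :4]
det_value = d_criterion(design)
bias_value = bias_criterion(design)
print(n_designs, det_value, bias_value)
```

For this orthogonal example, $\mathbf{X}_1'\mathbf{X}_1 = 12\,\mathbf{I}_5$, so the determinant is $12^5$, and each two-factor interaction is partially aliased (coefficient $\pm 1/3$) with the two main effects it does not involve, which gives $\mathrm{tr}(\mathbf{A}\mathbf{A}') = 4/3$.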

Fig. 4 The true Pareto front of the two-criterion example.

4.1.2 Three-criteria benchmark example

A 3-criteria design problem is presented in Lu et al. (2011) where the experimenter wishes to obtain a 14-run screening design for 5 factors, $X_1, \ldots, X_5$, each at 2 levels ($-1$ and $1$). The user-defined model is $\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \boldsymbol{\varepsilon}$, where $\mathbf{X}_1$ contains the intercept, the 5 main effects and the particular two-factor interactions $X_1X_2$, $X_1X_3$, $X_2X_4$ and $X_3X_5$. Though the experimenter wishes to have an efficient design to estimate the parameters in $\boldsymbol{\beta}_1$, there is also the desire to protect against the possibility that the true model is $\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, where $\mathbf{X}_2$ is a $14 \times 6$ matrix containing the remaining six two-factor interactions. Consequently, the experimenter wishes to find a design that is efficient in terms of the D-criterion, the $\mathrm{tr}(\mathbf{A}\mathbf{A}')$-criterion, and the $\mathrm{tr}(\mathbf{R}'\mathbf{R})$-criterion, where $\mathbf{R} = \mathbf{X}_1\mathbf{A} - \mathbf{X}_2$. Note that the D-criterion focuses upon the precision of the model coefficient estimates, the $\mathrm{tr}(\mathbf{A}\mathbf{A}')$-criterion seeks to minimize the effect of model mis-specification upon the coefficient estimates, and the $\mathrm{tr}(\mathbf{R}'\mathbf{R})$-criterion seeks to minimize the effect of model mis-specification upon the error variance estimate. For more details on the last criterion, see Lu et al. (2011).
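As a hedged illustration of how the three criteria could be evaluated for a candidate 14-run design, the Python sketch below builds $\mathbf{X}_1$ and $\mathbf{X}_2$ as described and computes $|\mathbf{X}_1'\mathbf{X}_1|$, $\mathrm{tr}(\mathbf{A}\mathbf{A}')$ and $\mathrm{tr}(\mathbf{R}'\mathbf{R})$. The function names, the zero-based factor indices, and the randomly drawn example design are assumptions introduced here.

```python
from itertools import combinations

import numpy as np

# Zero-based factor pairs for the interactions X1X2, X1X3, X2X4 and X3X5.
PRIMARY_INTERACTIONS = [(0, 1), (0, 2), (1, 3), (2, 4)]


def three_criteria_matrices(design):
    n, k = design.shape
    primary = [design[:, a] * design[:, b] for a, b in PRIMARY_INTERACTIONS]
    potential = [design[:, a] * design[:, b]
                 for a, b in combinations(range(k), 2)
                 if (a, b) not in PRIMARY_INTERACTIONS]
    x1 = np.column_stack([np.ones(n), design] + primary)  # 14 x 10
    x2 = np.column_stack(potential)                       # 14 x 6
    return x1, x2


def three_criteria(design):
    x1, x2 = three_criteria_matrices(design)
    m = x1.T @ x1
    alias = np.linalg.solve(m, x1.T @ x2)                 # A
    resid = x1 @ alias - x2                               # R = X1 A - X2
    return (np.linalg.det(m),
            np.trace(alias @ alias.T),
            np.trace(resid.T @ resid))


# Draw random two-level designs until the primary model is estimable.
rng = np.random.default_rng(0)
while True:
    design = rng.choice([-1, 1], size=(14, 5))
    x1, x2 = three_criteria_matrices(design)
    if np.linalg.matrix_rank(x1) == x1.shape[1]:
        break

d_val, aa_val, rr_val = three_criteria(design)
resid = x1 @ np.linalg.solve(x1.T @ x1, x1.T @ x2) - x2
print(d_val, aa_val, rr_val)
```

Since $\mathbf{X}_1\mathbf{A}$ is the least-squares projection of $\mathbf{X}_2$ onto the column space of $\mathbf{X}_1$, the matrix $\mathbf{R}$ is orthogonal to $\mathbf{X}_1$, and $\mathrm{tr}(\mathbf{R}'\mathbf{R})$ is the total residual sum of squares from regressing the potential terms on the primary terms.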

Though the possible designs for this experiment have not been fully enumerated and evaluated, it still serves as an appropriate benchmark for comparing algorithms in three dimensions because it has been studied extensively in previous work. This previous work has produced a Pareto front containing 351 designs, which have been visualized in Figure 5(a)-(d).

4.2 Results and comparison

4.2.1 Two-criteria example

We examine the performance of the elitist Pareto-based coordinate exchange algorithm (EPCEA) and compare it to the non-elitist Pareto-based coordinate exchange algorithm (NEPCEA), which simply invokes the Pareto-based coordinate exchange operator on a number of independent random starts, as well as the Pareto aggregating point exchange (PAPE), coordinate exchange two-phase local search (CE-TPLS) and multi-objective genetic algorithms (MOGA). The algorithms were tested by fixing the appropriate input parameter(s), and applied to the two-criteria benchmark example 100 times. The behavior of each algorithm was examined using the following measures:

(1) Success rate. Represented by the percentage of time an algorithm is able to reach a specific known Pareto-optimal point.

(2) Average CPU time. Defined as the arithmetic average processing time, across all 100 tries, that an algorithm takes to terminate.

(3) The number of criterion vector evaluations, NE.

(4) The number of Pareto-dominance comparisons, NC.
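The success-rate measure can be computed as sketched below in Python. The function name, the matching tolerance, and the four hypothetical runs are assumptions introduced for illustration; only the target points are taken from Table 2 (as percentages).

```python
def success_rates(run_fronts, true_front, tol=1e-6):
    """Fraction of runs whose reported front contains each known Pareto point."""
    rates = []
    for target in true_front:
        hits = sum(
            any(all(abs(a - b) <= tol for a, b in zip(point, target))
                for point in front)
            for front in run_fronts
        )
        rates.append(hits / len(run_fronts))
    return rates


# Known Pareto front (D-efficiency %, bias-minimization efficiency %), Table 2.
true_front = [(95.34, 100.0), (95.69, 48.33), (96.56, 45.80),
              (97.67, 33.33), (100.0, 16.67)]

# Hypothetical fronts returned by four independent runs of some algorithm.
run_fronts = [
    [true_front[0], true_front[1], true_front[4]],
    [true_front[0], true_front[2]],
    [true_front[0], true_front[1], true_front[2], true_front[3]],
    [true_front[0], true_front[3], true_front[4]],
]
rates = success_rates(run_fronts, true_front)
print(rates)
```

In this made-up example the first point is found in every run, while each of the other four points is found in two of the four runs.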

For each of EPCEA, NEPCEA and PAPE, the only input parameter is the number of random starts. For the CE-TPLS, the primary input parameter is the number of weight pairs, which are generated by letting $\alpha_1 \sim U(0, 1)$ and $\alpha_2 = 1 - \alpha_1$. We used 20 random starts of the coordinate exchange algorithm to find each of the individually optimal designs in the algorithm’s first stage. For the MOGA, we explore ten different versions: versions 1-3 have a population size of 100 with the number of generations either 10, 20, or 30, respectively; versions 4-6 have a population size of 500 with the number of generations either 10, 20 or 30, respectively; versions 7-9 use a population size of 1000 with the number of generations either 10, 20 or 30, respectively; version 10 has a population size of 10000 with 30 generations. The stochastic operators used in MOGA are real-value encoded and include (a) cloning, with a fixed probability of 0.4; i.e., 40% of the non-dominated solutions from the previous generation will be selected to carry forward into the next generation; (b) single-point crossover, with a fixed probability of 0.7; and (c) uniform random mutation of size 2 for each experimental run, with a mutation rate of 0.3. This means that in each row, two coordinates are selected and changed with probability 0.3. For details regarding the tuning of GA input parameters, see Scrucca (2014); we also consulted the MATLAB code provided by the author of Park (2009). Since we are considering algorithms that can be used routinely, an extensive tuning experiment is beyond the scope of our work.

Fig. 5 PF351, from 4 different perspectives (panels (a)-(d)).

The results, obtained by running R x64 3.0.3 under Windows 7 Professional on a 2.80 GHz machine with an AMD Phenom(TM) II X6 1055T processor and 8.00 GB RAM, are summarized in Table 3.

For the two-criteria example, only EPCEA, NEPCEA and PAPE can find the true Pareto front. These three procedures are also the only ones, of those considered, that have only a single tuning parameter (the number of random starts), which makes them more appealing for routine use. Of the three, EPCEA and NEPCEA appear to be similar in terms of speed (by comparing CPU time) and reliability, and both are more computationally efficient than PAPE. Later, in Section 4.4, we will show that EPCEA actually requires the fewest computational evaluations and comparisons. This example also provides evidence that just a small number of random starts will yield a high quality solution for EPCEA.


Table 3 Comparison of algorithms on the two-criteria example. The columns Point 1 to Point 5 give success rates; "-" marks entries with no reported value.

Algorithm  Input      Point 1  Point 2  Point 3  Point 4  Point 5  Average CPU  NE           NC
           parameter                                               time (sec)   (thousands)  (thousands)

EPCEA      5          1        1        1        1        0.69     0.35         2.38         0.97
EPCEA      10         1        1        1        1        0.92     0.80         4.31         2.00
EPCEA      15         1        1        1        1        0.98     1.25         5.57         2.77
EPCEA      20         1        1        1        1        1        1.68         7.99         4.12

NEPCEA     5          1        0.99     0.41     0.88     0.27     0.08         0.60         0.26
NEPCEA     10         1        1        0.69     0.98     0.62     0.15         1.10         0.57
NEPCEA     15         1        1        0.79     0.99     0.75     0.24         1.58         0.73
NEPCEA     20         1        1        0.91     1        0.81     0.37         2.18         1.05
NEPCEA     80         1        1        1        1        1        1.91         8.78         4.41

PAPE       5          1        1        1        1        0.71     0.90         1.92         1.74
PAPE       10         1        1        1        1        0.91     1.81         3.84         3.68
PAPE       15         1        1        1        1        0.99     2.71         5.95         6.35
PAPE       20         1        1        1        1        0.99     3.66         7.68         8.66
PAPE       25         1        1        1        1        1        4.53         9.60         10.05

CE-TPLS    500        0.94     0        0.73     0.99     0.79     14.12        57.05        1.94
CE-TPLS    1000       0.87     0        0.77     0.99     0.84     30.86        89.96        3.99
CE-TPLS    2000       0.92     0        0.79     1        0.82     59.88        181.47       7.95
CE-TPLS    3000       0.89     0        0.82     1        0.84     87.76        270.79       11.99
CE-TPLS    5000       0.91     0        0.83     0.98     0.81     147.21       504.18       24.99

MOGA       1          0        0.09     0.15     0.03     0        0.25         -            -
MOGA       2          0.01     0.11     0.12     0.02     0        0.49         -            -
MOGA       3          0        0.05     0.23     0.01     0        0.72         -            -
MOGA       4          0.02     0.24     0.19     0.06     0.01     1.23         -            -
MOGA       5          0.03     0.27     0.29     0.05     0.01     2.56         -            -
MOGA       6          0.05     0.20     0.18     0.10     0.02     3.87         -            -
MOGA       7          0.06     0.27     0.33     0.14     0.04     2.57         -            -
MOGA       8          0.03     0.46     0.29     0.09     0        4.68         -            -
MOGA       9          0.03     0.35     0.32     0.10     0.04     7.61         -            -
MOGA       10         0.30     0.95     0.94     0.67     0.25     79.11        -            -

From Table 3, Point 2 appears to be unobtainable for CE-TPLS, which is not unexpected because this point is not on the convex curve connecting point 1 and points 3-5 (Das and Dennis, 1997). Interestingly, increasing the number of weight pairs does not necessarily increase the success rate. One can see that the success rates for points 1 and 3-5 with 5000 different weight pairs are about the same as with 500 or even 100 different weight pairs. Furthermore, even though CE-TPLS was marked as a computationally efficient algorithm because it converts a multi-objective optimization problem into a single-objective one (Sambo et al., 2014), it turns out to be computationally inefficient compared to the exchange-based algorithms. The reason, as will be shown in Section 4.4, is that this heuristic requires a larger number of evaluations and comparisons during the search process.
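The effect of convexity on a weighted-sum search can be illustrated with a small sketch. The five criterion vectors below are hypothetical (they are not the points from Table 3), both criteria are assumed to be maximized, and the function name is ours. The second point is Pareto optimal but lies below the line joining its neighbours, so no weight on the grid selects it, however fine the grid.

```python
# Hypothetical two-criterion front (both criteria maximized); not data from Table 3.
points = [(0.0, 1.0), (0.1, 0.8), (0.5, 0.75), (0.8, 0.5), (1.0, 0.0)]


def weighted_sum_winners(points, n_weights):
    """Return indices of points that maximize w*f1 + (1-w)*f2 for some grid weight w."""
    winners = set()
    for i in range(n_weights):
        w = i / (n_weights - 1)  # weights evenly spaced on [0, 1]
        scores = [w * f1 + (1 - w) * f2 for f1, f2 in points]
        winners.add(scores.index(max(scores)))
    return winners


found = weighted_sum_winners(points, 101)
missed = sorted(set(range(len(points))) - found)
print("found:", sorted(found), "missed:", missed)
```

Raising `n_weights` refines the grid but leaves the set of reachable points unchanged, which mirrors the flat success rates seen when the number of weight pairs grows.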

As expected, it is difficult for the MOGA to find the true Pareto front, though there are two findings worth noting. First, for given probabilities of cloning, crossover and mutation, if the population size is fixed, then increasing the number of generations does not translate into a notable difference in the success rate. This can be seen by observing that the patterns of success rates among versions 1-3, 4-6, and 7-9 are nearly flat. Second, for given probabilities of cloning, crossover and mutation, a simultaneous increase in the population size and number of generations may translate into a significant increase in the success rate. To summarize, the empirical results suggest that EPCEA and NEPCEA are superior to PAPE in efficiency, and superior to the other methods in both efficiency and effectiveness. We note that CE-TPLS and MOGA might be improved with additional tuning; however, this is an issue requiring careful thought and likely trial and error. The proposed algorithms, as well as PAPE, have the advantage of being simpler in this respect.

4.2.2 Three-criteria benchmark example

For the three-criteria example, we investigate the following seven measures of quality for an algorithm with a particular input parameter setting: (1) the number of generated designs in the Pareto front, denoted by |PF|; (2) the number of true Pareto optimal solutions in the Pareto front, denoted by |PF′|; (3) the proportion of the designs in the Pareto front that are true Pareto optimal solutions; (4) the contribution rate, defined as the ratio between the hypervolume enclosed by the standardized true Pareto optimal solutions (i.e. the solutions scaled to be between 0 and 1) obtained by the algorithm and the hypervolume enclosed by the standardized PF351; (5) the CPU time (minutes) required for an algorithm to reach the stopping criterion; (6) NE; and (7) NC. Note that the hypervolume measure requires a user-defined reference point, which we choose to be $r = (-0.0031, -0.0031, -0.0031)^T$, leading to a hypervolume of 0.6157 for the standardized PF351; for justification and details see Cao et al. (2015).
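As a rough illustration of measure (4), the sketch below computes a hypervolume and a contribution rate in two dimensions only; the example in this section is three-dimensional, so this is a simplified stand-in. It assumes that both criteria are maximized and already standardized, and the function names and data are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (both criteria maximized) above the reference point."""
    inside = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    area, best_y = 0.0, ref[1]
    # Sweep from the largest first criterion down; each point that raises the
    # best second criterion seen so far adds one new horizontal slab.
    for x, y in sorted(inside, key=lambda p: p[0], reverse=True):
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def contribution_rate(found, true_front, ref):
    """Hypervolume of the true solutions that were found, relative to the full true front."""
    true_found = [p for p in found if p in true_front]
    return hypervolume_2d(true_found, ref) / hypervolume_2d(true_front, ref)


# Hypothetical standardized fronts, for illustration only.
true_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
found = [(0.0, 1.0), (1.0, 0.0), (0.4, 0.4)]  # the last point is not on the true front
ref = (-0.0031, -0.0031)
rate = contribution_rate(found, true_front, ref)
print(f"contribution rate: {rate:.4f}")
```

Because the rate depends on where the recovered points sit rather than on how many there are, it can differ noticeably from the proportion in measure (3).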

Once again, the input parameter for EPCEA, NEPCEA and PAPE is the number of random starts. The primary input parameter for CE-TPLS is the number of weight pairs. We used 50 random starts of the coordinate exchange algorithm to obtain the three individually optimal designs. For MOGA, versions 1-6 correspond to population sizes of 1000, 3000, 5000, 10000, 30000 and 50000, respectively. The clone, crossover and mutation operators are defined in the same way as in the two-criteria example but with probabilities of 0.4, 0.7, and 0.2, respectively. The number of generations is 30 for all versions of the MOGA algorithm.

Note that one has to be careful when generating weights in high dimensions for the CE-TPLS procedure. For instance, Smith and Tromble (2004) show that generating $\alpha_1 \sim U(0,1)$, $\alpha_2 \sim U(0,1)$ and $\alpha_3 = 1 - \alpha_1 - \alpha_2$ with the constraint that $\alpha_1 + \alpha_2 < 1$ cannot guarantee that the generated weights are uniformly distributed across the weight space. Thus, we used the ‘simplex.sample’ function from the R package ‘hitandrun’ (Gert van and Tommi, 2015) and found that the weights appeared to be uniformly distributed across the simplex.
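For readers working outside R, one standard way to draw weight vectors uniformly from the unit simplex is to normalize independent Exponential(1) draws, which gives a Dirichlet(1, ..., 1) vector. The sketch below uses that construction; it is not the method implemented in ‘hitandrun’, and the function name and seed are our own choices.

```python
import random


def sample_simplex(dim, rng):
    """Draw one weight vector uniformly from the unit simplex in `dim` dimensions."""
    # Normalizing independent Exponential(1) draws yields a Dirichlet(1, ..., 1)
    # vector, which is the uniform distribution on the simplex.
    draws = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(2015)  # arbitrary seed, used only for reproducibility
weights = [sample_simplex(3, rng) for _ in range(1000)]
print(weights[0])
```

Each returned vector is non-negative and sums to one, so it can be used directly as a set of criterion weights.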

The comparison results are presented in Table 4, which clearly shows that EPCEA is the most reliable and computationally efficient algorithm for this larger, three-dimensional problem. We want to emphasize that, due to the elitist operator, EPCEA does not require a large number of random starts; for instance, with only five random starts EPCEA finds 334 Pareto optimal solutions, of which 321 (96.11%) are globally Pareto optimal, and these solutions cover 99.59% of the hypervolume of PF351. It is worth noting that, even though for a random start we let the search process stop when $P_i^{g} = P_i^{g-1}$, convergence is quite fast. For instance, with five random starts the algorithm takes about three minutes, and with 100 random starts it takes about 60 minutes, implying that the time increases linearly with the number of random starts. To further study the empirical convergence properties of EPCEA, we performed a small experiment in which we fixed the number of random starts and then increased the number of iterations one by one. The results are displayed in Figure 6. It can be seen that, no matter how many random starts are used, the elitist operator always improves the search ability, and this is especially true when the number of random starts is small. Furthermore, for this specific example the algorithm converges at g = 6.

This example suggests that the non-elitist algorithm is inferior. It takes the PAPE algorithm 10000 random starts and more than 60 times as long to produce roughly the same number of true Pareto optimal solutions as EPCEA with five random starts, and with a smaller contribution rate. Note that the proportion of true Pareto optimal solutions does not exactly comport with the contribution rate, because how well the true points are spread out matters more than how many of them are found. Clearly, EPCEA does a better job than PAPE and CE-TPLS of characterizing the true Pareto front, based upon the contribution rate. MOGA is not competitive.

Note that CE-TPLS finds only a small proportion of the true solutions, but that these solutions produce a high contribution rate. This is because the algorithm starts by finding the single-criterion optimal solutions, which yields wide-spanning Pareto fronts with large gaps. MOGA is not able to find any true Pareto optimal solutions with the current parameter settings due to its poor convergence ability.

Based on these two examples, EPCEA is preferred to all the other algorithms in terms of its high search ability and computational efficiency, particularly as the scale of the problem increases and the shape of the front becomes more complicated. The results, while highly suggestive, are not comprehensive, as the recommendation is based upon the performance of these procedures for only these two benchmark examples.

4.3 Running time analysis

Rigorous running time analysis is important for comparing the performance of competing algorithms. We believe that the CPU time, as reported in Tables 3 and 4, gives a reasonable representation of the computational efficiency of the various algorithms in this paper. Nevertheless, we present two other measures, the number of criterion vector evaluations (NE) and the number of Pareto-dominance comparisons (NC), that may shed additional light on several of the algorithms we have considered. When overhead and code efficiency considerations are removed, the running time of EPCEA, NEPCEA, PAPE and CE-TPLS can be partitioned into NE and NC.
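To make the two counters concrete, the following sketch tallies NE and NC while maintaining a Pareto front. It shows one plausible way of counting, not necessarily the bookkeeping used for Tables 3 and 4: each criterion vector evaluation adds one to NE, and each candidate-versus-member check adds one to NC. All names are hypothetical, the criteria are assumed to be maximized, and the "designs" are stand-in tuples.

```python
def dominates(a, b):
    """True if criterion vector a Pareto-dominates b (all criteria maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_front(front, candidate, counts):
    """Try to add `candidate` to a mutually non-dominated `front`, counting comparisons."""
    survivors = []
    for member in front:
        counts["NC"] += 1  # one candidate-versus-member comparison
        if member == candidate or dominates(member, candidate):
            return front  # candidate rejected, front unchanged
        if not dominates(candidate, member):
            survivors.append(member)
    return survivors + [candidate]


def search(designs, criteria, counts):
    """Evaluate each design once (NE) and fold it into the running front (NC)."""
    front = []
    for design in designs:
        counts["NE"] += 1  # one criterion vector evaluation
        front = update_front(front, criteria(design), counts)
    return front


counts = {"NE": 0, "NC": 0}
designs = [(1, 1), (2, 0), (0, 2), (2, 2), (1, 1)]  # stand-ins for real designs
front = search(designs, lambda d: d, counts)  # identity "criteria", illustration only
print(front, counts)
```

Under this counting, NC grows with the size of the current front, which is why an algorithm that keeps its front small and accurate early on can save comparisons later.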

EPCEA, due to its elitist-like operator, outperforms NEPCEA, PAPE and CE-TPLS in these basic computational units. For the two-dimensional example, EPCEA finds the true Pareto front with 7993 criterion vector evaluations and 4118 Pareto-dominance comparisons, fewer than for the other algorithms. For the three-dimensional example, EPCEA is able to find almost 99% of the true Pareto-optimal designs while contributing up to 99.6% of the hypervolume enclosed by the true Pareto front with NE = 268,137 and NC = 1,143,144. The other algorithms are not competitive in terms of computational efficiency, particularly when contribution rate is accounted for. Note that since we did not program the MOGA algorithm ourselves, we were not able to compute NE and NC for this procedure.

Table 4 Comparison of algorithms on the three-criteria example. CR(PFi,PF351) is the contribution rate, CPU is the CPU time in minutes, and NE and NC are in millions.

Algorithm  Input   |PF|  |PF′|  |PF′|/|PF|  CR(PFi,PF351)  CPU (min)  NE     NC
EPCEA      5       334   321    96.11%      99.59%         3.11       0.27   1.14
           10      337   327    97.03%      99.60%         6.22       0.55   2.28
           30      344   342    99.42%      99.63%         17.28      1.75   7.23
           50      346   345    99.71%      99.68%         29.52      2.76   11.54
           100     348   347    99.71%      99.70%         60.44      5.61   22.91
NEPCEA     100     72    0      0%          0%             0.09       0.01   0.10
           1000    123   24     19.51%      55.07%         1.01       0.18   0.87
           3000    178   80     44.94%      87.72%         3.43       0.54   2.68
           5000    200   111    55.5%       93.69%         5.83       0.90   4.44
           10000   249   164    65.86%      94.84%         11.86      1.82   8.98
           50000   311   281    90.35%      97.90%         67.40      9.07   44.94
PAPE       50      118   61     51.69%      53.17%         0.90       0.06   1.31
           100     166   78     46.99%      54.04%         1.72       0.12   2.63
           500     264   215    81.44%      93.78%         9.54       0.62   13.74
           1000    290   252    86.90%      94.23%         19.52      1.23   27.29
           5000    315   299    94.92%      97.03%         98.65      6.27   138.32
           10000   322   309    95.96%      97.09%         188.68     12.48  275.17
CE-TPLS    1000    30    15     50%         86.89%         0.65       0.18   0.02
           3000    40    16     40%         86.92%         1.94       0.64   0.09
           5000    44    20     45.45%      88.44%         3.27       0.87   0.12
           10000   49    25     51.02%      89.65%         7.13       1.74   0.26
           50000   56    27     48.21%      89.90%         37.46      9.54   1.70
           100000  59    29     49.15%      92.07%         78.09      14.86  2.09
MOGA       1       29    0      0%          0%             0.23
           2       30    0      0%          0%             0.78
           3       46    0      0%          0%             1.33
           4       46    0      0%          0%             3.02
           5       60    0      0%          0%             9.91
           6       65    0      0%          0%             17.78

Fig. 6 Illustration of the effectiveness and convergence speed of EPCEA. The notation "10s2i" means that EPCEA is run with 10 random starts and that, for each start, the Pareto front is iterated 2 times. The other notations are defined in a similar manner.

5 Applications to response surface designs

In this section we apply EPCEA to build a catalog of simultaneous G- and I-optimal designs for second-order response surface models. This catalog is useful not only for practitioners to use directly, but also for researchers to use as benchmark examples when modifying existing algorithms or developing new ones.

Park (2009) described the use of a genetic algorithm for constructing G- and I-optimal designs for the second-order response surface model on k variables given by:

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \varepsilon$$

where $Y$ is the measured response, $\beta_0$ is the intercept, $\beta_i$ are the coefficients of the first-order terms, $\beta_{ii}$ and $\beta_{ij}$ are the coefficients of the pure quadratic terms and the two-factor interaction terms, respectively, and $\varepsilon$ represents the experimental error and is assumed to be NID(0,1). The G-optimal design in this example is defined as the design that minimizes $\max_{x \in R} N x^T (X^T X)^{-1} x$, and the I-optimal design minimizes $\int_R N x^T (X^T X)^{-1} x \, dx$, where $x^T$ is a vector of $p$ real-valued functions of the factors $x_1, \ldots, x_k$ based on the model terms, $N$ is the number of runs and $R$ is the design region. For the second-order response surface model, we have

$$x^T = (1, x_1, \ldots, x_k, x_1 x_2, \ldots, x_{k-1} x_k, x_1^2, \ldots, x_k^2).$$
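The two criteria can be made concrete with a short pure-Python sketch. It expands a point into the second-order model vector, computes the scaled prediction variance $N x^T (X^T X)^{-1} x$, and approximates the G- and I-criteria on a finite grid over the cuboidal region $[-1, 1]^k$. The grid maximum only approximates the true maximum, and the grid average approximates the integral only up to the constant volume of the region. The $3^2$ factorial design and all function names are our own illustrative choices, not designs or code from this paper.

```python
from itertools import combinations, product


def model_vector(x):
    """Expand factor settings x into (1, x_i, x_i*x_j for i<j, x_i^2)."""
    k = len(x)
    terms = [1.0] + list(x)
    terms += [x[i] * x[j] for i, j in combinations(range(k), 2)]
    terms += [xi * xi for xi in x]
    return terms


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def spv(design, x):
    """Scaled prediction variance N f(x)^T (X^T X)^{-1} f(x) at the point x."""
    rows = [model_vector(d) for d in design]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    f = model_vector(x)
    z = solve(xtx, f)  # z = (X^T X)^{-1} f(x), without forming the inverse
    return len(design) * sum(fi * zi for fi, zi in zip(f, z))


design = list(product([-1, 0, 1], repeat=2))  # illustrative 3^2 factorial, N = 9
grid = [i / 10 - 1 for i in range(21)]        # 21 levels per factor on [-1, 1]
values = [spv(design, pt) for pt in product(grid, repeat=2)]
g_value = max(values)                 # grid approximation to the G-criterion
i_value = sum(values) / len(values)   # grid average, proportional to the I-criterion
print(f"G approx: {g_value:.3f}, I approx (average): {i_value:.3f}")
```

A singular $X^T X$ would raise a division error in `solve`, so the sketch assumes the design supports the full second-order model.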

In this section, we apply EPCEA to generate second-order response surface designs that satisfy G- and I-optimality simultaneously over a cuboidal region for k = 3, 4, 5 factors, with several design sizes for each k. The results for k = 3 factors are presented in Table 5 due to the small number of Pareto optimal designs in each case. The cardinality of the Pareto front generally increases as the number of factors and number of runs increase, so for k = 4, 5 factors we visualize the Pareto fronts in Figure 7. Due to the irregular geometry of all the Pareto fronts, we chose not to use CE-TPLS. Also, due to the large scale of the designs, we chose not to use PAPE. In both Table 5 and Figure 7, we highlight a few criterion vectors that are also found by the MOGA in Park (2009). It can be seen that most of the elements of the Pareto fronts generated by EPCEA cannot be found by MOGA.

Table 5 Results of a catalog of Pareto fronts for second-order response surface designs. Bold entries denote Pareto optimal solutions that are also found by the MOGA.

k  N   I-efficiency  G-efficiency
3  13  100%          100%
   14  100%          100%
   15  100%          99.23%
       99.75%        100%
   17  100%          96.17%
       99.72%        96.44%
       99.42%        96.73%
       99.23%        96.96%
       99.16%        98.44%
       97.65%        98.97%
       95.58%        100%
   18  100%          87.82%
       98.87%        88.16%
       98.8%         98.44%
       97.6%         100%

6 Discussion

As computing technologies and infrastructures become more and more powerful, it is not enough to develop algorithms that can efficiently generate small, regular designs. Instead, algorithms are needed that reliably and quickly construct optimal designs in complicated, multi-criteria, and large-scale settings. With this in mind, we propose a hybrid elitist coordinate exchange algorithm for constructing multi-criteria experimental designs, and illustrate it using two benchmark examples and an exploration of response surface experiments. We find that our proposed algorithm is reliable and computationally efficient, showing the most benefit compared to competing algorithms when the experiments are increased in size. Furthermore, the only input parameter for this algorithm is the number of random starts, and this simplicity allows both its use in practical situations and its adaptation to high performance computing technologies, such as parallel computing, grid computing and cloud computing.

EPCEA works very well in the examples we considered in this paper, but it is important to be aware that we are not making more general claims of its effectiveness. The No Free Lunch theorems for optimization state that there is no algorithm which can be the best for all problems in the same category (Wolpert and Macready, 1997), so it is certainly possible that in particular situations EPCEA may not perform particularly well. The best resort in these cases is to develop customized hybrid algorithms. To be specific, one needs to find a domain-specific computational operator that combines effectively with an operator from a general evolutionary algorithmic method. For example, the coordinate exchange operator is a powerful operator for generating exact experimental designs, and the elitism operator is important in the general evolutionary optimization approach. As shown in this article, these operators can be combined into an effective tool for constructing multi-criterion optimal designs. However, it is likely that EPCEA can be further improved. As we discussed, MOGA is a very basic EMO algorithm. If more sophisticated biologically-inspired algorithms were considered (e.g. the non-dominated sorting genetic algorithm II, particle swarm optimization, immune-based approaches, etc.; see Zhou et al. (2011)), they may provide ideas and operators to construct even better algorithms.

Fig. 7 Pareto fronts for second-order response surface designs: I-efficiency vs. G-efficiency. Panels: (a) 4-factor, 19-run; (b) 4-factor, 21-run; (c) 4-factor, 24-run; (d) 4-factor, 29-run; (e) 4-factor, 41-run; (f) 5-factor, 26-run; (g) 5-factor, 30-run; (h) 5-factor, 31-run; (i) 5-factor, 41-run. (A blue point means that the design was found by both EPCEA and MOGA.)

Acknowledgements The authors would like to thank Drs. Christine Anderson-Cook, Lu and Park for sharing their examples. The first author is also grateful to Dr. Nancy Flournoy for her valuable suggestions and comments.

References

Ajith, A., Crina, G., Hisao, I.: Hybrid Evolutionary Algorithms. Springer, Berlin Heidelberg (2007)

Borkowski, J.J.: Using a genetic algorithm to generate small exact response surface designs. Journal of Probability and Statistical Science 1(1), 65–88 (2003)

Bursztyn, D., Steinberg, D.M.: Comparison of designs for computer experiments. Journal of Statistical Planning and Inference 136, 1103–1119 (2006)

Cao, Y., Smucker, B.J., Robinson, T.J.: On using the hypervolume indicator to compare Pareto fronts: Applications to multi-criteria optimal experimental design. Journal of Statistical Planning and Inference 160, 60–74 (2015)

Cook, D.R., Nachtsheim, C.J.: A comparison of algorithms for constructing exact D-optimal designs. Technometrics 22(3), 315–324 (1980)

Das, I., Dennis, J.E.: A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Struct. Optim. 14, 63–69 (1997)

Elhossini, A., Areibi, S., Dony, R.: Strength Pareto particle swarm optimization and hybrid EA-PSO for multi-objective optimization. Evolutionary Computation 18, 127–156 (2010)

Farrell, R.H., Kiefer, J., Walbran, A.: Optimum multivariate designs. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. LM LeCam and J. Neyman, University of California Press 1, 113–138 (1967)

Fedorov, V.V.: Theory Of Optimal Experiments. Elsevier Science (1972)

Gert van, V., Tommi, T.: "hit and run" and "shake and bake" for sampling uniformly from convex shapes. R Package (2015)

Jones, B., Nachtsheim, C.J.: Efficient designs with minimal aliasing. Technometrics 53, 62–71 (2011)

Kiefer, J., Wolfowitz, J.: Optimum designs in regression problems. The Annals of Mathematical Statistics 30, 271–294 (1959)

Kiefer, J.: Optimum experimental designs. Journal of the Royal Statistical Society. Series B (Methodological) 21, 272–319 (1959)

Kiefer, J.: Optimum designs in regression problems, II. The Annals of Mathematical Statistics 32, 298–325 (1961)

Limmun, W., Borkowski, J.J., Chomtee, B.: Using a genetic algorithm to generate D-optimal designs for mixture experiments. Quality and Reliability Engineering International 29(7), 1099–1638 (2013)

Lu, L., Anderson-Cook, C.M., Robinson, T.J.: Optimization of designed experiments based on multiple criteria utilizing a Pareto frontier. Technometrics 53(4), 353–365 (2011)

Mashwani, W.K.: Hybrid multi-objective evolutionary algorithms: A survey of the state-of-the-art. International Journal of Computer Science Issues 8, 374–392 (2011)

Meyer, R.K., Nachtsheim, C.J.: The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics 37(1), 60–69 (1995)

Park, Y.J.: Multi-optimal designs for second-order response surface models. Communications of the Korean Statistical Society 16(1), 195–208 (2009)

Sambo, F., Borrotti, M., Mylona, K.: A coordinate-exchange two-phase local search algorithm for the D- and I-optimal designs of split-plot experiments. Comput. Statist. Data Anal. 71, 1193–1207 (2014)

Scrucca, L.: R package ‘ga’ (2014)

Sexton, C.J., Anthony, D.K., Lewis, S.M., Please, C.P., Keane, A.J.: Design of experiment algorithms for assembled products. Journal of Quality Technology 38, 298–308 (2006)

Sindhya, K., Miettinen, K., Deb, K.: A hybrid framework for evolutionary multi-objective optimization. Evolutionary Computation, IEEE Transactions on 17, 495–511 (2013)

Smith, N.A., Tromble, R.W.: Sampling uniformly from the unit simplex. Johns Hopkins University, Tech. Rep., pp. 1–6 (2004)

Wang, L., Wu, H., Tang, F., Zheng, D.Z.: A hybrid quantum-inspired genetic algorithm for flow shop scheduling. In: Huang, D.S., Zhang, X.P., Huang, G.B. (eds.) Advances in Intelligent Computing, pp. 636–644. Springer, Heidelberg (2005)

Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. Evolutionary Computation, IEEE Transactions on 1, 67–82 (1997)

Yang, D., Jiao, L., Gong, M.: Adaptive multi-objective optimization based on nondominated solutions. Computational Intelligence 25, 84–108 (2009)

Zhou, A., Qu, B.Y., Li, H., Zhao, S.Z., Suganthan, P.N., Zhang, Q.: Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation 1, 32–49 (2011)
