
An Optimality Theory Based Proximity Measure for Set Based Multi-Objective Optimization

Kalyanmoy Deb, Fellow, IEEE and Mohamed Abouhawwash
Department of Electrical and Computer Engineering

Computational Optimization and Innovation (COIN) Laboratory
Michigan State University, East Lansing, MI 48824, USA
Email: [email protected], [email protected]

COIN Report Number 2014015

Abstract—Set-based multi-objective optimization methods, such as evolutionary multi-objective optimization (EMO) methods, attempt to find a set of Pareto-optimal solutions, instead of a single optimal solution. To evaluate these algorithms in terms of their convergence to the efficient set, current performance metrics require knowledge of the true Pareto-optimal solutions. In this paper, we develop a theoretically-motivated KKT proximity measure that can provide an estimate of the proximity of a set of trade-off solutions to the true Pareto-optimal solutions without any prior knowledge of them. Besides the theoretical development of the metric, the proposed KKT proximity measure is computed for iteration-wise trade-off solutions obtained from specific EMO algorithms on two, three, five, and 10-objective optimization problems. Results amply indicate the usefulness of the proposed KKT proximity measure as a metric for evaluating different sets of trade-off solutions and also as a possible termination criterion for an EMO algorithm. Other possible uses of the proposed metric are also highlighted.

I. INTRODUCTION

Multi-objective optimization problems give rise to a set of trade-off optimal solutions, known as Pareto-optimal solutions [1], [2]. In solving these problems, one popular approach has been to first find a representative set of Pareto-optimal solutions and then use higher-level information involving one or more decision-makers to choose a preferred solution from the set. Evolutionary multi-objective optimization (EMO) methodologies follow this principle of solving multi-objective optimization problems [2] and have received extensive attention for the past two decades. Many books, journal and conference papers, and public-domain and commercial software packages are available, making EMO an emerging field of research and application.

From an optimization point of view, the main task of a set based optimization methodology is to find, not one, but a set of trade-off solutions at the end of a simulation run. Since multiple solutions are the target of a set based methodology, it has always been difficult to evaluate the performance of such algorithms and therefore to develop a termination criterion for ending a simulation run. In the EMO literature, for example, several performance metrics, such as the hypervolume measure [3], [4] and the inverse generational distance (IGD) measure [5], have been suggested, but they are not appropriate, particularly for the purpose of terminating a simulation run. For the hypervolume measure, there is no pre-defined target that can be set for an arbitrary problem, thereby making it difficult to determine an expected hypervolume value for terminating a run. On the other hand, the IGD measure requires knowledge of the true Pareto-optimal solutions and their corresponding objective values and hence is not applicable to an arbitrary problem.

For terminating a simulation run, it is ideal if some knowledge about the proximity of the current solution(s) to the true optimal solution(s) can be obtained. For single-objective optimization algorithms, recent studies on approximate Karush-Kuhn-Tucker (A-KKT) points have been suggested [6], [7], [8], [9], [10]. These studies have also suggested a KKT proximity measure that monotonically reduces to zero as the iterates approach a KKT point of a single-objective constrained optimization problem. On many standard constrained test problems, the study has shown that the population-best solutions of a real-parameter genetic algorithm (RGA), an evolutionary optimization procedure [2], demonstrate a reducing trend to zero with RGA iterations. In this paper, we extend the definition of an A-KKT point to multi-objective optimization using the achievement scalarizing function concept proposed in the multiple criterion decision making (MCDM) literature and suggest a KKT proximity measure that can suitably be used for evaluating an EMO's performance in terms of convergence to the KKT points. The proposed KKT proximity measure can also be used as a termination criterion for an EMO. The proposed KKT proximity measure provides a uniform value for solutions on a non-dominated front parallel to the true efficient front, thereby providing an equal metric value near Pareto-optimal solutions.

In the remainder of the paper, we first describe the principle of classical point-by-point and set based methods for solving multi-objective optimization problems in Section II. Thereafter, we present the KKT proximity measure concept developed for single-objective optimization problems in Section III. Section IV develops the KKT proximity measure concept for multi-objective optimization by treating every trade-off solution as an equivalent achievement scalarizing problem. The concept is then applied to an illustrative problem to test its working. In Section V, the KKT proximity measure is computed over the entire objective space on the well-known two-objective ZDT test problems [11] to illustrate that as non-dominated solutions approach the Pareto-optimal set front-wise, the KKT proximity measure reduces to zero. Thereafter, the KKT proximity measure is computed for trade-off sets found using specific EMO algorithms, such as NSGA-II [12] and NSGA-III [5], [13], on two, three, five and 10-objective test problems. Three engineering design problems are also used for the above purpose. The use of the proposed KKT proximity measure as a termination criterion is then evaluated on all the above problems in Section VI by performing an extensive computer simulation study. The section also discusses a number of viable extensions of this study. Finally, conclusions are drawn in Section VII.

II. MULTI-OBJECTIVE OPTIMIZATION

Let us consider an n-variable, M-objective optimization problem with J inequality constraints:

$$\begin{aligned}
\text{Minimize}_{x}\quad & \{f_1(x), f_2(x), \ldots, f_M(x)\},\\
\text{subject to}\quad & g_j(x) \le 0, \quad j = 1, 2, \ldots, J.
\end{aligned} \tag{1}$$

Variable bounds of the type $x_i^{(L)} \le x_i \le x_i^{(U)}$ can be split into two inequality constraints: $g_{J+2i-1}(x) = x_i^{(L)} - x_i \le 0$ and $g_{J+2i}(x) = x_i - x_i^{(U)} \le 0$. Thus, in the presence of all n pairs of specified variable bounds, there are a total of J + 2n inequality constraints in the above problem. We do not consider equality constraints in this paper, but our analysis can be easily modified to consider them as well.
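The bound-to-constraint conversion above is mechanical and easy to script. Below is a minimal Python sketch (the function name `bounds_to_constraints` is ours, not from the paper) that produces the 2n constraints g_{J+2i-1}(x) = x_i^(L) - x_i and g_{J+2i}(x) = x_i - x_i^(U):

```python
def bounds_to_constraints(xl, xu):
    """Turn bounds x_i^(L) <= x_i <= x_i^(U) into g(x) <= 0 constraints,
    two per variable, following the ordering used in the text."""
    cons = []
    for i, (lo, hi) in enumerate(zip(xl, xu)):
        cons.append(lambda x, i=i, lo=lo: lo - x[i])  # g_{J+2i-1}(x) = x_i^(L) - x_i
        cons.append(lambda x, i=i, hi=hi: x[i] - hi)  # g_{J+2i}(x)   = x_i - x_i^(U)
    return cons

# A point inside the box [0,1]^2 satisfies all four constraints (g <= 0).
cons = bounds_to_constraints([0.0, 0.0], [1.0, 1.0])
print([round(g([0.5, 0.2]), 2) for g in cons])  # -> [-0.5, -0.5, -0.2, -0.8]
```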

For a multi-objective optimization problem, an optimal solution (known as a Pareto-optimal solution) is defined as a solution for which no other solution in the search space is at least as good in all objectives and strictly better in at least one objective. For a problem having multiple conflicting objectives, usually a number of Pareto-optimal solutions, instead of one, exists. When their objective vectors are computed, they often line up as a front in the M-dimensional objective space. The respective front is called the efficient front [1], [2].

A. Classical Point-by-Point Based Methods

Multi-objective optimization problems are traditionally solved by converting them into a parametric scalarized single-objective optimization problem [1], [14]. In generative multi-objective optimization methods, the resulting single-objective optimization problem is solved multiple times for different scalarizing parameter values. Most of the scalarized single-objective optimization methods are expected to find a weak or a strict Pareto-optimal solution (see Figure 1), provided the chosen numerical optimization method is able to find the true optimum of the resulting problem. However, very few methods have the converse property of being able to find every possible Pareto-optimal solution for suitable parameter values of the scalarization process. For example, the well-known weighted-sum approach cannot find any Pareto-optimal solution that lies on a non-convex part of the efficient frontier, whereas the ε-constraint method [14] or the normal constraint method [15] can, in principle, find any Pareto-optimal solution.

However, the main disadvantage of the generative approaches is that they are computationally inefficient, as one application using a particular parameter value of the scalarization does not help in solving other applications with different parameter values. This is because each parametric application is an independent optimization process, and each Pareto-optimal solution needs to be obtained independently in a separate run without any help from other runs. A recent study [16] has clearly shown that an evolutionary multi-objective optimization (EMO) approach that attempts to find multiple Pareto-optimal solutions simultaneously in a single simulation run constitutes a parallel search for multiple solutions and is often computationally faster than the generative methods. In the following subsection, we discuss this principle of set-based multi-objective optimization methods for solving multi-objective optimization problems.

B. Set Based Multi-Objective Optimization Methods

Instead of finding a single Pareto-optimal solution at a time, set based multi-objective optimization methods, such as an evolutionary multi-objective optimization (EMO) method, attempt to find a set of Pareto-optimal solutions in a single simulation run [2]. An EMO's goal is two-fold:

1) Find a well-converged set of trade-off solutions, and

2) Find a well-diversified set of solutions across the entire efficient front.

The population approach of an EMO and the flexibility of its operators allow both the above goals to be achieved for multi-objective problems. EMO algorithms such as NSGA-II [12] and SPEA2 [17] are well-practiced methods for handling two and three-objective problems. Recent extensions (such as NSGA-III [5] and MOEA/D [18]) are able to handle four or more objectives (even more than 10 objectives). The topic is so important today that handling such a large number of objectives has been given a separate name: evolutionary many-objective optimization.

Figure 1 illustrates the principle of set based multi-objective optimization. In each iteration, a set of points gets operated on by an EMO procedure, and a new set of points, hopefully closer to the strict efficient front and with more diversity, is generated. This process is expected to continue with iterations before converging on the true efficient front covering its entire range.

Fig. 1. Principle of set based multi-objective optimization procedure is illustrated (weak and (strict) efficient fronts, iterations t and t+1).

When a set based multi-objective optimization algorithm moves from one set of non-dominated points to a hopefully better set of non-dominated points, there is a need to evaluate the convergence and diversity of the new set of solutions vis-a-vis the old solutions. If we can measure both aspects well through one or more performance metrics, these metrics can also be used to determine the termination of the optimization algorithm. In this paper, we do not consider the diversity aspect much and concentrate on defining a theoretical KKT optimality based convergence measure, which will enable us to achieve two aspects:

1) Provide a theoretical underpinning of the working of an algorithm, and

2) Establish a theoretically motivated termination criterion for stopping an algorithm.

III. KKT PROXIMITY MEASURE FOR SINGLE-OBJECTIVE OPTIMIZATION

Dutta et al. [9] defined an approximate KKT solution to compute a KKT proximity measure for any iterate x^k of a single-objective optimization problem of the following type:

$$\begin{aligned}
\text{Minimize}_{x}\quad & f(x),\\
\text{subject to}\quad & g_j(x) \le 0, \quad j = 1, 2, \ldots, J.
\end{aligned} \tag{2}$$

After a long theoretical derivation, they suggested a procedure for computing the KKT proximity measure for an iterate x^k as follows:

$$\begin{aligned}
\text{Minimize}_{(\epsilon_k,\,\mathbf{u})}\quad & \epsilon_k\\
\text{subject to}\quad & \left\|\nabla f(x^k) + \sum_{j=1}^{J} u_j \nabla g_j(x^k)\right\|^2 \le \epsilon_k,\\
& \sum_{j=1}^{J} u_j g_j(x^k) \ge -\epsilon_k,\\
& u_j \ge 0, \quad \forall j.
\end{aligned} \tag{3}$$

However, that study restricted the computation of the KKT proximity measure to feasible solutions only. For an infeasible iterate, they simply ignored the computation of the KKT proximity measure. A little thought on the associated KKT optimality conditions reveals the following two additional conditions associated with the inequality constraints:

$$g_j(x^k) \le 0, \quad j = 1, 2, \ldots, J, \tag{4}$$

$$u_j g_j(x^k) = 0, \quad j = 1, 2, \ldots, J. \tag{5}$$

The first constraint set makes sure that the iterate x^k is feasible, and the second constraint set ensures that the complementary slackness condition is satisfied. However, the complementary slackness condition is relaxed in the above KKT proximity measure formulation (Equation 3). In the above problem, the variables are (ε_k, u). The KKT proximity measure was then defined as follows:

$$\text{KKT Proximity Measure}(x^k) = \epsilon_k^{*}, \tag{6}$$

where ε_k^* is the optimal objective value of the problem stated in Equation 3.
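For concreteness, the optimization problem in Equation 3 can be solved numerically with an off-the-shelf solver. Below is a minimal Python sketch using `scipy.optimize.minimize` (SLSQP); the toy objective, constraint, and the function name `kkt_proximity` are our own illustrative choices, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize f(x) = (x1-1)^2 + (x2-1)^2 subject to
# g1(x) = x1 + x2 - 2 <= 0.  The optimum x* = (1, 1) is a KKT point.
def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0)])

def g(x):
    return np.array([x[0] + x[1] - 2.0])

def grad_g(x):
    return np.array([[1.0, 1.0]])   # one row per constraint

def kkt_proximity(xk):
    """Solve Eq. (3): min eps s.t. ||grad f + sum u_j grad g_j||^2 <= eps,
    sum u_j g_j(xk) >= -eps, u >= 0.  Variables z = (eps, u_1, ..., u_J)."""
    gf, gg, gv = grad_f(xk), grad_g(xk), g(xk)
    J = len(gv)
    cons = [
        {'type': 'ineq', 'fun': lambda z: z[0] - np.sum((gf + z[1:] @ gg) ** 2)},
        {'type': 'ineq', 'fun': lambda z: z[0] + z[1:] @ gv},
    ]
    z0 = np.concatenate(([10.0], np.zeros(J)))   # feasible starting point
    res = minimize(lambda z: z[0], z0, method='SLSQP',
                   bounds=[(0.0, None)] * (1 + J), constraints=cons)
    return res.fun

print(round(kkt_proximity(np.array([1.0, 1.0])), 4))   # ~0 at the KKT point
print(round(kkt_proximity(np.array([0.0, 0.0])), 4))   # > 0 away from it
```

The measure vanishes at the constrained optimum and is positive elsewhere, which is the behavior Equation 6 relies on.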

IV. KKT PROXIMITY MEASURE FOR MULTI-OBJECTIVE OPTIMIZATION

For the M-objective optimization problem stated in Equation 1, the Karush-Kuhn-Tucker (KKT) optimality conditions are given as follows [19], [1]:

$$\sum_{k=1}^{M} \lambda_k \nabla f_k(x^k) + \sum_{j=1}^{J} u_j \nabla g_j(x^k) = 0, \tag{7}$$

$$g_j(x^k) \le 0, \quad j = 1, 2, \ldots, J, \tag{8}$$

$$u_j g_j(x^k) = 0, \quad j = 1, 2, \ldots, J, \tag{9}$$

$$u_j \ge 0, \quad j = 1, 2, \ldots, J, \tag{10}$$

$$\lambda_k \ge 0, \quad k = 1, 2, \ldots, M, \quad \text{and} \quad \boldsymbol{\lambda} \neq \mathbf{0}. \tag{11}$$

Multipliers λ_k are non-negative, but at least one of them must be non-zero. The parameter u_j is called the Lagrange multiplier for the j-th inequality constraint; these are also non-negative. Any solution x^k that satisfies all the above conditions is called a KKT point [20]. Equation 7 is called the equilibrium equation. Equation 9, written for every constraint, is called the complementary slackness equation, as mentioned above. Note that the conditions given in Equation 8 ensure feasibility of x^k, while Equation 10 arises due to the minimization nature of the problem given in Equation 1.

A. A Naive Measure from KKT Optimality Conditions

The above KKT conditions can be used to naively define a KKT error measure. For a given iterate x^k, the λ-vector and u-vector are unknown. Different values of these parameters will cause the above conditions to be either satisfied completely or not. We can propose a method to identify suitable λ- and u-vectors so that conditions 8, 9, 10 and 11 are satisfied and the equilibrium condition (7) is violated the least. Thus, we formulate an optimization procedure to find such optimal (λ, u)-vectors:

$$\begin{aligned}
\text{Minimize}_{(\boldsymbol{\lambda},\,\mathbf{u})}\quad & \left\|\sum_{k=1}^{M} \lambda_k \nabla f_k(x^k) + \sum_{j=1}^{J} u_j \nabla g_j(x^k)\right\|,\\
\text{subject to}\quad & g_j(x^k) \le 0, \quad j = 1, 2, \ldots, J,\\
& u_j g_j(x^k) = 0, \quad j = 1, 2, \ldots, J,\\
& u_j \ge 0, \quad j = 1, 2, \ldots, J,\\
& \lambda_k \ge 0, \quad k = 1, 2, \ldots, M, \quad \text{and} \quad \boldsymbol{\lambda} \neq \mathbf{0}.
\end{aligned} \tag{12}$$

The operator ‖·‖ can be the L2-norm. It is clear that if x^k is a KKT point, the norm would be zero for the associated and feasible (λ, u)-vectors. For a non-KKT but feasible point x^k, the norm need not be zero, but we are interested in investigating whether the violation of the KKT conditions can be used as a convergence metric in the neighborhood of a KKT point. To achieve this, we attempt to find a suitable (λ, u)-vector that minimizes the above norm. We then explore whether this minimum norm value can serve as an indicator of how close a point is to a suitable Pareto-optimal point. Thus, a KKT Error for a feasible iterate x^k is defined by solving the above problem and computing the following error value:

$$\text{KKT Error}(x^k) = \left\|\sum_{k=1}^{M} \lambda_k^{*} \nabla f_k(x^k) + \sum_{j=1}^{J} u_j^{*} \nabla g_j(x^k)\right\|, \tag{13}$$

where λ_k^* and u_j^* are the optimal values of the problem given in Equation 12.
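To see what Equation 12 computes in the simplest setting, consider a strictly feasible iterate, where complementary slackness forces all u_j = 0; the KKT Error then reduces to minimizing ‖Σ_k λ_k ∇f_k‖ over the λ-simplex. A small Python sketch follows (the normalization Σλ_k = 1 mirrors the paper's later use of λ_2 = 1 − λ_1; the function name `kkt_error` is ours):

```python
import numpy as np
from scipy.optimize import minimize

def kkt_error(grads):
    """Naive KKT Error of Eq. (12) at a strictly feasible point (all u_j = 0):
    min over the simplex {lambda >= 0, sum = 1} of ||sum_k lambda_k grad f_k||."""
    G = np.array(grads, dtype=float)          # M x n matrix of objective gradients
    M = len(G)
    res = minimize(lambda l: np.linalg.norm(l @ G), np.full(M, 1.0 / M),
                   bounds=[(0.0, 1.0)] * M,
                   constraints={'type': 'eq', 'fun': lambda l: l.sum() - 1.0})
    return res.fun

# Opposing gradients can cancel (a Pareto-critical point): error ~ 0.
print(round(kkt_error([[1.0, 0.0], [-1.0, 0.0]]), 4))
# Aligned gradients cannot cancel: the error stays at 1.
print(round(kkt_error([[1.0, 0.0], [1.0, 0.0]]), 4))
```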

In order to investigate whether the above KKT error can be used as a metric for evaluating the closeness of an iterate x^k to a Pareto-optimal solution, we consider a simple bi-objective optimization problem, given as follows:

$$\text{P1}: \quad
\begin{aligned}
\text{Minimize}_{(x, y)}\quad & \left\{x,\; \frac{1 + y}{1 - (x - 0.5)^2}\right\},\\
\text{subject to}\quad & 0 \le (x, y) \le 1.
\end{aligned} \tag{14}$$

The Pareto-optimal solutions are given as follows: y^* = 0 and x^* ∈ [0, 0.5]. The efficient front can be represented as f_2^* = 1/(1 − (f_1^* − 0.5)^2). We attempt to compute the above KKT error for any given point or iterate x^k = (x^k, y^k)^T exactly by solving the following optimization problem, considering the two inequality constraints g_1(x) = −x^k ≤ 0 and g_2(x) = −y^k ≤ 0:

$$\begin{aligned}
\text{Min.}_{(\lambda_1, u_1, u_2)}\quad & \epsilon = \left\|\lambda_1 \nabla f_1 + (1 - \lambda_1)\nabla f_2 + u_1 \nabla g_1 + u_2 \nabla g_2\right\|,\\
\text{subject to}\quad & -x^k \le 0, \quad -y^k \le 0,\\
& -u_1 x^k = 0, \quad -u_2 y^k = 0,\\
& u_1 \ge 0, \quad u_2 \ge 0,\\
& 0 \le \lambda_1 \le 1.
\end{aligned} \tag{15}$$

The use of λ_2 = 1 − λ_1 and restricting λ_1 to lie within [0, 1] ensure that (i) both λ values are non-negative and (ii) both are not zero at the same time. The optimal solution of the above problem can be computed exactly, as follows:

1) Case 1: When x^k > 0 and y^k > 0: this means that both u_1 and u_2 are zero. The minimum ε^* can be calculated as:

$$\epsilon^{*} = \frac{K_2}{\sqrt{K_2^2 + (1 - K_1)^2}}, \tag{16}$$

where

$$K_1 = \frac{2(x^k - 0.5)(1 + y^k)}{\left(1 - (x^k - 0.5)^2\right)^2}, \qquad K_2 = \frac{1}{1 - (x^k - 0.5)^2}.$$

The corresponding multiplier is λ_1^* = (K_1^2 + K_2^2 − K_1)/(K_2^2 + (1 − K_1)^2).

2) Case 2: When x^k > 0 and y^k = 0: this means that u_1 = 0. The minimum error is found as ε^* = 0 with λ_1^* = K_1/(K_1 − 1) and u_2^* = K_2/(1 − K_1).

3) Case 3: When x^k = 0 and y^k > 0: this means u_2 = 0. The minimum error is found as ε^* = 0 with λ_1^* = 1 and u_1^* = 1.

4) Case 4: When x^k = 0 and y^k = 0: the minimum error can be computed as ε^* = 0 with u_1^* = λ_1(1 − K_1) + K_1 and u_2^* = (1 − λ_1)K_2 for any λ_1 ∈ [0, 1].
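Case 1 gives a closed form that is easy to evaluate numerically. The short Python sketch below (the function name `kkt_error_p1` is ours) evaluates Equation 16 along the line x = 0.2 and reproduces the counter-intuitive trend discussed next: the error grows as y → 0, i.e., as the point approaches its Pareto-optimal limit (0.2, 0)^T:

```python
import math

# Closed-form KKT Error of Case 1 (Eq. 16) for problem P1, valid at
# interior points x^k > 0, y^k > 0 where u1 = u2 = 0.
def kkt_error_p1(x, y):
    K1 = 2.0 * (x - 0.5) * (1.0 + y) / (1.0 - (x - 0.5) ** 2) ** 2
    K2 = 1.0 / (1.0 - (x - 0.5) ** 2)
    return K2 / math.sqrt(K2 ** 2 + (1.0 - K1) ** 2)

# Along x = 0.2 the error *increases* as y -> 0, i.e., as the point
# approaches the Pareto-optimal point (0.2, 0).
for y in (0.8, 0.4, 0.1, 0.01):
    print(y, round(kkt_error_p1(0.2, y), 4))
```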

We consider 10,000 equi-spaced points in the variable space and compute the above KKT error value for each of them. The KKT error value is then plotted in Figure 2. It is clear that the KKT error is zero for all Pareto-optimal solutions (y = 0), but surprisingly the KKT error value increases as the points get closer to the Pareto-optimal solutions, as shown in Figure 3. This figure is plotted for the fixed value x = 0.2 and for different values of y. The Pareto-optimal solution along this line corresponds to the point (0.2, 0)^T. The KKT error at this point (0.2, 0)^T is zero, as is evident from Figure 3, but it jumps to a large value in its vicinity. Ironically, the figure depicts that for points away from this Pareto-optimal point the KKT error gets smaller. A similar naive approach [21] was also found to produce near-zero KKT error for feasible solutions far away from the Pareto-optimal solutions, thereby making the naive KKT error measure an unreliable metric for estimating the convergence pattern to the Pareto-optimal solutions.

Fig. 2. KKT Error value increases towards the efficient front in Problem P1.

Fig. 3. KKT Error value increases towards the Pareto-optimal point along the line x = 0.2 for Problem P1.

The above phenomenon suggests that the KKT optimality conditions are a singular set of conditions valid only at the KKT points. A minimal violation of the equilibrium KKT condition (described here as a 'naive' measure) cannot, in general, provide an idea of how close a solution is to a KKT point. To investigate the suitability of the above KKT Error metric as a potential termination condition or as a measure for judging the closeness of obtained non-dominated solutions to the true Pareto-optimal solutions, we consider the problem stated in Equation 14 and attempt to solve it using the NSGA-II procedure [12]. The problem has two variables, two objectives, and four constraints. NSGA-II is used with standard parameter settings: population size 20, the SBX operator [22] with probability p_c = 0.9 and distribution index η_c = 20, and the polynomial mutation operator [2] with probability p_m = 0.5 and index η_m = 50. The non-dominated solutions from each generation are extracted, and the optimization problem stated in Equation 15 is solved using Matlab's fmincon() routine. Non-dominated solutions at various generations are shown in Figure 4. It can be seen that with generations the non-dominated solutions approach the efficient frontier. At generation 50, the population is well spread over the entire efficient frontier.

Fig. 4. NSGA-II obtained non-dominated solutions approach the efficient frontier with generations for Problem P1.

Fig. 5. Variation of smallest, average, median and largest KKT Error metric values for NSGA-II obtained non-dominated points for Problem P1.

The resulting KKT Error (ε^*) for each non-dominated solution is recorded, and the minimum, mean, median, and maximum KKT Error values are plotted in Figure 5 at every generation. It is interesting to observe that although the smallest KKT Error value reduces in the initial few generations, the variation of the KKT Error value is too large, even up to generation 100, for it to be considered a stable performance metric. On the contrary, Figure 4 shows that the non-dominated solutions approach the true efficient front with generations. NSGA-II uses an elite-preserving operator that emphasizes non-dominated and less-crowded (based on the crowding distance operator [12]) solutions, but the above figure suggests that although better non-dominated and distributed points are continuously found by NSGA-II (as Figure 4 also supports), their satisfaction of the KKT optimality conditions need not improve with generations. This behavior of the KKT Error measure does not allow us to use it either as a performance metric or as a termination condition. However, the KKT proximity measure suggested in earlier studies [9], [10] for single-objective optimization problems relaxed both the equilibrium condition and the complementary slackness condition, thereby enabling a monotonic KKT proximity measure to be derived in that study. In the following section, we extend the KKT proximity metric concept to multi-objective optimization and investigate whether it can alleviate the difficulties with the KKT Error metric studied above.

B. Proposed KKT Proximity Measure

One of the common ways to solve the generic multi-objective optimization problem stated in Equation 1 is to solve a parameterized achievement scalarizing function (ASF) optimization problem repeatedly for different parameter values. The ASF approach was originally suggested by Wierzbicki [23]. For a specified reference point z and a weight vector w (the parameters of the ASF problem), the ASF problem is given as follows:

$$\begin{aligned}
\text{Minimize}_{x}\quad & \text{ASF}(x, z, w) = \max_{i=1}^{M} \left(\frac{f_i(x) - z_i}{w_i}\right),\\
\text{subject to}\quad & g_j(x) \le 0, \quad j = 1, 2, \ldots, J.
\end{aligned} \tag{17}$$
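Evaluating the ASF itself is a one-liner. A minimal Python sketch follows (the sample numbers are arbitrary illustrations, not from the paper):

```python
import numpy as np

def asf(f, z, w):
    """Achievement scalarizing function of Eq. (17):
    ASF(x, z, w) = max_i (f_i(x) - z_i) / w_i, given f = f(x)."""
    f, z, w = (np.asarray(v, dtype=float) for v in (f, z, w))
    return np.max((f - z) / w)

# Bi-objective example with reference point z = (0, 0) and equal weights.
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(round(asf([0.3, 0.6], [0.0, 0.0], w), 4))   # the larger scaled term wins
```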

The reference point z ∈ R^M is any point in the M-dimensional objective space, and the weight vector w ∈ R^M is an M-dimensional unit vector for which every w_i ≥ 0 and ‖w‖ = 1. To avoid division by zero, we shall consider strictly positive weight values. It has been proven that for the above conditions on z and w, the solution to the above problem is always a Pareto-optimal solution [1]. Figure 6 illustrates the ASF procedure of arriving at a weak or a strict Pareto-optimal solution. For illustrating the working principle of the ASF procedure, we consider a specific reference vector z and weight vector w (both marked in the figure). For any point x, the objective vector f is computed (shown as F). The larger of the two quantities ((f_1 − z_1)/w_1 and (f_2 − z_2)/w_2) is then chosen as the ASF value of the point x. For the point G, the above two quantities are identical, that is, (f_1 − z_1)/w_1 = (f_2 − z_2)/w_2 = p (say). For points on line GH, the first term dominates and the ASF value is identical to p. For points on line GK, the second term dominates and the ASF value is also identical to p. Thus, it is clear that for any point on lines GH and GK, the corresponding ASF value will be the same as p. This is why an iso-ASF line for an objective vector at F traces any point on lines KG and GH. For another point A, the iso-ASF line is shown in dashed lines passing through A, but it is important to note that its ASF value will be larger than p, since its iso-ASF lines meet the w-line farther from the direction of the minimum of both objectives. Since the ASF function is minimized, point F is considered better than A in terms of the ASF value. Similarly, point A will be considered better than point B. A little thought will reveal that when the ASF problem is optimized, the point O will correspond to the minimum ASF value in the entire feasible objective space (marked with a shaded region in the figure), thereby making point O (an efficient point) the final outcome of the optimization problem stated in Equation 17 for the chosen z and w vectors. By keeping the reference point z fixed and changing the weight vector w (treating it like a parameter of the resulting scalarization process), different points on the efficient front can be generated by the above ASF minimization process.

Fig. 6. ASF procedure of finding a Pareto-optimal solution is illustrated.

For our purpose of estimating a KKT proximity measure for multi-objective optimization, we fix the reference point z to a utopian point: z_i = z_i^ideal − ε_i for all i, where z_i^ideal is the i-th component of the ideal point, set as the optimal objective value of the following minimization problem:

$$\begin{aligned}
\text{Minimize}_{x}\quad & f_i(x),\\
\text{subject to}\quad & g_j(x) \le 0, \quad j = 1, 2, \ldots, J.
\end{aligned} \tag{18}$$

Thus, the above optimization task needs to be performed M times to obtain the M-dimensional z-vector. For all computations in this paper, we set ε_i = 0.01. Note that ε_i = 0 can also be chosen.

Setting the reference point as a utopian vector, we are left with a systematic procedure for setting the weight vector w. For this purpose, we compute the direction vector from z to the objective vector f = (f_1(x), f_2(x), ..., f_M(x))^T computed at the current point x for which the KKT proximity measure needs to be computed. The weight value for the i-th objective function is computed as follows:

$$w_i = \frac{f_i(x) - z_i}{\sqrt{\sum_{k=1}^{M} (f_k(x) - z_k)^2}}. \tag{19}$$
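The weight computation of Equation 19 simply normalizes the direction from z to f(x). A minimal Python sketch (the function name `asf_weights` is ours):

```python
import numpy as np

def asf_weights(f, z):
    """Eq. (19): unit direction from the utopian point z to the
    objective vector f(x) of the current iterate."""
    d = np.asarray(f, dtype=float) - np.asarray(z, dtype=float)
    return d / np.linalg.norm(d)

# With a utopian z, every component f_i - z_i > 0, so all weights are
# strictly positive, as required to avoid division by zero in the ASF.
w = asf_weights([0.4, 1.2], [-0.01, -0.01])
print(w, round(float(np.linalg.norm(w)), 6))      # a unit vector
```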


In the subsequent analysis, we treat the above w-vector as fixed for a given iterate x^k and set w^k = w. Figure 6 illustrates the reference point and the weight vector. Notice how the ASF optimization problem results in a corresponding Pareto-optimal solution. Thus, the idea of the above formulation of the ASF problem is that if the point x^k under consideration is already a Pareto-optimal solution, the solution to the resulting ASF optimization problem will be the same solution, and the KKT optimality conditions are expected to be satisfied. However, if the point x^k is away from the efficient frontier, our proposed KKT proximity measure is expected to provide a monotonic metric value related to the proximity of the point from the corresponding ASF optimal solution.

To formulate the KKT proximity measure, first, we use a reformulation of the optimization problem stated in Equation 17 into a smooth problem:

$$\begin{array}{rl} \mathop{\text{Min.}}_{(x,\, x_{n+1})} & F(x, x_{n+1}) = x_{n+1}, \\ \text{subject to} & \dfrac{f_i(x) - z_i}{w_i^k} - x_{n+1} \le 0, \quad i = 1, 2, \ldots, M, \\ & g_j(x) \le 0, \quad j = 1, 2, \ldots, J. \end{array} \tag{20}$$

Since we plan to use Matlab's fmincon() optimization algorithm for solving the ASF minimization problem, this conversion is helpful. A new variable x_{n+1} is added and M additional inequality constraints arise to make the problem smooth [1]. Thus, to find the corresponding Pareto-optimal solution, the above optimization problem has (n+1) variables: y = (x; x_{n+1}). For an ideal or a utopian point as z, the optimal solution to the above problem is expected to make x*_{n+1} ≥ 0.

Since the above optimization problem is a single-objective problem, we can use the KKT proximity metric discussed in the previous section to estimate a proximity measure for any point x. Note that the above problem has M + J inequality constraints:

$$G_j(y) = \frac{f_j(x) - z_j}{w_j^k} - x_{n+1} \le 0, \quad j = 1, \ldots, M, \tag{21}$$

$$G_{M+j}(y) = g_j(x) \le 0, \quad j = 1, 2, \ldots, J. \tag{22}$$
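As an illustration of how the smooth reformulation of Equation 20 can be handed to an off-the-shelf NLP solver (the paper uses Matlab's fmincon(); the sketch below uses SciPy's SLSQP as an analogue), consider a hypothetical one-variable bi-objective problem f1 = x², f2 = (x − 1)² with no constraints and z set to the ideal point (0, 0), i.e., ε_i = 0:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unconstrained bi-objective problem (J = 0):
f = [lambda x: x[0] ** 2, lambda x: (x[0] - 1.0) ** 2]
z = np.array([0.0, 0.0])            # ideal point used as reference (eps_i = 0)

xk = np.array([0.5])                # current iterate (Pareto-optimal here)
fk = np.array([fi(xk) for fi in f])
w = (fk - z) / np.linalg.norm(fk - z)   # weight direction of Equation 19

# Smooth reformulation (Equation 20): y = (x, x_{n+1}), minimize x_{n+1}
# subject to (f_i(x) - z_i)/w_i - x_{n+1} <= 0 for each objective.
cons = [{"type": "ineq",
         "fun": (lambda y, i=i: y[-1] - (f[i](y[:-1]) - z[i]) / w[i])}
        for i in range(2)]
res = minimize(lambda y: y[-1], x0=np.append(xk, 1.0),
               constraints=cons, method="SLSQP")

print(res.x)   # since xk is already Pareto-optimal, x should come back near 0.5
```

Because the iterate x^k = 0.5 is already Pareto-optimal for this toy problem, the ASF optimum returns the same point, matching the motivation given above for the measure.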

The KKT proximity measure can now be computed for a given feasible x^k as follows. Note that x^k is known, but the variable x_{n+1} is unknown. Thus, we first construct an (n+1)-dimensional vector y = (x; x_{n+1}) and formulate the following optimization problem using Equation 20 to compute the KKT proximity measure:

$$\begin{array}{rl} \mathop{\text{Minimize}}_{(\epsilon_k,\, x_{n+1},\, \mathbf{u})} & \epsilon_k + \sum_{j=1}^{J} \left( u_{M+j}\, g_j(x^k) \right)^2, \\ \text{subject to} & \left\| \nabla F(y) + \sum_{j=1}^{M+J} u_j \nabla G_j(y) \right\|^2 \le \epsilon_k, \\ & \sum_{j=1}^{M+J} u_j G_j(y) \ge -\epsilon_k, \\ & u_j \ge 0, \quad j = 1, 2, \ldots, (M+J), \\ & -x_{n+1} \le 0. \end{array} \tag{23}$$

The added term in the objective function penalizes the violation of the complementary slackness condition. If the iterate x^k is Pareto-optimal or a KKT point, the complementary slackness term will be zero (by either g_j(x^k) = 0 or u_{M+j} = 0) and hence the above optimization should end up with a zero objective value. For any other solution, the left-side expressions of the first two constraints in Equation 23 are not expected to be zero, but the above minimization problem should drive the process to find the minimum possible value of ε_k and constraint violation. In some sense, the optimal ε_k value should provide us with an idea of the extent of violation of the KKT equilibrium condition and the complementary slackness condition at every iterate x^k. The addition of the penalty term in the objective function yields a larger (and worse) KKT proximity measure value for feasible but non-Pareto-optimal solutions. The final constraint is added to make sure a non-negative ASF value (x_{n+1} ≥ 0) is assigned to every feasible point for an ideal or utopian reference point z.
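To make the construction concrete, the following sketch solves the optimization problem of Equation 23 for a hypothetical unconstrained (J = 0) bi-objective problem f1 = x², f2 = (x − 1)² at the Pareto-optimal iterate x^k = 0.5, where the optimal ε_k should come out (near) zero. The decision variables are (ε_k, x_{n+1}, u_1, u_2), and SciPy's SLSQP stands in for fmincon():

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem: f1 = x^2, f2 = (x-1)^2, no constraints (J = 0), z = (0, 0).
xk = 0.5
fk = np.array([xk ** 2, (xk - 1) ** 2])
grad_f = np.array([2 * xk, 2 * (xk - 1)])   # df_i/dx at xk
w = fk / np.linalg.norm(fk)                 # Equation 19 with z = 0

def objective(v):                           # Eq. 23 objective (J = 0: no penalty)
    return v[0]

def grad_norm_con(v):                       # ||grad F + sum u_j grad G_j||^2 <= eps_k
    eps, t, u = v[0], v[1], v[2:]
    gx = np.dot(u, grad_f / w)              # x-component of the gradient sum
    gt = 1.0 - np.sum(u)                    # x_{n+1}-component
    return eps - (gx ** 2 + gt ** 2)

def compl_con(v):                           # sum u_j G_j(y) >= -eps_k
    eps, t, u = v[0], v[1], v[2:]
    G = fk / w - t                          # G_j(y) of Equation 21
    return np.dot(u, G) + eps

cons = [{"type": "ineq", "fun": grad_norm_con},
        {"type": "ineq", "fun": compl_con},
        {"type": "ineq", "fun": lambda v: v[1]}]            # -x_{n+1} <= 0
bounds = [(None, None), (None, None), (0, None), (0, None)]  # u_j >= 0
res = minimize(objective, np.array([1.0, 0.5, 0.5, 0.5]),
               constraints=cons, bounds=bounds, method="SLSQP")
print(res.fun)   # near zero, since xk = 0.5 is Pareto-optimal for this problem
```

For a non-Pareto-optimal iterate the same construction should return a strictly positive ε_k, which is exactly the behavior the measure is designed to exhibit.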

The optimal objective value (ε*_k) of the above problem corresponds to the proposed KKT proximity measure. Let us now discuss the variables of the above optimization problem. They are ε_k, x_{n+1}, and the Lagrange multiplier vector u ∈ R^{M+J}. Thus, there are (M+J+2) variables and (M+2J+2) inequality constraints in the above problem. Let us describe the constraints next. The first constraint requires the gradients of the F and G functions:

$$\left\| \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} + \sum_{i=1}^{M} u_i \begin{pmatrix} \frac{1}{w_i^k}\frac{\partial f_i(x^k)}{\partial x_1} \\ \vdots \\ \frac{1}{w_i^k}\frac{\partial f_i(x^k)}{\partial x_n} \\ -1 \end{pmatrix} + \sum_{j=1}^{J} u_{M+j} \begin{pmatrix} \frac{\partial g_j(x^k)}{\partial x_1} \\ \vdots \\ \frac{\partial g_j(x^k)}{\partial x_n} \\ 0 \end{pmatrix} \right\| \le \sqrt{\epsilon_k}. \tag{24}$$

Here, the quantity ∂f_i(x^k)/∂x_j is the partial derivative of the objective function f_i(x) with respect to variable x_j, evaluated at the given point x^k. A similar meaning is associated with the partial derivative of the constraint g_j above. Using the L2-norm and squaring both sides, we obtain the following constraint:

$$\left\| \sum_{j=1}^{M} \frac{u_j}{w_j^k} \nabla f_j(x^k) + \sum_{j=1}^{J} u_{M+j} \nabla g_j(x^k) \right\|^2 + \left( 1 - \sum_{j=1}^{M} u_j \right)^2 - \epsilon_k \le 0. \tag{25}$$

The second constraint in Equation 23 can be rewritten as follows:

$$-\epsilon_k - \sum_{j=1}^{M} u_j \left( \frac{f_j(x^k) - z_j}{w_j^k} - x_{n+1} \right) - \sum_{j=1}^{J} u_{M+j}\, g_j(x^k) \le 0. \tag{26}$$

For infeasible iterates, we simply compute the constraint violation as the KKT proximity measure. Therefore, the KKT proximity measure for any iterate x^k is calculated as follows (the bracket operator ⟨α⟩ returns α if α > 0 and zero otherwise):

$$\text{KKT Proximity Measure}(x^k) = \begin{cases} \epsilon_k^*, & \text{if } x^k \text{ is feasible}, \\ 1 + \sum_{j=1}^{J} \left\langle g_j(x^k) \right\rangle^2, & \text{otherwise}. \end{cases} \tag{27}$$

An analysis of the optimization problem stated in Equation 23 and the subsequent simplifications reveals an important result. The optimal KKT proximity measure for feasible solutions is given as follows:

$$\epsilon_k^* = 1 - \sum_{j=1}^{M} u_j^* - \sum_{j=1}^{J} \left( u_{M+j}^*\, g_j(x^k) \right)^2. \tag{28}$$

Since each u*_j ≥ 0 and the third term is always non-negative, ε*_k is always bounded in [0, 1] for any feasible solution x^k. To make infeasible solutions have a worse KKT proximity measure value than that of any feasible solution, we add one in the expression of the ε*_k value for an infeasible solution in Equation 27. For a KKT point x^k = x*, the complementary slackness condition u*_{M+j} g_j(x*) = 0 holds for all constraints, and Equation 24 reduces to ε_k ≥ (1 − Σ_{j=1}^M u_j)². In such a case, along with other conditions, including the fact that each u_j is independent, it can be shown that ε*_k = 0, meaning that at a KKT point the KKT proximity measure is always zero.
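The piecewise definition of Equation 27 together with the closed form of Equation 28 can be sketched as a small helper. This is a hypothetical transcription for illustration; the bracket operator is taken as ⟨α⟩ = max(α, 0), i.e., the violation of a g ≤ 0 constraint:

```python
def kkt_proximity_measure(g, u=None):
    """g: list of constraint values g_j(x^k) (feasible iff all g_j <= 0);
    u: optimal multipliers (u_1..u_M, u_{M+1}..u_{M+J}) from Eq. 23,
    with M = len(u) - len(g); only needed for feasible points."""
    if any(gj > 0 for gj in g):               # infeasible branch of Eq. 27
        return 1.0 + sum(max(gj, 0.0) ** 2 for gj in g)
    M = len(u) - len(g)                       # feasible branch: closed form of Eq. 28
    return 1.0 - sum(u[:M]) - sum((u[M + j] * g[j]) ** 2 for j in range(len(g)))

# At a KKT point, the multipliers of the M objective constraints sum to one and
# complementary slackness holds, so the measure is zero:
print(kkt_proximity_measure(g=[-0.2], u=[0.4, 0.6, 0.0]))
# An infeasible point gets 1 plus the sum of squared violations (1 + 0.3**2):
print(kkt_proximity_measure(g=[0.3, -0.1]))
```

The added 1 in the infeasible branch guarantees that every infeasible point scores worse than any feasible point, whose measure lies in [0, 1].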

We now illustrate the working of the above KKT proximity measure on Problem P1, which was used earlier in illustrating the working of the naive KKT error metric. The ideal point for this problem is z^ideal = (0, 1)^T. The variable bounds are converted into four inequality constraints and the optimization problem stated in Equation 23 is solved for 10,000 grid points in the variable space. Each optimization problem is solved using Matlab's fmincon() optimization routine. Figure 7 shows the KKT proximity surface and its contour plot on the objective space. The efficient frontier can be clearly seen from

Fig. 7. KKT proximity measure is plotted for Problem P1.

Fig. 8. KKT proximity measure reduces to zero at the Pareto-optimal solution along a line x = 0.2 for Problem P1.

the contour plot. Moreover, it is interesting to note how the contour lines become almost parallel to the efficient frontier, thereby indicating that when a set of non-dominated solutions close to the efficient frontier is found, their KKT proximity measure values will be more or less equal to each other. Also, unlike in the naive approach (shown in Figure 2), a monotonic reduction of the KKT proximity measure towards the efficient frontier is evident from this figure and also from Figure 8, plotted for a fixed value of x = 0.2 and spanning y in its range [0, 1]. The proposed KKT proximity measure monotonically reduces to zero at the corresponding Pareto-optimal point. A comparison of this figure with Figure 3 clearly indicates that the proposed KKT proximity measure can be used to assess the closeness of points to Pareto-optimal points and can also possibly be used to determine the termination of an EMO simulation run reliably.

To investigate further whether the proposed KKT proximity metric can be used as a proximity indicator of non-dominated solutions to the true Pareto-optimal solutions, we calculate the smallest, average, median and largest KKT proximity values for all NSGA-II non-dominated solutions obtained generation-wise earlier for Problem P1 and plot them in Figure 9. The inset figure shows a fairly steady decrease of the KKT proximity value to zero with generations, which is a true reflection of the generation-wise NSGA-II non-dominated points shown in Figure 4. This figure clearly indicates the possibility of using the proposed KKT proximity measure as a termination criterion or as a metric for evaluating the convergence properties of an EMO algorithm.

Fig. 9. Smallest, average, median and largest KKT proximity measure values are plotted for generation-wise NSGA-II obtained non-dominated points for Problem P1.

Next, we consider a constrained optimization problem (P2) having two inequality constraints, which may cause a set-based multi-objective optimization algorithm to face both feasible and infeasible points along the way of converging to the Pareto-optimal set:

$$\text{P2}: \quad \begin{array}{rl} \mathop{\text{Minimize}}_{(x,\,y)} & \{x,\ y\}, \\ \text{subject to} & g_1(x) \equiv (x-1)^2 + (y-1)^2 \le 0.81, \\ & g_2(x) \equiv x + y \le 2, \\ & 0 \le (x, y) \le 2. \end{array} \tag{29}$$

The above problem has six constraints in total, considering the variable bounds as constraints. The Pareto-optimal solutions lie on the first constraint boundary: y* = 1 − √(0.81 − (x* − 1)²) for x* ∈ [0.1, 1]. To investigate the KKT

proximity measure at different infeasible and feasible points in the search space, we compute the KKT proximity metric value for 10,000 points in the search space and plot the metric surface in Figure 10. Both the surface and the contours of the KKT proximity measure indicate that it is zero at the Pareto-optimal solutions and increases as the points move away from the efficient front. Once again, we observe that the metric value increases monotonically from the feasible region towards the infeasible region; also, inside the feasible region, the proximity metric value monotonically reduces to zero towards the efficient front.
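The claim that the Pareto-optimal solutions of P2 lie on the boundary of the first constraint can be checked numerically; the sketch below samples the stated curve and verifies that g1 is active and g2 is satisfied (plain Python, no solver needed):

```python
import math

def p2_constraints(x, y):
    # Constraint functions of Problem P2 (Equation 29)
    g1 = (x - 1) ** 2 + (y - 1) ** 2   # must equal 0.81 on the Pareto set
    g2 = x + y                         # must be <= 2
    return g1, g2

# Sample the stated curve y* = 1 - sqrt(0.81 - (x* - 1)^2) for x* in [0.1, 1]
for i in range(10):
    x = 0.1 + 0.9 * i / 9
    y = 1.0 - math.sqrt(0.81 - (x - 1) ** 2)
    g1, g2 = p2_constraints(x, y)
    assert abs(g1 - 0.81) < 1e-12 and g2 <= 2.0   # g1 active, g2 satisfied
print("all sampled Pareto-optimal points lie on the g1 boundary")
```

Every sampled point keeps g1 exactly active, which is consistent with the Pareto set of P2 lying on that circular constraint boundary.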

NSGA-II is applied to Problem P2 with identical parameter settings as in Problem P1. To make the problem a bit more difficult, upper bounds on both variables were chosen to be 10, instead of 2. Constrained non-dominated solutions are shown


Fig. 10. KKT proximity measure is plotted for Problem P2. The figure (a) shows the feasible space and the contour plot of the KKT proximity measure. The figure (b) shows the KKT proximity measure surface.

in Figure 11. It is observed that up to the sixth generation,

Fig. 11. NSGA-II obtained constrained non-dominated solutions approach the efficient frontier with generations for Problem P2.

Fig. 12. Variation of smallest, average, median and largest KKT proximity metric values for NSGA-II obtained constrained non-dominated points for Problem P2.

no feasible solution is present in the NSGA-II population. Thereafter, NSGA-II finds solutions in the feasible space and is able to find solutions close to the efficient frontier with generations, as shown in the figure. KKT proximity measure values are then computed for the non-dominated solutions. When no feasible solution exists in a population, the constrained domination principle declares the least constraint-violated solution as the constrained non-dominated solution. Figure 12 shows that the KKT proximity value reduces from a large value until generation 6, indicating that infeasible solutions get closer to the feasible region with generations. Thereafter, with more feasible solutions being found, the KKT proximity measure values reduce close to zero. At generation 16, solution A (shown in Figure 11), which is non-dominated with respect to the rest but is not a Pareto-optimal solution, is created. As can be seen from Figure 10(a), this point has a relatively large KKT proximity value, but when a near-Pareto-optimal solution is found to dominate Solution A at generation 23, the largest KKT proximity measure value reduces. After about 40 generations, all non-dominated points are close to the true efficient front, the variation in KKT proximity values among these solutions is small, and the KKT proximity values are very close to zero.

The KKT proximity measure on both the above test problems indicates that it can be used as a measure of the performance of a set-based multi-objective optimization algorithm. In the following section, we compute the KKT proximity measure for generation-wise sets of non-dominated solutions obtained by different multi- and many-objective evolutionary algorithms from the literature.

C. Augmented KKT Proximity Measure to Avoid Weak Pareto-optimal Solutions

The ASF procedure calculated from an ideal or a utopian point may result in a weak Pareto-optimal solution [2], as shown in Figure 13. Consider a weak Pareto-optimal point A

Fig. 13. The augmented achievement scalarizing function (AASF) approach is illustrated.

shown in the figure. The iso-ASF contour line passing through this point will have the same ASF value as for the strictly efficient point O. A minimization of the ASF problem for the chosen z and w vectors may therefore find any of the weak Pareto-optimal solutions on the line OA. However, most EMO algorithms are expected to find strict Pareto-optimal solutions (such as point O only), which are not weak. As discussed elsewhere [1], the weak Pareto-optimal solutions can be avoided if an augmented ASF is used instead. We describe the AASF optimization problem as follows:

$$\begin{array}{rl} \mathop{\text{Minimize}}_{x} & \text{AASF}(x, \mathbf{z}, \mathbf{w}) = \max_{i=1}^{M} \left( \dfrac{f_i(x) - z_i}{w_i} \right) + \rho \sum_{j=1}^{M} \left( \dfrac{f_j(x) - z_j}{w_j} \right), \\ \text{subject to} & g_j(x) \le 0, \quad j = 1, 2, \ldots, J. \end{array} \tag{30}$$

Here, the parameter ρ takes a small value (∼ 10⁻⁴). The additional term in the objective function has the effect of making the iso-AASF lines inclined to the objective axes, as shown in the figure. The inclination angle is larger for a larger value of ρ. The iso-AASF line for an arbitrary point F is shown in the figure. A little thought will reveal that with some positive ρ value, the optimal solution to the above AASF problem is the point O, as point A or other weak Pareto-optimal points will make the AASF value larger than that at O, due to their meeting the w-line at a higher point. The difference between ASF and AASF values at point A is clear from the iso-ASF and iso-AASF lines shown in the figure.
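The effect of the augmentation term can be sketched numerically. In the hypothetical setting below, a strict point O and a weak point A share the same ASF value (both lie on the same L-shaped iso-ASF contour), but the AASF of Equation 30 strictly prefers O; the weight vector here is an illustrative direction, not normalized:

```python
def asf(f, z, w):
    # Achievement scalarizing function: max_i (f_i - z_i) / w_i
    return max((fi - zi) / wi for fi, zi, wi in zip(f, z, w))

def aasf(f, z, w, rho=1e-4):
    # Augmented ASF (Equation 30): ASF plus a small sum term that tilts
    # the iso-contours and penalizes weak Pareto-optimal points
    return asf(f, z, w) + rho * sum((fi - zi) / wi for fi, zi, wi in zip(f, z, w))

# Hypothetical bi-objective setting: reference z = (0, 0), weight direction w
z, w = (0.0, 0.0), (0.3, 0.8)
O = (0.3, 0.5)   # strictly efficient point
A = (0.3, 0.7)   # weakly efficient point on the vertical part of the front

assert asf(O, z, w) == asf(A, z, w)    # ASF cannot separate O from A
assert aasf(O, z, w) < aasf(A, z, w)   # AASF strictly prefers the strict point O
print(aasf(O, z, w), aasf(A, z, w))
```

Since ρ is tiny, the augmentation barely perturbs the ASF value of strict optima while still breaking the tie against weak Pareto-optimal points.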

The above-mentioned calculation procedure of the KKT proximity measure can be modified using the above AASF formulation, which is provided in the appendix of this paper. We shall demonstrate the effect of using the AASF approach compared to the ASF approach on some test problems having weak Pareto-optimal points. The next section discusses simulation results using the above KKT proximity measures on several multi- and many-objective problems from the literature.

V. RESULTS

In this section, we consider two and three-objective test problems to demonstrate the working of the proposed KKT proximity measure with an EMO algorithm. Two-objective problems are solved using NSGA-II [12] and problems with three or more objectives are solved using NSGA-III [5]. In all problems, we use the SBX recombination operator [22] with p_c = 0.9 and η_c = 30, and the polynomial mutation operator [2] with p_m = 1/n (where n is the number of variables) and η_m = 20. Other parameters are mentioned in the discussions of individual problems.

A. Two-Objective ZDT Problems

First, we consider the commonly used two-objective ZDT problems. ZDT1 and ZDT2 have 30 variables and the efficient solutions occur for x_i = 0 for i = 2, 3, ..., 30. Before we evaluate the performance of an EMO algorithm on these problems, we first consider the ZDT1 problem, with the 30 variables set using two parameters: x = x_1 and y = x_2 = x_3 = ... = x_30. Thereafter, we calculate the proposed KKT proximity measure for the two parameters x and y in the range [0, 1]. 10,000 equi-spaced points are chosen in the x-y plane to make the plot. Since at x_1 = 0 the derivative of f_2 does not exist, we do not compute the KKT proximity measure for such solutions. Figure 14 clearly indicates the following two aspects:

1) The KKT proximity measure reduces to zero at the efficient frontier.
2) The KKT proximity measure increases almost parallel to the efficient frontier in its local vicinity.
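The two-parameter reduction used for this plot can be sketched as follows, using the standard ZDT1 definition (f1 = x1, g = 1 + 9/(n−1) Σ_{i≥2} x_i, f2 = g(1 − √(f1/g))); with x2 = ... = x30 all set to y, the g-function collapses to 1 + 9y:

```python
import math

def zdt1_reduced(x, y, n=30):
    # ZDT1 with x1 = x and x2 = ... = xn = y: the (n-1) tail variables
    # contribute (n-1)*y to the sum, so g = 1 + 9*y independently of n.
    f1 = x
    g = 1.0 + 9.0 / (n - 1) * (n - 1) * y
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

# On the efficient front (y = 0): g = 1, so f2 = 1 - sqrt(f1)
print(zdt1_reduced(0.25, 0.0))   # -> (0.25, 0.5)
```

Any y > 0 inflates g and hence f2, which is why the measure should grow monotonically as points move away from the y = 0 front.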

Fig. 14. KKT proximity measure is plotted for Problem ZDT1.

We now apply the KKT proximity measure procedure with the ASF formulation to each non-dominated solution obtained by NSGA-II (with a population size of 100) at every generation until 200 generations. Thereafter, the smallest, median, and largest KKT proximity measure values for all non-dominated solutions at each generation are plotted in Figure 15. If the proposed KKT proximity measure is used to terminate the

Fig. 15. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for Problem ZDT1.

NSGA-II run, the run would have terminated at around the 60-th generation, as the difference between the minimum and median KKT proximity measures for all non-dominated points is insignificant thereafter. It is also interesting to note how the KKT proximity measure values reduce to zero with generations. Interestingly, EMO algorithms work without any derivative information and without using any KKT optimality conditions in their operations. Even then, an EMO algorithm (NSGA-II here) is able to approach the theoretical efficient frontier with increasing generations. The first population member to reach the efficient front took 32 generations, while 50% of the population members converge within 38 generations, and all 100 population members converge to the efficient front in just 59 generations. Not only does the KKT proximity measure depict the dynamics of the convergence behavior of NSGA-II on ZDT1, it also indicates the generation span (27 generations) within which a rapid convergence of all population members takes place. The close variation of all five indicators in the figure suggests that NSGA-II converges front-wise in the ZDT1 problem.

Next, we consider the 30-variable ZDT2 problem. Considering x = x_1 and y = x_2 = x_3 = ... = x_30, we compute the KKT proximity measure for 10,000 points in the x-y space and the resulting measure values are plotted on the ZDT2 objective space in Figure 16. ZDT2 has a non-convex efficient front, as

Fig. 16. KKT proximity measure is plotted for Problem ZDT2.

Fig. 17. Generation-wise KKT proximity measure is computed for non-dominated solutions (except x1 = 0 solutions) of NSGA-II populations for Problem ZDT2.

shown in the figure. Notice how the KKT proximity measure monotonically reduces to zero at the efficient frontier. The contour plots indicate contour lines almost parallel to the efficient frontier as the points move away from it. The smoothness of the KKT proximity measure surface is another interesting aspect which can be exploited in various ways.

Now, we demonstrate how the non-dominated sets of points obtained from NSGA-II are evaluated by the KKT proximity measure in Figure 17. A population of size 100 is used and a maximum of 200 generations are run. Here, we include all non-dominated points in the KKT proximity measure computation except points having x_1 = 0. This is because x_1 = 0 makes f_1 = 0, and these points correspond to weak Pareto-optimal points; thus they would be judged as KKT points, with our KKT proximity measure value being zero. Figure 16 demonstrates this aspect clearly. It is clear that without considering the weak Pareto-optimal points having x_1 = 0 (some of which are also strict Pareto-optimal points), the KKT proximity measure reduces to zero with generations. Finally, at around generation 39, there is one solution that lies on the true efficient front. Soon thereafter, all non-dominated solutions fall on the true efficient front.

If, however, all non-dominated solutions including x_1 = 0 are considered for the KKT proximity measure computation, the convergence to the zero KKT proximity measure value is faster, as shown in Figure 18. The first efficient solution is found at generation 1 itself, and this point is a weak Pareto-optimal point.

Fig. 18. Generation-wise KKT proximity measure is computed using ASF for all non-dominated solutions of NSGA-II populations for Problem ZDT2.

Fig. 19. Generation-wise KKT proximity measure is computed using augmented ASF for all non-dominated solutions of NSGA-II populations for Problem ZDT2.

Next, we consider the augmented ASF (or AASF) for the KKT proximity measure computation for all non-dominated solutions of NSGA-II runs on the ZDT2 problem. Since weak Pareto-optimal solutions will now not be optimal solutions of the augmented ASF computation, the weak Pareto-optimal solutions will produce a non-zero proximity measure value. Figure 19 shows the variation of the KKT proximity measure with AASF. It takes 39 generations to find the first strictly efficient solution. Once again, the measure reduces with the generation counter, but the convergence is a bit slower than that in Figure 18. The reduction of the KKT proximity measure to zero with generations is an indication that it can be used as a termination condition or as a measure of proximity of non-dominated sets to the efficient frontier. The figure reveals that NSGA-II requires 43 generations for 50% of the population members to converge to the efficient front and 50 generations for all 100 solutions to converge. Since the AASF formulation is capable of differentiating weak from strict Pareto-optimal points, we use the AASF formulation to compute the KKT proximity measure for the rest of the problems here.

Now, we consider the 30-variable ZDT3 problem. KKT proximity measures for NSGA-II non-dominated solutions (obtained with a population size of 40 and run until 200 generations) using the AASF metric are shown in Figure 20. The first Pareto-optimal solution was found at generation 7 and half of all Pareto-optimal solutions were found at generation 26, but it took 159 generations to find all Pareto-optimal solutions. The disjoint efficient front in this problem makes it difficult for NSGA-II to have 100% of the population members converge to the true efficient front. The KKT proximity measure variation supports the original test-problem construction philosophy for designing the ZDT3 problem, as it was then thought that the disconnected nature of the efficient sets could provide difficulties to a multi-objective algorithm [24].

Fig. 20. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for Problem ZDT3.

Fig. 21. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for Problem ZDT4.

The 10-variable ZDT4 problem is multimodal, has about 100 local efficient fronts, and is more difficult to solve than ZDT1 and ZDT2. Using identical NSGA-II parameter values and with a population size of 100, Figure 21 shows the generation-wise variation of the KKT proximity measure for the non-dominated population members of a typical NSGA-II run. Once again, a monotonic decrease in the KKT proximity measure indicates its suitability as a termination condition for an EMO algorithm. Here again, we do not consider points having x_1 = 0, to avoid points for which the derivative does not exist.

Finally, KKT proximity measure values for NSGA-II population members (population size 40) for the ZDT6 problem are shown in Figure 22. The first, half, and all population members require 21, 27, and 28 generations, respectively, to converge to the efficient front. This plot suggests an almost simultaneous convergence of the 40 points on the true efficient set. Although a maximum of 200 generations was fixed for the NSGA-II run, the figure indicates that 28 generations were enough for all solutions to converge on the efficient front for the above simulation run.


Fig. 22. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for Problem ZDT6.

B. Three-objective DTLZ Problems

The DTLZ1 problem has a number of locally efficient fronts on which some points can get stuck; hence it is a relatively difficult problem to solve to global optimality. The recently proposed NSGA-III procedure [5] is applied to the DTLZ1 problem with 92 population members for a maximum of 1,000 generations. Figure 23 shows the variation of KKT proximity measure values (with the AASF formulation) versus the generation counter. Although a Pareto-optimal solution is found fairly early (68

Fig. 23. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for three-objective DTLZ1.

generations), half of the non-dominated set took more than 480 generations to converge to the true efficient front, and all 92 population members took 731 generations overall for the same purpose. The presence of multiple efficient fronts causes such a delayed overall convergence in this problem.

Next, we apply NSGA-III with a population size of 92 to the three-objective DTLZ2 problem, which has a concave efficient front. NSGA-III was run for a maximum of 400 generations. Figure 24 shows the KKT proximity measure values. Interestingly, the convergence of the entire non-dominated set to the respective Pareto-optimal solutions happens almost at the same time in this problem, as dictated by the small differences in the smallest, median and largest KKT proximity measure values. The first, 50% and all 92 population members took 55, 83, and 157 generations, respectively, to converge to the efficient front for the above particular run.

Fig. 24. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for three-objective DTLZ2.

Fig. 25. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for three-objective DTLZ5.

The DTLZ5 problem has a degenerate efficient front. Although the problem has three objectives, the efficient front is two-dimensional. Figure 25 shows the KKT proximity measure values versus generation number for NSGA-III with a population size of 92. In this problem, a number of 'redundant' solutions remain non-dominated with the Pareto-optimal solutions, and these solutions pose a challenge for an EMO algorithm to converge to the true efficient front. The first population member took 80 generations to converge to the efficient front. Half the population took 110 generations to converge to the efficient front, but it took 163 generations for all 92 population members to converge.

C. Many-Objective Optimization Problems

As the number of objectives increases, EMO algorithms find the DTLZ1 and DTLZ2 problems difficult to optimize, as evident from Figures 26 and 27. For five and 10-objective versions

Fig. 26. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for five-objective DTLZ1.

Fig. 27. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for 10-objective DTLZ1.

of DTLZ1 and DTLZ2, we have used 212 and 276 population members, respectively. While the five-objective DTLZ1 problem took 317 and 959 generations for half and all 212 population members to reach the true efficient front, respectively, the 10-objective problem takes about 215 and 1,875 generations, respectively.

Since the DTLZ2 problem is relatively easier to solve to Pareto-optimality, KKT proximity measure values for the five and 10-objective DTLZ2 problems are found to steadily approach zero, as evident from Figures 28 and 29, respectively. The five-objective DTLZ2 problem requires 102 and 329 generations for half and all 212 population members to converge


Fig. 28. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for five-objective DTLZ2.

Fig. 29. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for 10-objective DTLZ2.

to the efficient front, while the 10-objective DTLZ2 problem requires 215 and 348 generations, respectively, for half and all 276 population members to converge to the efficient front.

The above results amply demonstrate the scalability of the proposed KKT proximity metric to many-objective optimization problems and the behavior of NSGA-III in solving these problems.

D. Constrained Test Problems

We now consider some standard constrained test problems. First, we consider the problem TNK, which has two constraints, two variables and two objectives, as shown below:

$$\begin{array}{rl} \text{Min.} & f_1(x) = x_1, \\ \text{Min.} & f_2(x) = x_2, \\ \text{s.t.} & g_1(x) \equiv x_1^2 + x_2^2 - 1 - 0.1 \cos\!\left( 16 \arctan \frac{x_1}{x_2} \right) \ge 0, \\ & g_2(x) \equiv (x_1 - 0.5)^2 + (x_2 - 0.5)^2 \le 0.5, \\ & 0 \le x_1 \le \pi, \quad 0 \le x_2 \le \pi. \end{array} \tag{31}$$
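A direct transcription of the two TNK constraints is sketched below; the sample points are hypothetical, chosen only for illustration, and note that g1 is a ≥ constraint while g2 is a ≤ constraint (the ratio x1/x2 would need special care at x2 = 0, which these samples avoid):

```python
import math

def tnk_feasible(x1, x2):
    # g1(x) = x1^2 + x2^2 - 1 - 0.1*cos(16*arctan(x1/x2)) >= 0
    g1 = x1 ** 2 + x2 ** 2 - 1.0 - 0.1 * math.cos(16.0 * math.atan(x1 / x2))
    # g2(x) = (x1 - 0.5)^2 + (x2 - 0.5)^2 <= 0.5
    g2 = (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2
    return g1 >= 0.0 and g2 <= 0.5 and 0.0 <= x1 <= math.pi and 0.0 <= x2 <= math.pi

print(tnk_feasible(0.8, 0.6))   # True: on/outside the cosine-perturbed circle
print(tnk_feasible(0.5, 0.5))   # False: g1 = -0.6 there, well inside the circle
```

The cosine term in g1 is what carves the feasible boundary into the disconnected Pareto-optimal segments characteristic of TNK.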

NSGA-II is run with 40 population members. For the specific run, the initial population has only six feasible solutions, but since they were away from the Pareto-optimal region, their KKT proximity measure values were calculated to be large. As shown in Figure 30, thereafter, with iterations, more and more solutions became feasible and all 40 population members became feasible. Importantly, the non-dominated solutions started to approach the efficient front. The corresponding KKT

40

MedianThird Quartile

Largest

60 80 100 5

10

15

20

25

30

35

40

KK

T P

roxi

mity

Mea

sure

# Fe

asib

le S

olut

ions

Generation Number

Smallest

0

0.1

0.2

0.3

0.4

0.5

0 20

First Quartile

Fig. 30. Generation-wise KKTproximity measure is computed fornon-dominated solutions of NSGA-IIpopulations for problem TNK.

Fig. 31. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for problem BNH.

proximity measure values (smallest, median, and largest among the non-dominated solutions) reduce with generations, and eventually at generation 48 all 40 population members reach very close to the efficient front. If a termination criterion based on the KKT proximity measure were used, generation 48 would have been the earliest opportunity for NSGA-II to terminate with all non-dominated solutions close to the efficient front. Without such information, it is difficult to decide on an appropriate termination point. Interestingly, NSGA-II and most other EMO methods do not use any gradient information or any KKT optimality conditions in their operators. Even then, the interactions among the algorithms' operators allow the whole population to come steadily closer to the efficient front with iterations. This remains an interesting observation that could encourage theoretical optimization researchers to pay attention to the working procedures of EMO algorithms for multi-objective optimization.
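As a concrete illustration, the TNK formulation can be encoded directly. The sketch below is an assumed implementation, not the paper's code; atan2 is substituted for arctan(x1/x2) so the constraint stays defined at x2 = 0:

```python
import math

def tnk_objectives(x1, x2):
    # TNK's two objectives are simply the variables themselves (Equation 31).
    return x1, x2

def tnk_feasible(x1, x2):
    # g1(x) >= 0 and g2(x) <= 0.5, as in Equation 31; atan2 handles x2 = 0.
    g1 = x1**2 + x2**2 - 1.0 - 0.1 * math.cos(16.0 * math.atan2(x1, x2))
    g2 = (x1 - 0.5)**2 + (x2 - 0.5)**2
    return g1 >= 0.0 and g2 <= 0.5
```

A feasibility check of this kind is the first step of the generation-wise study above: only feasible members receive the ε*-based KKT proximity value, while infeasible ones are penalized.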

Next, we consider problem BNH, which has two variables and two constraints:

Minimize f1(x) = 4x1² + 4x2²,
Minimize f2(x) = (x1 − 5)² + (x2 − 5)²,
subject to g1(x) ≡ (x1 − 5)² + x2² ≤ 25,
           g2(x) ≡ (x1 − 8)² + (x2 + 3)² ≥ 7.7,
           0 ≤ x1 ≤ 5, 0 ≤ x2 ≤ 3.    (32)

Due to the known difficulty in solving this problem, NSGA-II is run with 200 population members. Figure 31 shows the variation of the KKT proximity measure with generations for problem BNH. At generation 26, one of the population members falls on the efficient front, while half of the population requires 212 generations. The remaining 50% of the population members do not converge to the efficient front within 250 generations.

Next, we consider the SRN problem, stated below:

Minimize f1(x) = 2 + (x1 − 2)² + (x2 − 1)²,
Minimize f2(x) = 9x1 − (x2 − 1)²,
subject to g1(x) ≡ x1² + x2² ≤ 225,
           g2(x) ≡ x1 − 3x2 + 10 ≤ 0,
           −20 ≤ x1 ≤ 20, −20 ≤ x2 ≤ 20.    (33)

The Pareto-optimal points were found to have the following properties: x1 = −2.5 and x2 ∈ [2.50, 14.79]. KKT proximity measure values computed for 160,801 uniformly-spaced grid points in the search space (x = x1 and y = x2) are shown in a contour plot in Figure 32. It is clear that as the solutions get closer to the above Pareto-optimal points, the KKT proximity measure value gets smaller. To illustrate this aspect better, we have plotted the KKT proximity measure for certain fixed y = x2 values in the range [2.50, 14.79] in Figure 33 for different x = x1 values. It is clear from the plot that the closer a point is to x = −2.5, the smaller is the corresponding KKT proximity measure for each y value. For y = 2.5, any x larger than −2.5 is infeasible and the KKT proximity measure value is also large. The penalty term in our KKT proximity measure expression (Equation 27) makes a more infeasible solution (having a larger g value) have a larger KKT proximity measure, as is also evident from the figure.
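The grid study above can be reproduced in outline. The sketch below is an assumed implementation; the 401-points-per-axis resolution is inferred from the reported count of 160,801 = 401² points:

```python
import numpy as np

def srn_objectives(x1, x2):
    # SRN objectives from Equation 33.
    f1 = 2.0 + (x1 - 2.0)**2 + (x2 - 1.0)**2
    f2 = 9.0 * x1 - (x2 - 1.0)**2
    return f1, f2

def srn_feasible(x1, x2):
    # g1(x) <= 225 and g2(x) <= 0 from Equation 33.
    return (x1**2 + x2**2 <= 225.0) and (x1 - 3.0 * x2 + 10.0 <= 0.0)

# 401 uniformly-spaced samples per axis over [-20, 20] give
# 401**2 = 160,801 grid points, matching the reported count.
xs = np.linspace(-20.0, 20.0, 401)
grid = [(x, y) for x in xs for y in xs]
```

Evaluating the KKT proximity measure at each feasible grid point (and the penalized value at infeasible ones) yields the contour data of Figure 32.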

Figure 34 plots the KKT proximity measure for generation-wise NSGA-II non-dominated solutions. A population of size


Fig. 32. Contour plots of the KKT proximity measure on the constrained problem SRN show how it monotonically reduces for points close to the true Pareto-optimal solutions (x = −2.5 and y ∈ [2.5, 14.79]).

Fig. 33. KKT proximity measure values are shown for certain fixed y values, clearly indicating the monotonic reduction close to the Pareto-optimal value (x = −2.5).

40 is used. Although one population member with a KKT proximity value less than 0.01 was first discovered at generation 169, it took 363 generations for 25% of the population members to converge to the efficient front. At the 500-th generation, the 50-th and 75-th percentile KKT proximity measure values are 0.018 and 0.026, respectively. The reason for this wide spread

Fig. 34. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for problem SRN.

Fig. 35. Non-dominated NSGA-II solutions at generation 250 for problem SRN.

in the KKT proximity measure can be understood from Figure 35, which presents the non-dominated solutions in the x1-x2 plane at generation 500. It is clear that due to the lack of convergence of all solutions to the exact Pareto-optimal set (x1 = −2.5 and x2 ∈ [2.5, 14.79] [2], shown by the dashed line) by the NSGA-II procedure, the spread in KKT proximity measure values is also large.

Next, we consider problem OSY, stated below:

Min. f1(x) = −[25(x1 − 2)² + (x2 − 2)² + (x3 − 1)² + (x4 − 4)² + (x5 − 1)²],
Min. f2(x) = x1² + x2² + x3² + x4² + x5² + x6²,
s.t. g1(x) ≡ −x1 − x2 + 2 ≤ 0,
     g2(x) ≡ x1 + x2 − 6 ≤ 0,
     g3(x) ≡ −x1 + x2 − 2 ≤ 0,
     g4(x) ≡ x1 − 3x2 − 2 ≤ 0,
     g5(x) ≡ (x3 − 3)² + x4 − 4 ≤ 0,
     g6(x) ≡ −(x5 − 3)² − x6 + 4 ≤ 0,
     0 ≤ x1, x2, x6 ≤ 10, 1 ≤ x3, x5 ≤ 5, 0 ≤ x4 ≤ 6.    (34)

Pareto-optimal solutions for this problem were identified in another study [2]. NSGA-II is run with a population size of 200.

Figure 36 shows the KKT proximity measure values (smallest, 25-th percentile, median, 75-th percentile, and largest among all non-dominated solutions) as they vary with generations. A steady decrease in these values with the generation counter can be observed from the figure. The first population member comes close to the efficient front at generation 33. Thereafter, the 50-th and 75-th percentile population members took 148 and 173 generations, respectively, to converge to the efficient front. However, even by generation 250, not all 200 population members could come close to the efficient front. To identify the

Fig. 36. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for problem OSY.

Fig. 37. 200 NSGA-II points in the objective space and their respective KKT proximity measure values are shown for problem OSY at generation 250.

reason for this lack of convergence, Figure 37 shows the NSGA-II points in the objective space and their corresponding KKT proximity measure values at generation 250. It is clear from the figure that a few NSGA-II points have a large KKT proximity measure value. Clearly, minimum-f1 solutions have converged quickly, whereas minimum-f2 solutions have still not converged to their true efficient points even after 250 generations. This information about the differential convergence pattern can be used to employ a local search to improve non-converged solutions, a matter we discuss in Section VI.

E. Engineering Design Problems

Next, we consider two multi-objective optimization problems from practice. In both problems, we have used the AASF formulation for the KKT proximity computation.

1) Welded Beam Design Problem: The welded beam design problem has two objectives and a few non-linear constraints [2]. This problem is solved using NSGA-II with 60 population members, initially created at random. Figure 38 shows the variation of the smallest, median, and largest KKT proximity measure values with the generation counter. The figure shows that although the first Pareto-optimal solution was found at generation 26, half of the population came close to the efficient front after 33 generations. Thereafter, in a few more generations, most non-dominated points came close to the efficient front. The 75-th percentile of the population converges at generation 49, but the last population member converges at the 496-th generation. In practical problems, such behavior is expected. The use of a local search method to improve the final few non-converged population members would constitute a faster computational approach.


Fig. 38. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for the welded-beam design problem.

Fig. 39. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-II populations for the car side impact problem.

2) Car Side Impact Problem: Next, we consider a three-objective car side impact (CAR) design problem having seven variables, 10 constraints, and 14 variable bounds. NSGA-III [13] is applied with 92 initial population members. Figure 39 shows the variation of the KKT proximity measure with generations. The first Pareto-optimal solution took 23 generations to appear; thereafter, it took another 21 generations for 50% of the population members to converge to the efficient front. The final solution took a total of 274 generations to converge to the efficient front. The behaviors of NSGA-III on this problem and NSGA-II on the welded beam design problem seem to be similar.

3) WATER Problem: Finally, we consider a five-objective WATER problem having three variables and seven constraints. NSGA-III [13] is applied with 68 initial population members and otherwise identical parameter values as in the previous problems. Figure 40 shows the variation of the KKT proximity measure with generations. An interesting and previously unknown property of this problem becomes clear. Since there are three variables and five objective functions, the efficient front is at most three-dimensional. Most random solutions lie on the Pareto-optimal set in this problem. It is clear from the figure that the 75-th percentile population member is already on the efficient front in the initial population. After 47 generations, the last population member reaches close (within ε* ≤ 0.01) to the efficient front. Since this problem is now found to be simple

Fig. 40. Generation-wise KKT proximity measure is computed for non-dominated solutions of NSGA-III populations for the WATER problem.

to get to the efficient front, we do not recommend this problem to be used as a test problem in many-objective studies.

VI. KKT PROXIMITY MEASURE FOR IMPROVED EMO APPLICATIONS

The KKT proximity measure value for each non-dominated solution in an EMO population can also be used in different ways, either to improve the convergence behavior or to set up a termination condition of an EMO. We discuss these possible future studies in the following.

A. Termination Criterion

The above results amply demonstrate that the proposed KKT proximity measure can be used to terminate a simulation run, as the measure is able to indicate whether multiple trade-off points are truly all close to the efficient front or not. In this section, we demonstrate the possibility of using the KKT proximity measure as a termination criterion by presenting further simulation runs.

For each of the problems discussed above, we perform the following study. To reduce the computational burden, we compute the KKT proximity measure (KKTPM) values for all current non-dominated solutions in the population after every five generations. When the median KKTPM value of the current non-dominated set reaches a threshold value ηT or less, we terminate the run. The procedure is repeated from 25 different initial populations, and the minimum, median, and maximum generations to termination are recorded. They are tabulated in Table I for two different threshold (ηT) values.

TABLE I
MINIMUM, MEDIAN AND MAXIMUM NUMBER OF GENERATIONS (OUT OF 25 RUNS) NEEDED FOR 50% OF NON-DOMINATED SOLUTIONS TO ACHIEVE A KKT PROXIMITY MEASURE VALUE OF ηT = 0.01 AND 0.001 FOR TERMINATING AN EMO ALGORITHM. 'II' MEANS NSGA-II AND 'III' MEANS NSGA-III.

Problem      ηT = 0.01                    ηT = 0.001
             Best    Median  Worst        Best     Median   Worst
ZDT1         10      15      30           20       45       75
ZDT2         10      20      35           20       30       50
ZDT3         10      15      20           15       20       35
ZDT4         45      60      125          50       80       145
ZDT6         15      20      25           15       25       30
TNK          0       5       5            5        10       15
OSY          65      90      145          95(14)   180(14)  245(14)
BNH (II)     350(1)  350(1)  350(1)       –        –        –
BNH (III)    45      70      165          330(11)  455(11)  490(11)
SRN (II)     310(5)  350(5)  370(5)       –        –        –
SRN (III)    110     235     420          295(3)   325(3)   385(3)
DTLZ1-3      300     370     430          400      440      480
DTLZ1-5      310     360     400          430      470      520
DTLZ1-10     220     250     300          520      550      600
DTLZ2-3      20      25      30           60       80       90
DTLZ2-5      60      70      80           190      230      265
DTLZ2-10     80      100     130          200      290      380
DTLZ5-3      15      15      25           30       40       45
WELD         20      35      155          35(20)   95(20)   265(20)
WATER        0       0       0            0        0        0
CAR          15      25      40           35       60       120

All two-variable problems are solved using NSGA-II [12], and all three- and many-objective problems are solved using NSGA-III [5]. The


numbers in brackets indicate the number of runs (out of 25) in which the median KKT proximity measure value reduced below the threshold value ηT, in the event that not all 25 runs satisfied the specified termination condition. The table indicates that the KKT proximity measure can be used as a termination condition, as in most cases the terminating generation number stayed within the same order of magnitude over multiple runs. For the ZDT problems, termination happens within a few tens of generations. For the DTLZ1 problem, NSGA-III satisfies the termination condition within a few hundred generations consistently over multiple runs, but for the DTLZ2 problem, NSGA-III requires only a few tens of generations. The BNH and SRN problems were found to be difficult to solve to KKT-optimality using NSGA-II, but NSGA-III seems to perform better on these two problems due to its external guidance property. The WATER problem is found to be extremely easy, and 50% of the non-dominated solutions were found to lie on the efficient front right from the initial generation. In the above experiments, the KKT proximity measure was computed after every five generations, but a future parametric study is needed to establish a reasonable generation gap for the KKT proximity measure computation, so as to strike a good balance between the extra computation involved and the extent of change in the measure between two such termination checks.
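The termination check used here (median KKTPM of the current non-dominated set, evaluated every five generations against a threshold ηT) can be sketched as follows. Both `step` and `kktpm_of` are hypothetical interfaces standing in for one EMO generation and the per-solution KKTPM computation:

```python
import statistics

def run_with_kktpm_termination(step, kktpm_of, eta_t=0.01,
                               check_every=5, max_gen=500):
    # Advance the EMO one generation at a time; every `check_every`
    # generations, terminate if the median KKTPM of the current
    # non-dominated set has dropped to eta_t or below.
    for gen in range(1, max_gen + 1):
        nd_set = step()
        if gen % check_every == 0:
            median = statistics.median(kktpm_of(s) for s in nd_set)
            if median <= eta_t:
                return gen
    return max_gen
```

Checking only every few generations keeps the extra cost of the KKTPM optimization sub-problems small relative to the EMO run itself.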

B. Develop a More Convergent EMO

The KKT proximity measure values for different non-dominated population members provide a relative understanding of the convergence pattern at various regions of the efficient front. Refer to Figure 37, in which it was observed that after 250 generations, NSGA-II is able to find solutions very close to the true Pareto-optimal solutions near the minimum-f1 region, but many solutions near the minimum-f2 region were not close to their respective Pareto-optimal solutions. The solutions having a larger KKT proximity measure value, indicating relatively slow convergence, can be recognized early during a simulation run. This information can be utilized in an EMO in many different ways to improve its overall convergence behavior. For example, the regions having poor KKT proximity values can be subjected to an additional focused search, such as a local search method, so that faster convergence can be achieved in those regions. Since the variation of the KKT proximity measure over different regions of the trade-off frontier is now available, the rate of convergence of different regions can also be easily calculated, and adequate search emphasis can be provided to regions having a slower rate of convergence. This was not possible before on non-test problems for which no knowledge of the Pareto-optimal solutions is available beforehand. KKTPM allows us to get a comprehensive idea of the relative convergence behavior of different non-dominated solutions without knowing the true location of the Pareto-optimal solutions.
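For instance, the solutions to target with a local search can be picked by ranking the non-dominated set on KKTPM. A minimal sketch follows; the 25% fraction is an illustrative choice, not a recommendation from the paper:

```python
def local_search_candidates(solutions, kktpm, frac=0.25):
    # Return the worst-converged fraction of the non-dominated set,
    # i.e. the solutions with the largest KKT proximity measure values.
    ranked = sorted(solutions, key=kktpm, reverse=True)
    k = max(1, int(len(ranked) * frac))
    return ranked[:k]
```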

C. Other Types of Multi-Objective Optimization Problems

The KKT proximity measure can also be used for other types of multi-objective optimization studies, such as for evaluating the convergence behavior of preference-based EMO methods, for example, reference-point-based EMO [25], 'light beam' based EMO [26], etc.

Although gradients of objective and constraint functions are needed to compute the KKT proximity measure value at any point, the use of numerical gradients (from function value information at two or more neighboring points) can be explored. This may extend the applicability of the proposed method to a wide variety of problems. For this purpose, subdifferentials [27] and related KKT optimality theories [19] can be used with our proposed KKTPM construction methodology and applied to certain non-differentiable problems.
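A central-difference scheme is one way such numerical gradients could be obtained; the sketch below assumes the usual smoothness of the functions involved:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    # Central-difference approximation of the gradient of f at x,
    # using two function evaluations per variable.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad
```

The approximated gradients can then replace the analytical ∇f_i and ∇g_j terms in the KKTPM sub-problem, at the cost of 2n extra function evaluations per point.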

D. Develop an Efficient Island-based Parallel EMO

In an island-based parallel application of EMO, a different EMO with a separate cone-domination concept was suggested to be applied on each processor [28]. A KKT proximity measure computation for each processor would provide knowledge of fast- and slow-converging islands (or processors). To obtain an overall efficient search method, the best solutions from a convergent processor can be migrated to other, less-convergent processors in the hope of speeding up the overall convergence and improving the performance of island-based parallel EMO algorithms.

VII. CONCLUSIONS

This paper has extended the concept of the approximate KKT point definition, proposed in an earlier study for single-objective optimization problems, to multi-objective optimization. For each point, an equivalent achievement scalarizing function is developed and a KKT proximity metric is defined to estimate the proximity of the solution from a corresponding Pareto-optimal solution. The idea is novel and has been demonstrated to provide a reducing proximity measure value as the solutions get closer to the true efficient front on many test problems and a few engineering design problems.

The KKT proximity measure of a set of non-dominated solutions of an EMO population can be utilized to constitute a faster and more convergent method. Some viable methods have been discussed in this paper. One important outcome is that the proposed metric can be used to terminate an EMO run as an indicator of the overall convergence behavior (and proximity) of non-dominated points to the true Pareto-optimal solutions.

The computation of the metric requires gradients of the objective and constraint functions at each point; hence the metric is computable only for differentiable problems at this stage. Further studies are needed to extend the idea to non-differentiable and discrete problems. To reduce the computational complexity of the overall method, it is also suggested to compute the KKT proximity metric value only after every five or 10 generations. Despite these limitations, the connection of EMO-obtained solutions being close to theoretical KKT points in a problem remains a hallmark achievement of this paper. Although gradients are not used in an EMO algorithm for updating one population to the next, the fact that the


overall search process directs the population towards theoretical Pareto-optimal points remains interesting and intriguing. Certainly, such studies close the gap between theoretical and computational optimization studies and bring respect to both approaches by utilizing the tools of one to evaluate the behavior of the other.

ACKNOWLEDGMENTS

The authors acknowledge the efforts of Mr. Haitham Saeda, a PhD student at Michigan State University, in providing us with results of the NSGA-III procedure on certain many-objective test problems.

REFERENCES

[1] K. Miettinen, Nonlinear Multiobjective Optimization. Boston: Kluwer, 1999.

[2] K. Deb, Multi-objective Optimization Using Evolutionary Algorithms. Chichester, UK: Wiley, 2001.

[3] L. While, P. Hingston, L. Barone, and S. Huband, "A faster algorithm for calculating hypervolume," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 29–38, 2006.

[4] J. Bader, K. Deb, and E. Zitzler, "Faster hypervolume-based search using Monte Carlo sampling," in Proceedings of Multiple Criteria Decision Making (MCDM 2008). Heidelberg: Springer, 2010, pp. 313–326, LNEMS-634.

[5] K. Deb and H. Jain, "An evolutionary many-objective optimization algorithm using reference-point based non-dominated sorting approach, Part I: Solving problems with box constraints," IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, 2014.

[6] G. Haeser and M. L. Schuverdt, "Approximate KKT conditions for variational inequality problems," Optimization Online, 2009. [Online]. Available: http://www.optimization-online.org/DB_HTML/2009/10/2415.html

[7] R. Andreani, G. Haeser, and J. M. Martinez, "On sequential optimality conditions for smooth constrained optimization," Optimization, vol. 60, no. 5, pp. 627–641, 2011.

[8] R. Andreani, J. M. Martinez, and B. F. Svaiter, "A new sequential optimality condition for constrained optimization and algorithmic consequences," SIAM Journal on Optimization, vol. 20, pp. 3533–3554, 2010.

[9] J. Dutta, K. Deb, R. Tulshyan, and R. Arora, "Approximate KKT points and a proximity measure for termination," Journal of Global Optimization, vol. 56, no. 4, pp. 1463–1499, 2013.

[10] R. Tulshyan, R. Arora, K. Deb, and J. Dutta, "Investigating EA solutions for approximate KKT conditions for smooth problems," in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2010). ACM Press, 2010, pp. 689–696.

[11] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation Journal, vol. 8, no. 2, pp. 125–148, 2000.

[12] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, "A fast and elitist multi-objective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.

[13] H. Jain and K. Deb, "An evolutionary many-objective optimization algorithm using reference-point based non-dominated sorting approach, Part II: Handling constraints and extending to an adaptive approach," IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 602–622, 2014.

[14] V. Chankong and Y. Y. Haimes, Multiobjective Decision Making Theory and Methodology. New York: North-Holland, 1983.

[15] A. Messac and C. A. Mattson, "Normal constraint method with guarantee of even representation of complete Pareto frontier," AIAA Journal, in press.

[16] P. Shukla and K. Deb, "On finding multiple Pareto-optimal solutions using classical and evolutionary generating methods," European Journal of Operational Research (EJOR), vol. 181, no. 3, pp. 1630–1652, 2007.

[17] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization," in Evolutionary Methods for Design Optimization and Control with Applications to Industrial Problems, K. C. Giannakoglou, D. T. Tsahalis, J. Periaux, K. D. Papailiou, and T. Fogarty, Eds. Athens, Greece: International Center for Numerical Methods in Engineering (CIMNE), 2001, pp. 95–100.

[18] Q. Zhang and H. Li, "MOEA/D: A multiobjective evolutionary algorithm based on decomposition," IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, 2007.

[19] C. R. Bector, S. Chandra, and J. Dutta, Principles of Optimization Theory. New Delhi: Narosa, 2005.

[20] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1996.

[21] K. Deb, R. Tiwari, M. Dixit, and J. Dutta, "Finding trade-off solutions close to KKT points using evolutionary multi-objective optimization," in Proceedings of the Congress on Evolutionary Computation (CEC-2007). Piscataway, NJ: IEEE Press, 2007, pp. 2109–2116.

[22] K. Deb and R. B. Agrawal, "Simulated binary crossover for continuous search space," Complex Systems, vol. 9, no. 2, pp. 115–148, 1995.

[23] A. P. Wierzbicki, "The use of reference objectives in multiobjective optimization," in Multiple Criteria Decision Making Theory and Applications, G. Fandel and T. Gal, Eds. Berlin: Springer-Verlag, 1980, pp. 468–486.

[24] K. Deb, "Multi-objective genetic algorithms: Problem difficulties and construction of test problems," Evolutionary Computation Journal, vol. 7, no. 3, pp. 205–230, 1999.

[25] K. Deb, J. Sundar, N. Uday, and S. Chaudhuri, "Reference point based multi-objective optimization using evolutionary algorithms," International Journal of Computational Intelligence Research (IJCIR), vol. 2, no. 6, pp. 273–286, 2006.

[26] K. Deb and A. Kumar, "Light beam search based multi-objective optimization using evolutionary algorithms," in Proceedings of the Congress on Evolutionary Computation (CEC-07), 2007, pp. 2125–2132.

[27] F. H. Clarke, Optimization and Nonsmooth Analysis. Wiley-Interscience, 1983.

[28] K. Deb, P. Zope, and A. Jain, "Distributed computing of Pareto-optimal solutions using multi-objective evolutionary algorithms," in Proceedings of the Second Evolutionary Multi-Criterion Optimization (EMO-03) Conference (LNCS 2632), 2003, pp. 535–549.

APPENDIX

Here we present the computational procedure of the KKT proximity measure with the augmented achievement scalarizing function (AASF) approach, which is used to avoid a zero KKT proximity metric value for weak Pareto-optimal solutions.

The optimization problem described in Equation 20 is now changed for the augmented ASF approach, as follows:

Min._{(x, x_{n+1})}  F(x, x_{n+1}) = x_{n+1},
s.t.  (f_i(x) − z_i)/w_i^k + ρ Σ_{j=1}^{M} (f_j(x) − z_j)/w_j^k − x_{n+1} ≤ 0,  ∀i,
      g_j(x) ≤ 0,  j = 1, 2, ..., J.    (35)

This makes G_j(y), with y = (x; x_{n+1}), updated as follows:

G_j(y) = (f_j(x) − z_j)/w_j^k + ρ Σ_{i=1}^{M} (f_i(x) − z_i)/w_i^k − x_{n+1} ≤ 0,  j = 1, 2, ..., M.    (36)
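Numerically, the constraint values G_j(y) of Equation 36 can be evaluated in vectorized form. The sketch below is an assumed implementation; the default ρ = 1e-4 is a typical augmentation constant, not a value taken from the paper:

```python
import numpy as np

def aasf_constraint_values(f, z, w, x_aug, rho=1e-4):
    # f: objective values f(x); z: reference point; w: weight vector;
    # x_aug: the auxiliary variable x_{n+1}.  Returns G_j(y) for all j
    # (Equation 36): scaled f plus the augmentation term, minus x_{n+1}.
    f, z, w = (np.asarray(a, dtype=float) for a in (f, z, w))
    scaled = (f - z) / w
    return scaled + rho * scaled.sum() - x_aug
```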

Thus, for a given iterate (or solution) x^k (feasible or infeasible), the optimization problem leading to the computation of the KKT proximity measure with the AASF approach remains as in Equation 23, with the above-mentioned modification in the G_j(y) function. The first constraint now becomes as


follows:

∥ (0, ..., 0, 1)^T
  + Σ_{i=1}^{M} u_i ( (1/w_i^k) ∂f_i(x^k)/∂x_1 + ρ Σ_{j=1}^{M} (1/w_j^k) ∂f_j(x^k)/∂x_1,
                      ...,
                      (1/w_i^k) ∂f_i(x^k)/∂x_n + ρ Σ_{j=1}^{M} (1/w_j^k) ∂f_j(x^k)/∂x_n,
                      −1 )^T
  + Σ_{j=1}^{J} u_{M+j} ( ∂g_j(x^k)/∂x_1, ..., ∂g_j(x^k)/∂x_n, 0 )^T ∥ ≤ √ε_k.    (37)

The above inequality becomes as follows:

∥ Σ_{i=1}^{M} ( (u_i/w_i^k) ∇f_i(x^k) + ρ u_i Σ_{j=1}^{M} ∇f_j(x^k)/w_j^k ) + Σ_{i=1}^{J} u_{M+i} ∇g_i(x^k) ∥²
  + ( 1 − Σ_{i=1}^{M} u_i )² − ε_k ≤ 0.    (38)

The second constraint of the problem described in Equation 23 becomes as follows:

−ε_k − Σ_{i=1}^{M} u_i ( (f_i(x^k) − z_i)/w_i^k + ρ Σ_{j=1}^{M} (f_j(x^k) − z_j)/w_j^k − x_{n+1} ) − Σ_{j=1}^{J} u_{M+j} g_j(x^k) ≤ 0.    (39)

This completes the formulation of the optimization problem for the AASF approach.

As before, the optimization problem given in Equation 23 can be used with the above modifications, and an optimal solution (ε*, x*_{n+1}, u*) can be found. Then, the KKT proximity measure for an iterate x^k can be computed as follows:

Augmented KKT Proximity Measure(x^k) =
  { ε_k*,                          if x^k is feasible,
  { 1 + Σ_{j=1}^{J} ⟨g_j(x^k)⟩²,   otherwise.    (40)
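The case split in Equation 40 translates directly to code, with ⟨g⟩ denoting the bracket operator that returns g when a g(x) ≤ 0 constraint is violated (g > 0) and zero otherwise. A minimal sketch, assuming feasibility has already been established:

```python
def augmented_kkt_proximity(eps_star, g_values, feasible):
    # Equation 40: the optimal eps for feasible iterates; otherwise a
    # penalized value, 1 plus the squared bracketed constraint violations.
    if feasible:
        return eps_star
    return 1.0 + sum(max(g, 0.0) ** 2 for g in g_values)
```

The constant offset of 1 guarantees that every infeasible point scores worse than any feasible one, and larger constraint violations score progressively worse.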