Multiple-Gradient Descent Algorithm

Jean-Antoine Désidéri
INRIA, Sophia Antipolis Méditerranée Center and ONERA - The French Aerospace Lab
http://www-sop.inria.fr/acumes

MULTI-OBJECTIVE OPTIMIZATION :
– Part I : Concurrent Engineering, the quest for the Pareto front
– Part II : Construction of descent algorithms in differentiable optimization

A practical introduction to algorithms in multiobjective optimization, Wednesday, May 27, 2015, Inria CRISAM



Outline

1 Part I : Concurrent Engineering, the quest for the Pareto front
   • Introduction and objective
   • Pareto optimality

2 Part II : Construction of descent algorithms in differentiable optimization
   • Problematics
   • Svaiter’s method and mine
   • Two lemmas
   • Application : MGDA
   • Pareto-stationarity and two theorems
   • QP-formulation
   • Algorithm by Hierarchical Orthogonalization
   • Directional derivatives
   • Case of a linearly-independent family
   • Case of a linearly-dependent family
   • Application to parametric optimization


Multi-objective optimization
Examples in aerodynamic design in Aeronautics

• Why is optimization in the engineering sciences usually multi-objective ?

• Multi-criterion (single flow conditions)
   – e.g. lift and moments (stability/maneuverability)

• Multi-point (several flow conditions), e.g. :
   – drag reduction at several cruise conditions (towards “robust design”), or
   – lift maximization at take-off or landing conditions, drag reduction at cruise

• Multi-discipline (Aerodynamics + others)
   – e.g. aerodynamic performance versus criteria related to structural design, acoustics, thermal loads, etc.
   – Special case : a ’preponderant’ or ’fragile’ discipline

• Parametric versus functional criterion


Objective and notations

Objective
Introduce cost-efficient algorithms able to determine appropriate trade-offs between concurrent minimization problems associated with several criteria.

General notations
Are given :
• Integers n and m, both ≥ 1
• An admissible open domain Ωa ⊂ Rn
• m objective-functions fj(x)  (1 ≤ j ≤ m, x ∈ Ωa)



Pareto optimality

Dominance in efficiency
(see e.g. K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer Academic Publishers)

Design-point x1 dominates design-point x2 in efficiency, written x1 ≺ x2, iff

    fj(x1) ≤ fj(x2)   (∀j = 1, . . . , m)

and at least one inequality is strict. (This is a relationship of partial order.)

The designer’s Holy Grail : the Pareto set
The set of “non-dominated design-points”, or “Pareto-optimal solutions”.

The Pareto front
The image of the Pareto set in the objective-function space. It bounds the domain of attainable performance.
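The dominance relation lends itself to a direct implementation. Below is a minimal illustrative sketch (mine, not from the slides): a dominance test following the definition above, and a brute-force extraction of the non-dominated subset of a finite list of objective vectors.

```python
# Illustrative sketch (not from the slides): the dominance test and a
# brute-force extraction of the non-dominated subset of a finite point list.
from typing import List, Sequence

def dominates(f1: Sequence[float], f2: Sequence[float]) -> bool:
    """True iff objective vector f1 dominates f2: all f1_j <= f2_j, one strict."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def non_dominated(points: List[tuple]) -> List[tuple]:
    """Keep the points dominated by no other point (O(N^2) scan)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(pts)    # (3.0, 3.0) is dominated by (2.0, 2.0)
```

For a finite sample of designs, this quadratic scan already approximates the Pareto set; population-based algorithms such as NSGA-II refine exactly this idea.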


WARNING

Pareto-optimality is a global notion !

    φ(t) = 0                                      if t ≤ 1
    φ(t) = −exp( 1/(1 − t²) ) exp( −(t − 2)² )    if t > 1 .

[Figure : plot of φ, a C∞ function that vanishes for t ≤ 1, is negative for t > 1, attains its minimum at some t = t⋆, and tends to 0 as t → ∞.]

Imagine a first Pareto set, bounded and contained in the ball B(x0, R). Replace each objective-function fj(x) by

    f̃j(x) = fj(x) + A φ( ‖x − x0‖² / R² ).

The modification is infinitely smooth and bounded, and equal to 0 inside the ball. For A > 0 sufficiently large, the points on the circle ‖x − x0‖ = R √t⋆ outperform all points of the original Pareto set =⇒ no purely local criterion permits one to conclude on Pareto-optimality.


Pareto front identification

Favorable situation : continuous and convex Pareto front
Gradient-based optimization can be used efficiently (see Part II).

Non-convex or discontinuous Pareto fronts do exist
Evolutionary strategies or GAs are most commonly used for robustness : NSGA-II (Nondominated Sorting Genetic Algorithm, Srinivas & Deb, 1994)

    f1(x) = ∑i=1..n−1 ( −10 exp( −√(xi² + xi+1²) / 5 ) )
    f2(x) = ∑i=1..n ( |xi|^(4/5) + 5 sin xi³ ),    n = 3

(From : A Fast and Elitist Multiobjective Genetic Algorithm : NSGA-II, K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, IEEE Transactions on Evolutionary Computation, Vol. 6, No. 2, April 2002.)

Nondominated solutions with NSGA-II on KUR (test case by F. Kursawe, 1990). See also :
• NPGA : Niched Pareto Genetic Algorithm, Goldberg et al., 1994
• MOGA : Multiobjective Genetic Algorithm, Fonseca et al., 1998
• SPEA : Strength Pareto Evolutionary Algorithm, Zitzler et al., 1999
• PAES : Pareto Archived Evolution Strategy, Knowles & Corne, 1999
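The KUR objectives can be coded directly from the formulas above; here is a short sketch (assuming the reconstructed exponent 4/5 in f2, as in Kursawe's standard test case — the slide's rendering was garbled).

```python
# Illustrative sketch of the KUR test functions (assumed exponent 4/5 in f2).
import math

def kur_f1(x):
    # f1 = sum over i of -10 exp( -sqrt(x_i^2 + x_{i+1}^2) / 5 )
    return sum(-10.0 * math.exp(-math.sqrt(x[i]**2 + x[i+1]**2) / 5.0)
               for i in range(len(x) - 1))

def kur_f2(x):
    # f2 = sum over i of |x_i|^(4/5) + 5 sin(x_i^3)
    return sum(abs(xi) ** 0.8 + 5.0 * math.sin(xi**3) for xi in x)

x0 = [0.0, 0.0, 0.0]                  # n = 3 as on the slide
f1_0, f2_0 = kur_f1(x0), kur_f2(x0)   # f1 = -20 at the origin, f2 = 0
```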


NSGA-II (Deb et al.)

A GA relying on a fitness function related to the front index.

[Figure : population plotted in the (fA, fB) objective plane, layered by nondominated-front index.]


Example of airfoil shape concurrent optimization

fA : transonic-cruise pressure drag (minimization) ;
fB : subsonic take-off or landing lift (maximization) ;
Euler equations ; Marco et al., INRIA RR 3686 (1999).

[Figure : two scatter plots in the (J1, J2) plane over [0, 0.5] × [0, 0.5] ; left : accumulated populations (“FINE GRID” / “COARSE GRID”) ; right : the corresponding Pareto sets (“FINE-GRID PARETO SET” / “COARSE-GRID PARETO SET”), from independent simulations on a coarse and a fine mesh.]

https://hal.inria.fr/inria-00072983


Airfoil shapes of the Pareto-equilibrium front

Non-dominated designs

[Figure : superposed airfoil shapes of the non-dominated designs, ranging from the subsonic high-lift end to the transonic low-drag end of the front.]


Numerical efficiency

• Principal merits
   – Very rich, unbiased information provided to the designer
   – Very general : applies to non-convex or discontinuous Pareto-equilibrium fronts

• Main disadvantages
   – Incomplete sorting (a decision still has to be made)
   – Very costly


Alternatives to costly Pareto-front identification
1. Agglomerated criterion

Minimize the agglomerated criterion

    f(x) = ∑j wj fj(x)

for some appropriate weighting constants wj.
Homogeneity of physical dimensions requires that [wj] ∼ [fj(x)]⁻¹.

Unphysical, arbitrary, lacks generality, ...

Similar alternative :
• First, solve m independent single-objective minimizations :

    fj⋆ = min fj(x)   (∀j = 1, . . . , m)

• Second, solve the following multi-constrained single-objective minimization problem :

    min T   subject to :   fj ≤ fj⋆ + wj T
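To make the agglomerated-criterion idea concrete, here is a small sketch (an assumed toy example, not from the slides): two convex quadratic objectives f1 = ‖x − a‖², f2 = ‖x − b‖² scalarized by a weighted sum and minimized by plain gradient descent. Each choice of weights selects one trade-off point; here, the point (w1·a + w2·b)/(w1 + w2) on the segment [a, b].

```python
# Assumed toy example (not from the slides): weighted-sum scalarization
# f = w1*|x-a|^2 + w2*|x-b|^2 minimized by plain gradient descent.
def weighted_sum_min(w1, w2, a, b, steps=2000, lr=0.01):
    x = [0.0] * len(a)
    for _ in range(steps):
        # gradient of w1*|x-a|^2 + w2*|x-b|^2
        g = [2*w1*(xi - ai) + 2*w2*(xi - bi) for xi, ai, bi in zip(x, a, b)]
        x = [xi - lr*gi for xi, gi in zip(x, g)]
    return x

# analytic minimizer is (w1*a + w2*b) / (w1 + w2)
x_eq = weighted_sum_min(1.0, 1.0, a=[0.0, 0.0], b=[2.0, 0.0])   # -> ~[1.0, 0.0]
x_sk = weighted_sum_min(3.0, 1.0, a=[0.0, 0.0], b=[2.0, 0.0])   # -> ~[0.5, 0.0]
```

This also illustrates the slide's objections: the result depends entirely on the arbitrary weights, and only one trade-off point is produced per run.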


Alternatives (cont’d)
2. Pointwise determination of the Pareto front

    min fA subject to fB = βj   (for each value βj in a sweep)
    min fB subject to fA = αi   (for each value αi in a sweep)

[Figure : points of the Pareto front in the (fA, fB) plane obtained from these constrained sub-problems.]

Shortcomings :
• Functional constraints
• Logically complex in case of :
   – numerous criteria
   – a non-convex or discontinuous Pareto front


Alternatives (cont’d)
3. MDO, meta-models

Multi-Disciplinary Optimization (MDO)
• For each discipline A, B, ..., consider a hierarchy of models and corresponding criteria based on a METAMODEL (POD, ANN, Kriging, response surface, interpolation, ...) ;
• Devise a multi-level strategy for multi-objective optimization in which complexity is gradually introduced.

Examples
• Bi-Level Integrated System Synthesis, BLISS, Sobieszczanski-Sobieski et al., and variants
• Optimisation multidisciplinaire en mécanique 1 et 2, Hermes Lavoisier, Paris (2009) (Proc. of the French ANR Project “OMD”)
• Combination with parallel computing : see Prof. K. Giannakoglou’s web site for acceleration techniques using parallel computing : http://velos0.ltt.mech.ntua.gr/research/


Alternatives (end)
4. Game strategies

• Symmetrical game : Nash
• Unsymmetrical or hierarchical game : Stackelberg (leader-follower)

e.g. : Multiobjective Design Optimization Using Nash Games, J.-A. Désidéri, R. Duvigneau and A. Habbal, in : Computational Intelligence in Aerospace Sciences, M. Vasile and V. M. Becerra Eds., Progress in Astronautics and Aeronautics, volume 244, T. C. Lieuwen, Ed. in Chief, AIAA Publish., Reston VA (2014).



Setting

Are given :
• Two integers m and n, both ≥ 1
• An admissible open domain Ωa ⊂ Rn
• m differentiable objective-functions fj(x)  (x ∈ Ωa ; j = 1, . . . , m), to be minimized
• An n×n real-symmetric positive-definite matrix An to define the scalar product and norm :

    (x, y) = xᵗ An y ,    ‖x‖ = √(xᵗ An x)    (∀x, y ∈ Rn)

(superscript t : transposition).
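The An-weighted scalar product and norm can be sketched in a few lines (an illustrative snippet of mine, with An stored as a dense list-of-lists SPD matrix; not code from the slides):

```python
# Illustrative sketch of the A_n-weighted scalar product and norm.
import math

def dot_A(x, y, A):
    # (x, y) = x^t A y, with A a real-symmetric positive-definite matrix
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def norm_A(x, A):
    # ||x|| = sqrt(x^t A x)
    return math.sqrt(dot_A(x, x, A))

A = [[2.0, 0.0],
     [0.0, 1.0]]               # an example SPD weight matrix
n11 = norm_A([1.0, 1.0], A)    # sqrt(2*1 + 1*1) = sqrt(3)
```

Choosing An ≠ I simply changes which directions count as "orthogonal"; everything that follows (convex hull, minimum-norm element) is stated relative to this scalar product.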


What are the questions ?

Problem formulation
Given x0 ∈ Ωa and the gradients ∇fj(x0), can we find a nonzero vector d such that

    ∇fj(x0)ᵗ d > 0   (∀j = 1, . . . , m) ?

If yes, −d is said to be a local descent direction common to all objective-functions.

We know the answer is ’yes’ if and only if x0 is not Pareto-optimal. Then infinitely many solutions exist. But :
– Can we detect Pareto-optimality ?
– If x0 is not Pareto-optimal, can we construct an appropriate vector d ?



Theoretical Optimal Descent

    x(v) = x0 + v

Determine the ’optimal v’ as follows :

    v⋆ = argmin_v max_j [ fj(x(v)) − fj(x0) ]

(the bracketed quantity is < 0 for all j, for certain v’s). If the gradients f′j = ∇fj(x0) and the Hessians Hj = ∇²fj(x0) are known, then

    fj(x(v)) − fj(x0) = (f′j, v) + ½ (v, Hj v) + . . .

and, by neglecting ’. . .’ :

    v⋆ ≐ argmin_v max_j [ (f′j, v) + ½ (v, Hj v) ]

(here (· , ·) denotes the usual Euclidean scalar product).


Svaiter et al.’s approach (1)

(1. Steepest descent methods for multicriteria optimization, Fliege & Svaiter, Math. Meth. Oper. Res. (2000) 51:479-494)

1. Characterize Pareto ’critical’ points by the condition :

    range(J′) ∩ (−R++)^m = ∅

where J′ is the m×n Jacobian matrix of the objectives,

    J′ = [ ∂fj/∂xi ]   (row j = 1, . . . , m ; column i = 1, . . . , n)

and R++ is the set of strictly-positive real numbers.

(This condition is equivalent to my “Pareto-stationarity condition” to come.)


Svaiter et al.’s approach (2)

2. Define d = −v⋆ as the solution of the min–max problem :

    v⋆ ≐ argmin_v [ fy(v) + ½ ‖v‖² ]

where :

    fy(v) = max { (J′v)i | i = 1, . . . , m } .

Justification : the quadratic term in the ’optimal v’ acts as a penalty term ; it is replaced by ‖v‖².

Fliege & Svaiter
1. established convergence to Pareto-critical points ;
2. relaxed the condition using norms other than the Euclidean one.

Grana Drummond & Svaiter extended the construction in a more technical publication :
(2. A steepest descent method for vector optimization, Grana Drummond & Svaiter, J. Comput. Appl. Math. (2005) 175:395-414)


My approach : MGDA

1. Define the descent direction ω on the basis of a purely geometrical principle, i.e. let

    v = −ρ ω

and first determine ω such that :

    (f′j, ω) > 0   (∀j)

2. Adjust the step-size by a similar min–max optimization based on exact or approximate Hessians, i.e. optimize ρ through a min–max.
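As a concrete sketch of step 1 for m = 2 (my example, with the Euclidean scalar product assumed): taking ω as the minimum-norm element of the convex hull of the two gradients — the construction used by MGDA, cf. Lemma 1 on the convex hull — admits a closed form, ω = (1 − α) g1 + α g2 with α the projection of the unconstrained minimizer onto [0, 1].

```python
# Illustrative sketch (Euclidean scalar product assumed): the two-objective
# MGDA direction as the minimum-norm element of conv{g1, g2}, closed form.
def mgda2_omega(g1, g2):
    diff = [a - b for a, b in zip(g1, g2)]          # g1 - g2
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return list(g1)                             # identical gradients
    alpha = sum(a * d for a, d in zip(g1, diff)) / denom
    alpha = min(1.0, max(0.0, alpha))               # project onto [0, 1]
    return [(1 - alpha) * a + alpha * b for a, b in zip(g1, g2)]

g1, g2 = [1.0, 0.0], [0.0, 1.0]
omega = mgda2_omega(g1, g2)        # [0.5, 0.5]
# (g1, omega) = (g2, omega) = 0.5 > 0, so v = -rho*omega decreases both fj
```

When one gradient lies "inside" the other's half-space, α clamps to 0 or 1 and ω reduces to the shorter gradient, as expected from the geometry.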



Convex hull

Definition
Given a family of m vectors uj (uj ∈ Rn, j = 1, . . . ,m), the convex hull of the family is the set of convex combinations of these vectors :

U = { u ∈ Rn / u = ∑_{i=1}^m αi ui ; αi ≥ 0 (∀i) ; ∑_{i=1}^m αi = 1 } .

(a closed, bounded and convex set)

Page 27

Existence and uniqueness of minimum-norm element

Lemma 1
The convex hull U admits a unique minimum-norm element

ω = argmin_{u∈U} ‖u‖ .

Proof :
- Existence : U is closed and bounded, and ‖u‖ is a continuous function.
- Uniqueness : suppose that ω1 and ω2 are both minimum-norm elements, and let ρ = ‖ω1‖ = ‖ω2‖. One has :

(ω2 + ω1 , ω2 − ω1) = ‖ω2‖² − ‖ω1‖² = 0 .

Hence, if ω = ½(ω1 + ω2) and ω12 = ω2 − ω1, then ω ⊥ ω12. Consequently :

ρ² = ‖ω2‖² = ‖ω + ½ω12‖² = ‖ω‖² + ¼‖ω12‖² .

But ω ∈ U, so ‖ω‖ ≥ ρ, and therefore ω12 = 0.
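As an illustration (not part of the slides) : for m = 2 the minimum-norm element of Lemma 1 has a simple closed form, and Lemma 2's inequality can be verified numerically. A minimal numpy sketch, with made-up vectors :

```python
import numpy as np

def min_norm_two(u1, u2):
    """Minimum-norm element of the convex hull of two vectors,
    i.e. of the segment {a*u1 + (1-a)*u2, a in [0, 1]}."""
    d = u1 - u2
    denom = d @ d
    if denom == 0.0:                      # u1 == u2: the hull is a single point
        return u1.copy()
    # minimize ||u2 + a*d||^2 over a, then clip to [0, 1]
    a = np.clip(-(u2 @ d) / denom, 0.0, 1.0)
    return a * u1 + (1.0 - a) * u2

# check Lemma 2's property (u, omega) >= ||omega||^2 on random vectors
rng = np.random.default_rng(0)
u1, u2 = rng.standard_normal(3), rng.standard_normal(3)
omega = min_norm_two(u1, u2)
for t in np.linspace(0.0, 1.0, 11):
    u = t * u1 + (1.0 - t) * u2          # an arbitrary element of the hull
    assert u @ omega >= omega @ omega - 1e-12
```

For more than two vectors no such closed form is available, which motivates the QP-formulation presented later.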

Page 28

Fundamental property of the minimum-norm element

Lemma 2
The minimum-norm element ω of the convex hull U is such that :

∀u ∈ U : (u, ω) ≥ ‖ω‖² .

Proof : let u ∈ U be arbitrary, and let δ = u − ω; by convexity of U :

∀ε ∈ [0,1], (1 − ε)ω + εu = ω + εδ ∈ U ,

and by definition of ω, ‖ω + εδ‖ ≥ ‖ω‖, that is :

(ω + εδ, ω + εδ) − (ω, ω) = 2ε(ω, δ) + ε²‖δ‖² ≥ 0 ,

and this requires that the coefficient of ε, namely (ω, δ) = (u, ω) − ‖ω‖², be non-negative.


Page 30

Application 1

Let the family {uj} be the gradients of the objective-functions at some given starting point x0 :

uj = ∇fj(x0) (j = 1, . . . ,m).

Identify the vector ω of Lemmas 1 and 2, and let :

d = A_n ω .

As a result :

(uj , ω) = u_j^t A_n ω = ∇fj(x0)^t d ≥ ‖ω‖² ≥ 0 ,

where ∇fj(x0)^t d is the directional derivative of fj along d.

Theorem 1
If at x = x0 the vector d ≠ 0, then the direction −d is a descent direction common to all objective-functions.
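A hedged numerical illustration of Theorem 1 on two made-up quadratic objectives (the functions, the point x0 and the step ε are illustrative choices, with A_n = I_n so that d = ω) :

```python
import numpy as np

# two toy objectives on R^2 (illustrative, not from the slides)
f1 = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
f2 = lambda x: (x[0] + 1.0) ** 2 + x[1] ** 2
g1 = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]])
g2 = lambda x: np.array([2.0 * (x[0] + 1.0), 2.0 * x[1]])

x0 = np.array([0.0, 1.0])
u1, u2 = g1(x0), g2(x0)

# minimum-norm element of the convex hull of {u1, u2} (closed form for m = 2)
dvec = u1 - u2
a = np.clip(-(u2 @ dvec) / (dvec @ dvec), 0.0, 1.0)
omega = a * u1 + (1.0 - a) * u2          # here d = omega since A_n = I_n

# both directional derivatives are >= ||omega||^2 > 0 ...
assert u1 @ omega >= omega @ omega - 1e-12
assert u2 @ omega >= omega @ omega - 1e-12

# ... so a small step along -omega decreases both objectives at once
eps = 0.1
x1 = x0 - eps * omega
assert f1(x1) < f1(x0) and f2(x1) < f2(x0)
```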

Page 31

Application 2 : MGDA

The Multiple-Gradient Descent Algorithm generalizes the classical steepest-descent method to the multi-objective differentiable minimization problem :

x^(k+1) = x^(k) − ρd ,

where ρ > 0 is the step-size.

Convergence
Under "natural" assumptions, this algorithm converges to a point where ω = d = 0, that is, a "Pareto-stationary point" according to the following definition.
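A minimal sketch of one possible MGDA loop on a toy bi-objective problem (the objectives, the starting point and the fixed step-size ρ are illustrative assumptions; the slides do not prescribe this particular step-size rule) :

```python
import numpy as np

def min_norm(u1, u2):
    # minimum-norm element of the convex hull of {u1, u2} (closed form, m = 2)
    d = u1 - u2
    a = np.clip(-(u2 @ d) / (d @ d), 0.0, 1.0)
    return a * u1 + (1.0 - a) * u2

# gradients of f1 = (x1-1)^2 + x2^2 and f2 = (x1+1)^2 + x2^2
grad_f1 = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]])
grad_f2 = lambda x: np.array([2.0 * (x[0] + 1.0), 2.0 * x[1]])

x, rho = np.array([0.3, 1.0]), 0.1
for _ in range(200):
    omega = min_norm(grad_f1(x), grad_f2(x))   # here d = omega (A_n = I_n)
    if np.linalg.norm(omega) < 1e-10:          # omega = d = 0: Pareto-stationary
        break
    x = x - rho * omega

# the iterates reach a Pareto-stationary point of the pair (f1, f2):
# x[1] tends to 0 while x[0] stays between the two single-objective minima.
```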


Page 33

Pareto-stationarity

Definition of Pareto-stationarity
We say that the point x0 ∈ Ωa is Pareto-stationary iff there exists a convex combination of the gradients evaluated at x = x0 that is equal to 0 :

∃α = {αj} ∈ (R+)^m such that : ∑_{j=1}^m αj ∇fj(x0) = 0 , ∑_{j=1}^m αj = 1 .

(Equivalently : ω = d = 0, or 0 ∈ U(x0).)

Page 34

Converse result

Theorem 2
If x0 is Pareto-optimal, and if the objective-functions are convex in some open ball B about x0, then the point x0 is Pareto-stationary.

Proof : let uj = ∇fj(x0). Without loss of generality, suppose that fj(x0) = 0 (∀j).

Since x0 is Pareto-optimal, a single, arbitrary criterion cannot be improved (= diminished below 0) under the constraint of no degradation of the others. In particular, x0 is a solution to the problem of minimizing fm(x) under the constraint that the other criteria are maintained ≤ 0; that is :

x0 ∈ argmin_x { fm(x) subject to : fj(x) ≤ 0 (∀j ≤ m−1) }   (1)

where here "argmin" stands for the set of points realizing the minimum; this set is not necessarily reduced to the single point x0.

Let U_{m−1} be the convex hull of the m−1 gradients u1, u2, . . . , u_{m−1} and

ω_{m−1} = argmin_{u ∈ U_{m−1}} ‖u‖ .

The existence, uniqueness and the following property of this element have already been established (Lemmas 1 and 2) :

(ui , ω_{m−1}) ≥ ‖ω_{m−1}‖² (∀i ≤ m−1) .

Page 35

Proof of Theorem 2 (end)
Two situations are then possible :

1. Either ω_{m−1} = 0, and the Pareto-stationarity condition is satisfied at x = x0 with αm = 0.

2. Or ω_{m−1} ≠ 0. Then let φj(ε) = fj(x0 − ε ω_{m−1}) (j = 1, . . . ,m−1), so that φj(0) = 0 and φ′j(0) = −(uj , ω_{m−1}) ≤ −‖ω_{m−1}‖² < 0, and for sufficiently-small ε :

φj(ε) = fj(x0 − ε ω_{m−1}) < 0 (∀j ≤ m−1) .

This result confirms that for the constrained minimization problem (1), Slater's constraint-qualification condition¹ is satisfied. Hence, optimality requires the satisfaction of the Karush-Kuhn-Tucker (KKT) condition, that is, the Lagrangian

L = fm(x) + ∑_{i=1}^{m−1} λi fi(x)

must be stationary, and this gives

um + ∑_{j=1}^{m−1} λj uj = 0 ,

in which λi > 0 (∀i ≤ m−1) by saturation of the constraints (fj(x0) = 0) and sign convention. Finally, µ = 1 + ∑_{i=1}^{m−1} λi > 1. Thus, dividing the above equation by µ ≠ 0 yields the desired convex combination.

¹ See : Boyd, S.; Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. ISBN 978-0-521-83378-3, or the appendix of this document.

Page 36

Provisional conclusions

1. Pareto-stationarity is a natural extension, to the multi-objective differentiable-optimization context, of the classical notion of stationarity in single-objective differentiable optimization : in both contexts, optimality requires stationarity.

2. Pareto-stationarity is not a sufficient condition for Pareto-optimality, which is a global notion (recall the previously-mentioned counter-example).

3. MGDA extends the classical steepest-descent method to the multi-objective optimization context.

Thus we are led to determine ω ...


Page 38

Quadratic-Programming formulation

Since

ω = ∑_{j=1}^m αj uj ,

one way to determine ω is to solve for the coefficient-vector α = {αj} the following QP-problem :

min_{α∈Rm} ½ αᵗHα , subject to : αj ≥ 0 (∀j) and ∑_{j=1}^m αj = 1 ,

where H = UᵗU and U = [u1 u2 . . . um] is the n×m matrix whose columns are the uj.

– Possible procedures : quadprog (MATLAB) or qpsolve (Scilab).
– Note : while ω is unique, α may not be.
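The slides point to quadprog (MATLAB) or qpsolve (Scilab); as a dependency-free alternative, this simplex-constrained QP can also be approximated by a Frank-Wolfe iteration — a numpy-only sketch (the solver choice and the iteration count are assumptions, not part of the slides) :

```python
import numpy as np

def min_norm_element(U, iters=2000):
    """Approximate omega = U @ alpha, alpha = argmin (1/2) alpha^T H alpha
    over the simplex, with H = U^T U (Frank-Wolfe sketch)."""
    n, m = U.shape
    H = U.T @ U
    alpha = np.full(m, 1.0 / m)              # start at the simplex barycenter
    for _ in range(iters):
        grad = H @ alpha
        k = int(np.argmin(grad))             # best simplex vertex e_k
        d = -alpha.copy()
        d[k] += 1.0                          # direction e_k - alpha
        dHd = d @ H @ d
        if dHd <= 1e-16:
            break
        gamma = np.clip(-(grad @ d) / dHd, 0.0, 1.0)  # exact line search
        if gamma == 0.0:
            break                            # no descent possible: optimal
        alpha += gamma * d                   # stays a convex combination
    return U @ alpha, alpha

# toy check: hull of (1,0) and (3,0) -> omega = (1,0)
U = np.array([[1.0, 3.0],
              [0.0, 0.0]])
omega, alpha = min_norm_element(U)
```

Each iterate remains a convex combination by construction, so the constraints αj ≥ 0, ∑αj = 1 are never violated.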


Page 40

Guidelines

Apply component-wise scaling to the "raw gradients" uj, whose components may not be physically homogeneous. If uj = {ui,j}, define the scales si = maxj |ui,j| and the matrix S = Diag(si), and for all j replace the vector uj by :

gj = S⁻¹uj .

Apply the Gram-Schmidt orthogonalization process to the family {gj} (j = 1, . . . ,m) to get the orthogonal vectors vj (j = 1, . . . , r). The integer r is computed; it is usually equal to the rank of the family (and sometimes less).

At stage j of the GS process, we are to compute the next orthogonal vector vj. An integer µ, initially set to m, is used to identify the gradients found to be in the span of previously-used ones at some point in the process (then µ := µ−1).

• The vectors g1, g2, . . . , g_{j−1} have been redefined by reordering; the orthogonal vectors v1, v2, . . . , v_{j−1} have been computed on the basis of them; if µ < m, the vectors g_{µ+1}, . . . , gm have been put aside after being found in their span.

• Then one selects a specific element gℓ in {gj, g_{j+1}, . . . , gµ}, and permutations of indices are made accordingly; the choice is aimed at making the conical sector encompassed by the sub-family g1, g2, . . . , gj as large as possible.

• The Euclidean norm of the new vector vj is fixed in such a way that the directional derivatives associated with the gradients g1, g2, . . . , gj and a provisional estimate of ω are equal; and this has certain geometrical consequences.

Details to be found in : Révision de l'algorithme de descente à gradients multiples (MGDA) par orthogonalisation hiérarchique, Inria Research Report No. 8710, April 2015.
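A minimal numpy sketch of the component-wise scaling step only (the magnitudes are made up to mimic physically inhomogeneous components) :

```python
import numpy as np

# columns are the raw gradients u_j; the two components have very
# different physical magnitudes (illustrative values)
U = np.array([[1000.0, -500.0, 250.0],
              [0.002,   0.001, -0.004]])

s = np.max(np.abs(U), axis=1)    # s_i = max_j |u_{i,j}|
G = U / s[:, None]               # g_j = S^{-1} u_j, with S = Diag(s_i)

# every scaled component now lies in [-1, 1]
assert np.max(np.abs(G)) <= 1.0
```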

Page 41

Algorithm 1

Three phases

Phase 1 : Initialization
• Define the index of the first element :

k = argmax_i min_{j≠i} (gi , gj)/(gi , gi) ,

permute g1 ↔ gk, and set the first element of the orthogonal basis :

v1 = g1 .

• Set the upper bound on the rank :

rmax = min(m, n) .

Page 42

Algorithm 2a

Phase 2 : Gram-Schmidt orthogonalization process
Initial settings : r := 1; µ := m.
For j = 2, 3, . . . , rmax (at most), do :

1. Calculate the (j−1)-st column of coefficients :

c_{i,j−1} = (gi , v_{j−1}) / (v_{j−1} , v_{j−1}) (∀i = j, . . . ,µ)

and update the following diagonal elements of matrix C as cumulated sums :

c_{i,i} := c_{i,i} + c_{i,j−1} = ∑_{k<j} c_{i,k} (∀i = j, . . . ,µ)

2. Identify the index of the least cumulated sum, ℓ = argmin_{j≤i≤µ} c_{i,i}, and compare c_{ℓ,ℓ} with 1−TOL :
• If c_{ℓ,ℓ} ≥ 1−TOL : set a := c_{ℓ,ℓ}; go to 3 (END).
• Otherwise (1−c_{ℓ,ℓ} > TOL) : calculate the next orthogonal vector vj (next slide).

3. END : interrupt the orthogonalization process and proceed with the next phase : calculation of a provisional vector ω.

Page 43

Algorithm 2b

Calculation of the orthogonal vector vj

• Permutation of the information relative to indices j and ℓ :
vectors : gj ↔ gℓ ,
lines j and ℓ in matrix C ,
and the corresponding cumulated sums c_{j,j} ↔ c_{ℓ,ℓ} .

• Set Aj = 1 − c_{j,j}, assign it to c_{j,j} := Aj, and calculate :

vj = (gj − ∑_{k<j} c_{j,k} vk) / c_{j,j}

If vj = 0 :
• permute the information relative to the vectors gj and gµ (and to the corresponding lines in matrix C);
• µ := µ−1;
• if j ≤ µ, return to 2; otherwise go to 3 (END).

• r := r + 1, j := j + 1.
• If j ≤ µ, return to 1; otherwise go to 3 (END).

Page 44

Algorithm 3

Calculation of a provisional vector ω :

ω = ∑_{j=1}^r βj vj

βj = (1/‖vj‖²) / (∑_{k=1}^r 1/‖vk‖²) = 1 / (1 + ∑_{k≠j} ‖vj‖²/‖vk‖²)
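A simplified sketch of phases 2-3 (assuming a linearly-independent family taken in its given order, so the pivoting, the TOL test and the index µ of the full algorithm are all omitted); it checks that the directional derivatives (gj, ω) all equal the same σ > 0, as established on the next slides :

```python
import numpy as np

def provisional_omega(G):
    """Columns of G are the scaled gradients g_j, assumed linearly
    independent and taken in their given order (simplified sketch)."""
    n, m = G.shape
    V, C = [], np.zeros((m, m))
    for j in range(m):
        for k in range(j):
            C[j, k] = (G[:, j] @ V[k]) / (V[k] @ V[k])
        C[j, j] = 1.0 - C[j, :j].sum()       # A_j = 1 - sum_{k<j} c_{j,k}
        vj = (G[:, j] - sum(C[j, k] * V[k] for k in range(j))) / C[j, j]
        V.append(vj)
    w = np.array([1.0 / (v @ v) for v in V])
    beta = w / w.sum()                       # beta_j proportional to 1/||v_j||^2
    omega = sum(b * v for b, v in zip(beta, V))
    return omega, V

# fixed toy gradients (illustrative values, n = 4, m = 3)
Gm = np.array([[2.0, 0.5, 0.0],
               [0.0, 1.5, 0.5],
               [0.5, 0.0, 1.0],
               [0.0, 0.5, 0.0]])
omega, V = provisional_omega(Gm)
sigma = 1.0 / sum(1.0 / (v @ v) for v in V)
# every (g_j, omega) equals sigma: a common, strictly-positive derivative
assert sigma > 0 and np.allclose(Gm.T @ omega, sigma)
```

The equality holds because each gj = ∑_{k≤j} c_{j,k} vk with coefficients summing to 1, while (vk, ω) = σ for every k.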

Page 45

Illustration of Hierarchical Algorithm 1

[Figure : gradients issued from the origin O, relabeled U1 = G5, U2 = G1, U3 = G3, U4 = G4, U5 = G2, with the provisional vector ω1 and the final vector ω.]

Page 46

Illustration of Hierarchical Algorithm 2

[Figure : gradients issued from the origin O, relabeled U1 = G2, U2 = G4, U3 = G3, U4 = G1, U5 = G5, with the provisional vector ω1.]


Page 48

Directional derivatives 1

First r derivatives, relative to g1, . . . , gr
If j ≤ r, one has :

(vj , ω) = βj ‖vj‖² = 1 / (∑_{k=1}^r 1/‖vk‖²) = σ > 0

(independently of j); then :

(gj , ω) = (∑_{k≤j} c_{j,k} vk , ω) = (∑_{k≤j} c_{j,k}) σ = σ .

Thus let :

d = A_n ω ,

so that :

g_j^t d = σ (∀j ≤ r) .

For the criteria associated with the new indices from 1 to r, the directional derivatives in the direction of vector d are equal to a strictly-positive constant σ.

Page 49

Directional derivatives 2

Next µ−r derivatives, relative to g_{r+1}, . . . , gµ
If r+1 ≤ j ≤ µ, one has :

gj = ∑_{k=1}^r c_{j,k} vk + wj

where wj ⊥ G := Span{v1, v2, . . . , vr}; hence wj ⊥ ω and :

g_j^t d = (gj , ω) = ∑_{k=1}^r c_{j,k} (vk , ω) = (∑_{k=1}^r c_{j,k}) σ = c_{j,j} σ ≥ aσ

where a is the constant yielded by the algorithm (a = c_{ℓ,ℓ} ≥ 1−TOL).

For the criteria associated with the indices j = r+1, . . . ,µ, the directional derivatives in the direction of vector d are at least equal to aσ ≥ (1−TOL)σ, that is, generally strictly greater than σ.

Page 50

Directional derivatives 3

Tail derivatives, relative to g_{µ+1}, . . . , gm, if any
If µ < m, the provisional direction of search ω need not be a descent direction for the criteria of tail indices (j = µ+1, . . . ,m), and a further treatment, by solving a QP-problem, is necessary.


Page 52

Revisiting the case of a linearly-independent family (1)

The rank $r = m = \mu$, and a descent direction is known:
\[
\omega = \sum_{j=1}^{r} \beta_j v_j
\qquad\text{where}\qquad
\beta_j = \Biggl( 1 + \sum_{k \neq j} \frac{\|v_j\|^2}{\|v_k\|^2} \Biggr)^{-1}
\]
assuming $A_n = I_n$.
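As a numerical sketch (not from the slides; the orthogonal family $v_j$ below is hypothetical), the coefficients $\beta_j$ and $\omega$ can be computed directly. Note that $\beta_j \|v_j\|^2$ is the same for every $j$, so $\omega$ has an equal, positive scalar product with each $v_j$:

```python
import numpy as np

# Hypothetical orthogonal family v_1, ..., v_r (rows), e.g. obtained by
# Gram-Schmidt applied to the gradients g_1, ..., g_r.
V = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])

sq = np.sum(V**2, axis=1)                    # ||v_j||^2, j = 1..r
r = len(sq)
# beta_j = (1 + sum_{k != j} ||v_j||^2 / ||v_k||^2)^{-1}
beta = np.array([1.0 / (1.0 + sum(sq[j] / sq[k] for k in range(r) if k != j))
                 for j in range(r)])
omega = beta @ V                             # omega = sum_j beta_j v_j

# Descent property: omega . v_j = beta_j ||v_j||^2 = (sum_k ||v_k||^{-2})^{-1} > 0
print(V @ omega)                             # all entries equal and positive
```

The equal scalar products illustrate why $\omega$ is a descent direction common to all criteria when the $v_j$ are orthogonal.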

Another direction can be computed as follows:

• Define the arithmetic average of the original gradients ("bisector"):
\[
\omega_r = \frac{1}{r} \sum_{j=1}^{r} g_j
\]
• Identify an alternate metric, through $A_n \neq I_n$, for which the family $g_1, \dots, g_r$ is orthogonal, and calculate the associated descent direction:
\[
d = A_n \omega_r
\]

52 / 69


Revisiting the case of a linearly-independent family (2)

Let $G = \mathrm{Sp}(g_1, \dots, g_r) = \mathrm{Sp}(v_1, \dots, v_r)$ and decompose $\mathbb{R}^n$ into $G \oplus G^\perp$:
\[
x = y + z, \quad x' = y' + z' \qquad (y, y' \in G) \quad (z, z' \in G^\perp).
\]
Then:
\[
y = G\eta, \quad y' = G\eta',
\]
where $G$ is the $n \times r$ matrix whose column-vectors are the $g_j$ ($j = 1, \dots, r$). Then, define the scalar product in $\mathbb{R}^n$ as follows:
\[
(x, x') := \eta^t \eta' + z^t z'.
\]
For this new scalar product, the family $g_j$ ($j = 1, \dots, r$) is orthonormal. Alternately, an orthonormal basis w.r.t. the standard scalar product is given by the column-vectors of the matrix:
\[
\bar{V} = V \Delta^{-\frac{1}{2}}
\]
where $V$ is made of the column-vectors $v_j$, and $\Delta = \mathrm{Diag}(v_j^t v_j)$.

53 / 69


Revisiting the case of a linearly-independent family (3)

Hence, the projection matrix onto the sub-space $G$ is:
\[
\Pi = \sum_{j=1}^{r} \bar{v}_j \bar{v}_j^t = \bar{V} \bar{V}^t = V \Delta^{-1} V^t
\]
and this permits us to identify $y$ from $x$:
\[
y = \Pi x = V \Delta^{-1} V^t x.
\]
Thus
\[
G^t y = G^t G \, \eta,
\]
where the matrix $G^t G$ is $r \times r$ and invertible. This gives:
\[
\eta = (G^t G)^{-1} G^t y = (G^t G)^{-1} G^t V \Delta^{-1} V^t x := W x,
\qquad\text{and}\qquad
z = (I - \Pi) x.
\]
Finally:
\[
(x, x') = x^t A_n x'
\]
where:
\[
A_n = W^t W + (I - \Pi)^2, \qquad W = (G^t G)^{-1} G^t V \Delta^{-1} V^t.
\]
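The construction above can be checked numerically. The sketch below (the random family and its dimensions are assumptions, not from the slides) builds $\Pi$, $W$ and $A_n$ for a linearly-independent family and verifies that the $g_j$ are orthonormal in the metric $A_n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 3
G = rng.standard_normal((n, r))     # columns g_1, ..., g_r (full rank a.s.)

# Unnormalized Gram-Schmidt: columns v_j of V span the same sub-space
V = np.zeros_like(G)
for j in range(r):
    v = G[:, j].copy()
    for k in range(j):
        v -= (G[:, j] @ V[:, k]) / (V[:, k] @ V[:, k]) * V[:, k]
    V[:, j] = v

Delta_inv = np.diag(1.0 / np.sum(V**2, axis=0))   # Delta^{-1}
Pi = V @ Delta_inv @ V.T                          # orthogonal projector onto G
W = np.linalg.inv(G.T @ G) @ G.T @ Pi             # eta = W x
I = np.eye(n)
An = W.T @ W + (I - Pi) @ (I - Pi)

# In the scalar product (x, x') = x^t A_n x', the g_j are orthonormal:
print(np.allclose(G.T @ An @ G, np.eye(r)))       # True
```

Since $\Pi g_j = g_j$ and $W g_j = e_j$, the check $G^t A_n G = I_r$ follows directly from the definitions.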

54 / 69




Case of a linearly-dependent family

In particular, the case where $m \gg n$.

We then propose to proceed as follows:

• Apply the Gram-Schmidt orthogonalization process as before; determine the rank $r$, the newly-ordered sub-family $g_1, \dots, g_r$ and the corresponding orthogonal basis $v_1, \dots, v_r$.

• Reformulate the QP-problem in the basis of the sub-family $g_1, \dots, g_r$, and identify the associated metric (matrix $A_n$).

• Solve the reformulated QP-problem for the vector $\omega$, using a procedure from MATLAB or Scilab, or any other appropriate library. Many components are found equal to 0.

• Compute the descent direction $d = A_n \omega$.

56 / 69




Application to parametric optimization of control devices

Control of the boundary layer over a flat plate by pulsating jets (Régis Duvigneau)

Each jet is defined by 2 parameters (amplitude and phase) $\Rightarrow$ a total of 6 parameters $a_k$.

• Finite-volume simulation of the 2D time-dependent compressible Navier-Stokes equations $\Rightarrow$ flow: $W(x, y, t)$, and outputs, e.g. drag:
\[
D(t) = \int_\Gamma \mu \frac{\partial u}{\partial y} \, dx
\]
• Simultaneous simulation of the time-dependent linearized Navier-Stokes equations, the so-called "sensitivity equations" $\Rightarrow$ flow sensitivity: $W'(x, y, t) := \nabla_a W$, and output sensitivity: $g(t) := \nabla_a D(t) \in \mathbb{R}^6$

58 / 69


[Figure] Drag over a time-period (initial setting)

59 / 69


[Figure] Drag sensitivities over a time-period (initial setting)

60 / 69


[Figure] Homogenized drag over a time-period (initial setting)

61 / 69


[Figure] Homogenized drag sensitivities over a time-period (initial setting)

62 / 69


[Figure] One step of uniform optimization of drag

63 / 69


[Figure] Homogenized drag over the last 40% of the time-period (initial setting)

64 / 69


[Figure] Homogenized drag sensitivities over the last 40% of the time-period (initial setting)

65 / 69


[Figure] One optimization step focused on the final 40% of the time-period

66 / 69


Conclusion

Our tool permits us to conduct robust optimization, that is, not through statistical criteria such as a weighted average, but by simultaneous control of each time slot.

In particular, this optimization can be conducted

• uniformly over the entire time-interval,

• or selectively over a sub-interval of specific interest.

67 / 69


Summary (1)

Multi-Objective Optimization

• Pareto optimality: a global notion
  – Requires semi-stochastic or hybrid search
  – Many robust methods: GAs, EAs and hybrids

• Differentiable optimization: Pareto-stationarity
  – Identified the role of the convex hull of the gradients
  – Proved that Pareto-optimal points in an open domain are Pareto-stationary

• Svaiter’s formulation:
  – Defines the optimal step, but simplifies the Hessian quadratic term by the squared Euclidean norm; examines substitution of norms

68 / 69


Summary (2)

Multi-Objective Optimization (cont’d)

• My method: MGDA, the Multiple-Gradient Descent Algorithm
  – Relies on a simple and general property of convex geometry to formulate a variational principle yielding, as the solution of a QP-problem, a descent direction common to all criteria in all situations except at Pareto-stationary points
  – An algorithm operating on the family of gradients has been proposed and tested

• Linearly-independent family: explicit solution(s)

• Linearly-dependent family: identified a basis to simplify the formulation of the QP-problem and easily calculate an appropriate descent direction

• Parametric optimization of control devices in a time-dependent Navier-Stokes flow (Régis Duvigneau)
  – 800 gradients calculated by solving the so-called "sensitivity equation", homogenized to 20 for simplicity: obtained a uniform reduction of the criterion over the entire time-period, or over a time-segment of focused interest
  – To be generalized to other forms of robust parametric design

69 / 69