MULTI-OBJECTIVE OPTIMIZATION
– Part I : Concurrent Engineering, the quest for the Pareto front
– Part II : Construction of descent algorithms in differentiable optimization

Jean-Antoine Désidéri
INRIA, Sophia Antipolis Méditerranée Center, and ONERA - The French Aerospace Lab
http://www-sop.inria.fr/acumes

A practical introduction to algorithms in multiobjective optimization, Wednesday, May 27, 2015, Inria CRISAM
Outline

1 Part I : Concurrent Engineering, the quest for the Pareto front
  Introduction and objective
  Pareto optimality

2 Part II : Construction of descent algorithms in differentiable optimization
  Problematics
  Svaiter's method and mine
  Two lemmas
  Application : MGDA
  Pareto-stationarity and two theorems
  QP-formulation
  Algorithm by Hierarchical Orthogonalization
  Directional derivatives
  Case of a linearly-independent family
  Case of a linearly-dependent family
  Application to parametric optimization
Multi-objective optimization
Examples in aerodynamic design in Aeronautics

• Why is optimization in the engineering sciences usually multi-objective?
• Multi-criterion (single flow conditions)
  – e.g. lift and moments (stability/maneuverability)
• Multi-point (several flow conditions), e.g.:
  – drag reduction at several cruise conditions (towards "robust design"), or
  – lift maximization at take-off or landing conditions, drag reduction at cruise
• Multi-discipline (Aerodynamics + others)
  – e.g. aerodynamic performance versus criteria related to structural design, acoustics, thermal loads, etc.
  – Special case: a 'preponderant' or 'fragile' discipline
• Parametric versus functional criterion
Objective and notations

Objective
Introduce cost-efficient algorithms able to determine appropriate trade-offs between concurrent minimization problems associated with several criteria.

General notations
Are given:
• Integers n and m, both ≥ 1
• An admissible open domain Ωa ⊂ Rn
• m objective-functions fj(x)  (1 ≤ j ≤ m, x ∈ Ωa)
Pareto optimality

Dominance in efficiency
(see e.g. K. Miettinen: Nonlinear Multiobjective Optimization, Kluwer Academic Publishers)

Design-point x1 dominates design-point x2 in efficiency, x1 ≺ x2, iff

fj(x1) ≤ fj(x2)   (∀j = 1, . . . , m)

and at least one inequality is strict. (This is a partial-order relation.)

The designer's Holy Grail : the Pareto set
The set of "non-dominated design-points", or "Pareto-optimal solutions".

The Pareto front
The image of the Pareto set in the objective-function space. It bounds the domain of attainable performance.
WARNING

Pareto-optimality is a global notion!

φ(t) = 0                                     if t ≤ 1
φ(t) = −exp( 1/(1 − t²) ) exp( −(t − 2)² )   if t > 1

[Figure: plot of the C∞ function φ: identically 0 for t ≤ 1, dipping below 0 for t > 1 with its minimum at t = t★.]

Imagine a first Pareto set, bounded and contained in the ball B(x0, R). Replace each objective-function fj(x) by f̃j(x) = fj(x) + A φ( ‖x − x0‖²/R² ).
The modification is infinitely smooth and bounded, and equal to 0 inside the ball.
For A > 0 sufficiently large, the points on the circle ‖x − x0‖ = R√t★ outperform all points of the original Pareto set ⟹
no purely local criterion permits one to conclude on Pareto-optimality.
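The construction can be written down directly. A minimal Python sketch (mine, not from the slides; the amplitude A, center x0 and radius R are whatever the counterexample requires):

    import numpy as np

    def phi(t):
        # C-infinity bump: 0 for t <= 1, strictly negative for t > 1,
        # with its minimum near t = 2 (the t* of the slide)
        t = np.atleast_1d(np.asarray(t, dtype=float))
        out = np.zeros_like(t)
        m = t > 1.0
        out[m] = -np.exp(1.0 / (1.0 - t[m]**2)) * np.exp(-(t[m] - 2.0)**2)
        return out

    def f_tilde(f, x, x0, R, A):
        # modified objective: coincides with f inside the ball B(x0, R)
        return f(x) + A * phi(np.sum((x - x0)**2) / R**2)[0]

Any optimality test that only inspects points with ‖x − x0‖ ≤ R sees the unmodified functions, yet for A large enough the modified problem is dominated from outside the ball.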
Pareto front identification

Favorable situation: a continuous and convex Pareto front.
Gradient-based optimization can then be used efficiently (see Part II).

Non-convex or discontinuous Pareto fronts do exist.
Evolutionary strategies or GAs are most commonly used for robustness: NSGA-II (Nondominated Sorting Genetic Algorithm, Srinivas & Deb, 1994), e.g. on the KUR test case (F. Kursawe, 1990):

f1(x) = Σ_{i=1}^{n−1} ( −10 exp( −√(x_i² + x_{i+1}²) / 5 ) )
f2(x) = Σ_{i=1}^{n} ( |x_i|^{4/5} + 5 sin x_i³ ) ,    n = 3

(From: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II, K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, IEEE Transactions on Evolutionary Computation, Vol. 6, No. 2, April 2002.)

[Figure: nondominated solutions obtained with NSGA-II on KUR.]

See also:
• NPGA : Niched Pareto Genetic Algorithm, Goldberg et al., 1994
• MOGA : Multiobjective Genetic Algorithm, Fonseca et al., 1998
• SPEA : Strength Pareto Evolutionary Algorithm, Zitzler et al., 1999
• PAES : Pareto Archived Evolution Strategy, Knowles & Corne, 1999
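For concreteness, a small numpy sketch (my illustration, not part of the talk) that evaluates the KUR objectives above on a random sample and keeps the nondominated points:

    import numpy as np

    def kur(x):
        # KUR objectives (Kursawe, 1990), x of shape (n,), here n = 3
        f1 = np.sum(-10.0 * np.exp(-np.sqrt(x[:-1]**2 + x[1:]**2) / 5.0))
        f2 = np.sum(np.abs(x)**0.8 + 5.0 * np.sin(x**3))
        return np.array([f1, f2])

    rng = np.random.default_rng(0)
    F = np.array([kur(x) for x in rng.uniform(-5.0, 5.0, (1000, 3))])

    # brute-force nondominance filter (an EMO code such as NSGA-II does
    # this with fast nondominated sorting instead)
    keep = [i for i in range(len(F))
            if not any(np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
                       for j in range(len(F)))]
    front = F[keep]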
NSGA-II (Deb et al.)

A GA relying on a fitness function related to the front index.

[Figure: population in the (fA, fB) plane, ranked by nondominance front index.]
Example of airfoil shape concurrent optimization

fA : transonic-cruise pressure drag (minimization);
fB : subsonic take-off or landing lift (maximization);
Euler equations; Marco et al., INRIA RR 3686 (1999).

[Figure: accumulated populations and Pareto sets in the (J1, J2) plane, from independent simulations on a coarse and a fine mesh.]

https://hal.inria.fr/inria-00072983
Airfoil shapes of the Pareto-equilibrium front

Non-dominated designs

[Figure: family of non-dominated airfoil shapes, ranging from subsonic high-lift to transonic low-drag designs.]
Numerical efficiency

• Principal merits
  • Very rich, unbiased information provided to the designer
  • Very general: applies to non-convex or discontinuous Pareto-equilibrium fronts
• Main disadvantages
  • Incomplete sorting (a decision still has to be made)
  • Very costly
Alternatives to costly Pareto-front identification

1. Agglomerated criterion

Minimize the agglomerated criterion

f(x) = Σ_j wj fj(x)

for some appropriate weighting constants wj.
Homogeneity of physical dimensions requires that [wj] ∼ [fj(x)]^−1.
Unphysical, arbitrary, lacks generality, ...

Similar alternative:
• First, solve m independent single-objective minimizations:

f★j = min fj(x)   (∀j = 1, . . . , m)

• Second, solve the following multi-constrained single-objective minimization problem:

min T   subject to:  fj ≤ f★j + wj T
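Both alternatives fit in a few lines of Python; the sketch below uses two toy quadratic objectives and scipy (objectives and weights are illustrative, not from the slides):

    import numpy as np
    from scipy.optimize import minimize

    f = [lambda x: (x[0] - 1.0)**2 + x[1]**2,
         lambda x: x[0]**2 + (x[1] - 1.0)**2]
    w = [0.5, 0.5]                                   # illustrative weights

    # 1. agglomerated criterion
    agg = minimize(lambda x: sum(wj * fj(x) for wj, fj in zip(w, f)),
                   np.zeros(2))

    # 2. min T  s.t.  fj(x) <= fj* + wj*T   (unknowns z = (x, T))
    fstar = [minimize(fj, np.zeros(2)).fun for fj in f]
    cons = [{'type': 'ineq',
             'fun': lambda z, j=j: fstar[j] + w[j] * z[2] - f[j](z[:2])}
            for j in range(2)]
    res = minimize(lambda z: z[2], np.zeros(3), constraints=cons,
                   method='SLSQP')
    x_trade, T = res.x[:2], res.x[2]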
Alternatives (cont'd)
2. Pointwise determination of the Pareto front

Sample the front pointwise by solving constrained problems of the type

min fA  s.t.  fB = βj ,   or   min fB  s.t.  fA = αi ,

for a sequence of levels βj, αi.

[Figure: pointwise sampling of the Pareto front in the (fA, fB) plane.]

Shortcomings:
• Functional constraints
• Logically complex in case of:
  • numerous criteria
  • a non-convex or discontinuous Pareto front
Alternatives (cont'd)
3. MDO, meta-models

Multi-Disciplinary Optimization (MDO)
• For each discipline A, B, ..., consider a hierarchy of models and corresponding criteria based on a METAMODEL (POD, ANN, Kriging, response surface, interpolation, ...);
• Devise a multi-level strategy for multi-objective optimization in which complexity is gradually introduced.

Examples
• Bi-Level Integrated System Synthesis, BLISS, Sobieszczanski-Sobieski et al., and variants
• Optimisation multidisciplinaire en mécanique 1 et 2, Hermes Lavoisier, Paris (2009) (Proc. of the French ANR Project "OMD")
• Combination with parallel computing: see Prof. K. Giannakoglou's web site for acceleration techniques using parallel computing: http://velos0.ltt.mech.ntua.gr/research/
Alternatives (end)
4. Game strategies

• Symmetrical game: Nash
• Unsymmetrical or hierarchical game: Stackelberg (leader-follower)

e.g.: Multiobjective Design Optimization Using Nash Games, J.-A. Désidéri, R. Duvigneau and A. Habbal, in: Computational Intelligence in Aerospace Sciences, M. Vasile and V. M. Becerra Eds., Progress in Astronautics and Aeronautics, Volume 244, T. C. Lieuwen, Ed. in Chief, AIAA Publish., Reston VA (2014).
Setting

Are given:
• Two integers m and n, both ≥ 1
• An admissible open domain Ωa ⊂ Rn
• m differentiable objective-functions fj(x) (x ∈ Ωa; j = 1, . . . , m), to be minimized
• An n×n real-symmetric positive-definite matrix An defining the scalar product and norm:

(x, y) = x^t An y ,   ‖x‖ = √(x^t An x)   (∀x, y ∈ Rn)

(superscript t : transposition).
What are the questions?

Problem formulation
Given x0 ∈ Ωa and the gradients ∇fj(x0), can we find a nonzero vector d such that

∀j = 1, . . . , m :   ∇fj(x0)^t d > 0 ?

If yes, −d is said to be a local descent direction common to all objective-functions.

We know the answer is 'yes' if and only if x0 is not Pareto-optimal. Then infinitely many solutions exist. But:
– Can we detect Pareto-optimality?
– If x0 is not Pareto-optimal, can we construct an appropriate vector d?
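In matrix form, the question is whether d ↦ (∇fj(x0)^t d)_j can be made entrywise positive. A two-line numpy check, with hypothetical gradient values:

    import numpy as np

    grads = np.array([[1.0, 0.2],      # row = ∇f1(x0)^t  (hypothetical)
                      [0.1, 1.0]])     # row = ∇f2(x0)^t

    def common_ascent(d, grads):
        # True iff every directional derivative ∇fj(x0)^t d is > 0,
        # i.e. -d is a descent direction common to all objectives
        return bool(np.all(grads @ d > 0.0))

    print(common_ascent(np.array([1.0, 1.0]), grads))   # True for this data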
Theoretical Optimal Descent

x(v) = x0 + v

Determine the 'optimal v' as follows:

v★ = argmin_v max_j [ fj(x(v)) − fj(x0) ]        (< 0 (∀j) for certain v's)

If the gradients f′j = ∇fj(x0) and Hessians Hj = ∇²fj(x0) are known,

fj(x(v)) − fj(x0) = (f′j, v) + ½ (v, Hj v) + . . .

and, by neglecting '. . .':

v★ ≐ argmin_v max_j [ (f′j, v) + ½ (v, Hj v) ]

(here (·,·) denotes the usual Euclidean scalar product).
Svaiter et al.'s approach (1)
(Steepest descent methods for multicriteria optimization, Fliege & Svaiter, Math. Meth. Oper. Res. (2000) 51:479-494)

1. Characterize Pareto 'critical' points by the condition:

range(J′) ∩ (−R++)^n = ∅

where J′ is the Jacobian matrix

J′ = [ ∂f1/∂x1  ∂f1/∂x2  . . .  ∂f1/∂xN ]
     [ ∂f2/∂x1  ∂f2/∂x2  . . .  ∂f2/∂xN ]
     [   . . .    . . .   . . .   . . .  ]
     [ ∂fn/∂x1  ∂fn/∂x2  . . .  ∂fn/∂xN ]

and R++ is the set of strictly-positive real numbers.

(This condition is equivalent to my "Pareto-stationarity condition" to come.)
Svaiter et al.'s approach (2)

2. Define d = −v★ as the solution of the min-max problem:

v★ ≐ argmin_v  fy(v) + ½ ‖v‖²

where fy(v) = max { (J′v)_i | i = 1, . . . , n }.

Justification: the quadratic term in the 'optimal v' problem acts as a penalty term; here it is replaced by ‖v‖².

Fliege & Svaiter
1. established convergence to Pareto-critical points;
2. relaxed the condition using norms other than the Euclidean one.

Graña Drummond & Svaiter extended the construction in a more technical publication: A steepest descent method for vector optimization, Graña Drummond & Svaiter, J. Comput. Appl. Math. (2005) 175:395-414.
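This subproblem is convex and can be sketched in epigraph form with scipy's SLSQP (my translation of the formulation; the Jacobian values are illustrative):

    import numpy as np
    from scipy.optimize import minimize

    J = np.array([[1.0, 0.0],
                  [0.5, 1.0]])          # illustrative Jacobian J' at x0
    N = J.shape[1]

    # min over (v, t):  t + 0.5*||v||^2   s.t.  (J'v)_i <= t  for all i,
    # which is exactly  min_v  max_i (J'v)_i + 0.5*||v||^2
    obj = lambda z: z[N] + 0.5 * np.dot(z[:N], z[:N])
    cons = [{'type': 'ineq', 'fun': lambda z, i=i: z[N] - J[i] @ z[:N]}
            for i in range(J.shape[0])]
    res = minimize(obj, np.zeros(N + 1), constraints=cons, method='SLSQP')
    v_star = res.x[:N]                  # the slide then takes d = -v_star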
My approach : MGDA

1. Define the descent direction ω on the basis of a purely geometrical principle, i.e. let

v = −ρω

and determine first ω such that:

(f′j, ω) > 0   (∀j)

2. Adjust the step-size by a similar min-max optimization based on exact or approximate Hessians, i.e. optimize ρ through a min-max.
Convex hull

Definition
Given a family of m vectors {uj} (uj ∈ Rn, j = 1, . . . , m), the convex hull of the family is the set of convex combinations of these vectors:

U = { u ∈ Rn :  u = Σ_{i=1}^m αi ui ;  αi ≥ 0 (∀i) ;  Σ_{i=1}^m αi = 1 }

(a closed, bounded and convex set).
Existence and uniqueness of the minimum-norm element

Lemma 1
The convex hull U admits a unique minimum-norm element

ω = argmin_{u∈U} ‖u‖ .

Proof:
– Existence: U is closed and ‖u‖ is a continuous function.
– Uniqueness: suppose that ω1 and ω2 both achieve the minimum, and let ρ = ‖ω1‖ = ‖ω2‖. One has:

(ω2 + ω1, ω2 − ω1) = ‖ω2‖² − ‖ω1‖² = 0 .

Hence, if ω = ½(ω1 + ω2) and ω12 = ω2 − ω1 : ω ⊥ ω12. Consequently:

ρ² = ‖ω2‖² = ‖ω + ½ω12‖² = ‖ω‖² + ¼‖ω12‖² .

But ω ∈ U and ‖ω‖ ≥ ρ. Hence ω12 = 0.
Fundamental property of the minimum-norm element

Lemma 2
The minimum-norm element ω of the convex hull U is such that:

∀u ∈ U :   (u, ω) ≥ ‖ω‖² .

Proof: let u ∈ U be arbitrary and let δ = u − ω; by convexity of U:

∀ε ∈ [0, 1] :   (1 − ε)ω + εu = ω + εδ ∈ U ,

and by definition of ω, ‖ω + εδ‖ ≥ ‖ω‖, that is:

(ω + εδ, ω + εδ) − (ω, ω) = 2ε(ω, δ) + ε²‖δ‖² ≥ 0 ,

and this requires that the coefficient of ε be non-negative, i.e. (u, ω) ≥ ‖ω‖².
Application (1)

Let the family {uj} be the gradients of the objective-functions at some given starting point x0:

uj = ∇fj(x0)   (j = 1, . . . , m).

Identify the vector ω of Lemmas 1 and 2, and let:

d = An ω .

As a result:

(uj, ω) = uj^t An ω = ∇fj(x0)^t d ≥ ‖ω‖² ≥ 0 ,

where ∇fj(x0)^t d is the directional derivative.

Theorem 1
If at x = x0 the vector d ≠ 0, the direction −d is a descent direction common to all objective-functions.
Application (2) : MGDA

Multiple-Gradient Descent Algorithm
Generalizes the classical steepest-descent method to the multi-objective differentiable minimization problem:

x^(k+1) = x^(k) − ρ d

where ρ > 0 is the step-size.

Convergence
Under "natural" assumptions, this algorithm converges to a point where ω = d = 0, that is, a "Pareto-stationary point" according to the following definition.
Pareto-stationarity

Definition of Pareto-stationarity
We say that the point x0 ∈ Ωa is Pareto-stationary iff there exists a convex combination of the gradients, evaluated at x = x0, that is equal to 0:

∃α = {αj} ∈ (R+)^m such that:   Σ_{j=1}^m αj ∇fj(x0) = 0 ,   Σ_{j=1}^m αj = 1 .

(Equivalently: ω = d = 0, or 0 ∈ U(x0).)
Converse result

Theorem 2
If x0 is Pareto-optimal, and if the objective-functions are convex in some open ball B about x0, then the point x0 is Pareto-stationary.

Proof:
Let uj = ∇fj(x0). Without loss of generality, suppose that fj(x0) = 0 (∀j).
Since x0 is Pareto-optimal, a single, arbitrary criterion cannot be improved (i.e. diminished below 0) under the constraint of no degradation of the others. In particular, x0 is a solution to the problem of minimizing fm(x) under the constraint that the other criteria are maintained ≤ 0; that is:

x0 ∈ argmin_x { fm(x)  subject to:  fj(x) ≤ 0 (∀j ≤ m−1) }        (1)

where "argmin" stands for the set of points realizing the minimum; this set is not necessarily reduced to the single point x0.
Let Um−1 be the convex hull of the m−1 gradients {u1, u2, . . . , um−1} and

ωm−1 = argmin_{u ∈ Um−1} ‖u‖ .

The existence, uniqueness and the following property of this element have already been established (Lemmas 1 and 2):

(ui, ωm−1) ≥ ‖ωm−1‖²   (∀i ≤ m−1) .
Proof of Theorem 2 (end)
Two situations are then possible:

1. Either ωm−1 = 0, and the Pareto-stationarity condition is satisfied at x = x0 with αm = 0.

2. Or ωm−1 ≠ 0. Then let φj(ε) = fj(x0 − ε ωm−1) (j = 1, . . . , m−1), so that φj(0) = 0 and φ′j(0) = −(uj, ωm−1) ≤ −‖ωm−1‖² < 0, and, for sufficiently small ε:

φj(ε) = fj(x0 − ε ωm−1) < 0   (∀j ≤ m−1) .

This result confirms that, for the constrained minimization problem (1), Slater's constraint-qualification condition (*) is satisfied. Hence optimality requires the satisfaction of the Karush-Kuhn-Tucker (KKT) condition, that is, the Lagrangian

L = fm(x) + Σ_{i=1}^{m−1} λi fi(x)

must be stationary, and this gives

um + Σ_{j=1}^{m−1} λj uj = 0

in which λi > 0 (∀i ≤ m−1) by saturation of the constraints (fj(x0) = 0) and the sign convention. Finally, μ = 1 + Σ_{i=1}^{m−1} λi > 1. Thus, dividing the above equation by μ ≠ 0 achieves the result.

(*) See: Boyd, S.; Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. ISBN 978-0-521-83378-3, or the appendix of this document.
Provisional conclusions

1. Pareto-stationarity is the natural extension, to multi-objective differentiable optimization, of the classical notion of stationarity in single-objective differentiable optimization, in the sense that in both contexts optimality requires stationarity.

2. Pareto-stationarity is not a sufficient condition for Pareto-optimality, which is a global notion (recall the previously-mentioned counterexample).

3. MGDA extends the classical steepest-descent method to the multi-objective optimization context.

Thus we are led to determine ω ...
Quadratic-Programming formulation

Since

ω = Σ_{j=1}^m αj uj ,

one way to determine ω is to solve, for the coefficient-vector α = {αj}, the following QP-problem:

min_{α ∈ Rm}  ½ α^t H α ,   subject to:  αj ≥ 0 (∀j)  and  Σ_{j=1}^m αj = 1 ,

where H = U^t U and U is the n×m matrix whose column-vectors are u1, u2, . . . , um.

– Possible procedures: quadprog (MATLAB) or qpsolve (Scilab).
– Note: if ω is unique, α may not be.
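In the absence of MATLAB's quadprog or Scilab's qpsolve, the same QP can be sketched with scipy's generic SLSQP solver (my sketch; the gradient matrix U is illustrative):

    import numpy as np
    from scipy.optimize import minimize

    U = np.array([[1.0, 0.6, -0.2],
                  [0.0, 0.8,  1.0]])     # columns u1, u2, u3 (illustrative)
    m = U.shape[1]
    H = U.T @ U

    res = minimize(lambda a: 0.5 * a @ H @ a, np.full(m, 1.0 / m),
                   jac=lambda a: H @ a,
                   bounds=[(0.0, None)] * m,
                   constraints={'type': 'eq', 'fun': lambda a: a.sum() - 1.0},
                   method='SLSQP')
    alpha = res.x
    omega = U @ alpha                    # minimum-norm element of U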
Guidelines

Apply component-wise scaling
to the "raw gradients" uj, whose components may not be physically homogeneous. If uj = {ui,j}, define the scales si = maxj |ui,j| and the matrix S = Diag(si), and for all j replace the vector uj by:

gj = S^−1 uj .

Apply the Gram-Schmidt orthogonalization process
to the family {gj} (j = 1, . . . , m) and get the orthogonal vectors {vj} (j = 1, . . . , r). The integer r is computed, and is usually equal to the rank of the family (sometimes less than it).

At stage j of the GS process, we are to compute the next orthogonal vector vj:
An integer μ, initially set to m, is used to identify the gradients found to lie in the span of previously-used ones at some point in the process (then μ := μ − 1).
• the vectors g1, g2, . . . , gj−1 have been redefined by reordering; the orthogonal vectors v1, v2, . . . , vj−1 have been computed on the basis of them; if μ < m, the vectors gμ+1, . . . , gm have been put aside after being found in their span;
• then one selects a specific element gℓ in {gj, gj+1, . . . , gμ}, and permutations of indices are made accordingly; the choice is aimed at making the conical sector encompassed by the sub-family {g1, g2, . . . , gj} as large as possible;
• the Euclidean norm of the new vector vj is fixed in such a way that the directional derivatives associated with the gradients g1, g2, . . . , gj and a provisional estimate of ω are equal; this has certain geometrical consequences.

Details to be found in: Révision de l'algorithme de descente à gradients multiples (MGDA) par orthogonalisation hiérarchique, Inria Research Report No. 8710, April 2015.
Algorithm (1)

Three phases

Phase 1 : Initialization
• Define the index of the first element

k = argmax_i min_{j≠i} (gi, gj)/(gi, gi) ,

permute g1 ↔ gk, and set the first element of the orthogonal basis:

v1 = g1 .

• Set the upper bound on the rank:

rmax = min(m, n) .
Algorithm (2a)

Phase 2 : Gram-Schmidt orthogonalization process
Initial settings: r := 1; μ := m.
For j = 2, 3, . . . , rmax (at most), do:

1. Calculate the (j−1)st column of coefficients:

ci,j−1 = (gi, vj−1)/(vj−1, vj−1)   (∀i = j, . . . , μ)

and update the following diagonal elements of the matrix C as cumulated sums:

ci,i := ci,i + ci,j−1 = Σ_{k<j} ci,k   (∀i = j, . . . , μ) .

2. Identify the index of the least cumulated sum, ℓ = argmin_i {ci,i : j ≤ i ≤ μ}, and compare cℓ,ℓ with 1 − TOL:
  • If cℓ,ℓ ≥ 1 − TOL: set a := cℓ,ℓ; go to 3 (END).
  • Otherwise (1 − cℓ,ℓ > TOL): calculate the next orthogonal vector vj (next slide).

3. END: interrupt the orthogonalization process and proceed with the next phase: calculation of a provisional vector ω.
Algorithm (2b)

Calculation of the orthogonal vector vj
• Permute the information relative to indices j and ℓ:

vectors gj ↔ gℓ ;  lines j and ℓ in the matrix C ;  the corresponding cumulated sums cj,j ↔ cℓ,ℓ .

• Set Aj = 1 − cj,j, assign cj,j := Aj, and calculate:

vj = ( gj − Σ_{k<j} cj,k vk ) / cj,j .

If vj = 0:
  • permute the information relative to the vectors gj and gμ (and to the corresponding lines in the matrix C);
  • μ := μ − 1;
  • if j ≤ μ, return to 2; otherwise go to 3 (END).
Otherwise:
  • r := r + 1, j := j + 1;
  • if j ≤ μ, return to 1; otherwise go to 3 (END).
Algorithm (3)

Calculation of a provisional vector ω

ω = Σ_{j=1}^r βj vj ,   βj = (1/‖vj‖²) / ( Σ_{k=1}^r 1/‖vk‖² ) = 1 / ( 1 + Σ_{k≠j} ‖vj‖²/‖vk‖² ) .
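To fix ideas, here is a stripped-down Python sketch of Phases 2-3 (mine): it keeps the specific normalization vj = (gj − Σ_{k<j} cj,k vk)/Aj and the β-weights above, but omits the pivoting/reordering, the TOL test and the vj = 0 branch, so it assumes a linearly-independent family with all Aj ≠ 0:

    import numpy as np

    def mgda_omega(G):
        # G: n x m matrix whose columns are the (scaled) gradients g1..gm,
        # assumed linearly independent with nonzero A_j (no pivoting here)
        g = [G[:, j] for j in range(G.shape[1])]
        v = [g[0]]                                   # v1 = g1
        for j in range(1, len(g)):
            c = [np.dot(g[j], vk) / np.dot(vk, vk) for vk in v]
            A = 1.0 - sum(c)                         # A_j = 1 - sum_k c_{j,k}
            v.append((g[j] - sum(ck * vk for ck, vk in zip(c, v))) / A)
        inv = [1.0 / np.dot(vj, vj) for vj in v]
        beta = [s / sum(inv) for s in inv]
        return sum(b * vj for b, vj in zip(beta, v))

    G = np.array([[1.0, 0.3],
                  [0.2, 1.0]])                       # illustrative gradients
    omega = mgda_omega(G)
    print(G.T @ omega)    # equal entries: each (gj, omega) = sigma > 0

The printout illustrates the property established on the next slides: all directional derivatives (gj, ω) coincide with a strictly positive constant σ.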
Illustration of the Hierarchical Algorithm (1)

[Figure: five gradients reordered as U1 = G5, U2 = G1, U3 = G3, U4 = G4, U5 = G2, with the provisional direction ω1 and the final direction ω.]
Illustration of the Hierarchical Algorithm (2)

[Figure: another configuration, reordered as U1 = G2, U2 = G4, U3 = G3, U4 = G1, U5 = G5, with the provisional direction ω1.]
Directional derivatives (1)

First r derivatives, relative to g1, . . . , gr
If j ≤ r, one has:

(vj, ω) = βj ‖vj‖² = 1 / ( Σ_{k=1}^r 1/‖vk‖² ) = σ > 0

(independently of j); then:

(gj, ω) = ( Σ_{k≤j} cj,k vk , ω ) = ( Σ_{k≤j} cj,k ) σ = σ .

Thus let

d = An ω ,

so that:

gj^t d = σ   (∀j ≤ r) .

For the criteria associated with the new indices from 1 to r, the directional derivatives in the direction of the vector d are equal to a strictly-positive constant σ.
Directional derivatives (2)

Next μ − r derivatives, relative to gr+1, . . . , gμ
If r + 1 ≤ j ≤ μ, one has:

gj = Σ_{k=1}^r cj,k vk + wj

where wj ⊥ G := Span{v1, v2, . . . , vr}; hence wj ⊥ ω, and:

gj^t d = (gj, ω) = Σ_{k=1}^r cj,k (vk, ω) = ( Σ_{k=1}^r cj,k ) σ = cj,j σ ≥ a σ

where a ≥ 1 − TOL is the constant yielded by the algorithm.

For the criteria associated with the indices j = r+1, . . . , μ, the directional derivatives in the direction of the vector d are at least equal to aσ ≥ (1 − TOL)σ, that is, generally strictly superior to σ.
Directional derivatives (3)

Tail derivatives, relative to gμ+1, . . . , gm, if any
If μ < m, the provisional direction of search ω is not a descent direction for the criteria of tail indices (j = μ+1, . . . , m), and a further treatment, by solving a QP-problem, is necessary.
Revisiting the case of a linearly-independent family (1)

The rank r = m = μ, and a descent direction is known:

ω = Σ_{j=1}^r βj vj ,   where  βj = ( 1 + Σ_{k≠j} ‖vj‖²/‖vk‖² )^{−1} ,

assuming An = In.

Another direction can be computed as follows:
• Define the arithmetic average of the original gradients (the "bisector"):

ωr = (1/r) Σ_{j=1}^r gj

• Identify an alternate metric, through An ≠ In, for which the family {g1, . . . , gr} is orthogonal, and calculate the associated descent direction:

d = An ωr
Revisiting the case of a linearly-independent family (2)

Let G = Span(g1, . . . , gr) = Span(v1, . . . , vr) and decompose Rn into G ⊕ G⊥:
Let

x = y + z ,   x′ = y′ + z′   (y, y′ ∈ G) (z, z′ ∈ G⊥) .

Then:

y = Gη ,   y′ = Gη′ ,

where G is the n×r matrix whose column-vectors are the gj (j = 1, . . . , r).
Then define the scalar product in Rn as follows:

(x, x′) := η^t η′ + z^t z′ .

For this new scalar product, the family {gj} (j = 1, . . . , r) is orthonormal.
Alternately, an orthonormal basis w.r.t. the standard scalar product is given by the column-vectors of the matrix

V̄ = V Δ^{−1/2}

where V is made of the column-vectors {vj}, and Δ = Diag(vj^t vj).
Revisiting the case of a linearly-independent family (3)

Hence the projection matrix onto the subspace G is

Π = Σ_{j=1}^r v̄j v̄j^t = V̄ V̄^t = V Δ^{−1} V^t ,

and this permits us to identify y from x:

y = Π x = V Δ^{−1} V^t x .

Thus

G^t y = G^t G η ,

where the matrix G^t G is r×r and invertible. This gives:

η = (G^t G)^{−1} G^t y = (G^t G)^{−1} G^t V Δ^{−1} V^t x := W x ,   and   z = (I − Π) x .

Finally:

(x, x′) = x^t An x′

where:

An = W^t W + (I − Π)² ,   W = (G^t G)^{−1} G^t V Δ^{−1} V^t .
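A numpy sketch of this construction (my check, with random independent gradients): build Π, W and An, and verify that the gj are orthonormal for the new scalar product. Taking V from a QR factorization makes Δ = I, which is one admissible choice of orthogonal basis:

    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 5, 3
    G = rng.standard_normal((n, r))        # columns g1..gr, independent a.s.

    V, _ = np.linalg.qr(G)                 # orthonormal basis: Delta = I
    Pi = V @ V.T                           # projection onto Span(g1..gr)
    W = np.linalg.solve(G.T @ G, G.T) @ Pi # eta = W x
    I = np.eye(n)
    A_n = W.T @ W + (I - Pi) @ (I - Pi)    # the metric of the slide

    print(np.round(G.T @ A_n @ G, 10))     # identity: (gi, gj) = delta_ij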
Case of a linearly-dependent family

In particular, the case where $m \gg n$ (as in the application below, with 800 gradients in $\mathbb{R}^6$).

We then propose to proceed as follows:
• Apply the Gram-Schmidt orthogonalization process as before; determine the rank $r$, the newly-ordered sub-family $g_1, \dots, g_r$ and the corresponding orthogonal basis $v_1, \dots, v_r$.
• Reformulate the QP-problem in the basis of the sub-family $g_1, \dots, g_r$, and identify the associated metrics (matrix $A_n$).
• Solve the reformulated QP-problem for the vector $\omega$, using a procedure from MATLAB or Scilab, or any other appropriate library (see the sketch below). Many components are found equal to 0.
• Compute the descent direction $d = A_n \omega$.
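The QP step itself admits a compact illustration. The sketch below is an assumption-laden stand-in for a library QP routine: it runs a few Frank-Wolfe iterations to approximate the minimum-norm element of the convex hull of the columns of U (the gradients, expressed in whatever basis/metric the reformulation prescribes); the function name and iteration count are hypothetical.

def min_norm_in_hull(U, iters=500):
    """Frank-Wolfe sketch: approximate the simplex weights omega minimizing
    ||U omega||^2, i.e. the minimum-norm element of the convex hull of the
    columns of U (a stand-in for a proper QP solver)."""
    m = U.shape[1]
    omega = np.full(m, 1.0 / m)              # start at the simplex barycenter
    M = U.T @ U                              # Gram matrix of the columns
    for k in range(iters):
        grad = 2.0 * (M @ omega)             # gradient of omega^t M omega
        j = int(np.argmin(grad))             # best vertex of the simplex
        step = 2.0 / (k + 2.0)               # standard Frank-Wolfe step size
        omega = (1.0 - step) * omega
        omega[j] += step
    return omega

Consistently with the remark above, most of the returned weights come out (near) zero; an exact QP solver would set the inactive ones exactly to zero.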
Application to the parametric optimization of control devices

Control of the boundary layer over a flat plate by pulsating jets (Régis Duvigneau)

Each jet is defined by 2 parameters (amplitude and phase) $\Longrightarrow$ a total of 6 parameters $a_k$.

• Finite-volume simulation of the 2D time-dependent compressible Navier-Stokes equations
$$\Longrightarrow \quad \text{flow: } W(x, y, t), \quad \text{and outputs, e.g. drag: } D(t) = \int_\Gamma \mu \, \frac{\partial u}{\partial y} \, dx$$
• Simultaneous simulation of the time-dependent linearized Navier-Stokes equations, the so-called "sensitivity equations" (a schematic optimization step follows)
$$\Longrightarrow \quad \text{flow sensitivity: } W'(x, y, t) := \nabla_a W, \quad \text{and output sensitivity: } g(t) := \nabla_a D(t) \in \mathbb{R}^6$$
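To fix ideas, here is a hypothetical sketch of how one optimization step could consume these time-sampled sensitivities; the step length rho, the sampling, and the reuse of min_norm_in_hull from the previous sketch are assumptions for illustration, and the homogenization of the gradients (see the summary below) is not shown.

def mgda_step(a, sampled_gradients, rho=0.1):
    """Sketch of one MGDA step: a holds the 6 jet parameters a_k, and
    sampled_gradients is a list of drag gradients g(t_i) in R^6."""
    U = np.column_stack(sampled_gradients)   # 6 x m matrix of gradients
    weights = min_norm_in_hull(U)            # QP weights on the simplex
    w = U @ weights                          # minimum-norm element; -w is a
    return a - rho * w                       # descent direction common to all t_i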
Drag over a time-period (initial setting) [figure]
Drag sensitivities over a time-period (initial setting) [figure]
Homogenized drag over a time-period (initial setting) [figure]
Homogenized drag sensitivities over a time-period (initial setting) [figure]
One step of uniform optimization of drag [figure]
Homogenized drag over the last 40% of the time-period (initial setting) [figure]
Homogenized drag sensitivities over the last 40% of the time-period (initial setting) [figure]
One optimization step focused on the final 40% of the time-period [figure]
Conclusion

Our tool permits us to conduct robust optimization, that is, not through statistical criteria such as a weighted average, but by simultaneous control of each time slot.

In particular, this optimization can be conducted
• uniformly over the entire time-interval,
• or selectively over a sub-interval of specific interest.
Summary (1)

Multi-Objective Optimization

• Pareto optimality: a global notion. It requires semi-stochastic or hybrid search; many robust methods exist (GAs, EAs and hybrids).
• Differentiable optimization: Pareto-stationarity. Identified the role of the convex hull of the gradients; proved that Pareto-optimal points in an open domain are Pareto-stationary.
• Svaiter's formulation: defines an optimal step, but simplifies the Hessian quadratic term by the Euclidean norm squared; examines the substitution of norms.
Summary (2)

Multi-Objective Optimization (cont'd)

• My method: MGDA, the Multiple-Gradient Descent Algorithm
– Relies on a simple and general property of convex geometry to formulate a variational principle yielding, as the solution of a QP-problem, a descent direction common to all criteria in all situations except at Pareto-stationary points
– An algorithm operating on the family of gradients has been proposed and tested
• Linearly-independent family: explicit solution(s)
• Linearly-dependent family: identified a basis to simplify the formulation of the QP-problem and easily calculate an appropriate descent direction
• Parametric optimization of control devices in a time-dependent Navier-Stokes flow (Régis Duvigneau)
– 800 gradients calculated by solving the so-called "sensitivity equations", and homogenized to 20 for simplicity: obtained a uniform reduction of the criterion over the entire time-period, or over a time-segment of focused interest
– To be generalized to other forms of robust parametric design