-
1
MADS - Mesh Adaptive Direct Search for constrained optimization
Mark Abramson, Charles Audet, Gilles Couture, John Dennis
www.gerad.ca/Charles.Audet/
Thanks to: ExxonMobil, AFOSR, Boeing, LANL,
FQRNT, NSERC, SANDIA, NSF.
Rice 2004
-
2
Outline
! Statement of the optimization problem
! The GPS and MADS algorithm classes
! GPS theory and limiting examples
! The MADS algorithm class
• An implementable MADS instance
• MADS theory
• Numerical results
! Discussion
-
4
The target problem
(NLP)  minimize f(x)
       subject to x ∈ Ω,
where f : Rn → R ∪ {∞} may be discontinuous and:
! the functions are expensive black boxes,
! the functions provide few correct digits and may fail even for x ∈ Ω,
! accurate approximation of derivatives is problematic
5
minimize f(x)
subject to x ∈ X,
where f : Rn → R ∪ {∞} may be discontinuous and:
! the functions are expensive black boxes, often produced by simulations or output of MDO codes,
! the functions provide few correct digits and may fail even for x ∈ X,
! accurate approximation of derivatives is problematic
! surrogate models s ≈ f and P ≈ X may be available
-
6
Goals – or validation of the method
NOMAD Algo: (NLP) → x̂, where:
if f is continuously differentiable, then ∇f(x̂) = 0;
if f is convex, then 0 ∈ ∂f(x̂);
if f is Lipschitz near x̂, then 0 ∈ ∂f(x̂).
-
7
Clarke Calculus – for f Lipschitz near x
! Clarke generalized derivative at x in the direction v:
f◦(x; v) = lim sup_{y→x, t↓0} [f(y + tv) − f(y)] / t.
! The generalized gradient of f at x is the set
∂f(x) := {ζ ∈ Rn : f◦(x; v) ≥ vTζ for all v ∈ Rn}
       = co{lim ∇f(yi) : yi → x and ∇f(yi) exists}.
! f◦(x; v) can be obtained from ∂f(x):
f◦(x; v) = max{vTζ : ζ ∈ ∂f(x)}.
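A one-dimensional illustration (added here; not on the original slide): for f(x) = |x| on R, ∇f(y) = sign(y) for y ≠ 0, so

    \partial f(0) = \mathrm{co}\{-1, +1\} = [-1, 1],
    \qquad
    f^\circ(0; v) = \max\{ v\zeta : \zeta \in [-1, 1] \} = |v|.

In particular 0 ∈ ∂f(0), certifying the minimizer even though ∇f(0) does not exist.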
-
9
The two iterated phases of GPS and MADS
! The global search in the variable space is flexible enough to allow user heuristics that incorporate knowledge of the driving simulation model and facilitate the use of surrogate functions.
! The local poll around the incumbent solution is more rigidly defined, but it ensures convergence to a point satisfying necessary first-order optimality conditions.
! This talk focuses on the basic algorithm and the convergence analysis. In the next talks, Alison, Mark and Gilles will talk about surrogates in the search.
-
10
[Figure: a search step on a two-variable problem (axes x1, x2, objective f). ∗ is the incumbent solution; trial points • are placed with the help of a surrogate.]
-
11
[Figure: ∗ is still the incumbent solution; we poll near the incumbent solution.]
-
12
[Figure: new iteration from the same incumbent solution, but on a finer mesh.]
-
13
Positive spanning sets and meshes
! A positive spanning set D is a set of vectors whose nonnegative linear combinations span Rn.
! The mesh is centered around x_k ∈ Rn and its fineness is parameterized by ∆^m_k > 0 as follows:
M_k = {x_k + ∆^m_k Dz : z ∈ N^|D|}.
Ex: D = [I; −I]
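A minimal sketch (added; the names and the range of z are assumptions) of enumerating mesh points for this example in R^2:

    import numpy as np

    n = 2
    D = np.hstack([np.eye(n), -np.eye(n)])   # columns form D = [I; -I]
    x_k = np.array([1.0, 2.0])               # mesh center
    delta_m = 0.5                            # mesh size parameter

    # Mesh points x_k + delta_m * D z with z in {0,1}^|D| (a small corner
    # of the full mesh, which takes z over all of N^|D|).
    points = {tuple(np.round(x_k + delta_m * (D @ np.array(z)), 12))
              for z in np.ndindex(*([2] * D.shape[1]))}
    print(sorted(points))    # 9 distinct mesh points around x_k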
-
14
Basic pattern search algorithm for unconstrained optimization
Given ∆^m_0, x_0 ∈ M_0 with f(x_0) < ∞, and D, for k = 0, 1, ..., do:
1. Employ some finite strategy to try to choose x_{k+1} ∈ M_k such that f(x_{k+1}) < f(x_k), and then set ∆^m_{k+1} = ∆^m_k or ∆^m_{k+1} = 2∆^m_k (x_{k+1} is called an improved mesh point);
2. Else, if x_k minimizes f(x) for x ∈ P_k, then set x_{k+1} = x_k and ∆^m_{k+1} = ∆^m_k / 2 (x_k is called a minimal frame center).
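A compact sketch of this loop (added; empty search, opportunistic coordinate poll, assumed parameter names — not the authors' NOMAD implementation):

    import numpy as np

    def pattern_search(f, x0, delta0=1.0, max_iter=200, tol=1e-8):
        n = len(x0)
        D = np.vstack([np.eye(n), -np.eye(n)])   # rows are poll directions
        x = np.asarray(x0, dtype=float)
        fx, delta = f(x), delta0
        for _ in range(max_iter):
            if delta < tol:
                break
            improved = False
            for d in D:                          # poll the frame P_k
                trial = x + delta * d
                ft = f(trial)
                if ft < fx:                      # improved mesh point
                    x, fx, improved = trial, ft, True
                    break                        # opportunistic: stop polling
            if not improved:                     # minimal frame center
                delta /= 2.0                     # refine the mesh
        return x, fx

    # On a smooth quadratic the iterates approach the minimizer (1, -2):
    x_hat, f_hat = pattern_search(lambda x: (x[0] - 1)**2 + (x[1] + 2)**2,
                                  [0.0, 0.0])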
-
15
The Coordinate Search (CS) frame P_k
P_k = {x_k + ∆^m_k d : d ∈ [I; −I]}; the 2n points adjacent to x_k in M_k.
[Figure: CS frames with poll points p1, ..., p4 around x_k, x_{k+1}, x_{k+2} for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4.]
Always the same 2n = 4 directions, regardless of ∆_k.
-
16
The GPS frame P_k
P_k = {x_k + ∆^m_k d : d ∈ D_k ⊂ D}; points adjacent to x_k in M_k (with respect to a positive spanning set D_k).
[Figure: GPS frames with poll points p1, p2, p3 around x_k, x_{k+1}, x_{k+2} for ∆^m_k = 1, ∆^m_{k+1} = 1/2, ∆^m_{k+2} = 1/4; the directions are drawn from the finite set D.]
Here, only 14 different ways of selecting D_k, regardless of ∆_k.
-
18
Convergence results – unconstrained GPS
If all iterates are in a compact set, then lim inf_k ∆^m_k = 0.
! The mesh is refined only at a minimal frame center.
! There is a limit point x̂ of a subsequence {x_k}_{k∈K} of minimal frame centers with {∆^m_k}_{k∈K} → 0. {x_k}_{k∈K} is called a refining subsequence.
! f(x_k) ≤ f(x_k + ∆^m_k d) for all d ∈ D_k ⊂ D with k ∈ K. Let D̂ ⊆ D be the set of poll directions used infinitely often in the refining subsequence; D̂ is the set of refining directions.
-
19
Set of refining directions D̂
[Figure: frames around minimal frame centers x_{k1}, x_{k2}, ..., x_{k5}, with poll directions d1, d2, d3, shrinking toward x̂ = lim_{k∈K} x_k; the directions used infinitely often form D̂.]
-
20
Convergence results – unconstrained GPS
If all iterates are in a compact set, then for any refining subsequence (with limit x̂ and refining directions D̂):
! f is Lipschitz near x̂ ⇒ f◦(x̂; d) ≥ 0 for every d ∈ D̂.
This says that the Clarke derivatives are non-negative on a finite set of directions that positively span Rn:
f◦(x̂; d) := lim sup_{y→x̂, t↓0} [f(y + td) − f(y)] / t ≥ lim_{k∈K} [f(x_k + ∆_k d) − f(x_k)] / ∆_k ≥ 0,
where the last inequality holds because each quotient is nonnegative at a minimal frame center.
! f is regular at x̂ ⇒ f′(x̂; d) ≥ 0 for every d ∈ D̂.
! f is strictly differentiable at x̂ ⇒ ∇f(x̂) = 0.
-
21
Limitations of GPS
GPS methods are directional, so the restriction to a finite set of directions is a big limitation, particularly when dealing with nonlinear constraints.
GPS with empty search: the iterates stall at x0.
[Figure: level sets of f(x) = ‖x‖∞ around x0 = (1, 1)T; no coordinate direction ±e_i strictly decreases f there, at any step size.]
Even with a C1 function, GPS may generate infinitely many limit points, some of them non-stationary.
-
22
GPS convergence to a bad solution
[Figure: level sets of f in the (a, b) plane and the generalized gradient ∂f(0, 0); note (0, 0) ∉ ∂f(0, 0).]
GPS iterates – with a bad strategy – converge to the origin, where the gradient exists and is nonzero (f is differentiable at (0, 0) but not strictly differentiable).
-
24
The MADS frames
In addition to the mesh size parameter ∆^m_k, we introduce the poll size parameter ∆^p_k. Ex: ∆^p_k = n √(∆^m_k).
[Figure: MADS frames with poll points p1, p2, p3 around x_k for (∆^m_k, ∆^p_k) = (1, 2), (1/4, 1), and (1/16, 1/2); the poll points lie on the mesh, but the directions vary from iteration to iteration.]
The number of ways of selecting D_k increases as ∆^p_k gets smaller; a quick numeric check follows.
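A small sanity check (added) of the example rule ∆^p_k = n √(∆^m_k) for n = 2:

    # As the mesh size shrinks, the frame of radius delta_p spans more
    # and more mesh steps, so far more direction sets D_k fit inside it.
    n = 2
    for delta_m in (1.0, 1/4, 1/16):
        delta_p = n * delta_m ** 0.5
        print(delta_m, delta_p, round(delta_p / delta_m))  # steps: 2, 4, 8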
-
25
Barrier approach to constraints
To enforce Ω constraints, replace f by a barrier objective
fΩ(x) := f(x) if x ∈ Ω, +∞ otherwise.
This is a standard construct in nonsmooth optimization.
Then apply the unconstrained algorithm to fΩ. This is NOT a standard construct in optimization algorithms.
The quality of the limit solution depends on the local smoothness of f, not of fΩ.
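A direct transcription of the barrier (added; the helper names are assumptions, and the feasibility test is treated as a yes/no oracle):

    import math

    def f_omega(f, in_omega, x):
        # Infeasible points are never sent to f, so f may even fail
        # outside Omega without harming the algorithm.
        return f(x) if in_omega(x) else math.inf

    # Example with the disk-constrained problem shown later in the talk:
    # minimize x + y subject to x^2 + y^2 <= 6.
    f = lambda p: p[0] + p[1]
    in_disk = lambda p: p[0]**2 + p[1]**2 <= 6.0
    print(f_omega(f, in_disk, (1.0, 1.0)))   # feasible -> 2.0
    print(f_omega(f, in_disk, (3.0, 3.0)))   # infeasible -> inf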
-
26
A MADS instance
Note: GPS = MADS with ∆^p_k = ∆^m_k.
An implementable way to generate D_k (sketched in code below):
! Let B be a lower triangular nonsingular random integer matrix.
! Randomly permute the rows of B.
! Complete to a positive basis:
• D_k = [B; −B] (maximal positive basis, 2n directions), or
• D_k = [B; −∑_{b∈B} b] (minimal positive basis, n+1 directions).
• Use Luis’ talk to order the poll directions.
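A rough sketch of this construction (added; a simplified take on the lower-triangular idea — the entry ranges and mesh-dependent scaling are assumptions, not the authors' exact rules):

    import numpy as np

    def mads_directions(n, delta_m, rng=None):
        rng = rng or np.random.default_rng()
        bound = max(1, int(round(1.0 / np.sqrt(delta_m))))  # scale with mesh
        # Lower triangular and nonsingular: random strictly-lower entries,
        # diagonal entries +/- bound.
        B = np.tril(rng.integers(-bound + 1, bound, size=(n, n)), k=-1)
        B += np.diag(rng.choice([-bound, bound], size=n))
        B = B[rng.permutation(n)]                 # randomly permute the rows
        # Complete to a minimal positive basis: n+1 directions.
        return np.vstack([B, -B.sum(axis=0)])

    D_k = mads_directions(n=2, delta_m=1 / 16)
    print(D_k)    # rows positively span R^2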
-
27
Dense polling directions
Theorem 1. As k → ∞, MADS’s polling directions form a dense set in Rn (with probability 1).
The ultimate goal is a way to be sure that the subset of refining directions D̂ is dense.
Then the barrier approach to constraints promises strong optimality under weak assumptions – the existence of a hypertangent vector, e.g., a vector that makes a negative inner product with all the active constraint gradients.
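For reference (added; this is the standard definition from the MADS convergence analysis, not shown on the slide), the hypertangent cone collects exactly such well-behaved feasible directions:

    T^H_\Omega(\hat{x}) = \{ v \in \mathbb{R}^n : \exists\, \varepsilon > 0
    \text{ such that } y + t w \in \Omega
    \ \ \forall\, y \in \Omega \cap B_\varepsilon(\hat{x}),\;
    w \in B_\varepsilon(v),\; t \in (0, \varepsilon) \}.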
-
28
MADS convergence results
Let f be Lipschitz near a limit x̂ of a refining sequence.
Theorem 2. Suppose that D̂ is dense in Ω.
• If either Ω = Rn or x̂ ∈ int(Ω), then 0 ∈ ∂f(x̂).
Theorem 3. Suppose that D̂ is dense in T^H_Ω(x̂) ≠ ∅.
• Then x̂ is a Clarke stationary point of f over Ω:
f◦(x̂; v) ≥ 0 for all v ∈ T^Cl_Ω(x̂).
• In addition, if f is strictly differentiable at x̂ and if Ω is regular at x̂, then x̂ is a contingent KKT stationary point of f over Ω: −∇f(x̂)Tv ≤ 0 for all v ∈ T^Co_Ω(x̂).
-
29
A problem for which GPS stagnates
[Figure: level sets in the (x, y) plane, full view (x from −10 to 10, y from −3 to 3) and a zoom (x from −4 to −2, y from 0.8 to 1.6) showing where GPS stagnates.]
-
30
Our results
-
31
Results for a ChemE PID problem
-
32
Constrained optimization
A disk-constrained problem:
min_{x,y} x + y   s.t.   x^2 + y^2 ≤ 6.
How hard can that be?
Very hard for GPS and filter-GPS with the standard 2n directions and an empty search.
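A self-contained mini-experiment (added; assumed setup, not the deck's code) showing the stall — coordinate polling with a barrier on this disk problem:

    import math

    def fb(p):                       # barrier objective for the disk
        x, y = p
        return x + y if x * x + y * y <= 6.0 else math.inf

    x, d = (0.0, 0.0), 1.0
    while d > 1e-6:
        polls = [(x[0] + s * d, x[1]) for s in (1, -1)] + \
                [(x[0], x[1] + s * d) for s in (1, -1)]
        best = min(polls, key=fb)
        if fb(best) < fb(x):
            x = best                 # improved mesh point
        else:
            d /= 2.0                 # minimal frame center: refine
    print(x, fb(x))  # stalls on the boundary, short of (-√3, -√3), f ≈ -3.46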
-
33
[Figure: f (−3.46 to −3.36) vs. number of function evaluations (50 to 450) on the disk problem, dynamic 2n directions; curves for GPS, Filter-GPS, and Barrier-MADS (5 runs).]
-
34
Parameter fit in a rheology problem
Rheology is a branch of mechanics that studies properties of materials which determine their response to mechanical force.
MODEL: the viscosity η of a material can be modelled as a function of the shear rate γ̇:
η(γ̇) = η0 (1 + λ^2 γ̇^2)^((β−1)/2)
A parameter fit problem.
-
35
The unconstrained optimization problem:
min_{η0, λ, β} g(η0, λ, β), with g = ∑_{i=1}^{13} |η(γ̇_i) − η_i|,
fit to the observations below (g is evaluated in the sketch that follows).

i   Strain rate γ̇_i (s−1)   Viscosity η_i (Pa·s)
1   0.0137                   3220
2   0.0274                   2190
3   0.0434                   1640
4   0.0866                   1050
5   0.137                     766
6   0.274                     490
7   0.434                     348
8   0.866                     223
9   1.37                      163
10  2.74                      104
11  4.34                      76.7
12  5.46                      68.1
13  6.88                      58.2
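A direct evaluation of g (added; the data are from the table above, while the parameter values in the demo call are arbitrary, not the fitted optimum):

    import numpy as np

    gamma = np.array([0.0137, 0.0274, 0.0434, 0.0866, 0.137, 0.274,
                      0.434, 0.866, 1.37, 2.74, 4.34, 5.46, 6.88])  # s^-1
    eta_obs = np.array([3220, 2190, 1640, 1050, 766, 490,
                        348, 223, 163, 104, 76.7, 68.1, 58.2])      # Pa*s

    def g(params):
        eta0, lam, beta = params
        eta = eta0 * (1.0 + lam**2 * gamma**2) ** ((beta - 1.0) / 2.0)
        return float(np.abs(eta - eta_obs).sum())   # nonsmooth L1 misfit

    print(g((1000.0, 10.0, 0.5)))   # one black-box evaluation of g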
-
36
[Figure: f (50 to 250) vs. number of function evaluations (500 to 2500) on the rheology fit; curves for coordinate search and for fixed complete, fixed opportunist, dynamic opportunist, and dynamic opportunist with cache polling.]
-
37
[Figure: same axes; GPS with n+1 directions, with no search, LHS search, and random search.]
-
38
[Figure: same axes; MADS with n+1 directions, with no search, LHS search, and random search.]
-
39
Discussion
! The MADS variant looks good as a first try, and is more general than the instance shown here.
! Numerically, randomness is a blessing and a curse.
! MADS can handle oracular or yes/no constraints.
! The underlying mesh is finer in MADS than in GPS: good for general searches and surrogates.
! MADS is the result of nonsmooth analysis pointing up the weaknesses in GPS.
-
40
Later today...
! This is a meeting about surrogates, but I did not talk about surrogates... Alison will present in the next talk the use of a surrogate in a specific mechanical engineering problem using GPS/MADS.
! MADS replaces GPS in our NOMADm and NOMAD software packages. Gilles and Mark will present a demo of these packages after lunch.