Design Search and Optimisation Course Notes – January 2020
FEEG6009 MODULE TITLE: DESIGN SEARCH AND OPTIMISATION – PRINCIPLES, METHODS AND PARAMETERIZATIONS, 2019/20
A.J. Keane & N.W. Bressloff, January 2020, Faculty of Engineering and the Environment
1 BASIC INFORMATION
Department responsible for the module: Aeronautics and Astronautics
Programme: Part IV
Timetable: Semester 2
Credit Value: 15 CATS points
Pre-requisites: none
Contacts:
Prof. A.J. Keane, Building 176, Room 5013, ext. 22944, email: [email protected]
Dr. D.J.J. Toal, Building 176, Room 5011, ext. 22662, email: [email protected]
Dr. I.I. Voutchkov, Building 176, Room 5025, ext. 21276, email: [email protected]
Prof. N.W. Bressloff, Building 176, Room 5031, ext. 25473, email: [email protected]
Formal Contact Hours: 33
Private Study Hours: 117
Coursework: Computer assignments
Course Web Site: http://www.soton.ac.uk/~ajk/DSO
Course Module Profile: https://www.southampton.ac.uk/courses/modules/feeg6009.page
2 DESCRIPTION
2.1 Aims
The aims of this module are to:
- provide the background needed to analyse, develop and use algorithms for tackling design search and optimization (DSO) problems of diverse kinds;
- equip students to become intelligent users of DSO methods;
- provide the experience needed to formulate approaches to the solution of problems in DSO;
- introduce how tools such as MATLAB can be used to support problem solving in DSO.
NOTE: whilst some MATLAB functionality will be demonstrated in this course, detailed MATLAB tuition will not be provided (see note below in the Resources section).
2.2 Objectives (planned learning outcomes)
Knowledge and understanding
Having successfully completed the module, you will be able to demonstrate knowledge and understanding of:
- the basic elements of single and multi-variable optimizers;
- the ways in which these simple elements can be combined to provide solutions to DSO problems;
- the ways in which problem parameters can be used to formulate design intent in DSO problems;
- the issues confronting engineers as they seek usable DSO approaches;
- the ways in which CAD tools can be used to formulate design intent in DSO problems;
- the ways in which various tools can be brought together to tackle realistic DSO problems via the use of bespoke workflows;
- the issues confronting engineers as they seek practical DSO approaches.
Intellectual skills
Having successfully completed the module, you will be able to:
- more fully understand the components of successful DSO approaches to design;
- make intelligent choices among the available DSO approaches;
- evaluate the utility and robustness of DSO produced designs.
Practical skills [where appropriate]
Having successfully completed the module, you will be able to:
- set up and solve simple DSO problems using a range of software tools including FEA codes and Excel.
2.3 Topics covered Design Search and Optimization (DSO) (1 lecture)
Beginnings A Taxonomy of Optimization A Brief History of Optimization Methods The Place of Optimization in Design – Commercial Tools
Geometry Modelling & Design Parameterization (1 lecture)
The Role of Parameterization in Design
Discrete and Domain Element Parameterizations
NACA Airfoils
Spline Based Approaches
Partial Differential Equation and Other Analytical Approaches
Basis Function Representation
Morphing
Shape Grammars
Mesh Based Evolutionary Encodings
CAD Tools v's Dedicated Parameterization Methods
Single Variable Optimizers – Line Search (1 lecture)
Unconstrained Optimization with a Single Real Variable Optimization with a Single Discrete Variable Optimization with a Single Non-Numeric Variable
Multi-Variable Optimizers (3 lectures)
Population versus Single Point Methods Gradient-based Methods
Newton's Method Conjugate Gradient Methods Quasi-Newton or Variable Metric Methods
Noisy/Approximate Function Values Non-Gradient Algorithms
Pattern or Direct Search Stochastic and Evolutionary Algorithms
Termination and Convergence Aspects
Constrained Optimization (2 lectures)
Problem Transformations
Lagrangian Multipliers
Feasible Directions Method
Penalty Function Methods
Combined Lagrangian and Penalty Function Methods
Sequential Quadratic Programming
Chromosome Repair
Meta-models and Response Surface Methods (1 lecture)
Global versus Local Meta-models Meta-modelling Tools Simple RSM Examples
Combined Approaches – Hybrid Searches, Meta-heuristics (1 lecture)
Glossy – a Hybrid Search Template Meta-heuristics – Search Workflows Visualization – understanding the results of DSO
Multi-objective Optimization (1 lecture)
Multi-objective Weight Assignment Techniques
Methods for Combining Goal Functions, Fuzzy Logic & Physical Programming
Pareto Set Algorithms
Nash Equilibria
Robustness (3 lectures)
Robustness versus Nominal Performance Evolutionary Algorithms for Robust Design Robustness Metrics
Noisy Phenotype One – Tsutsui and Ghosh's Method (NP)
Noisy Phenotype Two – Modified Tsutsui and Ghosh Method (NP2)
Design of Experiment One – One-at-a-time Experiments (OAT)
Design of Experiments Two and Three – Orthogonal Arrays (L64 & L81)
Comparison of Metrics
Using Surrogates in Robustness Studies
Krigs
Co-Krigs
Combined Krigs
Problem Classification (1 lecture)
Run-time
Deterministic v's Probabilistic Analyses
Number of Variables to be Explored
Goals and Constraints
Initial Search Process Choice (1 lecture) External Speakers (1 lecture)
2.4 Case studies Case study 1: The design of an encastre cantilever beam. (3 lectures)
This is based around simple Euler-Bernoulli beam theory and Excel to set up and solve a simple structures DSO problem. Each student pairing tackles a different set of boundary conditions and the whole class’s studies then allow a Pareto Front to be constructed illustrating which pairings have produced Pareto optimal designs and which have produced sub-optimal designs. This is a very simple case study just to get students used to the whole idea of DSO approaches.
Case study 2: Global versus local search methods (4 lectures) An airplane wing design problem will be used to demonstrate the differences between local and global search methods. A key element of this study concerns the fixed computational budget often faced in real engineering problems. It is not necessary to have an aerodynamics background to follow this design study.
Case study 3: Multi-objective design problem. (6 lectures) For this final case study, a multi-objective design problem will be described, which will have to be solved and presented during the course of the laboratory sessions. Students will be free to employ any method(s) learnt earlier in the course, or from elsewhere.
Case study 4: Medical Device Optimization. (4 lectures) In the UK alone, nearly 100,000 people have one or more coronary artery stents implanted, annually, to open up diseased, narrowed blood vessels that supply blood to the heart. For this case study, you will be introduced to the engineering characteristics and design requirements of stents in the first lab session. Then, in the second session, a simplified model of a coronary artery stent will be used to determine an optimal design. This will be conducted under exam conditions wherein you will be required to submit your optimal design at the end of the session. Further details will be provided in the first lab, including how submissions will be assessed.
Revision (3 lectures)
2.5 Teaching and learning activities
Teaching methods include: lectures; computer sessions.
Learning activities include: using the DSO capabilities of the Excel spreadsheet system.
Timetable (Monday / Thursday):
- 11AM - 12PM, 27/2001 (L/R 1), Weeks 18-20, 22-25, 30-33: AJK, Lecture
- 11AM - 12PM, 25/1009 (Computer Workstation), Weeks 18-20: AJK, Coursework Sessions
- 11AM - 1PM, 25/1009 (Computer Workstation), Weeks 22-25, 30-32: DJJT, IIV, NWB, Double Coursework Sessions
- 11AM - 1PM, 02A/2077 (L/T J), Week 21: AJK, Lecture
- 11AM - 1PM, 02/1089 (L/T D), Week 33: AJK, Double Revision Lecture
Assignments – Module Code: FEEG6009, Title: Design Search and Optimization

1. AJK – Excel spreadsheet problem design – encastre beam
   Set: 30/1/2020. Due: 23:00 on 13/2/2020. Submission: spreadsheet via e-submissions.
   Feedback: 2/3/2020, in lecture and on request, sent by e-mail.
   Weighted mark: 5%. Purpose: to develop initial understanding of optimizers.

2. DJJT – Computer programming exercise: light aircraft wing design optimisation
   Set: 27/2/2020. Due: 23:00 on 12/3/2020. Submission: report via e-submissions.
   Feedback: 26/3/2020, via e-assignments.
   Weighted mark: 15%. Purpose: to develop an understanding of global, local and hybrid optimisation.

3. IIV – Computer programming exercise: multiobjective optimization of expensive design problems
   Set: 12/3/2020. Due: 23:00 on 23/4/2020. Submission: send required files by e-mail to [email protected] at the end of class.
   Feedback: 7/5/2020, on request, sent by e-mail.
   Weighted mark: 15%. Purpose: to develop understanding of optimization using multiple expensive goals.

4. NWB – Computational modelling exercise: Medical Device Optimization
   Set: 7/5/2020. Due: 13:00 on 7/5/2020 at the end of the class. Submission: email results to [email protected].
   Feedback: 15/5/2020, email comments on work.
   Weighted mark: 15%. Purpose: application of optimisation to an industrially relevant biomedical problem.
2.6 Methods of assessment
Assessment method / number / % contribution to final mark:
- 2-hour written closed-book examination: 1, 50%
- Coursework: 4, 50% (5+15+15+15)
Referral method / number / % contribution to final mark:
- 2-hour written closed-book examination: 1, 50%
- Previous coursework marks may be re-used or the entire set of coursework repeated: 4, 50% (5+15+15+15)
2.7 Feedback and student support during module study (formative assessment) Feedback will be provided through the following mechanisms: Class discussion based on notes and worked examples; Written feedback on marked independent and group assignments; Revision sessions and discussion of past exam papers.
2.8 Relationship between the teaching, learning and assessment methods and the planned learning outcomes The teaching and learning methods will provide students with the necessary material to set up DSO problems using both CAD and spreadsheets. They will also learn the essential background to all common DSO methods and how these impact on what can be achieved in practice. Written reports will be required for two pieces of coursework – including listings and descriptions of validation where appropriate – in order to assess their understanding of the nature of the tools they are using. Other issues will be assessed via a written examination.
2.9 Resources Core Text (include number in library or URL) (inc ISBN) A.J. Keane and P.B. Nair.: Computational Approaches to Aerospace Design: The Pursuit of Excellence, John Wiley, 2005. ISBN: 0-470-85540-1 http://onlinelibrary.wiley.com/book/10.1002/0470855487 .
TABLE OF CONTENTS
1 BASIC INFORMATION
2 DESCRIPTION
 2.1 AIMS
 2.2 OBJECTIVES (PLANNED LEARNING OUTCOMES)
 2.3 TOPICS COVERED
 2.4 CASE STUDIES
 2.5 TEACHING AND LEARNING ACTIVITIES
 2.6 METHODS OF ASSESSMENT
 2.7 FEEDBACK AND STUDENT SUPPORT DURING MODULE STUDY (FORMATIVE ASSESSMENT)
 2.8 RELATIONSHIP BETWEEN THE TEACHING, LEARNING AND ASSESSMENT METHODS AND THE PLANNED LEARNING OUTCOMES
 2.9 RESOURCES
3 INTRODUCTION TO DSO
 3.1 WHAT IS DESIGN AND HOW DOES DESIGN SEARCH & OPTIMISATION FIT INTO IT?
 3.2 THE NOWACKI BEAM PROBLEM
 3.3 TAXONOMY OF OPTIMIZATION (METHODS)
 3.4 BRIEF HISTORY OF OPTIMIZATION
 3.5 GEOMETRY MODELLING & DESIGN PARAMETERIZATION
 3.6 EXCEL & NOWACKI BEAM
4 LINE SEARCH – SEARCHES WITH ONE VARIABLE
 4.1 DIFFERENTIAL CALCULUS
 4.2 BRACKETING
 4.3 GOLDEN SECTION SEARCH
 4.4 INVERSE PARABOLIC INTERPOLATION
 4.5 NEWTON’S METHOD
5 MULTI VARIABLE OPTIMIZERS
 5.1 STEEPEST DESCENT
 5.2 CONJUGATE GRADIENT
 5.3 NEWTON’S METHOD
 5.4 QUASI-NEWTON METHODS
 5.5 NON GRADIENT BASED SEARCH METHODS
 5.6 STOCHASTIC/EVOLUTIONARY SEARCH
 5.7 TERMINATION/CONVERGENCE
6 CONSTRAINED OPTIMIZATION
 6.1 CONSTRAINT ELIMINATION BY CONSTRUCTION
 6.2 LAGRANGE MULTIPLIERS
 6.3 PENALTY FUNCTION METHODS
 6.4 COMBINED LAGRANGE AND PENALTY FUNCTION METHOD
 6.5 SEQUENTIAL QUADRATIC PROGRAMMING METHOD (SQP)
 6.6 (CHROMOSOME) REPAIR
7 META-MODELS + RSM
 7.1 EXPLOITATION VERSUS EXPLORATION
 7.2 LOCAL TRUST REGION SEARCH
 7.3 SINGULAR VALUE DECOMPOSITION
 7.4 GLOBAL RSM SEARCH
 7.5 META HEURISTICS
 7.6 VISUALIZATION
8 MULTI-OBJECTIVE OPTIMIZATION
 8.1 METHODS FOR COMBINING GOAL FUNCTIONS
 8.2 METHODS FOR FINDING PARETO FRONTS
9 ROBUSTNESS IN OPTIMIZATION AND UNCERTAINTY QUANTIFICATION (UQ)
 9.1 MONTE-CARLO METHOD
 9.2 DESIGN OF EXPERIMENT METHODS
 9.3 NOISY PHENOTYPE
 9.4 RESPONSE SURFACE APPROACH
 9.5 STOCHASTIC SOLVERS
 9.6 ROBUSTNESS
 9.7 A SIMPLE EXAMPLE
 9.8 TWO VARIABLES AND TWO OBJECTIVES
 9.9 USING SURROGATES TO SUPPORT UNCERTAINTY QUANTIFICATION
 9.10 ROBUST DESIGN OPTIMIZATION WITH BASIC SURROGATES
 9.11 ROBUST DESIGN OPTIMIZATION WITH ADVANCED SURROGATES
  9.11.1 Co-Kriging
  9.11.2 Combined Kriging
10 GETTING STARTED
3 INTRODUCTION TO DSO
Why we want to do it and its place in design. Taxonomy of optimization, history of methods, commercial tools etc.
First, what is design – synthesis v's analysis. What is optimal design? Are all designs optimal? Might we ever deliberately accept a sub-optimal design? To answer this we must be able to compare competing designs and say which we prefer and, hopefully, why. (Examples of cars, fridges – cost v's performance, trade-offs.)
Taxonomy of optimization: sketch out a picture illustrating the taxonomy and another illustrating the history. Also mention commercial tools.
History of optimization: Newton and classical gradients.
For example, y = x^2 - 4x + 1 gives dy/dx = 2x - 4 = 0 at x = 2, with d^2y/dx^2 = 2 > 0, so min y is at x = 2.
Pattern search and "which way is down" methods. Stochastic search – population. DoE based search and RSMs – hybrids.
3.1 What is design and how does design search & optimisation fit into it? Engineering or Analytically Led Design is the use of analysis to support synthesis so as to adequately define a product (or process). Note the difference between analysis and synthesis: synthesis involves decisions about the product (how big, what material, what manufacturing method) whereas analysis does not. Analysis provides the information for rational decisions to be made. DSO is a formalism for carrying forward such decision making and is normally thought of as an automated activity controlled by a computer code. It involves postulating a design, analysing it, deciding if the results are acceptable and, if not, deciding how to change it. If it is to be changed the process is repeated until we have an acceptable design or we run out of effort/time. To set this up we adopt the ideas of:
- objectives;
- design variables and their bounds;
- constraints and their limits;
- fixed parameters;
- external noise/uncertain parameters;
- methods of analysis;
- schemes for linking design variables to analysis;
- schemes for linking analysis to objectives and constraints.
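These ingredients can be collected into a small container object. The sketch below is illustrative only (the course itself works in Excel and MATLAB): the class name, fields and the beam numbers are all invented for illustration, not taken from the notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

@dataclass
class DSOProblem:
    """Hypothetical container for the DSO ingredients listed above."""
    objective: Callable[[Sequence[float]], float]
    bounds: list                                       # (lower, upper) per design variable
    constraints: list = field(default_factory=list)    # each returns >= 0 when feasible
    fixed_params: dict = field(default_factory=dict)   # things we are not free to change

# A Nowacki-style toy: minimise beam cross-section area (breadth * depth)
# subject to a bending-stress limit. All numbers are made up for illustration.
beam = DSOProblem(
    objective=lambda x: x[0] * x[1],
    bounds=[(0.01, 0.25), (0.01, 0.25)],
    constraints=[lambda x: 1.0 - 6 * 5000 * 1.5 / (x[0] * x[1] ** 2) / 240e6],
)
print(beam.objective([0.1, 0.2]))   # area of a 0.1 x 0.2 section
```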
3.2 The Nowacki beam problem.
3.3 Taxonomy of optimization (methods)
(The lecture sketch classifies problems by their inputs, outputs, constraints and functions, and then maps problem classes to methods.)
- inputs x: all numeric (continuous or some discrete) or some non-numeric
- outputs y: single goal or multiple goals (Pareto fronts)
- constraints g, h: unconstrained, bounds, constrained
- functions f: linear, non-linear, discontinuous
Problem classes and methods:
- LINEAR PROBLEMS – LINEAR PROGRAMMING: simplex methods, search over vertex space (operational research) – NOT DISCUSSED FURTHER
- OPTIMAL SELECTION – INTEGER PROGRAMMING: sorting/exchange methods – NOT DISCUSSED FURTHER
- THE REST – the rest of the course! Methods here divide into: no gradients needed (zero order) v's gradients needed (first or second order); cope with constraints directly v's only unconstrained; deterministic v's stochastic; population based v's one at a time.
The general problem is min y = f(x) subject to the inequality constraints g(x) and the equality constraints h(x) = 0.
3.4 Brief History of Optimization
First came classical calculus and Newton’s method for dealing with functions where we cannot solve explicitly. From calculus we are familiar with f'(x) = 0 and f''(x) = +VE for a minimum etc. Newton is basically root searching for f'(x) = 0 and is covered in a subsequent lecture.
Next came Cauchy and steepest descent – ie find the downhill direction and move in that direction until we start to go uphill again – slow in valleys.
Then came conjugate gradient methods and quasi-Newton methods that exploit local curve fitting based on curvature, with various ways of holding information on the local shape (the Hessian, or its inverse), ie an approximation based on f(x) = (1/2) x^T A x + b^T x + C, where x is a vector and A is the Hessian.
Following these gradient based approaches were a series of pattern searches which use heuristics. These include Hooke & Jeeves and the simplex method. Then came a series of stochastic methods including SA, GA, ES and EP, all using sequences of random moves and schemes to exploit any gains made, often working with populations of designs.
Then come the explicit curve fitting methods based on designed experiments such as polynomial curve fits, RBF
schemes, Kriging etc. These can be either global with updates or local with move limits and trust regions.
Finally come hybrids and meta searches built on these elements. Key considerations are:
- ability to work with black box codes (can be used for analytical forms or numerically);
- need for gradient information;
- robustness to poor, noisy or incomplete data or badly shaped functions (staircase effects etc);
- speed of convergence;
- ability to run calculations in parallel;
- repeatability/average performance of stochastic methods.
3.5 Geometry modelling & design parameterization The need for parameterization design variables/design intent – the things we are free to change. Flexibility
versus number of variables & tacit knowledge of workable designs. Some examples – look at the Nowacki
beam & ask what are design choices? How do we encapsulate them? What about choice of section? How do
we parameterise a cross section? How many variables do we need?
1) a circle – just need radius – 1 variable
2) a square – just need side length – 1 variable
3)/4) an ellipse or rectangle – 2 variables
5) an I section symmetric about the neutral axis, or a box, L or T section – in all cases we need overall width and depth plus thickness of web and flange – 4 variables
Can we make a single parameterisation span square, rectangle, box, L, T and I? If so, how – do we use an integer variable as an index plus 4 numbers, or can we use the 4 numbers themselves? Clearly square/rectangle/box can be linked.
It is not obvious how to deal with L, T or I sections together. If we use an index variable to select the type there is no obvious ranking of types, so searching over it is not simple. The square, rectangle and box can of course be treated as continuous sets; the simplest combined form recovers the solid sections when t_f = depth/2 and t_w = width/2.
To make this generate all shapes we consider 4 rectangles: a top flange T, a bottom flange B, and two webs L and R. We make:
- the widths of T and B equal to the overall width;
- the widths of L and R equal to t_web;
- the height of T equal to t_flange,top;
- the height of B equal to t_flange,bot;
- the height of L and R equal to height - t_flange,top - t_flange,bot;
and then add offsets of L and R from the outer edges, offset_l and offset_r. This needs seven variables but can now describe all our shapes – to get an L or T we just set t_flange,bot to be zero.
Other forms of parameterization:
- Discrete and domain element modelling
- NACA Airfoils
- Spline Based Approaches
- Partial Differential Equation and Other Analytical Approaches
- Basis Function Representation
- Morphing
- Shape Grammars
- Mesh Based Evolutionary Encodings
- CAD Tools v's Dedicated Parameterization Methods
Talk through the various figures in sect 2.1 of course book and see PPT of External shape of UAV
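The seven-variable section scheme above can be sketched in code. This is a toy Python illustration only (the function names and the bottom-left origin convention are invented here, not taken from the notes); it returns the four rectangles and a naive area sum, which for a box section reproduces the hollow-rectangle area.

```python
def section_rectangles(width, height, t_web, t_flange_top, t_flange_bot,
                       offset_l, offset_r):
    """The four rectangles (x, y, w, h) of the seven-variable section,
    with the origin at the bottom-left corner of the bounding box."""
    web_h = height - t_flange_top - t_flange_bot
    return {
        "T": (0.0, height - t_flange_top, width, t_flange_top),  # top flange
        "B": (0.0, 0.0, width, t_flange_bot),                    # bottom flange
        "L": (offset_l, t_flange_bot, t_web, web_h),             # left web
        "R": (width - offset_r - t_web, t_flange_bot, t_web, web_h),  # right web
    }

def section_area(rects):
    # naive sum of rectangle areas (assumes the rectangles do not overlap)
    return sum(w * h for (_, _, w, h) in rects.values())

# a 10 x 10 box with 1-thick walls: zero offsets put the webs at the edges
box = section_rectangles(10.0, 10.0, 1.0, 1.0, 1.0, 0.0, 0.0)
print(section_area(box))   # 36.0 = 10*10 - 8*8
```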
3.6 Excel & Nowacki Beam How do we set up searches in Excel?
Try y = x^2 - 4x + 1 (min at x = 2): put this into Excel and solve it numerically.
Then try y = x^4 - 2x^3 + 4 (min at x = 1½).
Then look at the Nowacki beam problem (load the relevant Excel sheet and describe).
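In place of Excel's Solver, the same numerical experiment can be run in a few lines of Python. This crude step-halving search is only an illustrative stand-in (not part of the notes, and not what Solver actually implements), but it finds both minima above.

```python
def crude_min(f, x, step=1.0, tol=1e-6):
    """Walk downhill from x, halving the step whenever neither
    neighbour improves (a rough stand-in for an interactive solve)."""
    while step > tol:
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step /= 2.0
    return x

print(round(crude_min(lambda x: x**2 - 4*x + 1, 0.0), 4))      # 2.0
print(round(crude_min(lambda x: x**4 - 2*x**3 + 4, 0.0), 4))   # 1.5
```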
4 LINE SEARCH – SEARCHES WITH ONE VARIABLE
Problem – minimize a function of a single real variable without constraints, such as
f(x) = x^2 - 4x + 1
f'(x) = 2x - 4
f''(x) = 2.
Simple approach: take steps looking for a change in gradient; the step length must change to trade accuracy against span (small steps for accuracy v's large steps for speed), i.e. bracket a turning point. How do we turn a bracket into a tight solution? Is the function smooth? Options: golden section search (0.61804), Fibonacci search, quadratic search (inverse parabolic interpolation).
For a discrete or integer variable, use integers as pointers to a list: bracket as before then Fibonacci, or use golden section/quadratics and round to the nearest integer. For non-numeric variables (materials selection, for example) see below.
4.1 Differential calculus
Approach 1 – given a functional form use calculus, ie
f(x) = x^2 - 4x + 1
f'(x) = 2x - 4 = 0, so x = 2
f''(x) = 2, ie +VE, so a minimum.
4.2 Bracketing
Approach 2 – bracket the minimum between two values and search inwards.
Q – How do we find a bracket, ie a series of three values of f(x) such that f(x2) < f(x1) and f(x2) < f(x3)?
A – Guess two values for x, calculate f(x) at these and then head downhill until the function starts to rise and we have a bracket. If we have no knowledge then use x1 = 0, x2 = 1 and x3 either -1 or 2 depending on the gradient (if f(0) > f(1) then x3 = 2, else -1). Given three points we use a quadratic curve fit and see if a minimum is predicted (2nd differential is +VE); if so, jump to the predicted minimum and evaluate there. If a maximum is predicted we simply increase the step size (by say a factor of 1.6180 – golden section) and go on downhill, keeping the three lowest values of f(x) in either case.
See code in Numerical Recipes for example. Another approach is just to keep doubling the step size in the downhill direction until a bracket appears.
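The bracketing recipe above can be sketched as follows. This is a minimal Python illustration (not the Numerical Recipes routine): it heads downhill from two guesses, expanding by the golden factor until the function rises again, and omits the quadratic-jump refinement.

```python
GOLD = 1.618034   # golden-section expansion factor

def bracket(f, x1=0.0, x2=1.0):
    """Head downhill from two guesses until three successive points
    x1, x2, x3 satisfy f(x2) < f(x1) and f(x2) < f(x3)."""
    f1, f2 = f(x1), f(x2)
    if f2 > f1:                     # ensure the step x1 -> x2 is downhill
        x1, x2, f1, f2 = x2, x1, f2, f1
    x3 = x2 + GOLD * (x2 - x1)
    f3 = f(x3)
    while f3 < f2:                  # still descending: keep expanding
        x1, f1 = x2, f2
        x2, f2 = x3, f3
        x3 = x2 + GOLD * (x2 - x1)
        f3 = f(x3)
    return (x1, x2, x3), (f1, f2, f3)

xs, fs = bracket(lambda x: x**2 - 4*x + 1)
print(xs)   # three points straddling the minimum at x = 2
```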
4.3 Golden section search
Approach 3 – golden section search (linear convergence – no use of gradients)
Q – Given an initial bracket, how do we trap the minimum efficiently?
A – Given x1, x2 and x3 such that f(x2) < f(x1) and f(x2) < f(x3) (min at x2), we choose x4 so that it lies in the larger of the two intervals (x1 to x2 or x2 to x3) and such that either
(x4 - x2)/(x3 - x2) = 0.38197, or
(x4 - x1)/(x2 - x1) = 1 - 0.38197 = 0.61803,
ie 38.197% into the larger gap measured from the centre.
Even if the initial bracket is not in the ratio 0.38197:0.61803 this process rapidly settles on this ratio. This approach assumes nothing about the shape of the function and does not require gradients – it is not quite as good as Fibonacci but does not require us to fix the number of function calls a priori (which Fibonacci does).
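A minimal sketch of the golden section loop described above (illustrative Python, not from the notes): each pass probes 38.197% into the larger of the two sub-intervals and keeps a valid three-point bracket.

```python
def golden_section(f, a, b, c, tol=1e-8):
    """Shrink a bracket a < b < c with f(b) < f(a) and f(b) < f(c):
    probe 38.197% into the larger gap, keep the best three points."""
    R = 0.381966
    fb = f(b)
    while (c - a) > tol:
        if (b - a) > (c - b):          # larger gap is on the left
            x = b - R * (b - a)
            fx = f(x)
            if fx < fb:
                c, b, fb = b, x, fx    # new bracket (a, x, b)
            else:
                a = x                  # new bracket (x, b, c)
        else:                          # larger gap is on the right
            x = b + R * (c - b)
            fx = f(x)
            if fx < fb:
                a, b, fb = b, x, fx    # new bracket (b, x, c)
            else:
                c = x                  # new bracket (a, b, x)
    return b

x_min = golden_section(lambda x: x**2 - 4*x + 1, 0.0, 1.0, 4.0)
print(x_min)   # close to 2.0
```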
4.4 Inverse parabolic interpolation
Approach 4 – inverse parabolic interpolation, quadratic search or quadratic interpolation
Q – Given an initial bracket, assume the function is smooth so that at its minimum it will behave quadratically.
A – Here we fit a parabola to the bracket and use this to estimate the location of the minimum, ie we assume f(x) = Ax^2 + Bx + C, so that x* = -B/(2A) and f''(x*) = 2A,
and we know f(x1) = f1, f(x2) = f2 and f(x3) = f3 such that f2 < f1, f2 < f3 and x1 < x2 < x3, ie
f1 = Ax1^2 + Bx1 + C
f2 = Ax2^2 + Bx2 + C
f3 = Ax3^2 + Bx3 + C.
We solve these to get
x* = [(f3 - f2)(x2^2 - x1^2) + (f1 - f2)(x3^2 - x2^2)] / {2[(f3 - f2)(x2 - x1) + (f1 - f2)(x3 - x2)]}
and
A = [(f3 - f2)(x2 - x1) - (f2 - f1)(x3 - x2)] / [(x2 - x1)(x3 - x2)(x3 - x1)]
(which is always +VE here, so hence a minimum).
For example, consider the function f(x) = x^4 - 2x^3 + 4 with initial data at x1 = 1/2, x2 = 1, x3 = 2, so that f1 = 3.8125, f2 = 3, f3 = 4, ie a bracket. Then
x* = [(4 - 3)(1 - 1/4) + (3.8125 - 3)(4 - 1)] / {2[(4 - 3)(1 - 1/2) + (3.8125 - 3)(2 - 1)]} = 1.214286.
This may be compared to the analytical solution given from
f'(x) = 4x^3 - 6x^2 = 0 when x = 0 or x = 1½
f''(x) = 12x^2 - 12x = 0 at x = 0 (inflexion), = 9 at x = 1½ (minimum).
So the solution is improved: f(1.214286) = 2.5932164, ie less than all points in the initial bracket. So the next triple is 1, 1.214286, 2 with f = 3, 2.5932164, 4, and this leads to
x2* = 1.364454, f2* = 2.385534; x3* = 1.427849, f3* = 2.334451;
x4* = 1.465064, f4* = 2.317823; x5* = 1.482046, f5* = 2.313928.
OBSV 1. Now you may ask why not start with x = 0, 1, 2. The trouble with this is f = 4, 3, 4, which is symmetric, so x* = 1, which does not help!
OBSV 2. This search approaches our goal from one side only and so is rather slow – there are better methods! When dealing with discrete variables we use the integers as pointers and either use integer programming or, in mixed problems, simply round variables to the nearest integer. When our discrete variables have no natural order (ie materials selection) we are in the end forced towards enumeration.
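The worked example above can be checked in a few lines (illustrative Python; the helper name is invented). The formula coded below is the x* expression derived in this section.

```python
def parabolic_step(x1, x2, x3, f1, f2, f3):
    """Minimum of the parabola through (x1,f1), (x2,f2), (x3,f3)."""
    num = (f3 - f2) * (x2**2 - x1**2) + (f1 - f2) * (x3**2 - x2**2)
    den = 2.0 * ((f3 - f2) * (x2 - x1) + (f1 - f2) * (x3 - x2))
    return num / den

f = lambda x: x**4 - 2*x**3 + 4
xs = [0.5, 1.0, 2.0]                    # the initial bracket of the example
x_star = parabolic_step(*xs, *[f(x) for x in xs])
print(round(x_star, 6))                 # 1.214286, as in the worked example
```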
4.5 Newton's method
Approach 5 – Newton's method
All will be familiar with the Newton–Raphson method for finding the root (or zero) of a function. If we apply this to the derivative of a function we can find turning points instead, ie
    x_{i+1} = x_i - f'(x_i)/f''(x_i)
NB this needs the second derivative.
Example: f(x) = x^4 - 2x^3 + 4 starting at x = 2
    f'(x) = 4x^3 - 6x^2
    f''(x) = 12x^2 - 12x = 12x(x - 1)
so
    x_{i+1} = x_i - (4x_i^3 - 6x_i^2)/(12x_i(x_i - 1)) = x_i - (2x_i^2 - 3x_i)/(6(x_i - 1))
    x2 = 2 - (2(2)^2 - 3(2))/(6(2 - 1)) = 2 - 2/6 = 5/3 = 1.6667
    x3 = 5/3 - (2(5/3)^2 - 3(5/3))/(6(5/3 - 1)) = 5/3 - (5/9)/4 = 55/36 = 1.52778
    x4 = 1.50097, x5 = 1.50000 = x*
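The iteration above is short enough to sketch directly (a minimal Python illustration, names mine). Note the quadratic convergence compared with the parabolic search:

```python
# Newton's method on f'(x): x_{i+1} = x_i - f'(x_i)/f''(x_i)
def newton_1d(df, d2f, x, steps=6):
    for _ in range(steps):
        x = x - df(x) / d2f(x)
    return x

df  = lambda x: 4*x**3 - 6*x**2   # f'(x) for f = x^4 - 2x^3 + 4
d2f = lambda x: 12*x**2 - 12*x    # f''(x)
x = newton_1d(df, d2f, 2.0)       # iterates 2, 5/3, 1.52778, 1.50097, ...
```

Only a handful of steps are needed from x = 2, but the second derivative must be available and must not vanish at an iterate.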
5 MULTI VARIABLE OPTIMIZERS
We next consider multiple variables. Here, in addition to finding the size of step to make, we must also fix the direction.
5.1 Steepest descent
Perhaps the simplest approach to multi variable optimizing is to identify the direction of steepest descent and go in that direction until the function stops reducing (the optimal step length α_i*) and then recompute the direction of steepest descent, ie
    x_{i+1} = x_i - α_i* grad f(x_i)    where grad f(x_i) are the gradients at x_i
Example – minimize f(x1, x2) = 2x1^2 + x2^2 - 2x1x2 - x1 - x2 starting at x_1 = (0, 0)'
    grad f = (df/dx1, df/dx2)' = (4x1 - 2x2 - 1, -2x1 + 2x2 - 1)', so grad f(x_1) = (-1, -1)'
To get the optimal step length we minimize f(x_1 - α_1 grad f(x_1)) with respect to α_1, ie set df/dα_1 = 0:
    f((0, 0)' - α_1(-1, -1)') = f(α_1, α_1) = 2α_1^2 + α_1^2 - 2α_1^2 - α_1 - α_1 = α_1^2 - 2α_1
    df/dα_1 = 2α_1 - 2 = 0, so α_1* = 1
so that
    x_2 = (0, 0)' + 1(1, 1)' = (1, 1)'
now
    grad f(x_2) = (1, -1)'
    f(x_2 - α_2 grad f(x_2)) = f(1 - α_2, 1 + α_2)
      = 2(1 - α_2)^2 + (1 + α_2)^2 - 2(1 - α_2)(1 + α_2) - (1 - α_2) - (1 + α_2)
      = 5α_2^2 - 2α_2 - 1
    df/dα_2 = 10α_2 - 2 = 0, so α_2* = 1/5
    x_3 = (1, 1)' - (1/5)(1, -1)' = (0.8, 1.2)'
and so on to (1, 1.5)' as the answer.
5.2 Conjugate gradient
The problem with steepest descent is that unless our function has circular contours the direction of steepest descent never points at the final optimum. The conjugate gradient approach seeks to improve over this with improved directions.
We start as per steepest descent, but at the second step use a direction conjugate to the first, ie
    x_2 = x_1 - α_1* grad f(x_1)    but afterwards use
    x_{i+1} = x_i + α_i* S_i
NB S_i' A S_j = 0 if S_i and S_j are conjugate directions for a quadratic problem of the form
    f(x) = (1/2) x'Ax + B'x + C
where
    S_i = -grad f_i + (|grad f_i|^2 / |grad f_{i-1}|^2) S_{i-1}    and    S_1 = -grad f(x_1)
Here S_i takes the place of -grad f_i used in steepest descent. Notice that S_i accumulates information from all previous steps – this is good and bad – good as the direction is conjugate to all previous steps, bad as it can accumulate round off errors – in practice we restart from a steepest descent step after m steps, where m is one more than the number of design variables. If our function is quadratic this process converges in as many steps as there are directions/dimensions in the problem.
Example: minimize f(x1, x2) = 2x1^2 + x2^2 - 2x1x2 - x1 - x2 starting from x_1 = (0, 0)'.
Conjugate gradient search example.
Starting at (0, 0)' and applying steepest descent gives, as before,
    x_2 = (1, 1)' with α_1* = 1, grad f(x_1) = (-1, -1)' and |grad f_1|^2 = 2
So
    S_1 = (1, 1)'    and    grad f(x_2) = (1, -1)', |grad f_2|^2 = 2
    S_2 = -(1, -1)' + (2/2)(1, 1)' = (0, 2)'
    x_3 = (1, 1)' + α_2* (0, 2)'
To find α_2* we minimise f(x_2 + α_2 S_2) = f((1, 1)' + α_2(0, 2)'):
    f(1, 1 + 2α_2) = 2 + (1 + 2α_2)^2 - 2(1 + 2α_2) - 1 - (1 + 2α_2) = 4α_2^2 - 2α_2 - 1
So df/dα_2 = 8α_2 - 2 = 0, giving α_2* = 1/4 and
    x_3 = (1, 1)' + (1/4)(0, 2)' = (1, 1.5)', which is the solution.
If we try another step we just find grad f_3 is zero and so the process stops, ie grad f is zero at the minimum.
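The two-step convergence can be checked directly. A minimal Python sketch of the Fletcher–Reeves form of the update above (names mine; the exact line search again uses the quadratic's Hessian):

```python
# Conjugate gradient on f = 2*x1^2 + x2^2 - 2*x1*x2 - x1 - x2 from (0, 0).
H = [[4.0, -2.0], [-2.0, 2.0]]

def grad(x):
    return [4*x[0] - 2*x[1] - 1, -2*x[0] + 2*x[1] - 1]

x = [0.0, 0.0]
g = grad(x)
S = [-g[0], -g[1]]                          # first direction: steepest descent
for _ in range(2):                          # 2 variables -> 2 steps suffice
    HS = [H[0][0]*S[0] + H[0][1]*S[1], H[1][0]*S[0] + H[1][1]*S[1]]
    alpha = -(g[0]*S[0] + g[1]*S[1]) / (S[0]*HS[0] + S[1]*HS[1])  # exact step
    x = [x[0] + alpha*S[0], x[1] + alpha*S[1]]
    g_new = grad(x)
    beta = (g_new[0]**2 + g_new[1]**2) / (g[0]**2 + g[1]**2)      # FR ratio
    S = [-g_new[0] + beta*S[0], -g_new[1] + beta*S[1]]
    g = g_new
```

The trace reproduces the worked example exactly: S_2 = (0, 2)', α_2* = 1/4 and x_3 = (1, 1.5)'.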
5.3 Newton's method
Newton's method allows for direction and step size and is built on looking for the roots of grad f(x).
First we approximate our function as a Taylor series:
    f(x) ≈ f(x_i) + grad f(x_i)'(x - x_i) + (1/2)(x - x_i)' H_i (x - x_i)
where here H_i is the matrix of second partial derivatives and is called the Hessian.
Now we set df/dx_j = 0 for j = 1, 2, ..., n for n variables.
So this gives grad f_i + H_i(x - x_i) = 0,
or x_{i+1} = x_i - H_i^{-1} grad f_i – this requires a non singular Hessian of course. (So the Hessian both modifies the search direction and sets the step length.)
Example – minimize f(x1, x2) = 2x1^2 + x2^2 - 2x1x2 - x1 - x2 starting at x_1 = (0, 0)'
    H = [ d2f/dx1^2    d2f/dx1dx2 ]  =  [  4  -2 ]    for all x_i
        [ d2f/dx2dx1   d2f/dx2^2  ]     [ -2   2 ]
    H^{-1} = (1/4) [ 2  2 ]
                   [ 2  4 ]
    grad f_i = (df/dx1, df/dx2)' = (4x1 - 2x2 - 1, -2x1 + 2x2 - 1)', so grad f_1 = (-1, -1)'
So
    x_2 = (0, 0)' - (1/4)[2 2; 2 4](-1, -1)' = (0, 0)' + (1/4)(4, 6)' = (1, 3/2)'
and grad f_2 = (0, 0)'.
Hence MINIMUM in 1 STEP. This has converged in one step because f(x) is quadratic and so H is a constant. There are however problems with this approach as we have to compute, invert and store H at each step and this is fraught with difficulties on real problems. The most serious issue is obtaining the second derivatives as these are very rarely available directly.
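The one-step convergence is trivial to verify (a minimal Python sketch, names mine):

```python
# One Newton step on the quadratic: x2 = x1 - H^{-1} grad f(x1).
def grad(x):
    return [4*x[0] - 2*x[1] - 1, -2*x[0] + 2*x[1] - 1]

Hinv = [[0.5, 0.5], [0.5, 1.0]]   # inverse of H = [[4, -2], [-2, 2]]
x = [0.0, 0.0]
g = grad(x)
x = [x[0] - (Hinv[0][0]*g[0] + Hinv[0][1]*g[1]),
     x[1] - (Hinv[1][0]*g[0] + Hinv[1][1]*g[1])]
# x is now the minimum (1, 1.5) after a single step
```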
5.4 Quasi-Newton methods
The Quasi-Newton methods work with an approximation of either the Hessian or its inverse. These are sometimes called variable metric methods. We already have
    x_{i+1} = x_i - H_i^{-1} grad f(x_i)    where H_i is the Hessian,
which we approximate by
    x_{i+1} = x_i - α_i* B_i grad f(x_i)
Here B_i contains directional information and α_i* is the optimal step length. Note that this is the steepest descent method if B_i = I.
There are then a number of schemes for updating B_i without using second derivatives but instead using approximations. None is perfect and they are known by the names of those who proposed them, such as BFGS – Broyden–Fletcher–Goldfarb–Shanno, which is
    B_{i+1} = B_i + [1 + (g_i' B_i g_i)/(d_i' g_i)] (d_i d_i')/(d_i' g_i) - (d_i g_i' B_i + B_i g_i d_i')/(d_i' g_i)
where d_i = x_{i+1} - x_i and g_i = grad f_{i+1} - grad f_i.
We do not pursue such methods further here; they are very popular however. See Figs 3.6, 3.7 in the book.
5.5 Non gradient based search methods
Pattern or direct search.
What do we do if we cannot calculate gradients (or do not wish to use finite differences – noise/speed)?
This leads to Hooke & Jeeves amongst others.
H&J method:
1 choose initial step length, set initial point to first base point
2 increase direction i by step and keep if better, else decrease direction i by step and keep if better
3 loop over all directions; if none improve then halve step size and repeat, unless either step too small or run out of time, in which case stop
4 explore must have helped so set current point to new base point
5 make pattern move equal to vector from previous base point to new base point plus any previous successful pattern move still in use
6 if pattern move helps keep it; if not go back to new base point and forget pattern move
7 repeat from step two
There are several themes here:
1 steps change in size for exploration
2 directions and steps change for exploitation – if the pattern moves help then they accumulate so that moves get bigger and bolder until they fail. Siddall provides full details and code, as does Schwefel.
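The steps above can be sketched compactly. This minimal Python version (names mine, and simplified: the pattern move is taken as a single base-to-base jump rather than the accumulated move of steps 5–6) still shows the explore/halve/pattern structure:

```python
# A compact Hooke & Jeeves style pattern search (gradient free).
def hooke_jeeves(f, x, step=0.5, min_step=1e-6, max_evals=10000):
    n = len(x)
    base, fbase = list(x), f(x)
    evals = 1
    while step > min_step and evals < max_evals:
        # exploratory moves about the current base point, one direction at a time
        pt, fpt = list(base), fbase
        for i in range(n):
            for delta in (step, -step):
                trial = list(pt)
                trial[i] += delta
                ftrial = f(trial)
                evals += 1
                if ftrial < fpt:
                    pt, fpt = trial, ftrial
                    break
        if fpt < fbase:
            # pattern move: jump along the base-to-base direction
            pattern = [pt[i] + (pt[i] - base[i]) for i in range(n)]
            base, fbase = pt, fpt
            fpat = f(pattern)
            evals += 1
            if fpat < fbase:
                base, fbase = pattern, fpat
        else:
            step *= 0.5               # no direction improved: halve the step
    return base, fbase

f = lambda x: 2*x[0]**2 + x[1]**2 - 2*x[0]*x[1] - x[0] - x[1]
x, fx = hooke_jeeves(f, [0.0, 0.0])   # converges towards (1, 1.5)
```

On the quadratic used earlier this recovers the minimum at (1, 1.5) with f = -1.25 using function values only.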
5.6 Stochastic/Evolutionary search
Run through flying circus slides on simple GA’s.
[Figure: a basic GA flowchart.]
[Figure: typical search patterns from a GA, Simulated Annealing, an Evolution Strategy and Evolutionary Programming.]
5.7 Termination/Convergence
For local searches we stop at the optimum, ie when no further gains are being made – provided we can afford to get that far.
For global search we use one of:
 a fixed or limited number of iterations
 a fixed or limited elapsed time
 stopping when the search has stalled after a given number of iterations
 stopping when a given number of basins have been found and searched.
We rank searches by steepness of gradient/rate of improvement, final result or a balance between the two.
6 CONSTRAINED OPTIMIZATION
In most real world engineering problems the designer has to satisfy various constraints as well as meeting the desire for improved performance. Indeed performance is often set as a constraint, ie reduce weight to below X, reduce drag to less than Y, etc. Thus we need search schemes to deal with constraints, ie
    min f(x) for x = (x1, ..., xn)'
subject to bounds on x:  xL ≤ x ≤ xU
and constraints  g_i(x) ≤ 0  (inequality constraints)
                 h_j(x) = 0  (equality constraints)
Here we describe a number of approaches.
6.1 Constraint elimination by construction
The simplest is to try and eliminate constraints by construction – ie transform problem variables using the constraints.
Example: minimise the surface area of a box of given volume,
ie min f(B, H, W) = 2(BH + WB + HW)
where V = WBH is fixed,
so let W = V/(BH).
We then have
    minimize f(B, H) = 2(BH + V/H + V/B)
So
    df/dH = 2(B - V/H^2) = 0 when B = V/H^2, ie V = BH^2
    df/dB = 2(H - V/B^2) = 0 when H = V/B^2, ie V = HB^2
Combining gives BH^2 = HB^2, so B = H and H^3 = V,
ie H = V^(1/3), B = V^(1/3) and W = V^(1/3) – all sides of equal length, as expected.
Another way we deal with inequalities is by deciding if they will be active at the optimum or not. If so we replace them by equalities, and if not we eliminate them. Often it is not possible to know which inequalities will be active, or to eliminate them using algebra even if we do! Nonetheless we should not ignore this approach. It can sometimes be done numerically, ie fixed CL calcs when angle of attack is a design variable for a wing or aerofoil.
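The cube result is easy to confirm numerically once W has been eliminated. A minimal Python check over a grid (illustrative only; values and ranges are mine):

```python
# With W eliminated via W = V/(B*H), the area 2*(B*H + V/H + V/B)
# should be smallest when B = H = V**(1/3), here 2 for V = 8.
V = 8.0
area = lambda B, H: 2*(B*H + V/H + V/B)
best = min(((area(b/100, h/100), b/100, h/100)
            for b in range(50, 401) for h in range(50, 401)))
# best = (24.0, 2.0, 2.0): a cube of side 2 with area 24
```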
6.2 Lagrange multipliers
In just the same way as there are formal analytic solutions to unconstrained optimization problems, the equivalent constrained solutions are based on Lagrange multipliers. This approach essentially only works for equality constraints, so for inequality constraints a precursor step is to decide at any point if an inequality constraint will be active, and if so replace it with an equality.
So consider min f(x1, x2) subject to g(x1, x2) = 0,
ie two variables and one equality constraint.
At a minimum it may be shown that
    df/dx1 + λ dg/dx1 = 0
    df/dx2 + λ dg/dx2 = 0
and g(x1, x2) = 0.
Here λ is the so called Lagrange multiplier.
[Figure: two ways of handling a fixed lift constraint – optimise over GEOM & ALPHA with the CFD returning CL and CD and the optimiser holding CL fixed, OR optimise over GEOM alone with the CFD iterating on alpha internally to hold CL and returning CD.]
Now if we write L = f + λg we get
    dL/dx1 = df/dx1 + λ dg/dx1
    dL/dx2 = df/dx2 + λ dg/dx2    (all equal zero at the minimum, from the previous equations)
    dL/dλ = g
Thus if we seek the unconstrained minimum of L (more precisely, turning points of L) we can locate the solution to the constrained problem. L is known as the Lagrange function.
For example, minimize f(x, y) = k/(xy^2)
subject to g(x, y) = x^2 + y^2 - a^2 = 0 (ie a circle of radius a).
Here L(x, y, λ) = f + λg = k/(xy^2) + λ(x^2 + y^2 - a^2)
    dL/dx = -k/(x^2 y^2) + 2λx = 0, so λ = k/(2x^3 y^2)
    dL/dy = -2k/(xy^3) + 2λy = 0, so λ = k/(xy^4)
    dL/dλ = x^2 + y^2 - a^2 = 0
Equating the two expressions for λ gives y^2 = 2x^2, and so from the constraint x = a/√3 and y = √(2/3) a.
Note however that we cannot simply minimize L, as the approach would admit saddlepoints or maxima for the gradients of L to be zero.
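The stationary point found above can be checked without calculus by scanning round the circle. A minimal Python check (parameterisation and values are mine, not from the notes):

```python
import math

# Minimise f = k/(x*y^2) on x^2 + y^2 = a^2 by scanning x = a*cos(t),
# y = a*sin(t) over the first quadrant (where f > 0 and finite).
k, a = 1.0, 1.0
best = min((k / (a*math.cos(t) * (a*math.sin(t))**2), t)
           for t in [i*math.pi/2/10000 for i in range(1, 10000)])
x, y = a*math.cos(best[1]), a*math.sin(best[1])
# x and y approach a/sqrt(3) and sqrt(2/3)*a, as the multipliers predict
```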
6.3 Penalty function methods
A more direct approach to dealing with constraints is via the use of penalty functions – we simply add penalties to the objective function when constraints are violated. There are a number of ways of doing this, none of which is perfect:-
FIXED PENALTIES
    add a (very) large number to the objective if any constraint is broken
    add a (very) large number for each broken constraint
VARYING PENALTIES
    a function of the degree of constraint violation – scales the penalties by the constraint violation
    a function of how long we have been searching – start with low penalties and gradually make them more severe, so that an essentially unconstrained search becomes a fully constrained one
All these are taken to be exterior penalties, ie they only apply to broken constraints – we can also use interior penalties which come into effect as the search nears a constraint and then gradually remove these as we progress so as to 'warn' the search about nearby problems.
Sketch Penalty Types:
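A minimal Python sketch of an exterior quadratic penalty with an increasing weight (the toy problem and all names are mine, not from the notes):

```python
# Exterior quadratic penalty: minimise x^2 subject to x >= 1 (ie g = 1 - x <= 0)
# by penalising violations and increasing the penalty weight r each cycle.
def penalised(x, r):
    violation = max(0.0, 1.0 - x)        # amount by which x >= 1 is broken
    return x*x + r * violation**2

def minimise_1d(F, a=-5.0, b=5.0, n=200001):
    return min((F(a + i*(b - a)/(n - 1)), a + i*(b - a)/(n - 1))
               for i in range(n))[1]

for r in (1.0, 10.0, 100.0, 1000.0):
    x = minimise_1d(lambda x: penalised(x, r))
    # the penalised minimum x = r/(1 + r) approaches the constrained
    # optimum x = 1 from the infeasible side as r grows
```

This shows the typical exterior penalty behaviour: the solution approaches the constraint boundary from outside and only reaches it in the limit of a severe penalty.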
6.4 Combined Lagrange and penalty function method
It is possible to combine the Lagrange scheme with a penalty approach to overcome some of the difficulties of pure Lagrange methods. This is sometimes called the Augmented Lagrange Multiplier method,
ie minimize f(x) subject to h_j(x) = 0, j = 1, 2, ..., p
    L(x, λ) = f(x) + Σ_{j=1..p} λ_j h_j(x)    is the Lagrangian.
We augment this with an exterior penalty:
[Figure: sketches of penalty types – step, interior and exterior penalties added to the objective function (OF) against x.]
    A(x, λ, r_k) = f(x) + Σ_{j=1..p} λ_j h_j(x) + r_k Σ_{j=1..p} h_j^2(x)
It now turns out that minimizing A solves the original problem if we have the correct λ for any r_k. However we can apply an iterative scheme that will allow the λ_j and r_k to converge on a solution, provided r_{k+1} ≥ r_k, and we use
    λ_j^(k+1) = λ_j^(k) + 2 r_k h_j(x*)
ie the new λ's are added to by the (scaled) amount of violation of the constraints at the previous minimum of A.
This approach can also be extended to inequality constraints by setting up as follows: min f(x) subject to h_j(x) = 0, j = 1, ..., p and g_i(x) ≤ 0, i = 1, ..., m
    A(x, λ, r_k) = f(x) + Σ_{j=1..p} λ_j h_j(x) + Σ_{i=1..m} λ_{p+i} ψ_i + r_k [Σ_{j=1..p} h_j^2(x) + Σ_{i=1..m} ψ_i^2]
where ψ_i = max(g_i(x), -λ_{p+i}/(2 r_k)).
6.5 Sequential quadratic programming method (SQP)
The use of sophisticated Lagrangian processes is now at its most complex and powerful in the class of methods known as SQP – these typically use Quasi-Newton methods to solve a series of sub-problems. They are the most powerful methods available for local minimization of constrained smooth problems. Academic codes are available from the web. They are less good for non-smooth functions; also they are local methods and so cannot find the best basin of attraction to search.
6.6 (Chromosome) repair
Repair is the process of dealing with a constrained optimization problem by substituting feasible designs
whenever infeasible ones occur during search. To do this a repair process is invoked if any constraint is
violated to find the nearest feasible design. Here nearness is usually in the Euclidean sense of design
variables. Having located such a design (perhaps by a local search where the degree of infeasibility is set as a
revised objective) the objective function of the feasible design is used instead of that at the infeasible point and
also (optionally) the corrected design vector.
Replacing the design vector absorbs most information but can cause problems with the search engine. This
approach is most favoured in evolutionary or other zeroth order methods where gradients are not used at all.
7 META-MODELS + RSM
So far we have considered optimizers working with results coming from the evaluations of design & constraint
functions that have been presumed to be directly coupled to search codes. These codes then build up a
‘picture’ of how the function is changing with changes in the design and seek improvements. Their internal
models (we will call them meta-models to distinguish from the actual user supplied design models) are implicit in
their working.
We next consider schemes where the building & use of the meta model is explicit and directly controlled by the
user.
At its simplest this consists of running a few designs, collecting the results and curve fitting to these. Then the
curve fits can be used for design search. This would be a natural approach for working with data from previous
designs or from experiments or field trials – it can also be used with computer analysis codes.
We first plan where to run the code to generate data. This can aim to build either a local or a global model
depending on the range of the design variables. We use formal DoE (Design of Experiment) methods for this
(cf Taguchi). Having run the design points, often in parallel, we curve fit. Here again we decide if we need a
local (simple) fit or a global (complex) shape & also if we need to regress (discuss noisy data). Curve fitting can
be fast for simpler models or very slow for large accurate ones.
We call the curve fit a Response Surface Model (RSM) or meta model. Examples include polynomial regression, radial basis functions, kriging and neural nets.
Having built a model we check its accuracy with test data (separate) or cross validation. We then use it to
search for a better design. Having found new candidate designs we run the full computer code to check if they
are good. If so we might stop. More usually we add these to the curve fit & iterate – updating until we run out of effort or we get convergence etc.
To summarise: the basic steps are
1) plan an experiment to ‘sample’ the design space
2) run codes & build ‘data-base’ of results (possibly in parallel)
3) choose & apply a curve fit, with or without regression
3a) refine curve fit by some tuning process
4) predict new ‘interesting’ design points by searching the meta-model
5) run codes on new point(s) & update data-base (again possibly in parallel)
6) check the results from update points against predictions & then either stop or move back to step 3)
Experience shows that for model building it can often take 10n initial designs to build a reliable global model
where n is the number of variables. There is also a trade between the cost of building a meta-model and the
usefulness of its predictions.
7.1 Exploitation versus exploration
When constructing and using RSM’s thought must be given to the balance of effort used between exploiting and
exploring the problem. Exploration is the placing of new calculation points in regions so far unsampled – if the
problem under study has more than a few dimensions there will be many areas where fresh sampling may
reveal new trends. Equally if one does not exploit the available information by closing in on promising areas the
search may simply end up as a random walk. This balance between exploring new areas and exploiting known
results can be illustrated by considering using a few downhill searches starting from the most promising results
from a small random DoE. If the DoE is too small then the best areas for search may be missed. Conversely if
the downhill search is limited in scope it may not effectively reach the best design in a local region. One set of
methods that explicitly deal with this dichotomy are the so called probability of improvement methods that use
measures of uncertainty in the predictions generated by the RSM.
7.2 Local trust region search
A very simple approach is to evaluate a small local experiment and then shift and shrink it until a certain effort is used up.
1. choose initial area to search
2. sprinkle in 9 points using an LP DoE
3. curve fit with quadratic regression polynomial†
4. search within area over RSM to get new candidate design
5. shift search region centre to new candidate point
6. shrink search region by say 10%
7. replace oldest design point with new result
8. go to step three unless run out of time
† to solve the regression we use SVD to get a least squares solution to the over constrained non square matrix equation A.s = y, where y are the function values, A are the design variable values and their powers and s are the polynomial coefficients. So if the SVD of A is U.w.V' then it may be shown that {s} = [V.diag(1/w_j).U']{y}; see the next section and also Matlab for simple examples of SVD.
This simple scheme has a number of faults:
    it has no way of expanding the trust region if the data suggests it should be – this means it may be a) slow and b) fail to find a local minimum of the function.
    the point being replaced, the oldest, may not be the most sensible one to discard – what about discarding the worst point, for example?
7.3 Singular Value Decomposition
To solve a general regression problem we can use SVD to get a least squares solution to the relevant over constrained non square matrix equation. This is possible because any matrix with more rows than columns (and also any square matrix) can be decomposed into three matrices as follows:
    A = U . diag(w_j) . V',
where the prime indicates the transpose. The properties of these matrices are such that U'.U = V'.V = I. That is, U is a column orthogonal matrix, V is a square orthogonal matrix and diag(w_j) is a diagonal matrix of the singular values. The inverse of A, or at least a least squares approximation to it if there are more rows than columns, may then be written as
    A^{-1} = V . diag(1/w_j) . U'.
This allows one to solve regression problems by setting up the problem as A.s = y, so that the least squares solution becomes s = V . diag(1/w_j) . U' . y.
So if for example there are four data points given at x = -1, 0, 1, 2 with function values of y = 1, 0, 1, 2 at these x values, and the aim is to find the coefficients of the fitting parabola a, b and c, we set up the matrix equation
    [ 1  -1   1 ]           [ 1 ]
    [ 0   0   1 ]  [ a ]    [ 0 ]
    [ 1   1   1 ]  [ b ] =  [ 1 ]
    [ 4   2   1 ]  [ c ]    [ 2 ]
Then using the SVD of the non-square matrix we get
    [ a ]   [  0.25  -0.25  -0.25   0.25 ]  [ 1 ]
    [ b ] = [ -0.55   0.15   0.35   0.05 ]  [ 0 ]
    [ c ]   [  0.15   0.55   0.45  -0.15 ]  [ 1 ]
                                            [ 2 ]
so that a = 0.5, b = -0.1 and c = 0.3, and the fitting parabola is y = 0.5x^2 - 0.1x + 0.3.
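These coefficients are easy to reproduce. In practice an SVD routine (eg Matlab's svd, or numpy.linalg.lstsq) would be used; as a self-contained check, this pure-Python sketch (names mine) solves the equivalent 3x3 normal equations instead, which give the same least squares fit for this small, well conditioned example:

```python
# Fit y = a*x^2 + b*x + c to x = -1, 0, 1, 2, y = 1, 0, 1, 2 in the least
# squares sense via the normal equations (A'A) s = A'y.
xs, ys = [-1.0, 0.0, 1.0, 2.0], [1.0, 0.0, 1.0, 2.0]
A = [[x*x, x, 1.0] for x in xs]

AtA = [[sum(A[k][i]*A[k][j] for k in range(4)) for j in range(3)]
       for i in range(3)]
Aty = [sum(A[k][i]*ys[k] for k in range(4)) for i in range(3)]

# Gauss-Jordan elimination with partial pivoting on the 3x3 system
M = [row[:] + [rhs] for row, rhs in zip(AtA, Aty)]
for col in range(3):
    piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
    M[col], M[piv] = M[piv], M[col]
    for r in range(3):
        if r != col:
            fac = M[r][col] / M[col][col]
            M[r] = [mr - fac*mc for mr, mc in zip(M[r], M[col])]
a, b, c = (M[i][3] / M[i][i] for i in range(3))
# a = 0.5, b = -0.1, c = 0.3 as in the worked example
```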
7.4 Global RSM search
1) Here we first use 100 points in an LP array to sample the design space.
2) Then we construct a krig (stochastic or Gaussian process) RSM which has hyper-parameters which we tune.
3) We then search for peaks and return 10 likely locations.
4) We add these to the original 100 pts to get 110 and we rebuild and retune the krig.
5) We then return to step 3 and repeat 3 times, ending with 130 points and the final model.
Points to note:
    the use of a large initial DoE is warranted here because of the multi-modality of the problem (cf the 20 points we might otherwise use).
    the updates are added in groups of 10 because we wish to improve the model globally and not just in one location; also krig training is costly.
    the final surface is reasonable but still far from exact.
7.5 Meta Heuristics
It is clear from the two previous searches that what we have done is combine components such as DoE sampling, RSM building and various searches to build a composite or 'meta-search'. It is of course possible to build more and bigger complexities into such approaches and this leads to a whole family of 'meta heuristics'.
By way of example we can consider combining various gradient descent schemes and then observing which works best and rewarding this with more of our finite budget of compute resource. The development of such schemes is an art, and also one must bear in mind that the 'no free lunch' theorem shows that, averaged over all possible problems, all searches are as good or bad as each other – so unless our efforts are based on tuning a method to the current task they will be futile – moreover there will be a trade between performance on a specialised task and general applicability.
7.6 Visualization
Visualization of results can be very important, particularly in unfamiliar problems and where there are more than two or three variables. Parallel axis and Hierarchical Axis Technique (HAT) plots can be very helpful as can polar graphing.
[Figure: contour plot of CD from a kriging RSM built on a 200-point Latin hypercube DoE, with constraint contours overlaid and fast and slow regions marked.]
[Figure: parallel axis plot over wing area (m2), aspect ratio, kink position, sweep angle (degrees), inboard and outboard taper ratios, root/kink/tip t/c, tip washout (degrees), kink washout fraction, wing weight (N), wing volume (m3), pitch up margin, U/C bay length (m), D/q (m2) and cost (£m), comparing the initial, low drag, DoE point 157 and low cost designs.]
8 MULTI-OBJECTIVE OPTIMIZATION
So far we have focussed on problems with a single goal or objective function. This is rarely how real design
problems occur, although it is quite common, even in industry, to treat them this way. In reality most real design
problems involve trading between multiple and conflicting goals. We therefore next turn to ways of tackling such
problems.
Perhaps the simplest approach (and that most commonly used in industry) is to set all goals except one in the form of constraints, ie instead of aiming for low weight or stress we set upper limits on these goals and then ensure that our designs meet them. The difficulty with this approach is deciding realistic but demanding targets: if they are too severe we may not be able to satisfy them at all; if too loose they may not impact the design at all.
The next most simple way of proceeding is to use an aggregate or combined objective. Typically we add all our
goals together with some suitable weighting functions and minimize this. This approach is a mimic of the
function of money – money is society’s way of allowing completely different things to be balanced against each
other (the cost of a holiday v a new car for example). It is the function of markets to establish the prices of items
and hence the weighting between them. Ideally the best approach to balancing competing goals for a business engaged in design is to reduce all decisions to their impact on the company's profits. Unfortunately this
calculation is almost never possible so some surrogate is used. This may be completely artificial or it may be
some physical quantity such as SFC (aero engine makers often use SFC).
It should be clear that if we consider two goals (say f1(x) and f2(x)) then, depending on how we weight them,
    f(x) = A f1(x) + B f2(x),
our final optimum will vary. For example consider
    min f1(x) = x^2
    min f2(x) = x^2 - 4x
then f(x) = A x^2 + B(x^2 - 4x) = (A + B)x^2 - 4Bx.
Now we only really need one weight here, c = B/A, so we get
    f(x)/A = (1 + c)x^2 - 4cx
    f'(x)/A = 2(1 + c)x - 4c = 0 when x = 2c/(1 + c)
So for each value of c we get a different solution and different values of f1 and f2:
The best designs in our search space are said to be the dominant ones and these are defined formally by the
dominance test: x1 dominates x2 if 1) solution x1 is no worse than x2 in all objectives and 2) solution x1 is strictly
better than x2 in at least one objective. Given a set of solutions, the non-dominated solution set is a set of all the
solutions that are not dominated by any member of the solution set. The non-dominated set of the entire
feasible decision space is called the Pareto-optimal set. The boundary defined by the set of all points mapped
from the Pareto optimal set is called the Pareto-optimal front, or Pareto Front for short. Points lying on the Pareto Front are said to have dominance rank one. If these are removed from the data and a second Pareto Front established, those designs are said to have dominance rank two, and so on until all points have been classified by rank. The quickest way of finding this set when there are only two objectives is to simply plot the points out and mark off the points that lie on the best side of the data (lower left for a dual minimization problem).
Rank can be effectively used when constructing multi-objective search engines, the best known of which is the
Non-dominated Sorting Genetic Algorithm (NSGA).
When we have more than two objectives and possibly very many solutions we need an efficient algorithm to
establish the Pareto Front. A simple way to do this is the method proposed by Mishra and Harit:
1. Sort all the solutions (P1...PN) in decreasing order of their first objective function (F1) and create a sorted
list (O). If any solutions have the same value for the first objective then sort on the second objective to
order these designs, similarly if the first two are equal use the third to sort and so on.
2. Initialize a set S1 and add first element of the sorted list O to S1.
3. For every solution Oi,i≠1 of list O, compare solution Oi with the solutions of S1:
a. If any element of set S1 dominates Oi, delete Oi from the list and place in the set of dominated
designs;
b. If Oi dominates any solution of the set S1, delete that solution from S1 and place in the set of
dominated designs;
c. If Oi is non-dominated by set S1, then update set S1 = S1 U Oi;
    c      x*     f1(x*)    f2(x*)
    0      0      0         0
    1      1      1         -3
    2      4/3    1 7/9     -3 5/9
    1/2    2/3    4/9       -2 2/9
[Figure: sketch of f1 v f2 showing the good designs lying on the PARETO FRONT.]
d. If set S1 becomes empty add Oi to S1.
4. Print non-dominated set S1.
5. Repeat process with dominated designs to find next rank of designs to create sets S2, S3 and so on.
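The sorted-list filter of steps 1–3 can be written very compactly. A minimal Python sketch (function names are mine), applied to the ten spreadsheet points discussed next, for minimisation of both objectives:

```python
# Non-dominated filtering via a list sorted on f1 (ties broken on f2).
def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q)) and
            any(a < b for a, b in zip(p, q)))

def pareto_front(points):
    O = sorted(points)                  # step 1: sort on f1, then f2
    S = [O[0]]                          # step 2
    for p in O[1:]:                     # step 3
        if any(dominates(s, p) for s in S):
            continue                    # 3a: p is dominated, discard
        S = [s for s in S if not dominates(p, s)]   # 3b: p removes members
        S.append(p)                     # 3c (this also covers 3d)
    return S

# ten samples with f1 = x^2, f2 = x^2 - 2x at x = -0.75, -0.5, ..., 1.5
pts = [(x*x, x*x - 2*x) for x in [-0.75 + 0.25*i for i in range(10)]]
front = pareto_front(pts)               # five points, those with x = 0 ... 1
```

Repeating the call on the dominated remainder yields the rank two, three and four sets, exactly as in step 5.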
The following page shows this process as part of a spreadsheet, starting with 10 initial sample points of which
five lie on the Pareto front, two in the second rank, two in the third and one in the fourth.
[Spreadsheet example: ten samples at x = -0.75 to 1.5 in steps of 0.25 with f1 = x^2 and f2 = x^2 - 2x. Sorting on f1 (ties on f2) and applying the filter above gives the rank one set at x = 0, 0.25, 0.5, 0.75 and 1; rank two at x = -0.25 and 1.25; rank three at x = -0.5 and 1.5; and rank four at x = -0.75, as shown in the accompanying f1 v f2 chart.]
This process is used recursively by removing the dominant set from the data to establish the lower ranking sets.
It also works extremely quickly for problems with two objectives, simply requiring that data be kept in sorted
order of the first objective as the dominant solutions are identified.
8.1 Methods for combining goal functions
It will be clear from considering Pareto fronts and simple weighted sums of goals that deciding how to combine
goals will define the final designs selected. Before looking at more advanced schemes for finding the front we
briefly explore how a design team might decide on a combined objective, either to reduce the problem to a
single goal or to allow selections to be made from those designs that are found to lie on the Pareto front. All
such methods attempt to formalise the process of assigning importance to the goals under review.
a) simple voting schemes – each design team member ranks the goals, the goals are then given points from 1 (lowest rank) to n (highest rank in n goals), and then these are summed across the team to result in a weighting scheme. Before application all goals are divided by the designer's ideal value (or the ideal is subtracted from the goal) to allow for the units in use. This simply ensures that all voices are heard.
b) The eigenvector method. Each pair of goals is ranked on a matrix by being given a preference ratio, ie if
goal i is three times more important than goal j we set 3ijp . Then say goal j is twice as important
as goal k we set 2jkp etc. To be consistent we should of course say that 6ikp but in fact the
method does not require this. In any case we then form all the p values into the matrix P and seek W so
that wPw max , ie the eigenvectors of p are found.
We then take the eigenvector with the largest eigen value and use this as our weighting scheme.
If we have
121
61
2131
631
p
We get w=
111.0
222.0
667.0
The largest eigen value should equal the number of goals if the p’s are consistent as here (we get a value
of 3).
Say however we were not consistent and used
Design Search and Optimisation Course Notes – January 2020
47
121
51
2131
631
p
Then we get a largest eigenvalue of 3.004 and the weight vector becomes w = (0.648, 0.230, 0.122).
That is we decrease the importance of goal 1 and increase that of goals 2 and 3.
This scheme is simple and easy to use for up to around 10 goals, and is quite useful when there are more than four goals, where it is difficult to assign numerical values to the weights in a consistent fashion. It is still the case, however, that the aggregate goal is a simple linear sum of the individual functions.
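The consistent example above can be checked with a short power-iteration sketch (numpy.linalg.eig would serve equally well; this is illustrative code, not the notes' own):

```python
# Eigenvector (AHP-style) weighting: power iteration finds the principal
# eigenvector of the preference matrix P, normalised so the weights sum to 1.
def principal_eigenvector(P, iters=200):
    n = len(P)
    w = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        Pw = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Pw)          # equals the eigenvalue once w is converged
        w = [v / lam for v in Pw]
    return lam, w

if __name__ == "__main__":
    P = [[1, 3, 6], [1/3, 1, 2], [1/6, 1/2, 1]]   # consistent matrix from the notes
    lam, w = principal_eigenvector(P)
    print(round(lam, 3), [round(v, 3) for v in w])   # 3.0 [0.667, 0.222, 0.111]
```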
One way of combining goals that is more elaborate is via the use of fuzzy logic. Thus we define a series of linguistic terms that describe our goal and map these to a score:
Thus a bad value scores nothing and a good value scores 1 while those in the indifferent range have
intermediate scores. If we do this to all functions the resulting scores can then be combined by adding
(essentially an average) or multiplication (a geometric average). We then maximize the combined function.
Various shapes for the so called membership functions can be used but there seems little to be gained from going beyond the linear form sketched here.
When used, these membership functions essentially allow non-linear combinations of goal functions, which clearly permits more complex combined goals – however, if taken too far they can obscure the overall problem!
Sketch: a linear membership function – the score rises from 0 in the BAD region, through the INDIFFERENT range, to 1 in the GOOD region.
Design Search and Optimisation Course Notes – January 2020
48
Below is an example of fuzzy logic for the functions f1=1/x and f2=x^2, where both functions are considered unacceptable when above 2 and acceptable when below ½. We then use a linear scaling of the two functions between these limits, i.e., the membership is proportional to the function, equal to zero when the function is 2 and unity when it is equal to ½. Thus the equation for the varying part of each membership function is memb = 4/3 − 2f/3 (i.e., when f is 2, memb is 0 and when f is ½, memb is 1, with a linear variation – the two are the same here simply because the two sets of limits on the functions are the same).
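This example can be sketched numerically; the grid search and its step size below are arbitrary choices, not part of the notes:

```python
# Linear membership: 0 at f = 2 (unacceptable), 1 at f = 1/2 (acceptable),
# i.e. memb = 4/3 - 2f/3 clipped to [0, 1]; the two scores are combined by
# multiplication and the combined score maximised by a simple grid search.
def memb(f):
    return min(1.0, max(0.0, 4.0 / 3.0 - 2.0 * f / 3.0))

def combined_score(x):
    f1 = 1.0 / x   # first goal, f1(x) = 1/x
    f2 = x * x     # second goal, f2(x) = x^2
    return memb(f1) * memb(f2)   # multiplicative (geometric-style) combination

if __name__ == "__main__":
    xs = [0.5 + 0.001 * i for i in range(1, 1000)]   # scan x in (0.5, 1.5)
    best = max(xs, key=combined_score)
    print(round(best, 3), round(combined_score(best), 3))
```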
Sketches: the membership of f1 rises linearly from 0 at xa to 1 at xb (and similarly for f2 between xc and xd); the combined score then forms the overall goal.
8.2 Methods for finding Pareto fronts

Sometimes we do not wish to combine goals without first examining the Pareto front itself. Thus we need to construct the front. In doing so we have three aims:
1) the designs we study truly lie on the front, i.e., they are well optimised;
2) the front has many designs that span its full extent, i.e., it is 'well' populated;
3) the points are evenly spread on the front, i.e., we have a smooth range of goals.
This is in fact an optimization task in its own right and may be tackled in a variety of ways.
A Perhaps the simplest is to construct a family of different combined goals with various weighting schemes and then optimize each of these (including dealing with each goal on its own). Although this does not tackle point 3 above, it focuses on 1 and gives as many points as desired for 2. It is, however, expensive and known to fail to populate the front evenly, especially if the front is concave (using a linear sum of goals is equivalent to finding the intersection of a target line and the front, and such targets only exist for convex fronts).
B The next best scheme is to use an optimizer to explore the design space, placing any new non-dominated points in an archive (and weeding out any dominated ones). All new design points are then given a goal value based on how much they improve the archive – i.e., how dominant they are. This means that the objective function is non-stationary but, provided our search is tolerant of this, the approach works fine (an evolution strategy works quite well here).
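Scheme (B)'s archive bookkeeping can be sketched as follows (the dominance scoring shown is one simple choice among many, and the point data are invented):

```python
# Archive of non-dominated points: each new point is scored by how much it
# improves the archive; dominated archive members are weeded out on insertion.
# Both goals are minimised.
def dominates(p, q):
    """True if p is at least as good as q in every goal and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def update_archive(archive, new_point):
    """Insert new_point if non-dominated; return (new archive, score)."""
    if any(dominates(a, new_point) for a in archive):
        return archive, 0                       # dominated newcomer scores nothing
    score = sum(dominates(new_point, a) for a in archive)
    archive = [a for a in archive if not dominates(new_point, a)]
    archive.append(new_point)
    return archive, 1 + score                   # rewarded for improving the front

if __name__ == "__main__":
    archive = []
    for pt in [(3.0, 3.0), (1.0, 4.0), (2.0, 2.0), (5.0, 5.0)]:
        archive, score = update_archive(archive, pt)
        print(pt, score, sorted(archive))
```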
Plot: f1(x) = 1/x and f2(x) = x^2 against the variable x, together with their memberships memb1 and memb2, the combined score, and the upper and lower limits on f(x).
C Use a multi-objective population based search (such as a GA). Here we aim to advance the whole front
in one go so the points in the population are scored against each other and those that dominate most
score most – this is capable of meeting all three goals if some pressure to spread out points is included.
Its weakest aspect is probably in finding the extreme ends of the front but these can be found from single
objective searches directly on each goal.
It is also possible to use response surface schemes to help reduce run times when finding Pareto fronts.
9 ROBUSTNESS IN OPTIMIZATION AND UNCERTAINTY QUANTIFICATION (UQ)
Increasingly designers must also contend with the fact that their designs and analyses are subject to a range of
uncertainties and that designs optimized purely for nominal performance may suffer from significantly degraded
performance when subject to such variations. These uncertainties stem from a range of sources:
• limits on accuracy during manufacture,
• variations in operating conditions,
• wear and degradation in service,
• limited accuracy in the physics of the computational models invoked,
• limited convergence in any iterative numerical scheme used in computations,
• round-off and discretization errors.
It is therefore of increasing importance to study designs from a stochastic perspective and to formally quantify
the impact of such uncertainties on predicted design performance rather than relying on ad hoc factors of safety
and tolerance settings. One way of doing this is to invoke the formalism of Robust Design Optimization (RDO),
which leads naturally to a multi-objective problem where designers seek to improve the mean performance of a
design while guaranteeing that fall-off away from mean conditions is strictly controlled.
RDO can, however, be carried out in a range of ways and can additionally make very effective use of surrogate
approaches to model building. The following figures illustrate the basic idea of robustness.
Gaussian Noise
Design 1 is more sensitive than 2 even though f(x1) is better than f(x2)
Uniform Noise
This kind of lack of robustness in design variables (design 1 above) is common when dealing with constrained problems – designers typically stay well away from critical stress limits where they can, for example to avoid unexpected structural failures.
More controlled ways of working all involve trying to simulate uncertainty in the design calculations being used –
so called Uncertainty Quantification or UQ. A prerequisite is to know something of the real uncertainties
anticipated. This requires DATA.
If however we know something of the uncertainties inherent in the design, manufacture and operation we can
attempt to account for this.
N.b., the mean or expected value of a function is given by integrating the function multiplied by the probability density function (PDF, shown in blue in the figures above) from minus infinity to plus infinity, i.e., μ_f = E[f(x)] = ∫ f(x)·PDF(x) dx, with the integral taken from −∞ to +∞. An approximation to the mean may often be calculated
by averaging over an ensemble of appropriately spaced values of the function, so for example a typical way of approximating the mean is by assuming μ_f ≈ (1/n) Σi=1..n f(xi) as n → ∞, the so called Monte Carlo
method. Note also that for a uniform distribution between lower limit a and upper limit b, the PDF has height 1/(b−a), mean (a+b)/2 and variance (b−a)²/12.
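These uniform-distribution facts are easy to check by Monte Carlo; the limits, sample size and seed below are arbitrary choices:

```python
# Monte Carlo check that a uniform distribution on (a, b) has mean (a+b)/2
# and variance (b-a)^2/12. A fixed seed keeps the sketch reproducible.
import random

def mc_mean_var(a, b, n=200_000, seed=1):
    rng = random.Random(seed)
    xs = [rng.uniform(a, b) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

if __name__ == "__main__":
    mean, var = mc_mean_var(2.0, 5.0)
    print(round(mean, 2), round(var, 2))   # analytic values: 3.5 and 0.75
```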
9.1 Monte-Carlo method

The most obvious, simple and direct scheme for UQ is the Monte Carlo approach. We simply generate a series of scenarios using suitably biased random numbers and run our design calculations for each before working with mean or worst case designs – unfortunately generating such means or worst cases requires hundreds of simulations, which is usually far too expensive. Costs can, however, be reduced by adopting sampling plans that are not purely random. Schemes such as LPτ sampling can give much better convergence than purely random ones. It is also possible to design a sampling sequence specifically to match the problem being studied so as to accelerate the convergence of statistics – the so called quasi-Monte Carlo approach.
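The benefit of low-discrepancy sampling can be illustrated with the base-2 van der Corput sequence standing in for LPτ (the LPτ generator itself is not reproduced here; the test integrand and sample size are arbitrary):

```python
# Pseudo-random versus low-discrepancy sampling for estimating the mean of
# f(x) = x^2 over [0, 1] (true value 1/3).
import random

def van_der_corput(n, base=2):
    """First n points of the base-2 van der Corput low-discrepancy sequence."""
    seq = []
    for i in range(n):
        x, denom, k = 0.0, 1.0, i
        while k:                     # reverse the digits of i about the point
            denom *= base
            k, rem = divmod(k, base)
            x += rem / denom
        seq.append(x)
    return seq

def mean_estimate(xs):
    return sum(x * x for x in xs) / len(xs)

if __name__ == "__main__":
    n = 1024
    rng = random.Random(0)
    err_random = abs(mean_estimate([rng.random() for _ in range(n)]) - 1 / 3)
    err_vdc = abs(mean_estimate(van_der_corput(n)) - 1 / 3)
    print(err_random, err_vdc)   # the low-discrepancy error is typically much smaller
```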
9.2 Design of Experiment methods

The next approach is to replace full Monte Carlo sampling with limited size DoE variations around each design point to gain some idea of local sensitivities. Typically one uses between five and 30 variations at each design to try and characterize the issues. Taguchi arrays are typical of this kind of approach and have been widely used in industry.
9.3 Noisy Phenotype

A third approach is the so called 'noisy phenotype'. In this case a standard design search is carried out but at each iteration noise is added to the design variables. The resulting perturbed design results are then used to characterise the design. This makes all derived quantities non-stationary during the search, so a method tolerant of this must be used. Sometimes the nominal and perturbed designs are both evaluated and the worse of the two used as the characterising design. When used with a GA, for example, this tends to mean only robust designs survive the evolutionary process.
9.4 Response surface approach

A method increasingly popular in industry is to run a medium sized DoE and then build a global response surface through the resulting data. This response surface is then used for large scale Monte Carlo sampling to build models of robustness. The main weakness of this scheme is that the more unusual designs and events that lead to extremes of behaviour may not be captured by this process. Therefore, as design decisions focus in on promising areas, update points should be run and the RSM rebuilt and re-explored so that the surrogate model is well set up where it needs to be.
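A toy version of this workflow, using Lagrange interpolation as a stand-in for a proper response surface such as Kriging (the DoE points, noise range and test function are invented for illustration):

```python
# Small DoE on a (pretend expensive) function f(x) = x + 1/x, a cheap
# quadratic surrogate through three DoE points, then large-scale Monte Carlo
# sampling on the surrogate to estimate the mean under uniform noise.
import random

def f(x):                      # stand-in for an expensive simulation
    return x + 1.0 / x

def quadratic_surrogate(xs):
    ys = [f(x) for x in xs]    # the only three "expensive" calls
    def s(x):
        total = 0.0
        for i in range(3):     # Lagrange basis polynomials
            term = ys[i]
            for j in range(3):
                if i != j:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total
    return s

if __name__ == "__main__":
    s = quadratic_surrogate([0.6, 1.1, 1.6])   # DoE spanning the noise range
    rng = random.Random(0)
    # 100,000 cheap surrogate calls: uniform noise of width 1 about x = 1.1.
    samples = [s(rng.uniform(0.6, 1.6)) for _ in range(100_000)]
    print(round(sum(samples) / len(samples), 3))   # close to the true mean 2.0808
```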
N.b. as we have seen response surfaces can also be used to support optimization so these two approaches can
be used together for Robust Design Optimization (RDO) where quantities such as the mean performance of a
product and the variance in that performance are the goals being optimized. In such circumstances a number of
different ways can be used to assist the search with surrogates. The most obvious approach is to build
surrogates of surrogates, i.e., using surrogate models to speed up the process of estimating the mean and
variance of performance and then building a further set of surrogates to model how these (surrogate derived)
quantities change as design variables are altered, the aim, of course, being to find designs with good mean
performance and simultaneously low variance in that performance. Such nesting of surrogates may not be the
most efficient use of computing effort, however, since it does not make use of the fact that uncertainty behaviour
is highly likely to be correlated between designs that differ only modestly in their configurations. It does however
offer simplicity and separates the two activities of DSO and UQ.
A better approach may be to treat the design (control) and uncertainty (noise) variables in a single response
surface that could then be repeatedly sampled over the noise variables at any given set of control variables to
establish response statistics. Since design intent tends to map to large scale variations in the control variables and
significant consequent variations in performance, while uncertainty generally stems from small scale variations
in the noise variables and more limited changes in performance, this poses the problem of how best to sample
the available computer codes and to construct surrogates that maximize the effectiveness of the overall design
process. For example, many deterministic surrogate based optimization studies make use of only a few hundred
performance simulations in total, while it is quite common to use at least this many simulations to propagate
uncertainty for a single design configuration in assessing robustness. Should, therefore, the density of sampling
to build a combined surrogate be uniform over the combined control and noise space or should it favour the
noise space?
Other combinations of surrogates are possible, including using multiple levels of fidelity to address the issue of
correctly sampling uncertainty while not using excessive numbers of function calls. Additionally, in some cases
the robustness problem being tackled arises from the need to guard against an inability to guarantee that the
design variables set during product definition are achieved in practice. This can arise where manufacturing
variations impact on the chosen geometry. In such circumstances design variables play the roles of both control and noise variables and may be studied both for their overall impact on mean performance and also for the variation in
that performance.
Plots illustrating the convergence of Monte-Carlo sampling using random numbers (–), using LPτ pseudo
random numbers (--) and Krigs based on LPτ pseudo random numbers (-.) for the test functions of equations (1)
and (2) as sample sizes change, along with the actual values taken from the equations shown as horizontal
lines; left means, right standard deviations.
9.5 Stochastic Solvers

The most complex approach to robustness is to build so called 'stochastic solvers'. These are codes, like finite element packages, which instead of reading in deterministic problem statements for geometry and loadings can accept these specified in probabilistic form. They then directly compute probability measures for the response quantities of interest. Such methods are currently in their infancy but can be expected to become prevalent over the next 10-20 years.
In whatever form robustness is considered it invariably leads to a multi-objective design problem because the
designer will desire good performance for the nominal geometry AND robustness to likely variations. This tends
to lead to Pareto fronts with mean and standard deviation as axes. Thus multi-objective search tools should be
used for robust design, rather than simply examining robustness as an afterthought.
9.6 Robustness
Robustness may be described as insensitivity to change; thus a design is said to be robust if its performance is
relatively unaffected by any uncertainty in its design, manufacture or use. This characteristic is often in direct
conflict with optimal nominal performance – a design that has been heavily optimized to operate well when
perfectly made and operated in ideal circumstances may turn out to be very near to non-linearities in its
performance – at the very least, if truly optimal, any change in manufacture or use will, by definition, result in
degradation in performance.
Robust design methods all involve trying to simulate uncertainty in the design calculations being used, so as to
evaluate the impact of these uncertainties on the designer’s goals. Here we characterize the quality of any
design by its location on the Pareto front and the nature of other designs lying on that front. A prerequisite is, of
course, to know something of the actual uncertainties likely to be encountered in practice, ideally using real
world data. If, however, we just know something of the sensitivities inherent in the design, manufacture and
operation, we can attempt to account for this in DSO, even in the absence of actual variability measurements –
in general sensitive designs tend to be less robust than insensitive ones.
Construction of Pareto front from variance analysis. Dotted lines show variance; the spread of f(x) over the ranges at x1 and x2 marks one design as robust and the other as fragile.
Uniform Noise
Gaussian Noise
9.7 A Simple Example
Consider the performance of a manufacturing process to be characterised by the performance index f(x)=x+1/x
where x is a control variable set by the users in the range 0.75<x<2. If f(x) is to be as low as possible what is the
optimal setting of x? If x is subject to uniform random noise such that the probability density function of the noise
takes the form of a unit square centred at the nominal value, can we derive an expression for the mean value of the performance index and hence determine what value of x will give the lowest mean value? What percentage deterioration in nominal performance must be accepted when using this optimal setting?
First consider the PDF in use here – this is a rectangle centred at the design point of interest, of unit width and unit height (so that the area under the PDF is one). So to establish the mean value at any design point we have to integrate from 0.5 below that point to 0.5 above it, i.e., the mean of f(x) varies with x and is defined by

μ(x) = ∫ from x−0.5 to x+0.5 of (t + 1/t) dt = x + ln[(x + 0.5)/(x − 0.5)].
Which we can plot out as:
Locus of expected value and variance as x increases, with the Pareto front marked.
Notice that the mean performance is always worse (higher) than the nominal and, more importantly, its minimum is in a different place. The minimum of the nominal design curve is at (1, 2) while that of the mean curve is at (1.1180, 2.0805); the nominal performance at this noise-adjusted setting is f(1.1180) = 2.0125. Relative to that nominal figure the mean is some 3.4% worse when the correct, noise adjusted, design choice is made, or 4.3% worse (μ(1) = 2.0986) if no adjustment is made for the uncertainty in the system.
We can see this effect more clearly by plotting the Pareto front of standard deviation versus mean behaviour for
the system.
In this case there is a single design that dominates all other choices so there is no need to consider trade-offs
by comparing different points along the front. This is because there is a single minimum in the nominal and
mean performance curves.
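The numbers in this example are easily verified with a grid search (the grid resolution is an arbitrary choice):

```python
# Numerical check of the worked example: the nominal index f(x) = x + 1/x is
# minimised at x = 1, while the noise-averaged mean
# mu(x) = x + ln((x + 0.5)/(x - 0.5)) is minimised near x = 1.118.
import math

def f(x):
    return x + 1.0 / x

def mu(x):
    return x + math.log((x + 0.5) / (x - 0.5))

if __name__ == "__main__":
    xs = [0.75 + 0.0001 * i for i in range(12501)]   # grid over 0.75 <= x <= 2
    x_nom = min(xs, key=f)
    x_rob = min(xs, key=mu)
    print(round(x_nom, 3), round(f(x_nom), 4))       # nominal optimum
    print(round(x_rob, 3), round(mu(x_rob), 4), round(f(x_rob), 4))
```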
9.8 Two variables and two objectives
When we have more than one design variable the problem becomes more involved: consider a twin objective
problem made up from the Branin function

f1(x1, x2) = (x2 − 5.1x1²/(4π²) + 5x1/π − 6)² + 10(1 − 1/(8π))cos(x1) + 10,   (1)
augmented by a second function of the form

f2(x1, x2) = … ,   (2)

[the detailed algebraic form of equation (2), a combination of polynomial terms in x1 and x2 and cosine terms, scaled by 1/50000, has not survived transcription of these notes].
These two functions are illustrated as contour maps below. The Branin function has three equal minima at x* =
(-π , 12.275), (π , 2.275), (9.42478, 2.475), where f1(x*) = 0.3978 while equation (2) has a single minimum at x*
= (5.1116, 8.0054), where f2(x*) = 11.1484. Also shown on these two plots are the results of a series of multi-
objective searches which are discussed later. It turns out that these functions can be used to construct an RDO
test problem where exact results can be obtained for the uncertainty propagation, thus allowing a true Pareto
front of mean performance versus standard deviation in that performance to be constructed.
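Equation (1) can be checked directly at the three stated minima (a sketch, not part of the original notes):

```python
# The Branin function of equation (1), evaluated at its three stated minima,
# each of which should give approximately 0.3979.
import math

def branin(x1, x2):
    return ((x2 - 5.1 * x1**2 / (4 * math.pi**2) + 5 * x1 / math.pi - 6) ** 2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)

if __name__ == "__main__":
    for x1, x2 in [(-math.pi, 12.275), (math.pi, 2.275), (9.42478, 2.475)]:
        print(round(branin(x1, x2), 4))   # 0.3979 at each minimum
```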
Contour maps illustrating the test functions of equations (1) and (2), along with the results from exhaustive
searches based on direct use of equations (1) and (2) – solid line; and the results of multi-objective searches: +
– direct NSGA2 search, × – Kriging based NSGA2 search
Robustness almost invariably leads to a multi-objective design problem because the designer will desire good
performance for the nominal or mean design and robustness to likely variations. As already shown, this leads to
Pareto fronts, typically with mean and standard deviation as axes and it is rare for these to show just a single
dominating design. Thus multi-objective search tools should be used for robust design, rather than simply
examining robustness as an afterthought.
To illustrate these ideas consider the random process given by
f(x1, x2, a, b) = f1(x1, x2) + a·(…) + b·(…),   (3)

[equation (3) adds to the Branin function of equation (1) two further terms, scaled by the random variables a and b respectively; their detailed algebraic form, involving the same polynomial and cosine terms as equation (2), has not survived transcription]
where a and b are independent random variables uniformly distributed in the range ±1. It is relatively easy to
show that this function has a mean value given by equation (1) and a standard deviation given by equation (2),
i.e., as already illustrated in the figure above. Note that the function of equation (2) therefore shows how much
variation we expect in the design performance as the two design variables change – essentially a central area of
modest uncertainty which rises rapidly towards the edges of the design ranges, as often occurs in real
manufacturing processes or operational environments. If we carry out a robust design analysis of this problem
the two extremes of the Pareto front now represent a design with best mean performance at one end and the
smallest standard deviation at the other (the two ends of the solid line in the figures). In this case the best mean
performance can only be achieved at the expense of moving outside of the central area of low variance in
performance – again a not unrealistic outcome.
If the same LPτ pseudo random number sequences are used when sampling all locations in the design space
following equation (3) (i.e., with the mean and standard deviations changing as per equations (1) and (2)) it is
possible to plot out the errors in the estimated response statistics as contour maps, see figures below, which are
drawn for an ensemble size of five. Note that these errors are merely typical results and would differ for each
random number sequence used in their construction and for differing sample sizes.
Contour maps illustrating the errors (left – mean, right – standard deviation) in pseudo Monte-Carlo models
based on five LPτ samples of equation (3), along with the results of a multi-objective exhaustive direct search
with varying weights on equations (1) and (2).
The consequences of the errors inherent in quantifying uncertainty with limited sample sizes are seen when one
attempts to carry out robust design optimization. If uncertainties are not estimated accurately any search may
be misled and the resulting Pareto fronts lie away from the true results, thus leading designers to make poor
choices when deciding robustness trade-offs. Consequently we next consider the various ways in which
surrogates can be used to help support such work, first in uncertainty quantification and then to speed up design
search and optimization runs.
9.9 Using Surrogates to Support Uncertainty Quantification
Although low-discrepancy sequence sampling plans can often speed up the process of estimating uncertainty
statistics they still do not capitalize on all the information present in the sample data being used. In particular,
when used naively, no account is taken of the locations of the samples with respect to each other when deriving
the statistical moments. An alternative is to build a surrogate that relates the desired performance quantity of
interest to the noise parameters. Then the surrogate can be integrated in lieu of the original problem to calculate
the required moments, either in closed form or via very dense sampling across the cheap-to-evaluate surrogate.
In the example just introduced there are two noise parameters (a, b) and these impact linearly and additively on
the performance function f(x1, x2, a, b) of equation (3). So if we take sets of five LPτ samples and instead build a
Krig relating f(x1, x2, a, b) to a and b at each individual value of x1 and x2 we can establish a much more
accurate model of the uncertainties in the problem, see figures below. Now the errors in the response statistics
are between one and two orders of magnitude less than for the simple direct calculations, albeit that a separate
Krig has been built and tuned for every point evaluated in these plots; in this case as 51x51 sets of samples
have been used, this means 2601 Krigs have been constructed and tuned in total, each of which has then been
sampled 500 times to establish the mean and standard deviation values.
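The reason this works so well when noise enters linearly can be sketched with a least-squares linear surrogate standing in for the Krig (the design-point coefficients below are hypothetical stand-ins for the true terms of equation (3), and the notes use Kriging rather than the linear fit shown here):

```python
# At a fixed design point, f depends linearly on the noise variables a and b,
# so a linear surrogate fitted to five samples recovers the response
# statistics almost exactly: mean = c0, std = sqrt((c1^2 + c2^2)/3), since
# Var(a) = Var(b) = 1/3 for uniform noise on (-1, 1).
import math

M, G1, G2 = 4.0, 0.8, -0.3          # hypothetical design-point behaviour

def f(a, b):
    return M + G1 * a + G2 * b      # noise enters linearly, as in eq. (3)

def fit_linear(samples):
    """Least squares fit of f = c0 + c1*a + c2*b via the normal equations."""
    X = [[1.0, a, b] for a, b, _ in samples]
    y = [fa for _, _, fa in samples]
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)]
         for i in range(3)]
    r = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(3)]
    for i in range(3):               # Gauss-Jordan elimination
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        r[i] /= piv
        for j in range(3):
            if j != i and A[j][i]:
                factor = A[j][i]
                A[j] = [vj - factor * vi for vj, vi in zip(A[j], A[i])]
                r[j] -= factor * r[i]
    return r                         # [c0, c1, c2]

if __name__ == "__main__":
    # Five (a, b) samples standing in for a five-point LPt plan over +/-1.
    pts = [(-0.8, 0.4), (-0.4, -0.6), (0.0, 0.8), (0.4, -0.2), (0.8, 0.6)]
    c0, c1, c2 = fit_linear([(a, b, f(a, b)) for a, b in pts])
    mean = c0                                   # E[a] = E[b] = 0
    std = math.sqrt((c1**2 + c2**2) / 3.0)      # Var(a) = Var(b) = 1/3
    print(round(mean, 4), round(std, 4))
```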
Clearly such an approach can only be justified if the costs involved in building, tuning and sampling the Krigs
are substantially less than for evaluating the function itself1. This in turn depends on the complexity of the
relationship between the noise variables and the performance function being studied. If this relationship is
straightforward and involves relatively few variables the cost of Krig construction is low and the results can be
startlingly accurate. If, however, this is not the case and a significant number of function evaluations are needed
to characterize the relationship over many noise variables these costs may become unaffordable, even
compared to the cost of complex CFD or structural analysis. Alternative, faster modelling methods such as GPU
trained neural networks may then be the only affordable approach.
1 Here we tune each Krig by first running a global genetic algorithm or ant colony search of the log likelihood and then take the best result and improve it with a local gradient based search – each step in the searches requiring a solution of the dense matrix equations containing the sample data, (see Toal et al (23)).
Contour maps illustrating the errors (left – mean, right – standard deviation) in Krig models based on five LPτ
samples of equation (3), along with the results of a multi-objective exhaustive direct search with varying weights
on equations (1) and (2).
9.10 Robust Design Optimization with Basic Surrogates
We next examine the effects of these various ways of carrying out uncertainty quantification on multi-objective
searches to establish the Pareto front that trades mean performance against standard deviation in that
performance. To begin with, and to establish appropriate datum results, we show three search methods applied
to the analytical expressions for the mean and standard deviation defined by equations (1) and (2).
First we establish the true Pareto front using a series of single objective searches where the two functions are
simply added with varying weights (this produces the solid line in the previous figures). Here we use some
10,000 separate search runs (involving around one million function calls in total). This exhaustive search
guarantees the correct front in this case because it is non-concave throughout and so a simple sum produces a
continuous series of correct results. Clearly such an approach is not practical in real-world problems and is not
so reliable on more complex functions.
Having established the true Pareto fronts by exhaustive search we consider a number of more realistic and
affordable approaches, see the figure below. In each case this requires the evolution of the front using a multi-
objective search engine. Here our searches are all based on the NSGA2 paradigm. NSGA2 is a long
established global multi-objective search method that has a good track record of being able to establish high
quality Pareto fronts. Our first approach is a direct NSGA2 search of equations (1) and (2) (the + markers in the
figures) that uses 30 generations with a population size of 100 and therefore 3,000 evaluations of the equations.
NSGA2 is clearly capable of correctly recovering the Pareto front as would be expected given a suitable budget
and access to the analytically correct functions for the objectives.
Left, Pareto fronts found from exhaustive searches based on direct use of equations (1) and (2) – dashed line;
along with the results of multi-objective searches: + – direct NSGA2 search, × – Kriging based NSGA2 search.
Right, convergence metrics of direct NSGA2 and Kriging based NSGA2 searches.
Then we use a Kriging based NSGA2 search (the × markers in the figures). To do this an initial surrogate model
of the exact mean versus standard deviation space is established by selecting design variables according to a
30 point LPτ sequence, evaluating equations (1) and (2) at each combination. This is followed by five updates in
batches of 10 keeping no more than the most promising 50 designs for surrogate construction, i.e., a total of 80
evaluations of equations (1) and (2). Note that the response surface is not being used to help quantify
uncertainty (i.e., to estimate mean and standard deviation), rather it is used to model how the uncertainties
given by the exact expressions of equations (1) and (2) vary with design changes. The power of the Kriging
based approach in this search is immediately clear in that it is directly competitive with the direct NSGA2 search
that uses nearly 40 times as many evaluations of these equations.
Of course, in real problems one does not have the luxury of closed form equations for the mean and standard
deviation; instead some form of uncertainty quantification as set out in the previous sections must be employed.
So next we repeat our searches but using five-point LPτ sampling to establish the response statistics from
equation (3), again using an exhaustive search, NSGA2 directly or with surrogates at each design point and
NSGA2. The results of these searches in terms of the Pareto front trade-offs can be seen in the figure below.
Now the searches all trend towards an incorrect model of the Pareto front.
Pareto fronts found from exhaustive searches based on five point LPτ sampling of equation (3) – solid line; and
equations (1) and (2) – dashed line; along with the results of multi-objective searches on the five point sampling
combined with direct noise variabilities a and b: + – direct NSGA2 search, × – Kriging based NSGA2 search.
It might be concluded from this that there is little point in using such limited sample sizes during robust design
searches. However, one must recall that the purpose of robust design is not to build accurate response
prediction models per se – rather the aim is to choose values of the free design variables that give robust
behaviour in practice. Just because the limited sampling fails to yield a completely accurate model, this does not
mean that design values based on these models will fail to behave as desired, since the search process has
sampled a considerable quantity of useful data. To see this one needs to take designs from the Pareto fronts in
the previous figure and insert the relevant values of x into equations (1) and (2) to establish whether or not
these designs have any merit. The figure below shows the resulting Pareto fronts and it is seen that in fact the
results of the searches are useful in locating the correct Pareto optimal designs, despite the prediction errors in
the small sample sizes used for UQ.
Pareto fronts found from exhaustive searches based on five point LPτ sampling of equation (3) – solid line; and
equations (1) and (2) – dashed line; along with the results of multi-objective searches on the five point sampling
but evaluated using equations (1) and (2): + – direct NSGA2, × – Kriging based NSGA2 search.
Lastly, before moving on to more advanced techniques we add in the use of Krigs to help propagate uncertainty
with our two basic search methods (direct NSGA2 and surrogate assisted NSGA2), see figure below. In this
case the approach completely repairs the damage caused by limited sample size for the direct noise
variabilities. However, training the surrogate for each set of design variables adds considerably to the cost of working in this way. Essentially, if the uncertainty behaviour is benign and does not depend on too many noise variables, using surrogates for UQ may be worthwhile; otherwise more advanced methods will be needed, which we turn to next. Even when it does help, we need to recognize that in the third of these searches we are training surrogates on the behaviour of lower level surrogates, rather suggesting that more sophisticated methods should offer greater promise.
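To make the pattern concrete, the following short sketch (in Python rather than MATLAB, with a simple Gaussian-basis interpolant and a made-up noise response standing in for a tuned Krig and the real problem) shows the idea: fit a surrogate through a handful of expensive noise-variable samples, then resample it densely and essentially for free to estimate the mean and standard deviation.

```python
import numpy as np

# Hypothetical noise-to-performance response, a stand-in for equation (3)
def f(a):
    return np.sin(3.0 * a) + 0.5 * a**2

a_train = np.linspace(0.0, 1.0, 5)     # five "expensive" noise samples
y_train = f(a_train)

# Gaussian-basis interpolant as a toy stand-in for a tuned Krig
theta = 10.0
def corr(x1, x2):
    return np.exp(-theta * (x1[:, None] - x2[None, :])**2)

# Small nugget added purely for numerical conditioning
w = np.linalg.solve(corr(a_train, a_train) + 1e-10 * np.eye(5), y_train)
def surrogate(a):
    return corr(np.atleast_1d(a), a_train) @ w

# Dense, cheap resampling of the surrogate for the statistics
rng = np.random.default_rng(0)
a_dense = rng.uniform(0.0, 1.0, 500)
mean_est, sd_est = surrogate(a_dense).mean(), surrogate(a_dense).std()
```

The expensive function is evaluated only five times; the 500-point statistics come entirely from the surrogate.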
Pareto fronts found from exhaustive searches based on Krigs built from five point LPτ sampling of equation (3) –
solid line; and equations (1) and (2) – dashed line; along with the results of multi-objective searches on Krigs
built from the five point sampling combined with direct noise variabilities a and b: + – direct NSGA2 search, × –
Kriging based NSGA2 search.
9.11 Robust Design Optimization with Advanced Surrogates
We finally introduce two more advanced ways of dealing with the combined problem of uncertainty propagation
and Pareto front search.
9.11.1 Co-Kriging
Our first advanced method makes use of the formalism of co-Kriging where results with multiple levels of fidelity
can be combined during the search. To do this we combine the results from limited sample size UQ with those
for more expensive UQ with many more samples. To make this approach worthwhile we can only use the high-
fidelity calculation very sparingly – here we start the response surface based search with a DoE of 30 design
vectors but we then calculate the 100 point LPτ sample results for just the first four of these design points while
we calculate five-sample results at all 30. We then build a pair of multi-fidelity co-Krigs (one for mean and one
for standard deviation) with all 34 results and use these to estimate the response statistics of the functions being
searched, see figure below.
Flow chart illustrating the update sequences used in co-Krig approaches.
In co-Krigs the inputs to the low fidelity (cheap) and high fidelity (expensive) calculations, xc and xe, are taken to
be related to the outputs (responses) yc and ye by computational functions, yc=fc(xc) and ye=fe(xe). The
responses resulting from the DoE over these codes are used to construct an approximation
\hat{y}_e = \rho\,\hat{f}_c(\mathbf{x}) + \hat{f}_d(\mathbf{x})    (4)
which is the sum of two Gaussian process models, each of which depends on the distances between the
sample data used to construct them. Here the hat symbols indicate the models are approximations, the
subscript d indicates a model of the differences between the low and high fidelity functions (all the high fidelity
evaluations are carried out at locations where low fidelity calculations have already been run) and ρ is a scaling
parameter. The distance measure used here is

d(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \sum_{h=1}^{k} \theta_h \bigl| x_h^{(i)} - x_h^{(j)} \bigr|^{p_h}    (5)

where \theta_h and p_h are hyper-parameters tuned to the data in hand and k is the number of dimensions in the problem. The correlation between points \mathbf{x}^{(i)} and \mathbf{x}^{(j)} is then given by

R(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \exp\bigl[-d(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})\bigr] + \delta_{ij}\lambda    (6)

where \lambda is a regularization constant that governs the degree of regression in the model (when set to zero the Krig strictly interpolates the data supplied) and \delta_{ij} is the Kronecker delta. When the response at a new point
\mathbf{x}_{new} is required, a vector of correlations between the new point and those used in the DoE is formed

\mathbf{c}(\mathbf{x}_{new}) = \begin{pmatrix} \rho \sigma_c^2 R_c(\mathbf{x}_c, \mathbf{x}_{new}) \\ \rho^2 \sigma_c^2 R_c(\mathbf{x}_e, \mathbf{x}_{new}) + \sigma_d^2 R_d(\mathbf{x}_e, \mathbf{x}_{new}) \end{pmatrix}    (7)
where the \sigma^2 are the variances in the cheap and difference Gaussian models. The prediction is then given by

\hat{y}_e(\mathbf{x}_{new}) = \hat{\mu} + \mathbf{c}^{T} \mathbf{C}^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu})    (8)

where \hat{\mu} = \mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{y} \,/\, \mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{1} and

\mathbf{C} = \begin{pmatrix} \sigma_c^2 R_c(\mathbf{x}_c, \mathbf{x}_c) & \rho \sigma_c^2 R_c(\mathbf{x}_c, \mathbf{x}_e) \\ \rho \sigma_c^2 R_c(\mathbf{x}_e, \mathbf{x}_c) & \rho^2 \sigma_c^2 R_c(\mathbf{x}_e, \mathbf{x}_e) + \sigma_d^2 R_d(\mathbf{x}_e, \mathbf{x}_e) \end{pmatrix}.
When building
co-Krigs it is still necessary to carefully tune the sets of hyper-parameters to match the data in use – for co-Krigs
this tuning is applied to the low fidelity data, data representing the differences between the low and high fidelity
series, and the scaling parameter ρ that links the various data sets. Fortunately, for the small numbers of results
typically available in such work this is not overly expensive.
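A compact numerical sketch of equations (4)-(8) may help fix ideas. The one-dimensional cheap and expensive functions, the sample locations and the fixed hyper-parameter values below are illustrative assumptions only; in practice θ, p and ρ would be tuned to the data as just described, and the variances estimated by maximum likelihood rather than taken directly from the samples.

```python
import numpy as np

# Hypothetical expensive and cheap functions (assumed, not from the notes)
f_e = lambda x: (6.0 * x - 2.0)**2 * np.sin(12.0 * x - 4.0)   # expensive
f_c = lambda x: 0.5 * f_e(x) + 10.0 * (x - 0.5) - 5.0         # cheap

Xc = np.linspace(0.0, 1.0, 11)     # cheap DoE
Xe = Xc[::5]                       # expensive runs at existing cheap points
yc, ye = f_c(Xc), f_e(Xe)

theta, p, rho = 3.0, 1.0, 2.0      # assumed fixed, not tuned

def R(A, B):
    # Correlation of equations (5)-(6) with lambda = 0;
    # one kernel serves for both R_c and R_d in this sketch
    return np.exp(-theta * np.abs(A[:, None] - B[None, :])**p)

# Difference data y_e - rho*y_c at the shared locations (cf. equation (4))
yd = ye - rho * f_c(Xe)
s2c, s2d = np.var(yc), np.var(yd)  # crude variance estimates

# Covariance matrix C of equation (8); tiny nugget for conditioning
C = np.block([
    [s2c * R(Xc, Xc),        rho * s2c * R(Xc, Xe)],
    [rho * s2c * R(Xe, Xc),  rho**2 * s2c * R(Xe, Xe) + s2d * R(Xe, Xe)],
]) + 1e-8 * np.eye(len(Xc) + len(Xe))

y = np.concatenate([yc, ye])
one = np.ones(len(y))
mu = (one @ np.linalg.solve(C, y)) / (one @ np.linalg.solve(C, one))

def predict(x):
    x = np.atleast_1d(x)
    # Correlation vector c(x_new) of equation (7)
    c = np.vstack([rho * s2c * R(Xc, x),
                   rho**2 * s2c * R(Xe, x) + s2d * R(Xe, x)])
    # Equation (8)
    return mu + c.T @ np.linalg.solve(C, y - one * mu)
```

By construction the predictor reproduces the expensive data at the expensive sample points while being informed by the trend of the much denser cheap data elsewhere.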
Simple comparison of a Krig through expensive data and co-Krig built on a combination of cheap and expensive
data.
The resulting co-Krigs are then typically searched with the NSGA2 algorithm and updated, say, 10 times using 10 new design vectors taken from the approximated Pareto front at each stage; of these 10, only the one midway through the update set is analysed using both levels of fidelity (sample sizes). Thus at the end of
the search in this case we have evaluated the low-fidelity model 130 times but now we have also used the high
fidelity model 14 times (four in the original DoE and one during each of the 10 update cycles), leading to a total
number of individual design calculations of 130×5+14×100=2,050. This needs to be compared to the previous
Kriging search where only low-fidelity calculations were used leading to 80×5=400 calls or one based
completely on high fidelity sampling of say 80×100=8,000 calls. The aim is to achieve results comparable to
using 8,000 calls with costs more comparable to working solely at the lower fidelity – this approach is intended
to mitigate the problem that, when working solely with limited sized (Monte-Carlo) uncertainty samples on real
world problems, searches typically returned designs that failed to fulfil their promise when evaluated with larger
sample sizes.
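The budget bookkeeping above can be checked directly:

```python
# Evaluation budgets quoted above: 130 low-fidelity calls at 5 samples each,
# plus 14 high-fidelity calls at 100 samples each, versus the alternatives.
co_krig   = 130 * 5 + 14 * 100   # low-fidelity search + sparse high-fidelity UQ
low_only  = 80 * 5               # earlier Kriging search, low fidelity only
high_only = 80 * 100             # all-high-fidelity alternative
print(co_krig, low_only, high_only)   # → 2050 400 8000
```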
Using the co-Kriging approach, the results for the test problem are as shown in the figures below (a typical final
Pareto front is shown along with the convergence behaviour of a series of independent runs of the search).
When searching the base formulas of equations (1) and (2), the figure shows that the final Pareto points are
now almost as tightly clustered around the true Pareto front as those already shown above – similar HV metrics
are obtained with the budgets used. Of particular interest are the high fidelity results shown in the figure where
now these samples accurately reflect the true trade-off between mean and standard deviation, while the low
fidelity results continue to track the low fidelity front, i.e., the co-Kriging approach has achieved the desired
outcome, identifying the location of the knee in the Pareto front and ensuring that the correct trade-off
information is being used.
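For reference, the hypervolume metric used in these convergence plots is straightforward to compute in the two-objective (minimization) case: it is the area dominated by the front relative to the reference point. The sample front and reference below are illustrative values, not the course data.

```python
# Two-objective hypervolume (minimization) relative to a reference point
def hypervolume_2d(points, ref):
    # Keep only points that dominate the reference, sorted by first objective
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                       # non-dominated step
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))   # → 11.0
```

Larger hypervolume indicates a front that pushes further towards the ideal point; normalizing by a fixed reference, as in the plots, allows independent runs to be compared.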
Pareto front found from exhaustive searches on equations (1) and (2) – dashed line; along with the results of
multi-objective searches on the five / 100 point LPτ sampling of equation (3) combined with direct noise
variabilities a and b: co-Krig based NSGA2 search – solid line; + – low fidelity points, × – high fidelity points.
Pareto front normalized hypervolume metric convergence for reference point (100, 200) based on high fidelity
points only, for nine independent runs.
9.11.2 Combined Kriging
The second advanced approach we demonstrate requires a slightly more intrusive change to the problem
handling: we build a single combined Krig that is used to both support search and carry out uncertainty
quantification, an approach we term combined Kriging. Clearly, if the low level Krig can accurately model the
underlying functions it will allow very accurate statistics to be computed albeit at the cost of sampling the Krig
multiple times. So to begin with, an LPτ DoE is carried out where both the design variables and noise variables
are all varied simultaneously. Next a four dimensional Krig is constructed through this data. Then 500 point low
discrepancy sequence sampling over the two noise variables is carried out on the combined Krig, at any desired
pair of design variables, so as to return the desired predictions of mean and standard deviations of
performance, see figure below. Notice that, when evaluating new update points, the search engine can no longer simply specify pairs of design variables for each evaluation; it also has to manage the noise sampling at the same time – as each pair of design variables is added, information also has to be supplied on what values the noise variables should take for these update samples. Here we use a space-filling approach in which a short search is carried out to best place each new noise sample within the existing data set, maximizing its Euclidean distance from the existing samples. Note also that, once the search is completed, it will be necessary to confirm the
statistics of the final Pareto front with UQ as the combined Krig will be unlikely to be able to supply completely
accurate statistics directly.
Flow chart illustrating the update sequences used in combined (level-1) Krig approaches.
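The space-filling placement of new noise samples described above can be sketched as a simple maximin criterion: from a pool of random candidates, keep the one whose minimum Euclidean distance to the samples already in the data set is largest. The pool size, unit-cube bounds and example points are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def next_noise_sample(existing, n_candidates=200):
    dim = existing.shape[1]
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
    # Distance from every candidate to every existing sample
    dists = np.linalg.norm(candidates[:, None, :] - existing[None, :, :], axis=2)
    # Keep the candidate furthest from its nearest existing neighbour
    return candidates[np.argmax(dists.min(axis=1))]

existing = np.array([[0.1, 0.1], [0.9, 0.9]])
new_pt = next_noise_sample(existing)
```

In practice the "short search" could equally be a local optimizer rather than random candidate screening; the principle of maximizing distance to the existing data is the same.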
This approach increases the number of design variable combinations being sampled as compared to the
number of noise variables. Since it is realistic to assume that design variables play a greater part in influencing
final performance as compared to noise variables this approach has obvious appeal. But it does, again, depend
on how non-linear any noise effects are. Moreover, the sizes of the Krigs being built become significantly larger
as now they contain sample information on both design and noise variable changes. Since the cost of tuning
and sampling Krigs rapidly rises both with the number of variables and the number of samples, this can become
the limiting factor in using the combined approach. Here the Krigs are limited to just 840 data points at any one
time, 30 points in an initial DoE sample plus 20 updates each of 40 more points. This is less than half the total
number of function evaluations used when using co-Krigs to represent the design variables and uncertainty
quantification separately but, for this case, is sufficient to converge the process. It is, however, an order of
magnitude slower than the co-Krig approach when using simple test functions. Having completed the search
process, it will typically be necessary to re-evaluate any final designs selected using additional UQ; here 12
such points can be checked using 100 point sampling when using the same budget as the co-Krig process (for
the test problem we can, of course, simply use the exact equations to evaluate the solutions).
The figures below show the results from adopting this approach. As expected, the figures show that the accuracy
of the UQ depends on the total number of samples taken. For the simple case studied here the combined Krig
becomes reasonably accurate once around 200 samples have been taken over the four dimensional space. The
resulting Pareto fronts are significantly better than achieved using the co-Krig approach, being almost
indistinguishable from the exact solutions. It turns out that this very much depends on the dimensions of the
problem being studied as well as its inherent non-linearities – for this low dimensional problem the combined
Krig would clearly be the best way to proceed, although as already noted, it is much more expensive to carry out
in terms of surrogate construction and sampling.
Pareto front found from exhaustive searches on equations (1) and (2) – dashed line; along with the results of
multi-objective searches based on four-dimensional combined Kriging built from LPτ sampling of equation (3)
and direct noise variabilities a and b: + – initial sample points, × – update points, solid line – final Pareto front.
Pareto front normalized hypervolume metric convergence for reference point (100, 200) based on equations (1)
and (2), for nine independent runs.
These approaches have been applied to a number of industrial-strength case studies of varying complexity. The following two graphs are for CFD analysis of a 2D compressor section subject to manufacturing uncertainty and in-service degradation. Clearly the rate of convergence is much reduced but, nonetheless, useful results are obtained which show worthwhile improvement over the initial design.
Combined Krig assisted NSGA2 results for a 2D compressor blade CFD problem: estimated results for 500-
point LPτ pseudo Monte-Carlo sampling on the combined Krig, * final generation, + initial generation and ×
intermediate generations; along with the estimated Pareto front (solid line), “true” Pareto front (dotted blue line)
and ○ initial base-line design.
Combined Krig assisted NSGA2 results for a 2D compressor blade CFD problem: Pareto front normalized
hypervolume metric convergence for reference point (5, 0.5) based on the estimate of the “true” Pareto front, for
nine independent runs.
10 GETTING STARTED
Assuming one has a reasonable toolkit of search methods, a parameterisation scheme and an automated (or at
least mechanistically repeatable) design analysis process it is then possible to make some plans as to how to
proceed. These will be dominated by the number of designer-chosen variables and the run time to evaluate a
design. Other important aspects will be the number of goals, the number and type of constraints and whether or
not stochastic measures of merit must be constructed using an essentially deterministic code. The following
diagram gives initial advice.