Design Search and Optimisation Course Notes – January 2020
FEEG6009 MODULE TITLE: DESIGN SEARCH AND OPTIMISATION – PRINCIPLES, METHODS AND PARAMETERIZATIONS, 2019/20
A.J. Keane & N.W. Bressloff, January 2020, Faculty of Engineering and the Environment
1 BASIC INFORMATION
Department responsible for the module: Aeronautics and Astronautics
Programme: Part IV
Timetable: Semester 2
Credit Value: 15 CATS points
Pre-requisites: none
Contacts:
Prof. A.J. Keane, Building 176, Room 5013, ext. 22944, email: [email protected]
Dr. D.J.J. Toal, Building 176, Room 5011, ext. 22662, email: [email protected]
Dr. I.I. Voutchkov, Building 176, Room 5025, ext. 21276, email: [email protected]
Prof. N.W. Bressloff, Building 176, Room 5031, ext. 25473, email: [email protected]
Formal Contact Hours: 33
Private Study Hours: 117
Coursework: Computer assignments
Course Web Site: http://www.soton.ac.uk/~ajk/DSO
Course Module Profile: https://www.southampton.ac.uk/courses/modules/feeg6009.page
2 DESCRIPTION
2.1 Aims
The aims of this module are to:
- provide the background needed to analyse, develop and use algorithms for tackling design search and optimization (DSO) problems of diverse kinds;
- equip students to become intelligent users of DSO methods;
- provide the experience needed to formulate approaches to the solution of problems in DSO;
- introduce how tools such as MATLAB can be used to support problem solving in DSO.
NOTE: whilst some MATLAB functionality will be demonstrated in this course, detailed MATLAB tuition will not be provided (see note below in the Resources section).
2.2 Objectives (planned learning outcomes)
Knowledge and understanding
Having successfully completed the module, you will be able to demonstrate knowledge and understanding of:
- the basic elements of single and multi-variable optimizers;
- the ways in which these simple elements can be combined to provide solutions to DSO problems;
- the ways in which problem parameters can be used to formulate design intent in DSO problems;
- the issues confronting engineers as they seek usable DSO approaches;
- the ways in which CAD tools can be used to formulate design intent in DSO problems;
- the ways in which various tools can be brought together to tackle realistic DSO problems via the use of bespoke workflows;
- the issues confronting engineers as they seek practical DSO approaches.
Intellectual skills
Having successfully completed the module, you will be able to:
- more fully understand the components of successful DSO approaches to design;
- make intelligent choices among the available DSO approaches;
- evaluate the utility and robustness of DSO produced designs.
Practical skills [where appropriate]
Having successfully completed the module, you will be able to:
- set up and solve simple DSO problems using a range of software tools including FEA codes and Excel.
2.3 Topics covered Design Search and Optimization (DSO) (1 lecture)
Beginnings A Taxonomy of Optimization A Brief History of Optimization Methods The Place of Optimization in Design – Commercial Tools
Geometry Modelling & Design Parameterization (1 lecture)
The Role of Parameterization in Design
Discrete and Domain Element Parameterizations
NACA Airfoils
Spline Based Approaches
Partial Differential Equation and Other Analytical Approaches
Basis Function Representation
Morphing
Shape Grammars
Mesh Based Evolutionary Encodings
CAD Tools v's Dedicated Parameterization Methods
Single Variable Optimizers – Line Search (1 lecture)
Unconstrained Optimization with a Single Real Variable Optimization with a Single Discrete Variable Optimization with a Single Non-Numeric Variable
Multi-Variable Optimizers (3 lectures)
Population versus Single Point Methods Gradient-based Methods
Newton's Method Conjugate Gradient Methods Quasi-Newton or Variable Metric Methods
Noisy/Approximate Function Values Non-Gradient Algorithms
Pattern or Direct Search Stochastic and Evolutionary Algorithms
Termination and Convergence Aspects
Constrained Optimization (2 lectures)
Problem Transformations
Lagrangian Multipliers
Feasible Directions Method
Penalty Function Methods
Combined Lagrangian and Penalty Function Methods
Sequential Quadratic Programming
Chromosome Repair
Meta-models and Response Surface Methods (1 lecture)
Global versus Local Meta-models Meta-modelling Tools Simple RSM Examples
Combined Approaches – Hybrid Searches, Meta-heuristics (1 lecture)
Glossy – a Hybrid Search Template Meta-heuristics – Search Workflows Visualization – understanding the results of DSO
Multi-objective Optimization (1 lecture)
Multi-objective Weight Assignment Techniques
Methods for Combining Goal Functions, Fuzzy Logic & Physical Programming
Pareto Set Algorithms
Nash Equilibria
Robustness (3 lectures)
Robustness versus Nominal Performance Evolutionary Algorithms for Robust Design Robustness Metrics
Noisy Phenotype One – Tsutsui and Ghosh's Method (NP)
Noisy Phenotype Two – Modified Tsutsui and Ghosh Method (NP2)
Design of Experiment One – One-at-a-time Experiments (OAT)
Design of Experiments Two and Three – Orthogonal Arrays (L64 & L81)
Comparison of Metrics
Using Surrogates in Robustness Studies
Krigs
Co-Krigs
Combined Krigs
Problem Classification (1 lecture)
Run-time
Deterministic v's Probabilistic Analyses
Number of Variables to be Explored
Goals and Constraints
Initial Search Process Choice (1 lecture) External Speakers (1 lecture)
2.4 Case studies Case study 1: The design of an encastre cantilever beam. (3 lectures)
This is based around simple Euler-Bernoulli beam theory and Excel to set up and solve a simple structures DSO problem. Each student pairing tackles a different set of boundary conditions and the whole class’s studies then allow a Pareto Front to be constructed illustrating which pairings have produced Pareto optimal designs and which have produced sub-optimal designs. This is a very simple case study just to get students used to the whole idea of DSO approaches.
Case study 2: Global versus local search methods (4 lectures) An airplane wing design problem will be used to demonstrate the differences between local and global search methods. A key element of this study concerns the fixed computational budget often faced in real engineering problems. It is not necessary to have an aerodynamics background to follow this design study.
Case study 3: Multi-objective design problem. (6 lectures) For this final case study, a multi-objective design problem will be described, which will have to be solved and presented during the course of the laboratory sessions. Students will be free to employ any method(s) learnt earlier in the course, or from elsewhere.
Case study 4: Medical Device Optimization. (4 lectures) In the UK alone, nearly 100,000 people have one or more coronary artery stents implanted, annually, to open up diseased, narrowed blood vessels that supply blood to the heart. For this case study, you will be introduced to the engineering characteristics and design requirements of stents in the first lab session. Then, in the second session, a simplified model of a coronary artery stent will be used to determine an optimal design. This will be conducted under exam conditions wherein you will be required to submit your optimal design at the end of the session. Further details will be provided in the first lab, including how submissions will be assessed.
Revision (3 lectures)
2.5 Teaching and learning activities
Teaching methods include: lectures; computer sessions.
Learning activities include: using the DSO capabilities of the Excel spreadsheet system.
Timetable (Monday / Thursday):
- 11AM - 12PM, 27/2001 (L/R 1), Weeks 18-20, 22-25, 30-33: AJK, Lecture
- 11AM - 12PM, 25/1009 (Computer Workstation), Weeks 18-20: AJK, Coursework Sessions
- 11AM - 1PM, 25/1009 (Computer Workstation), Weeks 22-25, 30-32: DJJT, IIV, NWB, Double Coursework Sessions
- 11AM - 1PM, 02A/2077 (L/T J), Week 21: AJK, Lecture
- 11AM - 1PM, 02/1089 (L/T D), Week 33: AJK, Double Revision Lecture
Assignments – Module Code: FEEG6009, Title: Design Search and Optimization

1. AJK – Excel spreadsheet problem design – encastre beam
   Set: 30/1/2020. Due: 23:00 on 13/2/2020. Submission: spreadsheet via e-submissions.
   Feedback: 2/3/2020, in lecture and on request, sent by e-mail.
   Weighted mark: 5%. Purpose: to develop initial understanding of optimizers.

2. DJJT – Computer programming exercise: light aircraft wing design optimisation
   Set: 27/2/2020. Due: 23:00 on 12/3/2020. Submission: report via e-submissions.
   Feedback: 26/3/2020, via e-assignments.
   Weighted mark: 15%. Purpose: to develop an understanding of global, local and hybrid optimisation.

3. IIV – Computer programming exercise: multiobjective optimization of expensive design problems
   Set: 12/3/2020. Due: 23:00 on 23/4/2020. Submission: send required files by e-mail to [email protected] at the end of class.
   Feedback: 7/5/2020, on request, sent by e-mail.
   Weighted mark: 15%. Purpose: to develop understanding of optimization using multiple expensive goals.

4. NWB – Computational modelling exercise: Medical Device Optimization
   Set: 7/5/2020. Due: 13:00 on 7/5/2020 at the end of the class. Submission: email results to [email protected].
   Feedback: 15/5/2020, email comments on work.
   Weighted mark: 15%. Purpose: application of optimisation to an industrially relevant biomedical problem.
2.6 Methods of assessment
Assessment method / number / % contribution to final mark:
- 2-hour written closed-book examination: 1, 50%
- Coursework: 4, 50% (5+15+15+15)
Referral method / number / % contribution to final mark:
- 2-hour written closed-book examination: 1, 50%
- Previous coursework marks may be re-used or the entire set of coursework repeated: 4, 50% (5+15+15+15)
2.7 Feedback and student support during module study (formative assessment) Feedback will be provided through the following mechanisms: Class discussion based on notes and worked examples; Written feedback on marked independent and group assignments; Revision sessions and discussion of past exam papers.
2.8 Relationship between the teaching, learning and assessment methods and the planned learning outcomes The teaching and learning methods will provide students with the necessary material to set up DSO problems using both CAD and spreadsheets. They will also learn the essential background to all common DSO methods and how these impact on what can be achieved in practice. Written reports will be required for two pieces of coursework – including listings and descriptions of validation where appropriate – in order to assess their understanding of the nature of the tools they are using. Other issues will be assessed via a written examination.
2.9 Resources Core Text (include number in library or URL) (inc ISBN) A.J. Keane and P.B. Nair.: Computational Approaches to Aerospace Design: The Pursuit of Excellence, John Wiley, 2005. ISBN: 0-470-85540-1 http://onlinelibrary.wiley.com/book/10.1002/0470855487 .
TABLE OF CONTENTS
1 BASIC INFORMATION
2 DESCRIPTION
 2.1 AIMS
 2.2 OBJECTIVES (PLANNED LEARNING OUTCOMES)
 2.3 TOPICS COVERED
 2.4 CASE STUDIES
 2.5 TEACHING AND LEARNING ACTIVITIES
 2.6 METHODS OF ASSESSMENT
 2.7 FEEDBACK AND STUDENT SUPPORT DURING MODULE STUDY (FORMATIVE ASSESSMENT)
 2.8 RELATIONSHIP BETWEEN THE TEACHING, LEARNING AND ASSESSMENT METHODS AND THE PLANNED LEARNING OUTCOMES
 2.9 RESOURCES
3 INTRODUCTION TO DSO
 3.1 WHAT IS DESIGN AND HOW DOES DESIGN SEARCH & OPTIMISATION FIT INTO IT?
 3.2 THE NOWACKI BEAM PROBLEM
 3.3 TAXONOMY OF OPTIMIZATION (METHODS)
 3.4 BRIEF HISTORY OF OPTIMIZATION
 3.5 GEOMETRY MODELLING & DESIGN PARAMETERIZATION
 3.6 EXCEL & NOWACKI BEAM
4 LINE SEARCH – SEARCHES WITH ONE VARIABLE
 4.1 DIFFERENTIAL CALCULUS
 4.2 BRACKETING
 4.3 GOLDEN SECTION SEARCH
 4.4 INVERSE PARABOLIC INTERPOLATION
 4.5 NEWTON’S METHOD
5 MULTI VARIABLE OPTIMIZERS
 5.1 STEEPEST DESCENT
 5.2 CONJUGATE GRADIENT
 5.3 NEWTON’S METHOD
 5.4 QUASI-NEWTON METHODS
 5.5 NON GRADIENT BASED SEARCH METHODS
 5.6 STOCHASTIC/EVOLUTIONARY SEARCH
 5.7 TERMINATION/CONVERGENCE
6 CONSTRAINED OPTIMIZATION
 6.1 CONSTRAINT ELIMINATION BY CONSTRUCTION
 6.2 LAGRANGE MULTIPLIERS
 6.3 PENALTY FUNCTION METHODS
 6.4 COMBINED LAGRANGE AND PENALTY FUNCTION METHOD
 6.5 SEQUENTIAL QUADRATIC PROGRAMMING METHOD (SQP)
 6.6 (CHROMOSOME) REPAIR
7 META-MODELS + RSM
 7.1 EXPLOITATION VERSUS EXPLORATION
 7.2 LOCAL TRUST REGION SEARCH
 7.3 SINGULAR VALUE DECOMPOSITION
 7.4 GLOBAL RSM SEARCH
 7.5 META HEURISTICS
 7.6 VISUALIZATION
8 MULTI-OBJECTIVE OPTIMIZATION
 8.1 METHODS FOR COMBINING GOAL FUNCTIONS
 8.2 METHODS FOR FINDING PARETO FRONTS
9 ROBUSTNESS IN OPTIMIZATION AND UNCERTAINTY QUANTIFICATION (UQ)
 9.1 MONTE-CARLO METHOD
 9.2 DESIGN OF EXPERIMENT METHODS
 9.3 NOISY PHENOTYPE
 9.4 RESPONSE SURFACE APPROACH
 9.5 STOCHASTIC SOLVERS
 9.6 ROBUSTNESS
 9.7 A SIMPLE EXAMPLE
 9.8 TWO VARIABLES AND TWO OBJECTIVES
 9.9 USING SURROGATES TO SUPPORT UNCERTAINTY QUANTIFICATION
 9.10 ROBUST DESIGN OPTIMIZATION WITH BASIC SURROGATES
 9.11 ROBUST DESIGN OPTIMIZATION WITH ADVANCED SURROGATES
  9.11.1 Co-Kriging
  9.11.2 Combined Kriging
10 GETTING STARTED
3 INTRODUCTION TO DSO
Why we want to do it and its place in design. Taxonomy of optimization, history of methods, commercial tools etc.
First, what is design – synthesis v's analysis. What is optimal design? Are all designs optimal? Might we ever deliberately accept a sub-optimal design? To answer this we must be able to compare competing designs and say which we prefer and, hopefully, why. (Examples of cars, fridges – cost v's performance, trade-offs.)
Taxonomy of optimization: sketch out a picture illustrating the taxonomy and another illustrating the history. Also mention commercial tools.
History of optimization: Newton and classical gradients.
For example, y = x^2 - 4x + 1 gives dy/dx = 2x - 4 = 0 at x = 2, with d^2y/dx^2 = 2 > 0, so min y is at x = 2.
Pattern search and "which way is down" methods. Stochastic search – population. DoE based search and RSMs – hybrids.
3.1 What is design and how does design search & optimisation fit into it? Engineering or Analytically Led Design is the use of analysis to support synthesis so as to adequately define a product (or process). Note the difference between analysis and synthesis: synthesis involves decisions about the product (how big, what material, what manufacturing method) whereas analysis does not. Analysis provides the information for rational decisions to be made. DSO is a formalism for carrying forward such decision making and is normally thought of as an automated activity controlled by a computer code. It involves postulating a design, analysing it, deciding if the results are acceptable and, if not, deciding how to change it. If it is to be changed the process is repeated until we have an acceptable design or we run out of effort/time. To set this up we adopt the ideas of:
- objectives;
- design variables and their bounds;
- constraints and their limits;
- fixed parameters;
- external noise/uncertain parameters;
- methods of analysis;
- schemes for linking design variables to analysis;
- schemes for linking analysis to objectives and constraints.
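These ingredients can be collected into a small container object. The sketch below is illustrative only (the course itself works in Excel and MATLAB): the class name, fields and the beam numbers are all invented for illustration, not taken from the notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

@dataclass
class DSOProblem:
    """Hypothetical container for the DSO ingredients listed above."""
    objective: Callable[[Sequence[float]], float]
    bounds: list                                       # (lower, upper) per design variable
    constraints: list = field(default_factory=list)    # each returns >= 0 when feasible
    fixed_params: dict = field(default_factory=dict)   # things we are not free to change

# A Nowacki-style toy: minimise beam cross-section area (breadth * depth)
# subject to a bending-stress limit. All numbers are made up for illustration.
beam = DSOProblem(
    objective=lambda x: x[0] * x[1],
    bounds=[(0.01, 0.25), (0.01, 0.25)],
    constraints=[lambda x: 1.0 - 6 * 5000 * 1.5 / (x[0] * x[1] ** 2) / 240e6],
)
print(beam.objective([0.1, 0.2]))   # area of a 0.1 x 0.2 section
```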
3.2 The Nowacki beam problem.
3.3 Taxonomy of optimization (methods)
(The lecture sketch classifies problems by their inputs, outputs, constraints and functions, and then maps problem classes to methods.)
- inputs x: all numeric (continuous or some discrete) or some non-numeric
- outputs y: single goal or multiple goals (Pareto fronts)
- constraints g, h: unconstrained, bounds, constrained
- functions f: linear, non-linear, discontinuous
Problem classes and methods:
- LINEAR PROBLEMS – LINEAR PROGRAMMING: simplex methods, search over vertex space (operational research) – NOT DISCUSSED FURTHER
- OPTIMAL SELECTION – INTEGER PROGRAMMING: sorting/exchange methods – NOT DISCUSSED FURTHER
- THE REST – the rest of the course! Methods here divide into: no gradients needed (zero order) v's gradients needed (first or second order); cope with constraints directly v's only unconstrained; deterministic v's stochastic; population based v's one at a time.
The general problem is min y = f(x) subject to the inequality constraints g(x) and the equality constraints h(x) = 0.
3.4 Brief History of Optimization
First came classical calculus and Newton’s method for dealing with functions where we cannot solve explicitly. From calculus we are familiar with f'(x) = 0 and f''(x) = +VE for a minimum etc. Newton is basically root searching for f'(x) = 0 and is covered in a subsequent lecture.
Next came Cauchy and steepest descent – ie find the downhill direction and move in that direction until we start to go uphill again – slow in valleys.
Then came conjugate gradient methods and quasi-Newton methods that exploit local curve fitting based on curvature, with various ways of holding information on the local shape (the Hessian, or its inverse), ie an approximation based on f(x) = (1/2) x^T A x + b^T x + C, where x is a vector and A is the Hessian.
Following these gradient based approaches were a series of pattern searches which use heuristics. These include Hooke & Jeeves and the simplex method. Then came a series of stochastic methods including SA, GA, ES and EP, all using sequences of random moves and schemes to exploit any gains made, often working with populations of designs.
Then come the explicit curve fitting methods based on designed experiments such as polynomial curve fits, RBF
schemes, Kriging etc. These can be either global with updates or local with move limits and trust regions.
Finally come hybrids and meta searches built on these elements. Key considerations are:
- ability to work with black box codes (can be used for analytical forms or numerically);
- need for gradient information;
- robustness to poor, noisy or incomplete data or badly shaped functions (staircase effects etc);
- speed of convergence;
- ability to run calculations in parallel;
- repeatability/average performance of stochastic methods.
3.5 Geometry modelling & design parameterization The need for parameterization design variables/design intent – the things we are free to change. Flexibility
versus number of variables & tacit knowledge of workable designs. Some examples – look at the Nowacki
beam & ask what are design choices? How do we encapsulate them? What about choice of section? How do
we parameterise a cross section? How many variables do we need?
1) a circle – just need radius – 1 variable
2) a square – just need side length – 1 variable
3)/4) an ellipse or rectangle – 2 variables
5) an I section symmetric about the neutral axis, or a box, L or T section – in all cases we need overall width and depth plus thickness of web and flange – 4 variables
Can we make a single parameterisation span square, rectangle, box, L, T and I? If so, how – do we use an integer variable as an index plus 4 numbers, or can we use the 4 numbers themselves? Clearly square/rectangle/box can be linked.
It is not obvious how to deal with L, T or I sections together. If we use an index variable to select the type there is no obvious ranking of types, so searching over it is not simple. The square, rectangle and box can of course be treated as continuous sets; the simplest combined form recovers the solid sections when t_f = depth/2 and t_w = width/2.
To make this generate all shapes we consider 4 rectangles: a top flange T, a bottom flange B, and two webs L and R. We make:
- the widths of T and B equal to the overall width;
- the widths of L and R equal to t_web;
- the height of T equal to t_flange,top;
- the height of B equal to t_flange,bot;
- the height of L and R equal to height - t_flange,top - t_flange,bot;
and then add offsets of L and R from the outer edges, offset_l and offset_r. This needs seven variables but can now describe all our shapes – to get an L or T we just set t_flange,bot to be zero.
Other forms of parameterization:
- Discrete and domain element modelling
- NACA Airfoils
- Spline Based Approaches
- Partial Differential Equation and Other Analytical Approaches
- Basis Function Representation
- Morphing
- Shape Grammars
- Mesh Based Evolutionary Encodings
- CAD Tools v's Dedicated Parameterization Methods
Talk through the various figures in sect 2.1 of course book and see PPT of External shape of UAV
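The seven-variable section scheme above can be sketched in code. This is a toy Python illustration only (the function names and the bottom-left origin convention are invented here, not taken from the notes); it returns the four rectangles and a naive area sum, which for a box section reproduces the hollow-rectangle area.

```python
def section_rectangles(width, height, t_web, t_flange_top, t_flange_bot,
                       offset_l, offset_r):
    """The four rectangles (x, y, w, h) of the seven-variable section,
    with the origin at the bottom-left corner of the bounding box."""
    web_h = height - t_flange_top - t_flange_bot
    return {
        "T": (0.0, height - t_flange_top, width, t_flange_top),  # top flange
        "B": (0.0, 0.0, width, t_flange_bot),                    # bottom flange
        "L": (offset_l, t_flange_bot, t_web, web_h),             # left web
        "R": (width - offset_r - t_web, t_flange_bot, t_web, web_h),  # right web
    }

def section_area(rects):
    # naive sum of rectangle areas (assumes the rectangles do not overlap)
    return sum(w * h for (_, _, w, h) in rects.values())

# a 10 x 10 box with 1-thick walls: zero offsets put the webs at the edges
box = section_rectangles(10.0, 10.0, 1.0, 1.0, 1.0, 0.0, 0.0)
print(section_area(box))   # 36.0 = 10*10 - 8*8
```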
3.6 Excel & Nowacki Beam How do we set up searches in Excel?
Try y = x^2 - 4x + 1 (min at x = 2): put this into Excel and solve it numerically.
Then try y = x^4 - 2x^3 + 4 (min at x = 1½).
Then look at the Nowacki beam problem (load the relevant Excel sheet and describe).
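In place of Excel's Solver, the same numerical experiment can be run in a few lines of Python. This crude step-halving search is only an illustrative stand-in (not part of the notes, and not what Solver actually implements), but it finds both minima above.

```python
def crude_min(f, x, step=1.0, tol=1e-6):
    """Walk downhill from x, halving the step whenever neither
    neighbour improves (a rough stand-in for an interactive solve)."""
    while step > tol:
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step /= 2.0
    return x

print(round(crude_min(lambda x: x**2 - 4*x + 1, 0.0), 4))      # 2.0
print(round(crude_min(lambda x: x**4 - 2*x**3 + 4, 0.0), 4))   # 1.5
```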
4 LINE SEARCH – SEARCHES WITH ONE VARIABLE
Problem – minimize a function of a single real variable without constraints, such as
f(x) = x^2 - 4x + 1
f'(x) = 2x - 4
f''(x) = 2.
Simple approach: take steps looking for a change in gradient; the step length must change to trade accuracy against span (small steps for accuracy v's large steps for speed), i.e. bracket a turning point. How do we turn a bracket into a tight solution? Is the function smooth? Options: golden section search (0.61804), Fibonacci search, quadratic search (inverse parabolic interpolation).
For a discrete or integer variable, use integers as pointers to a list: bracket as before then Fibonacci, or use golden section/quadratics and round to the nearest integer. For non-numeric variables (materials selection, for example) see below.
4.1 Differential calculus
Approach 1 – given a functional form use calculus, ie
f(x) = x^2 - 4x + 1
f'(x) = 2x - 4 = 0, so x = 2
f''(x) = 2, ie +VE, so a minimum.
4.2 Bracketing
Approach 2 – bracket the minimum between two values and search inwards.
Q – How do we find a bracket, ie a series of three values of f(x) such that f(x2) < f(x1) and f(x2) < f(x3)?
A – Guess two values for x, calculate f(x) at these and then head downhill until the function starts to rise and we have a bracket. If we have no knowledge then use x1 = 0, x2 = 1 and x3 either -1 or 2 depending on the gradient (if f(0) > f(1) then x3 = 2, else -1). Given three points we use a quadratic curve fit and see if a minimum is predicted (2nd differential is +VE); if so, jump to the predicted minimum and evaluate there. If a maximum is predicted we simply increase the step size (by say a factor of 1.6180 – golden section) and go on downhill, keeping the three lowest values of f(x) in either case.
See code in Numerical Recipes for example. Another approach is just to keep doubling the step size in the downhill direction until a bracket appears.
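The bracketing recipe above can be sketched as follows. This is a minimal Python illustration (not the Numerical Recipes routine): it heads downhill from two guesses, expanding by the golden factor until the function rises again, and omits the quadratic-jump refinement.

```python
GOLD = 1.618034   # golden-section expansion factor

def bracket(f, x1=0.0, x2=1.0):
    """Head downhill from two guesses until three successive points
    x1, x2, x3 satisfy f(x2) < f(x1) and f(x2) < f(x3)."""
    f1, f2 = f(x1), f(x2)
    if f2 > f1:                     # ensure the step x1 -> x2 is downhill
        x1, x2, f1, f2 = x2, x1, f2, f1
    x3 = x2 + GOLD * (x2 - x1)
    f3 = f(x3)
    while f3 < f2:                  # still descending: keep expanding
        x1, f1 = x2, f2
        x2, f2 = x3, f3
        x3 = x2 + GOLD * (x2 - x1)
        f3 = f(x3)
    return (x1, x2, x3), (f1, f2, f3)

xs, fs = bracket(lambda x: x**2 - 4*x + 1)
print(xs)   # three points straddling the minimum at x = 2
```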
4.3 Golden section search
Approach 3 – golden section search (linear convergence – no use of gradients)
Q – Given an initial bracket, how do we trap the minimum efficiently?
A – Given x1, x2 and x3 such that f(x2) < f(x1) and f(x2) < f(x3) (min at x2), we choose x4 so that it lies in the larger of the two intervals (x1 to x2 or x2 to x3) and such that either
(x4 - x2)/(x3 - x2) = 0.38197, or
(x4 - x1)/(x2 - x1) = 1 - 0.38197 = 0.61803,
ie 38.197% into the larger gap measured from the centre.
Even if the initial bracket is not in the ratio 0.38197:0.61803 this process rapidly settles on this ratio. This approach assumes nothing about the shape of the function and does not require gradients – it is not quite as good as Fibonacci but does not require us to fix the number of function calls a priori (which Fibonacci does).
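A minimal sketch of the golden section loop described above (illustrative Python, not from the notes): each pass probes 38.197% into the larger of the two sub-intervals and keeps a valid three-point bracket.

```python
def golden_section(f, a, b, c, tol=1e-8):
    """Shrink a bracket a < b < c with f(b) < f(a) and f(b) < f(c):
    probe 38.197% into the larger gap, keep the best three points."""
    R = 0.381966
    fb = f(b)
    while (c - a) > tol:
        if (b - a) > (c - b):          # larger gap is on the left
            x = b - R * (b - a)
            fx = f(x)
            if fx < fb:
                c, b, fb = b, x, fx    # new bracket (a, x, b)
            else:
                a = x                  # new bracket (x, b, c)
        else:                          # larger gap is on the right
            x = b + R * (c - b)
            fx = f(x)
            if fx < fb:
                a, b, fb = b, x, fx    # new bracket (b, x, c)
            else:
                c = x                  # new bracket (a, b, x)
    return b

x_min = golden_section(lambda x: x**2 - 4*x + 1, 0.0, 1.0, 4.0)
print(x_min)   # close to 2.0
```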
4.4 Inverse parabolic interpolation
Approach 4 – inverse parabolic interpolation, quadratic search or quadratic interpolation
Q – Given an initial bracket, assume the function is smooth so that at its minimum it will behave quadratically.
A – Here we fit a parabola to the bracket and use this to estimate the location of the minimum, ie we assume f(x) = Ax^2 + Bx + C, so that x* = -B/(2A) and f''(x*) = 2A,
and we know f(x1) = f1, f(x2) = f2 and f(x3) = f3 such that f2 < f1, f2 < f3 and x1 < x2 < x3, ie
f1 = Ax1^2 + Bx1 + C
f2 = Ax2^2 + Bx2 + C
f3 = Ax3^2 + Bx3 + C.
We solve these to get
x* = [(f3 - f2)(x2^2 - x1^2) + (f1 - f2)(x3^2 - x2^2)] / {2[(f3 - f2)(x2 - x1) + (f1 - f2)(x3 - x2)]}
and
A = [(f3 - f2)(x2 - x1) - (f2 - f1)(x3 - x2)] / [(x2 - x1)(x3 - x2)(x3 - x1)]
(which is always +VE here, so hence a minimum).
For example, consider the function f(x) = x^4 - 2x^3 + 4 with initial data at x1 = 1/2, x2 = 1, x3 = 2, so that f1 = 3.8125, f2 = 3, f3 = 4, ie a bracket. Then
x* = [(4 - 3)(1 - 1/4) + (3.8125 - 3)(4 - 1)] / {2[(4 - 3)(1 - 1/2) + (3.8125 - 3)(2 - 1)]} = 1.214286.
This may be compared to the analytical solution given from
f'(x) = 4x^3 - 6x^2 = 0 when x = 0 or x = 1½
f''(x) = 12x^2 - 12x = 0 at x = 0 (inflexion), = 9 at x = 1½ (minimum).
So the solution is improved: f(1.214286) = 2.5932164, ie less than all points in the initial bracket. So the next triple is 1, 1.214286, 2 with f = 3, 2.5932164, 4, and this leads to
x2* = 1.364454, f2* = 2.385534; x3* = 1.427849, f3* = 2.334451;
x4* = 1.465064, f4* = 2.317823; x5* = 1.482046, f5* = 2.313928.
OBSV 1. Now you may ask why not start with x = 0, 1, 2. The trouble with this is f = 4, 3, 4, which is symmetric, so x* = 1, which does not help!
OBSV 2. This search approaches our goal from one side only and so is rather slow – there are better methods! When dealing with discrete variables we use the integers as pointers and either use integer programming or, in mixed problems, simply round variables to the nearest integer. When our discrete variables have no natural order (ie materials selection) we are in the end forced towards enumeration.
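The worked example above can be checked in a few lines (illustrative Python; the helper name is invented). The formula coded below is the x* expression derived in this section.

```python
def parabolic_step(x1, x2, x3, f1, f2, f3):
    """Minimum of the parabola through (x1,f1), (x2,f2), (x3,f3)."""
    num = (f3 - f2) * (x2**2 - x1**2) + (f1 - f2) * (x3**2 - x2**2)
    den = 2.0 * ((f3 - f2) * (x2 - x1) + (f1 - f2) * (x3 - x2))
    return num / den

f = lambda x: x**4 - 2*x**3 + 4
xs = [0.5, 1.0, 2.0]                    # the initial bracket of the example
x_star = parabolic_step(*xs, *[f(x) for x in xs])
print(round(x_star, 6))                 # 1.214286, as in the worked example
```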
4.5 Newton's method
Approach 5 – Newton's method
All will be familiar with the Newton–Raphson method for finding the root (or zero) of a function. If we apply this to the derivative of a function we can find turning points instead, ie
    x_{i+1} = x_i - f'(x_i)/f''(x_i)
NB this needs the second derivative.
Example: f(x) = x^4 - 2x^3 + 4 starting at x = 2
    f'(x) = 4x^3 - 6x^2
    f''(x) = 12x^2 - 12x = 12x(x - 1)
so
    x_{i+1} = x_i - (4x_i^3 - 6x_i^2)/(12x_i(x_i - 1)) = x_i - (2x_i^2 - 3x_i)/(6(x_i - 1))
    x2 = 2 - (2(2)^2 - 3(2))/(6(2 - 1)) = 2 - 2/6 = 5/3 = 1.6667
    x3 = 5/3 - (2(5/3)^2 - 3(5/3))/(6(5/3 - 1)) = 5/3 - (5/9)/4 = 55/36 = 1.52778
    x4 = 1.50097, x5 = 1.50000 = x*
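The iteration above is short enough to sketch directly (a minimal Python illustration, names mine). Note the quadratic convergence compared with the parabolic search:

```python
# Newton's method on f'(x): x_{i+1} = x_i - f'(x_i)/f''(x_i)
def newton_1d(df, d2f, x, steps=6):
    for _ in range(steps):
        x = x - df(x) / d2f(x)
    return x

df  = lambda x: 4*x**3 - 6*x**2   # f'(x) for f = x^4 - 2x^3 + 4
d2f = lambda x: 12*x**2 - 12*x    # f''(x)
x = newton_1d(df, d2f, 2.0)       # iterates 2, 5/3, 1.52778, 1.50097, ...
```

Only a handful of steps are needed from x = 2, but the second derivative must be available and must not vanish at an iterate.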
5 MULTI VARIABLE OPTIMIZERS
We next consider multiple variables. Here, in addition to finding the size of step to make, we must also fix the direction.
5.1 Steepest descent
Perhaps the simplest approach to multi variable optimizing is to identify the direction of steepest descent and go in that direction until the function stops reducing (the optimal step length α_i*) and then recompute the direction of steepest descent, ie
    x_{i+1} = x_i - α_i* grad f(x_i)    where grad f(x_i) are the gradients at x_i
Example – minimize f(x1, x2) = 2x1^2 + x2^2 - 2x1x2 - x1 - x2 starting at x_1 = (0, 0)'
    grad f = (df/dx1, df/dx2)' = (4x1 - 2x2 - 1, -2x1 + 2x2 - 1)', so grad f(x_1) = (-1, -1)'
To get the optimal step length we minimize f(x_1 - α_1 grad f(x_1)) with respect to α_1, ie set df/dα_1 = 0:
    f((0, 0)' - α_1(-1, -1)') = f(α_1, α_1) = 2α_1^2 + α_1^2 - 2α_1^2 - α_1 - α_1 = α_1^2 - 2α_1
    df/dα_1 = 2α_1 - 2 = 0, so α_1* = 1
so that
    x_2 = (0, 0)' + 1(1, 1)' = (1, 1)'
now
    grad f(x_2) = (1, -1)'
    f(x_2 - α_2 grad f(x_2)) = f(1 - α_2, 1 + α_2)
      = 2(1 - α_2)^2 + (1 + α_2)^2 - 2(1 - α_2)(1 + α_2) - (1 - α_2) - (1 + α_2)
      = 5α_2^2 - 2α_2 - 1
    df/dα_2 = 10α_2 - 2 = 0, so α_2* = 1/5
    x_3 = (1, 1)' - (1/5)(1, -1)' = (0.8, 1.2)'
and so on to (1, 1.5)' as the answer.
5.2 Conjugate gradient
The problem with steepest descent is that unless our function has circular contours the direction of steepest descent never points at the final optimum. The conjugate gradient approach seeks to improve over this with improved directions.
We start as per steepest descent, but at the second step use a direction conjugate to the first, ie
    x_2 = x_1 - α_1* grad f(x_1)    but afterwards use
    x_{i+1} = x_i + α_i* S_i
NB S_i' A S_j = 0 if S_i and S_j are conjugate directions for a quadratic problem of the form
    f(x) = (1/2) x'Ax + B'x + C
where
    S_i = -grad f_i + (|grad f_i|^2 / |grad f_{i-1}|^2) S_{i-1}    and    S_1 = -grad f(x_1)
Here S_i takes the place of -grad f_i used in steepest descent. Notice that S_i accumulates information from all previous steps – this is good and bad – good as the direction is conjugate to all previous steps, bad as it can accumulate round off errors – in practice we restart from a steepest descent step after m steps, where m is one more than the number of design variables. If our function is quadratic this process converges in as many steps as there are directions/dimensions in the problem.
Example: minimize f(x1, x2) = 2x1^2 + x2^2 - 2x1x2 - x1 - x2 starting from x_1 = (0, 0)'.
Conjugate gradient search example.
Starting at (0, 0)' and applying steepest descent gives, as before,
    x_2 = (1, 1)' with α_1* = 1, grad f(x_1) = (-1, -1)' and |grad f_1|^2 = 2
So
    S_1 = (1, 1)'    and    grad f(x_2) = (1, -1)', |grad f_2|^2 = 2
    S_2 = -(1, -1)' + (2/2)(1, 1)' = (0, 2)'
    x_3 = (1, 1)' + α_2* (0, 2)'
To find α_2* we minimise f(x_2 + α_2 S_2) = f((1, 1)' + α_2(0, 2)'):
    f(1, 1 + 2α_2) = 2 + (1 + 2α_2)^2 - 2(1 + 2α_2) - 1 - (1 + 2α_2) = 4α_2^2 - 2α_2 - 1
So df/dα_2 = 8α_2 - 2 = 0, giving α_2* = 1/4 and
    x_3 = (1, 1)' + (1/4)(0, 2)' = (1, 1.5)', which is the solution.
If we try another step we just find grad f_3 is zero and so the process stops, ie grad f is zero at the minimum.
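The two-step convergence can be checked directly. A minimal Python sketch of the Fletcher–Reeves form of the update above (names mine; the exact line search again uses the quadratic's Hessian):

```python
# Conjugate gradient on f = 2*x1^2 + x2^2 - 2*x1*x2 - x1 - x2 from (0, 0).
H = [[4.0, -2.0], [-2.0, 2.0]]

def grad(x):
    return [4*x[0] - 2*x[1] - 1, -2*x[0] + 2*x[1] - 1]

x = [0.0, 0.0]
g = grad(x)
S = [-g[0], -g[1]]                          # first direction: steepest descent
for _ in range(2):                          # 2 variables -> 2 steps suffice
    HS = [H[0][0]*S[0] + H[0][1]*S[1], H[1][0]*S[0] + H[1][1]*S[1]]
    alpha = -(g[0]*S[0] + g[1]*S[1]) / (S[0]*HS[0] + S[1]*HS[1])  # exact step
    x = [x[0] + alpha*S[0], x[1] + alpha*S[1]]
    g_new = grad(x)
    beta = (g_new[0]**2 + g_new[1]**2) / (g[0]**2 + g[1]**2)      # FR ratio
    S = [-g_new[0] + beta*S[0], -g_new[1] + beta*S[1]]
    g = g_new
```

The trace reproduces the worked example exactly: S_2 = (0, 2)', α_2* = 1/4 and x_3 = (1, 1.5)'.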
5.3 Newton's method
Newton's method allows for direction and step size and is built on looking for the roots of grad f(x).
First we approximate our function as a Taylor series:
    f(x) ≈ f(x_i) + grad f(x_i)'(x - x_i) + (1/2)(x - x_i)' H_i (x - x_i)
where here H_i is the matrix of second partial derivatives and is called the Hessian.
Now we set df/dx_j = 0 for j = 1, 2, ..., n for n variables.
So this gives grad f_i + H_i(x - x_i) = 0,
or x_{i+1} = x_i - H_i^{-1} grad f_i – this requires a non singular Hessian of course. (So the Hessian both modifies the search direction and sets the step length.)
Example – minimize f(x1, x2) = 2x1^2 + x2^2 - 2x1x2 - x1 - x2 starting at x_1 = (0, 0)'
    H = [ d2f/dx1^2    d2f/dx1dx2 ]  =  [  4  -2 ]    for all x_i
        [ d2f/dx2dx1   d2f/dx2^2  ]     [ -2   2 ]
    H^{-1} = (1/4) [ 2  2 ]
                   [ 2  4 ]
    grad f_i = (df/dx1, df/dx2)' = (4x1 - 2x2 - 1, -2x1 + 2x2 - 1)', so grad f_1 = (-1, -1)'
So
    x_2 = (0, 0)' - (1/4)[2 2; 2 4](-1, -1)' = (0, 0)' + (1/4)(4, 6)' = (1, 3/2)'
and grad f_2 = (0, 0)'.
Hence MINIMUM in 1 STEP. This has converged in one step because f(x) is quadratic and so H is a constant. There are however problems with this approach as we have to compute, invert and store H at each step and this is fraught with difficulties on real problems. The most serious issue is obtaining the second derivatives as these are very rarely available directly.
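The one-step convergence is trivial to verify (a minimal Python sketch, names mine):

```python
# One Newton step on the quadratic: x2 = x1 - H^{-1} grad f(x1).
def grad(x):
    return [4*x[0] - 2*x[1] - 1, -2*x[0] + 2*x[1] - 1]

Hinv = [[0.5, 0.5], [0.5, 1.0]]   # inverse of H = [[4, -2], [-2, 2]]
x = [0.0, 0.0]
g = grad(x)
x = [x[0] - (Hinv[0][0]*g[0] + Hinv[0][1]*g[1]),
     x[1] - (Hinv[1][0]*g[0] + Hinv[1][1]*g[1])]
# x is now the minimum (1, 1.5) after a single step
```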
5.4 Quasi-Newton methods
The Quasi-Newton methods work with an approximation of either the Hessian or its inverse. These are sometimes called variable metric methods. We already have
    x_{i+1} = x_i - H_i^{-1} grad f(x_i)    where H_i is the Hessian,
which we approximate by
    x_{i+1} = x_i - α_i* B_i grad f(x_i)
Here B_i contains directional information and α_i* is the optimal step length. Note that this is the steepest descent method if B_i = I.
There are then a number of schemes for updating B_i without using second derivatives but instead using approximations. None is perfect and they are known by the names of those who proposed them, such as BFGS – Broyden–Fletcher–Goldfarb–Shanno, which is
    B_{i+1} = B_i + [1 + (g_i' B_i g_i)/(d_i' g_i)] (d_i d_i')/(d_i' g_i) - (d_i g_i' B_i + B_i g_i d_i')/(d_i' g_i)
where d_i = x_{i+1} - x_i and g_i = grad f_{i+1} - grad f_i.
We do not pursue such methods further here; they are very popular however. See Figs 3.6, 3.7 in the book.
5.5 Non gradient based search methods
Pattern or direct search.
What do we do if we cannot calculate gradients (or do not wish to use finite differences – noise/speed)?
This leads to Hooke & Jeeves amongst others.
H&J method:
1 choose initial step length, set initial point to first base point
2 increase direction i by step and keep if better, else decrease direction i by step and keep if better
3 loop over all directions; if none improve then halve step size and repeat, unless either step too small or run out of time, in which case stop
4 explore must have helped so set current point to new base point
5 make pattern move equal to vector from previous base point to new base point plus any previous successful pattern move still in use
6 if pattern move helps keep it; if not go back to new base point and forget pattern move
7 repeat from step two
There are several themes here:
1 steps change in size for exploration
2 directions and steps change for exploitation – if the pattern moves help then they accumulate so that moves get bigger and bolder until they fail. Siddall provides full details and code, as does Schwefel.
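The steps above can be sketched compactly. This minimal Python version (names mine, and simplified: the pattern move is taken as a single base-to-base jump rather than the accumulated move of steps 5–6) still shows the explore/halve/pattern structure:

```python
# A compact Hooke & Jeeves style pattern search (gradient free).
def hooke_jeeves(f, x, step=0.5, min_step=1e-6, max_evals=10000):
    n = len(x)
    base, fbase = list(x), f(x)
    evals = 1
    while step > min_step and evals < max_evals:
        # exploratory moves about the current base point, one direction at a time
        pt, fpt = list(base), fbase
        for i in range(n):
            for delta in (step, -step):
                trial = list(pt)
                trial[i] += delta
                ftrial = f(trial)
                evals += 1
                if ftrial < fpt:
                    pt, fpt = trial, ftrial
                    break
        if fpt < fbase:
            # pattern move: jump along the base-to-base direction
            pattern = [pt[i] + (pt[i] - base[i]) for i in range(n)]
            base, fbase = pt, fpt
            fpat = f(pattern)
            evals += 1
            if fpat < fbase:
                base, fbase = pattern, fpat
        else:
            step *= 0.5               # no direction improved: halve the step
    return base, fbase

f = lambda x: 2*x[0]**2 + x[1]**2 - 2*x[0]*x[1] - x[0] - x[1]
x, fx = hooke_jeeves(f, [0.0, 0.0])   # converges towards (1, 1.5)
```

On the quadratic used earlier this recovers the minimum at (1, 1.5) with f = -1.25 using function values only.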
5.6 Stochastic/Evolutionary search
Run through flying circus slides on simple GA’s.
[Figure: a basic GA flowchart.]
[Figure: typical search patterns from a GA, Simulated Annealing, an Evolution Strategy and Evolutionary Programming.]
5.7 Termination/Convergence
For local searches we stop at the optimum, ie when no further gains are being made – provided we can afford to get that far.
For global search we use one of:
 a fixed or limited number of iterations
 a fixed or limited elapsed time
 stopping when the search has stalled after a given number of iterations
 stopping when a given number of basins have been found and searched.
We rank searches by steepness of gradient/rate of improvement, final result or a balance between the two.
6 CONSTRAINED OPTIMIZATION
In most real world engineering problems the designer has to satisfy various constraints as well as meeting the desire for improved performance. Indeed performance is often set as a constraint, ie reduce weight to below X, reduce drag to less than Y, etc. Thus we need search schemes to deal with constraints, ie
    min f(x) for x = (x1, ..., xn)'
subject to bounds on x:  xL ≤ x ≤ xU
and constraints  g_i(x) ≤ 0  (inequality constraints)
                 h_j(x) = 0  (equality constraints)
Here we describe a number of approaches.
6.1 Constraint elimination by construction
The simplest is to try and eliminate constraints by construction – ie transform problem variables using the constraints.
Example: minimise the surface area of a box of given volume,
ie min f(B, H, W) = 2(BH + WB + HW)
where V = WBH is fixed,
so let W = V/(BH).
We then have
    minimize f(B, H) = 2(BH + V/H + V/B)
So
    df/dH = 2(B - V/H^2) = 0 when B = V/H^2, ie V = BH^2
    df/dB = 2(H - V/B^2) = 0 when H = V/B^2, ie V = HB^2
Combining gives BH^2 = HB^2, so B = H and H^3 = V,
ie H = V^(1/3), B = V^(1/3) and W = V^(1/3) – all sides of equal length, as expected.
Another way we deal with inequalities is by deciding if they will be active at the optimum or not. If so we replace them by equalities, and if not we eliminate them. Often it is not possible to know which inequalities will be active, or to eliminate them using algebra even if we do! Nonetheless we should not ignore this approach. It can sometimes be done numerically, ie fixed CL calcs when angle of attack is a design variable for a wing or aerofoil.
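The cube result is easy to confirm numerically once W has been eliminated. A minimal Python check over a grid (illustrative only; values and ranges are mine):

```python
# With W eliminated via W = V/(B*H), the area 2*(B*H + V/H + V/B)
# should be smallest when B = H = V**(1/3), here 2 for V = 8.
V = 8.0
area = lambda B, H: 2*(B*H + V/H + V/B)
best = min(((area(b/100, h/100), b/100, h/100)
            for b in range(50, 401) for h in range(50, 401)))
# best = (24.0, 2.0, 2.0): a cube of side 2 with area 24
```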
6.2 Lagrange multipliers
In just the same way as there are formal analytic solutions to unconstrained optimization problems, the equivalent constrained solutions are based on Lagrange multipliers. This approach essentially only works for equality constraints, so for inequality constraints a precursor step is to decide at any point if an inequality constraint will be active, and if so replace it with an equality.
So consider min f(x1, x2) subject to g(x1, x2) = 0,
ie two variables and one equality constraint.
At a minimum it may be shown that
    df/dx1 + λ dg/dx1 = 0
    df/dx2 + λ dg/dx2 = 0
and g(x1, x2) = 0.
Here λ is the so called Lagrange multiplier.
[Figure: two ways of handling a fixed lift constraint – optimise over GEOM & ALPHA with the CFD returning CL and CD and the optimiser holding CL fixed, OR optimise over GEOM alone with the CFD iterating on alpha internally to hold CL and returning CD.]
Now if we write L = f + λg we get
    dL/dx1 = df/dx1 + λ dg/dx1
    dL/dx2 = df/dx2 + λ dg/dx2    (all equal zero at the minimum, from the previous equations)
    dL/dλ = g
Thus if we seek the unconstrained minimum of L (more precisely, turning points of L) we can locate the solution to the constrained problem. L is known as the Lagrange function.
For example, minimize f(x, y) = k/(xy^2)
subject to g(x, y) = x^2 + y^2 - a^2 = 0 (ie a circle of radius a).
Here L(x, y, λ) = f + λg = k/(xy^2) + λ(x^2 + y^2 - a^2)
    dL/dx = -k/(x^2 y^2) + 2λx = 0, so λ = k/(2x^3 y^2)
    dL/dy = -2k/(xy^3) + 2λy = 0, so λ = k/(xy^4)
    dL/dλ = x^2 + y^2 - a^2 = 0
Equating the two expressions for λ gives y^2 = 2x^2, and so from the constraint x = a/√3 and y = √(2/3) a.
Note however that we cannot simply minimize L, as the approach would admit saddlepoints or maxima for the gradients of L to be zero.
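The stationary point found above can be checked without calculus by scanning round the circle. A minimal Python check (parameterisation and values are mine, not from the notes):

```python
import math

# Minimise f = k/(x*y^2) on x^2 + y^2 = a^2 by scanning x = a*cos(t),
# y = a*sin(t) over the first quadrant (where f > 0 and finite).
k, a = 1.0, 1.0
best = min((k / (a*math.cos(t) * (a*math.sin(t))**2), t)
           for t in [i*math.pi/2/10000 for i in range(1, 10000)])
x, y = a*math.cos(best[1]), a*math.sin(best[1])
# x and y approach a/sqrt(3) and sqrt(2/3)*a, as the multipliers predict
```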
6.3 Penalty function methods
A more direct approach to dealing with constraints is via the use of penalty functions – we simply add penalties to the objective function when constraints are violated. There are a number of ways of doing this, none of which is perfect:-
FIXED PENALTIES
    add a (very) large number to the objective if any constraint is broken
    add a (very) large number for each broken constraint
VARYING PENALTIES
    a function of the degree of constraint violation – scales the penalties by the constraint violation
    a function of how long we have been searching – start with low penalties and gradually make them more severe, so that an essentially unconstrained search becomes a fully constrained one
All these are taken to be exterior penalties, ie they only apply to broken constraints – we can also use interior penalties which come into effect as the search nears a constraint and then gradually remove these as we progress so as to 'warn' the search about nearby problems.
Sketch Penalty Types:
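A minimal Python sketch of an exterior quadratic penalty with an increasing weight (the toy problem and all names are mine, not from the notes):

```python
# Exterior quadratic penalty: minimise x^2 subject to x >= 1 (ie g = 1 - x <= 0)
# by penalising violations and increasing the penalty weight r each cycle.
def penalised(x, r):
    violation = max(0.0, 1.0 - x)        # amount by which x >= 1 is broken
    return x*x + r * violation**2

def minimise_1d(F, a=-5.0, b=5.0, n=200001):
    return min((F(a + i*(b - a)/(n - 1)), a + i*(b - a)/(n - 1))
               for i in range(n))[1]

for r in (1.0, 10.0, 100.0, 1000.0):
    x = minimise_1d(lambda x: penalised(x, r))
    # the penalised minimum x = r/(1 + r) approaches the constrained
    # optimum x = 1 from the infeasible side as r grows
```

This shows the typical exterior penalty behaviour: the solution approaches the constraint boundary from outside and only reaches it in the limit of a severe penalty.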
6.4 Combined Lagrange and penalty function method
It is possible to combine the Lagrange scheme with a penalty approach to overcome some of the difficulties of pure Lagrange methods. This is sometimes called the Augmented Lagrange Multiplier method,
ie minimize f(x) subject to h_j(x) = 0, j = 1, 2, ..., p
    L(x, λ) = f(x) + Σ_{j=1..p} λ_j h_j(x)    is the Lagrangian.
We augment this with an exterior penalty:
[Figure: sketches of penalty types – step, interior and exterior penalties added to the objective function (OF) against x.]
    A(x, λ, r_k) = f(x) + Σ_{j=1..p} λ_j h_j(x) + r_k Σ_{j=1..p} h_j^2(x)
It now turns out that minimizing A solves the original problem if we have the correct λ for any r_k. However we can apply an iterative scheme that will allow the λ_j and r_k to converge on a solution, provided r_{k+1} ≥ r_k, and we use
    λ_j^(k+1) = λ_j^(k) + 2 r_k h_j(x*)
ie the new λ's are added to by the (scaled) amount of violation of the constraints at the previous minimum of A.
This approach can also be extended to inequality constraints by setting up as follows: min f(x) subject to h_j(x) = 0, j = 1, ..., p and g_i(x) ≤ 0, i = 1, ..., m
    A(x, λ, r_k) = f(x) + Σ_{j=1..p} λ_j h_j(x) + Σ_{i=1..m} λ_{p+i} ψ_i + r_k [Σ_{j=1..p} h_j^2(x) + Σ_{i=1..m} ψ_i^2]
where ψ_i = max(g_i(x), -λ_{p+i}/(2 r_k)).
6.5 Sequential quadratic programming method (SQP)
The use of sophisticated Lagrangian processes is now at its most complex and powerful in the class of methods known as SQP – these typically use Quasi-Newton methods to solve a series of sub-problems. They are the most powerful methods available for local minimization of constrained smooth problems. Academic codes are available from the web. They are less good for non-smooth functions; also they are local methods and so cannot find the best basin of attraction to search.
6.6 (Chromosome) repair
Repair is the process of dealing with a constrained optimization problem by substituting feasible designs
whenever infeasible ones occur during search. To do this a repair process is invoked if any constraint is
violated to find the nearest feasible design. Here nearness is usually in the Euclidean sense of design
variables. Having located such a design (perhaps by a local search where the degree of infeasibility is set as a
revised objective) the objective function of the feasible design is used instead of that at the infeasible point and
also (optionally) the corrected design vector.
Replacing the design vector absorbs most information but can cause problems with the search engine. This
approach is most favoured in evolutionary or other zeroth order methods where gradients are not used at all.
7 META-MODELS + RSM
So far we have considered optimizers working with results coming from the evaluations of design & constraint
functions that have been presumed to be directly coupled to search codes. These codes then build up a
‘picture’ of how the function is changing with changes in the design and seek improvements. Their internal
models (we will call them meta-models to distinguish from the actual user supplied design models) are implicit in
their working.
We next consider schemes where the building & use of the meta model is explicit and directly controlled by the
user.
At its simplest this consists of running a few designs, collecting the results and curve fitting to these. Then the
curve fits can be used for design search. This would be a natural approach for working with data from previous
designs or from experiments or field trials – it can also be used with computer analysis codes.
We first plan where to run the code to generate data. This can aim to build either a local or a global model
depending on the range of the design variables. We use formal DoE (Design of Experiment) methods for this
(cf Taguchi). Having run the design points, often in parallel, we curve fit. Here again we decide if we need a
local (simple) fit or a global (complex) shape & also if we need to regress (discuss noisy data). Curve fitting can
be fast for simpler models or very slow for large accurate ones.
We call the curve fit a Response Surface Model (RSM) or meta model. Examples include polynomial regression, radial basis functions, kriging and neural nets.
Having built a model we check its accuracy with test data (separate) or cross validation. We then use it to
search for a better design. Having found new candidate designs we run the full computer code to check if they
are good. If so we might stop. More usually we add these to the curve fit & iterate – updating until we run out of effort or we get convergence etc.
To summarise: the basic steps are
1) plan an experiment to ‘sample’ the design space
2) run codes & build ‘data-base’ of results (possibly in parallel)
3) choose & apply a curve fit, with or without regression
3a) refine curve fit by some tuning process
4) predict new ‘interesting’ design points by searching the meta-model
5) run codes on new point(s) & update data-base (again possibly in parallel)
6) check the results from update points against predictions & then either stop or move back to step 3)
Experience shows that for model building it can often take 10n initial designs to build a reliable global model
where n is the number of variables. There is also a trade between the cost of building a meta-model and the
usefulness of its predictions.
7.1 Exploitation versus exploration
When constructing and using RSM’s thought must be given to the balance of effort used between exploiting and
exploring the problem. Exploration is the placing of new calculation points in regions so far unsampled – if the
problem under study has more than a few dimensions there will be many areas where fresh sampling may
reveal new trends. Equally if one does not exploit the available information by closing in on promising areas the
search may simply end up as a random walk. This balance between exploring new areas and exploiting known
results can be illustrated by considering using a few downhill searches starting from the most promising results
from a small random DoE. If the DoE is too small then the best areas for search may be missed. Conversely if
the downhill search is limited in scope it may not effectively reach the best design in a local region. One set of
methods that explicitly deal with this dichotomy are the so called probability of improvement methods that use
measures of uncertainty in the predictions generated by the RSM.
7.2 Local trust region search
A very simple approach is to evaluate a small local experiment and then shift and shrink it until a certain effort is used up.
1. choose initial area to search
2. sprinkle in 9 points using an LP DoE
3. curve fit with quadratic regression polynomial†
4. search within area over RSM to get new candidate design
5. shift search region centre to new candidate point
6. shrink search region by say 10%
7. replace oldest design point with new result
8. go to step three unless run out of time
† to solve the regression we use SVD to get a least squares solution to the over constrained non square matrix equation A.s = y, where y are the function values, A are the design variable values and their powers and s are the polynomial coefficients. So if the SVD of A is U.w.V' then it may be shown that {s} = [V.diag(1/w_j).U']{y}; see the next section and also Matlab for simple examples of SVD.
This simple scheme has a number of faults:
    it has no way of expanding the trust region if the data suggests it should be – this means it may be a) slow and b) fail to find a local minimum of the function.
    the point being replaced, the oldest, may not be the most sensible one to discard – what about discarding the worst point, for example?
7.3 Singular Value Decomposition
To solve a general regression problem we can use SVD to get a least squares solution to the relevant over constrained non square matrix equation. This is possible because any matrix with more rows than columns (and also any square matrix) can be decomposed into three matrices as follows:
    A = U . diag(w_j) . V',
where the prime indicates the transpose. The properties of these matrices are such that U'.U = V'.V = I. That is, U is a column orthogonal matrix, V is a square orthogonal matrix and diag(w_j) is a diagonal matrix of the singular values. The inverse of A, or at least a least squares approximation to it if there are more rows than columns, may then be written as
    A^{-1} = V . diag(1/w_j) . U'.
This allows one to solve regression problems by setting up the problem as A.s = y, so that the least squares solution becomes s = V . diag(1/w_j) . U' . y.
So if for example there are four data points given at x = -1, 0, 1, 2 with function values of y = 1, 0, 1, 2 at these x values, and the aim is to find the coefficients of the fitting parabola a, b and c, we set up the matrix equation
    [ 1  -1   1 ]           [ 1 ]
    [ 0   0   1 ]  [ a ]    [ 0 ]
    [ 1   1   1 ]  [ b ] =  [ 1 ]
    [ 4   2   1 ]  [ c ]    [ 2 ]
Then using the SVD of the non-square matrix we get
    [ a ]   [  0.25  -0.25  -0.25   0.25 ]  [ 1 ]
    [ b ] = [ -0.55   0.15   0.35   0.05 ]  [ 0 ]
    [ c ]   [  0.15   0.55   0.45  -0.15 ]  [ 1 ]
                                            [ 2 ]
so that a = 0.5, b = -0.1 and c = 0.3, and the fitting parabola is y = 0.5x^2 - 0.1x + 0.3.
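These coefficients are easy to reproduce. In practice an SVD routine (eg Matlab's svd, or numpy.linalg.lstsq) would be used; as a self-contained check, this pure-Python sketch (names mine) solves the equivalent 3x3 normal equations instead, which give the same least squares fit for this small, well conditioned example:

```python
# Fit y = a*x^2 + b*x + c to x = -1, 0, 1, 2, y = 1, 0, 1, 2 in the least
# squares sense via the normal equations (A'A) s = A'y.
xs, ys = [-1.0, 0.0, 1.0, 2.0], [1.0, 0.0, 1.0, 2.0]
A = [[x*x, x, 1.0] for x in xs]

AtA = [[sum(A[k][i]*A[k][j] for k in range(4)) for j in range(3)]
       for i in range(3)]
Aty = [sum(A[k][i]*ys[k] for k in range(4)) for i in range(3)]

# Gauss-Jordan elimination with partial pivoting on the 3x3 system
M = [row[:] + [rhs] for row, rhs in zip(AtA, Aty)]
for col in range(3):
    piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
    M[col], M[piv] = M[piv], M[col]
    for r in range(3):
        if r != col:
            fac = M[r][col] / M[col][col]
            M[r] = [mr - fac*mc for mr, mc in zip(M[r], M[col])]
a, b, c = (M[i][3] / M[i][i] for i in range(3))
# a = 0.5, b = -0.1, c = 0.3 as in the worked example
```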
7.4 Global RSM search
1) Here we first use 100 points in an LP array to sample the design space.
2) Then we construct a krig (stochastic or Gaussian process) RSM which has hyper-parameters which we tune.
3) We then search for peaks and return 10 likely locations.
4) We add these to the original 100 pts to get 110 and we rebuild and retune the krig.
5) We then return to step 3 and repeat 3 times, ending with 130 points and the final model.
Points to note:
    the use of a large initial DoE is warranted here because of the multi-modality of the problem (cf the 20 points we might otherwise use).
    the updates are added in groups of 10 because we wish to improve the model globally and not just in one location; also krig training is costly.
    the final surface is reasonable but still far from exact.
7.5 Meta Heuristics
It is clear from the two previous searches that what we have done is combine components such as DoE sampling, RSM building and various searches to build a composite or 'meta-search'. It is of course possible to build more and bigger complexities into such approaches and this leads to a whole family of 'meta heuristics'.
By way of example we can consider combining various gradient descent schemes and then observing which works best and rewarding this with more of our finite budget of compute resource. The development of such schemes is an art, and also one must bear in mind that the 'no free lunch' theorem shows that, averaged over all possible problems, all searches are as good or bad as each other – so unless our efforts are based on tuning a method to the current task they will be futile – moreover there will be a trade between performance on a specialised task and general applicability.
7.6 Visualization
Visualization of results can be very important, particularly in unfamiliar problems and where there are more than two or three variables. Parallel axis and Hierarchical Axis Technique (HAT) plots can be very helpful as can polar graphing.
[Figure: contour plot of CD from a kriging RSM built on a 200-point Latin hypercube DoE, with constraint contours overlaid and fast and slow regions marked.]
[Figure: parallel axis plot over wing area (m2), aspect ratio, kink position, sweep angle (degrees), inboard and outboard taper ratios, root/kink/tip t/c, tip washout (degrees), kink washout fraction, wing weight (N), wing volume (m3), pitch up margin, U/C bay length (m), D/q (m2) and cost (£m), comparing the initial, low drag, DoE point 157 and low cost designs.]
8 MULTI-OBJECTIVE OPTIMIZATION
So far we have focussed on problems with a single goal or objective function. This is rarely how real design
problems occur, although it is quite common, even in industry, to treat them this way. In reality most real design
problems involve trading between multiple and conflicting goals. We therefore next turn to ways of tackling such
problems.
Perhaps the simplest approach (and that most commonly used in industry) is to set all goals except one in the form of constraints, ie instead of aiming for low weight or stress we set upper limits on these goals and then ensure that our designs meet them. The difficulty with this approach is deciding realistic but demanding targets: if they are too severe we may not be able to satisfy them at all; if too loose they may not impact the design at all.
The next most simple way of proceeding is to use an aggregate or combined objective. Typically we add all our
goals together with some suitable weighting functions and minimize this. This approach is a mimic of the
function of money – money is society’s way of allowing completely different things to be balanced against each
other (the cost of a holiday v a new car for example). It is the function of markets to establish the prices of items
and hence the weighting between them. Ideally the best approach to balancing competing goals for a business engaged in design is to reduce all decisions to their impact on the company's profits. Unfortunately this
calculation is almost never possible so some surrogate is used. This may be completely artificial or it may be
some physical quantity such as SFC (aero engine makers often use SFC).
It should be clear that if we consider two goals (say f1(x) and f2(x)) then, depending on how we weight them,
    f(x) = A f1(x) + B f2(x),
our final optimum will vary. For example consider
    min f1(x) = x^2
    min f2(x) = x^2 - 4x
then f(x) = A x^2 + B(x^2 - 4x) = (A + B)x^2 - 4Bx.
Now we only really need one weight here, c = B/A, so we get
    f(x)/A = (1 + c)x^2 - 4cx
    f'(x)/A = 2(1 + c)x - 4c = 0 when x = 2c/(1 + c)
So for each value of c we get a different solution and different values of f1 and f2:
The best designs in our search space are said to be the dominant ones and these are defined formally by the
dominance test: x1 dominates x2 if 1) solution x1 is no worse than x2 in all objectives and 2) solution x1 is strictly
better than x2 in at least one objective. Given a set of solutions, the non-dominated solution set is a set of all the
solutions that are not dominated by any member of the solution set. The non-dominated set of the entire
feasible decision space is called the Pareto-optimal set. The boundary defined by the set of all points mapped
from the Pareto optimal set is called the Pareto-optimal front, or Pareto Front for short. Points lying on the Pareto Front are said to have dominance rank one. If these are removed from the data and a second Pareto Front established, those designs are said to have dominance rank two, and so on until all points have been classified by rank. The quickest way of finding this set when there are only two objectives is to simply plot the points out and mark off the points that lie on the best side of the data (lower left for a dual minimization problem).
Rank can be effectively used when constructing multi-objective search engines, the best known of which is the
Non-dominated Sorting Genetic Algorithm (NSGA).
When we have more than two objectives and possibly very many solutions we need an efficient algorithm to
establish the Pareto Front. A simple way to do this is the method proposed by Mishra and Harit:
1. Sort all the solutions (P1...PN) in decreasing order of their first objective function (F1) and create a sorted
list (O). If any solutions have the same value for the first objective then sort on the second objective to
order these designs, similarly if the first two are equal use the third to sort and so on.
2. Initialize a set S1 and add first element of the sorted list O to S1.
3. For every solution Oi,i≠1 of list O, compare solution Oi with the solutions of S1:
a. If any element of set S1 dominates Oi, delete Oi from the list and place in the set of dominated
designs;
b. If Oi dominates any solution of the set S1, delete that solution from S1 and place in the set of
dominated designs;
c. If Oi is non-dominated by set S1, then update set S1 = S1 U Oi;
    c      x*     f1(x*)    f2(x*)
    0      0      0         0
    1      1      1         -3
    2      4/3    1 7/9     -3 5/9
    1/2    2/3    4/9       -2 2/9
[Figure: sketch of f1 v f2 showing the good designs lying on the PARETO FRONT.]
d. If set S1 becomes empty add Oi to S1.
4. Print non-dominated set S1.
5. Repeat process with dominated designs to find next rank of designs to create sets S2, S3 and so on.
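The sorted-list filter of steps 1–3 can be written very compactly. A minimal Python sketch (function names are mine), applied to the ten spreadsheet points discussed next, for minimisation of both objectives:

```python
# Non-dominated filtering via a list sorted on f1 (ties broken on f2).
def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q)) and
            any(a < b for a, b in zip(p, q)))

def pareto_front(points):
    O = sorted(points)                  # step 1: sort on f1, then f2
    S = [O[0]]                          # step 2
    for p in O[1:]:                     # step 3
        if any(dominates(s, p) for s in S):
            continue                    # 3a: p is dominated, discard
        S = [s for s in S if not dominates(p, s)]   # 3b: p removes members
        S.append(p)                     # 3c (this also covers 3d)
    return S

# ten samples with f1 = x^2, f2 = x^2 - 2x at x = -0.75, -0.5, ..., 1.5
pts = [(x*x, x*x - 2*x) for x in [-0.75 + 0.25*i for i in range(10)]]
front = pareto_front(pts)               # five points, those with x = 0 ... 1
```

Repeating the call on the dominated remainder yields the rank two, three and four sets, exactly as in step 5.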
The following page shows this process as part of a spreadsheet, starting with 10 initial sample points of which
five lie on the Pareto front, two in the second rank, two in the third and one in the fourth.
[Spreadsheet example: ten samples at x = -0.75 to 1.5 in steps of 0.25 with f1 = x^2 and f2 = x^2 - 2x. Sorting on f1 (ties on f2) and applying the filter above gives the rank one set at x = 0, 0.25, 0.5, 0.75 and 1; rank two at x = -0.25 and 1.25; rank three at x = -0.5 and 1.5; and rank four at x = -0.75, as shown in the accompanying f1 v f2 chart.]
This process is used recursively by removing the dominant set from the data to establish the lower ranking sets.
It also works extremely quickly for problems with two objectives, simply requiring that data be kept in sorted
order of the first objective as the dominant solutions are identified.
8.1 Methods for combining goal functions
It will be clear from considering Pareto fronts and simple weighted sums of goals that deciding how to combine
goals will define the final designs selected. Before looking at more advanced schemes for finding the front we
briefly explore how a design team might decide on a combined objective, either to reduce the problem to a
single goal or to allow selections to be made from those designs that are found to lie on the Pareto front. All
such methods attempt to formalise the process of assigning importance to the goals under review.
a) simple voting schemes – each design team member ranks the goals, the goals are then given points from 1 (lowest rank) to n (highest rank in n goals), and then these are summed across the team to result in a weighting scheme. Before application all goals are divided by the designer's ideal value (or the ideal is subtracted from the goal) to allow for the units in use. This simply ensures that all voices are heard.
b) The eigenvector method. Each pair of goals is ranked on a matrix by being given a preference ratio, ie if
goal i is three times more important than goal j we set 3ijp . Then say goal j is twice as important
as goal k we set 2jkp etc. To be consistent we should of course say that 6ikp but in fact the
method does not require this. In any case we then form all the p values into the matrix P and seek W so
that wPw max , ie the eigenvectors of p are found.
We then take the eigenvector with the largest eigen value and use this as our weighting scheme.
If we have
121
61
2131
631
p
We get w=
111.0
222.0
667.0
The largest eigen value should equal the number of goals if the p’s are consistent as here (we get a value
of 3).
Say however we were not consistent and used
Design Search and Optimisation Course Notes – January 2020
47
121
51
2131
631
p
Then we get a largest eigenvalue of 3.004 and the weight vector becomes w = (0.648, 0.230, 0.122).
That is we decrease the importance of goal 1 and increase that of goals 2 and 3.
This scheme is simple and easy to use for up to around 10 goals, and is quite useful when there are more than four goals, where it is difficult to assign numerical values to the weights in a consistent fashion. It is still the case, however, that the aggregate goal is a simple linear sum of the individual functions.
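The consistent example above can be checked with a short power-iteration sketch (numpy.linalg.eig would serve equally well; this is illustrative code, not the notes' own):

```python
# Eigenvector (AHP-style) weighting: power iteration finds the principal
# eigenvector of the preference matrix P, normalised so the weights sum to 1.
def principal_eigenvector(P, iters=200):
    n = len(P)
    w = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        Pw = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Pw)          # equals the eigenvalue once w is converged
        w = [v / lam for v in Pw]
    return lam, w

if __name__ == "__main__":
    P = [[1, 3, 6], [1/3, 1, 2], [1/6, 1/2, 1]]   # consistent matrix from the notes
    lam, w = principal_eigenvector(P)
    print(round(lam, 3), [round(v, 3) for v in w])   # 3.0 [0.667, 0.222, 0.111]
```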
One way of combining goals that is more elaborate is via the use of fuzzy logic. Thus we define a series of linguistic terms that describe our goal and map these to a score:
Thus a bad value scores nothing and a good value scores 1 while those in the indifferent range have
intermediate scores. If we do this to all functions the resulting scores can then be combined by adding
(essentially an average) or multiplication (a geometric average). We then maximize the combined function.
Various shapes for the so called membership functions can be used but there seems little to be gained from going beyond the linear form sketched here.
When used, these membership functions essentially allow non-linear combinations of goal functions, which clearly permits more complex combined goals – however, if taken too far they can obscure the overall problem!
Sketch: a linear membership function – the score rises from 0 in the BAD region, through the INDIFFERENT range, to 1 in the GOOD region.
Design Search and Optimisation Course Notes – January 2020
48
Below is an example of fuzzy logic for the functions f1=1/x and f2=x^2, where both functions are considered unacceptable when above 2 and acceptable when below ½. We then use a linear scaling of the two functions between these limits, i.e., the membership is proportional to the function, equal to zero when the function is 2 and unity when it is equal to ½. Thus the equation for the varying part of each membership function is memb = 4/3 − 2f/3 (i.e., when f is 2, memb is 0 and when f is ½, memb is 1, with a linear variation – the two are the same here simply because the two sets of limits on the functions are the same).
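This example can be sketched numerically; the grid search and its step size below are arbitrary choices, not part of the notes:

```python
# Linear membership: 0 at f = 2 (unacceptable), 1 at f = 1/2 (acceptable),
# i.e. memb = 4/3 - 2f/3 clipped to [0, 1]; the two scores are combined by
# multiplication and the combined score maximised by a simple grid search.
def memb(f):
    return min(1.0, max(0.0, 4.0 / 3.0 - 2.0 * f / 3.0))

def combined_score(x):
    f1 = 1.0 / x   # first goal, f1(x) = 1/x
    f2 = x * x     # second goal, f2(x) = x^2
    return memb(f1) * memb(f2)   # multiplicative (geometric-style) combination

if __name__ == "__main__":
    xs = [0.5 + 0.001 * i for i in range(1, 1000)]   # scan x in (0.5, 1.5)
    best = max(xs, key=combined_score)
    print(round(best, 3), round(combined_score(best), 3))
```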
Sketches: the membership of f1 rises linearly from 0 at xa to 1 at xb (and similarly for f2 between xc and xd); the combined score then forms the overall goal.
8.2 Methods for finding Pareto fronts

Sometimes we do not wish to combine goals without first examining the Pareto front itself. Thus we need to construct the front. In doing so we have three aims:
1) the designs we study truly lie on the front, i.e., they are well optimised;
2) the front has many designs that span its full extent, i.e., it is 'well' populated;
3) the points are evenly spread on the front, i.e., we have a smooth range of goals.
This is in fact an optimization task in its own right and may be tackled in a variety of ways.
A Perhaps the simplest is to construct a family of different combined goals with various weighting schemes and then optimize each of these (including dealing with each goal on its own). Although this does not tackle point 3 above, it focuses on 1 and gives as many points as desired for 2. It is, however, expensive and known to fail to populate the front evenly, especially if the front is concave (using a linear sum of goals is equivalent to finding the intersection of a target line and the front, and such targets only exist for convex fronts).
B The next best scheme is to use an optimizer to explore the design space, placing any new non-dominated points in an archive (and weeding out any dominated ones). All new design points are then given a goal value based on how much they improve the archive – i.e., how dominant they are. This means that the objective function is non-stationary but, provided our search is tolerant of this, the approach works fine (an evolution strategy works quite well here).
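Scheme (B)'s archive bookkeeping can be sketched as follows (the dominance scoring shown is one simple choice among many, and the point data are invented):

```python
# Archive of non-dominated points: each new point is scored by how much it
# improves the archive; dominated archive members are weeded out on insertion.
# Both goals are minimised.
def dominates(p, q):
    """True if p is at least as good as q in every goal and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def update_archive(archive, new_point):
    """Insert new_point if non-dominated; return (new archive, score)."""
    if any(dominates(a, new_point) for a in archive):
        return archive, 0                       # dominated newcomer scores nothing
    score = sum(dominates(new_point, a) for a in archive)
    archive = [a for a in archive if not dominates(new_point, a)]
    archive.append(new_point)
    return archive, 1 + score                   # rewarded for improving the front

if __name__ == "__main__":
    archive = []
    for pt in [(3.0, 3.0), (1.0, 4.0), (2.0, 2.0), (5.0, 5.0)]:
        archive, score = update_archive(archive, pt)
        print(pt, score, sorted(archive))
```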
Plot: f1(x) = 1/x and f2(x) = x^2 against the variable x, together with their memberships memb1 and memb2, the combined score, and the upper and lower limits on f(x).
C Use a multi-objective population based search (such as a GA). Here we aim to advance the whole front
in one go so the points in the population are scored against each other and those that dominate most
score most – this is capable of meeting all three goals if some pressure to spread out points is included.
Its weakest aspect is probably in finding the extreme ends of the front but these can be found from single
objective searches directly on each goal.
It is also possible to use response surface schemes to help reduce run times when finding Pareto fronts.
9 ROBUSTNESS IN OPTIMIZATION AND UNCERTAINTY QUANTIFICATION (UQ)
Increasingly designers must also contend with the fact that their designs and analyses are subject to a range of
uncertainties and that designs optimized purely for nominal performance may suffer from significantly degraded
performance when subject to such variations. These uncertainties stem from a range of sources:
• limits on accuracy during manufacture,
• variations in operating conditions,
• wear and degradation in service,
• limited accuracy in the physics of the computational models invoked,
• limited convergence in any iterative numerical scheme used in computations,
• round-off and discretization errors.
It is therefore of increasing importance to study designs from a stochastic perspective and to formally quantify
the impact of such uncertainties on predicted design performance rather than relying on ad hoc factors of safety
and tolerance settings. One way of doing this is to invoke the formalism of Robust Design Optimization (RDO),
which leads naturally to a multi-objective problem where designers seek to improve the mean performance of a
design while guaranteeing that fall-off away from mean conditions is strictly controlled.
RDO can, however, be carried out in a range of ways and can additionally make very effective use of surrogate
approaches to model building. The following figures illustrate the basic idea of robustness.
Gaussian Noise
Design 1 is more sensitive than 2 even though f(x1) is better than f(x2)
Uniform Noise
This kind of lack of robustness in design variables (design 1 above) is common when dealing with constrained problems – designers typically stay well away from critical stress limits where they can, for example to avoid unexpected structural failures.
More controlled ways of working all involve trying to simulate uncertainty in the design calculations being used –
so called Uncertainty Quantification or UQ. A prerequisite is to know something of the real uncertainties
anticipated. This requires DATA.
If however we know something of the uncertainties inherent in the design, manufacture and operation we can
attempt to account for this.
N.b., the mean or expected value of a function is given by integrating the function multiplied by the probability density function (PDF, shown in blue in the figures above) from minus infinity to plus infinity, i.e., μ_f = E[f(x)] = ∫ f(x)·PDF(x) dx, with the integral taken from −∞ to +∞. An approximation to the mean may often be calculated
by averaging over an ensemble of appropriately spaced values of the function, so for example a typical way of approximating the mean is by assuming μ_f ≈ (1/n) Σi=1..n f(xi) as n → ∞, the so called Monte Carlo
method. Note also that for a uniform distribution between lower limit a and upper limit b, the PDF has height 1/(b−a), mean (a+b)/2 and variance (b−a)²/12.
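These uniform-distribution facts are easy to check by Monte Carlo; the limits, sample size and seed below are arbitrary choices:

```python
# Monte Carlo check that a uniform distribution on (a, b) has mean (a+b)/2
# and variance (b-a)^2/12. A fixed seed keeps the sketch reproducible.
import random

def mc_mean_var(a, b, n=200_000, seed=1):
    rng = random.Random(seed)
    xs = [rng.uniform(a, b) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

if __name__ == "__main__":
    mean, var = mc_mean_var(2.0, 5.0)
    print(round(mean, 2), round(var, 2))   # analytic values: 3.5 and 0.75
```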
9.1 Monte-Carlo method

The most obvious, simple and direct scheme for UQ is the Monte Carlo approach. We simply generate a series of scenarios using suitably biased random numbers and run our design calculations for each before working with mean or worst case designs – unfortunately generating such means or worst cases requires hundreds of simulations, which is usually far too expensive. Costs can, however, be reduced by adopting sampling plans that are not purely random. Schemes such as LPτ sampling can give much better convergence than purely random ones. It is also possible to design a sampling sequence specifically to match the problem being studied so as to accelerate the convergence of statistics – the so called quasi-Monte Carlo approach.
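The benefit of low-discrepancy sampling can be illustrated with the base-2 van der Corput sequence standing in for LPτ (the LPτ generator itself is not reproduced here; the test integrand and sample size are arbitrary):

```python
# Pseudo-random versus low-discrepancy sampling for estimating the mean of
# f(x) = x^2 over [0, 1] (true value 1/3).
import random

def van_der_corput(n, base=2):
    """First n points of the base-2 van der Corput low-discrepancy sequence."""
    seq = []
    for i in range(n):
        x, denom, k = 0.0, 1.0, i
        while k:                     # reverse the digits of i about the point
            denom *= base
            k, rem = divmod(k, base)
            x += rem / denom
        seq.append(x)
    return seq

def mean_estimate(xs):
    return sum(x * x for x in xs) / len(xs)

if __name__ == "__main__":
    n = 1024
    rng = random.Random(0)
    err_random = abs(mean_estimate([rng.random() for _ in range(n)]) - 1 / 3)
    err_vdc = abs(mean_estimate(van_der_corput(n)) - 1 / 3)
    print(err_random, err_vdc)   # the low-discrepancy error is typically much smaller
```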
9.2 Design of Experiment methods

The next approach is to replace full Monte Carlo sampling with limited size DoE variations around each design point to gain some idea of local sensitivities. Typically one uses between five and 30 variations at each design to try and characterize the issues. Taguchi arrays are typical of this kind of approach and have been widely used in industry.
9.3 Noisy Phenotype

A third approach is the so called 'noisy phenotype'. In this case a standard design search is carried out but at each iteration noise is added to the design variables. The resulting perturbed design results are then used to characterise the design. This makes all derived quantities non-stationary during the search, so a method tolerant of this must be used. Sometimes the nominal and perturbed designs are both evaluated and the worse of the two used as the characterising design. When used with a GA, for example, this tends to mean only robust designs survive the evolutionary process.
9.4 Response surface approach

A method increasingly popular in industry is to run a medium sized DoE and then build a global response surface through the resulting data. This response surface is then used for large scale Monte Carlo sampling to build models of robustness. The main weakness of this scheme is that the more unusual designs and events that lead to extremes of behaviour may not be captured by this process. Therefore, as design decisions focus in on promising areas, update points should be run and the RSM rebuilt and re-explored so that the surrogate model is well set up where it needs to be.
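A toy version of this workflow, using Lagrange interpolation as a stand-in for a proper response surface such as Kriging (the DoE points, noise range and test function are invented for illustration):

```python
# Small DoE on a (pretend expensive) function f(x) = x + 1/x, a cheap
# quadratic surrogate through three DoE points, then large-scale Monte Carlo
# sampling on the surrogate to estimate the mean under uniform noise.
import random

def f(x):                      # stand-in for an expensive simulation
    return x + 1.0 / x

def quadratic_surrogate(xs):
    ys = [f(x) for x in xs]    # the only three "expensive" calls
    def s(x):
        total = 0.0
        for i in range(3):     # Lagrange basis polynomials
            term = ys[i]
            for j in range(3):
                if i != j:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total
    return s

if __name__ == "__main__":
    s = quadratic_surrogate([0.6, 1.1, 1.6])   # DoE spanning the noise range
    rng = random.Random(0)
    # 100,000 cheap surrogate calls: uniform noise of width 1 about x = 1.1.
    samples = [s(rng.uniform(0.6, 1.6)) for _ in range(100_000)]
    print(round(sum(samples) / len(samples), 3))   # close to the true mean 2.0808
```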
N.b. as we have seen response surfaces can also be used to support optimization so these two approaches can
be used together for Robust Design Optimization (RDO) where quantities such as the mean performance of a
product and the variance in that performance are the goals being optimized. In such circumstances a number of
different ways can be used to assist the search with surrogates. The most obvious approach is to build
surrogates of surrogates, i.e., using surrogate models to speed up the process of estimating the mean and
variance of performance and then building a further set of surrogates to model how these (surrogate derived)
quantities change as design variables are altered, the aim, of course, being to find designs with good mean
performance and simultaneously low variance in that performance. Such nesting of surrogates may not be the
most efficient use of computing effort, however, since it does not make use of the fact that uncertainty behaviour
is highly likely to be correlated between designs that differ only modestly in their configurations. It does however
offer simplicity and separates the two activities of DSO and UQ.
A better approach may be to treat the design (control) and uncertainty (noise) variables in a single response
surface that could then be repeatedly sampled over the noise variables at any given set of control variables to
establish response statistics. Since design intent tends to map to large scale variations in the control variables and
significant consequent variations in performance, while uncertainty generally stems from small scale variations
in the noise variables and more limited changes in performance, this poses the problem of how best to sample
the available computer codes and to construct surrogates that maximize the effectiveness of the overall design
process. For example, many deterministic surrogate based optimization studies make use of only a few hundred
performance simulations in total, while it is quite common to use at least this many simulations to propagate
uncertainty for a single design configuration in assessing robustness. Should, therefore, the density of sampling
to build a combined surrogate be uniform over the combined control and noise space or should it favour the
noise space?
Other combinations of surrogates are possible, including using multiple levels of fidelity to address the issue of
correctly sampling uncertainty while not using excessive numbers of function calls. Additionally, in some cases
the robustness problem being tackled arises from the need to guard against an inability to guarantee that the
design variables set during product definition are achieved in practice. This can arise where manufacturing
variations impact on the chosen geometry. In such circumstances design variables play the roles of both control and noise variables and may be studied both for their overall impact on mean performance and also for the variation in
that performance.
Plots illustrating the convergence of Monte-Carlo sampling using random numbers (–), using LPτ pseudo
random numbers (--) and Krigs based on LPτ pseudo random numbers (-.) for the test functions of equations (1)
and (2) as sample sizes change, along with the actual values taken from the equations shown as horizontal
lines; left means, right standard deviations.
9.5 Stochastic Solvers

The most complex approach to robustness is to build so called 'stochastic solvers'. These are codes, like finite element packages, which instead of reading in deterministic problem statements for geometry and loadings can accept these specified in probabilistic form. They then directly compute probability measures for the response quantities of interest. Such methods are currently in their infancy but can be expected to become prevalent over the next 10-20 years.
In whatever form robustness is considered it invariably leads to a multi-objective design problem because the
designer will desire good performance for the nominal geometry AND robustness to likely variations. This tends
to lead to Pareto fronts with mean and standard deviation as axes. Thus multi-objective search tools should be
used for robust design, rather than simply examining robustness as an afterthought.
9.6 Robustness
Robustness may be described as insensitivity to change; thus a design is said to be robust if its performance is
relatively unaffected by any uncertainty in its design, manufacture or use. This characteristic is often in direct
conflict with optimal nominal performance – a design that has been heavily optimized to operate well when
perfectly made and operated in ideal circumstances may turn out to be very near to non-linearities in its
performance – at the very least, if truly optimal, any change in manufacture or use will, by definition, result in
degradation in performance.
Robust design methods all involve trying to simulate uncertainty in the design calculations being used, so as to
evaluate the impact of these uncertainties on the designer’s goals. Here we characterize the quality of any
design by its location on the Pareto front and the nature of other designs lying on that front. A prerequisite is, of
course, to know something of the actual uncertainties likely to be encountered in practice, ideally using real
world data. If, however, we just know something of the sensitivities inherent in the design, manufacture and
operation, we can attempt to account for this in DSO, even in the absence of actual variability measurements –
in general sensitive designs tend to be less robust than insensitive ones.
Construction of Pareto front from variance analysis. Dotted lines show variance; the spread of f(x) over the ranges at x1 and x2 marks one design as robust and the other as fragile.
Uniform Noise
Gaussian Noise
9.7 A Simple Example
Consider the performance of a manufacturing process to be characterised by the performance index f(x)=x+1/x
where x is a control variable set by the users in the range 0.75<x<2. If f(x) is to be as low as possible what is the
optimal setting of x? If x is subject to uniform random noise such that the probability density function of the noise
takes the form of a unit square centred at the nominal value, can we derive an expression for the mean value of the performance index and hence determine what value of x will give the lowest mean value? What percentage deterioration in nominal performance must be accepted when using this optimal setting?
First consider the PDF in use here – this is a rectangle centred at the design point of interest, of unit width and unit height (so that the area under the PDF is one). So to establish the mean value at any design point we have to integrate from 0.5 below that point to 0.5 above it, i.e., the mean of f(x) varies with x and is defined by

μ(x) = ∫ from x−0.5 to x+0.5 of (t + 1/t) dt = x + ln[(x + 0.5)/(x − 0.5)].
Which we can plot out as:
Locus of expected value and variance as x increases, with the Pareto front marked.
Notice that the mean performance is always worse (higher) than the nominal and, more importantly, its minimum is in a different place. The minimum of the nominal design curve is at (1, 2) while that of the mean curve is at (1.1180, 2.0805); the nominal performance at this noise-adjusted setting is f(1.1180) = 2.0125. Relative to that nominal figure the mean is some 3.4% worse when the correct, noise adjusted, design choice is made, or 4.3% worse (μ(1) = 2.0986) if no adjustment is made for the uncertainty in the system.
We can see this effect more clearly by plotting the Pareto front of standard deviation versus mean behaviour for
the system.
In this case there is a single design that dominates all other choices so there is no need to consider trade-offs
by comparing different points along the front. This is because there is a single minimum in the nominal and
mean performance curves.
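The numbers in this example are easily verified with a grid search (the grid resolution is an arbitrary choice):

```python
# Numerical check of the worked example: the nominal index f(x) = x + 1/x is
# minimised at x = 1, while the noise-averaged mean
# mu(x) = x + ln((x + 0.5)/(x - 0.5)) is minimised near x = 1.118.
import math

def f(x):
    return x + 1.0 / x

def mu(x):
    return x + math.log((x + 0.5) / (x - 0.5))

if __name__ == "__main__":
    xs = [0.75 + 0.0001 * i for i in range(12501)]   # grid over 0.75 <= x <= 2
    x_nom = min(xs, key=f)
    x_rob = min(xs, key=mu)
    print(round(x_nom, 3), round(f(x_nom), 4))       # nominal optimum
    print(round(x_rob, 3), round(mu(x_rob), 4), round(f(x_rob), 4))
```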
9.8 Two variables and two objectives
When we have more than one design variable the problem becomes more involved: consider a twin objective
problem made up from the Branin function

f1(x1, x2) = (x2 − 5.1x1²/(4π²) + 5x1/π − 6)² + 10(1 − 1/(8π))cos(x1) + 10,   (1)
augmented by a second function of the form

f2(x1, x2) = … ,   (2)

[the detailed algebraic form of equation (2), a combination of polynomial terms in x1 and x2 and cosine terms, scaled by 1/50000, has not survived transcription of these notes].
These two functions are illustrated as contour maps below. The Branin function has three equal minima at x* =
(-π , 12.275), (π , 2.275), (9.42478, 2.475), where f1(x*) = 0.3978 while equation (2) has a single minimum at x*
= (5.1116, 8.0054), where f2(x*) = 11.1484. Also shown on these two plots are the results of a series of multi-
objective searches which are discussed later. It turns out that these functions can be used to construct an RDO
test problem where exact results can be obtained for the uncertainty propagation, thus allowing a true Pareto
front of mean performance versus standard deviation in that performance to be constructed.
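Equation (1) can be checked directly at the three stated minima (a sketch, not part of the original notes):

```python
# The Branin function of equation (1), evaluated at its three stated minima,
# each of which should give approximately 0.3979.
import math

def branin(x1, x2):
    return ((x2 - 5.1 * x1**2 / (4 * math.pi**2) + 5 * x1 / math.pi - 6) ** 2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)

if __name__ == "__main__":
    for x1, x2 in [(-math.pi, 12.275), (math.pi, 2.275), (9.42478, 2.475)]:
        print(round(branin(x1, x2), 4))   # 0.3979 at each minimum
```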
Contour maps illustrating the test functions of equations (1) and (2), along with the results from exhaustive
searches based on direct use of equations (1) and (2) – solid line; and the results of multi-objective searches: +
– direct NSGA2 search, × – Kriging based NSGA2 search
Robustness almost invariably leads to a multi-objective design problem because the designer will desire good
performance for the nominal or mean design and robustness to likely variations. As already shown, this leads to
Pareto fronts, typically with mean and standard deviation as axes and it is rare for these to show just a single
dominating design. Thus multi-objective search tools should be used for robust design, rather than simply
examining robustness as an afterthought.
To illustrate these ideas consider the random process given by
f(x1, x2, a, b) = f1(x1, x2) + a·(…) + b·(…),   (3)

[equation (3) adds to the Branin function of equation (1) two further terms, scaled by the random variables a and b respectively; their detailed algebraic form, involving the same polynomial and cosine terms as equation (2), has not survived transcription]
where a and b are independent random variables uniformly distributed in the range ±1. It is relatively easy to
show that this function has a mean value given by equation (1) and a standard deviation given by equation (2),
i.e., as already illustrated in the figure above. Note that the function of equation (2) therefore shows how much
variation we expect in the design performance as the two design variables change – essentially a central area of
modest uncertainty which rises rapidly towards the edges of the design ranges, as often occurs in real
manufacturing processes or operational environments. If we carry out a robust design analysis of this problem
the two extremes of the Pareto front now represent a design with best mean performance at one end and the
smallest standard deviation at the other (the two ends of the solid line in the figures). In this case the best mean
performance can only be achieved at the expense of moving outside of the central area of low variance in
performance – again a not unrealistic outcome.
If the same LPτ pseudo random number sequences are used when sampling all locations in the design space
following equation (3) (i.e., with the mean and standard deviations changing as per equations (1) and (2)) it is
possible to plot out the errors in the estimated response statistics as contour maps, see figures below, which are
drawn for an ensemble size of five. Note that these errors are merely typical results and would differ for each
random number sequence used in their construction and for differing sample sizes.
Contour maps illustrating the errors (left – mean, right – standard deviation) in pseudo Monte-Carlo models
based on five LPτ samples of equation (3), along with the results of a multi-objective exhaustive direct search
with varying weights on equations (1) and (2).
The consequences of the errors inherent in quantifying uncertainty with limited sample sizes are seen when one
attempts to carry out robust design optimization. If uncertainties are not estimated accurately any search may
be misled and the resulting Pareto fronts lie away from the true results, thus leading designers to make poor
choices when deciding robustness trade-offs. Consequently we next consider the various ways in which
surrogates can be used to help support such work, first in uncertainty quantification and then to speed up design
search and optimization runs.
9.9 Using Surrogates to Support Uncertainty Quantification
Although low-discrepancy sequence sampling plans can often speed up the process of estimating uncertainty
statistics they still do not capitalize on all the information present in the sample data being used. In particular,
when used naively, no account is taken of the locations of the samples with respect to each other when deriving
the statistical moments. An alternative is to build a surrogate that relates the desired performance quantity of
interest to the noise parameters. Then the surrogate can be integrated in lieu of the original problem to calculate
the required moments, either in closed form or via very dense sampling across the cheap-to-evaluate surrogate.
In the example just introduced there are two noise parameters (a, b) and these impact linearly and additively on
the performance function f(x1, x2, a, b) of equation (3). So if we take sets of five LPτ samples and instead build a
Krig relating f(x1, x2, a, b) to a and b at each individual value of x1 and x2 we can establish a much more
accurate model of the uncertainties in the problem, see figures below. Now the errors in the response statistics
are between one and two orders of magnitude less than for the simple direct calculations, albeit that a separate
Krig has been built and tuned for every point evaluated in these plots; in this case as 51x51 sets of samples
have been used, this means 2601 Krigs have been constructed and tuned in total, each of which has then been
sampled 500 times to establish the mean and standard deviation values.
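The reason this works so well when noise enters linearly can be sketched with a least-squares linear surrogate standing in for the Krig (the design-point coefficients below are hypothetical stand-ins for the true terms of equation (3), and the notes use Kriging rather than the linear fit shown here):

```python
# At a fixed design point, f depends linearly on the noise variables a and b,
# so a linear surrogate fitted to five samples recovers the response
# statistics almost exactly: mean = c0, std = sqrt((c1^2 + c2^2)/3), since
# Var(a) = Var(b) = 1/3 for uniform noise on (-1, 1).
import math

M, G1, G2 = 4.0, 0.8, -0.3          # hypothetical design-point behaviour

def f(a, b):
    return M + G1 * a + G2 * b      # noise enters linearly, as in eq. (3)

def fit_linear(samples):
    """Least squares fit of f = c0 + c1*a + c2*b via the normal equations."""
    X = [[1.0, a, b] for a, b, _ in samples]
    y = [fa for _, _, fa in samples]
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)]
         for i in range(3)]
    r = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(3)]
    for i in range(3):               # Gauss-Jordan elimination
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        r[i] /= piv
        for j in range(3):
            if j != i and A[j][i]:
                factor = A[j][i]
                A[j] = [vj - factor * vi for vj, vi in zip(A[j], A[i])]
                r[j] -= factor * r[i]
    return r                         # [c0, c1, c2]

if __name__ == "__main__":
    # Five (a, b) samples standing in for a five-point LPt plan over +/-1.
    pts = [(-0.8, 0.4), (-0.4, -0.6), (0.0, 0.8), (0.4, -0.2), (0.8, 0.6)]
    c0, c1, c2 = fit_linear([(a, b, f(a, b)) for a, b in pts])
    mean = c0                                   # E[a] = E[b] = 0
    std = math.sqrt((c1**2 + c2**2) / 3.0)      # Var(a) = Var(b) = 1/3
    print(round(mean, 4), round(std, 4))
```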
Clearly such an approach can only be justified if the costs involved in building, tuning and sampling the Krigs
are substantially less than for evaluating the function itself1. This in turn depends on the complexity of the
relationship between the noise variables and the performance function being studied. If this relationship is
straightforward and involves relatively few variables the cost of Krig construction is low and the results can be
startlingly accurate. If, however, this is not the case and a significant number of function evaluations are needed
to characterize the relationship over many noise variables these costs may become unaffordable, even
compared to the cost of complex CFD or structural analysis. Alternative, faster modelling methods such as GPU
trained neural networks may then be the only affordable approach.
1 Here we tune each Krig by first running a global genetic algorithm or ant colony search of the log likelihood and then take the best result and improve it with a local gradient based search – each step in the searches requiring a solution of the dense matrix equations containing the sample data, (see Toal et al (23)).
Contour maps illustrating the errors (left – mean, right – standard deviation) in Krig models based on five LPτ
samples of equation (3), along with the results of a multi-objective exhaustive direct search with varying weights
on equations (1) and (2).
9.10 Robust Design Optimization with Basic Surrogates
We next examine the effects of these various ways of carrying out uncertainty quantification on multi-objective
searches to establish the Pareto front that trades mean performance against standard deviation in that
performance. To begin with, and to establish appropriate datum results, we show three search methods applied
to the analytical expressions for the mean and standard deviation defined by equations (1) and (2).
First we establish the true Pareto front using a series of single objective searches where the two functions are
simply added with varying weights (this produces the solid line in the previous figures). Here we use some
10,000 separate search runs (involving around one million function calls in total). This exhaustive search
guarantees the correct front in this case because it is non-concave throughout and so a simple sum produces a
continuous series of correct results. Clearly such an approach is not practical in real-world problems and is not
so reliable on more complex functions.
Having established the true Pareto fronts by exhaustive search we consider a number of more realistic and
affordable approaches, see the figure below. In each case this requires the evolution of the front using a multi-
objective search engine. Here our searches are all based on the NSGA2 paradigm. NSGA2 is a long
established global multi-objective search method that has a good track record of being able to establish high
quality Pareto fronts. Our first approach is a direct NSGA2 search of equations (1) and (2) (the + markers in the
figures) that uses 30 generations with a population size of 100 and therefore 3,000 evaluations of the equations.
NSGA2 is clearly capable of correctly recovering the Pareto front as would be expected given a suitable budget
and access to the analytically correct functions for the objectives.
Left, Pareto fronts found from exhaustive searches based on direct use of equations (1) and (2) – dashed line;
along with the results of multi-objective searches: + – direct NSGA2 search, × – Kriging based NSGA2 search.
Right, convergence metrics of direct NSGA2 and Kriging based NSGA2 searches.
Then we use a Kriging based NSGA2 search (the × markers in the figures). To do this an initial surrogate model
of the exact mean versus standard deviation space is established by selecting design variables according to a
30 point LPτ sequence, evaluating equations (1) and (2) at each combination. This is followed by five updates in
batches of 10 keeping no more than the most promising 50 designs for surrogate construction, i.e., a total of 80
evaluations of equations (1) and (2). Note that the response surface is not being used to help quantify
uncertainty (i.e., to estimate mean and standard deviation), rather it is used to model how the uncertainties
given by the exact expressions of equations (1) and (2) vary with design changes. The power of the Kriging
based approach in this search is immediately clear in that it is directly competitive with the direct NSGA2 search
that uses nearly 40 times as many evaluations of these equations.
Of course, in real problems one does not have the luxury of closed form equations for the mean and standard
deviation; instead some form of uncertainty quantification as set out in the previous sections must be employed.
So next we repeat our searches but using five-point LPτ sampling to establish the response statistics from
equation (3), again using an exhaustive search, NSGA2 directly or with surrogates at each design point and
NSGA2. The results of these searches in terms of the Pareto front trade-offs can be seen in the figure below.
Now the searches all trend towards an incorrect model of the Pareto front.
Pareto fronts found from exhaustive searches based on five point LPτ sampling of equation (3) – solid line; and
equations (1) and (2) – dashed line; along with the results of multi-objective searches on the five point sampling
combined with direct noise variabilities a and b: + – direct NSGA2 search, × – Kriging based NSGA2 search.
It might be concluded from this that there is little point in using such limited sample sizes during robust design
searches. However, one must recall that the purpose of robust design is not to build accurate response
prediction models per se – rather the aim is to choose values of the free design variables that give robust
behaviour in practice. Just because the limited sampling fails to yield a completely accurate model, this does not
mean that design values based on these models will fail to behave as desired, since the search process has
sampled a considerable quantity of useful data. To see this one needs to take designs from the Pareto fronts in
the previous figure and insert the relevant values of x into equations (1) and (2) to establish whether or not
these designs have any merit. The figure below shows the resulting Pareto fronts and it is seen that in fact the
results of the searches are useful in locating the correct Pareto optimal designs, despite the prediction errors in
the small sample sizes used for UQ.
Pareto fronts found from exhaustive searches based on five point LPτ sampling of equation (3) – solid line; and
equations (1) and (2) – dashed line; along with the results of multi-objective searches on the five point sampling
but evaluated using equations (1) and (2): + – direct NSGA2, × – Kriging based NSGA2 search.
Lastly, before moving on to more advanced techniques we add in the use of Krigs to help propagate uncertainty
with our two basic search methods (direct NSGA2 and surrogate assisted NSGA2), see figure below. In this
case the approach completely repairs the damage caused by limited sample size for the direct noise
variabilities. However, training the surrogate for each set of design variables adds considerably to the cost of working in this way. Essentially, if the uncertainty behaviour is benign and does not depend on too many noise variables, using surrogates for UQ may be worthwhile; otherwise more advanced methods will be needed, which we turn to next. Even when it does help, we need to recognize that in the third of these searches we are training surrogates on the behaviour of lower level surrogates, rather suggesting that more sophisticated methods should offer greater promise.
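To make the pattern concrete, the following short sketch (in Python rather than MATLAB, with a simple Gaussian-basis interpolant and a made-up noise response standing in for a tuned Krig and the real problem) shows the idea: fit a surrogate through a handful of expensive noise-variable samples, then resample it densely and essentially for free to estimate the mean and standard deviation.

```python
import numpy as np

# Hypothetical noise-to-performance response, a stand-in for equation (3)
def f(a):
    return np.sin(3.0 * a) + 0.5 * a**2

a_train = np.linspace(0.0, 1.0, 5)     # five "expensive" noise samples
y_train = f(a_train)

# Gaussian-basis interpolant as a toy stand-in for a tuned Krig
theta = 10.0
def corr(x1, x2):
    return np.exp(-theta * (x1[:, None] - x2[None, :])**2)

# Small nugget added purely for numerical conditioning
w = np.linalg.solve(corr(a_train, a_train) + 1e-10 * np.eye(5), y_train)
def surrogate(a):
    return corr(np.atleast_1d(a), a_train) @ w

# Dense, cheap resampling of the surrogate for the statistics
rng = np.random.default_rng(0)
a_dense = rng.uniform(0.0, 1.0, 500)
mean_est, sd_est = surrogate(a_dense).mean(), surrogate(a_dense).std()
```

The expensive function is evaluated only five times; the 500-point statistics come entirely from the surrogate.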
Pareto fronts found from exhaustive searches based on Krigs built from five point LPτ sampling of equation (3) –
solid line; and equations (1) and (2) – dashed line; along with the results of multi-objective searches on Krigs
built from the five point sampling combined with direct noise variabilities a and b: + – direct NSGA2 search, × –
Kriging based NSGA2 search.
9.11 Robust Design Optimization with Advanced Surrogates
We finally introduce two more advanced ways of dealing with the combined problem of uncertainty propagation
and Pareto front search.
9.11.1 Co-Kriging
Our first advanced method makes use of the formalism of co-Kriging where results with multiple levels of fidelity
can be combined during the search. To do this we combine the results from limited sample size UQ with those
for more expensive UQ with many more samples. To make this approach worthwhile we can only use the high-
fidelity calculation very sparingly – here we start the response surface based search with a DoE of 30 design
vectors but we then calculate the 100 point LPτ sample results for just the first four of these design points while
we calculate five-sample results at all 30. We then build a pair of multi-fidelity co-Krigs (one for mean and one
for standard deviation) with all 34 results and use these to estimate the response statistics of the functions being
searched, see figure below.
Flow chart illustrating the update sequences used in co-Krig approaches.
In co-Krigs the inputs to the low fidelity (cheap) and high fidelity (expensive) calculations, xc and xe, are taken to
be related to the outputs (responses) yc and ye by computational functions, yc=fc(xc) and ye=fe(xe). The
responses resulting from the DoE over these codes are used to construct an approximation
\hat{y}_e = \rho\,\hat{f}_c(\mathbf{x}) + \hat{f}_d(\mathbf{x})    (4)
which is the sum of two Gaussian process models, each of which depends on the distances between the
sample data used to construct them. Here the hat symbols indicate the models are approximations, the
subscript d indicates a model of the differences between the low and high fidelity functions (all the high fidelity
evaluations are carried out at locations where low fidelity calculations have already been run) and ρ is a scaling
parameter. The distance measure used here is

d(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \sum_{h=1}^{k} \theta_h \bigl| x_h^{(i)} - x_h^{(j)} \bigr|^{p_h}    (5)

where \theta_h and p_h are hyper-parameters tuned to the data in hand and k is the number of dimensions in the problem. The correlation between points \mathbf{x}^{(i)} and \mathbf{x}^{(j)} is then given by

R(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \exp\bigl[-d(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})\bigr] + \delta_{ij}\lambda    (6)

where \lambda is a regularization constant that governs the degree of regression in the model (when set to zero the Krig strictly interpolates the data supplied) and \delta_{ij} is the Kronecker delta. When the response at a new point
\mathbf{x}_{new} is required, a vector of correlations between the new point and those used in the DoE is formed

\mathbf{c}(\mathbf{x}_{new}) = \begin{pmatrix} \rho \sigma_c^2 R_c(\mathbf{x}_c, \mathbf{x}_{new}) \\ \rho^2 \sigma_c^2 R_c(\mathbf{x}_e, \mathbf{x}_{new}) + \sigma_d^2 R_d(\mathbf{x}_e, \mathbf{x}_{new}) \end{pmatrix}    (7)
where the \sigma^2 are the variances in the cheap and difference Gaussian models. The prediction is then given by

\hat{y}_e(\mathbf{x}_{new}) = \hat{\mu} + \mathbf{c}^{T} \mathbf{C}^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu})    (8)

where \hat{\mu} = \mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{y} \,/\, \mathbf{1}^{T} \mathbf{C}^{-1} \mathbf{1} and

\mathbf{C} = \begin{pmatrix} \sigma_c^2 R_c(\mathbf{x}_c, \mathbf{x}_c) & \rho \sigma_c^2 R_c(\mathbf{x}_c, \mathbf{x}_e) \\ \rho \sigma_c^2 R_c(\mathbf{x}_e, \mathbf{x}_c) & \rho^2 \sigma_c^2 R_c(\mathbf{x}_e, \mathbf{x}_e) + \sigma_d^2 R_d(\mathbf{x}_e, \mathbf{x}_e) \end{pmatrix}.
When building
co-Krigs it is still necessary to carefully tune the sets of hyper-parameters to match the data in use – for co-Krigs
this tuning is applied to the low fidelity data, data representing the differences between the low and high fidelity
series, and the scaling parameter ρ that links the various data sets. Fortunately, for the small numbers of results
typically available in such work this is not overly expensive.
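A compact numerical sketch of equations (4)-(8) may help fix ideas. The one-dimensional cheap and expensive functions, the sample locations and the fixed hyper-parameter values below are illustrative assumptions only; in practice θ, p and ρ would be tuned to the data as just described, and the variances estimated by maximum likelihood rather than taken directly from the samples.

```python
import numpy as np

# Hypothetical expensive and cheap functions (assumed, not from the notes)
f_e = lambda x: (6.0 * x - 2.0)**2 * np.sin(12.0 * x - 4.0)   # expensive
f_c = lambda x: 0.5 * f_e(x) + 10.0 * (x - 0.5) - 5.0         # cheap

Xc = np.linspace(0.0, 1.0, 11)     # cheap DoE
Xe = Xc[::5]                       # expensive runs at existing cheap points
yc, ye = f_c(Xc), f_e(Xe)

theta, p, rho = 3.0, 1.0, 2.0      # assumed fixed, not tuned

def R(A, B):
    # Correlation of equations (5)-(6) with lambda = 0;
    # one kernel serves for both R_c and R_d in this sketch
    return np.exp(-theta * np.abs(A[:, None] - B[None, :])**p)

# Difference data y_e - rho*y_c at the shared locations (cf. equation (4))
yd = ye - rho * f_c(Xe)
s2c, s2d = np.var(yc), np.var(yd)  # crude variance estimates

# Covariance matrix C of equation (8); tiny nugget for conditioning
C = np.block([
    [s2c * R(Xc, Xc),        rho * s2c * R(Xc, Xe)],
    [rho * s2c * R(Xe, Xc),  rho**2 * s2c * R(Xe, Xe) + s2d * R(Xe, Xe)],
]) + 1e-8 * np.eye(len(Xc) + len(Xe))

y = np.concatenate([yc, ye])
one = np.ones(len(y))
mu = (one @ np.linalg.solve(C, y)) / (one @ np.linalg.solve(C, one))

def predict(x):
    x = np.atleast_1d(x)
    # Correlation vector c(x_new) of equation (7)
    c = np.vstack([rho * s2c * R(Xc, x),
                   rho**2 * s2c * R(Xe, x) + s2d * R(Xe, x)])
    # Equation (8)
    return mu + c.T @ np.linalg.solve(C, y - one * mu)
```

By construction the predictor reproduces the expensive data at the expensive sample points while being informed by the trend of the much denser cheap data elsewhere.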
Simple comparison of a Krig through expensive data and co-Krig built on a combination of cheap and expensive
data.
The resulting co-Krigs are then typically searched with the NSGA2 algorithm and updated, say, 10 times using 10 new design vectors taken from the approximated Pareto front at each stage; of these 10, only the one midway through the update set is analysed using both levels of fidelity (sample sizes). Thus at the end of
the search in this case we have evaluated the low-fidelity model 130 times but now we have also used the high
fidelity model 14 times (four in the original DoE and one during each of the 10 update cycles), leading to a total
number of individual design calculations of 130×5+14×100=2,050. This needs to be compared to the previous
Kriging search where only low-fidelity calculations were used leading to 80×5=400 calls or one based
completely on high fidelity sampling of say 80×100=8,000 calls. The aim is to achieve results comparable to
using 8,000 calls with costs more comparable to working solely at the lower fidelity – this approach is intended
to mitigate the problem that, when working solely with limited sized (Monte-Carlo) uncertainty samples on real
world problems, searches typically returned designs that failed to fulfil their promise when evaluated with larger
sample sizes.
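The budget bookkeeping above can be checked directly:

```python
# Evaluation budgets quoted above: 130 low-fidelity calls at 5 samples each,
# plus 14 high-fidelity calls at 100 samples each, versus the alternatives.
co_krig   = 130 * 5 + 14 * 100   # low-fidelity search + sparse high-fidelity UQ
low_only  = 80 * 5               # earlier Kriging search, low fidelity only
high_only = 80 * 100             # all-high-fidelity alternative
print(co_krig, low_only, high_only)   # → 2050 400 8000
```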
Using the co-Kriging approach, the results for the test problem are as shown in the figures below (a typical final
Pareto front is shown along with the convergence behaviour of a series of independent runs of the search).
When searching the base formulas of equations (1) and (2), the figure shows that the final Pareto points are
now almost as tightly clustered around the true Pareto front as those already shown above – similar HV metrics
are obtained with the budgets used. Of particular interest are the high fidelity results shown in the figure where
now these samples accurately reflect the true trade-off between mean and standard deviation, while the low
fidelity results continue to track the low fidelity front, i.e., the co-Kriging approach has achieved the desired
outcome, identifying the location of the knee in the Pareto front and ensuring that the correct trade-off
information is being used.
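For reference, the hypervolume metric used in these convergence plots is straightforward to compute in the two-objective (minimization) case: it is the area dominated by the front relative to the reference point. The sample front and reference below are illustrative values, not the course data.

```python
# Two-objective hypervolume (minimization) relative to a reference point
def hypervolume_2d(points, ref):
    # Keep only points that dominate the reference, sorted by first objective
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                       # non-dominated step
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))   # → 11.0
```

Larger hypervolume indicates a front that pushes further towards the ideal point; normalizing by a fixed reference, as in the plots, allows independent runs to be compared.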
Pareto front found from exhaustive searches on equations (1) and (2) – dashed line; along with the results of
multi-objective searches on the five / 100 point LPτ sampling of equation (3) combined with direct noise
variabilities a and b: co-Krig based NSGA2 search – solid line; + – low fidelity points, × – high fidelity points.
Pareto front normalized hypervolume metric convergence for reference point (100, 200) based on high fidelity
points only, for nine independent runs.
9.11.2 Combined Kriging
The second advanced approach we demonstrate requires a slightly more intrusive change to the problem
handling: we build a single combined Krig that is used to both support search and carry out uncertainty
quantification, an approach we term combined Kriging. Clearly, if the low level Krig can accurately model the
underlying functions it will allow very accurate statistics to be computed albeit at the cost of sampling the Krig
multiple times. So to begin with, an LPτ DoE is carried out where both the design variables and noise variables
are all varied simultaneously. Next a four dimensional Krig is constructed through this data. Then 500 point low
discrepancy sequence sampling over the two noise variables is carried out on the combined Krig, at any desired
pair of design variables, so as to return the desired predictions of mean and standard deviations of
performance, see figure below. Notice that, when evaluating new update points, the search engine can no longer simply specify pairs of design variables for each evaluation; it also has to manage the noise sampling at the same time – as each pair of design variables is added, information also has to be supplied on what values the noise variables should take for these update samples. Here we use a space-filling approach in which a short search is carried out to best place each new noise sample within the existing data set, maximizing its Euclidean distance from the existing samples. Note also that, once the search is completed, it will be necessary to confirm the
statistics of the final Pareto front with UQ as the combined Krig will be unlikely to be able to supply completely
accurate statistics directly.
Flow chart illustrating the update sequences used in combined (level-1) Krig approaches.
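The space-filling placement of new noise samples described above can be sketched as a simple maximin criterion: from a pool of random candidates, keep the one whose minimum Euclidean distance to the samples already in the data set is largest. The pool size, unit-cube bounds and example points are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def next_noise_sample(existing, n_candidates=200):
    dim = existing.shape[1]
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
    # Distance from every candidate to every existing sample
    dists = np.linalg.norm(candidates[:, None, :] - existing[None, :, :], axis=2)
    # Keep the candidate furthest from its nearest existing neighbour
    return candidates[np.argmax(dists.min(axis=1))]

existing = np.array([[0.1, 0.1], [0.9, 0.9]])
new_pt = next_noise_sample(existing)
```

In practice the "short search" could equally be a local optimizer rather than random candidate screening; the principle of maximizing distance to the existing data is the same.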
This approach increases the number of design variable combinations being sampled as compared to the
number of noise variables. Since it is realistic to assume that design variables play a greater part in influencing
final performance as compared to noise variables this approach has obvious appeal. But it does, again, depend
on how non-linear any noise effects are. Moreover, the sizes of the Krigs being built become significantly larger
as now they contain sample information on both design and noise variable changes. Since the cost of tuning
and sampling Krigs rapidly rises both with the number of variables and the number of samples, this can become
the limiting factor in using the combined approach. Here the Krigs are limited to just 840 data points at any one
time, 30 points in an initial DoE sample plus 20 updates each of 40 more points. This is less than half the total
number of function evaluations used when using co-Krigs to represent the design variables and uncertainty
quantification separately but, for this case, is sufficient to converge the process. It is, however, an order of
magnitude slower than the co-Krig approach when using simple test functions. Having completed the search
process, it will typically be necessary to re-evaluate any final designs selected using additional UQ; here 12
such points can be checked using 100 point sampling when using the same budget as the co-Krig process (for
the test problem we can, of course, simply use the exact equations to evaluate the solutions).
The figures below show the results from adopting this approach. As expected, the figures show that the accuracy
of the UQ depends on the total number of samples taken. For the simple case studied here the combined Krig
becomes reasonably accurate once around 200 samples have been taken over the four dimensional space. The
resulting Pareto fronts are significantly better than achieved using the co-Krig approach, being almost
indistinguishable from the exact solutions. It turns out that this very much depends on the dimensions of the
problem being studied as well as its inherent non-linearities – for this low dimensional problem the combined
Krig would clearly be the best way to proceed, although as already noted, it is much more expensive to carry out
in terms of surrogate construction and sampling.
Pareto front found from exhaustive searches on equations (1) and (2) – dashed line; along with the results of
multi-objective searches based on four-dimensional combined Kriging built from LPτ sampling of equation (3)
and direct noise variabilities a and b: + – initial sample points, × – update points, solid line – final Pareto front.
Pareto front normalized hypervolume metric convergence for reference point (100, 200) based on equations (1)
and (2), for nine independent runs.
These approaches have been applied to a number of industrial-strength case studies of varying complexity. The following two graphs are for CFD analysis of a 2D compressor section subject to manufacturing uncertainty and in-service degradation. Clearly the rate of convergence is much reduced but, nonetheless, useful results are obtained which show worthwhile improvement over the initial design.
Combined Krig assisted NSGA2 results for a 2D compressor blade CFD problem: estimated results for 500-
point LPτ pseudo Monte-Carlo sampling on the combined Krig, * final generation, + initial generation and ×
intermediate generations; along with the estimated Pareto front (solid line), “true” Pareto front (dotted blue line)
and ○ initial base-line design.
Combined Krig assisted NSGA2 results for a 2D compressor blade CFD problem: Pareto front normalized
hypervolume metric convergence for reference point (5, 0.5) based on the estimate of the “true” Pareto front, for
nine independent runs.
10 GETTING STARTED
Assuming one has a reasonable toolkit of search methods, a parameterisation scheme and an automated (or at
least mechanistically repeatable) design analysis process it is then possible to make some plans as to how to
proceed. These will be dominated by the number of designer-chosen variables and the run time to evaluate a
design. Other important aspects will be the number of goals, the number and type of constraints and whether or
not stochastic measures of merit must be constructed using an essentially deterministic code. The following
diagram gives initial advice.