A Matrix-Free Algorithm for Multidisciplinary Design Optimization
by
Andrew Borean Lambe
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Aerospace Science and Engineering
University of Toronto
Copyright © 2015 by Andrew Borean Lambe
Abstract
A Matrix-Free Algorithm for Multidisciplinary Design Optimization
Andrew Borean Lambe
Doctor of Philosophy
Graduate Department of Aerospace Science and Engineering
University of Toronto
2015
Multidisciplinary design optimization (MDO) is an approach to engineering design that exploits the coupling between components or knowledge disciplines in a complex system to improve the final product. In aircraft design, MDO methods can be used to simultaneously design the outer shape of the aircraft and the internal structure, taking into account the complex interaction between the aerodynamic forces and the structural flexibility. Efficient strategies are needed to solve such design optimization problems and guarantee convergence to an optimal design.

This work begins with a comprehensive review of MDO problem formulations and solution algorithms. First, a fundamental MDO problem formulation is defined from which other formulations may be obtained through simple transformations. Using these fundamental problem formulations, decomposition methods from the literature are reviewed and classified. All MDO methods are presented in a unified mathematical notation to facilitate greater understanding. In addition, a novel set of diagrams, called extended design structure matrices, is used to simultaneously visualize both data communication and process flow between the many software components of each method.

For aerostructural design optimization, modern decomposition-based MDO methods cannot efficiently handle the tight coupling between the aerodynamic and structural states. This fact motivates the exploration of methods that can reduce the computational cost. A particular structure in the direct and adjoint methods for gradient computation motivates the idea of a matrix-free optimization method. A simple matrix-free optimizer is developed based on the augmented Lagrangian algorithm. This new matrix-free optimizer is tested on two structural optimization problems and one aerostructural optimization problem. The results indicate that the matrix-free optimizer is able to efficiently solve structural and multidisciplinary design problems with thousands of variables and constraints. On the aerostructural test problem formulated with thousands of constraints, the matrix-free optimizer is estimated to reduce the total computational time by up to 90% compared to conventional optimizers.
Acknowledgements
First and foremost, I would like to thank my supervisor, Dr. Joaquim Martins. I first
met Dr. Martins as an undergraduate looking for summer research project ideas. His
interests in aircraft design and optimization left a strong impression on me and led me,
ultimately, to his research lab and this thesis. Even after his departure for Michigan, he
continued to provide extraordinary support and encouragement to me and the other
students from MDO Lab Toronto. Dr. Martins also deserves credit for introducing me
to this branch of mathematics called “optimization,” which has evolved into a passion
of mine over the last couple of years.
I would like to thank the other members of my UTIAS committee, Dr. David Zingg
and Dr. Tim Barfoot. Their questions and insights in the annual committee meetings
greatly improved both the cohesion of this thesis and my presentation skills.
I would like to thank my many colleagues in both MDO Lab Toronto and MDO Lab
Michigan for providing a welcoming and stimulating research environment. In particular,
I would like to thank Dr. Graeme Kennedy for his insights into structural analysis and
his early and eager support of the matrix-free method contained in this thesis, and Dr.
Gaetan Kenway for his help on the software engineering side. Without their efforts, the
main structural and aerostructural results in this thesis would not be possible.
I would also like to thank Sylvain Arreckx and Dr. Dominique Orban for their
assistance and mathematical insights into the optimization algorithm developed in this
thesis and for hosting me at École Polytechnique de Montréal on several occasions.
Finally, I would like to thank my parents and my brother Geoff for their constant
encouragement and for reminding me that “if it were easy, someone would have done it
already.”
Contents
1 Introduction
2 MDO Problem Formulations
  2.1 Notation and Terminology
  2.2 All-at-Once (AAO) Formulation
  2.3 Simultaneous Analysis and Design (SAND)
  2.4 Individual Discipline Feasible (IDF)
  2.5 Multidisciplinary Feasible (MDF)
  2.6 Computing Gradients for IDF and MDF
  2.7 Conclusion
3 Visualizing MDO Architectures
  3.1 Diagram Motivation
  3.2 The Extended Design Structure Matrix (XDSM)
  3.3 Monolithic MDO Architectures
  3.4 Conclusion
4 Current MDO Architectures
  4.1 Motivation for Decomposition
  4.2 Distributed Architecture Classification
  4.3 Distributed MDF Architectures
  4.4 Distributed IDF Architectures
  4.5 Architecture Benchmarking
  4.6 Conclusion
5 Monolithic MDO Problem Structures
  5.1 Sparsity in MDO Problem Formulations
  5.2 Exploiting Sparsity
  5.3 Exploiting Analytic Gradient Structures
  5.4 Constraint Aggregation
  5.5 Conclusion
6 Development of a Matrix-Free Optimizer
  6.1 Algorithm Selection
  6.2 The Augmented Lagrangian Method
  6.3 Estimating Second Derivatives
  6.4 The Split Quasi-Newton Strategy
  6.5 The Approximate Jacobian Strategy
  6.6 Implementation Notes
  6.7 Conclusion
7 Structural and Aerostructural Wing Design
  7.1 Analysis and Optimization Software
  7.2 Benchmark Problem: Plate Design
  7.3 Structural Optimization of an Aircraft Wing
  7.4 Aerostructural Optimization of an Aircraft Wing
  7.5 Conclusion
8 Conclusions and Recommendations
Bibliography
List of Tables
7.1 Average run times for specific computations in the plate optimization problem
7.2 Average run times for specific computations in the wing structure optimization problem
7.3 Aircraft specifications from [3, 101]
7.4 Average run times for specific computations in the aerostructural optimization problem
List of Figures
1.1 Historical trends in processor clock speed, core count, and number of transistors [71]. Recent gains in computational rates come from greater parallel processing, not higher clock speed.
2.1 Groups of variables and constraints that are eliminated from AAO to obtain SAND, IDF, and MDF.
3.1 Example design structure matrix for an automobile engine from Browning [33]
3.2 Generic, three-discipline, fully-coupled, multidisciplinary system. Each discipline analysis i shares its state yi with other disciplines and requires the states of other disciplines in its own analysis.
3.3 Gauss–Seidel MDA procedure. Each discipline analysis is evaluated in sequence using the most recent state information from other disciplines and a fixed choice of design variables. The MDA block measures convergence of the discipline states.
3.4 Jacobi MDA procedure with parallel execution of discipline analyses. The system being analyzed is identical to that in Figure 3.3.
3.5 Jacobi MDA procedure with parallel execution of discipline analyses using our convention for parallel diagram structure. The MDA process shown here is identical to that shown in Fig. 3.4.
3.6 Optimization algorithm where the optimizer requires gradients of both the objective and the constraints. The gradients are calculated by a separate component.
3.7 XDSM for the SAND architecture. The locations in which the functions of Problem (2.3) are evaluated are noted in the diagram.
3.8 XDSM for the IDF architecture. The locations in which the functions of Problem (2.4) are evaluated are noted in the diagram.
3.9 XDSM for the MDF architecture. The locations in which the functions of Problem (2.5) are evaluated are noted in the diagram.
4.1 Classification and summary of the MDO architectures.
4.2 Diagram of the ASO architecture.
4.3 Diagram of the CO architecture.
4.4 Diagram of the ATC architecture.
5.1 Jacobian sparsity of the SAND problem formulation with three disciplines. The sparsity structure with respect to the problem disciplines is clearly visible.
5.2 Jacobian sparsity of the IDF problem formulation with three disciplines.
5.3 Jacobian sparsity of the alternative IDF problem formulation (5.1). Note the similarity with Figure 5.1.
5.4 Jacobian sparsity of the MDF problem formulation with three disciplines. This formulation has no obvious sparsity structure to exploit.
5.5 SAND Jacobian sparsity of Figure 5.1 reordered to group disciplinary variables together.
5.6 IDF Jacobian sparsity of Figure 5.3 reordered to group disciplinary variables together.
5.7 Computational times needed to solve problem (5.6).
5.8 Function calls needed to solve problem (5.6).
5.9 KS aggregation of two bound constraints for various values of ρKS. As ρKS increases, the accuracy of the boundary to the feasible region improves, and the optimal objective value decreases.
5.10 For ρKS = 2, the gradient of the KS function changes gradually near the constraint intersection.
5.11 For ρKS = 30, the gradient of the KS function changes abruptly near the constraint intersection.
6.1 Performance profile comparing our matrix-free optimizer to LANCELOT on a collection of test problems. Among the optimizers tested, our matrix-free optimizer (AUGLAG-SBMIN) was able to solve 90% of the problems in the test set and is competitive with LANCELOT when the LSR1 quasi-Newton method is used to estimate second derivatives.
7.1 Geometry and load condition of plate mass minimization problem
7.2 Final thickness distributions for the 400-, 1600-, and 3600-element plate problems. These solutions were all obtained by the approximate Jacobian version of the matrix-free optimizer.
7.3 Stress distributions as a fraction of the local yield stress for the 400-, 1600-, and 3600-element plate problems. These solutions were all obtained by the approximate Jacobian version of the matrix-free optimizer. SNOPT solutions are similar.
7.4 Number of forward and adjoint linear solves required to solve the plate design optimization problem and the corresponding run time for each problem size. By construction, the number of variables is equal to the number of constraints in all instances of the problem. Across a range of problem sizes, both versions of AUGLAG are more efficient than SNOPT in terms of number of linear solves, but are not competitive in terms of run time, even with parallel processing taken into account.
7.5 Comparison of wall time fraction spent in optimizer to solve the plate problem. As a fraction of total computational time, the split-quasi-Newton AUGLAG software requires little computation for large problems compared to SNOPT. This result suggests that the implementation language of the optimizer does not play a significant role in the run time result shown in Figure 7.4.
7.6 Outer geometry and layout of the baseline wing structure.
7.7 Illustration of patches on the wing to which individual design variables and failure constraints are assigned.
7.8 Overestimate of the optimal mass of the test wing for various aggregation schemes and ρKS values using SNOPT. The mass is normalized with respect to the case of 2832 constraints and ρKS = 100. A 10% spread in optimum mass is observed at ρKS = 50 while a 4% spread is observed at ρKS = 100.
7.9 Top skin thicknesses for ρKS = 50 and ρKS = 100 when different aggregation schemes are employed. Using more failure constraints in the optimization problem allows certain parts of the wing to be designed with a thinner skin.
7.10 Stress distributions on the top surface of the wing for the 2.5g load case using optimal solutions for ρKS = 50 and ρKS = 100 and two different aggregation schemes. The ‘lambda’ value indicates the ratio of stress to yield stress of the material. The wing design obtained using ρKS = 100 and a large number of failure constraints is more fully stressed, indicating a more efficient structure.
7.11 Number of linear solve operations and run time to optimize test wing using SNOPT for several aggregation schemes and ρKS values. Constraint aggregation clearly reduces the computational effort to solve the design problem, at a cost of a higher optimum mass shown in Figure 7.8. However, the relationship between run time and amount of aggregation is not simple.
7.12 Trade-offs between mass overestimate and computational effort for optimal wing structure design. The lowest-cost estimates of optimum mass come from aggregating failure constraints as much as possible and using large values of ρKS.
7.13 Overestimate of the optimal mass of the test wing comparing two aggregation schemes in SNOPT with AUGLAG. Despite using relaxed convergence tolerances, both versions of AUGLAG find optimum masses within 1-3% of the best estimate from SNOPT.
7.14 Comparison of the number of linear solves and run time to optimize the test wing for SNOPT and AUGLAG. AUGLAG can solve the design problem with 2832 constraints using more than 80% fewer linear solves than SNOPT for a range of ρKS values. However, only the split-quasi-Newton version of AUGLAG reduces the run time to solve the problem compared to SNOPT.
7.15 Trade-offs between mass overestimate and computational effort for the optimal wing structure design using AUGLAG. For a small increase in the number of linear solves, AUGLAG with the minimum amount of constraint aggregation can provide a lower optimum mass estimate than SNOPT with constraint aggregation.
7.16 Comparison of the run time to optimize the test wing for SNOPT and AUGLAG Split QN using a selection of random starting points. On average, the run time is independent of ρKS in this range of values and random fluctuations are common.
7.17 Overestimate of TOGW of the test aircraft comparing SNOPT with constraint aggregation to AUGLAG Split QN without constraint aggregation. The TOGW is normalized with respect to the case of 4251 constraints and ρKS = 100. For ρKS values higher than 50, the increase in TOGW caused by constraint aggregation is less than 1%.
7.18 Comparison of the wing deflection under 2.5g load for ρKS = 50 for the cases of 18 constraints (gray) and 4251 constraints (blue). Because of the small difference in computed TOGW, the difference in tip deflection is within the thickness of the tip airfoil.
7.19 Overestimate of the wing structure mass for the minimum-TOGW aircraft. The difference in structural mass caused by aggregation of the failure constraints is similar to that found in Figure 7.8.
7.20 Comparison of the number of linear solves and run time to optimize the test wing for the split-quasi-Newton version of AUGLAG with the estimated equivalent cost in SNOPT. Even with parallel processing, solving the problem with 4251 constraints is expected to take SNOPT more than two weeks. AUGLAG is, therefore, much more cost effective than SNOPT if a high priority is placed on avoiding constraint aggregation.
List of Symbols and Abbreviations
Abbreviations
AAO      All-at-Once
ASO      Asymmetric Subspace Optimization
ATC      Analytical Target Cascading
AUGLAG   A matrix-free augmented Lagrangian optimizer
BFGS     Broyden–Fletcher–Goldfarb–Shanno (quasi-Newton method)
BLISS    Bi-Level Integrated System Synthesis
CO       Collaborative Optimization
CRM      Common Research Model
CSSO     Concurrent Subspace Optimization
DFP      Davidon–Fletcher–Powell (quasi-Newton method)
DSM      Design Structure Matrix
ECO      Enhanced Collaborative Optimization
EPD      Exact Penalty Decomposition
GMRES    Generalized Minimum Residual
IDF      Individual Discipline Feasible
IPD      Inexact Penalty Decomposition
KKT      Karush–Kuhn–Tucker (optimality conditions)
KS       Kreisselmeier–Steinhauser (constraint aggregation)
LBFGS    Limited-memory BFGS
LSR1     Limited-memory SR1
MDA      Multidisciplinary Analysis
MDF      Multidisciplinary Feasible
MDO      Multidisciplinary Design Optimization
MDOIS    Multidisciplinary Design Optimization based on Independent Subspaces
NIP      Nonlinear Interior Point
QSD      Quasi-Separable Decomposition
SAND     Simultaneous Analysis and Design
SBMIN    A matrix-free optimizer for bound-constrained nonlinear problems
SNOPT    Sparse Nonlinear Optimizer (software)
SQP      Sequential Quadratic Programming
SR1      Symmetric Rank-One (quasi-Newton method)
TACS     Toolkit for the Analysis of Composite Structures
TOGW     Takeoff Gross Weight
TriPan   A three-dimensional panel code
XDSM     Extended Design Structure Matrix
Symbols, Chapters 2-5
c        Vector of design constraint values
C        Vector of design constraint functions
CC       Vector of consistency constraint functions
CJ       Inconsistency objectives or constraints in CO architecture
d/dx     Total derivative
∂/∂x     Partial derivative
f        Objective function value
F        Design objective function (scalar)
φ        Penalty function value for ATC architecture constraints
Φ        Penalty function for ATC architecture constraints
I        Identity matrix
KS       Kreisselmeier–Steinhauser (KS) aggregation function
m        Number of constraints
M        Number of state variables
n        Number of design variables
N        Number of disciplines
P        p-norm aggregation function
r        Vector of governing equation residual values
R        Governing equations of a discipline analysis
ρKS      KS parameter
w        Penalty weight vector
x        Design variable vector
x̂0i      Discipline i copy of system design variables (CO and ATC architectures)
y        State variable vector
ŷ        Coupling state variable vector copy
Y        Implicit function that computes state variables y
Common Superscripts and Subscripts, Chapters 2-5
Superscript (0)   Initial data
Subscript 0       Vector or function is shared over the entire system
xiii
-
Subscript i       Discipline index (MDO problems) or constraint index (gradient computation)
Subscript j       Discipline index (MDO problems) or design variable index (gradient computation)
Subscript k       State variable index
Superscript ∗     Optimal value
Symbols, Chapter 6
A        Approximate Jacobian matrix
B        Approximate Hessian matrix
C        Constraint function vector
∆        Trust region radius
F        Objective function
Φ        Augmented Lagrangian function
g        Augmented Lagrangian gradient
H        Exact Hessian matrix
I        Identity matrix
I        Infeasibility function
J        Exact Jacobian matrix
L        Lagrangian function
λ        Lagrange multiplier vector
p        Search direction vector
P        Projection operator
Q        Quadratic model of augmented Lagrangian
ρ        Penalty parameter
s        Search direction for quasi-Newton methods
σ        Adjoint search direction for quasi-Newton methods
t        Slack variable vector
x        Decision variable vector
xL       Decision variable lower bounds
xU       Decision variable upper bounds
y        Change in gradient for quasi-Newton methods
∇        Gradient operator
∇2       Hessian operator
η        Feasibility tolerance
ω        Optimality tolerance
Ω        Bound-constrained region
Chapter 1
Introduction
Multidisciplinary design optimization (MDO) is a field of engineering that is concerned with the application of numerical optimization methods to the design of engineering systems. Here, “engineering systems” means any device with multiple interacting components, where multiple knowledge bases, or disciplines, are needed to create design solutions. In aeronautics, example disciplines include aerodynamics, structural mechanics, combustion, heat transfer, vehicle dynamics, and control analysis. Knowledge of each area is typically provided through computational models. Combining these models in a suitable environment provides information about how decisions made by one discipline affect other disciplines. By mating these models with mathematical optimization techniques, we are able to tailor our designs to account for interdisciplinary interactions while maximizing or minimizing a specific objective and satisfying design constraints. While this thesis focuses on the application of MDO to aircraft design [13, 15, 88, 112, 128], we emphasize that MDO methods can be applied to a wide variety of engineering systems, and the methods developed herein are generally applicable. Example applications from the literature include automobiles [108, 109], spacecraft [31, 36, 90], ships [91, 138, 145], bridges [19], buildings [42, 70], wind turbines [17, 66, 67, 98], railway cars [64], engines [133, 172], robots [86, 173], batteries [189, 190], and even microscopes [147].
MDO exhibits three principal advantages over traditional approaches to engineering design. First, MDO makes extensive use of computational tools for analyzing and refining design solutions. Using computational tools in place of physical experiments and prototype hardware greatly reduces the time and cost of the design process because the impact of small changes to the design can be evaluated more quickly. Second, by combining the computational tools, MDO allows designers to consider interdisciplinary interactions concurrently with the disciplines themselves. Designers thus have access to more information about how local changes to a design can influence the behaviour of the whole system, and integration issues can be addressed earlier in the design process. Finally, by employing numerical optimization tools, MDO allows designers to rapidly explore the design space. Baseline designs can be automatically refined to exhibit better performance, and solutions may be found that were not intuitive to the human designers. This last advantage is particularly useful when the design being considered is a novel concept in which design experience is lacking and empirical rules-of-thumb have yet to be developed.
In aeronautics, MDO can trace its origins back to the early work on structural optimization by Schmit [158, 183]. Initially, only simple truss structures were considered, but the idea of applying optimization methods to structural design quickly spread. One of the first truly multidisciplinary design problems solved by optimization was that of an aircraft wing subject to constraints on the amount of lift generated and the allowable stress on the structure [159]. Variations on this design problem are still solved to this day, albeit with much more sophisticated wing models and optimization software. The algorithms developed in this thesis are applied to one such version of this problem.
An aircraft wing provides an ideal example of the importance of considering multiple disciplines of a system early in the design process. In conducting a purely aerodynamic analysis of a wing, we implicitly assume that the shape of the wing is known and that the wing is completely rigid. In other words, the computed aerodynamic load does not change the wing shape. Likewise, in conducting a purely structural analysis of the same wing, we assume that the load on the structure does not vary as the structure is deflected. (This assumption is present even if we only consider the case of a linear structure.) In practice, of course, neither of these assumptions holds true. A real wing bends under aerodynamic loading, and the change in shape itself changes the loading on the structure. The more flexible the wing, the more strongly the aerodynamic and structural analyses are coupled. It is only by considering the aerodynamic and structural disciplines simultaneously, and transferring information between them, that we are able to accurately model and understand the behaviour of the wing system.
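The back-and-forth transfer of information described above can be sketched as a block Gauss–Seidel multidisciplinary analysis (MDA), the procedure later depicted in Figure 3.3: each discipline is re-evaluated with the most recent state of the other until the coupled states stop changing. The two scalar "disciplines" below are invented placeholders for illustration, not the aerodynamic and structural solvers used in this thesis.

```python
def discipline_aero(y_struct):
    # "Aerodynamics": the load depends on the structural deflection.
    # Illustrative coupling function, not a real aerodynamic model.
    return 1.0 + 0.3 * y_struct

def discipline_struct(y_aero):
    # "Structures": the deflection depends on the aerodynamic load.
    return 0.5 * y_aero

def gauss_seidel_mda(tol=1e-10, max_iter=100):
    """Block Gauss-Seidel fixed-point iteration on the coupled states."""
    y_aero, y_struct = 0.0, 0.0
    for _ in range(max_iter):
        y_aero_new = discipline_aero(y_struct)        # uses latest structural state
        y_struct_new = discipline_struct(y_aero_new)  # uses the state just computed
        if abs(y_aero_new - y_aero) < tol and abs(y_struct_new - y_struct) < tol:
            return y_aero_new, y_struct_new
        y_aero, y_struct = y_aero_new, y_struct_new
    raise RuntimeError("MDA did not converge")
```

Because the coupling chosen here is a contraction, the loop converges quickly; for the strongly coupled, flexible wings discussed above, plain Gauss–Seidel may converge slowly or diverge, motivating relaxation or Newton-type coupling strategies.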
The use of MDO in aircraft design has taken on increasing importance in recent years. In addition to reducing operating costs for the airlines, the aviation community is increasingly concerned about the environmental impacts of aviation. Human industrial activity, especially the burning of fossil fuels, is the widely accepted cause of a rising level of atmospheric carbon dioxide and subsequent warming of the planet [2]. While aviation activities contribute only 2% of the carbon emissions worldwide [1], these emissions necessarily occur at high altitude, and the altitude of the emissions has been shown to increase the effect on the warming process [1]. Furthermore, projected increases in the demand for air travel mean that the number of aircraft in service is expected to double in the next 20 years [4, 5]. These new aircraft must be even more efficient than the previous generation to keep aviation’s relative contribution to global emissions constant.
Keeping the relative emissions constant or decreasing them can only be achieved through developing new technologies to reduce drag, weight, and fuel consumption. Equally important is the development of new design procedures for appropriately integrating these technologies into new aircraft for maximum efficiency [96]. Additional research suggests that more efficient aircraft can be developed by moving away from the traditional “tube-and-wing” configuration to more unconventional designs. For example, Liebeck [122] proposed a concept called the blended wing body, in which the wing and fuselage merge seamlessly to create a smoother aerodynamic shape [113, 124, 150]. Gallman et al. [68] studied a joined-wing aircraft, in which a second wing connects the tips of the main wing to the top of a vertical tail. Gur et al. [81] studied strut-braced and truss-braced wings, in which additional external structure is added near the wing root to develop wings with higher aspect ratios. While all these concepts offer great promise in improving efficiency, none of them has been built as a full-sized aircraft. When these concepts are developed into full-sized aircraft, they will have to be competitive with the tube-and-wing configuration from the start, without all the accumulated years of empirical design knowledge. It is in this environment of radical changes to design objectives, aircraft configuration, and technology that MDO has become a promising approach to the design of new aircraft.
A major consideration in MDO, and other computational methods in engineering, is the computer hardware used to solve the design problems. Since the 1960s, the number of transistors that can be etched onto a given chip has doubled every 24 months according to Moore’s Law [136]. For many years, this meant that the clock speed of processors doubled at the same rate. However, to keep heat dissipation manageable on silicon chips, the speed of sequential processing has stagnated in recent years [171]. Rather than coming from higher clock speeds, the continued increase in the rate of computation is expected to come from the increased use of specialized computer architectures, such as multicore processors, that make greater use of parallel processing [71, 171]. Figure 1.1 displays the historical trends in CPU technology from 1970 to the present day. Special attention must be paid to developing those MDO methods that can exploit parallel computing facilities to a high degree.
Outline and Contributions
This thesis is primarily concerned with the “how” of MDO: the formulation of the optimization problem, the computation of design sensitivities, i.e., gradient information, and the solution algorithm itself. In particular, we are interested in how the structure of the optimization problem influences the selection of the optimization software and the larger MDO architecture. We use the term “MDO architecture” to mean the combination of the optimization problem formulation and the algorithm used to solve it. We focus our discussion on MDO problems that contain no discrete or integer variables and whose objective and constraint functions are smooth and have little noise.

Figure 1.1: Historical trends in processor clock speed, core count, and number of transistors [71]. Recent gains in computational rates come from greater parallel processing, not higher clock speed.
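The class of problems just described can be summarized as a generic smooth nonlinear program; the precise multidisciplinary formulations (AAO, SAND, IDF, and MDF) are the subject of Chapter 2. As a sketch, using the design variables x, design objective F, and design constraints C from the List of Symbols (the inequality sign convention here is illustrative):

```latex
\min_{x} \; F(x) \quad \text{subject to} \quad C(x) \geq 0,
```

where F and C are assumed to be smooth, nearly noise-free functions of continuous variables only.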
The overarching goal of this thesis is to advance the state-of-the-art in MDO methods,
not just for aircraft design applications, but the design of complex engineering systems
in general. We start by surveying the range of approaches available to solving MDO
problems and develop a framework in which to describe them. This framework helps
to tie together key concepts in the literature and places individual architectures in the
proper context. We also introduce some new notation and diagrams to help newcomers
to the MDO field become acquainted with the various techniques. Both the survey itself
and a description of the diagrams are available in the literature [115, 131]. These papers
complement previous surveys of the MDO field by Sobieszczanski-Sobieski and Haftka
[165], and Agte et al. [8].
After developing this framework, we focus on a particular area of application: the aerostructural design of aircraft wings. Using our framework, we show the typical architectures used to solve this problem and the current challenges of using these architectures.
By observing a particular matrix structure in the gradient computation that is common to
many MDO problem formulations, we motivate and develop a numerical optimizer that is
“matrix-free” in that it does not require the computation of complete derivative matrices.
While matrix-free optimization algorithms have been considered by the optimization community for many years [49, 73, 87], robust implementations of these algorithms that solve
general optimization problems are essentially nonexistent. We conjecture that the lack of
a suitable application has hindered matrix-free algorithm development to this point. To our knowledge, this work is the first to consider such an optimizer for MDO applications. We
apply our matrix-free optimizer to several test problems related to aerostructural design
and show how this matrix-free approach to MDO offers great promise for reducing the
computational effort required to solve large-scale MDO problems. These results are in
the process of being published in the peer-reviewed literature [16, 116].
This thesis is structured as follows. Chapter 2 outlines the three main problem formulations in MDO, including a common formulation from which they are all derived.
Chapter 2 also discusses issues surrounding the computation of gradient information in
MDO problems for the main problem formulations. Chapter 3 outlines a novel way of
visualizing MDO architectures, known as the extended design structure matrix (XDSM).
Chapter 4 surveys existing MDO architectures using the tools provided in the earlier
chapters, including a novel classification of MDO architectures that points out key similarities between the architectures and areas for future research. Chapter 5 returns to the
three main problem formulations to identify specific ways of exploiting problem structure.
Chapter 6 motivates the use of matrix-free optimization methods for our design problem
of interest and outlines the main features of our matrix-free algorithm. Chapter 7 shows
the results of applying the matrix-free optimizer to structural and aerostructural design
problems, including trends in the computational effort needed to solve the problems. Finally, Chapter 8 summarizes our findings, the key contributions of this work, and future research directions.
Chapter 2
MDO Problem Formulations
Many MDO problem formulations exhibit common structural patterns. In this chapter,
we show how these patterns can be unified into a single problem statement and how to
make specific transformations to adapt this problem statement to suit the information
available to the optimizer. In particular, the form in which each discipline analysis is conducted, and whether or not the governing equations can be made available to the optimizer, dictates the form of the MDO problem to solve. We also discuss how to compute
derivatives for gradient-based optimization for cases in which the governing equations of
each discipline are solved separately from the optimizer. This chapter builds on the
seminal work of Cramer et al. [48] and synthesizes fundamental ideas in MDO problem
formulation and gradient computation [131]. Further details on derivative computation
methods are given by Martins and Hwang [130].
2.1 Notation and Terminology
We now define some basic terminology in MDO so that we can extend a basic nonlinear optimization problem to apply to a typical engineering design problem. A design
variable is a variable that is always under the control of the optimizer regardless of the
problem formulation. These variables correspond to decisions or specifications made by
the designers of the system. Examples of these variables include component dimensions
and geometry specifications. In a multidisciplinary context, these variables may be local,
i.e., pertain to a specific discipline, or may be shared between multiple disciplines. A variable like the sweep angle of a wing affects the aerodynamic, structural, and stability disciplines, so it is a good example of a shared design variable. We use the letter x to denote design variables: x0 denotes the vector of shared design variables and xi denotes the vector of variables local to discipline i. We denote the complete set of all design variables in the problem by x.
A discipline analysis is a simulation or computation that models the behaviour of one
aspect of a multidisciplinary system. Discipline analyses can range in complexity from
empirical rules-of-thumb or curve-fit data, to physics-based models that directly solve
a set of governing differential or integral equations. These analyses can be classified as
low-fidelity or high-fidelity, depending on how accurately and robustly they model the
real-world behaviour of a system. The output of a discipline analysis is a set of state
variables, also known as response variables. Examples include fluid density and velocity
at specific points in the flow field of an aerodynamic analysis; deformation, strain, and
stress in a structural analysis; and particle positions, velocities, and vibration frequencies
in a dynamics analysis. While state variables are often computed through the discipline
analysis process itself, some MDO problem formulations treat the state variables as
independent variables and the governing equations of the analysis as a set of constraints
in the optimization problem. (Sections 2.2 and 2.3 discuss those formulations.) We
denote the set of equations governing discipline analysis i by Ri and corresponding set
of state variables by yi. We denote the set of state variables computed by all disciplines
by y.
Like design variables, design objectives and constraints may also be treated as local
or shared. Local objectives and constraints may only depend on information available to
that discipline. For discipline i, the objective and constraint functions may only depend
on x0, xi, and yi. Shared objectives and constraints may only depend on shared design
variables x0 and the state information produced by all disciplines, y. Respectively, we
denote shared and local design objectives by F0 and Fi and the sets of shared and local
design constraints by C0 and Ci. We denote the complete set of both local and shared
constraints by C. For clarity, all functions are denoted by capital letters and all variables
are denoted by lower-case letters.
The key feature of MDO problems, as compared with single-discipline optimization
problems, is that the disciplines exchange state variables. The state variables of discipline
i become inputs to another discipline j and vice versa. In general, only a subset of all
state variables needs to be exchanged and we refer to this subset as the coupling variables.
To simplify the notation, we denote the coupling variables by the same symbol as the
state variables, yi. If individual discipline analyses are solved separately from each other,
the output of discipline i will not be the same as the coupling variable information input
to discipline j. We must, therefore, specify a copy of the discipline i coupling variables to
be used as input to the other disciplines. We denote this copy by ŷi. These variables are
sometimes referred to as target variables in the literature. Both the coupling variables
and their copies must converge to the same value at an optimal design, so we must specify
an additional set of consistency constraints in the problem formulation to enforce this
condition. These consistency constraints are denoted by C_i^c and take the form ŷi − yi = 0 for each discipline.
2.2 All-at-Once (AAO) Formulation
Having defined the terminology, we can now show the MDO problem in its most general form. We refer to this problem as the all-at-once (AAO) problem because all the mathematical relations needed to define the problem are present. We define the AAO problem
with N disciplines as
$$
\begin{aligned}
\text{minimize} \quad & F_0(x_0, y) + \sum_{i=1}^{N} F_i(x_0, x_i, y_i) \\
\text{with respect to} \quad & x, \hat{y}, y \\
\text{subject to} \quad & C_0(x_0, y) \ge 0 \\
& C_i(x_0, x_i, y_i) \ge 0 \quad \text{for } i = 1, \dots, N \\
& C_i^c(\hat{y}_i, y_i) = \hat{y}_i - y_i = 0 \quad \text{for } i = 1, \dots, N \\
& R_i(x_0, x_i, \hat{y}_{j \ne i}, y_i) = 0 \quad \text{for } i = 1, \dots, N.
\end{aligned}
\tag{2.1}
$$
To give a concrete example of an MDO problem, we refer back to the aircraft wing example discussed in the introduction. A simple wing design problem contains two disciplines: 1) aerodynamics, and 2) structures. The governing equations of the aerodynamic and structural analyses are contained in R1 and R2. The aerodynamic state y1 defines the properties of the airflow over the wing. The structural state y2 defines the deflection of the structure under the aerodynamic loads. The shared design objective F0 can be an aircraft performance objective, such as range, endurance, or take-off gross weight (TOGW) for a design mission, that requires both the aerodynamic and structural states to evaluate. The problem objective could also be a linear combination of an aerodynamic objective F1, such as minimum drag, and a structural objective F2, such as minimum mass. The shared design variables of the problem, x0, define the wing geometry, such as sweep, twist, taper, and span. Aerodynamic design variables x1 include the angle of attack at each flight condition analyzed. Structural design variables x2 include the thicknesses of the ribs, spars, and skin. An example aerodynamic design constraint C1 is that the wing must generate a prescribed lift at a prescribed flight condition. Structural design constraints C2 include limits on the stress and deflection in different parts of the wing structure. An example shared design constraint C0 is that the total aircraft weight, including the structure, and the lift computed by the aerodynamics analysis must match at a prescribed flight condition.
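To make the bookkeeping in Problem (2.1) concrete, the following sketch encodes a hypothetical two-discipline problem with scalar states. The linear "governing equations", the coupling coefficients (0.5 and -0.2), and the quadratic objective are all invented for illustration; in a real AAO problem the optimizer would vary x0, y1, y2, ŷ1, and ŷ2 simultaneously and drive every equality residual to zero.

```python
# Toy AAO setup: 2 disciplines, scalar shared variable and scalar states.
# All quantities (x0, y1, y2, yhat1, yhat2) are independent optimizer variables.

def R1(x0, y1, yhat2):
    # Hypothetical "aerodynamics" residual: y1 depends on the structural copy.
    return y1 - (x0 + 0.5 * yhat2)

def R2(x0, y2, yhat1):
    # Hypothetical "structures" residual: y2 depends on the aerodynamic copy.
    return y2 - (x0 - 0.2 * yhat1)

def consistency(y, yhat):
    # C^c_i = yhat_i - y_i, driven to zero by the optimizer.
    return yhat - y

def F0(x0, y1, y2):
    # Invented shared objective that uses both states.
    return x0**2 + y1**2 + y2**2

# At a multidisciplinary-consistent point, all equality constraints vanish.
x0 = 1.0
y1 = 1.5 * x0 / 1.1           # closed-form solution of the coupled 2x2 system
y2 = x0 - 0.2 * y1
yhat1, yhat2 = y1, y2          # consistent coupling copies

residuals = [R1(x0, y1, yhat2), R2(x0, y2, yhat1),
             consistency(y1, yhat1), consistency(y2, yhat2)]
```

At any other choice of the copies ŷi, the consistency and analysis residuals would be nonzero, and the optimizer would have to reduce them alongside the objective.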
Note that Problem (2.1) is stated in a canonical form. Design constraints of a different
form, such as bounds on individual variables, general equality constraints, and general
range constraints, can be lumped into the appropriate vector of local constraints using
simple transformations. In addition, the discipline-specific objectives Fi are not always
necessary and may be set to zero. Similar general problem statements have appeared in
the literature before; Cramer et al. [48] designate one MDO problem formulation as AAO,
while a slightly different formulation is simply stated as "the most general form" [48]. While we have simplified the issue of coupling variable exchange discussed by Cramer et al., Problem (2.1) most closely resembles the problem statement of "the most general form."
We emphasize that our nomenclature for Problem (2.1) is not the current standard in
the literature. Some literature attributes names like “all-at-once” or “all-in-one” to what
we refer to as the multidisciplinary feasible (MDF) formulation. This nomenclature comes
from the viewpoint that the discipline analyses and design optimization are handled
separately, which is not always the case. In other literature, Problem (2.1) is referred
to as simultaneous analysis and design (SAND) [20, 82]. While SAND and AAO are
identical for the case of N = 1, we have chosen to separate them to more naturally derive
the other fundamental problem formulations.
Because of the complexity of the minimization problems that arise in MDO, they are solved numerically with specialized mathematical optimization algorithms and software. This software requires the user to provide function values for all objective and constraint functions given a choice of input variables. In the case of Problem (2.1), the user would have to provide the output values of F0, Fi, C0, Ci, C_i^c, and Ri for all i given a choice of x, y, and ŷ. Depending on the optimization software, first and second derivative information may also be required. Algorithms that use derivative information are called gradient-based algorithms, while algorithms that do not are called gradient-free algorithms. Section 2.6 discusses how to obtain derivatives for MDO problems when gradient-based algorithms are applicable.
Figure 2.1: Groups of variables and constraints that are eliminated from AAO to obtain SAND, IDF, and MDF. Eliminating Cc and ŷ from AAO yields SAND; eliminating R and y yields IDF; eliminating both groups yields MDF.
In the broader optimization literature, Problem (2.1) may be categorized as a problem with complicating variables [43] or a quasiseparable problem [83]. Problems of this form may be solved by decomposition methods. If we substitute ŷ for y in F0 within Problem (2.1) and fix the values of the shared variables x0 and ŷ, we would obtain a minimization problem of the form
$$
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{N} F_i(x_i, y_i) \\
\text{with respect to} \quad & x, y \\
\text{subject to} \quad & C_i(x_i, y_i) \ge 0 \quad \text{for } i = 1, \dots, N \\
& R_i(x_i, y_i) = 0 \quad \text{for } i = 1, \dots, N.
\end{aligned}
\tag{2.2}
$$
Decomposing Problem (2.2) into N independent minimization problems is trivial. All the objective and constraint functions depend on disjoint sets of local variables xi and yi, so the N problems can be solved independently, possibly in parallel. Solving Problem (2.1) with the shared variables included while exploiting the problem structure requires specialized decomposition methods. Chapter 4 discusses decomposition strategies in more detail.
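As a minimal illustration of this separability, the sketch below solves N invented quadratic subproblems, each with a closed-form minimizer. The subproblem definitions are assumptions made purely for the example; the point is that the solves share no data and could run concurrently.

```python
# With shared quantities fixed, Problem (2.2) separates: each discipline's
# subproblem touches only its own (x_i, y_i). Invented quadratic subproblems
# with closed-form minimizers stand in for real discipline optimizations.

def solve_subproblem(i):
    # minimize F_i(x_i) = (x_i - i)^2 + i, an illustrative local objective;
    # the minimizer x_i = i and minimum value i are known in closed form.
    x_opt = float(i)
    f_opt = float(i)
    return x_opt, f_opt

N = 4
# The N solves share no data, so this loop could be replaced by parallel workers.
solutions = [solve_subproblem(i) for i in range(1, N + 1)]
total_objective = sum(f for _, f in solutions)
```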
The power of the AAO problem statement as we have described it is that the SAND, individual discipline feasible (IDF), and MDF formulations can be derived from it by simply eliminating certain groups of constraints and corresponding sets of variables. Figure 2.1 sketches the transformations between AAO, SAND, IDF, and MDF. The eliminated constraints are said to be closed within the formulation [10]. Satisfaction of these
closed constraints is not dependent upon the action of the optimizer but on some other
aspect of the problem. For example, the constraints Ri(x0, xi, ŷj, yi) = 0 can be closed
by solving discipline analysis i directly within the optimization process. In the following sections, we detail the process of closing certain groups of constraints, how the problem formulation changes in each case, and the advantages and disadvantages of each formulation.
2.3 Simultaneous Analysis and Design (SAND)
Problem (2.1) is never explicitly solved in practice. Due to the simple structure of the consistency constraints in Problem (2.1), we can eliminate them by retaining only a single set of state variables. The resulting formulation is the Simultaneous Analysis and Design (SAND) problem,
$$
\begin{aligned}
\text{minimize} \quad & F_0(x_0, y) \\
\text{with respect to} \quad & x, y \\
\text{subject to} \quad & C_0(x_0, y) \ge 0 \\
& C_i(x_0, x_i, y_i) \ge 0 \quad \text{for } i = 1, \dots, N \\
& R_i(x_0, x_i, y) = 0 \quad \text{for } i = 1, \dots, N.
\end{aligned}
\tag{2.3}
$$
Because the optimizer maintains control over satisfaction of the discipline analyses directly, it has responsibility for simultaneously analyzing and designing the system. As stated in Section 2.2, the SAND and AAO problems are identical if N = 1. If the discipline analyses are discretized partial differential equations (PDEs), the SAND problem is a general form of PDE-constrained optimization. (The texts by Biegler et al. [23] and Borzì and Schultz [26] provide comprehensive overviews of that field.)
The SAND formulation may be regarded as an "intrusive" problem formulation because it requires access to a great deal of information from the disciplinary computational models
and control of many variables that would normally be handled by individual disciplines.
Rather than having each discipline complete its own analysis outside of the optimizer
given coupling and design variable inputs, the SAND formulation forces the optimizer to
choose the set of state variables that solve each discipline analysis. If specific software
was developed to solve the discipline analyses directly, this software may not be useful
in a SAND formulation without extensive modification. Furthermore, if the discipline
analyses themselves consist of millions of equations and state variables — a common occurrence when performing the highest-fidelity analyses, such as a three-dimensional CFD
simulation — then the optimization problem will be enormous.
If the discipline analysis software can accommodate the SAND formulation and the
computational environment can handle the size of the MDO problem under consideration,
the SAND formulation may result in the fastest times to obtain an optimal design. This is
because the optimizer is not restricted to searching only those regions of the design space in which the design is feasible with respect to certain sets of constraints. In other words,
at each new point chosen by the optimizer, we do not have to choose combinations of x0,
xi, and yi that solve Ri(x0, xi, y) = 0 in order to guarantee convergence to an optimal
design. The optimizer itself ensures that the governing equations are solved at the final
design solution. If the discipline analyses are nonlinear or require iterative methods
in their solution, this feature is particularly useful. Nearly all modern optimization
algorithms, especially the sequential quadratic programming (SQP) and nonlinear interior
point (NIP) algorithms, permit the exploration of infeasible regions of the design space
to converge to a solution more rapidly.
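The SAND structure can be sketched on an invented one-discipline toy problem: the optimizer controls both x and the state y, and the "analysis" appears only as the equality constraint R(x, y) = 0. The functions F and R below are assumptions for illustration, chosen so that the first-order (KKT) conditions can be checked by hand.

```python
# SAND sketch for one discipline: the optimizer controls x AND the state y,
# treating the invented "analysis" residual R(x, y) = y - x^2 = 0 as an
# equality constraint instead of solving it at every design point.

def F(x, y):
    return (x - 1.0)**2 + y      # invented design objective

def R(x, y):
    return y - x**2              # invented governing equation (residual form)

def kkt_residuals(x, y, lam):
    # First-order conditions for min F subject to R = 0 with multiplier lam.
    dL_dx = 2.0 * (x - 1.0) + lam * (-2.0 * x)   # dF/dx + lam * dR/dx
    dL_dy = 1.0 + lam * 1.0                       # dF/dy + lam * dR/dy
    return dL_dx, dL_dy, R(x, y)

# The solution can be worked out by hand: lam = -1, x = 0.5, y = 0.25.
residuals = kkt_residuals(0.5, 0.25, -1.0)
```

Intermediate iterates of a SQP or interior-point method applied to this problem need not satisfy R(x, y) = 0; only the converged point does, which is exactly the flexibility SAND offers.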
2.4 Individual Discipline Feasible (IDF)
Instead of eliminating the consistency constraints from Problem (2.1), we now choose
to eliminate the discipline analysis constraints. This nonlinear constraint elimination
results in the individual discipline feasible (IDF) formulation [48], given by
$$
\begin{aligned}
\text{minimize} \quad & F_0\left(x_0, Y(x, \hat{y})\right) \\
\text{with respect to} \quad & x, \hat{y} \\
\text{subject to} \quad & C_0\left(x_0, Y(x, \hat{y})\right) \ge 0 \\
& C_i\left(x_0, x_i, Y_i(x_0, x_i, \hat{y}_{j \ne i})\right) \ge 0 \quad \text{for } i = 1, \dots, N \\
& C_i^c(x_0, x_i, \hat{y}) = \hat{y}_i - Y_i(x_0, x_i, \hat{y}_{j \ne i}) = 0 \quad \text{for } i = 1, \dots, N.
\end{aligned}
\tag{2.4}
$$
The notation Yi is used to highlight the fact that the corresponding coupling variables yi
are no longer independent variables but are functions of other variables. (The distinction
between the output variable value yi and the functional mapping Yi becomes important
when discussing gradient computation.) Because the governing equations Ri are nonlinear in general, we use the Implicit Function Theorem to argue that the elimination is valid in the vicinity of a local minimum. More precisely, if Ri(x0, xi, yi, ŷj) = 0 and ∂Ri/∂yi is nonsingular, then Yi is implicitly defined such that yi = Yi(x0, xi, ŷj) for j ≠ i.
If N = 1, IDF is sometimes known as nested analysis and design (NAND) [20] to draw
a direct comparison with SAND.
Unlike the SAND formulation, IDF does permit separate discipline analysis software
to compute the state variables independent from the optimizer. IDF is therefore less
intrusive than SAND and is easier to use with existing analysis software. Another positive
feature of IDF is that the problem size may be much smaller than that of SAND. Not
only are large sets of state variables and constraints not present in the optimization
problem, but, of the state variables that remain, only the coupling variables are needed
in the problem formulation to enforce interdisciplinary consistency of the optimal design.
In other words, the optimizer can ignore state variables that are not exchanged between
disciplines or used to evaluate any design objectives or constraints. If the number of
coupling variables is small, the optimization problem is also small.
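A sketch of what the optimizer sees under IDF, using invented closed-form "analyses": each discipline is solved separately given the other discipline's coupling copy, and the consistency constraints report the mismatch. All function forms here are assumptions for illustration.

```python
# IDF sketch: the optimizer varies x0 and the coupling copies (yhat1, yhat2);
# each discipline analysis is solved on its own, given the other's copy.
# The closed-form "analyses" below are invented for illustration.

def Y1(x0, yhat2):
    return x0 + 0.5 * yhat2      # solves the discipline-1 equations exactly

def Y2(x0, yhat1):
    return x0 - 0.2 * yhat1      # solves the discipline-2 equations exactly

def idf_functions(x0, yhat1, yhat2):
    y1 = Y1(x0, yhat2)           # independent solves: no MDA iteration needed
    y2 = Y2(x0, yhat1)
    f0 = x0**2 + y1**2 + y2**2   # invented shared objective
    cc = (yhat1 - y1, yhat2 - y2)  # consistency constraints C^c
    return f0, cc

# At an inconsistent guess the consistency constraints are nonzero ...
_, cc_bad = idf_functions(1.0, 0.0, 0.0)
# ... and at the coupled solution they vanish.
y1_star = 1.5 / 1.1
y2_star = 1.0 - 0.2 * y1_star
_, cc_good = idf_functions(1.0, y1_star, y2_star)
```

Only the coupling copies enter the optimization problem; any internal state of each "analysis" stays hidden from the optimizer, which is the size advantage of IDF over SAND.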
The drawback with using IDF is that, unlike with SAND, the discipline analyses
must be computed precisely to resolve the implicit constraints Ri(x0, xi, yi, ŷj) = 0 at
the optimal design. In principle, we could perform the discipline analyses inexactly
in the early stages of the optimization and gradually converge the analysis in concert
with the optimization. Some methods in the optimization literature do account for inexact function evaluation [37, 93], but we are not aware of any MDO applications of these methods. The naïve approach of solving each discipline analysis precisely is still the recommended approach for IDF.
When gradient-based optimization is employed with IDF, the method used to compute the gradients is an important factor in the total computational work to find an optimal design. In SAND, only partial derivatives with respect to all variables are required because all variables are independent. In IDF, eliminating the governing equation constraints results in some variables becoming functions of others. Therefore, the derivatives of each function with respect to the independent variables must account for the change in the dependent variables as well. These gradients are, in fact, the total derivatives of the function output with respect to the independent variables. Because of the importance of this subject, we will delve into it in more detail in Section 2.6.
2.5 Multidisciplinary Feasible (MDF)
If we eliminate both the discipline analysis and consistency constraints from the AAO problem, we obtain the multidisciplinary feasible (MDF) formulation [48]. The MDF problem
statement is given by
$$
\begin{aligned}
\text{minimize} \quad & F_0\left(x, Y(x)\right) \\
\text{with respect to} \quad & x \\
\text{subject to} \quad & C_0\left(x, Y(x)\right) \ge 0 \\
& C_i\left(x_0, x_i, Y_i(x)\right) \ge 0 \quad \text{for } i = 1, \dots, N.
\end{aligned}
\tag{2.5}
$$
Like IDF, the nonlinear elimination of the constraint sets Ri(x0, xi, yi, ŷj) = 0 is based on the governing equations satisfying the Implicit Function Theorem in some neighbourhood of the optimal solution. In this case, Ri(x0, xi, yi, ŷj) = 0 is used to eliminate yi simultaneously for all disciplines.
One way of interpreting the MDF problem statement is as a single-discipline optimization problem in which the single discipline analysis is replaced by a multidisciplinary analysis (MDA). Because of this simple structure, MDF has great intuitive appeal. If an MDA procedure is already in place, the MDF formulation provides a simple approach to adding optimization to the design process. Another advantage of MDF is that the optimization problem is the smallest of the three fundamental formulations. The optimizer is responsible for changing only design variables and satisfying only design constraints. State variables and coupling variables can be computed with specialized software. Furthermore, if the optimization procedure needs to be terminated early, the consistency constraints are already satisfied so that, even if the design constraints are not all satisfied, the system behaviour is known at a new design point.
Similar to IDF, the main disadvantage lies in the need to compute the MDA accurately
at the optimal design. Once again, it should be possible, in principle, to solve the MDA
less accurately while the optimization is in progress and increase accuracy as optimality
is approached. At present, we do not know of any rules which can be used to specify the
accuracy of the MDA adaptively, so our general recommendation is to resolve the MDA
accurately at each iteration of the optimization process.
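A minimal MDA sketch under the MDF viewpoint, assuming two invented linear "disciplines": a block Gauss-Seidel loop resolves the coupling at each design point, so the optimizer sees only functions of the design variables.

```python
# MDF sketch: a block Gauss-Seidel MDA resolves the coupling at every design
# point, so the optimizer sees functions of x alone. The two linear
# "analyses" are invented; real disciplines would be full simulations.

def mda(x0, tol=1e-12, max_iter=100):
    y1, y2 = 0.0, 0.0
    for _ in range(max_iter):
        y1_new = x0 + 0.5 * y2        # discipline 1 given the latest y2
        y2_new = x0 - 0.2 * y1_new    # discipline 2 given the updated y1
        if abs(y1_new - y1) < tol and abs(y2_new - y2) < tol:
            return y1_new, y2_new
        y1, y2 = y1_new, y2_new
    raise RuntimeError("MDA did not converge")

def F0(x0):
    # Objective as the optimizer sees it under MDF: states are internal.
    y1, y2 = mda(x0)
    return x0**2 + y1**2 + y2**2

y1, y2 = mda(1.0)
```

Every objective or constraint evaluation triggers a full MDA, which is why the accuracy and cost of the MDA loop dominate the expense of MDF in practice.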
Like IDF, the computation of gradient information for MDF becomes more difficult due to the presence of total derivatives rather than partial derivatives. However, in MDF, the problem is compounded by the fact that we are now dealing with multidisciplinary systems in which a change in any variable propagates through the entire system. Fortunately, efficient gradient computation methods for the multiple-discipline case can be constructed with knowledge of efficient methods from the single-discipline case. The next section outlines how derivatives can be computed for both IDF and MDF problem statements.
2.6 Computing Gradients for IDF and MDF
While the focus of this thesis is on MDO problems that can be solved using gradient-based optimization methods, many interesting design problems contain discrete or integer design variables or have objective or constraint functions that are very noisy. For these problems, gradient-free optimization methods are the only option. Often, the gradient-free algorithms chosen use heuristic approaches to optimization like simulated annealing [105, 175], genetic algorithms [55], particle swarm optimization [97], ant colony optimization [63], and more obscure methods like biogeography-based optimization [7, 163] and grey wolf optimization [135]. More mathematically rigorous gradient-free methods include mesh-adaptive direct search [119] and model-based interpolation methods [75, 156]. The textbook by Conn et al. [47] provides an introduction to the state-of-the-art in derivative-free methods.
In some cases, researchers opt to use gradient-free optimizers to solve problems for which gradients are available and can be computed reliably. Common reasons cited for using a gradient-free algorithm are that gradients are expensive to compute, particularly when discipline analyses are expensive, and that gradient-free algorithms can avoid getting stuck in a local minimum. To address the former concern, we show later in this section how gradients can be computed for a cost similar to that of evaluating the objective and constraint functions. Regarding the latter concern, Sigmund [162] points out, using a topology optimization example, that gradient-free algorithms can also get stuck in local minima on hard problems while still being far more computationally expensive than gradient-based algorithms. Furthermore, sampling techniques can be used to choose a range of starting points and efficiently search for multiple local minima [40, 126]. If the optimization problems of interest have a large number of variables and gradients are available, gradient-based optimizers are the most efficient way to solve them [125, 195]. We therefore prefer to use gradient-based optimization methods where possible.
We now present a review of the options for gradient computation, or local sensitivity
analysis, applied to single-discipline and IDF problems. The most straightforward way
to compute total derivatives for a single discipline analysis is using some type of finite-
differencing procedure. In this scheme, a design variable is perturbed by a small value
and the appropriate discipline analyses are evaluated at the new point to measure the
change in the discipline state. For n design variables, each discipline analysis would need
to be evaluated an additional n times to compute all the changes in state variables. If the
discipline analyses are expensive, or the problem contains many independent variables,
the finite-difference approach is inefficient. Furthermore, subtractive cancellation errors
can cause derivative estimates to be inaccurate if the design variable perturbation is
chosen to be too small. Nevertheless, the ease of implementation of finite-differencing
means that it is still a common approach to computing derivatives.
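The forward-difference scheme can be sketched with an invented one-discipline example in which the "analysis" has a closed-form solution, so the finite-difference estimate can be compared against the analytic total derivative; the O(h) truncation error is visible in the result.

```python
# Forward-difference total derivative through an invented one-discipline
# analysis: the "analysis" returns the state y solving R(x, y) = y - x^2 = 0
# (in closed form here), and the constraint is C = x + y, so dC/dx = 1 + 2x.

def analysis(x):
    return x * x                 # stand-in for an expensive iterative solve

def C(x):
    return x + analysis(x)       # constraint evaluated through the analysis

def forward_difference(f, x, h):
    # One extra analysis per design variable; the error is O(h) plus
    # subtractive cancellation when h is chosen too small.
    return (f(x + h) - f(x)) / h

x = 1.0
exact = 1.0 + 2.0 * x            # analytic total derivative at x = 1
fd = forward_difference(C, x, 1e-6)
```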
Several other approaches, requiring more implementation time, exist to improve both the accuracy and efficiency of the derivative estimates. The complex-step approach [132, 168] provides a twist on traditional finite-differencing by using an imaginary-valued variable perturbation rather than a real-valued one. If the discipline analysis software can perform complex-number arithmetic, the resulting derivative approximation is accurate to machine precision with a sufficiently small perturbation. Nevertheless, the complex-step approximation still requires n discipline analyses to be executed. Algorithmic differentiation [78, 79] also achieves machine-precision accuracy, but by differentiating the discipline analysis code line-by-line. While this approach has a high overhead, the total cost of computing all the derivatives of the problem is no more than a small multiple of the cost of solving the discipline analyses once [139, Chapter 8]. Of course, algorithmic differentiation cannot be applied without direct access to the source code. One more alternative, if the discipline analyses are of a simple enough structure, is to calculate derivatives symbolically and hard-code them for use by the optimizer.
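A sketch of the complex-step approach on an invented test function: because the derivative is read from the imaginary part, no subtraction occurs and hence no cancellation error, so the step size can be made extremely small.

```python
import cmath
import math

# Complex-step derivative sketch: perturb along the imaginary axis and take
# the imaginary part. The test function is invented for illustration; the
# only requirement is that it accepts complex arguments.

def f(x):
    return cmath.exp(x) * cmath.sin(x)

def complex_step(func, x, h=1e-30):
    # f(x + i h) ~ f(x) + i h f'(x), so Im(f(x + i h)) / h ~ f'(x),
    # with no subtractive cancellation even for tiny h.
    return func(x + 1j * h).imag / h

x = 1.0
exact = math.exp(x) * (math.sin(x) + math.cos(x))  # analytic derivative
cs = complex_step(f, x)
```

Compare this with the forward-difference estimate, whose error cannot be driven below roughly the square root of machine precision regardless of step size.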
For expensive discipline analyses, the best approach to computing total derivatives in general is to compute them analytically by exploiting the structure of the optimization problem. We do so by assembling matrices of partial derivatives obtained by any of the methods described above. The following derivation is similar to those found in Sobieszczanski-Sobieski [164] and Martins and Hwang [130].
Consider the matrix of total derivatives of a group of design constraints C with
respect to a group of design variables x. (The derivatives of the objective function
F0 are computed in an identical manner, replacing C with F0.) To be mathematically
precise, total derivatives only define relationships between variables, so let us define the
set of variables c = C(x, Y (x)) to represent the output of the constraint functions. By
convention, we will always use the upper-case letter for the function itself, and the lower-
case letter for the variable representing the output value. For the moment, we only focus
on a single discipline, as each analysis is uncoupled in Problem (2.4). We can compute
the total derivative of a single function Ci with respect to a single variable xj as
$$
\frac{\mathrm{d}c_i}{\mathrm{d}x_j} = \frac{\partial C_i}{\partial x_j} + \sum_{k=1}^{M} \frac{\partial C_i}{\partial y_k}\,\frac{\mathrm{d}y_k}{\mathrm{d}x_j} = \frac{\partial C_i}{\partial x_j} + \frac{\partial C_i}{\partial y}\,\frac{\mathrm{d}y}{\mathrm{d}x_j},
\tag{2.6}
$$
where M is the total number of state variables. We adopt the shorthand notation
$$
\frac{\partial C}{\partial x} = \frac{\partial (C_1, \dots, C_m)}{\partial (x_1, \dots, x_n)} \in \mathbb{R}^{m \times n}
$$
to describe the Jacobian of a set of functions with respect to a set of variables. For example, the ∂Ci/∂y term in Equation (2.6) is a row vector of length M containing all ∂Ci/∂yk values. In a finite-differencing scheme, the dy/dxj vector would be computed by perturbing xj and running the discipline analysis again to obtain a new state vector y. Now, however, we recognize that the nonlinear system of equations r = R(x, y) = 0 has been solved for y, and no change in xj alters this fact. Linearizing this system of equations at the solution, we can state that
$$
\frac{\mathrm{d}r}{\mathrm{d}x_j} = \frac{\partial R}{\partial x_j} + \frac{\partial R}{\partial y}\,\frac{\mathrm{d}y}{\mathrm{d}x_j} = 0.
\tag{2.7}
$$
Rearranging the terms in Equation (2.7) yields
$$
\frac{\mathrm{d}y}{\mathrm{d}x_j} = -\left[\frac{\partial R}{\partial y}\right]^{-1}\frac{\partial R}{\partial x_j}.
\tag{2.8}
$$
By direct substitution of Equation (2.8) into Equation (2.6), we obtain
$$
\frac{\mathrm{d}c_i}{\mathrm{d}x_j} = \frac{\partial C_i}{\partial x_j} - \left[\frac{\partial C_i}{\partial y}\right]\left[\frac{\partial R}{\partial y}\right]^{-1}\frac{\partial R}{\partial x_j}.
\tag{2.9}
$$
Finally, we drop the subscripts on c and x to obtain an expression for the full derivative
matrix.
$$
\frac{\mathrm{d}c}{\mathrm{d}x} = \frac{\partial C}{\partial x} - \left[\frac{\partial C}{\partial y}\right]\left[\frac{\partial R}{\partial y}\right]^{-1}\frac{\partial R}{\partial x}
\tag{2.10}
$$
We emphasize again that dc/dx is an m × n matrix, where m is the number of functions (design constraints, in this case) and n is the number of design variables.
The only remaining issue with equation (2.10) is how to compute the action of the
inverse matrix $[\partial R/\partial y]^{-1}$ on its neighbours in the formula. Here, the analytic approach
splits into two variations known as the direct and adjoint sensitivity methods [130]. In
the direct method, a sequence of linear systems of the form
\[
  \left[ \frac{\partial R}{\partial y} \right] \frac{d y}{d x_j} = -\frac{\partial R}{\partial x_j} \tag{2.11}
\]
is solved to yield column vectors $dy/dx_j$, which populate the $dy/dx$ matrix in Equation (2.6). Another way to interpret the direct method is that the matrix $dc/dx$ is
assembled one column at a time, since the linear system (2.11) computes the change in
the entire state with respect to one design variable. In the adjoint method, an alternative
sequence of linear systems of the form
\[
  \left[ \frac{\partial R}{\partial y} \right]^{T} \left[ \frac{d c_i}{d r} \right]^{T} = -\left[ \frac{\partial C_i}{\partial y} \right]^{T} \tag{2.12}
\]
is solved to yield row vectors $dc_i/dr$. These vectors populate the matrix $dc/dr$ in the
expression
\[
  \frac{d c}{d x} = \frac{\partial C}{\partial x} + \frac{d c}{d r} \frac{\partial R}{\partial x}. \tag{2.13}
\]
In contrast to the direct method, the adjoint method assembles the matrix $dc/dx$ one row
at a time, since each adjoint solution is associated with a function rather than a variable.
In both the direct and adjoint methods, the most costly operation is the solution of the
linear systems (2.11) and (2.12). We select the direct or adjoint method for a particular
problem based on which method requires fewer linear systems to be solved to compute
the entire matrix. If the problem has a large number of design variables but only a small
number of design constraints, fewer systems of the form (2.12) need to be assembled and
solved, so the adjoint method is the natural choice.
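To make the two options concrete, the following NumPy sketch (our own illustration, not from the thesis) builds random partial-derivative matrices for a hypothetical converged analysis and recovers the same total derivative matrix $dc/dx$ from both the direct solve of Equation (2.11) and the adjoint solve of Equation (2.12):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, M = 2, 5, 4   # numbers of constraints, design variables, state variables

# Hypothetical partial derivatives of R(x, y) and C(x, y), held fixed at the
# converged solution; the diagonal shift keeps dR/dy nonsingular.
dRdy = rng.standard_normal((M, M)) + 4.0 * np.eye(M)
dRdx = rng.standard_normal((M, n))
dCdy = rng.standard_normal((m, M))
dCdx = rng.standard_normal((m, n))

# Direct method, Eq. (2.11): one linear solve per design variable; dc/dx is
# assembled one column at a time (all n right-hand sides handled in one call).
dydx = np.linalg.solve(dRdy, -dRdx)
dcdx_direct = dCdx + dCdy @ dydx          # Eq. (2.6)

# Adjoint method, Eq. (2.12): one transposed solve per function; dc/dx is
# assembled one row at a time via Eq. (2.13).
dcdr = np.linalg.solve(dRdy.T, -dCdy.T).T
dcdx_adjoint = dCdx + dcdr @ dRdx

assert np.allclose(dcdx_direct, dcdx_adjoint)
```

For $n \gg m$, the adjoint route performs $m$ solves instead of $n$, which is exactly the cost argument made above.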
To summarize the discussion so far, numerous approaches exist for computing derivative information for both single-discipline optimization and IDF-type MDO problems.
Higher accuracy and efficiency in the derivative computation can be achieved with extra
upfront development time. We recommend combining the direct or adjoint method with an appropriate technique for computing the partial derivatives so that all gradient information is obtained both accurately and efficiently.
To compute total derivatives for the MDF problem statement, we could use any of the
techniques of finite-differencing, complex-step, symbolic differentiation, or algorithmic
differentiation on their own. However, as with IDF, specialized analytic methods are
preferred. We now generalize the direct and adjoint methods described above to the case
of multidisciplinary systems.
Unlike for single-discipline systems, there are two versions each of the direct and adjoint methods for computing derivatives of multidisciplinary systems. Sobieszczanski-Sobieski [164] referred to the two versions of the direct method as GSE1 and GSE2. We refer to them as the residual form and the functional form of the direct and adjoint methods [130]. The derivations
of these methods are omitted here, but we refer the interested reader to the survey paper
by Martins and Hwang [130] for a more complete treatment. For a system with three
disciplines, the residual direct method is given by
\[
  \begin{bmatrix}
    \frac{\partial R_1}{\partial y_1} & \frac{\partial R_1}{\partial y_2} & \frac{\partial R_1}{\partial y_3} \\
    \frac{\partial R_2}{\partial y_1} & \frac{\partial R_2}{\partial y_2} & \frac{\partial R_2}{\partial y_3} \\
    \frac{\partial R_3}{\partial y_1} & \frac{\partial R_3}{\partial y_2} & \frac{\partial R_3}{\partial y_3}
  \end{bmatrix}
  \begin{bmatrix}
    \frac{d y_1}{d x_j} \\ \frac{d y_2}{d x_j} \\ \frac{d y_3}{d x_j}
  \end{bmatrix}
  =
  \begin{bmatrix}
    -\frac{\partial R_1}{\partial x_j} \\ -\frac{\partial R_2}{\partial x_j} \\ -\frac{\partial R_3}{\partial x_j}
  \end{bmatrix}
  \tag{2.14}
\]
while the functional direct method is given by
\[
  \begin{bmatrix}
    I & -\frac{\partial Y_1}{\partial y_2} & -\frac{\partial Y_1}{\partial y_3} \\
    -\frac{\partial Y_2}{\partial y_1} & I & -\frac{\partial Y_2}{\partial y_3} \\
    -\frac{\partial Y_3}{\partial y_1} & -\frac{\partial Y_3}{\partial y_2} & I
  \end{bmatrix}
  \begin{bmatrix}
    \frac{d y_1}{d x_j} \\ \frac{d y_2}{d x_j} \\ \frac{d y_3}{d x_j}
  \end{bmatrix}
  =
  \begin{bmatrix}
    \frac{\partial Y_1}{\partial x_j} \\ \frac{\partial Y_2}{\partial x_j} \\ \frac{\partial Y_3}{\partial x_j}
  \end{bmatrix}
  \tag{2.15}
\]
for a particular variable $x_j$. The functional form is most useful when partial derivatives of
the governing equations themselves are not available. In that case, we revert to computing
the appropriate partial derivatives in a “black-box” fashion, e.g., by a finite-differencing
procedure. The residual adjoint method is given by
\[
  \begin{bmatrix}
    \left[\frac{\partial R_1}{\partial y_1}\right]^{T} & \left[\frac{\partial R_2}{\partial y_1}\right]^{T} & \left[\frac{\partial R_3}{\partial y_1}\right]^{T} \\
    \left[\frac{\partial R_1}{\partial y_2}\right]^{T} & \left[\frac{\partial R_2}{\partial y_2}\right]^{T} & \left[\frac{\partial R_3}{\partial y_2}\right]^{T} \\
    \left[\frac{\partial R_1}{\partial y_3}\right]^{T} & \left[\frac{\partial R_2}{\partial y_3}\right]^{T} & \left[\frac{\partial R_3}{\partial y_3}\right]^{T}
  \end{bmatrix}
  \begin{bmatrix}
    \left[\frac{d c_i}{d r_1}\right]^{T} \\
    \left[\frac{d c_i}{d r_2}\right]^{T} \\
    \left[\frac{d c_i}{d r_3}\right]^{T}
  \end{bmatrix}
  =
  \begin{bmatrix}
    -\left[\frac{\partial C_i}{\partial y_1}\right]^{T} \\
    -\left[\frac{\partial C_i}{\partial y_2}\right]^{T} \\
    -\left[\frac{\partial C_i}{\partial y_3}\right]^{T}
  \end{bmatrix}
  \tag{2.16}
\]
while the functional adjoint method is given by
\[
  \begin{bmatrix}
    I & -\left[\frac{\partial Y_2}{\partial y_1}\right]^{T} & -\left[\frac{\partial Y_3}{\partial y_1}\right]^{T} \\
    -\left[\frac{\partial Y_1}{\partial y_2}\right]^{T} & I & -\left[\frac{\partial Y_3}{\partial y_2}\right]^{T} \\
    -\left[\frac{\partial Y_1}{\partial y_3}\right]^{T} & -\left[\frac{\partial Y_2}{\partial y_3}\right]^{T} & I
  \end{bmatrix}
  \begin{bmatrix}
    \left[\frac{d c_i}{d r_1}\right]^{T} \\
    \left[\frac{d c_i}{d r_2}\right]^{T} \\
    \left[\frac{d c_i}{d r_3}\right]^{T}
  \end{bmatrix}
  =
  \begin{bmatrix}
    \left[\frac{\partial C_i}{\partial y_1}\right]^{T} \\
    \left[\frac{\partial C_i}{\partial y_2}\right]^{T} \\
    \left[\frac{\partial C_i}{\partial y_3}\right]^{T}
  \end{bmatrix}
  \tag{2.17}
\]
for a particular function $c_i$. We note that the functional and residual forms of both
the direct and adjoint methods can be combined as the availability of partial derivative
information dictates by substituting the appropriate rows into the linear system. As
with the single-discipline direct and adjoint methods, to obtain the complete set of first
derivative information we must solve either Equation (2.14) or (2.15) once for every design
variable or solve Equation (2.16) or (2.17) once for every function.
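To make the functional forms concrete, here is a small self-contained sketch (our own, with made-up coupling Jacobians for three disciplines) that solves the functional direct system, Equation (2.15), and a functional adjoint system in the spirit of Equation (2.17), and verifies that both routes give the same total derivative of a single function:

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [3, 2, 4]            # hypothetical state-vector sizes of three disciplines
M, n = sum(sizes), 5         # total number of states and design variables

# Made-up coupling blocks dY_i/dy_k (scaled small so I - dY/dy is well
# conditioned); a discipline does not depend on its own outputs, so the
# diagonal blocks are zeroed out.
dYdy = 0.1 * rng.standard_normal((M, M))
offsets = np.cumsum([0] + sizes[:-1])
for s, off in zip(sizes, offsets):
    dYdy[off:off + s, off:off + s] = 0.0
dYdx = rng.standard_normal((M, n))
dCdy = rng.standard_normal((1, M))       # partials of one function C_i
dCdx = rng.standard_normal((1, n))

A = np.eye(M) - dYdy                     # block matrix of Eq. (2.15)

# Functional direct method, Eq. (2.15): one solve per design variable.
dydx = np.linalg.solve(A, dYdx)
dcdx_direct = dCdx + dCdy @ dydx

# Functional adjoint method: one transposed solve per function; w plays the
# role of the stacked adjoint vectors of Eq. (2.17).
w = np.linalg.solve(A.T, dCdy.T)
dcdx_adjoint = dCdx + w.T @ dYdx

assert np.allclose(dcdx_direct, dcdx_adjoint)
```

As in the single-discipline case, the choice between the two comes down to whether the number of design variables or the number of functions is smaller.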
2.7 Conclusion
In this chapter we presented a unified mathematical notation to describe a general MDO
problem and showed three forms of this fundamental problem statement. We also briefly
discussed the primary ways in which we obtain accurate first-derivative information even
for cases where the discipline analyses are solved outside of the optimization process.
Obviously, solving MDO problems requires many different software components to exchange information at appropriate times in the solution process. While it may be easy to
understand the solution process for the problem formulations described in this chapter,
it becomes much more difficult when that solution process involves decomposition of the
MDO problem and coordination among independent parallel processes. Before discussing
more complicated architectures for MDO, in the next chapter we present a novel diagram
to help us visualize these architectures and unify architecture presentation.
Chapter 3
Visualizing MDO Architectures
In researching the existing MDO architectures, we found that each author presented the
architectures in slightly different ways. While differences in mathematical notation are
common, they are straightforward to overcome. What is more difficult to communicate
is the sequence of operations in the solution algorithm itself, and the exchange of information between distinct software modules in the computational framework. As with any
computational method, the implementation itself is critical to the performance of the
method. However, prior to this work, there had been no standardized way to describe
the solution algorithm and flow of data within the MDO architecture. Such a standard
would be of great use as both a research and educational tool to compare the approaches
of different architectures in a common framework. A common visualization approach
for MDO also has potential as a basis for the graphical interface of MDO integration
software, such as NASA’s OpenMDAO project [76].
This chapter details our visualization approach to MDO architectures. We call our
diagrams extended design structure matrices or XDSMs. We motivate the XDSM format
and compare it with other diagrams used in the literature. We then give examples of how
to apply the XDSM format to a variety of analysis and optimization processes, including
simple MDO architectures. We refer the interested reader to Lambe and Martins [115] for
a more detailed discussion of our design process and other applications of the diagram.
3.1 Diagram Motivation
To describe an MDO architecture, a candidate diagram should be able to track both
the flow of information and the flow of the algorithm at the same time. Therefore, following the basic philosophy of Tufte [181], we require a diagram that is reasonably easy
to understand, yet has a high density of information. We would also like the ability to
incorporate mathematical notation directly into the diagram so that the viewer can immediately connect parts of the diagram with corresponding elements in the optimization
problem formulation.
Architecture diagrams in the MDO literature usually consist of some version of a
flowchart or unstructured block diagram. These diagrams are usually sufficient to display
either data flow or process flow, but not both. Adapting and standardizing these diagrams
to MDO architectures was considered, but the number of blocks and connections required
to display complex architectures could quickly turn these diagrams into “spaghetti.” The
Unified Modeling Language (UML) [25] has the advantages of a standard notation and a
close connection with software development. However, depicting data and process flow
would require multiple diagrams and we want to avoid using more than one diagram for
one architecture.
Instead, we chose to develop a standard diagram based on the design structure matrix
(DSM) [33, 170] or N2 diagram [117]. Figure 3.1 shows an example DSM from the literature. The main attraction of the DSM is that the diagram is structured. In particular,
the relative position of each element of the diagram contributes to the interpretation of
the whole diagram. The components of a system are arranged along the diagonal of a
square matrix and the interfaces with any component are defined in the same row or
column as that component. Therefore, the components and their connections with each
Figure 3.1: Example design structure matrix for an automobile engine, from Browning [33].
other can be rapidly identified. Furthermore, the DSM defines the direction of the interface. The interface from component A to component B is on the opposite side of the
diagonal from the interface from component B to component A. If we treat the components of the DSM as components of an MDO architecture, the DSM provides a natural
way of specifying the data communication between components. The XDSM extends
this notation to include a mechanism for specifying the solution process.
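The core data structure here is simple: a DSM is essentially an adjacency matrix over the system components. A toy sketch (our own, with hypothetical components) shows how the direction convention encodes feedback:

```python
# Toy DSM for three hypothetical components A, B, C, where a nonzero entry
# dsm[i][j] marks an interface from component i to component j.
components = ["A", "B", "C"]
dsm = [[0, 1, 0],   # A sends data to B
       [0, 0, 1],   # B sends data to C
       [1, 0, 0]]   # C sends data back to A (a feedback loop)

# The interface from A to B (dsm[0][1]) sits on the opposite side of the
# diagonal from a hypothetical interface from B to A (dsm[1][0]).  With the
# components ordered along the diagonal, below-diagonal entries are feedbacks.
feedback = [(components[i], components[j])
            for i in range(3) for j in range(3)
            if dsm[i][j] and j < i]
# feedback contains only the C-to-A connection
```

This directionality is what the XDSM inherits when it adds process information on top of the data connections.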
3.2 The Extended Design Structure Matrix (XDSM)
We now explain the XDSM notation using a series of increasingly complex examples.
Before modeling a more complex analysis or optimization process, let us present a generic
system with which to work. Figure 3.2 shows a generic, fully-coupled, three-discipline
system. Essentially, Figure 3.2 is just a DSM of our example system. The convention in
this diagram, and all subsequent diagrams, is that the outputs of the discipline analysis
are placed in the same row, while the inputs are placed in the same column. As an
added visual cue, thick, gray lines are used to denote data flow connections. The shapes
of each block were selected based on standard flowchart notation: rectangles for generic
Figure 3.2: Generic, three-discipline, fully-coupled, multidisciplinary system. Each discipline analysis $i$ shares its state $y_i$ with the other disciplines and requires the states of the other disciplines in its own analysis.
processes, and parallelograms for input and output processes. Note, however, that the
choice of shapes and colors is redundant because the structure of the DSM dictates
whether the contents of a block in the diagram are a component or an interface.
To perform a multidisciplinary analysis (MDA) of the system in Figure 3.2, we add
to the XDSM an additional component that defines the iterative process, known as a
driver, and some notation to denote the order of execution. In a Gauss–Seidel-type
MDA process, each discipline is analyzed in sequence using the most up-to-date state
information from the other disciplines. Figure 3.3 depicts a Gauss–Seidel-type MDA
process for our generic system. The external inputs to the system — the design variables
and an initial estimate of the state variables — are placed in the top row and the system
outputs — the final, consistent, system state variables — are placed in the left-most
column. In Figure 3.3, the order of execution is denoted by the step numbers in each component block, starting from zero. Loops within the sequence are defined by the notation $p \rightarrow q$, where $p$ and $q$ are nonnegative integers with $q < p$. This notation means
that the sequence returns to step q until some completion criterion is satisfied, at which
point the algorithm proceeds to step p + 1. Step numbers are also introduced in the
data blocks to specify when the data is input to a particular component. In some MDO
architectures, it is possible for the same data to be taken from multiple sources, e.g., a
Figure 3.3: Gauss–Seidel MDA procedure. Each discipline analysis is evaluated in sequence using the most recent state information from other disciplines and a fixed choice of design variables. The MDA block measures convergence of the discipline states.
discipline analysis and a surrogate model of that analysis, and this notation accounts for
that case. Note, however, that these numbers are not present in the external inputs and
outputs of the diagram because these data are fixed, respectively, at the beginning and
end of the process. Finally, for clarity, we use a different block shape to denote drivers
and thin black lines to connect consecutive components in the algorithm.
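The Gauss–Seidel MDA loop that Figure 3.3 depicts can be sketched in a few lines of Python (our own illustration; the two toy disciplines and their contractive coupling are made up for the example):

```python
import numpy as np

def gauss_seidel_mda(analyses, x, y0, tol=1e-10, max_iter=100):
    """Gauss-Seidel MDA driver: evaluate each discipline in sequence using the
    most recent states of the others, looping until the states stop changing."""
    y = [yi.copy() for yi in y0]
    for _ in range(max_iter):
        y_prev = [yi.copy() for yi in y]
        for i, analysis in enumerate(analyses):
            y[i] = analysis(x, y)                  # sees the newest y so far
        if max(np.max(np.abs(a - b)) for a, b in zip(y, y_prev)) < tol:
            break                                  # the MDA convergence check
    return y

# Two toy "disciplines" whose coupling is contractive (assumed, so that the
# fixed-point iteration converges).
analysis_1 = lambda x, y: 0.5 * y[1] + x
analysis_2 = lambda x, y: 0.3 * y[0] + 1.0
y = gauss_seidel_mda([analysis_1, analysis_2], x=2.0,
                     y0=[np.zeros(1), np.zeros(1)])
```

At exit, the states satisfy both discipline equations simultaneously, which is exactly the "final, consistent, system state" output of the diagram.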
We can also depict parallel processes using the XDSM notation. As a simple example, we will use a Jacobi MDA process. Unlike the Gauss–Seidel process, the Jacobi MDA process forces each discipline to conduct its analysis using state information taken from the other disciplines at the previous iteration. While the state information used by each discipline may be less up-to-date, this choice allows all discipline analyses to be executed in
parallel. Figure 3.4 shows the XDSM of the Jacobi MDA process. Note that processes
executed in parallel are all assigned the same step number. As a further simplification
to the diagram, we can “stack” similar parallel components to save space in a large system. Figure 3.5 shows an example of this simplification. We adopt the convention that a
reference to component i implies a repeated structure across all disciplines. The stacked
Figure 3.4: Jacobi MDA procedure with parallel execution of discipline analyses. The system being analyzed is identical to that in Figure 3.3.
analysis blocks in Figure 3.5 provide an added visual cue.
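A corresponding Jacobi driver (again a made-up illustration with toy disciplines) differs only in that every analysis in a sweep reads the previous iteration's states, which is what makes the sweep parallelizable:

```python
import numpy as np

def jacobi_mda(analyses, x, y0, tol=1e-10, max_iter=200):
    """Jacobi MDA driver: every discipline is evaluated against the previous
    iteration's states, so all the calls in one sweep are independent and
    could be dispatched in parallel (e.g., to a process pool)."""
    y = [yi.copy() for yi in y0]
    for _ in range(max_iter):
        y_prev = y
        y = [analysis(x, y_prev) for analysis in analyses]  # all read y_prev
        if max(np.max(np.abs(a - b)) for a, b in zip(y, y_prev)) < tol:
            break
    return y

# The same two toy "disciplines" as in the Gauss-Seidel case converge to the
# same consistent state, typically in more sweeps.
analysis_1 = lambda x, y: 0.5 * y[1] + x
analysis_2 = lambda x, y: 0.3 * y[0] + 1.0
y = jacobi_mda([analysis_1, analysis_2], x=2.0,
               y0=[np.zeros(1), np.zeros(1)])
```

The trade-off in the text is visible here: each sweep uses staler information, but nothing in a sweep depends on anything else in that sweep.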
Displaying gradient-based optimization processes in an XDSM is straightforward with the tools described above. We need only define components for the optimizer itself and for the function and gradient evaluation processes. Figure 3.6 shows this optimization process. In this case, we have assumed that the gradients can be computed analytically given
only the current design point. If the gradients were computed by finite differencing, that
could also be depicted using a driver that would repeatedly call the objective and constraint functions to compute the derivatives. Finally, our choice of splitting the objective
and constraint components is arbitrary. Using a single “functions” component or even
a separate component for each function individually would be equally valid, provided
the optimizer is supplied with the same information. In practice, the grouping of components that compute function and gradient information depends on the computational
environment.
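As a small illustration of such a grouping, SciPy's SLSQP optimizer can play the role of the Optimization block of Figure 3.6, with separate callables supplying the objective, constraint, and gradient information (the toy problem below is our own invention):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to c(x) = x1 + x2 - 1 >= 0.
# Each component of the diagram maps to a separate callable.
objective = lambda x: x[0] ** 2 + x[1] ** 2
obj_grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])   # Gradients component
constraint = {"type": "ineq",
              "fun": lambda x: x[0] + x[1] - 1.0,          # Constraints component
              "jac": lambda x: np.array([1.0, 1.0])}

result = minimize(objective, x0=np.array([3.0, -1.0]), jac=obj_grad,
                  constraints=[constraint], method="SLSQP")
# The constrained optimum is x = (0.5, 0.5).
```

Whether the objective, constraints, and their gradients live in one component or several is invisible to the optimizer, which is the point made above.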
As a final thought, we expect the XDSM to be applicable to a much wider range of
procedures than the examples presented in this thesis. Each component in the XDSM is
simply a computational element that produces an output based on some input. Drivers
Figure 3.5: Jacobi MDA procedure with parallel execution of discipline analyses using our convention for parallel diagram structure. The MDA process shown here is identical to that shown in Figure 3.4.
Figure 3.6: Optimization algorithm where the optimizer requires gradients of both the objective and the constraints. The gradients are calculated by a separate component.
Figure 3.7: XDSM for the SAND architecture. The locations in which the functions of Problem (2.3) are evaluated are noted in the diagram.
are just components that perform additional computations to check a looping condition.
These components are then assembled into a matrix and connected together through
paths defined by the algorithm. We expect the nature of each process to determine what
components need to be defined and how they can be related back to the mathematical
statements of the problem and algorithm.
3.3 Monolithic MDO Architectures
Using the figures of Section 3.2 as a guide, we are now ready to depict full MDO architectures using the XDSM notation. Recall that an MDO architecture is defined by the problem formulation together with the solution algorithm. This section details monolithic
problem formulation together with the solution algorithm. This section details monolithic
architectures, those architectures that are based around a single optimization problem
statement. In particular, the three fundamental MDO problem formulations discussed
in Chapter 2 form the basis for three fundamental MDO architectures.
Figure 3.8: XDSM for the IDF architecture. The locations in which the functions of Problem (2.4) are evaluated are noted in the diagram.

Figure 3.7 presents the SAND architecture. This architecture corresponds to the SAND problem formulation in Equation (2.3). The SAND architecture contains components that, rather than computing the state variables, evaluate the governing equation
residual values. Since these computations can be made independently from evaluating
design objectives and constraints, Figure 3.7 shows them being evaluated in parallel. If
the optimize